Calling the W3C Validator API Using Java

Background

The Markup Validator is a free service by W3C that helps check the validity of Web documents.
http://validator.w3.org/about.html

Markup Validator Web Service

Interface applications with the Markup Validator through its experimental API. This is version 0.2, dated May 2007. For a history of the format, see Change Log.
http://validator.w3.org/docs/api.html

To call this service, make an HTTP GET request to http://validator.w3.org/check?output=soap12&uri= followed by the URL of the page you want to check. The appended URL should be percent-encoded so that any ?, & or # it contains is not read as part of the validator's own query string. The output parameter tells the API to return a SOAP 1.2 response, which is XML. If you omit that parameter, you receive an HTML response instead.
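
As a small illustration of building that request address, here is a minimal sketch. The class name ValidatorUrlSketch and the helper buildValidatorUrl are hypothetical names made up for this example, not part of the W3C API. Only the endpoint and the output and uri parameters come from the description above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ValidatorUrlSketch
{
	// Endpoint and parameters as described in the text above
	private static final String VALIDATOR = "http://validator.w3.org/check?output=soap12&uri=";

	// Hypothetical helper: appends the percent-encoded page address to the endpoint
	static String buildValidatorUrl(String pageUrl) throws UnsupportedEncodingException
	{
		return VALIDATOR + URLEncoder.encode(pageUrl, "UTF-8");
	}

	public static void main(String[] args) throws UnsupportedEncodingException
	{
		// The page address has its own query string, so encoding matters here
		System.out.println(buildValidatorUrl("http://www.example.com/page?id=1&lang=en"));
	}
}
```

Without the encoding step, the &lang=en part of the page address would be treated as a separate parameter of the validator request rather than as part of the page being checked.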

package Tests;

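import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

public class Blog
{
	private static String url_SiteToTest = "http://www.example.com";

	public static void main(String[] args) throws UnsupportedEncodingException
	{
		// Build URL for W3C HTML Markup Validator API.
		// The site address is percent-encoded so that any '?', '&' or '#' in it
		// is not mistaken for part of the validator's own query string.
		String siteToTest = "http://validator.w3.org/check?output=soap12&uri="
				+ URLEncoder.encode(url_SiteToTest, "UTF-8");

		// Get the response from W3C
		Document doc = GetValidationResultXML(siteToTest);

		if (doc != null)
		{
			// Parse the XML and write the results to the log
			ParseAndLog(doc, "Check HTML Markup");
		}
	}

	private static Document GetValidationResultXML(String siteToTest)
	{
		// Create a Document to hold the result
		Document doc = null;

		// try-with-resources closes the HTTP stream even if parsing fails
		try (InputStream in = new URL(siteToTest).openStream())
		{
			// Send request and save the response
			DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
			// The SOAP response uses namespace prefixes, so parse namespace-aware
			dbf.setNamespaceAware(true);
			DocumentBuilder db = dbf.newDocumentBuilder();
			doc = db.parse(in);
			doc.normalize();
		}
		catch (Exception ex)
		{
			// On any network or parse error, report it and return null
			System.out.println(ex);
		}

		return doc;
	}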

	private static void ParseAndLog(Document doc, String testName)
	{
		// In the interest of brevity I have omitted this code.
	}
}

You’ll notice the code above leaves the body of ParseAndLog() empty. You can parse the XML using the org.w3c.dom classes, reading the elements you need from the returned Document.
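
One possible shape for that parsing step is sketched below. It is not the omitted original code. It assumes the response contains validity, errorcount and error elements (each error with line, col and message children), as I understand the API documentation to describe, and that the Document was parsed with a namespace-aware factory. The SAMPLE string is a hand-written, abbreviated stand-in and not real validator output. The names ParseSketch, firstText and summarize are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ParseSketch
{
	// Hand-written sample; element names and namespace URIs are assumptions
	static final String SAMPLE =
		"<env:Envelope xmlns:env=\"http://www.w3.org/2003/05/soap-envelope\">"
		+ "<env:Body><m:markupvalidationresponse xmlns:m=\"http://www.w3.org/2005/10/markup-validator\">"
		+ "<m:validity>false</m:validity>"
		+ "<m:errors><m:errorcount>1</m:errorcount><m:errorlist>"
		+ "<m:error><m:line>12</m:line><m:col>7</m:col><m:message>sample message</m:message></m:error>"
		+ "</m:errorlist></m:errors>"
		+ "</m:markupvalidationresponse></env:Body></env:Envelope>";

	// Text of the first descendant with this local name (any namespace), or null
	static String firstText(Element scope, String localName)
	{
		NodeList nodes = scope.getElementsByTagNameNS("*", localName);
		return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
	}

	// Builds a short report: one summary line, then one line per error
	static String summarize(Document doc, String testName)
	{
		Element root = doc.getDocumentElement();
		StringBuilder sb = new StringBuilder();
		sb.append(testName)
			.append(": valid=").append(firstText(root, "validity"))
			.append(", errors=").append(firstText(root, "errorcount"))
			.append('\n');

		NodeList errors = root.getElementsByTagNameNS("*", "error");
		for (int i = 0; i < errors.getLength(); i++)
		{
			Element e = (Element) errors.item(i);
			sb.append("  line ").append(firstText(e, "line"))
				.append(", col ").append(firstText(e, "col"))
				.append(": ").append(firstText(e, "message"))
				.append('\n');
		}
		return sb.toString();
	}

	// Parses an XML string with a namespace-aware parser
	static Document parse(String xml) throws Exception
	{
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		dbf.setNamespaceAware(true);
		return dbf.newDocumentBuilder()
			.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
	}

	public static void main(String[] args) throws Exception
	{
		System.out.print(summarize(parse(SAMPLE), "Check HTML Markup"));
	}
}
```

Matching on local names with the "*" namespace wildcard keeps the sketch independent of whichever prefix the service uses. The lookups only find anything when the parser is namespace-aware, so check the element names against a real response before relying on them.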