Usage

Help

There is a base tool help for global arguments:

htmlcheck -h

And if you use -h after a command name, you will get its specific help:

htmlcheck version -h

Validate one or many pages

With the command page you can validate one or many pages. Command accept one or many path and each path can be either an URL or a filepath (absolute or relative from your current location):

htmlcheck page ping.html
htmlcheck page http://perdu.com
htmlcheck page ping.html http://perdu.com foo/bar.html

Validate all path from a sitemap

With the command site you can validate every page referenced in a sitemap.xml file. Command accept only one argument for the sitemap path which can be either an URL or a filepath (absolute or relative from your current location).

Note than for a sitemap file, its referenced urls must be absolute or relative to your current location. For a sitemap url, its referenced urls must be an absolute url (with leading http):

htmlcheck site sitemap.xml
htmlcheck site http://perdu.com/sitemap.xml

Warning

Currently Sitemap index format is not supported yet so for sites using it you will have to open their indexes and give basic Sitemap xml URL to this command.

Hint

There is also a support for sitemap in a JSON format which is a very simple:

{
    "name": "Sample",
    "urls": [
        "http://perdu.com/",
        "https://www.google.com/"
    ]
}

As you can see it just needs a name and a simple URL list. Since it is not a standardized format as the Sitemap XML, you can not expect it to be available from most sites however it is a simple way to manage multiple URL ressources in an unique source file.

Manage verbosity

Default commandline verbosity is set to “Info” level, you may set it to “Debug” level to get also some more informations about command line work:

htmlcheck -v 5 site sitemap.xml

Or a totally silent output (beware that not any error will be return to output except commandline critical error):

htmlcheck -v 0 site sitemap.xml

Common options

–destination

Directory path where to write report files. If destination is not given, every files will be printed out. You can use a dot to write files to your current directory, a relative path or an absolute path. Path can start with ~ to point to your user home directory.

–exporter

Select exporter format. Default format is logging, it just printout report messages. There is also a json format to create JSON files for reports. And finally a html format to create HTML files.

–pack/–no-pack

Pack reports into a single file or not. Default is to pack everything in a single file. ‘no-pack’ will create a file for each report and then an export summary. It is recommended to define a destination directory with ‘–destination’ if you don’t plan to use packed export, else every files will just be printed out in an unique output. This option has no effect with logging format.

–safe

Invalid paths won’t break execution of script and it will be able to continue to the end. This is mostly for rare usecase when invalid source encounter a bug from report parsing or from validator.

–split

Execute validation for each path in its own distinct instance. Useful for very large path list which may take too long to display anything until every path has been validated. However, for small or moderate path list it will be longer than packed execution.

–user-agent

A customer user-agent to use for every possible requests.

–Xss

Java thread stack size. Useful in some case where you encounter error ‘StackOverflowError’ from validator. Set it to something like ‘512k’.

Specific formats options

html

–template-dir

A path to a template directory for your custom templates. Your template directory must contains the summary, report and audit main templates and also a main.css file.

Specific ‘site’ options

–sitemap-only

For site command only. This will only get and parse given sitemap path but without validating its items, useful to validate a sitemap before using it for validations.

Alternative

html5validator is a another Python great alternative standing on VNU library, it is best suited to include validation into a CI configuration.

However it does not have some features like HTML report building or Sitemap support.