HTML5 Accessibility Chops: data for the masses

Posted on Tuesday, 10 April 2012 by Steve Faulkner

One of the stumbling blocks in working out how new (and old) HTML5 features affect accessibility is the lack of publicly accessible usage data. Without data it is difficult to argue for the inclusion of features in HTML5 or to work out how features should be supported for accessibility. I have made an initial attempt to rectify this by collecting the HTML content of the home pages of the top 10,000 web sites.

I spent most of the Easter long weekend collecting the HTML pages. The original source for the “top 10,000” site URLs was this URL list I found on Pastebin. I used HTTrack website copier to capture the HTML files. The initial pass was somewhat affected by redirects, so I went through the error log and collected a second list of URLs from the captured pages that had resulted in “page has moved” files. The resulting 8915 HTML pages come from the 2 sets of URLs. The HTML content (including URL lists) is provided as a zip file:

Top 10000 HTML files zip file – 121 MB (Please only download if you are going to make use of the data)

hgroup element usage

I have only just started to analyse the data. The first analysis is of the new HTML5 hgroup element, and so far it is only a simple count of instances of its use. No attempt has been made yet, for example, to analyse what percentage of its use conforms to HTML5 author conformance requirements.

Of the top 8915 HTML pages, 79 (0.89%) were found to include use of the HTML5 hgroup element. A total of 418 instances of the hgroup were found within the 79 pages.
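For anyone who wants to repeat this kind of count on the zip file, here is a minimal sketch in Python. It is not the method used to produce the figures above. It assumes the HTML files have been unzipped into a local directory, and the names `HgroupCounter`, `count_hgroups` and `survey` are made up for illustration. It uses only the standard library `html.parser` module and counts hgroup start tags, ignoring any that appear inside comments or scripts.

```python
from html.parser import HTMLParser
from pathlib import Path


class HgroupCounter(HTMLParser):
    """Counts <hgroup> start tags in one HTML document (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "hgroup":
            self.count += 1


def count_hgroups(html_text):
    """Return the number of hgroup start tags in a string of HTML."""
    parser = HgroupCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


def survey(directory):
    """Return (pages_scanned, pages_with_hgroup, total_instances).

    Assumption: the captured pages are *.html files somewhere under
    `directory`; adjust the glob pattern if the archive is laid out differently.
    """
    pages = pages_with = total = 0
    for path in Path(directory).rglob("*.html"):
        # Top-site pages come in many encodings; replace undecodable bytes
        # rather than failing on them.
        text = path.read_text(encoding="utf-8", errors="replace")
        n = count_hgroups(text)
        pages += 1
        pages_with += 1 if n > 0 else 0
        total += n
    return pages, pages_with, total
```

The percentage is then just pages with hgroup divided by pages scanned: for the figures reported here, 100 × 79 / 8915 ≈ 0.89%.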

Instances of hgroup element use in top 10000 web sites – home pages

Inclusion of hgroup in HTML5

Note: I am a proponent of the removal and/or replacement of hgroup in HTML5. There are currently 5 change proposals on this subject being reviewed by the W3C HTML working group chairs:

  1. Change Proposal: replace hgroup with the subline element
  2. Change Proposal: no-change hgroup
  3. Change Proposal: replace hgroup with a simple element
  4. Change Proposal: remove hgroup add an outlineMask attribute
  5. Change Proposal: Replace <hgroup> with an element that has a simple content model and backwards compatibility.

About Steve Faulkner

Steve is the Technical Director at TPG. He joined The Paciello Group in 2006 and was previously a Senior Web Accessibility Consultant at Vision Australia. He is the creator and lead developer of the Web Accessibility Toolbar accessibility testing tool. Steve is a member of several groups, including the W3C Web Platforms Working Group and the W3C ARIA Working Group. He is an editor of several specifications at the W3C including HTML 5.1, ARIA in HTML, Notes on Using ARIA in HTML and HTML5: Techniques for providing useful text alternatives. He also develops and maintains HTML5accessibility.


  1. Hi Steve,

    It would be interesting to know how many of the sites actually use (or try to use) HTML5. For example, how many use a non-HTML4/XHTML doctype.

    You might find that the 0.89% might be 10% of sites using HTML5, or 1%, or 50%…

    It would help put the figures in perspective.

  2. Hi Alastair, I am crunching the data at the moment, and will provide more details soon. I have looked at how many use the HTML5 doctype and found that approx 17% of the sample pages use it.

  3. Hi John, thanks for the heads up, the CSS data will be useful, for instance I want to look at the use of outline:none.

  4. Sounds cool and interesting work you have started here. I have a little “hmmm”, because we all do the same thing when we try to do surveys on the Web. We often try only the Home page of Web sites. Which I guess might create a bias, I wonder if we should add at least for each of these sites a secondary page. The issue then being which one. 🙂
