
running the w3c validator

TLDR: the python package doesn't bundle an up-to-date validator, so we use node.

i use https://validator.w3.org/ a lot in my accessibility testing. it always asks if i am a robot now. i thought it would be better to have a local version i could use. this document is how i got it working after significant troubleshooting.

installation

installing the w3c nu validator https://github.com/validator/validator/ is a good candidate for conda because we'll use python, java, and node. i tried using python's html5validator, but the version of the checker that it bundles is old. the best solution for me was to use node because it was the route i could figure out.

use a conda environment.yml with at least:

    channels:
      - conda-forge
    dependencies:
      - python=3.11
      - openjdk
      - nodejs

install the jar distributed on npm

    npm install -g vnu-jar
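
before going further it can help to confirm that the tools the environment is supposed to provide are actually on the PATH. this is a minimal sketch using only the standard library; `missing_tools` is a hypothetical helper name i made up for illustration, not part of any package.

```python
import shutil


def missing_tools(tools=("java", "node", "npm"), which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH."""
    # `which` is injectable so the check can be exercised without touching the real PATH
    return [tool for tool in tools if which(tool) is None]


# usage: an empty list means every tool was found
# print(missing_tools())
```

if the list is not empty, the conda environment is probably not activated.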

running the validator

get the path to the jar from npm's global node_modules directory

    import itertools, operator, functools, collections, exceptiongroup, re
    import pathlib, pandas, json, subprocess, shlex
    # `npm root -g` prints the global node_modules directory, which is where `npm install -g` put vnu-jar
    VNU_JAR = pathlib.Path(subprocess.check_output(
        shlex.split(
            "npm root -g"
        )
    ).strip().decode()) / "vnu-jar/build/dist/vnu.jar"
    assert VNU_JAR.exists()

validate_html runs the checker on one or more files and returns the parsed json payload.

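    def validate_html(*files: pathlib.Path) -> dict:
        # build the argument list directly so paths containing spaces survive intact
        return json.loads(subprocess.check_output(
            ["java", "-jar", str(VNU_JAR), "--stdout", "--format", "json", "--exit-zero-always"]
            + [str(file) for file in files]
        ).decode())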
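
the payload is a dict with a `messages` list, and each message carries at least a `type` and a `message`, with an optional `subType`. a quick way to get a feel for it before reaching for pandas is to count messages per severity. this is only a sketch: `sample` is a hand-written stand-in with made-up message text, and `count_by_severity` is a hypothetical helper name.

```python
import collections

# hand-written stand-in shaped like the checker's json output; the message text is invented
sample = {
    "messages": [
        {"type": "info", "subType": "warning", "message": "example warning"},
        {"type": "error", "message": "example error"},
        {"type": "error", "message": "another example error"},
    ]
}


def count_by_severity(results):
    """Count messages per (type, subType); subType is None when the checker omits it."""
    return collections.Counter(
        (message["type"], message.get("subType"))
        for message in results.get("messages", [])
    )


counts = count_by_severity(sample)
```

with the real payload the same call would be `count_by_severity(validate_html(some_file))`.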

explore the data as a pandas.DataFrame

    HTML = pathlib.Path("../../../notebooks-for-all/tests/exports/html/lorenz-executed.html")
    df = pandas.DataFrame(pandas.Series(validate_html(HTML)).messages)
    del df["url"]
    df

         type   lastLine  lastColumn  firstColumn  subType  message                                            extract                                       hiliteStart  hiliteLength  firstLine
    0    info         81          27            4  warning  Section lacks heading. Consider using “h2”-“h6...  ader">\n <section id="skip-link">\n <                  10            24        NaN
    1    error       295         126           11  NaN      The “role” attribute must not be used on a “tr...  </tr>\n <tr aria-labelledby="nb-cell-...               10           127      294.0
    2    error       296          40          127  NaN      The “role” attribute must not be used on a “td...  listitem">\n <td class="nb-anchor" role="...           10            41      295.0
    3    error       301          49           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-execution_coun...                10            50      300.0
    4    error       306          43           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-cell_type" rol...                10            44      305.0
    ...
    173  error      1869          54           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-end" id="cell-...                10            55     1868.0
    174  error      1874          40           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-source" role="...                10            41     1873.0
    175  error      1897          42           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-metadata" role...                10            43     1896.0
    176  error      1912          54           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-loc" id="cell-...                10            55     1911.0
    177  error      1915          41           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-outputs" role=...                10            42     1914.0

178 rows × 10 columns

throwing exceptions

we need to collect these results and raise exceptions.

we need to organize the results into something that can be reported.

    results = validate_html(HTML)

group the results by severity and by nu error message.

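    def organize_validator_results(results):
        # collect[severity][message] -> list of the raw validator items
        collect = collections.defaultdict(functools.partial(collections.defaultdict, list))
        # groupby only batches adjacent items that share a key; appending into the
        # nested defaultdict merges batches that are separated in the validator output
        for (error, msg), group in itertools.groupby(results["messages"], key=operator.itemgetter("type", "message")):
            for item in group:
                collect[error][msg].append(item)
        return collect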
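
the validator reports messages in document order, so the same message can show up in several separate runs. `itertools.groupby` on its own starts a new group every time the key changes, which is why the results are accumulated into a dict. a small sketch with made-up messages shows the difference:

```python
import collections
import itertools
import operator

# made-up messages, deliberately interleaved so equal keys are not adjacent
messages = [
    {"type": "error", "message": "a"},
    {"type": "info", "message": "b"},
    {"type": "error", "message": "a"},
]
key = operator.itemgetter("type", "message")

# groupby alone yields one group per run of equal keys
runs = [(k, len(list(group))) for k, group in itertools.groupby(messages, key=key)]

# accumulating into a dict merges the runs that share a key
merged = collections.defaultdict(list)
for k, group in itertools.groupby(messages, key=key):
    merged[k].extend(group)
```

here `runs` has three entries because the two matching errors are not next to each other, while `merged` has two keys.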

the page we are testing overrides table roles, and the validator reports those overrides as errors. this is a known validator issue, so we already have to ignore some results.

    EXCLUDE = re.compile(
        """or with a “role” attribute whose value is “table”, “grid”, or “treegrid”.$"""
        # https://github.com/validator/validator/issues/1125
    )
    def raise_if_errors(results, exclude=EXCLUDE):
        collect = organize_validator_results(results)
        exceptions = []
        for msg in collect["error"]:
            if not exclude or not exclude.search(msg):
                exceptions.append(exceptiongroup.ExceptionGroup(msg, [Exception(x["extract"]) for x in collect["error"][msg]]))
        if exceptions:
            raise exceptiongroup.ExceptionGroup("nu validator errors", exceptions)
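
to see what the exclusion does, here is a sketch that applies a copy of the pattern to the two role messages the validator produces for this page plus one made-up message standing in for any other error. the pattern is repeated so the snippet runs on its own.

```python
import re

# copy of the exclusion pattern, anchored to the end of the message
EXCLUDE = re.compile(
    """or with a “role” attribute whose value is “table”, “grid”, or “treegrid”.$"""
)

messages = [
    "The “role” attribute must not be used on a “tr” element which has a “table” ancestor with no “role” attribute, or with a “role” attribute whose value is “table”, “grid”, or “treegrid”.",
    "The “role” attribute must not be used on a “td” element which has a “table” ancestor with no “role” attribute, or with a “role” attribute whose value is “table”, “grid”, or “treegrid”.",
    "an unrelated validator message",  # made-up, stands in for any other error
]

# only messages that do not match the pattern would be raised
kept = [message for message in messages if not EXCLUDE.search(message)]
```

both role messages end with the excluded phrase, so only the unrelated message is left in `kept`.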

since i've been hand validating, my page doesn't raise any errors except for the excluded ones. i'm really proud of that.

    raise_if_errors(results)

if we include all the validator errors, then we raise an exception group.

    raise_if_errors(results, None)
  + Exception Group Traceback (most recent call last):
  |   File "/home/tbone/mambaforge/envs/test-nbconvert-a11y/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3548, in run_code
  |     exec(code_obj, self.user_global_ns, self.user_ns)
  |   File "/tmp/ipykernel_394727/1257168619.py", line 1, in <module>
  |     raise_if_errors(results, None)
  |   File "/tmp/ipykernel_394727/522998515.py", line 8, in raise_if_errors
  |     raise exceptiongroup.ExceptionGroup("nu validator errors", exceptions)
  | ExceptionGroup: nu validator errors (2 sub-exceptions)
  +-+---------------- 1 ----------------
    | ExceptionGroup: The “role” attribute must not be used on a “tr” element which has a “table” ancestor with no “role” attribute, or with a “role” attribute whose value is “table”, “grid”, or “treegrid”. (16 sub-exceptions)
    +-+---------------- 1 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 1 cell-1-cell_type" class="cell markdown" data-index="1" data-loc="1" role="listitem">
      |      
      +---------------- 2 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 2 cell-2-cell_type" class="cell markdown" data-index="2" data-loc="1" role="listitem">
      |      
      +---------------- 3 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 3 cell-3-cell_type" class="cell code" data-index="3" data-loc="2" role="listitem">
      |      
      +---------------- 4 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 4 cell-4-cell_type" class="cell markdown" data-index="4" data-loc="11" role="listitem">
      |      
      +---------------- 5 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 5 cell-5-cell_type" class="cell code" data-index="5" data-loc="3" role="listitem">
      |      
      +---------------- 6 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 6 cell-6-cell_type" class="cell markdown" data-index="6" data-loc="1" role="listitem">
      |      
      +---------------- 7 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 7 cell-7-cell_type" class="cell markdown" data-index="7" data-loc="1" role="listitem">
      |      
      +---------------- 8 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 8 cell-8-cell_type" class="cell code" data-index="8" data-loc="1" role="listitem">
      |      
      +---------------- 9 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 9 cell-9-cell_type" class="cell code" data-index="9" data-loc="1" role="listitem">
      |      
      +---------------- 10 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 10 cell-10-cell_type" class="cell markdown" data-index="10" data-loc="1" role="listitem">
      |      
      +---------------- 11 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 11 cell-11-cell_type" class="cell code" data-index="11" data-loc="1" role="listitem">
      |      
      +---------------- 12 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 12 cell-12-cell_type" class="cell code" data-index="12" data-loc="1" role="listitem">
      |      
      +---------------- 13 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 13 cell-13-cell_type" class="cell markdown" data-index="13" data-loc="1" role="listitem">
      |      
      +---------------- 14 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 14 cell-14-cell_type" class="cell code" data-index="14" data-loc="1" role="listitem">
      |      
      +---------------- 15 ----------------
      | Exception:      </tr>
      |      <tr aria-labelledby="nb-cell-label 15 cell-15-cell_type" class="cell code" data-index="15" data-loc="2" role="listitem">
      |      
      +---------------- ... ----------------
      | and 1 more exception
      +------------------------------------
    +---------------- 2 ----------------
    | ExceptionGroup: The “role” attribute must not be used on a “td” element which has a “table” ancestor with no “role” attribute, or with a “role” attribute whose value is “table”, “grid”, or “treegrid”. (160 sub-exceptions)
    +-+---------------- 1 ----------------
      | Exception: listitem">
      |       <td class="nb-anchor" role="none">
      |      
      +---------------- 2 ----------------
      | Exception:      </td>
      |       <td class="nb-execution_count" role="none">
      |      
      +---------------- 3 ----------------
      | Exception:      </td>
      |       <td class="nb-cell_type" role="none">
      |      
      +---------------- 4 ----------------
      | Exception:      </td>
      |       <td class="nb-toolbar" role="none">
      |      
      +---------------- 5 ----------------
      | Exception:      </td>
      |       <td class="nb-start" id="cell-1-start" role="none">
      |      
      +---------------- 6 ----------------
      | Exception:      </td>
      |       <td class="nb-end" id="cell-1-end" role="none">
      |      
      +---------------- 7 ----------------
      | Exception:      </td>
      |       <td class="nb-source" role="none">
      |  
      +---------------- 8 ----------------
      | Exception:      </td>
      |       <td class="nb-metadata" role="none">
      |      
      +---------------- 9 ----------------
      | Exception:      </td>
      |       <td class="nb-loc" id="cell-1-loc" role="none">
      |      
      +---------------- 10 ----------------
      | Exception:      </td>
      |       <td class="nb-outputs" role="none">
      |      
      +---------------- 11 ----------------
      | Exception: listitem">
      |       <td class="nb-anchor" role="none">
      |      
      +---------------- 12 ----------------
      | Exception:      </td>
      |       <td class="nb-execution_count" role="none">
      |      
      +---------------- 13 ----------------
      | Exception:      </td>
      |       <td class="nb-cell_type" role="none">
      |      
      +---------------- 14 ----------------
      | Exception:      </td>
      |       <td class="nb-toolbar" role="none">
      |      
      +---------------- 15 ----------------
      | Exception:      </td>
      |       <td class="nb-start" id="cell-2-start" role="none">
      |      
      +---------------- ... ----------------
      | and 145 more exceptions
      +------------------------------------