running the w3c validator
TLDR: the python package doesn't bundle an up-to-date validator, so we use node.
i use https://validator.w3.org/ a lot in my accessibility testing. it always asks if i am a robot now. i thought it would be better to have a local version i could use. this document is how i got it working after significant troubleshooting.
installation
installing the w3c nu validator https://github.com/validator/validator/ is a good candidate for conda because we'll use python, java, and node. i tried using python's html5validator, but the version of the checker that it bundles is old. the best solution for me was to use node because i can figure it out.
use a conda environment.yml with at least:
channels:
- conda-forge
dependencies:
- python=3.11
- openjdk
- nodejs
install the jar distributed on npm
npm install -g vnu-jar
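before going further it can help to confirm that the environment actually exposes the three tools. a minimal sketch, assuming the executables are named `java`, `node`, and `npm` on the PATH; `find_tools` is a hypothetical helper name, not part of any package:

```python
import shutil


def find_tools(names=("java", "node", "npm")):
    """Map each executable name to its resolved path, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


# report what the active environment provides
for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

if any of them is reported as missing, the conda environment is probably not activated.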
running the validator
get the path to the jar from npm
import itertools, operator, functools, collections, exceptiongroup, re
import pathlib, pandas, json, subprocess, shlex

# the jar was installed with `npm install -g`, so ask npm for its global package root
VNU_JAR = pathlib.Path(
    subprocess.check_output(shlex.split("npm root -g")).strip().decode()
) / "vnu-jar/build/dist/vnu.jar"
assert VNU_JAR.exists()
validate_html
runs the checker on the given files and returns the parsed json payload.
def validate_html(*files: pathlib.Path) -> dict:
    # --exit-zero-always keeps check_output from raising when the page has errors
    return json.loads(
        subprocess.check_output(
            shlex.split(f"java -jar {VNU_JAR} --stdout --format json --exit-zero-always")
            + list(files)
        ).decode()
    )
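splitting a formatted string with shlex breaks if the jar path happens to contain spaces. a hedged alternative is to build the argument list directly; `build_vnu_command` is a hypothetical helper name, the example paths are made up, and the flags are the same ones used above:

```python
import pathlib


def build_vnu_command(jar, *files):
    """Return the argv list for one vnu.jar run that reports json on stdout."""
    return [
        "java", "-jar", str(jar),
        "--stdout", "--format", "json", "--exit-zero-always",
        *map(str, files),
    ]


# made-up paths, only to show that a space in the jar path stays in one argument
cmd = build_vnu_command(pathlib.Path("my tools/vnu.jar"), pathlib.Path("page.html"))
print(cmd)
```

the resulting list can be passed straight to subprocess.check_output without going through shlex.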
explore the data as a pandas.DataFrame
HTML = pathlib.Path("../../../notebooks-for-all/tests/exports/html/lorenz-executed.html")
df = pandas.DataFrame(validate_html(HTML)["messages"])
del df["url"]
df
     type   lastLine  lastColumn  firstColumn  subType  message                                            extract                                       hiliteStart  hiliteLength  firstLine
  0  info         81          27            4  warning  Section lacks heading. Consider using “h2”-“h6...  ader">\n <section id="skip-link">\n <                  10            24        NaN
  1  error       295         126           11  NaN      The “role” attribute must not be used on a “tr...  </tr>\n <tr aria-labelledby="nb-cell-...               10           127      294.0
  2  error       296          40          127  NaN      The “role” attribute must not be used on a “td...  listitem">\n <td class="nb-anchor" role="...           10            41      295.0
  3  error       301          49           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-execution_coun...                10            50      300.0
  4  error       306          43           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-cell_type" rol...                10            44      305.0
...  ...         ...         ...          ...  ...      ...                                                ...                                                   ...           ...        ...
173  error      1869          54           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-end" id="cell-...                10            55     1868.0
174  error      1874          40           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-source" role="...                10            41     1873.0
175  error      1897          42           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-metadata" role...                10            43     1896.0
176  error      1912          54           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-loc" id="cell-...                10            55     1911.0
177  error      1915          41           12  NaN      The “role” attribute must not be used on a “td...  </td>\n <td class="nb-outputs" role=...                10            42     1914.0
178 rows × 10 columns
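many of the rows repeat the same message, so a count per type and message is often easier to read than the raw table. a small sketch using only the standard library; the `sample` payload is made up for illustration and only mimics the shape of the validator's json, and `count_messages` is a hypothetical helper name:

```python
import collections

# hypothetical payload shaped like the validator's json report
sample = {
    "messages": [
        {"type": "error", "message": "first problem"},
        {"type": "error", "message": "first problem"},
        {"type": "info", "message": "a note"},
    ]
}


def count_messages(results):
    """Count how often each (type, message) pair occurs in a report."""
    return collections.Counter(
        (item["type"], item["message"]) for item in results["messages"]
    )


counts = count_messages(sample)
for (kind, message), n in counts.most_common():
    print(n, kind, message)
```

with the dataframe, `df.groupby(["type", "message"]).size()` should give the same kind of summary.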
throwing exceptions
we need to collect these results and raise exceptions. we need to organize the results into something that can be reported.
results = validate_html(HTML)
group the results
by severity and by the nu error message.
def organize_validator_results(results):
    # nested mapping: severity -> message -> list of validator items
    collect = collections.defaultdict(functools.partial(collections.defaultdict, list))
    for (error, msg), group in itertools.groupby(
        results["messages"], key=operator.itemgetter("type", "message")
    ):
        for item in group:
            collect[error][msg].append(item)
    return collect
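itertools.groupby only merges consecutive items that share a key, so it is the nested defaultdict that actually gathers messages scattered through the report. a sketch of an equivalent function without groupby, run on a made-up payload; `organize` and `sample` are illustrative names:

```python
import collections


def organize(results):
    """Nest messages as {type: {message: [items]}} regardless of their order."""
    collect = collections.defaultdict(lambda: collections.defaultdict(list))
    for item in results["messages"]:
        collect[item["type"]][item["message"]].append(item)
    return collect


# hypothetical payload where the same error appears non-consecutively
sample = {
    "messages": [
        {"type": "error", "message": "a", "extract": "<p>"},
        {"type": "info", "message": "b", "extract": "<i>"},
        {"type": "error", "message": "a", "extract": "<b>"},
    ]
}

organized = organize(sample)
print([x["extract"] for x in organized["error"]["a"]])  # ['<p>', '<b>']
```

both occurrences of the error end up in the same list even though an info message sits between them.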
the page we are testing overrides the roles of table elements, and the validator reports each override as an error. this is a known issue, so we already have to ignore some results.
EXCLUDE = re.compile(
    """or with a “role” attribute whose value is “table”, “grid”, or “treegrid”.$"""
    # https://github.com/validator/validator/issues/1125
)
def raise_if_errors(results, exclude=EXCLUDE):
    collect = organize_validator_results(results)
    exceptions = []
    for msg in collect["error"]:
        if not exclude or not exclude.search(msg):
            exceptions.append(exceptiongroup.ExceptionGroup(msg, [Exception(x["extract"]) for x in collect["error"][msg]]))
    if exceptions:
        raise exceptiongroup.ExceptionGroup("nu validator errors", exceptions)
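a quick check that the pattern only filters the known table-role messages. the first message is copied from the validator output for this page; the second is a made-up message for contrast:

```python
import re

EXCLUDE = re.compile(
    "or with a “role” attribute whose value is “table”, “grid”, or “treegrid”.$"
)

# message text as reported by the validator for the tr elements
known = (
    "The “role” attribute must not be used on a “tr” element which has a "
    "“table” ancestor with no “role” attribute, or with a “role” attribute "
    "whose value is “table”, “grid”, or “treegrid”."
)
other = "some other validator error"  # hypothetical message

print(bool(EXCLUDE.search(known)), bool(EXCLUDE.search(other)))  # True False
```

the trailing `.` in the pattern is unescaped, so it matches any final character; wrapping the text in re.escape would make it a literal period if that ever matters.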
since i've been validating by hand, my page doesn't raise any errors except for the excluded ones. i'm really proud of that.
if we include all the validator errors then we raise an exception group
raise_if_errors(results, None)
+ Exception Group Traceback (most recent call last):
| File "/home/tbone/mambaforge/envs/test-nbconvert-a11y/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3548, in run_code
| exec(code_obj, self.user_global_ns, self.user_ns)
| File "/tmp/ipykernel_394727/1257168619.py", line 1, in <module>
| raise_if_errors(results, None)
| File "/tmp/ipykernel_394727/522998515.py", line 8, in raise_if_errors
| raise exceptiongroup.ExceptionGroup("nu validator errors", exceptions)
| ExceptionGroup: nu validator errors (2 sub-exceptions)
+-+---------------- 1 ----------------
| ExceptionGroup: The “role” attribute must not be used on a “tr” element which has a “table” ancestor with no “role” attribute, or with a “role” attribute whose value is “table”, “grid”, or “treegrid”. (16 sub-exceptions)
+-+---------------- 1 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 1 cell-1-cell_type" class="cell markdown" data-index="1" data-loc="1" role="listitem">
|
+---------------- 2 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 2 cell-2-cell_type" class="cell markdown" data-index="2" data-loc="1" role="listitem">
|
+---------------- 3 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 3 cell-3-cell_type" class="cell code" data-index="3" data-loc="2" role="listitem">
|
+---------------- 4 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 4 cell-4-cell_type" class="cell markdown" data-index="4" data-loc="11" role="listitem">
|
+---------------- 5 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 5 cell-5-cell_type" class="cell code" data-index="5" data-loc="3" role="listitem">
|
+---------------- 6 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 6 cell-6-cell_type" class="cell markdown" data-index="6" data-loc="1" role="listitem">
|
+---------------- 7 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 7 cell-7-cell_type" class="cell markdown" data-index="7" data-loc="1" role="listitem">
|
+---------------- 8 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 8 cell-8-cell_type" class="cell code" data-index="8" data-loc="1" role="listitem">
|
+---------------- 9 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 9 cell-9-cell_type" class="cell code" data-index="9" data-loc="1" role="listitem">
|
+---------------- 10 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 10 cell-10-cell_type" class="cell markdown" data-index="10" data-loc="1" role="listitem">
|
+---------------- 11 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 11 cell-11-cell_type" class="cell code" data-index="11" data-loc="1" role="listitem">
|
+---------------- 12 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 12 cell-12-cell_type" class="cell code" data-index="12" data-loc="1" role="listitem">
|
+---------------- 13 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 13 cell-13-cell_type" class="cell markdown" data-index="13" data-loc="1" role="listitem">
|
+---------------- 14 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 14 cell-14-cell_type" class="cell code" data-index="14" data-loc="1" role="listitem">
|
+---------------- 15 ----------------
| Exception: </tr>
| <tr aria-labelledby="nb-cell-label 15 cell-15-cell_type" class="cell code" data-index="15" data-loc="2" role="listitem">
|
+---------------- ... ----------------
| and 1 more exception
+------------------------------------
+---------------- 2 ----------------
| ExceptionGroup: The “role” attribute must not be used on a “td” element which has a “table” ancestor with no “role” attribute, or with a “role” attribute whose value is “table”, “grid”, or “treegrid”. (160 sub-exceptions)
+-+---------------- 1 ----------------
| Exception: listitem">
| <td class="nb-anchor" role="none">
|
+---------------- 2 ----------------
| Exception: </td>
| <td class="nb-execution_count" role="none">
|
+---------------- 3 ----------------
| Exception: </td>
| <td class="nb-cell_type" role="none">
|
+---------------- 4 ----------------
| Exception: </td>
| <td class="nb-toolbar" role="none">
|
+---------------- 5 ----------------
| Exception: </td>
| <td class="nb-start" id="cell-1-start" role="none">
|
+---------------- 6 ----------------
| Exception: </td>
| <td class="nb-end" id="cell-1-end" role="none">
|
+---------------- 7 ----------------
| Exception: </td>
| <td class="nb-source" role="none">
|
+---------------- 8 ----------------
| Exception: </td>
| <td class="nb-metadata" role="none">
|
+---------------- 9 ----------------
| Exception: </td>
| <td class="nb-loc" id="cell-1-loc" role="none">
|
+---------------- 10 ----------------
| Exception: </td>
| <td class="nb-outputs" role="none">
|
+---------------- 11 ----------------
| Exception: listitem">
| <td class="nb-anchor" role="none">
|
+---------------- 12 ----------------
| Exception: </td>
| <td class="nb-execution_count" role="none">
|
+---------------- 13 ----------------
| Exception: </td>
| <td class="nb-cell_type" role="none">
|
+---------------- 14 ----------------
| Exception: </td>
| <td class="nb-toolbar" role="none">
|
+---------------- 15 ----------------
| Exception: </td>
| <td class="nb-start" id="cell-2-start" role="none">
|
+---------------- ... ----------------
| and 145 more exceptions
+------------------------------------