# rendering dataframes for screen readers with `pandas`

in this document we'll explore what it takes to make `pandas.DataFrame`s accessible. we'll follow Paul J Adam's instructions for making Simple Data Tables.

    %reload_ext pidgy
    import pandas.io.formats.style, bs4, pytest
    shell.weave.reactive = False
    shell.weave.use_async = False
    soup = lambda x: bs4.BeautifulSoup(x, features="html.parser")

    (df := pandas.DataFrame(
        columns=pandas.Index(list("ABC")), 
        index=pandas.Index(range(2), name="index"))
    ).style.set_caption("the basic dataframe <var>df</var> we use for explanation in this document. ")

the basic dataframe `df` we use for explanation in this document.
	A	B	C
index
0	nan	nan	nan
1	nan	nan	nan


`df` is a basic `pandas.DataFrame` because:    

* it has ONE row level 
* it has ONE column level 

we do not need to consider the contents of the cells for the recommendations we are implementing.


## applying best practices to pandas

### 1. Title of data table is inside the `<caption>` element.

the caption is dependent on the data and affects the visual appearance of the table.
it is context dependent, so it is up to the author to supply it.


    def assert_has_caption(object):
        caption = soup(object).select_one("table caption")
        # get_text also covers captions that contain child tags such as <var>
        assert caption and caption.get_text(strip=True), "table is missing a <caption>"
    # with pytest.raises(AssertionError): assert_has_caption(df._repr_html_())


the caption can be set using the `pandas.DataFrame.style.set_caption` method. the result is a `pandas.io.formats.style.Styler` object that lets us modify how the instance is displayed.

`set_caption` uses the `pandas.DataFrame.style` attribute to set a caption

    def set_caption(df, caption) -> pandas.io.formats.style.Styler:
        return df.style.set_caption(caption)

after we have a styler we are working with a subset of pandas operations.
the styler should be the last stop for the data.


the `<table>` below demonstrates how the `<caption>` appears on a `captioned` `pandas.DataFrame`.

{{captioned}}

    assert_has_caption(
        captioned := set_caption(df, "a value-less dataframe with columns and row indexes")._repr_html_()
    );


a value-less dataframe with columns and row indexes
	A	B	C
index
0	nan	nan	nan
1	nan	nan	nan


#### `pandas.io.formats.style.Styler`

`pandas.io.formats.style.Styler` gives a different html representation than `pandas.DataFrame.to_html`

    assert df.style.to_html() != df.to_html()

### 2. Column headers are inside `<th scope="col">` elements.

the scope property improves navigation for screen readers when used properly.

    def assert_has_col_scope(object, selector="thead tr th"):
for basic frames, all the `<th>` tags in `<thead>` should have `scope="col"`

        assert all(th.attrs.get("scope") in {"col", "colgroup"} for th in soup(object).select(selector)), "<th> is missing `scope='col'`"
    # with pytest.raises(AssertionError): assert_has_col_scope(captioned)


    def set_col_scope(object, selector="thead tr th"):
`set_col_scope` automatically remediates missing columns `scope`s

        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "col")
        return str(object)


the `scope` attribute has no visual effect, but we can use `assert_has_col_scope` to verify the scope is correct.

    assert_has_col_scope(col_scoped := set_col_scope(captioned));
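
since the scope is invisible, it can help to list what each `<th>` actually carries. the sketch below is a dependency-free alternative to the `bs4` selectors above: it uses the standard library's `html.parser` to report the section and `scope` of every header cell. the class name, helper name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HeaderScopes(HTMLParser):
    """collect (section, scope) for every <th>; section is thead, tbody or tfoot."""

    def __init__(self):
        super().__init__()
        self.section = None
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in {"thead", "tbody", "tfoot"}:
            self.section = tag
        elif tag == "th":
            self.found.append((self.section, dict(attrs).get("scope")))

    def handle_endtag(self, tag):
        if tag == self.section:
            self.section = None


def header_scopes(html):
    parser = HeaderScopes()
    parser.feed(html)
    return parser.found


# hand-written sample: the first header cell has no scope
sample = (
    "<table><thead><tr><th>index</th><th scope='col'>A</th></tr></thead>"
    "<tbody><tr><th scope='row'>0</th><td>nan</td></tr></tbody></table>"
)
scopes = header_scopes(sample)
missing = [section for section, scope in scopes if scope is None]
print(scopes, missing)
```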

### 3. Row headers are inside `<th scope="row">` elements.

similar to `scope="col"`, the `<th>` elements in the body require `scope="row"`; basically every `<th>` needs the `scope` attribute.

    def assert_has_row_scope(object, selector="tbody tr th"):
        assert all(th.attrs.get("scope") in {"row", "rowgroup"} for th in soup(object).select(selector)),\
        "<th> is missing `scope='row'`"
    # with pytest.raises(AssertionError): assert_has_row_scope(col_scoped)


    def set_row_scope(object, selector="tbody tr th, tfoot tr th"):
like the columns, we can deterministically add `scope="row"`

        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "row")
        return str(object)


we use `assert_has_row_scope` to verify the scope because, again, there aren't any visual effects to these changes.

    assert_has_row_scope(row_scoped := set_row_scope(col_scoped))
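
the same default-scope remediation can be sketched without `bs4`. the version below uses `xml.etree.ElementTree` from the standard library and handles column and row headers in one pass. it assumes the table markup is well-formed XML, which real pandas output may not be (for example when it contains `&nbsp;`); the function name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_default_scopes(table_xml):
    """add scope="col" or scope="row" to <th> elements that have no scope yet."""
    root = ET.fromstring(table_xml)
    for section, scope in (("thead", "col"), ("tbody", "row"), ("tfoot", "row")):
        for th in root.iterfind(f"./{section}//th"):
            # like dict.setdefault: an existing scope (e.g. colgroup) is kept
            if th.get("scope") is None:
                th.set("scope", scope)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<table><thead><tr><th>index</th><th scope="colgroup">A</th></tr></thead>'
    "<tbody><tr><th>0</th><td>nan</td></tr></tbody></table>"
)
remediated = add_default_scopes(sample)
print(remediated)
```

existing scopes are left alone, so a deliberately chosen `colgroup` or `rowgroup` survives the pass.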

### 4. Avoid using blank header cells.

this instruction yields a best practice: name the dataframe index. without a name, the index `<th>` will always be empty.

    def assert_no_blank_header(body):
        # get_text handles a <th> with no text node, where th.string would be None
        assert all(th.get_text(strip=True) for th in soup(body).select("th")), "there is a blank <th>"
    # with pytest.raises(AssertionError): assert_no_blank_header(row_scoped)

pandas requires some upstream work to satisfy this instruction.

    def set_squashed_th(body):
`set_squashed_th` squashes the table column names. this method only works for the most basic dataframes.

        table = soup(body)
        tr = bs4.Tag(name="tr")
        col, row = table.select("thead tr")
        for top, bottom in zip(col.select("th"), row.select("th")):
            tr.append(top if top.string.strip() else bottom)
        thead = bs4.Tag(name="thead"); thead.append(tr)
        table.select_one("thead").replace_with(thead)
        return str(table)


the frame below doesn't have empty `<th>` elements and is denser.

a value-less dataframe with columns and row indexes
index	A	B	C
0	nan	nan	nan
1	nan	nan	nan

    assert_no_blank_header(squashed_th := set_squashed_th(row_scoped))
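
to see why the two-row header is a problem, the sketch below counts blank `<th>` cells before and after squashing. it uses the standard library's `html.parser` rather than `bs4`, and the two sample tables are hand-written stand-ins for the header layout described above, not real pandas output.

```python
from html.parser import HTMLParser


class BlankHeaderFinder(HTMLParser):
    """record the stripped text of every <th> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_th = False
        self.text = []
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_th, self.text = True, []

    def handle_data(self, data):
        if self.in_th:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "th":
            # str.strip also removes the non-breaking space behind &nbsp;
            self.headers.append("".join(self.text).strip())
            self.in_th = False


def blank_header_count(html):
    finder = BlankHeaderFinder()
    finder.feed(html)
    finder.close()
    return sum(1 for text in finder.headers if not text)


# two header rows, each padding the other with a blank cell
two_rows = (
    "<table><thead><tr><th>&nbsp;</th><th>A</th></tr>"
    "<tr><th>index</th><th>&nbsp;</th></tr></thead></table>"
)
# the squashed equivalent: one header row, no padding cells
one_row = "<table><thead><tr><th>index</th><th>A</th></tr></thead></table>"

print(blank_header_count(two_rows), blank_header_count(one_row))
```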

### 5. Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

the naming of columns is a context specific screen reader feature.
authors would have to add this information themselves.


    titles = dict(A="apple", B="banana", C="carrot")
    def set_header_titles(body, titles=titles):
`set_header_titles` is a function that writes the expanded text of each abbreviation into the `title` attribute of its header cell.

        for th in (body := soup(body)).select("thead tr th"):
            name = th.string.strip()
            if name in titles:
                th.attrs["title"] = titles[name]
        return str(body)

> a real dataset would make more sense in this example.


    titled = set_header_titles(squashed_th, titles)
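
for a slightly more realistic picture of abbreviation expansion, the sketch below applies the same idea to made-up column labels. the labels, expansions, and function name are hypothetical, and it uses `xml.etree.ElementTree` on a hand-written, well-formed table instead of `bs4` on pandas output.

```python
import xml.etree.ElementTree as ET

# hypothetical abbreviations and their expansions, for illustration only
expansions = {"qty": "quantity", "avg": "average"}


def add_titles(table_xml, expansions):
    """set title="<expansion>" on header cells whose text is a known abbreviation."""
    root = ET.fromstring(table_xml)
    for th in root.iterfind("./thead//th"):
        label = (th.text or "").strip()
        if label in expansions:
            th.set("title", expansions[label])
    return ET.tostring(root, encoding="unicode")


sample = "<table><thead><tr><th>item</th><th>qty</th><th>avg</th></tr></thead></table>"
titled_sample = add_titles(sample, expansions)
print(titled_sample)
```

headers that are not abbreviations, like `item` here, are left without a `title`.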

    def strip_class_ids(body):
`pandas` adds classes and ids to elements that for the sake of this discussion are superfluous.

        for e in (body := soup(body)).select("td, th"): 
            e.attrs.pop("id", None)
            e.attrs.pop("class", None)

        return str(body)
    final = strip_class_ids(titled)


## inconsistencies in labelled indexes

everything goes to hell with `named_column`, which has a column name.

    named_column = df.copy()
    named_column.columns.name = "letters"

### column and index names

consider the case of `named_column`, where both `named_column.columns` and `named_column.index` are named.

{% set df = named_column.style.set_caption(pidgy.filters.md("the `named_column` dataframe with a column name")) %}
{{df}}

screen reader visitors may struggle to interpret the meaning of "letters" relative to the index.
ensuring a proper experience for screen readers will require extra markup to group the `<th>` with the columns.


the `named_column` dataframe with a column name
letters	A	B	C
index
0	nan	nan	nan
1	nan	nan	nan


### column name and no index name

    named_column_no_index = named_column.copy()
    named_column_no_index.index.name = None

{% set df = named_column_no_index.style.set_caption(pidgy.filters.md("the `named_column_no_index` dataframe with a column name and without an index name")) %}
{{df}}

in this configuration, it is possible for a screen reader to misinterpret `letters` as the name of the index column.
when the column index is named, like `letters`, the entry should be `<th scope="row">`.
in this example we can see how instructions 2 and 3 differ.


the `named_column_no_index` dataframe with a column name and without an index name
letters	A	B	C
0	nan	nan	nan
1	nan	nan	nan


## conclusions

for basic dataframes, two practices can be enforced without knowledge of the data:

- [x] Column headers are inside `<th scope="col">` elements.
- [x] Row headers are inside `<th scope="row">` elements.

the abbreviations and caption are context specific and require knowledge of the data:

- [ ] Title of data table is inside the `<caption>` element.
- [ ] Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

with `pandas<=1.4.2`, it is hard to avoid blank header cells without significant effort.

- [ ] Avoid using blank header cells.

some conventions we can extract from this study are:

* treat the `df.index` as a column that needs to be named
* if an index is superfluous then remove it. this can be done with the styler: `df.style.hide(axis=0)`

## final frame

our final dataframe has:

a value-less dataframe with columns and row indexes
index	A	B	C
0	nan	nan	nan
1	nan	nan	nan

## final html source

<style type="text/css">
</style>
<table id="T_595b7">
<caption>a value-less dataframe with columns and row indexes</caption>
<thead><tr><th scope="col">index</th><th scope="col" title="apple">A</th><th scope="col" title="banana">B</th><th scope="col" title="carrot">C</th></tr></thead>
<tbody>
<tr>
<th scope="row">0</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr>
<th scope="row">1</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
</tbody>
</table>

### about the `scope` attribute

<q cite="https://www.w3schools.com/tags/att_scope.asp">The scope attribute specifies whether a header cell is a header for a column, row, or group of columns or rows.</q>
<q cite="https://dequeuniversity.com/rules/axe/4.0/scope-attr-valid">The scope attribute makes table navigation much easier for screen reader users, provided that it is used correctly. Incorrectly used, scope can make table navigation much harder and less efficient.</q>
