rendering dataframes for screen readers with pandas

in this document we'll explore what it takes to make pandas.DataFrames accessible. we'll follow Paul J Adam's instructions for making Simple Data Tables.

the basic dataframe df we use for explanation in this document.
  A B C
index      
0 nan nan nan
1 nan nan nan
import bs4
import pandas

(df := pandas.DataFrame(
    columns=pandas.Index(list("ABC")), 
    index=pandas.Index(range(2), name="index"))
).style.set_caption("the basic dataframe <var>df</var> we use for explanation in this document. ")

df is a basic pandas.DataFrame because:

  • it has ONE row level
  • it has ONE column level

we do not need to consider the contents of the cells for the recommendations we are implementing.

applying best practices to pandas

1. Title of data table is inside the <caption> element.

the caption depends on the data and affects the visual appearance of the table. it is context dependent, so it is up to the author to supply.

def assert_has_caption(object):
    caption = soup(object).select_one("table caption")
    # get_text also handles captions with nested tags such as <var>, where .string is None
    assert caption and caption.get_text(strip=True), "table is missing a <caption>"
# with pytest.raises(AssertionError): assert_has_caption(df._repr_html_())
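
the soup helper used by assert_has_caption is not defined in this document. as a dependency-free illustration of the same check, here is a minimal sketch that swaps bs4 for the standard library's xml.etree.ElementTree. the name has_caption and the two sample tables are hypothetical, and the sketch assumes well-formed markup, which real pandas output does not guarantee.

```python
import xml.etree.ElementTree as ET


def has_caption(html: str) -> bool:
    """Return True when the first <table> has a non-empty <caption>."""
    root = ET.fromstring(html)  # assumes the markup is well-formed XML
    table = root if root.tag == "table" else root.find(".//table")
    if table is None:
        return False
    caption = table.find("caption")
    # itertext() also collects text nested in child tags such as <var>
    return caption is not None and bool("".join(caption.itertext()).strip())


plain = "<table><thead><tr><th>A</th></tr></thead></table>"
captioned = (
    "<table><caption>the basic dataframe <var>df</var></caption>"
    "<thead><tr><th>A</th></tr></thead></table>"
)
print(has_caption(plain), has_caption(captioned))  # → False True
```

collecting all nested text matters here because the caption in this document contains a <var> element.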

the caption can be set using the pandas.DataFrame.style.set_caption method. the result is a pandas.io.formats.style.Styler object that lets us modify how the instance is displayed.

set_caption uses the pandas.DataFrame.style attribute to set a caption

def set_caption(df, caption) -> pandas.io.formats.style.Styler:
    return df.style.set_caption(caption)

after we have a styler we are working with a subset of pandas operations. the styler should be the last stop for the data.

the <table> below demonstrates how the <caption> appears on a captioned pandas.DataFrame.

a value-less dataframe with columns and row indexes
  A B C
index      
0 nan nan nan
1 nan nan nan
assert_has_caption(
    captioned := set_caption(df, "a value-less dataframe with columns and row indexes")._repr_html_()
);

pandas.io.formats.style.Styler

pandas.io.formats.style.Styler gives a different html representation than pandas.DataFrame._repr_html_

assert df.style.to_html() != df.to_html()

2. Column headers are inside <th scope="col"> elements.

the scope property improves navigation for screen readers when used properly.

def assert_has_col_scope(object, selector="thead tr th"):

for basic frames, all the <th> tags in <thead> should have scope="col"

    assert all(th.attrs.get("scope") in {"col", "colgroup"} for th in soup(object).select(selector)), "<th> is missing `scope='col'`"
# with pytest.raises(AssertionError): assert_has_col_scope(captioned)
def set_col_scope(object, selector="thead tr th"):

set_col_scope automatically remediates missing column scopes

    for th in (object := soup(object)).select(selector):
        th.attrs.setdefault("scope", "col")
    return str(object)

the scope has no visual effect; however, we can use assert_has_col_scope to verify that the scope is correct.

assert_has_col_scope(col_scoped := set_col_scope(captioned));
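
to make the setdefault behaviour of set_col_scope concrete, here is a minimal standard-library sketch of the same remediation. it uses xml.etree.ElementTree rather than bs4, add_col_scope and the sample markup are hypothetical, and the input is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET


def add_col_scope(html: str) -> str:
    """Add scope="col" to every <thead> <th> that has no scope yet."""
    root = ET.fromstring(html)  # assumes well-formed markup
    for th in root.findall(".//thead/tr/th"):
        if th.get("scope") is None:  # keep explicit values such as "colgroup"
            th.set("scope", "col")
    return ET.tostring(root, encoding="unicode")


before = (
    "<table><thead><tr>"
    '<th scope="colgroup">letters</th><th>A</th><th>B</th>'
    "</tr></thead></table>"
)
after = add_col_scope(before)
scopes = [th.get("scope") for th in ET.fromstring(after).iter("th")]
print(scopes)  # → ['colgroup', 'col', 'col']
```

an existing scope is left alone, so a hand-written colgroup survives the remediation.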

3. Row headers are inside <th scope="row"> elements.

similar to scope="col", the <th> elements in the body require scope="row"; basically every <th> needs the scope property.

def assert_has_row_scope(object, selector="tbody tr th"):
    assert all(th.attrs.get("scope") in {"row", "rowgroup"} for th in soup(object).select(selector)),\
    "<th> is missing `scope='row'`"
# with pytest.raises(AssertionError): assert_has_row_scope(col_scoped)
def set_row_scope(object, selector="tbody tr th, tfoot tr th"):

like the columns, we can deterministically add scope="row"

    for th in (object := soup(object)).select(selector):
        th.attrs.setdefault("scope", "row")
    return str(object)

we use assert_has_row_scope to verify the scope because, again, these changes have no visual effect.

assert_has_row_scope(row_scoped := set_row_scope(col_scoped))
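
the two assertions above answer yes or no. when a table fails, it can help to see which header cells are at fault. the sketch below is one possible audit over both sections, written with the standard library instead of bs4; missing_scopes, ALLOWED, and the sample table are hypothetical names, and the markup is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET

# scope values accepted for <th> elements in each table section
ALLOWED = {
    "thead": {"col", "colgroup"},
    "tbody": {"row", "rowgroup"},
    "tfoot": {"row", "rowgroup"},
}


def missing_scopes(html: str) -> list:
    """List (section, header text, scope) for every <th> with a missing or wrong scope."""
    root = ET.fromstring(html)  # assumes well-formed markup
    problems = []
    for section, allowed in ALLOWED.items():
        for th in root.findall(f".//{section}/tr/th"):
            if th.get("scope") not in allowed:
                text = "".join(th.itertext()).strip()
                problems.append((section, text, th.get("scope")))
    return problems


html = (
    "<table>"
    '<thead><tr><th scope="col">index</th><th scope="col">A</th></tr></thead>'
    "<tbody>"
    "<tr><th>0</th><td>nan</td></tr>"
    '<tr><th scope="col">1</th><td>nan</td></tr>'
    "</tbody></table>"
)
print(missing_scopes(html))  # → [('tbody', '0', None), ('tbody', '1', 'col')]
```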

4. Avoid using blank header cells.

this instruction yields a best practice: name the dataframe index. without a name, the index <th> will always be empty.

def assert_no_blank_header(body):
    # get_text returns "" for an empty <th>, where .string would be None
    assert all(th.get_text(strip=True) for th in soup(body).select("th")), "there is a blank <th>"
# with pytest.raises(AssertionError): assert_no_blank_header(row_scoped)

pandas requires some upstream work to satisfy this instruction.

def set_squashed_th(body):

set_squashed_th squashes the table column names. this method only works for the most basic dataframes.

    table = soup(body)
    tr = bs4.Tag(name="tr")
    col, row = table.select("thead tr")
    for top, bottom in zip(col.select("th"), row.select("th")):
        tr.append(top if top.string.strip() else bottom)
    thead = bs4.Tag(name="thead"); thead.append(tr)
    table.select_one("thead").replace_with(thead)
    return str(table)

the frame below doesn't have empty <th> elements and is denser.

a value-less dataframe with columns and row indexes
index A B C
0 nan nan nan
1 nan nan nan
assert_no_blank_header(squashed_th := set_squashed_th(row_scoped))
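
the squashing step is easier to follow on a small literal table. the sketch below mirrors set_squashed_th using only the standard library; squash_header_rows and the sample markup are hypothetical. like the original, it assumes exactly two header rows with the same number of cells, and it also assumes well-formed markup (blank cells written as &#160; rather than a named entity).

```python
import xml.etree.ElementTree as ET


def squash_header_rows(html: str) -> str:
    """Merge two <thead> rows into one, preferring the non-blank cell of each pair."""
    root = ET.fromstring(html)  # assumes well-formed markup
    thead = root.find(".//thead")
    top, bottom = thead.findall("tr")  # fails loudly unless there are exactly two rows
    merged = ET.Element("tr")
    for upper, lower in zip(top.findall("th"), bottom.findall("th")):
        # a cell holding only whitespace (including a non-breaking space) counts as blank
        text = "".join(upper.itertext()).strip()
        merged.append(upper if text else lower)
    thead.remove(top)
    thead.remove(bottom)
    thead.append(merged)
    return ET.tostring(root, encoding="unicode")


two_rows = (
    "<table><thead>"
    "<tr><th>&#160;</th><th>A</th><th>B</th></tr>"
    "<tr><th>index</th><th></th><th></th></tr>"
    "</thead></table>"
)
squashed = ET.fromstring(squash_header_rows(two_rows))
print([th.text for th in squashed.iter("th")])  # → ['index', 'A', 'B']
```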

5. Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

the naming of columns is a context specific screen reader feature. authors would have to add this information themselves.

titles = dict(A="apple", B="banana", C="carrot")
def set_header_titles(body, titles=titles):

set_header_titles adds the expanded text of each abbreviation to the title attribute of the matching header cell.

    for th in (body := soup(body)).select("thead tr th"):
        name = th.string.strip()
        if name in titles:
            th.attrs["title"] = titles[name]
    return str(body)

> a real dataset would make more sense in this example.

titled = set_header_titles(squashed_th, titles)
def strip_class_ids(body):

pandas adds classes and ids to elements that for the sake of this discussion are superfluous.

    for e in (body := soup(body)).select("td, th"): 
        e.attrs.pop("id", None)
        e.attrs.pop("class", None)

    return str(body)
final = strip_class_ids(titled)
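
to show what the title lookup in set_header_titles produces, here is a small standard-library sketch. add_header_titles and the sample header are hypothetical, and the markup is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET


def add_header_titles(html: str, titles: dict) -> str:
    """Set title="<expansion>" on each <thead> <th> whose text is a key in titles."""
    root = ET.fromstring(html)  # assumes well-formed markup
    for th in root.findall(".//thead/tr/th"):
        name = "".join(th.itertext()).strip()
        if name in titles:  # headers without a known expansion are left untouched
            th.set("title", titles[name])
    return ET.tostring(root, encoding="unicode")


header = "<table><thead><tr><th>index</th><th>A</th><th>B</th></tr></thead></table>"
titled_header = add_header_titles(header, {"A": "apple", "B": "banana"})
found = {th.text: th.get("title") for th in ET.fromstring(titled_header).iter("th")}
print(found)  # → {'index': None, 'A': 'apple', 'B': 'banana'}
```

the index header has no entry in the mapping, so it gets no title attribute.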

inconsistencies in labelled indexes

everything goes to hell with named_column, which has a column name.

named_column = df.copy()
named_column.columns.name = "letters"

column and index names

consider the case of named_column, where named_column.columns is named in addition to named_column.index.

the named_column dataframe with a column name

letters A B C
index      
0 nan nan nan
1 nan nan nan

screen reader visitors may struggle to interpret the meaning of "letters" relative to the index. ensuring a proper experience for screen readers will require extra markup to group the column name with the columns.

column name and no index name

named_column_no_index = named_column.copy()
named_column_no_index.index.name = None

the named_column_no_index dataframe with a column name and without an index name

letters A B C
0 nan nan nan
1 nan nan nan

in this configuration, it is possible for a screen reader to misinterpret letters as the name of the index column. when the column index is named, like letters, the entry should be <th scope="row">. in this example we can see how the instructions in 2 and 3 differ.

conclusions

for basic dataframes, two practices can be enforced without knowledge of the data:

  • Column headers are inside <th scope="col"> elements.
  • Row headers are inside <th scope="row"> elements.

the abbreviations and caption are context specific and require knowledge of the data:

  • Title of data table is inside the <caption> element.
  • Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

with pandas<=1.4.2, it is hard to avoid blank header cells without significant effort.

  • Avoid using blank header cells.

some conventions we can extract from this study are:

  • treat the df.index as a column that needs to be named
  • if an index is superfluous then remove it. this can be done with the styler df.style.hide(axis=0)

final frame

our final dataframe has:

  • <caption>
  • <th scope="col">
  • <th scope="row">
  • no empty <th>
  • <th title=""> for abbreviations
a value-less dataframe with columns and row indexes
index A B C
0 nan nan nan
1 nan nan nan

final html source

<style type="text/css">
</style>
<table id="T_595b7">
<caption>a value-less dataframe with columns and row indexes</caption>
<thead><tr><th scope="col">index</th><th scope="col" title="apple">A</th><th scope="col" title="banana">B</th><th scope="col" title="carrot">C</th></tr></thead>
<tbody>
<tr>
<th scope="row">0</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
<tr>
<th scope="row">1</th>
<td>nan</td>
<td>nan</td>
<td>nan</td>
</tr>
</tbody>
</table>

about the scope attribute

The scope attribute specifies whether a header cell is a header for a column, row, or group of columns or rows. The scope attribute makes table navigation much easier for screen reader users, provided that it is used correctly. Incorrectly used, scope can make table navigation much harder and less efficient.