
@tonyfast's notebooks

notebook summary
title
rendering dataframes for screen readers with pandas
description
in this document we'll explore what it takes to make pandas.DataFrames accessible. we'll follow Paul J Adam's instructions for making Simple Data Tables.
cells
38 total
34 code
state
executed in order
kernel
Python [conda env:root] *
language
python
name
conda-root-py
lines of code
157
outputs
35
table of contents
  • 5. Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.
  • inconsistencies in labelled indexes
  • column and index names
  • column name and no index name
  • conclusions
  • final frame
  • final html source
  • about the scope attribute
  • links


    rendering dataframes for screen readers with pandas

    in this document we'll explore what it takes to make pandas.DataFrames accessible. we'll follow Paul J Adam's instructions for making Simple Data Tables.

    the basic dataframe df we use for explanation in this document.
      A B C
    index      
    0 nan nan nan
    1 nan nan nan
    (df := pandas.DataFrame(
        columns=pandas.Index(list("ABC")), 
        index=pandas.Index(range(2), name="index"))
    ).style.set_caption("the basic dataframe <var>df</var> we use for explanation in this document. ")
    

    df is a basic pandas.DataFrame because:

    • it has ONE row level
    • it has ONE column level

    we do not need to consider the contents of the cells for the recommendations we are implementing.

    1. Title of data table is inside the <caption> element.

    the caption depends on the data and affects the visual appearance of the table. it is context dependent, so it is up to the author to supply.

    def assert_has_caption(object):
        caption = soup(object).select_one("table caption")
        assert caption and caption.string.strip(), "table is missing a <caption>"
    # with pytest.raises(AssertionError): assert_has_caption(df._repr_html_())
    
    the caption can be set using the pandas.DataFrame.style.set_caption method. the result is a pandas.io.formats.style.Styler object that lets us modify how the instance is displayed.

    set_caption uses the pandas.DataFrame.style attribute to set a caption.

    def set_caption(df, caption) -> pandas.io.formats.style.Styler:
        return df.style.set_caption(caption)
    

    after we have a styler we are working with a subset of pandas operations. the styler should be the last stop for the data.

    the <table> below demonstrates how the <caption> appears on a captioned pandas.DataFrame.

    a value-less dataframe with columns and row indexes
      A B C
    index      
    0 nan nan nan
    1 nan nan nan
    assert_has_caption(
        captioned := set_caption(df, "a value-less dataframe with columns and row indexes")._repr_html_()
    );
    
    pandas.io.formats.style.Styler

    pandas.io.formats.style.Styler gives a different html representation than the dataframe's own _repr_html_.

    assert df.style.to_html() != df.to_html()
    
    2. Column headers are inside <th scope="col"> elements.

    the scope attribute improves navigation for screen readers when used properly.

    def assert_has_col_scope(object, selector="thead tr th"):
        """for basic frames, all the <th> tags in <thead> should have scope="col"."""
        assert all(th.attrs.get("scope") in {"col", "colgroup"} for th in soup(object).select(selector)), "<th> is missing `scope='col'`"
    # with pytest.raises(AssertionError): assert_has_col_scope(captioned)

    def set_col_scope(object, selector="thead tr th"):
        """set_col_scope automatically remediates missing column scopes."""
        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "col")
        return str(object)

    the scope attribute has no visual effect, however we can use assert_has_col_scope to verify the scope is correct.

    assert_has_col_scope(col_scoped := set_col_scope(captioned));
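
    set_col_scope uses setdefault rather than plain assignment, which matters once a header already carries a scope such as colgroup. a small sketch on a hand-written header row (the html below is hypothetical, not pandas output) shows that existing scopes survive:

```python
import bs4

# hypothetical header row: one spanning header is already scoped as a column group
html = (
    "<table><thead><tr>"
    '<th scope="colgroup" colspan="2">group</th><th>A</th>'
    "</tr></thead></table>"
)
table = bs4.BeautifulSoup(html, "html.parser")
for th in table.select("thead tr th"):
    # setdefault only fills in a missing scope, it never overwrites one
    th.attrs.setdefault("scope", "col")

scopes = [th["scope"] for th in table.select("th")]
```

    here scopes keeps "colgroup" for the first header and gains "col" for the second.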
    
    3. Row headers are inside <th scope="row"> elements.

    similar to scope="col", the <th> elements in the body require scope="row"; basically every <th> needs the scope attribute.

    def assert_has_row_scope(object, selector="tbody tr th"):
        assert all(th.attrs.get("scope") in {"row", "rowgroup"} for th in soup(object).select(selector)),\
        "<th> is missing `scope='row'`"
    # with pytest.raises(AssertionError): assert_has_row_scope(col_scoped)

    def set_row_scope(object, selector="tbody tr th, tfoot tr th"):
        """like the columns, we can deterministically add scope="row"."""
        for th in (object := soup(object)).select(selector):
            th.attrs.setdefault("scope", "row")
        return str(object)

    we use assert_has_row_scope to verify the scope because, again, there aren't any visual effects to these changes.

    assert_has_row_scope(row_scoped := set_row_scope(col_scoped))
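
    since every <th> needs a scope, the column and row checks can be folded into one report. the sketch below uses a hypothetical helper, unscoped_headers, on hand-written html; it is an illustration and not part of the original notebook.

```python
import bs4

VALID_SCOPES = {"col", "colgroup", "row", "rowgroup"}


def unscoped_headers(html):
    """return the text of every <th> that lacks a valid scope (hypothetical helper)."""
    table = bs4.BeautifulSoup(html, "html.parser")
    return [
        th.get_text(strip=True)
        for th in table.select("th")
        if th.attrs.get("scope") not in VALID_SCOPES
    ]


# hand-written table: the second row header is missing its scope
html = (
    "<table>"
    '<thead><tr><th scope="col">index</th><th scope="col">A</th></tr></thead>'
    '<tbody><tr><th scope="row">0</th><td>nan</td></tr>'
    "<tr><th>1</th><td>nan</td></tr></tbody>"
    "</table>"
)
missing = unscoped_headers(html)
```

    returning the offending headers, rather than only asserting, makes it easier to see which cell needs remediation.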
    
    4. Avoid using blank header cells.

    this instruction yields a best practice: name the dataframe index. without a name, the index <th> will always be empty.

    def assert_no_blank_header(body):
        assert all(th.get_text(strip=True) for th in soup(body).select("th")), "there is a blank <th>"
    # with pytest.raises(AssertionError): assert_no_blank_header(row_scoped)

    pandas requires some upstream work to satisfy this instruction.
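
    to see why upstream work is needed, the sketch below counts blank <th> cells in the html pandas emits with and without an index name. it uses pandas.DataFrame.to_html rather than the styler, and count_blank_headers is a hypothetical helper; exact counts may vary between pandas versions, so none are asserted here.

```python
import bs4
import pandas


def count_blank_headers(html):
    """count <th> elements with no visible text (hypothetical helper)."""
    table = bs4.BeautifulSoup(html, "html.parser")
    return sum(not th.get_text(strip=True) for th in table.select("th"))


unnamed = pandas.DataFrame(columns=list("ABC"), index=range(2))
named = unnamed.rename_axis(index="index")

# an unnamed index leaves the corner header empty
blank_unnamed = count_blank_headers(unnamed.to_html())
# naming the index adds a second header row that has empty cells of its own
blank_named = count_blank_headers(named.to_html())
```

    in both cases at least one blank header remains, which is why the header rows get squashed by hand in what follows.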

    def set_squashed_th(body):
        """set_squashed_th squashes the table column names. this method only works for the most basic dataframes."""
        table = soup(body)
        tr = bs4.Tag(name="tr")
        col, row = table.select("thead tr")  # expects exactly two header rows
        for top, bottom in zip(col.select("th"), row.select("th")):
            # keep whichever of the two stacked cells has text
            tr.append(top if top.string.strip() else bottom)
        thead = bs4.Tag(name="thead"); thead.append(tr)
        table.select_one("thead").replace_with(thead)
        return str(table)

    the frame below doesn't have empty <th> elements and is denser.

    a value-less dataframe with columns and row indexes
    index A B C
    0 nan nan nan
    1 nan nan nan
    assert_no_blank_header(squashed_th := set_squashed_th(row_scoped))
    
    5. Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

    the naming of columns is a context specific screen reader feature. authors would have to add this information themselves.

    titles = dict(A="apple", B="banana", C="carrot")
    def set_header_titles(body, titles=titles):
        """set_header_titles sets the expanded text of each abbreviation as the <th> title."""
        for th in (body := soup(body)).select("thead tr th"):
            name = th.string.strip()
            if name in titles:
                th.attrs["title"] = titles[name]
        return str(body)

    a real dataset would make more sense in this example.

    titled = set_header_titles(squashed_th, titles)
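
    for a slightly more realistic feel, here is the same idea on a hand-written header row with abbreviations that actually need expanding. the column names and expansions are hypothetical and not taken from any dataset in this document.

```python
import bs4

# hypothetical abbreviations and their expansions
titles = {"qty": "quantity", "avg": "average"}
html = "<table><thead><tr><th>item</th><th>qty</th><th>avg</th></tr></thead></table>"

table = bs4.BeautifulSoup(html, "html.parser")
for th in table.select("thead tr th"):
    name = th.get_text(strip=True)
    if name in titles:
        # only abbreviated headers receive a title attribute
        th.attrs["title"] = titles[name]

result = [(th.get_text(), th.attrs.get("title")) for th in table.select("th")]
```

    headers that are already spelled out, like item, are left without a title.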
    
    def strip_class_ids(body):
        """pandas adds classes and ids to elements that, for the sake of this discussion, are superfluous."""
        for e in (body := soup(body)).select("td, th"):
            e.attrs.pop("id", None)
            e.attrs.pop("class", None)
        return str(body)
    final = strip_class_ids(titled)

    inconsistencies in labelled indexes

    everything goes to hell with named_column, which has a column name.

    named_column = df.copy()
    named_column.columns.name = "letters"
    
    column and index names

    consider the case of named_column, where both named_column.columns and named_column.index are named.

    the named_column dataframe with a column name

    letters A B C
    index      
    0 nan nan nan
    1 nan nan nan

    screen reader visitors may struggle to interpret the meaning of "letters" relative to the index. ensuring a proper experience for screen readers will require extra markup to group the letters header with the columns.

    column name and no index name

    named_column_no_index = named_column.copy()
    named_column_no_index.index.name = None
    

    the named_column_no_index dataframe with a column name and without an index name

    letters A B C
    0 nan nan nan
    1 nan nan nan

    in this conformation, it is possible for a screen reader to misinterpret letters as the name of the index column. when the column index is named, like letters, the entry should be <th scope="row">. in this example we can see how the instructions in 2 and 3 differ.
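
    one way to act on this is to scope the column-name cell as a row header while the remaining header cells stay column headers. the sketch below works on a hand-written header row; the index_name class used to find the cell is an assumption about what the styler emits, so check it against your pandas version.

```python
import bs4

# hand-written header row imitating a frame with a named column index and an
# unnamed row index; the "index_name" class is an assumption about styler output
html = (
    "<table><thead><tr>"
    '<th class="index_name level0">letters</th>'
    '<th class="col_heading">A</th><th class="col_heading">B</th>'
    "</tr></thead></table>"
)
table = bs4.BeautifulSoup(html, "html.parser")
for th in table.select("thead tr th"):
    # bs4 parses class as a list of names
    is_column_name = "index_name" in th.attrs.get("class", [])
    th.attrs["scope"] = "row" if is_column_name else "col"

scopes = {th.get_text(): th["scope"] for th in table.select("th")}
```

    with this, letters labels the header row and A and B label their columns.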


    conclusions

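    for basic dataframes, two practices can be enforced without knowledge of the data:

    • [x] Column headers are inside <th scope="col"> elements.
    • [x] Row headers are inside <th scope="row"> elements.

    the abbreviations and caption are context specific and require knowledge of the data:

    • [x] Title of data table is inside the <caption> element.
    • [ ] Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.

    with pandas<=1.4.2, it is hard to avoid blank header cells without significant effort.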

    final frame

    our final dataframe has:

    • [x] <caption>
    • [x] <th scope="col">
    • [x] <th scope="row">
    • [x] no empty <th>
    • [x] <th title> for abbreviations
    a value-less dataframe with columns and row indexes
    index A B C
    0 nan nan nan
    1 nan nan nan

    final html source

    <style type="text/css">
    </style>
    <table id="T_595b7">
    <caption>a value-less dataframe with columns and row indexes</caption>
    <thead><tr><th scope="col">index</th><th scope="col" title="apple">A</th><th scope="col" title="banana">B</th><th scope="col" title="carrot">C</th></tr></thead>
    <tbody>
    <tr>
    <th scope="row">0</th>
    <td>nan</td>
    <td>nan</td>
    <td>nan</td>
    </tr>
    <tr>
    <th scope="row">1</th>
    <td>nan</td>
    <td>nan</td>
    <td>nan</td>
    </tr>
    </tbody>
    </table>
    
    about the scope attribute

    The scope attribute specifies whether a header cell is a header for a column, row, or group of columns or rows.

    The scope attribute makes table navigation much easier for screen reader users, provided that it is used correctly. Incorrectly used, scope can make table navigation much harder and less efficient.
