rendering dataframes for screen readers with pandas
in this document we'll explore what it takes to make `pandas.DataFrame`s accessible.
we'll follow Paul J Adam's instructions for making Simple Data Tables.
%reload_ext pidgy
import pandas.io.formats.style, bs4, pytest
shell.weave.reactive = False
shell.weave.use_async = False
soup = lambda x: bs4.BeautifulSoup(x, features="html.parser")
(df := pandas.DataFrame(
columns=pandas.Index(list("ABC")),
index=pandas.Index(range(2), name="index"))
).style.set_caption("the basic dataframe <var>df</var> we use for explanation in this document. ")
`df` is a basic `pandas.DataFrame` because:
* it has ONE row level
* it has ONE column level
we do not need to consider the contents of the cells for the recommendations we are implementing.
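the two conditions above can be checked programmatically. a minimal sketch, assuming a hypothetical helper named `is_basic_frame` (it is not part of pandas or of this notebook) that reads the `nlevels` attribute of each axis:

```python
import pandas


def is_basic_frame(frame):
    # hypothetical helper: a frame is "basic" when both axes have exactly one level
    return frame.index.nlevels == 1 and frame.columns.nlevels == 1


# the same shape as the `df` used in this document
basic = pandas.DataFrame(
    columns=pandas.Index(list("ABC")),
    index=pandas.Index(range(2), name="index"),
)

# a frame with two column levels, which falls outside the "simple data table" case
nested = pandas.DataFrame(
    columns=pandas.MultiIndex.from_product([["x", "y"], ["A", "B"]]),
    index=pandas.Index(range(2), name="index"),
)

print(is_basic_frame(basic), is_basic_frame(nested))
```

frames that fail this check need more markup than the recommendations below cover.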
## applying best practices to pandas
### 1. Title of data table is inside the `<caption>` element.
the caption depends on the data and affects the visual appearance of the table.
it is context dependent, so it is up to the author to supply.
def assert_has_caption(object):
caption = soup(object).select_one("table caption")
assert caption and caption.string.strip(), "table is missing a <caption>"
# with pytest.raises(AssertionError): assert_has_caption(df._repr_html_())
the caption can be set using the `pandas.DataFrame.style.set_caption` method.
the result is a `pandas.io.formats.style.Styler` object that lets us modify how the instance is displayed.
`set_caption` uses the `pandas.DataFrame.style` attribute to set a caption
def set_caption(df, caption) -> pandas.io.formats.style.Styler:
return df.style.set_caption(caption)
after we have a styler we are working with a subset of pandas operations.
the styler should be the last stop for the data.
the `<table>` below demonstrates how the `<caption>` appears on a `captioned` `pandas.DataFrame`.
{{captioned}}
assert_has_caption(
captioned := set_caption(df, "a value-less dataframe with columns and row indexes")._repr_html_()
);
|       | A   | B   | C   |
|-------|-----|-----|-----|
| index |     |     |     |
| 0     | nan | nan | nan |
| 1     | nan | nan | nan |
#### `pandas.io.formats.style.Styler`
`pandas.io.formats.style.Styler` gives a different html representation than `_repr_html_`
assert df.style.to_html() != df.to_html()
### 2. Column headers are inside `<th scope="col">` elements.
the `scope` property improves navigation for screen readers when used properly.
def assert_has_col_scope(object, selector="thead tr th"):
for basic frames, all the `<th>` tags in `<thead>` should have `scope="col"`
assert all(th.attrs.get("scope") in {"col", "colgroup"} for th in soup(object).select(selector)), "<th> is missing `scope='col'`"
# with pytest.raises(AssertionError): assert_has_col_scope(captioned)
def set_col_scope(object, selector="thead tr th"):
`set_col_scope` automatically remediates missing column `scope`s
for th in (object := soup(object)).select(selector):
th.attrs.setdefault("scope", "col")
return str(object)
the `scope` attribute has no visual effect; however, we can use `assert_has_col_scope` to verify the scope is correct.
assert_has_col_scope(col_scoped := set_col_scope(captioned));
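the remediation relies on `dict.setdefault`, so a `scope` the author already supplied is never overwritten. a small sketch of that behaviour, using a hand-written table as a stand-in for pandas output (the markup here is an assumption for illustration, not what pandas emits):

```python
import bs4

# hand-written markup standing in for a rendered dataframe
html = """
<table>
  <caption>fruit counts</caption>
  <thead><tr><th>index</th><th scope="colgroup">A</th><th>B</th></tr></thead>
  <tbody><tr><th>0</th><td>1</td><td>2</td></tr></tbody>
</table>
"""

table = bs4.BeautifulSoup(html, features="html.parser")
for th in table.select("thead tr th"):
    # setdefault only fills in a missing scope; an existing value survives
    th.attrs.setdefault("scope", "col")

scopes = [th["scope"] for th in table.select("thead tr th")]
print(scopes)  # ['col', 'colgroup', 'col']
```

the `<th>` in `<tbody>` is untouched because the selector only matches cells inside `<thead>`.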
### 3. Row headers are inside `<th scope="row">` elements.
similar to `scope="col"`, the `<th>` elements in the body require `scope="row"`;
basically every `<th>` needs the scope property.
def assert_has_row_scope(object, selector="tbody tr th"):
assert all(th.attrs.get("scope") in {"row", "rowgroup"} for th in soup(object).select(selector)),\
"<th> is missing `scope='row'`"
# with pytest.raises(AssertionError): assert_has_row_scope(col_scoped)
def set_row_scope(object, selector="tbody tr th, tfoot tr th"):
like the columns, we can deterministically add `scope="row"`
for th in (object := soup(object)).select(selector):
th.attrs.setdefault("scope", "row")
return str(object)
we use `assert_has_row_scope` to verify the `scope` because, again, there aren't any visual effects to these changes.
assert_has_row_scope(row_scoped := set_row_scope(col_scoped))
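since none of this is visible, it can help to list the offending cells instead of only asserting. a minimal sketch, assuming a hypothetical helper `unscoped_headers` and a hand-written table where one row header was left without a `scope`:

```python
import bs4

# hand-written markup; the second body row is deliberately missing its scope
html = """
<table>
  <thead><tr><th scope="col">index</th><th scope="col">A</th></tr></thead>
  <tbody>
    <tr><th scope="row">0</th><td>nan</td></tr>
    <tr><th>1</th><td>nan</td></tr>
  </tbody>
</table>
"""


def unscoped_headers(body):
    # hypothetical helper: the text of every <th> that lacks a scope attribute
    table = bs4.BeautifulSoup(body, features="html.parser")
    return [th.get_text(strip=True) for th in table.select("th") if not th.attrs.get("scope")]


print(unscoped_headers(html))  # ['1']
```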
### 4. Avoid using blank header cells.
this instruction yields a best practice: __name the dataframe index__.
without a name, the index `<th>` will always be empty.
def assert_no_blank_header(body):
assert all(th.string.strip() for th in soup(body).select("th")), "there is a blank <th>"
# with pytest.raises(AssertionError): assert_no_blank_header(row_scoped)
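one caveat with the check above: in `bs4`, `Tag.string` is `None` when a tag has no children or more than one child, so `th.string.strip()` can raise `AttributeError` instead of failing the assertion. a hedged sketch of a more forgiving variant, assuming a hypothetical helper `blank_headers` built on `Tag.get_text(strip=True)`:

```python
import bs4


def blank_headers(body):
    # hypothetical helper: count the <th> elements that carry no visible text
    table = bs4.BeautifulSoup(body, features="html.parser")
    return sum(1 for th in table.select("th") if not th.get_text(strip=True))


# three header cells: truly empty, nested markup, and a non-breaking space
html = "<table><thead><tr><th></th><th><b>A</b> <i>total</i></th><th>&nbsp;</th></tr></thead></table>"
empty, nested, nbsp = bs4.BeautifulSoup(html, features="html.parser").select("th")

# .string is None for the empty cell and for the cell with several children
print(empty.string, nested.string)
print(blank_headers(html))
```

the empty cell and the non-breaking-space cell both count as blank, while the nested cell does not.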
`pandas` requires some upstream work to satisfy this instruction.
def set_squashed_th(body):
`set_squashed_th` squashes the table column names. this method only works for the most basic dataframes.
table = soup(body)
tr = bs4.Tag(name="tr")
col, row = table.select("thead tr")
for top, bottom in zip(col.select("th"), row.select("th")):
tr.append(top if top.string.strip() else bottom)
thead = bs4.Tag(name="thead"); thead.append(tr)
table.select_one("thead").replace_with(thead)
return str(table)
the frame below doesn't have empty `<th>` elements and is denser.
{{squashed_th}}
assert_no_blank_header(squashed_th := set_squashed_th(row_scoped))
| index | A   | B   | C   |
|-------|-----|-----|-----|
| 0     | nan | nan | nan |
| 1     | nan | nan | nan |
### 5. Header cells with text abbreviations that need expansion use the title attribute with the expanded text set as the value.
the naming of columns is a context-specific screen reader feature.
authors would have to add this information themselves.
titles = dict(A="apple", B="banana", C="carrot")
def set_header_titles(body, titles=titles):
`set_header_titles` is a function that writes the expanded name of each abbreviation into the `title` attribute.
for th in (body := soup(body)).select("thead tr th"):
name = th.string.strip()
if name in titles:
th.attrs["title"] = titles[name]
return str(body)
> a real dataset would make more sense in this example.
titled = set_header_titles(squashed_th, titles)
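to make the effect concrete, here is a sketch with abbreviations closer to what a real table might contain. the column names and expansions are hypothetical, chosen only for illustration, and the markup is hand-written:

```python
import bs4

# hypothetical abbreviations and their expansions
expansions = {"qty": "quantity", "amt": "amount in US dollars"}
html = "<table><thead><tr><th>item</th><th>qty</th><th>amt</th></tr></thead></table>"

table = bs4.BeautifulSoup(html, features="html.parser")
for th in table.select("thead tr th"):
    name = th.get_text(strip=True)
    if name in expansions:
        # only abbreviated headers receive a title; "item" needs no expansion
        th["title"] = expansions[name]

header_titles = [th.get("title") for th in table.select("th")]
print(header_titles)  # [None, 'quantity', 'amount in US dollars']
```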
def strip_class_ids(body):
`pandas` adds classes and ids to elements that, for the sake of this discussion, are superfluous.
for e in (body := soup(body)).select("td, th"):
e.attrs.pop("id", None)
e.attrs.pop("class", None)
return str(body)
final = strip_class_ids(titled)
## inconsistencies in labelled indexes
everything goes to hell with `named_column`, which has a column name.
named_column = df.copy()
named_column.columns.name = "letters"
### column and index names
consider the case of `named_column`, where both `named_column.columns` and `named_column.index` are named.
{% set df = named_column.style.set_caption(pidgy.filters.md("the `named_column` dataframe with a column name")) %}
{{df}}
screen reader visitors may struggle to interpret the meaning of "letters" relative to the index.
ensuring a proper experience for screen readers will require extra markup to group the `<th>` with the columns.
| letters | A   | B   | C   |
|---------|-----|-----|-----|
| index   |     |     |     |
| 0       | nan | nan | nan |
| 1       | nan | nan | nan |
### column name and no index name
named_column_no_index = named_column.copy()
named_column_no_index.index.name = None
{% set df = named_column_no_index.style.set_caption(pidgy.filters.md("the `named_column_no_index` dataframe with a column name and without an index name")) %}
{{df}}
in this configuration, it is possible for a screen reader to misinterpret `letters` as the name of the index column.
when the column index is named, like `letters`, the entry should be `<th scope="row">`.
in this example we can see how instructions 2 and 3 differ.
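a rough sketch of that difference, under stated assumptions: the table is hand-written, it has a single header row, and the first `<th>` of that row holds the column index name. real pandas output may be laid out differently, so treat the selector logic as illustrative only.

```python
import bs4

# hand-written markup: "letters" names the column index, the index itself is unnamed
html = """
<table>
  <thead><tr><th>letters</th><th>A</th><th>B</th></tr></thead>
  <tbody><tr><th>0</th><td>nan</td><td>nan</td></tr></tbody>
</table>
"""

table = bs4.BeautifulSoup(html, features="html.parser")
name_cell, *column_cells = table.select("thead tr th")

# the name labels the row of column headers, so it is scoped to the row
name_cell["scope"] = "row"
for th in column_cells:
    th.attrs.setdefault("scope", "col")

scopes = [th["scope"] for th in table.select("thead tr th")]
print(scopes)  # ['row', 'col', 'col']
```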