proper html tables with multiple indexes¤
our goal is to reduce the empty cells in tables, especially where headers should be. empty cells diminish the experience for assistive technology users. through this study we'll design some accessible options we could use generically to represent dataframes.
import pandas, bs4, enum, numpy, midgy
get_ipython().display_formatter.formatters["text/html"].for_type(bs4.BeautifulSoup, str);
%%
<style>
:is(.jp-OutputArea-output.jp-RenderedHTMLCommon, .nb-outputs) :is(td,th) {
border: 1px solid;
}
</style>
create a sample dataframe to work with that has multiple indexes on both axes. this facilitates our study because it is easier to remove axes than add them later. the code snippet below provides our expected outcome.
index = pandas.MultiIndex.from_product([
["A", "Z"], ["M", "N", "O"], [1, 2, 3]
], names=[*"JKL"])
(df := pandas.DataFrame(columns=index, index=index).rename_axis(columns=[10, 100, 1000]).head())
single = df.droplevel((0, 1), 0).droplevel((0, 1), 1).rename_axis(None, axis=1).rename_axis(None, axis=0)
df
accessibility html recommendations¤
there is a long history of html table layouts; they have existed since html 3.2 in january 1997. these standards precede much of the history of mass data literacy.
`table`s are introduced to present 2-D data structures.
> VISICALC represented a new idea of a way to use a computer and a new way of thinking about the world. Where conventional programming was thought of as a sequence of steps, this new thing was no longer sequential in effect: When you made a change in one place, all other things changed instantly and automatically.
>
> — Ted Nelson [13]
considering the accessibility of `table`s means we need to extend the visual representation to tactile and audible experiences.
let's start with some popular advice from rachele ditullio about 5 ways to improve table accessibility
- Caption that table
- Include header text for every column
- Use alt attributes meaningfully
- Have data in every table cell
- Check your (con)text
this is where we start because these recommendations put users' needs first. accessibility requires us to center a user's visual, audible, and tactile experience when working with data.
%% -u
## testing an actual dataframe
based on these suggestions we can connect dataframe parlance to the consistent standards of html.
what follows are comments on how each of the 5 suggestions applies to `pandas.DataFrame` objects.
1. `pandas` tables typically lack a `caption` unless the code author is aware of `df.style.set_caption`.
the `caption` element provides an accessible name that gives assistive technology users more context as they navigate information.
df.style.set_caption("the public api for adding a caption to a dataframe.")
2. as of v2.2, there are configurations of pandas columns and indexes that generate representations containing empty headers.
an accessible, assistive experience will avoid empty cells, especially header cells.
the first cell in the table is empty for the cases where `count_empty_cell` reveals non-zero results.
this means that assistive technology users will find empty cells in most of the pandas dataframe
representations available online. this oversight is costly because technologies like screen readers
and braille displays parse information serially, rather than in parallel as vision allows.
def count_empty_cell(df):
`count_empty_cell` will count the empty `th` and `td` elements in a rendered dataframe.
it was created to demonstrate the different conditions on dataframe indexes and columns
that influence the current visual form of the dataframe.
* our test dataframe has empty cells because the index and columns are unnamed.
>>> assert count_empty_cell(df) > 0
* a dataframe with a named column index has no empty cells.
>>> assert count_empty_cell(single.rename_axis(columns="upper")) == 0
* there are empty cells when the index is named because the index name is given its own row.
>>> assert count_empty_cell(single.rename_axis(index="lower")) > 0
table_cells = pandas.Series(bs4.BeautifulSoup(df.to_html(), features="lxml").select("th,td"))
return table_cells.apply(any).__invert__().sum()
3. adding alt text to images is out of scope for this investigation, even though it is valid advice. pandas dataframes
may contain various mimetypes of content and their representations should be assistive.
however, for this study, the index and columns are the primary axes for building an accessible table substrate.
4. `null` and `NaN` need semantically meaningful representations.
the programming colloquialisms for empty content may not translate to assistive technology.
_what about braille_?
it is unlikely there is a single best placeholder for these values, so the placeholder should be configurable. (a quick check of the placeholder follows this list.)
placeholder = "not a number"
df.fillna(f'<span class="sro">{placeholder}</span>').style
5. yes, abbreviations and punctuation should be considered,
but this is an advanced technique that requires manual screen reader testing literacy.
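to confirm the placeholder survives into the rendered markup, a minimal check (a sketch; `sro` is a hypothetical class name for screen-reader-only styling and would still need a css rule):

styled = df.fillna(f'<span class="sro">{placeholder}</span>')
# the placeholder text should appear in the styler's html output
assert placeholder in styled.style.to_html()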
the comparison between rachele's advice and `pandas` dataframes is just a start down the rabbit hole.
we'll begin to bring in other articles, standards, and specifications to design ARIA first rule implementations
of pandas tidy frames.
more accessible tables¤
our next resource provides more dos and don'ts that correspond to accessible table experiences.
- ✅ Designate at least one row and/or column header using the table formatting tools in your web content management system or document creation software.
  `pandas` dataframes should be represented with a named index or column.
- ✅ Use the `<th>` element to mark up table headers in HTML.
- ❌ Table headers should never be empty. This is particularly of concern for the top-left cell of some tables.
  our primary task is to remove empty cells from the `thead` portion of the dataframe representation. currently, it is very common for screen reader users to find empty first cells in a table. imagine how much it sucks. (see the sketch after this list.)
- ✅ If you do create a complex data table on a webpage, use the `scope` attribute to programmatically associate the data cells with the appropriate headers.
- ❌ Don't merge cells.
  merging cells creates ambiguities. `td`s in dataframes will NEVER be spanned.
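as a quick way to audit the empty header concern, a minimal sketch (assuming the imports above; `empty_thead_cells` is a hypothetical helper written for this check, not part of pandas):

def empty_thead_cells(df: pandas.DataFrame) -> int:
    # count header cells in the rendered thead that carry no text at all
    soup = bs4.BeautifulSoup(df.to_html(), features="lxml")
    return sum(not th.get_text(strip=True) for th in soup.select("thead th"))

# the unnamed single index frame leaks an empty top-left header, the named one does not
assert empty_thead_cells(single) > 0
assert empty_thead_cells(single.rename_axis(columns="upper")) == 0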
Accessible table do and don't¤
https://accessibility.umn.edu/what-you-can-do/start-7-core-skills/tables

Use Tables to Display Data
- ✅ Use tables to present information in a grid, or matrix, with columns or rows that show the meaning of the information.
- ❌ Don't use tables to make your webpage look a particular way. Layout tables on webpages do not pose inherent accessibility issues, but it is more difficult to make sure screen reader software reads the cells in the proper order.
- ❌ Never use tables as a means of laying out a page in a Google or Microsoft Word document. While these tables can be hidden from visual users by simply eliminating the borders between cells, they cannot be hidden from screen readers.

Designate Row and/or Column Headers
- ✅ Designate at least one row and/or column header using the table formatting tools in your web content management system or document creation software.
- ✅ Use the `<th>` element to mark up table headers in HTML.

Avoid or Simplify Complex Tables
- ✅ Include a maximum of one header row and one header column.
- ✅ Spell out abbreviations or acronyms, or use the corresponding tags in HTML to ensure accessibility.
- ✅ If your table has multiple header rows, merged cells, or another table embedded in it, split it into two or more simple tables.
- ✅ If you do create a complex data table on a webpage, use the `scope` attribute to programmatically associate the data cells with the appropriate headers.

Provide Contextual Information
- ✅ Associate descriptive text about a table with its respective table by including a `<caption>`.

Include Content in All Cells
- ✅ Include text such as "not applicable," "none," etc. to indicate that there is no data in empty cells.
- ❌ Don't leave any table cells empty.

better tables¤
%%
def index_span(index: pandas.Index) -> pandas.DataFrame:
we need to tidy our indexes, which may contain grouped, repeating values.
`index_span` defines the logic for diffing and labelling the index to measure the column
and row spans for an index.
return pandas.concat(
dict(
diff=(diff := index.to_frame().pipe(diff_shift)),
label=(label := diff.cumsum()),
span=label.apply(
lambda s: s.drop_duplicates().apply(s.value_counts().get), axis=0
)
), axis=1
).replace({numpy.nan: None})
def diff_shift(df: pandas.DataFrame) -> pandas.DataFrame:
shift the data by a row to find the nearest change in the index when computing spanning metrics.
return pandas.DataFrame(
numpy.concatenate((numpy.array([[True]*df.shape[1]]), df.values[:-1] != df.values[1:]), 0),
columns=df.columns
)
%%
def column_major(df: pandas.DataFrame, caption=None, SPAN=True) -> bs4.BeautifulSoup:
convert a dataframe to a `column_major` html representation that presents the column index names first.
soup = bs4.BeautifulSoup(features="html.parser")
soup.append(table := soup.new_tag("table"))
if caption:
table.append(cap := soup.new_tag("caption"))
cap.append(caption)
ROWS, COLS = any(df.index.names), any(df.columns.names)
pre-compute the grouping structure of the indexes
row_span, col_span = index_span(df.index), index_span(df.columns)
for col_level, col_name in enumerate(df.columns.names):
1. show the column index names
table.append(tr := soup.new_tag("tr"))
if COLS:
attrs = dict(scope="row")
if df.index.nlevels > 1:
attrs.update(colspan=df.index.nlevels)
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(col_name or F"level {col_level}"))
for col_index, col_value in enumerate(df.columns.get_level_values(col_level)):
1. show the column index values
attrs = dict(scope="col")
span = col_span["span"].iloc[col_index, col_level] if SPAN else 1
if span:
if span > 1:
attrs.update(colspan=int(span))
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(col_value))
if ROWS:
1. insert the row names below the column names
table.append(tr := soup.new_tag("tr"))
attrs = dict(scope="col")
for row_level, row_name in enumerate(df.index.names):
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(row_name or F"index {row_level}"))
for col_value in df.columns.get_level_values(col_level):
fill the remainder of the row with blank cells; these blanks are suboptimal for assistive technology.
attrs = dict(scope="col")
tr.append(td := soup.new_tag("td"))
for row_index in range(df.shape[0]):
1. write the row index headers
table.append(tr := soup.new_tag("tr"))
for row_level in range(df.index.nlevels):
span = row_span["span"].iloc[row_index, row_level] if SPAN else 1
if span:
attrs = dict(scope="row")
if span > 1:
attrs.update(rowspan=int(span))
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(df.index.get_level_values(row_level)[row_index]))
for value in df.iloc[row_index].values:
1. write the values of the dataframe
tr.append(td := soup.new_tag("td"))
td.append(str(value))
return soup
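a small check before moving on: when every level of both indexes is named, as in our sample `df`, the generated markup should never contain an empty header cell (a sketch using the functions above):

soup = column_major(df, "fully named indexes")
# every th carries text because each level of both indexes is named
assert all(th.get_text(strip=True) for th in soup.select("th"))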
%%
def row_major(df, caption=None, SPAN=True):
a `row_major` representation that presents the row index names first.
soup = bs4.BeautifulSoup(features="lxml")
soup.append(table := soup.new_tag("table"))
if caption:
table.append(cap := soup.new_tag("caption"))
cap.append(caption)
ROWS, COLS = any(df.index.names), any(df.columns.names)
1. precompute the row and column index spans
row_span, col_span = index_span(df.index), index_span(df.columns)
for col_level, col_name in enumerate(df.columns.names):
table.append(tr := soup.new_tag("tr"))
if not col_level:
1. write the index names on the first pass of the header rows.
if ROWS or not COLS:
attrs = dict(scope="col")
if df.columns.nlevels > 1:
attrs.update(rowspan=df.columns.nlevels)
for row_level, row_name in enumerate(df.index.names):
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(row_name or F"index {row_level}"))
if COLS:
1. include the column index names if they exist
attrs = dict(scope="row")
if not ROWS and df.index.nlevels > 1:
attrs.update(colspan=df.index.nlevels)
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(col_name or F"level {col_level}"))
for col_index, col_value in enumerate(df.columns.get_level_values(col_level)):
1. write the values for the column index
attrs = dict(scope="col")
span = col_span["span"].iloc[col_index, col_level] if SPAN else 1
if span:
attrs = dict(scope="col")
if span > 1:
attrs.update(colspan=int(span))
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(col_value))
for row_index in range(df.shape[0]):
1. write the index header values
table.append(tr := soup.new_tag("tr"))
for row_level in range(df.index.nlevels):
span = row_span["span"].iloc[row_index, row_level] if SPAN else 1
if span:
attrs = dict(scope="row")
if span > 1:
attrs.update(rowspan=int(span))
tr.append(th := soup.new_tag("th", attrs=attrs))
th.append(str(df.index.get_level_values(row_level)[row_index]))
if ROWS and COLS:
1. insert an empty cell to stay aligned with the column name header when both axes are named
tr.append(td := soup.new_tag("td"))
for value in df.iloc[row_index].values:
1. write the data
tr.append(td := soup.new_tag("td"))
td.append(str(value))
return soup
single index names¤
row_major(df.head().rename_axis((None, None, None), axis=1).droplevel((0, 1), axis=1).droplevel((0,1), axis=0),
"a single index row major")
column_major(df.head().rename_axis((None, None, None), axis=0).droplevel((0, 1), axis=0).droplevel((0,1), axis=1),
"a single index column major")
row_major(df.head().rename_axis((None, None, None), axis=1).droplevel(0, axis=1).droplevel((0,1), axis=0),
"a multi index row major")
column_major(df.head().rename_axis((None, None, None), axis=0).droplevel(0, axis=0).droplevel((0,1), axis=1),
"a multi index column major")
one named index spanning¤
row_major(df.head().rename_axis((None, None, None), axis=1), "spanning multiple index row major")
column_major(df.head().rename_axis((None, None, None), axis=0), "spanning multiple index column major")
spanning two named indexes¤
row_major(df.head(), "spanning multiple indexes row major")
column_major(df.head(), "spanning multiple index column major")
it is hard to imagine a way to construct a dataframe representation with two named indexes without empty cells; we'll likely include an empty row or column. this might seem like an authoring choice, but we can't know our viewers' intent when they interrogate data. the most flexible approach would be to allow this to change on the client, where the author's choice only represents the steady state.
non-spanning frames¤
so far we have only illustrated spanning examples, meaning that a row or column index header may span multiple rows or columns; only headers will span in these dataframes, the data cells never will. this experience can really suck for assistive technology, introducing navigation ambiguities and complications.
row_major(df.head(), "non-spanning multiple indexes row major", False)
column_major(df.head(), "non-spanning multiple index column major", False)
%%
## visual and nonvisual shape
row and column references in a `table` are not the same as in a dataframe.
convention holds that dataframes show their shape, the shape of the data.
the nominal references we use for dataframes become shifted ordinal references
when the shape is shared alongside a screen reader.
the shape of the table to a screen reader includes the header rows and columns.
to assistive technology, the shape of the dataframe is computed by:
df.shape[0] + df.columns.nlevels, df.shape[1] + df.index.nlevels
this naive heuristic is only true for certain combinations of multi indexes. a more rigorous implementation would handle these edge cases.
these inconsistencies mean that screen reader users may be referencing a different indexing system than sighted users.
we can improve the captioning by including the nominal shape alongside the actual shape.
mentioning the row and column levels would help parse this content.
if we are requiring folks to do math in their heads then we'll want an adaptive approach to discussing shape.
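as a concrete check of the heuristic, a small sketch against the row major rendering (the row count only holds for this renderer when both axes are named, as in our sample `df`):

rendered = row_major(df, "shape check")
# the table announced to a screen reader gains one row per column index level
assert len(rendered.select("tr")) == df.shape[0] + df.columns.nlevels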
todo¤
- AT shape vs nominal shape information
- dataframes larger than the truncation thresholds, where we will have to introduce aria