a semantic dataframe

it is reductive to view a dataframe as an html table. a table is the most minimal visualization of a dataframe and, as with any visualization, context is critical.

my hypothesis is that a dataframe is really a figure. we are going to attempt to treat the dataframe as a table inside a figure.

    import bs4
    # an empty document is enough to create tags; naming the parser avoids bs4 guessing one
    soup = bs4.BeautifulSoup("", "html.parser")
    T, C = soup.new_tag, bs4.Comment

start by treating the entire dataframe as a figure containing a table.

    figure = T("figure")
    figure.append(table := T("table"))

tables have a caption that is usually rendered above the table. the caption is used to present context-specific information.

    table.append(caption := T("caption"))

and, in general, tables should have labels. the variable name or code repr would be a good default label.

    caption.append(T("label"))

build the rest of the table parts.

    table.append(colgroup := T("colgroup"))
    table.append(thead := T("thead"))
    table.append(tbody := T("tbody"))
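a rough sketch of how those parts might be filled in. it uses the standard library's `xml.etree.ElementTree` rather than bs4 so it runs without dependencies, and the `columns`, `index`, and `rows` values are a made-up stand-in for a real dataframe. the `col-<position>` ids are an assumed naming convention, not something pandas emits.

```python
import xml.etree.ElementTree as ET

# hypothetical stand-in for a dataframe: column names, an index, and rows
columns = ["name", "mass_kg"]
index = [0, 1]
rows = [["ada", 1.5], ["grace", 2.0]]

table = ET.Element("table")
colgroup = ET.SubElement(table, "colgroup")
thead = ET.SubElement(table, "thead")
tbody = ET.SubElement(table, "tbody")

# one <col> for the index plus one per column
for _ in range(len(columns) + 1):
    ET.SubElement(colgroup, "col")

# header row: scope="col" marks each th as the header of its column
header = ET.SubElement(thead, "tr")
ET.SubElement(header, "th", scope="col").text = "index"
for position, name in enumerate(columns):
    th = ET.SubElement(header, "th", scope="col", id=f"col-{position}")
    th.text = name

# body rows: the index value is a row header, the rest are data cells
for label, values in zip(index, rows):
    tr = ET.SubElement(tbody, "tr")
    ET.SubElement(tr, "th", scope="row").text = str(label)
    for value in values:
        ET.SubElement(tr, "td").text = str(value)

print(ET.tostring(table, encoding="unicode", method="html"))
```

marking the index cells as `th` with `scope="row"` keeps the row labels distinct from the data, which is the kind of meaning a bare grid of `td` cells loses.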

the footer is underutilized and would be especially useful in the case of selections.

    table.append(tfoot := T("tfoot"))
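one guess at what a selection footer could look like: a single full-width cell that says how much of the dataframe is on screen. the `selection_footer` helper and its wording are hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree` instead of bs4 to stay dependency free.

```python
import xml.etree.ElementTree as ET


def selection_footer(shown, total, n_columns):
    """build a tfoot with one full-width cell describing a row selection."""
    tfoot = ET.Element("tfoot")
    tr = ET.SubElement(tfoot, "tr")
    # colspan stretches the single cell across every column of the table
    td = ET.SubElement(tr, "td", colspan=str(n_columns))
    td.text = f"showing {shown} of {total} rows"
    return tfoot


tfoot = selection_footer(shown=2, total=5, n_columns=3)
print(ET.tostring(tfoot, encoding="unicode"))
# prints <tfoot><tr><td colspan="3">showing 2 of 5 rows</td></tr></tfoot>
```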

the figure gets a second caption, the figcaption, alongside the table's context-specific caption. the figcaption holds the details and summary about the dataframe. currently, pandas reports the shape of the dataframe, which would be a good starting point for the content of the summary.

    figure.append(figcaption := T("figcaption"))
    figcaption.append(details := T("details"))
    details.append(summary := T("summary"))

  • represent the table's dtypes as a definition list
  • identifiers can point back to thead for enrichment

    figure.append(dl := T("dl"))
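a minimal sketch of the summary and the dtype list, assuming a made-up `dtypes` mapping and `shape` tuple in place of a real dataframe. the `#col-<position>` links assume the matching `th` elements in thead carry those ids, which is an invented convention. it uses the standard library's `xml.etree.ElementTree` rather than bs4 so it runs on its own.

```python
import xml.etree.ElementTree as ET

# hypothetical stand-ins for df.dtypes and df.shape
dtypes = {"name": "object", "mass_kg": "float64"}
shape = (2, 2)

# the summary starts from the shape, as suggested above
summary = ET.Element("summary")
summary.text = f"{shape[0]} rows × {shape[1]} columns"

# one dt/dd pair per column: the term is the column name, the definition its dtype
dl = ET.Element("dl")
for position, (name, dtype) in enumerate(dtypes.items()):
    dt = ET.SubElement(dl, "dt")
    # link the term back to its column header (assumes th ids like "col-0")
    link = ET.SubElement(dt, "a", href=f"#col-{position}")
    link.text = name
    ET.SubElement(dl, "dd").text = dtype

print(ET.tostring(summary, encoding="unicode"))
print(ET.tostring(dl, encoding="unicode"))
```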

a sketch of a semantically meaningful dataframe table.

    print(figure.prettify())
<figure>
 <table>
  <caption>
   <label>
   </label>
  </caption>
  <colgroup>
  </colgroup>
  <thead>
  </thead>
  <tbody>
  </tbody>
  <tfoot>
  </tfoot>
 </table>
 <figcaption>
  <details>
   <summary>
   </summary>
  </details>
 </figcaption>
 <dl>
 </dl>
</figure>

extending this to mermaid

in theory a diagram/graph is two dataframes, so we could default to showing a node table and an edge table. since we have mermaid we can do better. we'll need to replace the table part of the figure with an svg or image created by mermaid. both svg and image have natural accessibility affordances that should be used.

    diagram = T("figure")
    diagram.append(svg := T("svg"))
    diagram.append(diagramcaption := T("figcaption"))
    diagramcaption.append(deets := T("details"))
    deets.append(summ := T("summary"))
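a sketch of the accessibility affordances mentioned above, under the assumption that we control the markup around the mermaid output. an svg can carry `role="img"` with a `title` and `desc` referenced through `aria-labelledby`, and an image can carry `alt` text. the ids and the text are placeholders, and the sketch uses the standard library's `xml.etree.ElementTree` instead of bs4.

```python
import xml.etree.ElementTree as ET

description = "A flowchart diagram with 100 nodes and 2 edges"

# svg route: role="img" plus a title and desc that aria-labelledby points at
svg = ET.Element("svg", {
    "role": "img",
    "aria-labelledby": "diagram-title diagram-desc",
})
title = ET.SubElement(svg, "title", id="diagram-title")
title.text = "flowchart"
desc = ET.SubElement(svg, "desc", id="diagram-desc")
desc.text = description

# image route: the alt attribute carries the same description
img = ET.Element("img", alt=description)

print(ET.tostring(svg, encoding="unicode"))
print(ET.tostring(img, encoding="unicode"))
```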

we use the figcaption to share extra metadata about the graph and its properties. we could include nodes, edges, cycles, subgraphs, lots of things; some might be possible to get from mermaid.

    summ.append("A flowchart diagram with 100 nodes and 2 edges")
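the summary text could be derived from the node and edge tables instead of being written by hand. a small sketch, assuming each table is a list of plain dicts with invented `id`, `source`, and `target` keys; `describe` and `has_cycle` are hypothetical helpers and nothing here calls mermaid.

```python
# hypothetical node and edge tables, each row a plain dict
nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]


def describe(kind, nodes, edges):
    """build the summary sentence from the sizes of the two tables."""
    return f"A {kind} diagram with {len(nodes)} nodes and {len(edges)} edges"


def has_cycle(nodes, edges):
    """depth-first search for a directed cycle."""
    targets = {node["id"]: [] for node in nodes}
    for edge in edges:
        targets.setdefault(edge["source"], []).append(edge["target"])
        targets.setdefault(edge["target"], [])
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            # reached a node that is still on the current path
            return True
        visiting.add(node)
        found = any(visit(target) for target in targets[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in targets)


print(describe("flowchart", nodes, edges))
print("has cycle:", has_cycle(nodes, edges))
```

counts are cheap to compute from the tables; properties like cycles or subgraphs need a pass over the edges, so they may be better suited to the expanded details than to the one-line summary.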

the mermaid source code would follow the summary, and would only be shown when a user asks for it by expanding the details.

    deets.append(T("code"))
    print(diagram.prettify())
<figure>
 <svg>
 </svg>
 <figcaption>
  <details>
   <summary>
    A flowchart diagram with 100 nodes and 2 edges
   </summary>
   <code>
   </code>
  </details>
 </figcaption>
</figure>