# accessible semantics for python code
pre-formatted html representations of code ignore the semantics of the source language, in both browse and focus mode.
this leaves readers consuming preformatted text splattered with some color, and it leaves screen reader users with a stream of unstructured text,
despite the fact that programming languages have richer semantics.
in this document, we consider a semantically meaningful representation of python code (this is not a general approach)
that aligns the structure of the annotation object model more closely with the semantics of the python programming language.
some of the flexible changes we propose are:
* function and class definition blocks are landmarks
* function and class definitions are headings
* top-level expressions and comments are grouped

we imagine an annotation object model for code that provides landmarks, headings, and other aria semantics.
## synthesizing multiple representations
`pygments` is the primary way we display syntax highlighted code, and it is a language-agnostic tool.
to achieve our goal of a semantically meaningful code structure, we need more detail than `pygments` lexical analysis provides.
our semantic solution merges three streams of tokenized python source:
1. the `pygments` tokens provide html classes with reusable style sheets
2. the `ast` module provides the nesting information of expressions
3. `tokenize` is used to discover comments in the python code because the `ast` module ignores them.
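to see why the `ast` and `tokenize` streams complement each other, here is a minimal standard-library sketch on a hypothetical three-line `source`. the `pygments` stream is left out because it needs the third-party package (its tokens would come from something like `pygments.lex`). `ast` reports the line span of the function but drops the comment, while `tokenize` keeps it.

```python
import ast
import io
import tokenize

# a hypothetical sample with one comment and one function
source = "# greet someone\ndef greet(name):\n    return 'hi ' + name\n"

# the ast stream: line spans of each top-level statement, but no comments
spans = [
    (type(node).__name__, node.lineno, node.end_lineno)
    for node in ast.parse(source).body
]

# the tokenize stream: comments survive here even though ast drops them
comments = [
    (token.string, token.start[0])
    for token in tokenize.generate_tokens(io.StringIO(source).readline)
    if token.type == tokenize.COMMENT
]

print(spans)     # [('FunctionDef', 2, 3)]
print(comments)  # [('# greet someone', 1)]
```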
## capturing structure from the `ast` and `tokenize`
we use the `ast` module to capture the block nature of the python source code.
we synthesize the block line numbers with the `tokenize` tokens to capture comments, because `ast` does not capture them.
for each region of interest (e.g. expression and comment blocks), the line number and tag attributes are yielded, sorted by line number.
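```python
import ast
import itertools
from typing import Iterator, Optional

# get_limits_from_ast and get_comments_from_tokenize are helpers defined
# elsewhere in this notebook; they are not shown in this section.


def get_sorted_regions(source: str) -> Iterator[tuple[int, Optional[dict]]]:
    nodes: ast.AST = ast.parse(source)
    nested = []  # stack of open region names (None for unnamed groups)
    for i, s in sorted(
        itertools.chain(get_limits_from_ast(nodes), get_comments_from_tokenize(source)),
        # order by line; on the same line, openers sort before closers (None)
        key=lambda x: (x[0], not bool(x[1])),
    ):
        if s is not None:
            if isinstance(s, str):
                # a function or class name: a labelled region with a dotted id
                nested.append(s)
                dots = ".".join(filter(bool, nested))
                yield i, {"id": dots, "role": "region", "aria-label": dots}
            elif isinstance(s, ast.AST):
                # a top-level expression or statement: an unnamed group
                name = type(s).__name__
                yield i, {"id": f"{name}-L{s.lineno}", "role": "group",
                          "aria-label": f"{name} Line {s.lineno}"}
                nested.append(None)
            elif isinstance(s, list):
                # a run of comment tokens
                yield i, {"aria-label": f"Comment Line {s[0].start[0]}"}
                nested.append(None)
        else:
            # a closing marker ends the innermost open region
            if nested:
                nested.pop()
            yield i, None
```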
### capturing nesting structure of the semantics with the `ast` module
the `ast` module lets us capture the line numbers that enclose expressions, functions, and classes.
to add structure to the semantics:
1. the top level expressions and statements in the module are grouped
2. all functions and classes are grouped
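a minimal sketch of how the `ast` half could work, assuming a stream of `(line, opener)` and `(line, None)` pairs where `None` closes the innermost open block before that line. `sketch_limits` is a hypothetical stand-in, not the notebook's `get_limits_from_ast`: definitions open with their name, other top-level statements open with the node itself, and definitions hidden inside compound statements (an `if` or `try` block, say) are not descended into.

```python
import ast
from typing import Iterator, Optional, Union

# an opener is a definition name (str) or a top-level statement node
Opener = Union[str, ast.AST]
Event = tuple[int, Optional[Opener]]
DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def sketch_limits(source: str) -> Iterator[Event]:
    """Yield (line, opener) when a block starts and (line, None) when it ends.

    The closing line is the line *after* the block (an assumed convention).
    """

    def visit(node: ast.AST, top_level: bool) -> Iterator[Event]:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEFINITIONS):
                # functions and classes at the top level or directly inside
                # another definition become named regions
                yield child.lineno, child.name
                yield from visit(child, False)
                yield child.end_lineno + 1, None
            elif top_level and isinstance(child, ast.stmt):
                # any other top-level statement becomes one unnamed group
                yield child.lineno, child
                yield child.end_lineno + 1, None

    yield from visit(ast.parse(source), True)
```

on `"import os\n\nclass A:\n    def f(self):\n        return 1\n\nx = 1\n"` this yields openers at lines 1, 3, 4, and 7, with the method's and the class's closers both landing on line 6.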
### capturing comments with `tokenize`
comments are effectively paragraphs in code. they should read as prose and be clearly demarcated as non-code.
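comments can be grouped into paragraphs with `tokenize` alone. this hypothetical `sketch_comment_blocks` (not the notebook's `get_comments_from_tokenize`) collects full-line comments on consecutive lines into one list of tokens, the list-of-tokens shape that `get_sorted_regions` checks for, and leaves trailing inline comments alone. it yields `(first line, tokens)` to open a block and `(line after the block, None)` to close it, which is an assumed convention.

```python
import io
import tokenize
from typing import Iterator, Optional

Event = tuple[int, Optional[list[tokenize.TokenInfo]]]


def sketch_comment_blocks(source: str) -> Iterator[Event]:
    """Group full-line comments on consecutive lines into one block."""
    block: list[tokenize.TokenInfo] = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type != tokenize.COMMENT:
            continue
        if not token.line.lstrip().startswith("#"):
            continue  # a trailing inline comment stays with its code
        if block and token.start[0] != block[-1].start[0] + 1:
            # a gap in line numbers ends the current paragraph of comments
            yield block[0].start[0], block
            yield block[-1].start[0] + 1, None
            block = []
        block.append(token)
    if block:
        yield block[0].start[0], block
        yield block[-1].start[0] + 1, None
```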
## custom `pygments` formatter
`pygments` drives the translation of source code to html. our custom formatter merges the `pygments`, `ast`, and `tokenize` streams together. the outcome is a semantically meaningful representation of the source code.
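the merge step can be sketched independently of `pygments`: given highlighted html split into lines and a stream of `(line, attributes-or-None)` pairs like the one `get_sorted_regions` yields, wrap the lines in `<div>` elements. `wrap_lines` and `attrs_to_html` are hypothetical helpers, and reading each line number as "insert before this line" is an assumption. closing tags are emitted before opening tags on the same line so that adjacent blocks do not nest by accident.

```python
import html
from collections import defaultdict
from typing import Iterable, Optional


def attrs_to_html(attrs: dict) -> str:
    # escape values so labels containing quotes or angle brackets stay valid
    return " ".join(
        f'{key}="{html.escape(str(value), quote=True)}"' for key, value in attrs.items()
    )


def wrap_lines(lines: list[str], regions: Iterable[tuple[int, Optional[dict]]]) -> str:
    """Wrap 1-based lines in <div> tags from a (line, attrs-or-None) stream."""
    closes: defaultdict[int, int] = defaultdict(int)
    opens: defaultdict[int, list[str]] = defaultdict(list)
    for lineno, attrs in regions:
        if attrs is None:
            closes[lineno] += 1  # close a block before this line
        else:
            opens[lineno].append(f"<div {attrs_to_html(attrs)}>")
    out: list[str] = []
    for lineno, line in enumerate(lines, start=1):
        out.extend(["</div>"] * closes.pop(lineno, 0))  # finish old blocks first
        out.extend(opens.pop(lineno, []))
        out.append(line)
    # blocks that end after the final line are closed at the very end
    out.extend(["</div>"] * sum(closes.values()))
    return "\n".join(out)
```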
### more semantics by post processing the html
out of convenience, we add a post-processing step that modifies the highlighted html. these changes make:
* function and class definitions into headings
* function and class declarations into links
the headings and links will now be included redundantly in screen reader navigation.
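one possible shape for that post-processing, sketched with a regular expression over the highlighted html. it assumes the default `pygments` `HtmlFormatter` short class names, where `nf` marks function names and `nc` marks class names; the notebook's own formatter may differ. `link_definitions` is a hypothetical helper, and a fuller version would need unique ids for methods that share a name across classes (for example, dotted ids like the ones `get_sorted_regions` builds).

```python
import re

# pygments' HtmlFormatter marks function names with class "nf" and class names with "nc"
DEFINITION_NAME = re.compile(r'<span class="(nf|nc)">([^<]+)</span>')


def link_definitions(highlighted: str, level: int = 2) -> str:
    """Turn each highlighted definition name into a heading holding a self-link."""

    def to_heading(match: re.Match) -> str:
        name = match.group(2)
        return (
            f'<span role="heading" aria-level="{level}">'
            f'<a id="{name}" href="#{name}">{match.group(0)}</a>'
            "</span>"
        )

    return DEFINITION_NAME.sub(to_heading, highlighted)
```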
our `sample` source is the source of the `tokenize` module, chosen arbitrarily. _if this notebook is live then you can change the source._
we include accessible `pygments` themes extended from eric bailey's accessible theme palettes.
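```python
import inspect
import tokenize

import pygments
import pygments.lexers
from IPython.display import HTML

# Html (the custom formatter) and post_highlight are defined elsewhere in this notebook
sample = inspect.getsource(tokenize)
formatter = Html(style="github-light-high-contrast")
page = post_highlight(
    pygments.highlight(sample, pygments.lexers.get_lexer_by_name("python"), formatter)
)
HTML(f"""<details><summary>expand this to see the highlighted code</summary>{page}</details>""")
```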
## a page/document of code
all of this can be combined into a complete page that treats code as an accessible document.
we can even include a heading for navigation.
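a minimal sketch of such a page, assuming the highlighted html is already trusted markup: the code sits inside a `<main>` landmark under an `<h1>` that screen reader users can jump to. `code_page` is a hypothetical helper, and only the title is escaped.

```python
import html


def code_page(title: str, highlighted: str, lang: str = "en") -> str:
    """Wrap highlighted code (trusted html) in a page with a main landmark and an h1."""
    safe_title = html.escape(title)
    return (
        "<!doctype html>\n"
        f'<html lang="{lang}">\n'
        '<head><meta charset="utf-8">'
        f"<title>{safe_title}</title></head>\n"
        f"<body><main><h1>{safe_title}</h1>\n{highlighted}\n</main></body>\n"
        "</html>"
    )
```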
### flexible configuration
the implementation of semantic structure for html should be flexible. some settings that make sense for pure code documents would not apply to notebooks. for example, in a notebook, cell-level landmarks would be preferred over expression-level landmarks. a rough configuration of the code semantics would follow the schema below.
```python
from IPython.display import JSON
from pydantic import BaseModel, Field


class Settings(BaseModel):
    all_expressions_are_grouped: bool = Field(
        True, description="each top level expression or statement is grouped")
    functions_and_classes_are_headings: bool = Field(
        True, description="functions and classes in code blocks are treated as headings")
    functions_and_classes_are_landmarks: bool = Field(
        True, description="functions and classes in code blocks are treated as landmarks")
    functions_and_classes_are_labelled: bool = Field(
        True, description="functions and classes in code blocks are given accessible labels")


# .schema() is the pydantic v1 spelling; pydantic v2 prefers Settings.model_json_schema()
JSON(Settings.schema(), root="Semantic code settings")
```
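for readers without `pydantic`, here is a standard-library sketch of how these flags might gate the attributes the formatter emits. `SemanticSettings` mirrors the `Settings` fields; the helper functions and the `NOTEBOOK` preset are hypothetical, and the preset turns off expression groups on the assumption that notebook cells already act as landmarks.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SemanticSettings:
    # hypothetical stdlib mirror of the pydantic Settings fields
    all_expressions_are_grouped: bool = True
    functions_and_classes_are_headings: bool = True
    functions_and_classes_are_landmarks: bool = True
    functions_and_classes_are_labelled: bool = True


def definition_block(name: str, settings: SemanticSettings) -> dict:
    """Attributes for the element wrapping a whole def/class block."""
    attrs: dict = {}
    if settings.functions_and_classes_are_landmarks:
        # screen readers generally expose a region as a landmark only when it is labelled
        attrs["role"] = "region"
    if settings.functions_and_classes_are_labelled:
        attrs["aria-label"] = name
    return attrs


def definition_heading(level: int, settings: SemanticSettings) -> Optional[dict]:
    """Attributes for the def/class line itself, or None when headings are off."""
    if not settings.functions_and_classes_are_headings:
        return None
    return {"role": "heading", "aria-level": str(level)}


def expression_block(lineno: int, settings: SemanticSettings) -> Optional[dict]:
    """Attributes for a top-level statement, or None when grouping is off."""
    if not settings.all_expressions_are_grouped:
        return None
    return {"role": "group", "aria-label": f"Line {lineno}"}


# hypothetical notebook preset: cells stand in for expression-level groups
NOTEBOOK = replace(SemanticSettings(), all_expressions_are_grouped=False)
```

with the notebook preset, `definition_block("tokenize", NOTEBOOK)` still returns a labelled region, while `expression_block(1, NOTEBOOK)` returns `None`.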
## conclusions
* all of this is hand-wavy bullshit, 'cause i'm the only disabled person testing this and i'm not an experienced screen reader user.
* some structure is better than none; too much structure would be bad.
* reading code is a more practical literacy than writing code. assistive technology users should have an easier time reading code.
* for long code documents, line numbers are challenging to navigate with a screen reader. better semantics can improve code navigation.
* the visual structure of the annotation object model is more navigable.
* audibly, this is better for me when testing notebooks on a screen reader. without it, code cells with more than 5 lines of code come through garbled and unstructured. more structure and interactive elements can improve the comprehension of code.