# transforming mast notebooks into nbconvert templates
this is a rough, first pass at an end-to-end sketch of the MAST notebooks using dataframes and nbconvert a11y templates.
to build docs or a book from nbconvert templates we need to either shim into mkdocs
or build our own thing. sphinx can't do it because of how docutils
is used.
it's just easier to build our own thing. hubris, eh? let's find out.
import nbconvert, nbformat, io, midgy
from toolz.curried import *
from nobook import *
from IPython.display import *
%%
start with a string mapping the github repository to disk
s = Series({(repo :=
    "https://github.com/spacetelescope/mast_notebooks"
): "mast_notebooks"}).path()
if not s.path.exists().any():
the whole repository is pretty tedious to clone. lots of pictures? do notebooks suck that bad over a long time?
better to use the `--depth` arg for control
!git clone $repo --depth 1
%%
this example reuses the mast notebook table of contents for jupyter book. on a mkdocs site we could do the same with mkdocs.yml.
toc = (
await Index(["mast_notebooks/_toc.yml"], name="path").apath().apath.load()
).aseries()
config = (
await Index(["mast_notebooks/_config.yml"], name="path").apath().apath.load()
).aseries().T.iloc[:,0]
%%
explode the chapters, sections, files, and ultimately discover the paths of the notebooks in the documents.
chapters = toc.parts.enumerate("chapter").series()
sections = chapters.chapters.enumerate("section").aseries()
files = sections.sections.dropna().enumerate("subsection").aseries().combine_first(
sections[["file"]].set_index(Index([0]*len(sections), name="subsection"), append=True)
)
paths = ("mast_notebooks" / files.file.apath())
print(F"{(~paths.path().path.exists()).sum()} files missing")
paths = paths[await paths.apath().apath.exists()].pipe(Index)
files
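the nesting above mirrors the jupyter book toc schema (parts → chapters → sections). as a plain-python illustration of the same flattening, with hypothetical chapter and file names, a chapter's own file landing at subsection 0:

```python
# hypothetical jupyter book _toc.yml, already parsed to a dict
toc = {
    "root": "intro",
    "parts": [
        {"caption": "astroquery", "chapters": [
            {"file": "astroquery/mast/mast", "sections": [
                {"file": "astroquery/mast/beginner"},
            ]},
        ]},
    ],
}

def flatten(toc):
    # yield (chapter, subsection, file) triples; the chapter file is subsection 0
    for part in toc.get("parts", []):
        for chapter, entry in enumerate(part.get("chapters", [])):
            yield chapter, 0, entry["file"]
            for subsection, section in enumerate(entry.get("sections", []), start=1):
                yield chapter, subsection, section["file"]

files = list(flatten(toc))
```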
## gathering and executing notebooks
notebooks = (await paths.apath().apath.load())
filter to the notebooks and currently ignore the markdown files in the mix. a markdown file can be represented as a notebook with a single markdown cell.
import nbformat, nbclient
notebooks = notebooks[notebooks.index.path.suffix.eq(".ipynb")].apply(
nbformat.from_dict
).to_frame("nb")
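a sketch of that markdown-to-notebook representation, hand-rolling the nbformat v4 dict (`nbformat.v4.new_markdown_cell` would produce the same shape):

```python
def markdown_to_notebook(text):
    # wrap a markdown document in a minimal nbformat v4 notebook dict
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [{
            "cell_type": "markdown",
            "metadata": {},
            "source": text,
        }],
    }

nb = markdown_to_notebook("# a markdown page\n\nsome prose.")
```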
%%
## dependencies
i wanted to see if we could build an environment that all these notebooks could run in.
we collect the dependencies from the requirements.txt files in the mast notebook directory.
we structure the dependencies using a regular expression so we can extract version information too.
dependencies = await (await (
Index(["mast_notebooks/"]).apath().apath.rglob("requirements.txt")
)).pipe(Index).apath.read_text()
versions = dependencies.apply(str.splitlines).explode().str.extract(
    r"^(?P<package>[A-Za-z0-9._-]+)\s*(?P<constraint>[><=!~]*)?\s*(?P<version>\S*)?"
)
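to sanity-check the extraction, the same pattern applied with plain `re` to a few representative requirement lines (the package names here are examples, not taken from the repo):

```python
import re

# named groups become columns under pandas str.extract
pattern = re.compile(
    r"^(?P<package>[A-Za-z0-9._-]+)\s*(?P<constraint>[><=!~]*)?\s*(?P<version>\S*)?"
)

rows = [
    pattern.match(line).groupdict()
    for line in ["astropy>=5.0", "numpy", "astroquery == 0.4.6"]
]
```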
%%
create an environment.yml file from the version information previously collected
import yaml; from pathlib import Path
deps = versions.package.dropna().drop_duplicates().tolist()
deps = [{"git": "GitPython"}.get(x,x) for x in deps ]
Path("environment.yml").write_text(yaml.safe_dump(dict(
name="mast_notebooks",
channels=["conda-forge"],
dependencies=["pip", dict(
pip=deps+ ["ipykernel", "astrocut"]
)]
)))
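for a couple of hypothetical dependencies, the dumped file round-trips cleanly (this assumes PyYAML, which the cell above already imports):

```python
import yaml

# same shape as the cell above, with made-up dependencies
env = dict(
    name="mast_notebooks",
    channels=["conda-forge"],
    dependencies=["pip", dict(pip=["astropy", "astroquery", "ipykernel"])],
)
text = yaml.safe_dump(env)
```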
uncomment the code below to create or update the environment
```bash
mamba env create -p .mast_nb -f environment.yml
mamba env update -p .mast_nb -f environment.yml
```
create a kernel for the environment to run the notebooks in
```bash
mamba run -p .mast_nb python -m ipykernel install --user --name mast_nb
```
%%
### executing the notebooks
execute the notebooks using the `nbclient` library
client = notebooks.nb.apply(nbformat.from_dict).apply(
nbclient.NotebookClient, kernel_name="mast_nb", allow_errors=True
)
recombine the executed notebooks with the original ones.
notebooks.nb = (
await client.head(3).apply(nbclient.NotebookClient.async_execute).gather()
).combine_first(notebooks.nb)
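the `.gather()` above amounts to running each client's `async_execute` concurrently; a stub sketch of that pattern with asyncio, where `StubClient` stands in for `nbclient.NotebookClient`:

```python
import asyncio

class StubClient:
    # stands in for nbclient.NotebookClient; async_execute returns the "executed" notebook
    def __init__(self, nb):
        self.nb = nb

    async def async_execute(self):
        await asyncio.sleep(0)  # yield to the event loop, as a real kernel round-trip would
        return {**self.nb, "executed": True}

async def execute_all(clients):
    # run every execution concurrently and collect the results in order
    return await asyncio.gather(*(c.async_execute() for c in clients))

executed = asyncio.run(execute_all([StubClient({"cells": []}), StubClient({"cells": []})]))
```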
%%
merge our default async jinja exporter with the legacy nbconvert exporter. the async template is a massive improvement over the sync nbconvert.
exporter = nbconvert.get_exporter("a11y")()
generate the default `resources` object for templating the notebook
_, resources = exporter.from_notebook_node(nbformat.reads("""{"cells": [], "metadata": {}}""", 4))
ours,theirs = Series().template.environment,exporter.environment
ours.loader = theirs.loader
ours.filters.update({**theirs.filters, **ours.filters})
ours.globals.update({**theirs.globals, **ours.globals})
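the `{**theirs, **ours}` spreads let theirs fill the gaps while ours win on name collisions; a tiny demonstration of the precedence with plain dicts standing in for the two jinja environments' filter tables:

```python
# filter tables standing in for the two jinja environments' filters
ours = {"markdown": "our_renderer"}
theirs = {"markdown": "nbconvert_renderer", "highlight": "pygments"}

# same pattern as the cell above: theirs fills gaps, ours wins collisions
ours.update({**theirs, **ours})
```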
%%
create the footer for all of the pages. currently there is no specific license identified,
but if there were, we would use the [license microformat](https://microformats.org/wiki/rel-license).
footer = ours.filters["markdown"](F"""
By {config.author}
© Copyright {config.copyright}
""")
footer
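if a license were identified, the footer could carry it with the rel-license microformat; a hypothetical sketch (the author, copyright, and license values are made up for illustration):

```python
# hypothetical footer carrying the rel-license microformat
footer = (
    "<footer>"
    "By {author} © Copyright {copyright} "
    '<a rel="license" href="{license_url}">{license}</a>'
    "</footer>"
).format(
    author="STScI",
    copyright="2024",
    license_url="https://opensource.org/license/bsd-3-clause",
    license="BSD 3-Clause",
)
```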
%%
add site navigation in the post-processing step.
previous and next links still need to be added; previous and next are relative.
we'll want the config to do this work too. it needs to be passed to the template to construct license and link information.
htmls = (
await notebooks.head().template.render_template(
"a11y/table.html.j2", resources=resources, config=config,
footer=footer
)
).apply(exporter.post_process_html)
htmls.to_frame().T
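previous and next links fall out of the toc ordering; a plain-python sketch over an ordered list of pages (the page names are hypothetical):

```python
def neighbors(pages):
    # map each page to its (previous, next) neighbor, None at the ends
    return {
        page: (pages[i - 1] if i else None,
               pages[i + 1] if i + 1 < len(pages) else None)
        for i, page in enumerate(pages)
    }

nav = neighbors(["intro.html", "mast/beginner.html", "mast/advanced.html"])
```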
view the generated documentation as inline iframes
iframe = """<iframe height="600" srcdoc="{}" width="100%"></iframe>"""
iframes = htmls.apply(compose_left(__import__("html").escape, iframe.format))
display(*iframes.apply(HTML))
## closing thoughts
these notebooks have unexecuted code and nbconvert-a11y
had to be updated to handle that case. cf https://github.com/deathbeds/nbconvert-a11y/issues/33
currently i cannot handle site navigation ~~and licensing~~, but that is on the roadmap. (we want to think about them accessibly.) i'd like to use the jupyter toc as the toc to generate these templates. this format for the documentation is a lot more flexible to modify than standard documentation systems. we are dealing with our documentation as an intermediate table that offers introspection and manipulation.
search is the last thing to implement. are the colab and binder links even valid? we'll need templates to handle myst admonitions.