
# transforming mast notebooks into nbconvert templates

this is a rough first pass at an end-to-end sketch of the MAST notebooks using dataframes and nbconvert a11y templates.

to build docs or a book from nbconvert templates we need to either shim into mkdocs or build our own thing. sphinx can't work here because of how docutils is used.

it's just easier to build our own thing. hubris, eh? let's find out.

    import nbconvert, nbformat, io, midgy
    from toolz.curried import *
    from nobook import *
    from IPython.display import *
%%
start with a string mapping the github repository to disk

    s = Series({(repo :=
        "https://github.com/spacetelescope/mast_notebooks"
    ): "mast_notebooks"}).path()
    if not s.path.exists().any():
the whole repository is pretty tedious to clone. lots of pictures? do notebooks suck that bad over a long time?
better to use the `--depth` arg for control

        !git clone $repo --depth 1

%%
this example reuses the mast notebooks' jupyter book table of contents. on a mkdocs site we could do the same with mkdocs.yml.

    toc = (
        await Index(["mast_notebooks/_toc.yml"], name="path").apath().apath.load()
    ).aseries()

    config = (
        await Index(["mast_notebooks/_config.yml"], name="path").apath().apath.load()
    ).aseries().T.iloc[:,0]
    chapters = toc.parts.enumerate("chapter").series()
    sections = chapters.chapters.enumerate("section").series()   
    files = sections.sections.dropna().enumerate("section").series().combine_first(
        sections[["file"]].set_index(Index([0]*len(sections), name="section"), append=True)
    )
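for orientation, a jupyter book `_toc.yml` nests parts → chapters → sections, which is the structure the enumerations above walk. a hypothetical fragment in that schema (the root and caption are illustrative; the file paths are from the mast notebooks):

```yaml
format: jb-book
root: intro                                # illustrative root document
parts:
  - caption: astroquery                    # illustrative caption
    chapters:
      - file: notebooks/astroquery/intro
        sections:
          - file: notebooks/astroquery/beginner_search/beginner_search
          - file: notebooks/astroquery/beginner_zcut/beginner_zcut
```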
%%
explode the chapters, sections, files, and ultimately discover the paths of the notebooks in the documents.

    chapters = toc.parts.enumerate("chapter").series()
    sections = chapters.chapters.enumerate("section").aseries()   
    files = sections.sections.dropna().enumerate("subsection").aseries().combine_first(
        sections[["file"]].set_index(Index([0]*len(sections), name="subsection"), append=True)
    )
    paths = ("mast_notebooks" / files.file.apath())
    print(F"{(~paths.path().path.exists()).sum()} files missing")
    paths = paths[await paths.apath().apath.exists()].pipe(Index)
    files

4 files missing

```
                                                    file
path                    chapter section subsection
mast_notebooks/_toc.yml 0       0       0           notebooks/astrocut/making_tess_cubes_and_cutou...
                        1       0       0           notebooks/astroquery/intro.md
                                1       0           notebooks/astroquery/beginner_search/beginner_...
                                        1           notebooks/astroquery/beginner_zcut/beginner_zc...
                                        2           notebooks/astroquery/large_downloads/large_dow...
...                     ...     ...     ...         ...
                        10      1       2           notebooks/TESS/asteroid_rotation/asteroid_rota...
                                2       0           notebooks/TESS/interm_tesscut_dss_overlay/inte...
                                        1           notebooks/TESS/interm_tesscut_requests/interm_...
                                3       0           notebooks/TESS/interm_tess_prf_retrieve/interm...
                                        1           notebooks/TESS/removing_scattered_light_using_...

65 rows × 1 columns
```

## gathering and executing notebooks

    notebooks = (await paths.apath().apath.load())

filter to the notebooks and, for now, ignore the markdown files in the mix. a markdown file can be represented as a notebook with a single markdown cell.

    import nbformat, nbclient
    notebooks = notebooks[notebooks.index.path.suffix.eq(".ipynb")].apply(
        nbformat.from_dict
    ).to_frame("nb")
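a minimal sketch of that representation, using only the standard nbformat v4 JSON shape (plain dicts, no bespoke helpers; `markdown_to_notebook` is a hypothetical name):

```python
# represent a markdown document as a v4 notebook with one markdown cell.
# the dict below follows the standard nbformat v4 JSON schema.
def markdown_to_notebook(text: str) -> dict:
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [{
            "cell_type": "markdown",
            "metadata": {},
            "source": text,
        }],
    }

nb = markdown_to_notebook("# intro\n\nsome prose")
print(nb["cells"][0]["cell_type"])  # markdown
```

a dict in this shape round-trips through `nbformat.from_dict`, so the markdown files could join the same pipeline later.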
%%
## dependencies

i wanted to see if i could build an environment that all these notebooks could run in.
we collect the dependencies from the `requirements.txt` files in the mast notebooks directory.
we structure the dependencies using a regular expression so we can extract version information too.

    dependencies = await (await (
        Index(["mast_notebooks/"]).apath().apath.rglob("requirements.txt")
    )).pipe(Index).apath.read_text()
    versions = dependencies.apply(str.splitlines).explode().str.extract(
        r"^(?P<package>[A-Za-z0-9_-]+)\s*(?P<constraint>[><=]*)?\s*(?P<version>\S*)?"
    )
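the same extraction with plain `re`, on a couple of hypothetical requirement lines, to show what the named groups capture:

```python
import re

# named groups mirror the extraction above: package name, optional
# comparison operator, optional version.
pattern = re.compile(
    r"^(?P<package>[A-Za-z0-9_-]+)\s*(?P<constraint>[><=]*)\s*(?P<version>\S*)"
)

for line in ["astropy >= 5.0", "astroquery"]:  # hypothetical requirement lines
    m = pattern.match(line)
    print(m.group("package"), m.group("constraint"), m.group("version"))
```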

%%
create an `environment.yml` file from the version information previously collected

    import yaml; from pathlib import Path
    deps = versions.package.dropna().drop_duplicates().tolist()
    deps = [{"git": "GitPython"}.get(x, x) for x in deps]
    Path("environment.yml").write_text(yaml.safe_dump(dict(
        name="mast_notebooks",
        channels=["conda-forge"],
        dependencies=["pip", dict(
            pip=deps + ["ipykernel", "astrocut"]
        )]
    )))
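the emitted `environment.yml` comes out shaped roughly like this (the pip entries besides `ipykernel` and `astrocut` are illustrative; the real list comes from the requirements files):

```yaml
name: mast_notebooks
channels:
  - conda-forge
dependencies:
  - pip
  - pip:
      - astroquery   # illustrative entry
      - astropy      # illustrative entry
      - ipykernel
      - astrocut
```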

run the commands below to create or update the environment

```bash
mamba env create -p .mast_nb -f environment.yml
mamba env update -p .mast_nb -f environment.yml
```
create a kernel for the environment to run the notebooks in

```bash
mamba run -p .mast_nb python -m ipykernel install --user --name mast_nb
```

%%
### executing the notebooks

execute the notebooks using the `nbclient` library

    client = notebooks.nb.apply(nbformat.from_dict).apply(
        nbclient.NotebookClient, kernel_name="mast_nb", allow_errors=True
    )

recombine the executed notebooks with the original ones.

    notebooks.nb = (
        await client.head(3).apply(nbclient.NotebookClient.async_execute).gather()
    ).combine_first(notebooks.nb)
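the `.gather()` step above is bespoke to these dataframe helpers, but the underlying pattern is plain `asyncio.gather`: run several coroutines concurrently and collect their results in order. a minimal sketch with a stand-in coroutine (`execute` here is hypothetical, playing the role of `NotebookClient.async_execute`):

```python
import asyncio

async def execute(name):
    # stand-in for executing one notebook asynchronously
    await asyncio.sleep(0)
    return f"{name}: executed"

async def main():
    # run the first few "notebooks" concurrently, then collect in order,
    # like client.head(3).apply(...).gather() above
    return await asyncio.gather(*(execute(n) for n in ["a", "b", "c"]))

results = asyncio.run(main())
print(results)
```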

0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
%%
merge our default async jinja environment with the legacy nbconvert exporter's environment. the async template is a massive improvement over sync nbconvert.


    exporter = nbconvert.get_exporter("a11y")()
generate the default `resources` object for templating the notebook

    _, resources = exporter.from_notebook_node(nbformat.reads("""{"cells": [], "metadata": {}}""", 4))    
    ours, theirs = Series().template.environment, exporter.environment
    ours.loader = theirs.loader
    ours.filters.update({**theirs.filters, **ours.filters})
    ours.globals.update({**theirs.globals, **ours.globals})
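the merge keeps our filters wherever names collide: in `{**theirs, **ours}` the later keys win. the same precedence with plain dicts (the filter names and values here are illustrative):

```python
# later keys win in a dict merge, so "ours" overrides "theirs" on collision
theirs = {"markdown": "nbconvert's filter", "json": "nbconvert's filter"}
ours = {"markdown": "our filter"}

merged = {**theirs, **ours}
print(merged["markdown"])  # our filter wins
print(merged["json"])      # theirs survives where we have no override
```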

%%
create the footer for all of the pages. currently there is no specific license identified, 
but if there were we should use the [license microformat](https://microformats.org/wiki/rel-license).

    footer = ours.filters["markdown"](F"""
By {config.author}

© Copyright {config.copyright}

    """)
    footer

'<p>By STScI</p>\n<p>© Copyright 2022-2024</p>\n'
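if a license were identified, the footer could carry it with the rel-license microformat; a hypothetical sketch (the license choice here is illustrative, not the project's):

```html
<footer>
  <p>By STScI</p>
  <p>© Copyright 2022-2024</p>
  <p><a rel="license" href="https://opensource.org/licenses/BSD-3-Clause">BSD 3-Clause</a></p>
</footer>
```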
%%
add site navigation in the post-processing step.
previous and next links still need to be added; previous and next are relative.
we'll want the config to do this work too. it needs to be passed to the template to construct license and link information.


    htmls = (
        await notebooks.head().template.render_template(
            "a11y/table.html.j2", resources=resources, config=config,
            footer=footer
        )
    ).apply(exporter.post_process_html)
    htmls.to_frame().T
```
file                                                                                                 0
mast_notebooks/notebooks/astrocut/making_tess_cubes_and_cutouts/making_tess_cubes_and_cutouts.ipynb  <!DOCTYPE html>\n<html lang="en">\n <head>\n ...
mast_notebooks/notebooks/astroquery/beginner_search/beginner_search.ipynb                            <!DOCTYPE html>\n<html lang="en">\n <head>\n ...
mast_notebooks/notebooks/astroquery/beginner_zcut/beginner_zcut.ipynb                                <!DOCTYPE html>\n<html lang="en">\n <head>\n ...
mast_notebooks/notebooks/astroquery/large_downloads/large_downloads.ipynb                            <!DOCTYPE html>\n<html lang="en">\n <head>\n ...
mast_notebooks/notebooks/astroquery/historic_quasar_observations/historic_quasar_observations.ipynb  <!DOCTYPE html>\n<html lang="en">\n <head>\n ...
```

view the generated documentation as inline iframes

    iframe = """<iframe height="600" srcdoc="{}" width="100%"></iframe>"""
    iframes = htmls.apply(compose_left(__import__("html").escape, iframe.format))
    display(*iframes.apply(HTML))
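`srcdoc` holds an entire document inside an attribute, so the html has to be escaped first; `html.escape` quotes the characters that would otherwise terminate the attribute:

```python
import html

iframe = '<iframe height="600" srcdoc="{}" width="100%"></iframe>'

# escape quotes and angle brackets so the whole document can live in an attribute
doc = '<p class="cell">hi</p>'
escaped = html.escape(doc)
print(iframe.format(escaped))
```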

## closing thoughts

these notebooks have unexecuted code, and nbconvert-a11y had to be updated to handle that case. cf. https://github.com/deathbeds/nbconvert-a11y/issues/33

currently i cannot handle site navigation~~ and licensing~~, but that is on the roadmap. (we want to think about them accessibly.) i'd like to use the jupyter toc to generate these templates. this format for the documentation is a lot more flexible to modify than standard documentation systems. we are dealing with our documentation as an intermediate table that offers introspection and manipulation.

search is a last thing to implement. are the colab and binder links even valid? we'll need templates to handle myst admonitions.