
automating blog posts to work with jupyter-lite

i rarely write notebooks completely in lite. most often i write in a conventional environment and then need to amend the content so it works in jupyterlite.

%reload_ext pidgy

what do we need to do to make a post work in lite?

  • explicitly defined dependencies.

    working on a virtual file system is different from working on your standard file system. normally we don't have to define our environment each time, but in the browser, independent of a real file system, we need to install packages each time.

  • patching shit

    if we use requests then we should use https://github.com/koenvo/pyodide-http

  • dealing with pidgy and extensions.

  • some packages won't work in lite, so we will throw a warning when we know this.

    we can infer this information or provide it explicitly in the metadata
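the points above could be folded into one preamble cell that runs first in lite. this is a minimal sketch, not a finished tool: it assumes `%pip install` is available in the lite kernel and that `pyodide_http.patch_all()` is the right patch for requests. `build_preamble` and its `unsupported` argument are hypothetical names, and the caller has to supply the list of packages believed to fail in lite. it also ignores the cases where an import name differs from the name you install.

```python
import warnings


def build_preamble(packages, unsupported=frozenset()):
    """Return the source of a cell meant to run first in lite.

    packages: third-party module names found in the notebook.
    unsupported: names the caller believes fail in lite (an assumption to supply).
    """
    packages = sorted(set(packages))
    for name in packages:
        if name in unsupported:
            warnings.warn(f"{name} is not expected to work in lite")
    installable = [name for name in packages if name not in unsupported]
    needs_patch = "requests" in installable
    if needs_patch:
        # requests needs pyodide-http to talk to the network from the browser
        installable.append("pyodide-http")
    lines = []
    if installable:
        lines.append("%pip install " + " ".join(installable))
    if needs_patch:
        lines += ["import pyodide_http", "pyodide_http.patch_all()"]
    return "\n".join(lines)


print(build_preamble(["requests", "pandas"]))
```

the returned string is only text, so it can be reviewed before it is written into a notebook.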

sometimes i forget imports

depfinder to find packages in a notebook for python

some of my personal style choices might fail, like when i use __import__. maybe that is a way to cut dependencies from the list.
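to see why `__import__` is a special case, here is a sketch that uses only the standard library `ast` module, not depfinder. an `__import__("name")` call is a function call in the syntax tree rather than an import statement, so a tool that looks only at import statements would not see it. whether depfinder reports these calls is something to verify; `split_imports` is a hypothetical helper.

```python
import ast


def split_imports(source):
    """Return (statement_imports, dunder_imports) as sets of top-level module names."""
    statements, dunder = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            statements.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            statements.add(node.module.split(".")[0])
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "__import__"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            # only literal names can be recovered without running the code
            dunder.add(node.args[0].value.split(".")[0])
    return statements, dunder


statements, dunder = split_imports(
    'import depfinder, pandas\n__import__("requests_cache").install_cache()\n'
)
print(sorted(statements), sorted(dunder))
```

keeping the two sets apart leaves the choice open: treat `__import__` names as optional dependencies or merge them into the list.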

import json
from pathlib import Path
import depfinder, pandas
__import__("requests_cache").install_cache()
def get_files(dir="", glob="*.ipynb") -> pandas.Index:
    return pandas.Index(Path(dir).rglob(glob)).rename("files")
def get_cells(files: pandas.Index) -> pandas.DataFrame:
    df = (
        files.to_series().apply(Path.read_text)
        .apply(json.loads).apply(pandas.Series)
        .cells.apply(pandas.Series).stack().apply(pandas.Series)
    )
    return df.join(get_has_pidgy(df))

can haz pidgy?

some of these posts are written in pidgy, and in those i use %reload_ext pidgy. peek in the cells to find pidgy notebooks.

def get_has_pidgy(cells):
    return cells[cells.cell_type.eq("code")].source.apply("".join).groupby(
        pandas.Grouper(level=0)
    ).apply(lambda df: df.str.contains("%[re]*load_ext pidgy").any()).rename("pidgy")
cells = get_cells(get_files())
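the detection comes down to one regular expression, so it is worth checking on plain strings. a small sketch with the standard library: `LOOSE` is the pattern used in `get_has_pidgy`, and `STRICT` is a hypothetical alternative that anchors the magic to the start of a line. the character class `[re]*` matches any run of the letters r and e, which is more permissive than "an optional re".

```python
import re

LOOSE = re.compile(r"%[re]*load_ext pidgy")  # the pattern used above
STRICT = re.compile(r"^\s*%(?:re)?load_ext\s+pidgy\b", re.MULTILINE)  # hypothetical


def has_pidgy(sources, pattern=STRICT):
    """True when any cell source loads the pidgy extension."""
    return any(pattern.search(source) for source in sources)


print(has_pidgy(["import pandas", "%reload_ext pidgy"]))
print(has_pidgy(["# %reload_ext pidgy"]))
print(has_pidgy(["# %reload_ext pidgy"], pattern=LOOSE))
```

the strict pattern skips the magic when it is commented out, which the loose one does not.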

get the imports

def get_import(row: pandas.Series) -> dict:

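get_import normalizes the cell source code for analysis by depfinder. pidgy cells are rendered to python first. when the analysis raises, for example on source that is not valid python, this method returns None; otherwise it returns the attributes of depfinder.inspection.ImportFinder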

    source = "".join(row.source)
    if row.pidgy:
        source = midgy.python.Python().render(source)
    try:
        return vars(depfinder.inspection.get_imported_libs(textwrap.dedent(source), row.name[0]))
    except Exception:
        # source that cannot be analyzed is skipped instead of stopping the run
        return None
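the dedent and the try/except in get_import each cover a different failure. a sketch with the standard library, where `ast.parse` stands in for depfinder on the assumption that depfinder parses the source as python; `parses` is a hypothetical helper.

```python
import ast
import textwrap


def parses(source):
    """True when the dedented source is valid python."""
    try:
        ast.parse(textwrap.dedent(source))
    except SyntaxError:
        return False
    return True


# uniformly indented code is fine once it has been dedented
print(parses("    import json\n    import re\n"))
# a cell with a magic is not valid python, so it cannot be analyzed as is
print(parses("%reload_ext pidgy\nimport pandas\n"))
```

if that assumption holds, any cell with a magic that is not rendered by midgy first returns None, and its imports are missing from the tally.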

evaluate the sources

import functools, textwrap
import depfinder, pandas, midgy
__import__("requests_cache").install_cache()
Ø = __name__ == "__main__" and "__file__" not in locals()
def get_modules(cells):
    return (
        (
            results:=
            cells[cells.cell_type.eq("code")].apply(get_import, axis=1)
            .dropna().apply(functools.partial(pandas.Series, dtype="O"))
        )[results.columns[results.columns.str.endswith("_modules")]]
    )

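a snapshot of the modules imported within the content. the first row lists the module names and the second row, labeled 0, counts how often each one appears.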

pandas pathlib requests tonyfast IPython json midgy functools toolz re depfinder shlex importnb dataclasses typer pidgy nbconvert ipywidgets info textwrap typing traitlets doit dask pluggy sys requests_cache jsonref playwright nbformat inspect types ast uritemplate __ jinja2 numpy io gc pyld orjson jsonpointer urllib.parse linecache operator poser matplotlib rich mpl_toolkits importlib anyio unittest.mock __static_notebook_tags nbconvert_html5 bs4 mkdocs icalendar yaml hvplot jupyter_core shutil arrow __11_12_async_import doctest unittest __09_pyproject_analysis html tomli pytest __12_09_pyproject_analysis warnings _022_10_21_markdown_future a_little_markdown_program click markdown sympy __better_dask_shape asyncio dis traceback micropip __pycache__ ibis duckdb pyarrow nest_asyncio collections abc tqdm
0 41 30 14 14 14 13 12 12 11 10 10 7 7 7 7 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
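from IPython.display import HTML
if Ø:
    (cells := get_cells(get_files()))
    (cells := cells.join(get_modules(cells)))
    modules = cells[cells.columns[cells.columns.str.endswith("_modules")]]
    # flatten the per-cell collections of module names into one long series
    modules = modules.stack().apply(list).apply(pandas.Series, dtype="O").stack()
    # a top-level return is not plain python; this assumes pidgy displays the returned value
    return HTML(modules.value_counts().to_frame().T.to_html())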
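the stack and value_counts chain above is a tally, and the same idea can be sketched without pandas. this assumes each analyzed cell is a dict that maps a category ending in `_modules` to a collection of names, which mirrors the column filter above; `count_modules` is a hypothetical helper.

```python
from collections import Counter


def count_modules(cells):
    """Count how often each module name appears across the analyzed cells."""
    counts = Counter()
    for found in cells:
        for category, names in found.items():
            if category.endswith("_modules"):  # skip the non-module attributes
                counts.update(names)
    return counts.most_common()


cells = [
    {"required_modules": {"pandas"}, "builtin_modules": {"json"}},
    {"required_modules": {"pandas", "requests"}, "imports": ["ignored"]},
]
print(count_modules(cells))
```

a module is counted once per cell and category, so the numbers describe how many cells use a module rather than how many import lines there are.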

todo

  • inject the imports back into the notebooks. where though?
  • find magics
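one possible answer to "where" is a new first cell, so that installs run before anything else. the sketch below works on the notebook as a plain dict in the nbformat 4 layout and does not use nbformat itself. `find_magics`, `inject_first_cell` and the `lite-preamble` tag are hypothetical names, and the magic pattern is a heuristic that can misread a continuation line starting with `%`.

```python
import re
import uuid

MAGIC = re.compile(r"^\s*(%{1,2}[A-Za-z_]\w*)", re.MULTILINE)


def find_magics(source):
    """Names of line (%name) and cell (%%name) magics in one cell's source."""
    return MAGIC.findall(source)


def inject_first_cell(notebook, source):
    """Return a copy of the notebook dict with a new code cell prepended."""
    cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["lite-preamble"]},  # hypothetical tag
        "outputs": [],
        "source": source.splitlines(keepends=True),
    }
    version = (notebook.get("nbformat", 4), notebook.get("nbformat_minor", 0))
    if version >= (4, 5):
        cell["id"] = uuid.uuid4().hex[:8]  # cell ids are part of the format from 4.5
    return {**notebook, "cells": [cell, *notebook.get("cells", [])]}


notebook = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": []}
patched = inject_first_cell(notebook, "%pip install pandas\nimport pandas")
print(find_magics("".join(patched["cells"][0]["source"])))
```

returning a copy leaves the original notebook untouched, which makes it easy to compare the two before writing anything to disk.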