@tonyfast's notebooks

jupyterlite blog integration

jupyterlite

i've always thought of blog posts as a means, not an end. my dream has always been content that i and others can execute ourselves. this goal has often been hindered by the need for infrastructure. advances in jupyterlite have made it possible to realize this vision without infrastructure.

jupyterlite is jupyterlab running completely in the browser without the need for a local server. this means that folks can redirect from a static post into an interactive version they can try themselves.

when working with a standard jupyter implementation we are interacting with a server within an implicit environment. when we write our notebooks we assume that we can import modules because they are in our environment. however, in jupyterlite we need to explicitly define our dependencies for every notebook.

we define our dependencies by inserting the pip line magic:

    %pip install my dependencies

this statement is superfluous under normal circumstances, so it doesn't need to exist in the source. instead we use the depfinder project to infer the packages imported by our notebook. the inferred dependencies are then inserted into the first line of the first code cell of the notebook.
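
as a rough sketch of the end result, assuming the notebook is loaded as a plain dict in the nbformat json layout. `prepend_pip_install` is a hypothetical helper made up for illustration; it is not the function this post builds later, and it skips the indentation and duplicate checks that function performs.

```python
def prepend_pip_install(nb: dict, deps: set) -> dict:
    """insert a %pip install line at the top of the first code cell (illustrative helper)."""
    line = "%pip install " + " ".join(sorted(deps)) + "\n"
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # nbformat allows source to be a single string or a list of lines
            if isinstance(source, str):
                source = source.splitlines(keepends=True)
            cell["source"] = [line, *source]
            break  # only the first code cell is amended
    return nb


nb = {"cells": [
    {"cell_type": "markdown", "source": ["# a post"]},
    {"cell_type": "code", "source": "import pandas\nimport requests\n"},
]}
prepend_pip_install(nb, {"pandas", "requests"})
print("".join(nb["cells"][1]["source"]))
```

the markdown cell is left alone and the first code cell now starts with the install line, which is all jupyterlite needs to fetch the packages before the imports run.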


the implementation

in the code that follows we define a doit task that:

  1. builds a jupyterlite site for this blog
  2. makes the dependencies compatible with jupyterlite

    import typing, tonyfast, pathlib, textwrap, re, json

    def task_lite():
        """build the jupyter lite site and append requirements"""
        return dict(
            actions=[
                # build the static jupyterlite site from the blog contents
                "jupyter lite build --contents tonyfast --output-dir site/run",
                # then add a %pip install line to every built notebook
                (set_files_imports, (pathlib.Path("site/run/files"),))
            ],
            clean=["rm -rf site/run"]
        )

discovering imports with depfinder

in the following sections we'll build the methods for discovering imports with depfinder.


set_files_imports iterates through a directory and amends notebooks to work in jupyterlite

    def set_files_imports(FILES: pathlib.Path=(
        FILES := (WHERE := pathlib.Path(tonyfast.__file__).parent.parent) / "site/run/files"
    )) -> None:
        # FILES is a directory; every notebook below it is amended in place
        for file in FILES.rglob("*.ipynb"):
            set_file_imports(file)

get_imports finds the imports in each cell

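    def get_imports(cell: dict, pidgy=False) -> set:
        import depfinder
        # install a global requests cache so repeated network lookups are reused
        __import__("requests_cache").install_cache()
        source = "".join(cell["source"])
        if pidgy:
            # translate pidgy source to python (tangle is defined further down)
            source = tangle.render(source)
        source = textwrap.dedent(source)
        try:
            found = depfinder.inspection.get_imported_libs(source)
            return found.required_modules.union(found.sketchy_modules)
        except Exception:
            # cells that cannot be inspected contribute no imports
            return set()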
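
to make the idea concrete without installing anything, here is a minimal sketch that swaps depfinder for the standard library `ast` module. `top_level_imports` is a hypothetical name used only here. it finds imported names but makes no attempt to separate the standard library from third-party packages, so treat it as an illustration of the technique rather than a replacement.

```python
import ast


def top_level_imports(source: str) -> set:
    """collect the top-level package names imported by a piece of python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 is a relative import, which is never a pip dependency
            found.add(node.module.split(".")[0])
    return found


print(top_level_imports("import pandas as pd\nfrom bs4 import BeautifulSoup\nimport os.path"))
```

a cell that contains ipython magics such as `%pip` is not valid python, so `ast.parse` raises `SyntaxError` for it in this sketch; callers need to guard against that.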

get_deps transforms imported module names into installable dependencies.

some dependencies may require extra features to work in jupyterlite and they are appended here.

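    mapping = dict(bs4="beautifulsoup4")  # import name -> package name
    def get_deps(deps: set) -> set:
        # extra packages some libraries need before they work in jupyterlite
        if "requests" in deps: deps.add("pyodide-http")
        if "pandas" in deps: deps.add("jinja2")
        return {
            mapping.get(x, x) for x in deps
            # skip private modules and this blog's own package
            if not x.startswith("_") and x not in {"tonyfast"}
        }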
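
a small self-contained check of this mapping step, assuming the intent is to drop both private modules and the blog's own `tonyfast` package. `to_requirements`, `MAPPING` and `SKIP` are illustrative names for this sketch, and the function copies its input so the caller's set is untouched.

```python
MAPPING = {"bs4": "beautifulsoup4"}  # import name -> pypi name
SKIP = {"tonyfast"}  # local package, assumed not installable with pip


def to_requirements(modules: set) -> set:
    """translate imported module names to pip requirement names (illustrative copy)."""
    modules = set(modules)  # copy so the caller's set is not mutated
    if "requests" in modules:
        modules.add("pyodide-http")
    if "pandas" in modules:
        modules.add("jinja2")
    return {
        MAPPING.get(name, name)
        for name in modules
        if not name.startswith("_") and name not in SKIP
    }


print(sorted(to_requirements({"bs4", "requests", "tonyfast", "_private"})))
# ['beautifulsoup4', 'pyodide-http', 'requests']
```

note that both conditions are joined with `and`: a name is kept only when it is public and not in the skip list.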

handling documents

some documents might use pidgy syntax that needs to be handled.

    # raw string so \s is a regex escape rather than a string escape
    PIDGY = re.compile(r"^\s*%(re)?load_ext\s*pidgy")
    from midgy import Python; tangle = Python()
    def has_pidgy(nb: dict) -> bool:
        # true when any code cell starts by loading the pidgy extension
        return any(
            PIDGY.match("".join(cell["source"]))
            for _, cell in iter_code_cells(nb)
        )
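
to see what the pattern accepts, the sketch below runs it against a few cell sources. the sample strings are made up for illustration. because the pattern is anchored with `^` and compiled without `re.MULTILINE`, only a cell that begins with the magic counts.

```python
import re

PIDGY = re.compile(r"^\s*%(re)?load_ext\s*pidgy")

samples = [
    "%load_ext pidgy",          # plain load
    "  %reload_ext pidgy\n",    # indented reload
    "import pidgy",             # a normal import is not the magic
    "x = 1\n%load_ext pidgy",   # magic on the second line is not matched
]
results = [bool(PIDGY.match(source)) for source in samples]
print(results)  # [True, True, False, False]
```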

updating the notebooks

these methods are meant to operate on the contents of a built jupyterlite site, not the raw notebooks.


set_file_imports operates on one file: it discovers the dependencies and writes the install line back to the notebook.

    def set_file_imports(file: pathlib.Path) -> None:
        data = json.loads(file.read_text())
        deps, first = set(), None
        pidgy = has_pidgy(data)
        for no, cell in iter_code_cells(data):
            if first is None:
                first = no
            if pidgy:
                data["cells"][no]["metadata"].setdefault("jupyter", {})["source_hidden"] = True
            deps.update(get_imports(cell, pidgy) or [])
            
        deps = get_deps(deps)
        if pidgy:
            deps.add("pidgy")
        if deps and (first is not None):
            cell = data["cells"][first]
            was_str = isinstance(cell["source"], str)
            if was_str:
                cell["source"] = cell["source"].splitlines(True)  # keep line endings
            for i, line in enumerate(list(cell["source"])):
                if (left := line.lstrip()):
                    if left.startswith(("%pip install",)):
                        break
                    indent = len(line) - len(left)                    
                    if "pyodide-http" in deps:
                        data["cells"][first]["source"].insert(i, " "*indent + "__import__('pyodide_http').patch_all()\n")
                    data["cells"][first]["source"].insert(i, " "*indent + "%pip install " + " ".join(deps) +"\n")
                    print(F"writing {len(set(deps))} pip requirements to {file}")
                    file.write_text(json.dumps(data, indent=2))
                    break
        else:
            print(F'no deps for {file}')

set_files_imports sets the dependencies for every notebook under a directory.

    def set_files_imports(FILES: pathlib.Path=FILES) -> None:
        # FILES is a directory that is searched recursively for notebooks
        for file in FILES.rglob("*.ipynb"):
            set_file_imports(file)
            

iter_code_cells iterates through just the code cells.

    def iter_code_cells(nb: dict) -> typing.Iterator[tuple[int, dict]]:
        for i, cell in enumerate(nb["cells"]):
            if cell["cell_type"] == "code":
                yield i, cell
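
a quick usage sketch on a hand-written notebook dict (the cells are made up for illustration). the function is repeated so the snippet runs on its own. the yielded index is the cell's position in the full `cells` list, which is why the callers above can use it to write back into `data["cells"]`.

```python
import typing


def iter_code_cells(nb: dict) -> typing.Iterator[typing.Tuple[int, dict]]:
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] == "code":
            yield i, cell


nb = {"cells": [
    {"cell_type": "markdown", "source": ["intro"]},
    {"cell_type": "code", "source": ["import json"]},
    {"cell_type": "markdown", "source": ["more prose"]},
    {"cell_type": "code", "source": ["print(1)"]},
]}
indexes = [i for i, _ in iter_code_cells(nb)]
print(indexes)  # [1, 3]
```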

usage

  • from the tonyfast module, requires deps
    if (I := '__file__' not in locals()):
        !python -m tonyfast tasks info lite

lite

build the jupyter lite site and append requirements

status     : run
 * The task has no dependencies.

  • from post with importnb
    if (I := '__file__' not in locals()):
        !importnb -t 2022-12-21-lite-build.ipynb list

lite   build the jupyter lite site and append requirements

  • run this task from hatch in the root of the project. the hatch environment has all the necessary dependencies defined.

      hatch run lite:build