jupyterlite blog integration¤
i've always thought of blog posts as a means, not an end.
my dream has always been content that i and others can execute ourselves.
often this goal has been hindered by the need for infrastructure.
advances in jupyterlite have made it possible to realize this vision without infrastructure.
jupyterlite is jupyterlab running completely in the browser without the need for a local server.
this means that folks can redirect from a static post into an interactive version they can try themselves.
jupyterlite notebooks are different¤
when working with a standard jupyter implementation we are interacting with a server within an implicit environment. when we write our notebooks we assume that we can import modules because they are in our environment. however, in jupyterlite we need to explicitly define our dependencies for every notebook.
we define our dependencies by inserting the pip line magic:
%pip install my dependencies
this statement is superfluous under normal circumstances, so it doesn't need to exist in the source.
instead we use the depfinder project to infer the packages imported by our notebook.
the inferred dependencies are then inserted into the first line of the first code cell of the notebook.
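depfinder does the heavy lifting here. for intuition, a rough stdlib-only sketch of the same idea (this is not the depfinder implementation, just an `ast`-based approximation) might look like:

```python
import ast

def infer_imports(source: str) -> set:
    """roughly infer the top-level package names imported by a code string.

    a minimal stand-in for depfinder's inspection, using only the stdlib.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import pandas.io" should report "pandas"
            found.update(alias.name.partition(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and not node.level:
            # skip relative imports like "from . import utils"
            found.add(node.module.partition(".")[0])
    return found

print(infer_imports("import pandas as pd\nfrom bs4 import BeautifulSoup"))
```

depfinder goes further than this sketch: it also distinguishes "sketchy" imports, such as those guarded by try/except blocks.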
the doit lite implementation¤
in the code that follows we define a doit task that:
1. builds a jupyterlite site for this blog
2. makes the dependencies compatible with jupyterlite
def task_lite():
    """build the jupyter lite site and append requirements"""
    return dict(
        actions=[
            "jupyter lite build --contents tonyfast --output-dir site/run",
            (set_files_imports, (pathlib.Path("site/run/files"),))
        ],
        clean=["rm -rf site/run"]
    )
import typing, tonyfast, pathlib, textwrap, re, json
discovering imports with depfinder¤
in the following sections we'll build the methods for discovering imports with depfinder.
set_files_imports iterates through a directory and amends notebooks to work in jupyterlite
def set_files_imports(FILES: typing.Iterable[pathlib.Path]=(
    FILES := (WHERE := pathlib.Path(tonyfast.__file__).parent.parent) / "site/run/files"
)) -> None:
    for file in FILES.rglob("*.ipynb"): set_file_imports(file)
get_imports finds the imports in each cell
def get_imports(cell: dict, pidgy=False) -> set:
    import depfinder
    __import__("requests_cache").install_cache()
    source = "".join(cell["source"])
    if pidgy:
        source = tangle.render(source)
    source = textwrap.dedent(source)
    try:
        found = depfinder.inspection.get_imported_libs(source)
        return found.required_modules.union(found.sketchy_modules)
    except BaseException:
        return
get_deps transforms inputs to dependencies.
some dependencies may require extra features to work in jupyterlite and they are appended here.
mapping = dict(bs4="beautifulsoup4")
def get_deps(deps: set) -> set:
    if "requests" in deps: deps.add("pyodide-http")
    if "pandas" in deps: deps.add("jinja2")
    return {
        mapping.get(x, x) for x in deps
        if not x.startswith("_") and x not in {"tonyfast"}
    }
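to see the transformation in action, the following standalone sketch repeats the definitions and exercises them. note the exclusion condition uses `and` so that both private modules and the blog's own `tonyfast` package are dropped:

```python
# repeated here so the example runs standalone
mapping = dict(bs4="beautifulsoup4")

def get_deps(deps: set) -> set:
    # requests needs pyodide-http to patch networking in the browser
    if "requests" in deps: deps.add("pyodide-http")
    # pandas styling output relies on jinja2
    if "pandas" in deps: deps.add("jinja2")
    return {
        mapping.get(x, x) for x in deps
        if not x.startswith("_") and x not in {"tonyfast"}
    }

print(get_deps({"bs4", "requests", "_private", "tonyfast"}))
```

the import name bs4 is mapped to its distribution name beautifulsoup4, requests pulls in pyodide-http, and the private and local modules disappear.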
handling pidgy documents¤
some documents might use [pidgy] syntax that needs to be dealt with.
PIDGY = re.compile(r"^\s*%(re)?load_ext\s*pidgy")
from midgy import Python; tangle = Python()
def has_pidgy(nb: dict):
    yes = False
    for _, cell in iter_code_cells(nb):
        yes = yes or PIDGY.match("".join(cell["source"])) and True
    return yes
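a quick standalone check of the detection logic against a toy notebook (the regex is repeated here, and the scan is condensed to `any` over the code cells):

```python
import re

# the same pattern as above, written as a raw string
PIDGY = re.compile(r"^\s*%(re)?load_ext\s*pidgy")

def has_pidgy(nb: dict) -> bool:
    # only code cells can carry the load_ext magic
    return any(
        cell["cell_type"] == "code" and PIDGY.match("".join(cell["source"]))
        for cell in nb["cells"]
    )

nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# a pidgy post\n"]},
        {"cell_type": "code", "source": ["%reload_ext pidgy\n"]},
    ]
}
print(has_pidgy(nb))
```

the pattern matches both the `%load_ext pidgy` and `%reload_ext pidgy` forms at the start of a cell.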
updating the jupyterlite notebooks¤
these methods are meant to operate on the contents of a jupyterlite build, not the raw notebooks.
set_file_imports operates on one file, discovers its dependencies, and writes code back to the source.
def set_file_imports(file: pathlib.Path) -> None:
    data = json.loads(file.read_text())
    deps, first = set(), None
    pidgy = has_pidgy(data)
    for no, cell in iter_code_cells(data):
        if first is None:
            first = no
        if pidgy:
            data["cells"][no]["metadata"].setdefault("jupyter", {})["source_hidden"] = True
        deps.update(get_imports(cell, pidgy) or [])
    deps = get_deps(deps)
    if pidgy:
        deps.add("pidgy")
    if deps and (first is not None):
        cell = data["cells"][first]
        was_str = isinstance(cell["source"], str)
        if was_str:
            cell["source"] = cell["source"].splitlines(1)
        for i, line in enumerate(list(cell["source"])):
            if (left := line.lstrip()):
                if left.startswith(("%pip install",)):
                    break
                indent = len(line) - len(left)
                if "pyodide-http" in deps:
                    data["cells"][first]["source"].insert(i, " "*indent + "__import__('pyodide_http').patch_all()\n")
                data["cells"][first]["source"].insert(i, " "*indent + "%pip install " + " ".join(deps) + "\n")
                print(F"writing {len(set(deps))} pip requirements to {file}")
                file.write_text(json.dumps(data, indent=2))
                break
        else:
            print(F'no deps for {file}')
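the insertion step can be isolated in a small standalone sketch: given a first code cell and an inferred dependency set, the %pip magic is inserted above the first non-blank line, matching its indentation (the toy cell and dependency set below are invented for illustration):

```python
# a hypothetical first code cell, already split into lines
cell = {"cell_type": "code", "source": ["    import pandas\n", "    pandas.DataFrame()\n"]}
deps = {"pandas", "jinja2"}

for i, line in enumerate(list(cell["source"])):
    if (left := line.lstrip()):
        if left.startswith("%pip install"):
            break  # a magic is already there; nothing to do
        # mirror the indentation of the first real line
        indent = len(line) - len(left)
        cell["source"].insert(i, " " * indent + "%pip install " + " ".join(sorted(deps)) + "\n")
        break

print(cell["source"][0])
```

the break after the insertion matters: without it the loop would keep walking the now-shifted source list.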
set_files_imports sets the dependencies for a lot of files.
def set_files_imports(FILES: typing.Iterable[pathlib.Path]=FILES):
    for file in FILES.rglob("*.ipynb"):
        set_file_imports(file)
iter_code_cells iterates through just the code cells.
def iter_code_cells(nb: dict) -> typing.Iterator[tuple[int, dict]]:
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] == "code":
            yield i, cell
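a standalone check of the generator against a toy notebook (the definition is repeated so the example runs on its own) shows that markdown cells are skipped while the original cell indices are preserved:

```python
import typing

def iter_code_cells(nb: dict) -> typing.Iterator[tuple]:
    # yield (index, cell) pairs for code cells only
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] == "code":
            yield i, cell

nb = {"cells": [
    {"cell_type": "markdown", "source": ["# title\n"]},
    {"cell_type": "code", "source": ["print('hi')\n"]},
    {"cell_type": "code", "source": ["1 + 1\n"]},
]}

print([i for i, _ in iter_code_cells(nb)])  # indices of the code cells
```

keeping the original indices is what lets set_file_imports write back to data["cells"][first] directly.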
usage¤
- from the `tonyfast` module, requires deps

      if (I := '__file__' not in locals()):
          !python -m tonyfast tasks info lite

- from a post with `importnb`

      if (I := '__file__' not in locals()):
          !importnb -t 2022-12-21-lite-build.ipynb list

- run this task from `hatch` in the root of the project. the `hatch` environment has all the necessary dependencies defined: `hatch run lite:build`