jupyterlite blog integration
i've always thought of blog posts as a means, not an end.
my dream has always been content that i and others can execute ourselves.
often this goal has been hindered by the need for infrastructure.
advances in jupyterlite have made it possible to realize this vision without infrastructure.
jupyterlite is jupyterlab running completely in the browser without the need for a local server.
this means that folks can redirect from a static post into an interactive version they can try themselves.
jupyterlite notebooks are different
when working with a standard jupyter implementation we are interacting with a server within an implicit environment. when we write our notebooks we assume that we can import modules because they are already in that environment. in jupyterlite, however, we need to explicitly declare our dependencies in every notebook.
we define our dependencies by inserting the %pip line magic:
%pip install my dependencies
this statement is superfluous under normal circumstances, so it doesn't need to exist in the source.
instead we use the depfinder project to infer the packages imported by our notebook. the inferred dependencies are then inserted into the first line of the first code cell of the notebook.
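depfinder's internals are out of scope here, but the core of the idea can be sketched with the standard library's ast module. this is a simplified stand-in, not depfinder's actual implementation, which also handles sketchy conditional imports and package name mapping:

```python
import ast

def infer_imports(source: str) -> set:
    """collect top-level package names imported by a source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.partition(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and not node.level:
            found.add(node.module.partition(".")[0])
    return found

print(sorted(infer_imports("import pandas\nfrom bs4 import BeautifulSoup")))  # ['bs4', 'pandas']
```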
the doit lite implementation
in the code that follows we define a doit task that:
1. builds a jupyterlite site for this blog
2. makes the dependencies compatible with jupyterlite
def task_lite():
    """build the jupyter lite site and append requirements"""
    return dict(
        actions=[
            "jupyter lite build --contents tonyfast --output-dir site/run",
            (set_files_imports, (pathlib.Path("site/run/files"),))
        ],
        clean=["rm -rf site/run"]
    )
import typing, tonyfast, pathlib, textwrap, re, json
discovering imports with depfinder
in the following sections we'll build the methods for discovering imports with depfinder.
set_files_imports iterates through a directory and amends notebooks to work in jupyterlite.
def set_files_imports(FILES: typing.Iterable[pathlib.Path]=(
    FILES := (WHERE := pathlib.Path(tonyfast.__file__).parent.parent) / "site/run/files"
)) -> None:
    for file in FILES.rglob("*.ipynb"):
        set_file_imports(file)
get_imports finds the imports in each cell.
def get_imports(cell: dict, pidgy=False) -> set:
    import depfinder
    __import__("requests_cache").install_cache()
    source = "".join(cell["source"])
    if pidgy:
        source = tangle.render(source)
    source = textwrap.dedent(source)
    try:
        found = depfinder.inspection.get_imported_libs(source)
        return found.required_modules.union(found.sketchy_modules)
    except BaseException:
        return
get_deps transforms imports into dependencies. some dependencies may require extra features to work in jupyterlite, and they are appended here.
mapping = dict(bs4="beautifulsoup4")
def get_deps(deps: set) -> set:
    if "requests" in deps: deps.add("pyodide-http")
    if "pandas" in deps: deps.add("jinja2")
    return {
        mapping.get(x, x) for x in deps
        # exclude private modules and this blog's own package
        if not x.startswith("_") and x not in {"tonyfast"}
    }
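as a quick sanity check, the normalization can be restated standalone (same mapping and extras, with the filter written as a single conjunction) and run on an example import set:

```python
mapping = dict(bs4="beautifulsoup4")

def normalize(deps: set) -> set:
    """standalone restatement of the dependency normalization above."""
    deps = set(deps)
    if "requests" in deps:
        deps.add("pyodide-http")  # patches requests to work in the browser
    if "pandas" in deps:
        deps.add("jinja2")  # pandas html output needs jinja2
    # drop private modules and this blog's own package
    return {mapping.get(x, x) for x in deps if not x.startswith("_") and x != "tonyfast"}

print(sorted(normalize({"bs4", "requests", "tonyfast", "_asyncio"})))
# ['beautifulsoup4', 'pyodide-http', 'requests']
```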
handling pidgy documents
some documents might use pidgy syntax that needs to be dealt with.
PIDGY = re.compile(r"^\s*%(re)?load_ext\s*pidgy")
from midgy import Python; tangle = Python()
def has_pidgy(nb: dict) -> bool:
    return any(
        PIDGY.match("".join(cell["source"]))
        for _, cell in iter_code_cells(nb)
    )
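the pattern can be exercised directly; a few cases it should and should not match (the same regex, written as a raw string):

```python
import re

PIDGY = re.compile(r"^\s*%(re)?load_ext\s*pidgy")

assert PIDGY.match("%load_ext pidgy")        # plain load magic
assert PIDGY.match("   %reload_ext pidgy")   # reload, with leading whitespace
assert not PIDGY.match("import pidgy")       # a normal import is not a magic
```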
updating the jupyterlite notebooks
these methods are meant to operate on the contents of a jupyterlite build, not the raw notebooks.
set_file_imports operates on one file: it discovers dependencies and writes code back to the source.
def set_file_imports(file: pathlib.Path) -> None:
    data = json.loads(file.read_text())
    deps, first = set(), None
    pidgy = has_pidgy(data)
    for no, cell in iter_code_cells(data):
        if first is None:
            first = no
        if pidgy:
            data["cells"][no]["metadata"].setdefault("jupyter", {})["source_hidden"] = True
        deps.update(get_imports(cell, pidgy) or [])
    deps = get_deps(deps)
    if pidgy:
        deps.add("pidgy")
    if deps and (first is not None):
        cell = data["cells"][first]
        was_str = isinstance(cell["source"], str)
        if was_str:
            cell["source"] = cell["source"].splitlines(1)
        for i, line in enumerate(list(cell["source"])):
            if (left := line.lstrip()):
                if left.startswith(("%pip install",)):
                    break
                indent = len(line) - len(left)
                if "pyodide-http" in deps:
                    data["cells"][first]["source"].insert(i, " "*indent + "__import__('pyodide_http').patch_all()\n")
                data["cells"][first]["source"].insert(i, " "*indent + "%pip install " + " ".join(deps) + "\n")
                print(f"writing {len(set(deps))} pip requirements to {file}")
                file.write_text(json.dumps(data, indent=2))
                break
    else:
        print(f"no deps for {file}")
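to make the effect concrete, here is what that insertion does to a hypothetical minimal notebook whose only cell imports requests (the dict mirrors the nbformat json structure; the inserted lines are hardcoded for illustration):

```python
# a hypothetical minimal notebook whose only code cell imports requests
nb = {"cells": [{"cell_type": "code", "metadata": {}, "outputs": [],
                 "execution_count": None, "source": ["import requests\n"]}]}

# after the transformation, the first cell gains the pip magic
# followed by the pyodide_http patch, ahead of the original source
nb["cells"][0]["source"][:0] = [
    "%pip install pyodide-http requests\n",
    "__import__('pyodide_http').patch_all()\n",
]
print("".join(nb["cells"][0]["source"]))
```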
set_files_imports sets the dependencies for a lot of files.
def set_files_imports(FILES: typing.Iterable[pathlib.Path]=FILES):
    for file in FILES.rglob("*.ipynb"):
        set_file_imports(file)
iter_code_cells iterates through just the code cells.
def iter_code_cells(nb: dict) -> typing.Iterator[tuple[int, dict]]:
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] == "code":
            yield i, cell
usage
- from the tonyfast module, requires deps

  if (I := '__file__' not in locals()):
      !python -m tonyfast tasks info lite
- from a post with importnb

  if (I := '__file__' not in locals()):
      !importnb -t 2022-12-21-lite-build.ipynb list
- run this task from hatch in the root of the project. the hatch environment has all the necessary dependencies defined.

  hatch run lite:build