# restart and run all or it didn't happen
my golden rule for notebooks is Restart and run all or it didn't happen. RARA||GTFO for short.
the RARA criteria are satisfied when:
- all of the code cells are executed; there are no unexecuted cells.
- all of the cell prompt numbers increase monotonically. out-of-order prompts mean out-of-order execution and the possibility of hidden state.
%reload_ext pidgy
rara = F"<abbr>RARA</abbr>"
RARA = "__Restart and run all__"
## benefits of <abbr title='{{RARA.strip("_")}}'>RARA</abbr>||<abbr title="Get The Fart Out">GTFO</abbr>
a computational notebook is a [literate program] that composites narrative and code;
it simultaneously has the qualities of __literature__ and a __program__.
literate programs introduce computational qualities to documents,
and computation implies that the authoring system has [state].
[a common complaint is that literate notebook programs contain hidden states][complaint].
hidden state happens when code cells are executed out-of-order,
or not executed at all.
executing code cells ensures that code can be read by at least one machine; the code input worked at some point.
ordered execution is an objective measure of non-hidden state [^tamper].
[Exploration and Explanation in Computational Notebooks], a large scale analysis of notebooks,
distinguishes between linear notebooks and non-linear notebooks.
non-linear notebooks are most likely to have [hidden state](https://ploomber.io/blog/nbs-myths/#hidden-state).
at scale, {{RARA.lower()}} is a distinguishing quality of computational notebooks.
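
a minimal sketch of how hidden state appears, assuming each cell is a plain string of python and that a hypothetical `run` helper stands in for the kernel. the names `cells` and `run` are illustrative and are not part of any notebook library.

```python
# three "cells" that share one namespace, like cells sharing a kernel
cells = ["x = 1", "x = x + 1", "result = x * 10"]


def run(cells, order):
    """execute the cells in the given order inside a fresh namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace["result"]


# linear execution: every cell runs once, top to bottom
linear = run(cells, [0, 1, 2])

# non-linear execution: the second cell is run twice before the last one
nonlinear = run(cells, [0, 1, 1, 2])
```

the cell sources are identical in both runs, but `linear` is `20` and `nonlinear` is `30`. a reader who only sees the document cannot tell which history produced the output unless the prompts are in order.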
> [The most prevalent expression of literate computing right now is the computational notebook.][landscape]
[fernando perez] - co-founder of [Project Jupyter] - acknowledged that notebooks are literate programs with something extra.
he describes notebook authoring as [literate computing] where live computation is woven into the document.
the notebook document is stateful; statefulness is a quality of this new medium.
the {{RARA.lower()}} criteria verify that code inputs work and that there is no hidden state.
[^tamper]: it is possible that inputs were tampered with, but let us assume the best intentions.
[literate computing]: https://web.archive.org/web/20220510083647/http://blog.fperez.org/2013/04/literate-computing-and-computational.html
[landscape]: https://www.ppig.org/files/2019-PPIG-30th-fog.pdf
[complaint]: https://towardsdatascience.com/the-case-against-the-jupyter-notebook-d4da17e97243
[story in notebook]: https://marybethkery.com/projects/Verdant/Kery-The-Story-in-the-Notebook-Exploratory-Data-Science-using-a-Literate-Programming-Tool.pdf
[literate program]: https://en.wikipedia.org/wiki/Literate_programming
[fernando perez]: https://en.wikipedia.org/wiki/Fernando_P%C3%A9rez_(software_developer)
[Project Jupyter]: https://jupyter.org/
[Exploration and Explanation in Computational Notebooks]: https://adamrule.com/files/papers/chi_2018_computational_notebooks_camera_ready.pdf
[state]: https://en.wikipedia.org/wiki/State_(computer_science)
### executed code in computational notebooks
when notebooks are observed as documents,
code objects are literary devices actively participating in the narrative.
these code objects have two distinct states:
1. NOT executed code
un-executed code reverts back into normal language. it is an [arbitrary symbol] propping up the narrative.
2. executed code
executed code encapsulates a form of pre-formatted input code and corresponding output representations.
the input and outputs cooperate - through text and form - to bring meaning to the coded object.
NOT executed code has arbitrary meaning; trust it like you would code returned from [chatgpt](https://openai.com/blog/chatgpt/).
on the other hand, executed code has inputs, outputs and prompts.
when the prompts are populated we know a cell has been executed.
when all the cells execute in order we satisfy the {{RARA.lower()}} criteria.
[arbitrary symbol]: https://psychologydictionary.org/arbitrary-symbol/
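
the two states can be sketched with plain dictionaries that follow the notebook version 4 cell layout. the `is_executed` helper is hypothetical and only illustrates the idea that a populated integer prompt marks an executed cell.

```python
# a code cell that was never executed: no prompt number and no outputs
unexecuted = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {},
    "outputs": [],
    "source": "1 + 1",
}

# the same cell after execution: prompt number 1 and one output
executed = {
    "cell_type": "code",
    "execution_count": 1,
    "metadata": {},
    "outputs": [
        {
            "output_type": "execute_result",
            "execution_count": 1,
            "data": {"text/plain": "2"},
            "metadata": {},
        }
    ],
    "source": "1 + 1",
}


def is_executed(cell):
    """hypothetical helper: a populated integer prompt means the cell ran."""
    return isinstance(cell.get("execution_count"), int)


states = [is_executed(unexecuted), is_executed(executed)]
```

both cells carry the same source; only the prompt and outputs distinguish the executed cell from the unexecuted one.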
## {{RARA}} criteria implementation
    def restart_and_run_all(file, code_cell_count=0):
`restart_and_run_all` is violated when we iterate through the notebook cells

        for i, cell in enumerate((notebook := read_file(file)).cells):
            if cell["cell_type"] == "code":
to discover a code cell:

                code_cell_count += 1
                try:
* that has not been executed because the prompt is not an integer

                    current_count = int(cell.execution_count)
                except (TypeError, ValueError):
                    # an unexecuted cell has a prompt of None, and int(None) raises TypeError
                    raise UnexecutedCell(F"cell {i} not executed")
                if code_cell_count != current_count:
* has cell prompts that do not align with the current cell count

                    raise OutOfOrderExecution(F"cell {i} is executed out of order")
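
to see the criteria at work without files on disk, here is a sketch of a variant that checks a list of cell dictionaries and returns a message instead of raising. `check_prompts` and `code` are hypothetical names for illustration; this is not the implementation above, only the same logic applied to in-memory data.

```python
def check_prompts(cells):
    """hypothetical variant of the criteria for a list of cell dictionaries."""
    code_cell_count = 0
    for i, cell in enumerate(cells):
        if cell["cell_type"] != "code":
            continue  # markdown cells have no prompts to inspect
        code_cell_count += 1
        prompt = cell.get("execution_count")
        if not isinstance(prompt, int):
            return f"cell {i} not executed"
        if prompt != code_cell_count:
            return f"cell {i} is executed out of order"
    return None


def code(prompt):
    """build a bare code cell with the given prompt number."""
    return {"cell_type": "code", "execution_count": prompt, "source": ""}


markdown = {"cell_type": "markdown", "source": "some narrative"}

linear = check_prompts([code(1), markdown, code(2), code(3)])
unexecuted = check_prompts([code(1), markdown, code(None)])
out_of_order = check_prompts([code(1), markdown, code(3), code(2)])
```

note that comparing each prompt to the running count is stricter than monotonic increase: the prompts must start at 1 and have no gaps, which is what a fresh restart and run all produces.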
### {{RARA}} exceptions
we assign two formal `BaseException`s for `restart_and_run_all`: `OutOfOrderExecution` and `UnexecutedCell`
#### unexecuted cells
unexecuted cells are represented only by their preformatted text input.
they lack outputs and a prompt number.
without these features the code stands on its own.
it reverts back to pseudocode.
    class UnexecutedCell(BaseException):
`UnexecutedCell` raises when a code cell's execution count is not an integer.
#### out of order execution
there are some tools that are designed to be reactive or executed out of order.
the RARA criterion does not dismiss these cases; rather, it could be a starting state.
when prompts are out of order we may suspect hidden state.

    class OutOfOrderExecution(BaseException):
`OutOfOrderExecution` raises when code cell execution counts are not monotonically increasing.

    import nbformat

    def read_file(file):
        if not isinstance(file, str): file = file.__file__
        with open(file) as f: return nbformat.v4.reads(f.read())
### extra preservatives
notebooks are swiss army knife documents; they have applications as tests, modules, scripts and documents.
the {{RARA.lower()}} criteria can be combined with the [different cell execution states][states]: `INTERACTIVE`, `SCRIPT`, or `MODULE`.

    assert restart_and_run_all("2023-01-09-notebooks-states.ipynb") is None,\
errors are raised when failing, and i don't want to demonstrate failure here
when we assert that a previous notebook will {{RARA.lower()}}
we become slightly more confident about the code in that document.
[states]: 2023-01-09-notebooks-states.ipynb
## impacts of {{RARA}}-ability
### {{RARA}} and reusability
restart and run all is critical to reusing python notebooks as [modules, tests or scripts].
there are a few ways of [testing notebooks], and none of these methods succeed without notebooks that restart and run all.
{{RARA.lower()}} as a practice will encourage you to write notebooks you'll be able to return to.
[modules, tests or scripts]: 2023-01-09-notebooks-states.ipynb
[testing notebooks]: https://nbviewer.org/gist/tonyfast/e7dd7ff3d808d57b77e9765626a73a91
### <abbr>RARA</abbr> conflicts
#### too many ideas
it is easy to put [too many units of thought into a notebook](#Restart-and-run-all-roots).
this struggle can be striking when sticking to the restart and run all principle.
when we write documents about single units of thought we are treading in formal testing territory.
#### compute intensive operations
a common argument against {{RARA.lower()}} is having large datasets or other compute intensive operations.
we apply the <abbr title="cache rules everything around me">CREAM</abbr> principle and cache our
compute intensive tasks.
in these situations, authors will benefit from data engineering their work to avoid
costly repetitive tasks; you'll save your future self time.
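
one way to cache so that results survive a kernel restart is to write them to disk. this is a minimal sketch using only the standard library; the `disk_cache` decorator, the `expensive_square` function, and the temporary cache directory are all hypothetical names chosen for illustration, and it assumes the arguments and results can be pickled.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cache(directory):
    """cache a function's return value on disk, keyed by its name and arguments."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    def decorator(function):
        def wrapper(*args, **kwargs):
            # hash the call signature to get a stable file name
            key = pickle.dumps((function.__name__, args, sorted(kwargs.items())))
            path = directory / (hashlib.sha256(key).hexdigest() + ".pickle")
            if path.exists():
                return pickle.loads(path.read_bytes())
            result = function(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


calls = []  # records every time the real computation runs
cache_directory = tempfile.mkdtemp()


@disk_cache(cache_directory)
def expensive_square(x):
    calls.append(x)
    return x * x


first = expensive_square(12)   # computed and written to disk
second = expensive_square(12)  # read back from disk
```

with the cache in place, restart and run all still executes every cell in order, but the costly cell only pays for the computation the first time. the cache needs to be cleared by hand when the function body changes, since the key only covers the name and arguments.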
## 📣 write notebooks that endure
{{rara}} || <abbr>GTFO</abbr> helps us trust our own and others' notebooks more.
we can be slightly more confident in future uses of that work.
{{rara}}. i'm cheering for y'all. i see y'all beating the snot out of every line of code.
let's make these efforts endure. remember to tell your friends <q>{{RARA.strip("_")}} or <abbr>GTFO</abbr>.</q>
## devils 😈 share
### {{RARA}} roots
paco nathan was the first person i heard this concept from, at [Jupyter Day 2016].
at the time, he was using notebooks as a medium
for immersive professional publications. these oriole notebooks
were educational tools that embedded the teacher and the runtime together.
there were some beautiful videos produced with some big names like:...
[paco] presented what his team learned when [producing teaching videos with notebooks][oriole].
> 1. focus on a concise "unit of thought"
> 1. invest the time and editorial effort to create a good introduction
> 1. keep your narrative simple and reasonably linear
> 1. "chunk" both the text and the code into understandable parts
> 1. alternate between text, code, output, further links, etc.
> 1. leverage markdown by providing interesting links for background, deep-dive, etc.
> 1. code cells should not be long, < 10 lines
> 1. code cells must show that they've run, producing at least some output
> 1. load data from the container, not the network
> 1. __clear all output then "Run All" -- or it didn't happen__
> 1. video narratives: there's text, and there's subtext...
> 1. pause after each "beat" -- smile, breathe, allow people to follow you
and there, clear as day, in the tenth bullet, we find __clear all output then "Run All" -- or it didn't happen__.
i love that this pattern was discovered when producing video content.
it means that {{RARA}} is critical to communicating our ideas with notebooks as the substrate.
[oriole]: https://github.com/ceteri/oriole_jupyterday_atl/blob/master/oriole_talk.ipynb
[paco]: https://github.com/ceteri
[Jupyter Day 2016]: https://jupyterday-atlanta-2016.github.io/#paconathan