Storing local copies of gists.

This notebook uses doit to download copies of my gists.

In [1]:
%load_ext doit 
from toolz.curried import *; from pandas import DataFrame, concat, Series # Can't wildcard pandas when using doit.
from doit.tools import create_folder; from pathlib import Path; import requests as r
  • Access user information to determine the number of gists.
In [2]:
__user__ = r.get('https://api.github.com/users/tonyfast').json()
  • Page through the index of gists provided by GitHub and collect the results into a DataFrame.
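As a rough sketch of the paging arithmetic: assuming GitHub's default of 30 gists per listing page, ceiling division makes sure a final partial page is still requested. The `page_urls` helper and its `per_page` parameter are illustrative names, not part of this notebook, and 390 is only a sample count.

```python
def page_urls(gists_url, public_gists, per_page=30):
    """Build one listing URL per page of gists (hypothetical helper)."""
    # The user record exposes a URL template ending in '{/gist_id}'.
    base = gists_url.replace('{/gist_id}', '')
    # Ceiling division: 31 gists at 30 per page need 2 pages, not 1.
    pages = -(-public_gists // per_page)
    return [f'{base}?page={page}' for page in range(1, pages + 1)]


# Sample values; no request is made here.
urls = page_urls('https://api.github.com/users/tonyfast/gists{/gist_id}', 390)
```

Plain floor division (`390 // 30`) happens to give the same 13 pages here, but it would drop the last page for any count that is not a multiple of 30.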
In [3]:
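# GitHub lists 30 gists per page by default; round up so a final partial page is not skipped.
pages = -(-__user__['public_gists'] // 30)
df = concat([
    DataFrame(r.get(__user__['gists_url'].replace('{/gist_id}', '?page=') + str(page)).json())
    for page in range(1, pages + 1)]).set_index('id')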

# One row per (gist id, file position), holding the file's name and raw download URL.
files = concat([
    df.files.apply(compose(Series, list, pluck(key), dict.values)).stack().rename(key)
    for key in ['filename', 'raw_url']], axis=1)
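The `files` expression is dense, so here is a minimal plain-Python sketch of what it appears to compute: one record per (gist id, file position) carrying `filename` and `raw_url`. The `flatten_files` helper and the sample data (including the `example.invalid` URLs) are made up for illustration; the notebook itself does this with pandas and toolz.

```python
def flatten_files(gists):
    """Map (gist id, file position) to that file's filename and raw_url."""
    return {
        (gist_id, position): {
            'filename': meta['filename'],
            'raw_url': meta['raw_url'],
        }
        for gist_id, files in gists.items()
        # Each gist's `files` entry is a dict keyed by filename.
        for position, meta in enumerate(files.values())
    }


# Hypothetical sample shaped like the `files` column of the gist listing.
sample = {
    'abc123': {
        'a.ipynb': {'filename': 'a.ipynb', 'size': 1,
                    'raw_url': 'https://example.invalid/abc123/a.ipynb'},
        'readme.md': {'filename': 'readme.md', 'size': 24,
                      'raw_url': 'https://example.invalid/abc123/readme.md'},
    },
}
rows = flatten_files(sample)
```

The two-level key mirrors the index that `stack()` produces, which is what the task generator later unpacks.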
In [10]:
df.sample(2)
Out[10]:
comments comments_url commits_url created_at description files forks_url git_pull_url git_push_url html_url owner public truncated updated_at url user
id
3e8c3f3f175498cb555c 0 https://api.github.com/gists/3e8c3f3f175498cb5... https://api.github.com/gists/3e8c3f3f175498cb5... 2016-02-04T20:33:45Z {'Untitled49.ipynb': {'size': 5032, 'filename'... https://api.github.com/gists/3e8c3f3f175498cb5... https://gist.github.com/3e8c3f3f175498cb555c.git https://gist.github.com/3e8c3f3f175498cb555c.git https://gist.github.com/3e8c3f3f175498cb555c {'login': 'tonyfast', 'id': 4236275, 'repos_ur... True False 2016-02-04T21:01:30Z https://api.github.com/gists/3e8c3f3f175498cb555c None
50be713105222e420f34fe78dc7a94bc 0 https://api.github.com/gists/50be713105222e420... https://api.github.com/gists/50be713105222e420... 2016-08-13T17:50:42Z {'readme.md': {'size': 24, 'filename': 'readme... https://api.github.com/gists/50be713105222e420... https://gist.github.com/50be713105222e420f34fe... https://gist.github.com/50be713105222e420f34fe... https://gist.github.com/50be713105222e420f34fe... {'login': 'tonyfast', 'id': 4236275, 'repos_ur... True False 2016-08-13T17:55:45Z https://api.github.com/gists/50be713105222e420... None
In [11]:
len(df)
Out[11]:
390
In [22]:
def download(url, to):
    """Download a url and write it to file. Return False on failure."""
    print(to)
    try:
        to.write_text(r.get(url).text)
        return True
    except (r.RequestException, OSError):  # network or filesystem errors only
        return False
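A minimal sketch of the same idea that can be exercised without a network connection. It swaps `requests` for the standard library's `urlopen`, and the `fetch` parameter and `fetch_text` helper are hypothetical additions that let a fake fetcher be passed in; none of this is the notebook's own code.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_text(url):
    """Return the body of `url` decoded as UTF-8 (stdlib stand-in for requests)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8')


def download(url, to, fetch=fetch_text):
    """Write the text at `url` to the path `to`; return True on success, else False."""
    try:
        to.write_text(fetch(url))
        return True
    except (OSError, ValueError):
        # OSError covers URL and filesystem errors; ValueError covers bad URLs and decoding.
        return False


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / 'example.ipynb'
    # A fake fetcher keeps the example offline.
    ok = download('https://example.invalid/x', target, fetch=lambda url: '{}')
    saved = target.read_text()
    # The parent folder does not exist, so the write fails and False is returned.
    failed = download('https://example.invalid/x',
                      Path(folder) / 'missing' / 'x.ipynb', fetch=lambda url: '{}')
```

Catching only the expected exception types means a typo or other programming error still surfaces as a traceback instead of a silent `False`.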
  • Define a doit task that downloads only the notebooks. With more work, this could be turned into a command-line API.
In [25]:
def task_store_nb_gists():
    root = Path('~/gists').expanduser()  # Path does not expand '~' on its own.
    for (gist_id, i), s in files.drop_duplicates().iterrows():
        if Path(s.loc['filename']).suffix == '.ipynb':
            target = root / gist_id / str(i) / s.loc['filename']
            yield dict(
                name=str(target), targets=[str(target)], actions=[
                    (create_folder, [str(target.parent)]),
                    (download, [s.loc['raw_url'], target])])
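The path logic in the task generator can be sketched separately from doit and pandas. pathlib leaves `~` unexpanded unless `expanduser()` is called, so the sketch calls it explicitly. The `notebook_targets` helper, the `root` parameter, and the sample rows are illustrative assumptions.

```python
from pathlib import Path


def notebook_targets(rows, root='~/gists'):
    """Yield (target path, raw_url) for every row whose file is a notebook."""
    base = Path(root).expanduser()  # turn '~' into the real home directory
    for (gist_id, position), row in rows:
        if Path(row['filename']).suffix == '.ipynb':
            yield base / gist_id / str(position) / row['filename'], row['raw_url']


# Hypothetical rows shaped like `files.iterrows()`: ((gist id, position), record).
rows = [
    (('abc123', 0), {'filename': 'a.ipynb',
                     'raw_url': 'https://example.invalid/abc123/a.ipynb'}),
    (('abc123', 1), {'filename': 'readme.md',
                     'raw_url': 'https://example.invalid/abc123/readme.md'}),
]
targets = list(notebook_targets(rows, root='/tmp/gists'))
```

Each yielded pair corresponds to one doit sub-task: the path is the target to create and the URL is what gets downloaded into it.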
    
  • Run the task to download all the notebooks.
In [ ]:
%doit store_nb_gists