
adding an http attribute to pandas series

i really like fluent programming styles which is why i love using pandas objects as first class citizens in my programs. i've been adding accessors to pandas over time to improve my practice. they do things like jinja templates, subprocess calls, and awaitables.

a feature i have not added is the ability to call http methods from a series of urls. an appropriately schematized dataframe could support finer tuning, but we ignore the dataframe case in this notebook and work off the series case.

    import midgy
    from nobook.utils import *
    from nobook.utils import Accessor, SERIES, INDEX, loads
    from functools import partial

use aiohttp as an async aware http requests tool, with aiohttp_client_cache providing a sqlite-backed caching client so repeated requests can be served locally.

    class _http(Accessor, types=SERIES | INDEX, name="http"):
        def __init__(self, object):
            import aiohttp, aiohttp_client_cache
            super().__init__(object)
            # store session factories rather than live sessions; `get` opens one per call
            self._client = aiohttp.ClientSession
            self._client_cache = partial(
                aiohttp_client_cache.CachedSession, cache=aiohttp_client_cache.SQLiteBackend(".pandas.cache.sqlite")
            )

        def client(self, cache=True):
            # cached session by default, plain aiohttp session when cache is falsy
            if cache:
                return self._client_cache()
            return self._client()

        async def get(self, mime=None, cache=True):
            # one session is shared by every url in the series or index
            async with self.client(cache) as session:
                data = await apply(self.object, self._get_one, mime=mime, session=session)
            return data

        async def _get_one(self, url, mime=None, session=None):
            async with session.get(url) as response:
                if mime is None:
                    # fall back to the response content-type, dropping parameters like charset
                    mime = response.headers.get("content-type", "text/plain")
                    mime, *_ = mime.partition(";")
                data = loads(await response.text(), mime)
            return data
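
the accessor leans on `apply` and `loads` from nobook.utils, which are not shown here. below is a minimal standard-library sketch of the same flow, not the actual implementation: it assumes the urls are fetched concurrently with `asyncio.gather`, and `resolve_mime`, `decode`, `fake_fetch`, `get_one` and `get_all` are all hypothetical names. `fake_fetch` stands in for a real request so nothing touches the network, and `resolve_mime` also trims and lowercases the mime type, which the accessor above does not do.

```python
import asyncio
import json


def resolve_mime(content_type, default="text/plain"):
    """Drop parameters such as '; charset=utf-8' from a content-type header."""
    mime, _, _ = (content_type or default).partition(";")
    return mime.strip().lower()


def decode(text, mime):
    """Hypothetical stand-in for `loads`: parse json, pass other text through."""
    if mime == "application/json" or mime.endswith("+json"):
        return json.loads(text)
    return text


async def fake_fetch(url):
    """Stub for a real request: returns (headers, body) without any network call."""
    await asyncio.sleep(0)
    if url.endswith(".json"):
        return {"content-type": "application/json; charset=utf-8"}, '{"url": "%s"}' % url
    return {"content-type": "text/plain"}, url


async def get_one(url, mime=None):
    headers, text = await fake_fetch(url)
    if mime is None:
        # no explicit mime type given, so trust the response header
        mime = resolve_mime(headers.get("content-type"))
    return decode(text, mime)


async def get_all(urls, mime=None):
    """Fetch every url concurrently; gather returns results in input order."""
    return await asyncio.gather(*(get_one(url, mime) for url in urls))


# as a script; inside a notebook use `await get_all([...])` instead
results = asyncio.run(get_all(["a.json", "b.txt"]))
```

passing an explicit `mime` skips the header lookup entirely, which is what the `"application/json"` argument in the gist example does.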

example grabbing data about my gist history.

    gists = (
        await (
            "https://api.github.com/users/tonyfast/gists" + Index(range(1, 5)).map("?page={}".format)
        ).http.get("application/json")
    ).explode().series().set_index("id", append=True)
    (files := gists.files.series().stack().series())
| | id | | filename | type | language | raw_url | size |
|---|---|---|---|---|---|---|---|
| https://api.github.com/users/tonyfast/gists?page=1 | 90c41d4994f75c594db804aeba56fc26 | first_and_second_laws_of_thermodynamics.ipynb | first_and_second_laws_of_thermodynamics.ipynb | text/plain | Jupyter Notebook | https://gist.githubusercontent.com/tonyfast/90... | 20469 |
| | aa3b16c5a284150e3d727a843b6cefec | axe_types.py | axe_types.py | application/x-python | Python | https://gist.githubusercontent.com/tonyfast/aa... | 8442 |
| | 713ae6c57602c0f85d011421b20d5ea0 | BinaryExamples.ipynb | BinaryExamples.ipynb | text/plain | Jupyter Notebook | https://gist.githubusercontent.com/tonyfast/71... | 352154 |
| | c004044b4fe641735031ecf2069cf595 | aom.json | aom.json | application/json | JSON | https://gist.githubusercontent.com/tonyfast/c0... | 52330 |
| | e17946facd998a931527467d646cc822 | README.md | README.md | text/markdown | Markdown | https://gist.githubusercontent.com/tonyfast/e1... | 3044 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| https://api.github.com/users/tonyfast/gists?page=4 | cfb55f41f5452ef33ec6fbb4e0bda991 | doctest-myst.ipynb | doctest-myst.ipynb | text/plain | Jupyter Notebook | https://gist.githubusercontent.com/tonyfast/cf... | 5731 |
| | 357b09758b5fb0f31fd2c0e7f7cf3967 | readme.ipynb | readme.ipynb | text/plain | Jupyter Notebook | https://gist.githubusercontent.com/tonyfast/35... | 1117725 |
| | e0468f0decb0a454d8fae4de8511d794 | df-df-api.ipynb | df-df-api.ipynb | text/plain | Jupyter Notebook | https://gist.githubusercontent.com/tonyfast/e0... | 22500 |
| | 292377ff60690bff0bf866e93a36eaf2 | reprish.ipynb | reprish.ipynb | text/plain | Jupyter Notebook | https://gist.githubusercontent.com/tonyfast/29... | 6115 |
| | c72d948631f3d38702d2876942ceb57e | custom_repr.ipynb | custom_repr.ipynb | text/plain | Jupyter Notebook | https://gist.githubusercontent.com/tonyfast/c7... | 62577 |

149 rows × 5 columns
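
the chain above packs several reshaping steps into one expression. here is a rough plain-python sketch of what those steps amount to, assuming the usual shape of the github gists response (each page is a list of gists, each gist has an `id` and a `files` mapping keyed by filename). the `responses` dict is placeholder data invented for illustration, not real api output, and `pages` and `rows` are hypothetical names.

```python
base = "https://api.github.com/users/tonyfast/gists"

# same urls as the Index(range(1, 5)).map("?page={}".format) expression: pages 1 to 4
pages = [base + "?page={}".format(n) for n in range(1, 5)]

# placeholder responses keyed by page url (made-up ids and files, not real data)
responses = {
    pages[0]: [
        {"id": "gist-a", "files": {"a.py": {"filename": "a.py", "size": 1}}},
    ],
    pages[1]: [
        {
            "id": "gist-b",
            "files": {
                "b.md": {"filename": "b.md", "size": 2},
                "c.md": {"filename": "c.md", "size": 3},
            },
        },
    ],
}

# one row per (page url, gist id, file key), mirroring the three index levels
rows = [
    (page, gist["id"], name, meta)
    for page, gists in responses.items()  # explode: one gist per row
    for gist in gists
    for name, meta in gist["files"].items()  # stack: one file per row
]
```

each tuple lines up with one row of the table: the first three items are the index levels and `meta` holds the columns.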

conclusion

  • this is the densest i've ever written this practice. i love it.
  • aiohttp includes a server that could also be attached to the dataframe to serve it; imagine df.serve().