
analyzing pyproject.toml configurations of popular projects

get_pyproject_config_data loads a dataframe created in a previous post. it contains the results of a graphql query posted to the github api that returns the pyproject.toml files of some of the most popular python projects on github.

    def get_pyproject_config_data() -> "pandas.DataFrame":
        # import the helpers defined in the previous post's notebook
        with importnb.Notebook():
            from __09_pyproject_analysis import tidy_configs, tidy_responses, gather, pyproject_query
        # gather the graphql responses, then tidy them into one row per project
        return tidy_configs(tidy_responses(gather(pyproject_query, max=15)))
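the previous post's pyproject_query is not shown here. as a hypothetical sketch only, a single github graphql query can fetch pyproject.toml from several repositories by giving each repository field its own alias and reading the file blob at HEAD. build_pyproject_query and its "owner/name" input format are assumptions, not the previous post's code.

```python
import json


def build_pyproject_query(repos: list[str]) -> str:
    """Build one GraphQL query that fetches pyproject.toml from several repos.

    `repos` holds "owner/name" strings (a hypothetical input format). Each
    repository gets its own alias (r0, r1, ...) because GraphQL does not allow
    the same field name to be repeated with different arguments.
    """
    parts = []
    for i, full_name in enumerate(repos):
        owner, name = full_name.split("/", 1)
        parts.append(
            # json.dumps yields double-quoted literals, valid GraphQL strings for plain names
            f"  r{i}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            f"    url\n"
            # the file contents live on a Blob object addressed as "<rev>:<path>"
            f'    object(expression: "HEAD:pyproject.toml") {{ ... on Blob {{ text }} }}\n'
            f"  }}"
        )
    return "query {\n" + "\n".join(parts) + "\n}"
```

the resulting string would be sent as the query field of a json POST to https://api.github.com/graphql with an authorization token; each alias then appears as a key under the response's data object, with object set to null when a repository has no pyproject.toml at HEAD.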

the dataframe df returned by get_pyproject_config_data is shaped as follows:

  • rows: one project per row, indexed by repository url
  • columns: the top-level keys found in each project's pyproject.toml

    import importnb, pandas
    display((df := get_pyproject_config_data()).head(3))
    F"there are {len(df)} `pyproject.toml` files in the dataset."
tool build-system project mypy flake8
url
https://github.com/open-telemetry/opentelemetry-python {'black': {'line-length': 79, 'exclude': '( /( # generated files .tox| venv| .*/build/lib/.*| exporter/opentelemetry-exporter-jaeger-proto-grpc/src/opentelemetry/exporter/jaeger/proto/grpc/gen| exporter/opentelemetry-exporter-jaeger-thrift/src/opentelemetry/exporter/jaeger/thrift/gen| exporter/opentelemetry-exporter-zipkin-proto-http/src/opentelemetry/exporter/zipkin/proto/http/v2/gen| opentelemetry-proto/src/opentelemetry/proto/.*/.*| scripts )/ ) '}, 'pytest': {'ini_options': {'addopts': '-rs -v', 'log_cli': True, 'log_cli_level': 'warning'}}} NaN NaN NaN NaN
https://github.com/freemocap/freemocap {'taskipy': {'tasks': {'setup': 'pre-commit install', 'test': 'python -m unittest src/tests/**/test_*', 'installer': './bin/installer.sh', 'format': 'black src/'}}} NaN NaN NaN NaN
https://github.com/3b1b/manim NaN {'requires': ['setuptools', 'wheel']} NaN NaN NaN
'there are 234 `pyproject.toml` files in the dataset.'

what tools are used most?

PEPXXX defines the tool table as the place where third-party applications can store their configuration.

expanding each project's df.tool dictionary into columns with apply(pandas.Series) gives tools, a frame with one column per third-party tool.

    (tools := df.tool.dropna().apply(pandas.Series)).head(3).fillna("")
black pytest taskipy pyright hatch isort mutmut check-wheel-contents flit coverage ... jupyter-releaser check-manifest vendoring commitizen scriv autoflake tbump autopub poetry-version-plugin typeshed
url
https://github.com/open-telemetry/opentelemetry-python {'line-length': 79, 'exclude': '( /( # generated files .tox| venv| .*/build/lib/.*| exporter/opentelemetry-exporter-jaeger-proto-grpc/src/opentelemetry/exporter/jaeger/proto/grpc/gen| exporter/opentelemetry-exporter-jaeger-thrift/src/opentelemetry/exporter/jaeger/thrift/gen| exporter/opentelemetry-exporter-zipkin-proto-http/src/opentelemetry/exporter/zipkin/proto/http/v2/gen| opentelemetry-proto/src/opentelemetry/proto/.*/.*| scripts )/ ) '} {'ini_options': {'addopts': '-rs -v', 'log_cli': True, 'log_cli_level': 'warning'}} ...
https://github.com/freemocap/freemocap {'tasks': {'setup': 'pre-commit install', 'test': 'python -m unittest src/tests/**/test_*', 'installer': './bin/installer.sh', 'format': 'black src/'}} ...
https://github.com/openai/gym {'ini_options': {'filterwarnings': ['ignore:.*step API.*:DeprecationWarning']}} {'include': ['gym/**', 'tests/**'], 'exclude': ['**/node_modules', '**/__pycache__'], 'strict': [], 'typeCheckingMode': 'basic', 'pythonVersion': '3.6', 'pythonPlatform': 'All', 'typeshedPath': 'typeshed', 'enableTypeIgnoreComments': True, 'reportMissingImports': 'none', 'reportMissingTypeStubs': False, 'reportInvalidTypeVarUse': 'none', 'reportGeneralTypeIssues': 'none', 'reportUntypedFunctionDecorator': 'none', 'reportPrivateUsage': 'warning', 'reportUnboundVariable': 'warning'} ...

3 rows × 54 columns

    F"there are {len(tools.columns)} tools used in the {len(df)} pyproject.toml files." 
'there are 54 tools used in the 234 pyproject.toml files.'
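apply(pandas.Series) turns each dict into a row whose columns are the dict's keys, leaving NaN wherever a project does not configure a tool. a small sketch on toy data (the project names and configs are made up) comparing it with building the same frame in one DataFrame call, which avoids creating a Series per row:

```python
import pandas

# toy stand-in for df.tool: one dict of tool tables per project, NaN where a
# project has no tool table (names and values are made up for illustration)
tool = pandas.Series(
    {
        "project-a": {"black": {"line-length": 79}, "pytest": {"ini_options": {"addopts": "-v"}}},
        "project-b": float("nan"),
        "project-c": {"isort": {"profile": "black"}},
    },
    name="tool",
)

present = tool.dropna()

# the approach used above: each dict becomes a Series, stacked as rows
via_apply = present.apply(pandas.Series)

# the same frame from a single constructor call over the list of dicts
via_constructor = pandas.DataFrame(present.tolist(), index=present.index)
```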

the 12 most frequently configured tools across the pyproject.toml files are:

    # count the projects that configure each tool (non-missing cells per column)
    tool_counts = tools.notna().sum().sort_values(ascending=False)
    (top12 := tool_counts.iloc[:12]).to_frame("counts").T
black isort pytest mypy coverage poetry setuptools_scm hatch setuptools pylint towncrier pyright
counts 123 85 67 42 34 32 21 15 14 14 11 10
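each count is the number of non-missing cells in a column, that is, the number of files that configure that tool. the same tally can be sketched in plain python, assuming the tool tables arrive as dicts (count_tools is a hypothetical helper):

```python
from collections import Counter
from collections.abc import Iterable


def count_tools(tool_tables: Iterable[dict]) -> Counter:
    """Count how many tool tables mention each tool name."""
    counts: Counter = Counter()
    for table in tool_tables:
        # dict keys are unique, so each tool is counted at most once per file
        counts.update(table.keys())
    return counts
```

count_tools(tables).most_common(12) would then play the role of top12, with ties kept in first-seen order.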

from the perspective of these popular projects:

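  • black (123 of 234 files) and isort (85) are strongly adopted, which suggests that formatting code and sorting imports are common conventions.
  • pytest, in third place (67), suggests that testing with pytest is widespread.
  • mypy (42) suggests that type checking is a feature of some popular projects.
  • coverage (34) configures coverage.py, which measures how much of the code the tests exercise.
  • poetry (32) holds project metadata and dependencies for poetry.
  • setuptools_scm (21) derives the package version from source control metadata.
  • hatch (15) configures the hatch project manager and its hatchling build backend.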
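the counts are out of 234 pyproject.toml files, so converting them to shares makes adoption levels easier to compare. a sketch that hard-codes the top12 counts from the table above:

```python
# top12 counts copied from the output above; 234 is the number of pyproject.toml files
TOTAL_FILES = 234
top12_counts = {
    "black": 123, "isort": 85, "pytest": 67, "mypy": 42,
    "coverage": 34, "poetry": 32, "setuptools_scm": 21, "hatch": 15,
    "setuptools": 14, "pylint": 14, "towncrier": 11, "pyright": 10,
}

# percentage of files configuring each tool, rounded to one decimal place
shares = {tool: round(100 * count / TOTAL_FILES, 1) for tool, count in top12_counts.items()}
for tool, share in shares.items():
    print(f"{tool:>15} {share:5.1f}%")
```

by this measure black is configured in just over half of the files (52.6%), isort in over a third (36.3%) and mypy in under a fifth (17.9%).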

next we look at the build-system and project tables.

build systems

    df["build-system"].dropna().apply(pandas.Series)
requires build-backend dependencies
url
https://github.com/3b1b/manim [setuptools, wheel] NaN NaN
https://github.com/deepmind/hanabi-learning-environment [setuptools, wheel, scikit-build, cmake, ninja] NaN NaN
https://github.com/miguelgrinberg/Flask-SocketIO [setuptools>=42, wheel] setuptools.build_meta NaN
https://github.com/pypa/pipx [hatchling>=0.15.0] hatchling.build NaN
https://github.com/py-pdf/PyPDF2 [flit_core >=3.2,<4] flit_core.buildapi NaN
... ... ... ...
https://github.com/deepset-ai/haystack [hatchling>=1.8.0] hatchling.build NaN
https://github.com/dedupeio/dedupe [setuptools==63, wheel, cython] setuptools.build_meta NaN
https://github.com/sktime/sktime [setuptools>61, wheel, toml, build] setuptools.build_meta NaN
https://github.com/enthought/mayavi [oldest-supported-numpy, setuptools, vtk, wheel] NaN NaN
https://github.com/holoviz/datashader [param, pyct, setuptools] setuptools.build_meta NaN

173 rows × 3 columns
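three of the rows shown have no build-backend. PEP 517 says that when build-backend is missing, tools should fall back to the legacy setuptools behaviour (running setup.py, or implicitly the setuptools.build_meta:__legacy__ backend). a sketch, assuming the [build-system] tables arrive as plain dicts, that counts backends with that fallback and normalizes the package names in requires. count_backends and requirement_name are hypothetical helpers, and the regex covers ordinary PEP 508 names rather than every edge case:

```python
import re
from collections import Counter
from collections.abc import Iterable

# PEP 517 fallback when a [build-system] table omits build-backend
LEGACY_BACKEND = "setuptools.build_meta:__legacy__"


def count_backends(build_systems: Iterable[dict]) -> Counter:
    """Count build backends, treating a missing build-backend as the legacy fallback."""
    return Counter(table.get("build-backend") or LEGACY_BACKEND for table in build_systems)


def requirement_name(spec: str) -> str:
    """Return the normalized distribution name at the start of a requirement string."""
    # the name ends at the first character that cannot appear in it (space, >, =, <, [ ...)
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    if match is None:
        raise ValueError(f"not a requirement: {spec!r}")
    # PEP 503 normalization: runs of -, _ and . become a single -, lowercased
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()
```

normalizing the names lets specifiers such as "flit_core >=3.2,<4" and "setuptools==63" be grouped by package rather than by their exact text.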

what goes into the project table?

    df.project.dropna().apply(pandas.Series).head(0)
name description readme license requires-python keywords authors classifiers dependencies dynamic urls scripts maintainers optional-dependencies entry-points gui-scripts version
url
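the dynamic column lists metadata fields that the build backend fills in at build time instead of reading them from the table. PEP 621 sets rules for it: name must be given statically and may not be listed in dynamic, version must be given statically or listed in dynamic, and a field may not be both static and dynamic. a hypothetical checker sketch covering only those rules, assuming each [project] table arrives as a plain dict:

```python
def dynamic_conflicts(project: dict) -> list[str]:
    """List PEP 621 problems with a [project] table's name, version and dynamic keys."""
    problems = []
    dynamic = project.get("dynamic", [])
    if "name" in dynamic:
        problems.append("name may not be dynamic")
    if "name" not in project:
        problems.append("name must be given statically")
    if "version" not in project and "version" not in dynamic:
        problems.append("version must be static or listed in dynamic")
    # a field listed in dynamic must not also have a static value
    for field in dynamic:
        if field in project:
            problems.append(f"{field} is both static and dynamic")
    return problems
```

a project that lets setuptools_scm supply its version would pass with {"name": ..., "dynamic": ["version"]} and no static version key.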