analyzing pyproject.toml configurations of popular projects
get_pyproject_config_data loads a dataframe created in a previous post.
it contains the results of a graphql query posted to the github api that returns the pyproject.toml files for some of the most popular python projects on github.
def get_pyproject_config_data() -> "pandas.DataFrame":
    # importnb.Notebook() makes the notebook from the previous post importable as a module
    with importnb.Notebook():
        from __09_pyproject_analysis import tidy_configs, tidy_responses, gather, pyproject_query
    # gather the query responses, tidy them, then tidy the parsed configs
    return tidy_configs(tidy_responses(gather(pyproject_query, max=15)))
the shape of our dataframe, df, returned by get_pyproject_config_data, is:
on the rows: one project per row
on the columns: the top-level keys found in each project's pyproject.toml
import importnb, pandas
display((df := get_pyproject_config_data()).head(3))
f"there are {len(df)} `pyproject.toml` files in the dataset."
columns: tool, build-system, project, mypy, flake8 (index: url)

url: https://github.com/open-telemetry/opentelemetry-python
tool: {'black': {'line-length': 79, 'exclude': '(
/( # generated files
.tox|
venv|
.*/build/lib/.*|
exporter/opentelemetry-exporter-jaeger-proto-grpc/src/opentelemetry/exporter/jaeger/proto/grpc/gen|
exporter/opentelemetry-exporter-jaeger-thrift/src/opentelemetry/exporter/jaeger/thrift/gen|
exporter/opentelemetry-exporter-zipkin-proto-http/src/opentelemetry/exporter/zipkin/proto/http/v2/gen|
opentelemetry-proto/src/opentelemetry/proto/.*/.*|
scripts
)/
)
'}, 'pytest': {'ini_options': {'addopts': '-rs -v', 'log_cli': True, 'log_cli_level': 'warning'}}}
build-system: NaN
project: NaN
mypy: NaN
flake8: NaN

url: https://github.com/freemocap/freemocap
tool: {'taskipy': {'tasks': {'setup': 'pre-commit install', 'test': 'python -m unittest src/tests/**/test_*', 'installer': './bin/installer.sh', 'format': 'black src/'}}}
build-system: NaN
project: NaN
mypy: NaN
flake8: NaN

url: https://github.com/3b1b/manim
tool: NaN
build-system: {'requires': ['setuptools', 'wheel']}
project: NaN
mypy: NaN
flake8: NaN
'there are 234 `pyproject.toml` files in the dataset.'
PEPXXX defines the tool key as a place where third-party applications can store configuration information.
when we expand the df.tool column into tools, we get a frame with one column for each third-party tool that is named.
(tools := df.tool.dropna().apply(pandas.Series)).head(3).fillna("")
columns (the display elides the middle ones): black, pytest, taskipy, pyright, hatch, isort, mutmut, check-wheel-contents, flit, coverage, ..., jupyter-releaser, check-manifest, vendoring, commitizen, scriv, autoflake, tbump, autopub, poetry-version-plugin, typeshed (index: url; empty cells are not shown)

url: https://github.com/open-telemetry/opentelemetry-python
black: {'line-length': 79, 'exclude': '(
/( # generated files
.tox|
venv|
.*/build/lib/.*|
exporter/opentelemetry-exporter-jaeger-proto-grpc/src/opentelemetry/exporter/jaeger/proto/grpc/gen|
exporter/opentelemetry-exporter-jaeger-thrift/src/opentelemetry/exporter/jaeger/thrift/gen|
exporter/opentelemetry-exporter-zipkin-proto-http/src/opentelemetry/exporter/zipkin/proto/http/v2/gen|
opentelemetry-proto/src/opentelemetry/proto/.*/.*|
scripts
)/
)
'}
pytest: {'ini_options': {'addopts': '-rs -v', 'log_cli': True, 'log_cli_level': 'warning'}}
...

url: https://github.com/freemocap/freemocap
taskipy: {'tasks': {'setup': 'pre-commit install', 'test': 'python -m unittest src/tests/**/test_*', 'installer': './bin/installer.sh', 'format': 'black src/'}}
...

url: https://github.com/openai/gym
pytest: {'ini_options': {'filterwarnings': ['ignore:.*step API.*:DeprecationWarning']}}
pyright: {'include': ['gym/**', 'tests/**'], 'exclude': ['**/node_modules', '**/__pycache__'], 'strict': [], 'typeCheckingMode': 'basic', 'pythonVersion': '3.6', 'pythonPlatform': 'All', 'typeshedPath': 'typeshed', 'enableTypeIgnoreComments': True, 'reportMissingImports': 'none', 'reportMissingTypeStubs': False, 'reportInvalidTypeVarUse': 'none', 'reportGeneralTypeIssues': 'none', 'reportUntypedFunctionDecorator': 'none', 'reportPrivateUsage': 'warning', 'reportUnboundVariable': 'warning'}
...

3 rows × 54 columns
f"there are {len(tools.columns)} tools used in the {len(df)} pyproject.toml files."
'there are 54 tools used in the 234 pyproject.toml files.'
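the expansion of df.tool can be tried on a toy series. this is a minimal sketch with made-up project names and configs (they are assumptions, not rows from the dataset); it shows how dropna() removes projects without a tool table and how apply(pandas.Series) turns each dictionary's keys into columns.

```python
import pandas

# toy stand-in for df.tool: one dict of tool configs per project (hypothetical data)
tool = pandas.Series(
    {
        "project-a": {"black": {"line-length": 79}, "pytest": {"ini_options": {}}},
        "project-b": {"black": {"line-length": 88}},
        "project-c": None,  # this project has no tool table at all
    },
    name="tool",
)

# dropna() removes projects without a tool table;
# apply(pandas.Series) converts each dict into a row whose columns are the tool names
toy_tools = tool.dropna().apply(pandas.Series)

print(sorted(toy_tools.columns))
print(toy_tools.shape)
```

project-b does not configure pytest, so its pytest cell is NaN; those missing cells are what the later counts rely on.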
the top12 most frequently defined tools in the pyproject.toml files are
# count the non-missing cells per tool column, most common first
tool_counts = tools.notna().sum().sort_values(ascending=False)
(top12 := tool_counts.iloc[:12]).to_frame("counts").T
        black  isort  pytest  mypy  coverage  poetry  setuptools_scm  hatch  setuptools  pylint  towncrier  pyright
counts    123     85      67    42        34      32              21     15          14      14         11       10
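tool_counts boils down to counting the non-missing cells in each column. the sketch below uses a made-up frame (the project names and configs are assumptions) to check that three spellings of that count agree: the arithmetic chain isna().astype(int).sub(1).abs().sum(), the shorter notna().sum(), and the built-in count().

```python
import pandas

# hypothetical expanded frame: a cell is missing when the project does not configure the tool
toy_tools = pandas.DataFrame(
    {
        "black": [{"line-length": 79}, {"line-length": 88}, None],
        "isort": [None, {"profile": "black"}, None],
    },
    index=["project-a", "project-b", "project-c"],
)

# missing -> 1, present -> 0; subtracting 1 and taking abs flips that to present -> 1
long_form = toy_tools.isna().astype(int).sub(1).abs().sum()

# the same count, stated directly
short_form = toy_tools.notna().sum()

# DataFrame.count() also counts non-missing cells per column
builtin = toy_tools.count()

print(long_form.to_dict(), short_form.to_dict(), builtin.to_dict())
```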
from the perspective of these popular projects:
there is strong community adoption of black (123 of the 234 files) and isort (85). from this data, formatting your code and sorting your imports looks like a recommended convention.
pytest's third-place popularity (67 files) suggests that we should test our projects.
mypy (42 files) suggests that type hinting is a feature of some popular projects.
coverage
poetry
setuptools_scm
hatch
next we look at the build systems
df["build-system"].dropna().apply(pandas.Series)
url                                                      requires                                          build-backend           dependencies
https://github.com/3b1b/manim                            [setuptools, wheel]                               NaN                     NaN
https://github.com/deepmind/hanabi-learning-environment  [setuptools, wheel, scikit-build, cmake, ninja]   NaN                     NaN
https://github.com/miguelgrinberg/Flask-SocketIO         [setuptools>=42, wheel]                           setuptools.build_meta   NaN
https://github.com/pypa/pipx                             [hatchling>=0.15.0]                               hatchling.build         NaN
https://github.com/py-pdf/PyPDF2                         [flit_core >=3.2,<4]                              flit_core.buildapi      NaN
...                                                      ...                                               ...                     ...
https://github.com/deepset-ai/haystack                   [hatchling>=1.8.0]                                hatchling.build         NaN
https://github.com/dedupeio/dedupe                       [setuptools==63, wheel, cython]                   setuptools.build_meta   NaN
https://github.com/sktime/sktime                         [setuptools>61, wheel, toml, build]               setuptools.build_meta   NaN
https://github.com/enthought/mayavi                      [oldest-supported-numpy, setuptools, vtk, wheel]  NaN                     NaN
https://github.com/holoviz/datashader                    [param, pyct, setuptools]                         setuptools.build_meta   NaN

173 rows × 3 columns
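one way to summarise a table like this is to tally the declared build backends. the sketch below is a pandas-free illustration that copies five of the rows shown above into a dictionary; the "(not declared)" label is our own placeholder for a missing build-backend value, and the tally covers only these five rows, not the full 173.

```python
from collections import Counter

# (requires, build-backend) pairs copied from rows shown in the table above;
# None stands in for NaN, i.e. the project declares no build-backend key
rows = {
    "https://github.com/3b1b/manim": (["setuptools", "wheel"], None),
    "https://github.com/miguelgrinberg/Flask-SocketIO": (["setuptools>=42", "wheel"], "setuptools.build_meta"),
    "https://github.com/pypa/pipx": (["hatchling>=0.15.0"], "hatchling.build"),
    "https://github.com/py-pdf/PyPDF2": (["flit_core >=3.2,<4"], "flit_core.buildapi"),
    "https://github.com/deepset-ai/haystack": (["hatchling>=1.8.0"], "hatchling.build"),
}

# tally the backends, labelling missing values with a placeholder of our own
backend_counts = Counter(backend or "(not declared)" for _, backend in rows.values())

for backend, count in backend_counts.most_common():
    print(backend, count)
```

on the real frame the same tally could be taken from the build-backend column directly.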
which keys go into the project tables?
df.project.dropna().apply(pandas.Series).head(0)
columns: name, description, readme, license, requires-python, keywords, authors, classifiers, dependencies, dynamic, urls, scripts, maintainers, optional-dependencies, entry-points, gui-scripts, version (index: url; head(0) shows no rows)