analyzing configurations of popular projects

`get_pyproject_config_data` loads a dataframe created in a [previous post].
it contains the results of a <var>graphql</var> query posted to the github api
that returns the pyproject files for some of the most popular python projects on github.

[previous post]: 2022-12-09-pyproject-analysis.ipynb

    def get_pyproject_config_data() -> "pandas.DataFrame":
        import importnb  # lets `import` load notebooks as modules
        with importnb.Notebook():
            from __09_pyproject_analysis import tidy_configs, tidy_responses, gather, pyproject_query
        # gather the query responses, then tidy them into one row per project
        return tidy_configs(tidy_responses(gather(pyproject_query, max=15)))

the shape of our dataframe, `df` from `get_pyproject_config_data`, is:

* on the rows: one project per row, indexed by its github `url`
* on the columns: the top-level keys found in each project's `pyproject.toml`
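
to make that layout concrete, here is a minimal sketch that builds a frame of the same shape from two hand-written configs. the urls and config values are made up for illustration; the real frame comes from the github query, not from this code.

```python
import pandas

# hypothetical, hand-written configs standing in for parsed pyproject.toml files
configs = {
    "https://example.com/project-a": {"tool": {"black": {"line-length": 88}}},
    "https://example.com/project-b": {
        "build-system": {"requires": ["setuptools", "wheel"]},
        "project": {"name": "project-b"},
    },
}

# one record per project: the top-level keys become columns,
# and a key that a project does not define is filled with NaN
toy = pandas.DataFrame(
    list(configs.values()),
    index=pandas.Index(list(configs), name="url"),
)
print(toy.shape)
print(sorted(toy.columns))
```

each cell holds the whole nested table for that key, which is why the later steps have to expand columns like `tool` again.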

    import importnb, pandas
    display((df := get_pyproject_config_data()).head(3))
    F"there are {len(df)} `pyproject.toml` files in the dataset."

| url | tool | build-system | project | mypy | flake8 |
|---|---|---|---|---|---|
| https://github.com/open-telemetry/opentelemetry-python | {'black': {'line-length': 79, 'exclude': '( /( # generated files .tox\| venv\| ./build/lib/.\| exporter/opentelemetry-exporter-jaeger-proto-grpc/src/opentelemetry/exporter/jaeger/proto/grpc/gen\| exporter/opentelemetry-exporter-jaeger-thrift/src/opentelemetry/exporter/jaeger/thrift/gen\| exporter/opentelemetry-exporter-zipkin-proto-http/src/opentelemetry/exporter/zipkin/proto/http/v2/gen\| opentelemetry-proto/src/opentelemetry/proto/./.\| scripts )/ ) '}, 'pytest': {'ini_options': {'addopts': '-rs -v', 'log_cli': True, 'log_cli_level': 'warning'}}} | NaN | NaN | NaN | NaN |
| https://github.com/freemocap/freemocap | {'taskipy': {'tasks': {'setup': 'pre-commit install', 'test': 'python -m unittest src/tests/*/test_', 'installer': './bin/installer.sh', 'format': 'black src/'}}} | NaN | NaN | NaN | NaN |
| https://github.com/3b1b/manim | NaN | {'requires': ['setuptools', 'wheel']} | NaN | NaN | NaN |

'there are 234 `pyproject.toml` files in the dataset.'

### what tools are used most?

PEPXXX defines the `tool` key as the place where third-party applications can store their configuration.

when we expand `df.tool` into `tools` we get a frame with one column for every third-party tool named in the dataset.

    (tools := df.tool.dropna().apply(pandas.Series)).head(3).fillna("")

| url | black | pytest | taskipy | pyright | ... |
|---|---|---|---|---|---|
| https://github.com/open-telemetry/opentelemetry-python | {'line-length': 79, 'exclude': '( /( # generated files .tox\| venv\| ./build/lib/.\| exporter/opentelemetry-exporter-jaeger-proto-grpc/src/opentelemetry/exporter/jaeger/proto/grpc/gen\| exporter/opentelemetry-exporter-jaeger-thrift/src/opentelemetry/exporter/jaeger/thrift/gen\| exporter/opentelemetry-exporter-zipkin-proto-http/src/opentelemetry/exporter/zipkin/proto/http/v2/gen\| opentelemetry-proto/src/opentelemetry/proto/./.\| scripts )/ ) '} | {'ini_options': {'addopts': '-rs -v', 'log_cli': True, 'log_cli_level': 'warning'}} | | | ... |
| https://github.com/freemocap/freemocap | | | {'tasks': {'setup': 'pre-commit install', 'test': 'python -m unittest src/tests/*/test_', 'installer': './bin/installer.sh', 'format': 'black src/'}} | | ... |
| https://github.com/openai/gym | | {'ini_options': {'filterwarnings': ['ignore:.step API.:DeprecationWarning']}} | | {'include': ['gym/', 'tests/'], 'exclude': ['/node_modules', '/__pycache__'], 'strict': [], 'typeCheckingMode': 'basic', 'pythonVersion': '3.6', 'pythonPlatform': 'All', 'typeshedPath': 'typeshed', 'enableTypeIgnoreComments': True, 'reportMissingImports': 'none', 'reportMissingTypeStubs': False, 'reportInvalidTypeVarUse': 'none', 'reportGeneralTypeIssues': 'none', 'reportUntypedFunctionDecorator': 'none', 'reportPrivateUsage': 'warning', 'reportUnboundVariable': 'warning'} | ... |

3 rows × 54 columns
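
the expansion used for `tools` can be tried on a toy series. this is a sketch with made-up urls and tool tables, not the real data; it only shows how `apply(pandas.Series)` turns the keys of each dict into columns.

```python
import pandas

# hypothetical `tool` tables for two made-up projects
tool = pandas.Series(
    [
        {"black": {"line-length": 88}, "pytest": {"ini_options": {"addopts": "-v"}}},
        {"black": {"line-length": 79}, "isort": {"profile": "black"}},
    ],
    index=pandas.Index(
        ["https://example.com/project-a", "https://example.com/project-b"], name="url"
    ),
)

# each dict becomes a row; the union of all the keys becomes the columns
expanded = tool.apply(pandas.Series)
print(sorted(expanded.columns))
print(expanded.notna())
```

a tool that a project does not configure shows up as `NaN` in that project's row, which is what the counting step relies on.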

    F"there are {len(tools.columns)} tools used in the {len(df)} pyproject.toml files."

'there are 54 tools used in the 234 pyproject.toml files.'

the `top12` most frequently defined tools across the `pyproject.toml` files are:

    # a non-null cell means the project configures that tool, so count them per column
    tool_counts = tools.notna().sum().sort_values(ascending=False)
    (top12 := tool_counts.iloc[:12]).to_frame("counts").T

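| | black | isort | pytest | mypy | coverage | poetry | setuptools_scm | hatch | setuptools | pylint | towncrier | pyright |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| counts | 123 | 85 | 67 | 42 | 34 | 32 | 21 | 15 | 14 | 14 | 11 | 10 |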
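
the tally itself is simple: count each tool once for every project that defines it. the sketch below does the same thing without pandas, using `collections.Counter` on made-up tool tables; the project data is hypothetical.

```python
from collections import Counter

# hypothetical `tool` tables for three made-up projects;
# None marks a project whose pyproject.toml has no `tool` key
tool_tables = [
    {"black": {}, "isort": {}, "pytest": {}},
    {"black": {}, "mypy": {}},
    None,
]

# skip projects without a `tool` table, then count each tool name once per project
tool_counts = Counter(name for table in tool_tables if table for name in table)
print(tool_counts.most_common(2))
```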

from the perspective of these popular projects:

* `black` and `isort` are widely adopted: 123 and 85 of the 234 files configure them.
  from this data, formatting your code and sorting your imports look like recommended conventions.
* `pytest`'s third-place popularity suggests that testing is an expected part of a project.
* `mypy` suggests that type hinting is a feature of some popular projects.
* `coverage` 
* `poetry`
* `setuptools_scm`
* `hatch`

next we find

### build systems

    df["build-system"].dropna().apply(pandas.Series)

| url | requires | build-backend | dependencies |
|---|---|---|---|
| https://github.com/3b1b/manim | [setuptools, wheel] | NaN | NaN |
| https://github.com/deepmind/hanabi-learning-environment | [setuptools, wheel, scikit-build, cmake, ninja] | NaN | NaN |
| https://github.com/miguelgrinberg/Flask-SocketIO | [setuptools>=42, wheel] | setuptools.build_meta | NaN |
| https://github.com/pypa/pipx | [hatchling>=0.15.0] | hatchling.build | NaN |
| https://github.com/py-pdf/PyPDF2 | [flit_core >=3.2,<4] | flit_core.buildapi | NaN |
| ... | ... | ... | ... |
| https://github.com/deepset-ai/haystack | [hatchling>=1.8.0] | hatchling.build | NaN |
| https://github.com/dedupeio/dedupe | [setuptools==63, wheel, cython] | setuptools.build_meta | NaN |
| https://github.com/sktime/sktime | [setuptools>61, wheel, toml, build] | setuptools.build_meta | NaN |
| https://github.com/enthought/mayavi | [oldest-supported-numpy, setuptools, vtk, wheel] | NaN | NaN |
| https://github.com/holoviz/datashader | [param, pyct, setuptools] | setuptools.build_meta | NaN |

173 rows × 3 columns

### into the projects?

    df.project.dropna().apply(pandas.Series).head(0)

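| url | name | description | readme | license | requires-python | keywords | authors | classifiers | dependencies | dynamic | urls | scripts | maintainers | optional-dependencies | entry-points | gui-scripts | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|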
