
Python programming environment

Data science with Python and R: statistics, modeling, calculation, algorithms, and other effects


Python

The Python programming language is a dynamically typed and simple language for the x86_64 system and beyond. It has a good ecosystem for data scientists, so long as their dependencies are properly audited.

Data scientists use Python for routine programming tasks, prototyping, and even production code if you want that. It is easy to pick up coming from most C, Smalltalk, and Perl procedural/imperative styles. It has a class system, and Python style often suggests applying Object-Oriented Programming design principles wherever they make sense.

What makes Python shine, in 2024 as in 2018, is the reasonable run time for most well-engineered questions (often targeting anywhere from under 30 minutes to 30 hours), together with rapid iteration and prototyping, testing, feedback, documentation, version control, and more.

I personally recommend following PyPA and PyPI societal conventions, the PEPs, and Python style conventions wherever you can. Docstrings should be 100% Sphinx-compatible. Provide tests, a pyproject.toml (or setup.py), and a config.py, and keep your tests compatible with both pytest and unittest. So you’ll have to read a lot of dumb stuff to do a very few things to obey the societal expectations. Seriously…

The current flavor of configuring a virtual environment to run your Django app, build your cyclotron’s network firewall, or otherwise create a tool or application for download over PyPI is to use the PyPA tool pipenv.

pipenv - Python version and dependency management

Don’t use pipenv. In fact, don’t use keyboards, because writing code on paper is much faster. \s (-_-,)

The Python interpreter, pip3, and other system-level Python dependencies can be left to the user to install. Another third-party solution is pyenv, which pipenv recognizes: if the Python version your project requires (recorded in its Pipfile) isn’t available, pipenv will offer to install it through pyenv under $HOME/.pyenv, for use with a virtualenv that typically lives in $HOME/.local/share/virtualenvs.

pipenv --python 3.12.4

This creates a virtual environment (virtualenv) for your project’s dependencies in your home directory and binds the environment to the requested interpreter version. To open a shell session inside that environment, run:

pipenv shell
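Once the shell is active, you can confirm from Python itself that you are running inside the environment. A minimal sketch, assuming a modern venv/virtualenv-style environment (pipenv builds on virtualenv), where `sys.prefix` points at the environment while `sys.base_prefix` still points at the base interpreter; `in_virtualenv` is a hypothetical helper name, not part of pipenv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv/virtualenv-style environment.

    In such environments sys.prefix is the environment directory, while
    sys.base_prefix keeps pointing at the interpreter it was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"inside a virtualenv: {in_virtualenv()}")
```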

Now you’re all set! Go ahead and enjoy developing with a managed Python virtual environment, courtesy of the PyPA tool pipenv and yours truly.

requirements.txt | setup.py | pyproject.toml - PyPI package dependencies

When I started programming in Python, all pip package dependencies were listed in a requirements.txt and distributed with the source, then installed on the user’s system via build/distribution facilities like setuptools, distutils, wheel, and pip (shipped with the Python distribution), all managed by the user.

Now, setup.py-based installation has been deprecated in favor of pyproject.toml, PEP 621 standard project metadata, and the packaging toolchains that support it.

Let’s look at some of the packages available to install with pip install. Minimize your dependencies: if you don’t need a package, don’t add it.

Package list

NumPy

numpy>=1.21.2

NumPy remains an essential tool for manipulating numeric arrays with good precision (64-bit floating point by default).
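To make the "64-bit" part concrete: NumPy’s default floating-point dtype is float64 (IEEE 754 double precision), which is precise but still binary floating point. A small sketch:

```python
import numpy as np

# NumPy's default floating-point dtype is IEEE 754 double precision (float64).
values = np.array([0.1, 0.2, 0.3])
print(values.dtype)  # float64

# Machine epsilon for float64: the gap between 1.0 and the next representable float.
print(np.finfo(np.float64).eps)

# Binary floating point cannot represent 0.1 exactly, so compare with a tolerance.
total = values[0] + values[1]
print(total == values[2])            # False
print(np.isclose(total, values[2]))  # True
```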

PyYAML

PyYAML>=6.0.1

Yep. Yet another YAML requirement.
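A typical use is loading a config file. A minimal sketch, assuming a hypothetical config layout; `yaml.safe_load` only builds plain Python objects, which makes it the sensible default for files you did not write yourself.

```python
import yaml  # provided by the PyYAML package

# Hypothetical config text, e.g. the contents of a config.yaml file.
CONFIG_TEXT = """
model:
  name: logistic_regression
  max_iter: 1000
features:
  - sepal_length
  - petal_width
"""

# safe_load constructs only basic types (dict, list, str, int, float, ...),
# unlike yaml.load with the full/unsafe loaders, which can build arbitrary objects.
config = yaml.safe_load(CONFIG_TEXT)
print(config["model"]["max_iter"])  # 1000
print(config["features"])           # ['sepal_length', 'petal_width']

# Round-trip back to YAML text, preserving key order.
print(yaml.safe_dump(config, sort_keys=False))
```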

jsonschema

jsonschema>=4.17.3

jsonschema, not because I plan on blasting this on the web or making any APIs. Mostly, it’s because JSON Schema is a web-centric data-structure specification with JSON structure and typing facilities, made usable from each language through its own deserializers. Nope, I just followed along with what people told me. It still has no use to me to this day. Been using it for about 17 years now.
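For what it’s worth, here is what the library actually does: validate a deserialized Python object against a schema. A minimal sketch with a hypothetical record schema.

```python
import jsonschema

# A hypothetical schema describing one record of a dataset.
SCHEMA = {
    "type": "object",
    "properties": {
        "sample_id": {"type": "string"},
        "score": {"type": "number", "minimum": 0},
    },
    "required": ["sample_id", "score"],
}

good = {"sample_id": "S1", "score": 0.92}
bad = {"sample_id": "S2", "score": -1}

# validate() returns None on success and raises ValidationError otherwise.
jsonschema.validate(instance=good, schema=SCHEMA)

try:
    jsonschema.validate(instance=bad, schema=SCHEMA)
except jsonschema.ValidationError as err:
    print("invalid record:", err.message)
```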

setuptools

setuptools>=69.2.0

I like to pin a minimum setuptools version to make sure my code stays compatible with other users’ Python environments.

Cython

Cython>=3.0.8

Yes, I occasionally cythonize, though I rarely need enough performance to really benefit from Cythonization. I support other statically typed languages such as Haskell, Rust, and TypeScript for strong implementations where performance is a priority. I don’t support any of them. In fact, I don’t support anything I’ve ever said except for like 1 or 2 poems. My whole life is a lie, a joke, and a waste. Entirely. I pray to god like once a month that something bad will happen to me and that’s the only time I pray at all.

BioPython

biopython>=1.81

BioPython provides features for sequence import and parsing that are often wrapped by data-scientist and bioinformatician codebases for flexibility and interop with standard life-science, bioscience, bioinformatics, and computational-biology data and file formats.
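Sequence import in practice usually means `Bio.SeqIO`. A minimal sketch that parses a small, hypothetical FASTA string from memory (in real code you would pass a file path) and computes a GC fraction by hand.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA content; in practice you would pass a file path instead.
FASTA = """>seq1 example sequence one
ATGCGTACGTTAG
>seq2 example sequence two
GGCCTTAA
"""

# SeqIO.parse yields SeqRecord objects lazily, one per FASTA entry.
records = list(SeqIO.parse(StringIO(FASTA), "fasta"))

for record in records:
    # GC fraction computed manually from base counts.
    gc = sum(record.seq.count(base) for base in "GC") / len(record.seq)
    print(record.id, len(record.seq), f"GC={gc:.2f}")
```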

SciPy

scipy>=1.7.3

SciPy is used by real-world data scientists for matrix facilities, distances, vector operations, SVD/PCA facilities, and more.
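A short sketch of two of those facilities: pairwise distances with `scipy.spatial.distance`, and an SVD of a centered data matrix with `scipy.linalg`, which is the core computation behind PCA. The three collinear points are made up for illustration.

```python
import numpy as np
from scipy.linalg import svd
from scipy.spatial.distance import pdist, squareform

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Pairwise Euclidean distances, expanded into a square matrix.
distances = squareform(pdist(points, metric="euclidean"))
print(distances)

# Thin SVD of the centered data; the rows of vt are the principal axes (PCA).
centered = points - points.mean(axis=0)
u, s, vt = svd(centered, full_matrices=False)
explained_variance = s**2 / (len(points) - 1)
print("principal axes:\n", vt)
print("explained variance:", explained_variance)
```

Because the points lie on one line, all the variance lands on the first axis and the second singular value is (numerically) zero.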

SciKit-Learn

scikit-learn==1.0.2

SciKit-Learn is used for unsupervised and supervised learning: it provides implementations of common models and the operations for fitting them to data.

It’s essential for modeling, unless more specialized tools are needed.
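A minimal sketch of both modes on the iris dataset bundled with scikit-learn (so nothing is downloaded): a supervised classifier scored on a held-out split, and unsupervised k-means on the same features. The model choices are illustrative, not a recommendation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The iris dataset ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Supervised: fit a classifier on a training split, score it on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")

# Unsupervised: cluster the same features without looking at the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print("cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
```

Every estimator follows the same `fit` / `predict` / `score` pattern, which is a large part of why the library is so easy to swap models in and out of.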

matplotlib

matplotlib>=3.5.3

matplotlib is still a relevant plotting engine. If you’re interested in data-driven graphics in Python, I suggest you start here.
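A starting point, as a minimal sketch: the object-oriented `fig, ax` interface with the non-interactive Agg backend, which renders without a display (handy on servers and in CI). The figure is written to an in-memory buffer here; pass a filename to `savefig` to write to disk instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.legend()
fig.tight_layout()

# Write the figure as PNG into an in-memory buffer.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
print(f"PNG size: {len(buffer.getvalue())} bytes")
```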

Pandas

pandas>=2.2.2

Pandas, because we don’t need the performance of Polars. Pandas is a DataFrame library with fair query performance, subsetting, and index-centered manipulations on a DataFrame representation of a more general matrix (such as a numpy.ndarray).
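A small sketch of those three things (querying, subsetting, and index-centered access) on a DataFrame built directly from a NumPy matrix. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Build a DataFrame from a plain NumPy matrix plus row and column labels.
matrix = np.array([[5.1, 0.2], [7.0, 1.4], [6.3, 2.5], [4.9, 0.1]])
df = pd.DataFrame(
    matrix,
    columns=["sepal_length", "petal_width"],
    index=pd.Index(["a", "b", "c", "d"], name="sample"),
)

# Boolean-expression query followed by column subsetting.
wide_petals = df.query("petal_width > 1.0")[["petal_width"]]
print(wide_petals)

# Index-centered access: label-based lookup with .loc.
print(df.loc["c", "sepal_length"])  # 6.3

# Derive a category column, then aggregate per group.
df["size"] = np.where(df["sepal_length"] > 6.0, "large", "small")
print(df.groupby("size")["petal_width"].mean())
```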

This post is licensed under CC BY 4.0 by the author.