Python programming environment
A Data Science for Python and R, statistics, modeling, calculation, algorithms, and other effects
Python
The Python programming language is a dynamically typed and simple language for the x86_64 system and beyond. It has a good ecosystem for data scientists, so long as their dependencies are properly audited.
Data scientists use Python for routine programming tasks, prototyping, and even production code, if you want that. It is simple to pick up if you come from most C, Smalltalk, or Perl procedural/imperative styles. It has a class system, and Python style often suggests that Object-Oriented Programming design principles be used wherever there is nonsense.
What makes Python shine, even in 2024, is the reasonable run time for most well-engineered questions, often targeting the under-30-minute to 30-hour timeframe, with rapid iteration and prototyping, testing, feedback, documentation, version control, and more.
I personally recommend following PyPA and PyPI societal conventions, the PEPs, and Python style conventions wherever you can. Docstrings should be 100% Sphinx compatible. Ship tests, a pyproject.toml (or setup.py), a config.py, and keep the tests compatible with both pytest and unittest. So you'll have to read a lot of dumb stuff to do a very few things to obey the societal expectations. Seriously…
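To make the docstring and test conventions above concrete, here is a minimal sketch. The function `gc_content` and its test class are hypothetical names invented for illustration; the docstring uses the reStructuredText field lists that Sphinx renders, and the test is a plain `unittest.TestCase`, which pytest can also collect.

```python
import unittest


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence.

    :param sequence: Nucleotide string such as ``"ACGT"`` (case-insensitive).
    :type sequence: str
    :returns: GC fraction between 0.0 and 1.0.
    :rtype: float
    :raises ValueError: If ``sequence`` is empty.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


class TestGcContent(unittest.TestCase):
    """Runs under `python -m unittest` and is also collected by pytest."""

    def test_half_gc(self):
        self.assertAlmostEqual(gc_content("ACGT"), 0.5)

    def test_empty_raises(self):
        with self.assertRaises(ValueError):
            gc_content("")
```

Writing tests against `unittest.TestCase` is one way to stay compatible with both runners; pytest-only features such as fixtures would break that.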
The current flavor of configuring a virtual environment to run your Django app, build your cyclotron's network firewall, or otherwise create a tool or application for download over PyPI is to use the PyPA tool pipenv.
pipenv - Python version and dependency management
Don't use pipenv. In fact, don't use keyboards, because writing code on paper is much faster. /s (-_-,)
The Python interpreter, pip3, and other system-level Python dependencies can be left to the user to install. Another third-party solution is pyenv, which pipenv recognizes: when the requested Python version is not already installed, pipenv will offer to have pyenv download a local version of Python into $HOME/.pyenv for the user to use with a virtualenv, typically in $HOME/.local/share/virtualenvs. Note that pipenv records dependencies and the required Python version in its own Pipfile rather than in the PEP 621 metadata of pyproject.toml.
pipenv --python 3.12.4
This will create a virtual environment (virtualenv) for your project's dependencies in your home directory, bind the environment to the user-supplied interpreter version, and essentially support shell sessions for your Python environment.
pipenv shell
Now you're all set! Go ahead and enjoy developing with a managed Python virtual environment, courtesy of the PyPA tool pipenv and yours truly.
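If you want to confirm that a shell session really is using the managed environment, a small check like the following can help. This is a sketch that relies only on the standard library; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("version:", ".".join(map(str, sys.version_info[:3])))
print("inside a virtualenv:", in_virtualenv())
```

Run it once inside and once outside the environment; the interpreter path and the last line should differ.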
requirements.txt | setup.py | pyproject.toml - PyPI package dependencies
When I started programming in Python, all pip package dependencies were stored in a requirements.txt and distributed with the source, to be installed on the user's system via build/dist facilities like setuptools, distutils, wheel, and pip, which typically came with the Python installation and were managed by the user.
Now, setup.py-based installs have been superseded by pyproject.toml and the PEP 621 standard for packaging metadata, along with compatible toolchains.
Let's look at some of the packages available to install with pip install. If you don't need a package, leave it out: minimize your dependencies.
Package list
NumPy
numpy>=1.21.2
NumPy remains an essential tool for manipulating numbers with good (up to 64-bit) precision.
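To put a number on the precision remark, here is a short sketch comparing 32-bit and 64-bit floats. It assumes NumPy is installed; the variable names are arbitrary.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable float.
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print("float32 eps:", eps32)
print("float64 eps:", eps64)

# A tiny increment survives in float64 but is rounded away in float32.
tiny = 1e-10
kept_in_64 = (np.float64(1.0) + np.float64(tiny)) != np.float64(1.0)
kept_in_32 = (np.float32(1.0) + np.float32(tiny)) != np.float32(1.0)
print("kept in float64:", bool(kept_in_64))
print("kept in float32:", bool(kept_in_32))

# The largest value a signed 64-bit integer can hold.
max_int64 = np.iinfo(np.int64).max
print("int64 max:", max_int64)
```

Choosing the dtype explicitly matters when memory is tight, since float32 halves the footprint at the cost shown above.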
PyYAML
PyYAML>=6.0.1
Yep. Yet another YAML requirement.
jsonschema
jsonschema>=4.17.3
jsonschema, not because I plan on blasting this on the web or making any APIs. Mostly because it's a web-centric data-structure specification middleware, with JSON structure and typing facilities made possible via language-specific deserializations. Nope, I just followed along with what people told me. It still has no use to me to this day. Been using it for about 17 years now.
setuptools
setuptools>=69.2.0
I like to note a minimum version of setuptools to make sure my code stays compatible with however other systems have configured their Python environment.
Cython
Cython>=3.0.8
Yes, I occasionally cythonize. I don't often need the kind of performance that really justifies Cythonization. I support other statically typed languages such as Haskell, Rust, and TypeScript for strong implementations with performance priorities considered. I don't support any of them. In fact, I don't support anything I've ever said except for like 1 or 2 poems. My whole life is a lie, a joke, and a waste. Entirely. I pray to god like once a month that something bad will happen to me, and that's the only time I pray at all.
BioPython
biopython>=1.81
BioPython provides some features for sequence import that are often wrapped by data-science and bioinformatics codebases for flexibility and interop with standard life science, bioinformatics, and computational biology data and file formats.
SciPy
scipy>=1.7.3
SciPy is used by real-world data scientists for some matrix facilities, distances, vector operations, SVD/PCA facilities, and more. Nah, it is… but… well??
SciKit-Learn
scikit-learn==1.0.2
SciKit-Learn is used for its unsupervised and supervised learning features: you model your data using their implementations.
It's essential for modeling, unless more specialized tools are needed.
matplotlib
matplotlib>=3.5.3
matplotlib is still a relevant graphics engine. If you're interested in data-driven graphics in Python, I suggest you start here.
Pandas
pandas>=2.2.2
Pandas, because we don't need the performance of Polars. Pandas is a DataFrame library with fair query performance, subsetting, and index-centered manipulations on a DataFrame representation of a more general matrix (such as numpy.ndarray).
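The relationship between a DataFrame and a plain matrix can be sketched as follows. This assumes NumPy and pandas are installed; the sample values, index labels, and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows are samples, columns are measurements.
matrix = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
df = pd.DataFrame(matrix, index=["s1", "s2", "s3"], columns=["a", "b"])

# Index-centered access: pull one row by its label.
row = df.loc["s2"]

# Boolean subsetting and a query expression select the same rows.
subset = df[df["a"] > 1.0]
queried = df.query("a > 1.0")

# Drop back to a plain ndarray when a general matrix is needed.
back = df.to_numpy()

print(row)
print(subset)
```

The labels are what you pay for: the ndarray is leaner, while the DataFrame lets you subset by name instead of position.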