Benchmarking Aligners

1 minute read

Published: December 23, 2024

Short-read aligners sit at the core of modern bioinformatics pipelines. Strictly speaking, they map rather than align: each short read (typically produced by an Illumina sequencer) is assigned to its most likely locus of origin, using gapped or ungapped methods against a genomic or transcriptomic reference. For a list of aligners and short-read mappers with their pros and cons, see the Wikipedia article.

Why bother benchmarking decade-old software?

The short answer: these programs are still heavily used by bioinformatics groups, even in 2024. Short-read aligners are immensely popular, with user bases in the tens of thousands, and they are notably fast at approximate alignment (often called read mapping) to a reference genome or transcriptome.

How this analysis works

For a fair comparison, I’ll benchmark 5 or 6 short-read aligners, all of which can be installed with conda from an Anaconda/miniconda environment. Install your aligner of choice as follows (bowtie2 from the bioconda channel is shown here).

conda install -c bioconda bowtie2

Next, check out the GitHub repository. It includes a miniconda 3.12 environment.yml file that should reproduce my Anaconda/miniconda environment, along with a shell script (-? or --help for usage) for running the analysis. PicardTools provides a command called ‘CollectAlignmentSummaryMetrics’, which is the simplest way to report the percentage of reads aligned in tabular format, using one consistent definition of alignment across all the aligners.
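To make the comparison step concrete, here is a minimal sketch of pulling the aligned fraction out of a CollectAlignmentSummaryMetrics report. It is not the script from the repository. It assumes the usual Picard metrics layout: a `## METRICS CLASS` line, then a tab-separated header, then one row per CATEGORY, ended by a blank line. The function names and the sample values are hypothetical, and the sample keeps only a few of the real columns.

```python
import csv
import io


def read_alignment_summary(text):
    """Parse Picard metrics text into a list of dicts, one per CATEGORY row."""
    lines = text.splitlines()
    # The column header is assumed to follow the "## METRICS CLASS" line.
    start = next(
        i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS")
    ) + 1
    table = []
    for line in lines[start:]:
        if not line.strip():  # a blank line ends the metrics table
            break
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


def fraction_aligned(rows, category="PAIR"):
    """Return PCT_PF_READS_ALIGNED (a fraction between 0 and 1) for one category."""
    for row in rows:
        if row["CATEGORY"] == category:
            return float(row["PCT_PF_READS_ALIGNED"])
    raise KeyError(f"no {category!r} row in metrics table")


# Made-up values for illustration only; real reports carry many more columns.
SAMPLE = (
    "## METRICS CLASS\tpicard.analysis.AlignmentSummaryMetrics\n"
    "CATEGORY\tTOTAL_READS\tPF_READS_ALIGNED\tPCT_PF_READS_ALIGNED\n"
    "FIRST_OF_PAIR\t1000\t950\t0.95\n"
    "SECOND_OF_PAIR\t1000\t930\t0.93\n"
    "PAIR\t2000\t1880\t0.94\n"
    "\n"
)

rows = read_alignment_summary(SAMPLE)
print(f"PAIR aligned: {fraction_aligned(rows):.1%}")
```

Running the same parser over one report per aligner gives directly comparable numbers, because Picard applies the same definition of an aligned read to every BAM file. For single-end data the category would be UNPAIRED rather than PAIR.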


Thoughts on ML deployments, containerized workflows, and notebooks.

6 minute read

Published: February 06, 2025

This article aims to bring the reader up to date (ca. 2017-2022) on modern cloud-native, scalable solutions for data science and natural science research application stacks, using the Docker standard for container specification (Singularity, Podman, or containerd containers are equally valid). First, I will briefly describe the goal of Docker containers. Next, I’ll touch on the Kubernetes architecture for distributed data processing and application service management. Finally, I’ll describe code repositories, container registries, and Markdown/Rmarkdown/LaTeX documentation as they pertain to a service’s lifespan, with respect to notebooks and the documentation of custom services and their orchestration.

Matt Ralston
