10 years of software development experience
With a deep passion for software development and data-driven problem-solving, I specialize in R, Python, and Linux to build efficient, scalable, and robust solutions. My expertise lies in developing data-centric applications, automating workflows, and optimizing system performance for research, analytics, and business intelligence.
Proficient in Python, I design and implement machine learning models, data pipelines, and backend services, ensuring efficiency and maintainability. In R with Bioconductor, I leverage statistical computing and visualization techniques to extract meaningful insights and develop analytical tools tailored to various industries. My fluency in Linux allows me to manage system environments, deploy applications, and automate processes, ensuring seamless integration across platforms.
I thrive in collaborative and innovative environments, constantly refining my skills to tackle complex challenges. Whether it's scripting in Bash, optimizing code performance, or architecting scalable cloud infrastructure with Docker, I am committed to building software that drives impact.
I am eager to contribute my technical expertise and problem-solving mindset to projects that push the boundaries of technology and data science.
Work experience in bioinformatics software development
My experience spans multiple roles in genomics, bioinformatics, and computational chemistry, with a strong focus on software development, data analysis, and cloud computing. At Bayer Crop Science (June to December 2020), I worked as a Genomics Pipeline Developer, using AWS tools and custom in-house software to optimize and scale bioinformatics pipelines. My contributions included expanding a pipeline from five to twenty-four modules within six months, adding capabilities such as biosynthetic gene discovery, phylogenetic annotation, and genome completeness metrics. I also developed custom data parsers and JSON Schema specifications in Python.
During my tenure at Bristol Myers Squibb (2015-2019), I served as a Research Scientist II in Genomics and Computer-Assisted Drug Design (CADD). My work encompassed bioinformatics server administration, automation of the path from sequencer output to data pipeline, and AWS integration. I developed expertise in RNA sequencing technologies, dataset quality control, and statistical modeling, and I created a version control system for large deduplicated datasets. In the CADD domain, I enhanced a theoretical chemistry framework for molecular enumeration and parameter optimization across multiple chemotypes, refining computational drug-discovery methods and improving time to discovery.
As a Graduate Research Assistant in the Papoutsakis Laboratory (2012-2015), I focused on microbial genomics and transcriptomics, using bioreactors for bacterial fermentation and RNA extraction. I designed a protocol for generating strand-specific Illumina RNA-seq libraries and ensured RNA quality with analytical checkpoints. My computational contributions included a bioinformatics pipeline that processed 1.5 billion paired-end Illumina reads and a custom genome browser written in Python (Django) and JavaScript. I also produced a transcriptome assembly from diverse culture conditions and supported the research team with computational analyses for its microbial genomics studies.
Overall, my skill set spans bioinformatics, software engineering, cloud computing (AWS), and computational drug design. I am proficient in multiple programming languages (Python, R, JavaScript/Node.js, and SQL) and have experience with computational chemistry frameworks and tools, including Schrödinger, OpenEye, and ChemAxon. My work across academia and industry reflects a strong ability to integrate software development with biological and chemical research, optimizing data pipelines and computational workflows to advance scientific discovery.
Skills Developed
Engineering skills
- MSc in Computer Science (concentration in bioinformatics, 3.96 GPA, University of Delaware, 2015)
In addition to extensive training in analytical, physical, and organic chemistry, molecular biology, and genomics, I have strong skills in programming and data science. Through my graduate concentration in bioinformatics at the University of Delaware, I have formal training in advanced calculus, machine learning, statistics, and software engineering.
I have also applied these skills in industry, leading Bristol Myers Squibb initiatives in legacy software development and maintenance, reproducible research, dashboard development, software application development, database management and data version control, and programming language adoption.
Software development
- Cloud and containers: Amazon Web Services (AWS), Docker and related container technologies, containerd, OCI, Kubernetes (k8s)
- Programming: Rust, Python/CPython, R, R Shiny, Bioconductor and R Markdown/Quarto, Bash, JavaScript/Node.js/Bun, Emacs, D3.js, Julia, Perl, awk, grep, sed, LaTeX, MATLAB, HTML5/CSS3, SQLite3/libSQL, MySQL/MariaDB, Oracle SQL, MS SQL Server, PostgreSQL, Miniconda
- Relational databases and NoSQL technologies: MongoDB, Microsoft SQL Server, Oracle SQL, MariaDB/MySQL, SQLite3/libSQL, PostgreSQL, SQLAlchemy and object-relational mapping (ORM), Rust ORMs, sqlx, rdflib, snapshotting
- Cloud computing and cloud native: GCP, Azure, AWS, S3, Elastic Compute Cloud (EC2), CodeBuild/CodeDeploy, Elastic Container Registry (ECR), Elastic Container Service (ECS), Docker, IAM, Kubernetes (k8s)
- Networking: Slurm, Sun Grid Engine (SGE/UGE), firewalld, iptables, tmux, NixOS
- Systems administration: systemd, Docker, Ansible/Chef, tmux, nginx and Apache/httpd web servers, Sun Grid Engine (SGE/UGE) and Slurm, Bash programming, parallel
Statistics and data science
- Statistics: p-values, Student's t-test, chi-square test, Fisher's exact test, hypothesis testing, ANOVA/GLM/regression, linear algebra, distribution fitting, R/RStudio/R Shiny, Bioconductor, discrete/continuous probability, multivariate statistical analysis, least-squares estimation, PCA, clustering, classification, normalization, regularization, variance reduction, naive Bayes, random forest, XGBoost
- Dimensionality reduction: PCA/SVD, Uniform Manifold Approximation and Projection (UMAP), t-SNE, canonical correlation analysis
- Optimizers: gradient descent, simplex, particle swarm, simulated annealing, self-organizing maps, and others
- Kernels: kernel trick, linear programming, matrix math, linear algebra, abstract algebra, discrete math
- Machine learning: perceptron, SVM, PCA, UMAP, t-SNE, deep neural networks, recurrent neural networks, artificial intelligence (AI), recurrence relations, linear regression and formal OLS-family assumptions and caveats, SAS programming, R programming, Python programming
- Data science: Python programming, NumPy, pandas, Jupyter notebooks, Markdown, LaTeX/Pandoc, statically typed compiled languages (rustc/cargo, Zig, TypeScript, Scala/Java/Clojure, C extensions in Python and R), RAII, CUDA, Cython, Anaconda, Alpine Linux, Docker, Kubernetes (k8s)
Bioinformatics
- Sequence alignment: HISAT2, Bowtie2, TopHat, BWA, BLAST, BLAT, ClustalO, hidden Markov models, E-values
- Assembly: Cufflinks, Trinity, Velvet
- Illumina data processing: FastQC, Samtools, Picard, Bedtools
- Miscellaneous: Tuxedo suite, samtools, DESeq, limma, Circos, Pfam, DAVID