Why I do genomics projects and blogging

Bioinformatics overview: what does coding for biology mean?

If you are curious about the latest developments in AI, machine learning (ML), genomics, transcriptomics, bioinformatics, and computational biology, and why software development skills are essential for modern data science workloads, then you’re in good hands. Here I review my projects and skill sets and describe the topics for the blog.

What is bioinformatics?

Bioinformatics is the application of mathematics, statistical methods, and software engineering to the latest problems in genomics and beyond. We use programming languages and computer systems (typically Linux) to solve problems that arise during the vetting/QC and analysis of biotechnological datasets.

If you’ve ever spoken with a bioinformatician, or you are a practicing biologist, you’ll have noticed that they talk about sequencing and sequences quite a bit. In biology, there are two types of informational polymers in the human cell: proteins (chains of amino acids) and nucleic acids (DNA and RNA). Each can be written down as a sequence of letters, which is what makes them so amenable to computation.
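To make the idea of "sequences" concrete, here is a minimal Python sketch that treats DNA, mRNA, and protein as plain strings. The function names `transcribe` and `translate` and the tiny `CODON_TABLE` are my own illustrative choices, not a real library's API, and the table deliberately covers only the five codons used in the example rather than the full genetic code.

```python
# Toy illustration: biological sequences represented as plain strings.
# CODON_TABLE is intentionally partial (only the codons used below);
# a real analysis would use the full 64-codon genetic code.
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop codon
}


def transcribe(dna: str) -> str:
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTAAATGGTAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCUAAAUGGUAA
print(protein)  # MAKW
```

In practice you would reach for an established library rather than a hand-rolled table, but the point stands: once a molecule is a string, ordinary programming tools apply.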

What is machine learning and AI?

Machine learning is an umbrella term encompassing methods that learn from data in order to compute, recall, and predict on inputs they haven’t seen before. Basically, if you study the patterns long enough and encounter enough variance throughout a dataset, you may be able to understand and use those patterns in a predictive manner. Broadly speaking, Artificial Intelligence (AI) is the wider field, and most of what is called AI today is deep learning: machine learning methods built on neural network architectures such as the convolutional neural network, the recurrent neural network, and the transformer.
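As a rough illustration of "learning a pattern and predicting on unseen input", here is a small sketch in plain Python that fits a straight line to a handful of points by ordinary least squares and then predicts a value it was never shown. The data are made up for the example, and `fit_line` and `predict` are hypothetical helper names; real ML work would typically use a library and far more data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: y is roughly 2 * x with some noise.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]

model = fit_line(train_x, train_y)
prediction = predict(model, 5.0)  # x = 5.0 was not in the training data
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}")
print(f"prediction for x=5.0: {prediction:.2f}")  # about 9.85
```

Neural networks follow the same fit-then-predict loop; they simply replace the straight line with a much more flexible function.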

AI is on the minds of investors, researchers, engineers, and management alike. It can augment researchers’ toolkits to produce code in less time, use pattern recognition and vector encodings to assess data in unforeseen ways, and interpret real-world data to identify patterns that might be impossible for humans to discern.

Why is bioinformatics in the news?

mRNA vaccines have been in the news lately. The technology revolves around packaging mRNA that encodes a viral protein and delivering it into the body’s cells. Those cells, which are not infected, translate the mRNA into the viral protein and display its epitopes, prompting a strong immune response to that protein.

By creating antibodies that recognize and target specific viral shapes, the immune system can mount rapid responses to similar epitopes during a true infection.

Where else is bioinformatics in the news?

Bioinformatics is in the news whether you recognize it or not. DeepMind created the AlphaFold artificial intelligence program using deep learning and transformer-style architectures, training its neural network(s) on over 170k protein structures from the Protein Data Bank (PDB), alongside protein sequence databases such as UniProt, to predict the folded structures of unknown or newly sequenced proteins.

Illumina has come to dominate the conversation around sequencing whole genomes (WGS), epigenomes (ATAC-seq), exomes (WES), mRNA transcriptomes (RNA-seq), transcription factor binding sites (ChIP-seq), and more. By using a sequencing-by-synthesis method, where fluorescently labeled nucleotides are imaged as they are incorporated during strand extension, Illumina sequencers can read billions of base pairs of DNA. An unprecedented explosion of biological information is at our fingertips, and sequencing helps researchers uncover secrets of human genomics, viral evasion/adaptation mechanisms, biodiversity and ecology, and much more.

What is your background?

I have been writing academic and professional software since 2015, and my career has revolved around reading and interpreting analytical methods, genomic data, and biological regulatory, protein, and motif networks. Check out my websites to learn more about me.

This post is licensed under CC BY 4.0 by the author.