Omics Data Analysis in R
As New Yorkers, we live in a big biotech and healthcare hub with the boom of high-throughput sequencing technology generating rich characteristics present in a sample (called “Omics” data). You may first hear the word “Omics” and wonder, what are those characteristics in an omics dataset? Why study them? Simply speaking, we explore a pool of variables representing the genetic material, RNA, protein, metabolite, and/or other molecules to get insights into how they interact with each other and are related to health and disease conditions. Bioinformatics and biostatistics are two essential subjects for such exploration as they provide specialized computational and statistical theories and applications to manage, analyze, and interpret omics data. As a self-taught quantitative biology enthusiast straddling bioinformatics and biostatistics, I largely benefit from exploring a wealth of programming and omics analysis resources. I am writing this post for folks who want a taste of the latest single cell omics data analysis in R on their own.
Getting Started in R
Our R community has built a robust analytical framework for omics data, powered by its extensible statistical computing and a wide variety of bioinformatics packages. Before diving headfirst into specialized omics data analysis, it is nice to get a basic handle on the awesome R language. Here I list some learning resources where you can start right:
DataCamp: An interactive course tailored for beginners. It covers R programming techniques from the basics of data structure to the common analytical techniques.
R for Data Science: A free online book written by Hadley Wickham and Garrett Grolemund for getting up to speed with R’s capabilities in data manipulation and visualization.
useR! Conference and R-Ladies NYC: They are the fantastic place to catch the latest development and technical know-how in R. Look out for any talks, workshops, and blog posts that could be helpful for your R learning journey.
Bioinformatics Resources
Once you have got the hang of the R basics and even leveled up to the advanced R programming, feel free to dig deep into the bioinformatics resources below for single cell omics data analysis:
Bioconductor: This is a must-have resource for bioinformatics analysis, offering versatile R software packages for the single cell molecular data analysis. It is not just software. Bioconductor also brings together online books, workshops, and seminars in biological data science.
Seurat: A widely used R toolkit that is all about single-cell genomics, with the latest features involving the integration of single cell RNA-seq and ATAC-seq analysis, and the scalable analysis for millions of cells.
Bioinformatics Data Skills: This book helps readers lay a solid foundation in Unix and shares best practices in scientific computing.
Bioconductor, Biostars, SEQanswers: You will find tons of insightful questions about analyzing and interpreting the data generated from high-throughput technology, plus answers straight from the tool developers and other bioinformatics professionals.