The overall mission of the Biodata Mining and Discovery Section is to assist and to participate in biomedical research with Data Science and Bioinformatics approaches in support of the ultimate research goals of the NIAMS IRP.
We currently integrate computational methods for NGS data processing and analysis into a wide range of biological and biomedical studies, focusing on research projects based on WES, ChIP-Seq, ATAC-Seq, RNA-Seq, and single-cell RNA-Seq.
We also develop new data analysis strategies and customized computational solutions, and conduct research to evaluate emerging methods and techniques in the rapidly evolving field of applied bioinformatics and computational biology.
Since 2015, our team has accomplished the following:
- Provided bioinformatics support to over 160 researchers from 47 different labs
- Processed and analyzed NGS data from more than 15,000 samples
- Co-authored 58 research publications
- Provided one-on-one data analysis training and mentoring to more than 70 researchers
The tools and utilities listed below have been developed by our team to support the ongoing research of the NIAMS.
- PAPST (Peak Assignment and Profile Search Tool)
- A Java desktop program for both gene-centric and peak-centric ChIP-Seq and ATAC-Seq data analysis. Learn more about the program.
- DNA-Seq/Mutational data analysis pipeline
- A genetics data analysis pipeline, initially developed and used in a paper published in the New England Journal of Medicine.
- ChIP-Seq data analysis pipeline
- A Snakemake pipeline for fundamental ChIP-Seq data analysis including trimming, mapping, peak calling, and bigWig file generation for data visualization.
- ATAC-Seq data analysis pipeline
- A customized Snakemake pipeline for ATAC-Seq data analysis, initially developed and implemented for analyzing data in a paper published in Cell.
- RNA-Seq data analysis pipeline
- A Snakemake pipeline for fundamental RNA-Seq data analysis including trimming, mapping, PCR effect assessment, and gene expression value calculation.
- scRNA-Seq data analysis pipeline
- A Cell Ranger-based pipeline for 10x Genomics single-cell data processing and analysis, initially implemented for analyzing data in a paper published in Nature Immunology.
- CITE-Seq data analysis pipeline
- A Cell Ranger based pipeline implemented for customized CITE-Seq data processing and analysis.
- Hi-C data analysis pipeline
- A Juicer-based pipeline that performs paired-end (PE) read alignment, data filtering, data binning, and data normalization.
- Bi-Seq data analysis pipeline
- A Bismark package based pipeline for DNA methylation data analysis.
- Enrichr based pathway analysis R code
- R code that facilitates Enrichr-based pathway analysis and visualization.