[ Multi-omic Analysis ] Immune Response Against COVID-19: an -omics approach
바이오 대표 2022. 1. 19. 00:08
By integrating several -omic datasets (more data, potentially more power), we can analyze complex biological big data.
( Genomics, Transcriptomics, Proteomics, Epigenomics )
* Epigenome: the collection of chemical compounds and proteins that bind directly to DNA or histone proteins and directly regulate gene expression
Abstract
The paper seeks to understand the human response to the virus (COVID-19), especially the immune response. Its goal is therefore to identify the immune genes that play a major role in the immune response to COVID-19.
Multi-omic datasets used:
- Microarray (E-MTAB-8871)
- RNA-seq (E-MTAB-9221)
- RNA-seq (GSE152418)
- ChIP-seq (GSE108881)
Introduction
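The immune response can be divided into [1] the innate immune response and [2] the adaptive immune response. The adaptive response is carried out by T cells and B cells, and T cells include CD4 T cells and CD8 T cells (cytotoxic T cells). CD4 T cells (helper T cells) secrete cytokines to trigger an immune response. However, releasing cytokines in large quantities during this process can become a problem.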
# Used -Omics Data
Dataset | Description |
---|---|
Microarray | Whole blood cells (erythrocytes, leukocytes, and platelets); transcriptomic profiles of blood samples measured with the NanoString Human Immunology V2 Panel; negative and positive controls in the probe sets (for normalization); time-series samples |
RNA-seq (PBMC samples) | Peripheral blood mononuclear cells (PBMCs): lymphocytes; sequenced with a single Illumina HiSeq2000 flow cell; COVID-19 patients in four stages (convalescent, moderate, severe, and intensive care unit) |
RNA-seq (organoid) | Organoids (lung epithelial cells) infected with SARS-CoV-2 (COVID-19); sequenced with Illumina NovaSeq 6000 |
ChIP-seq | Calu3 (human lung cancer cell line) infected with MERS-CoV (Middle East respiratory syndrome coronavirus); PMS (peripheral blood smear) sonicated and immunoprecipitated with anti-H3K4me3 and anti-H3K27me3 |
* H3K4me3: epigenomic modification on histone H3 for gene expression regulation (promotes expression)
* H3K27me3: associated with downregulation (represses expression)
* No nuclei: Erythrocytes, platelets
* Mononuclei: Lymphocytes (T cells, B cells, NK cells)
* Multi-lobed nuclei: Granulocytes (neutrophils, basophils, eosinophils)
* White blood cell: Blood cells (except Erythrocytes or platelets)
Analysis Methods
# Used R-packages and Analysis Process
Dataset | Preprocessing & Expression Analysis | Gene Set Analysis |
---|---|---|
Microarray | NanoStringR (nanoR): [0] rcc files; [1] QC, background correction, normalization (background correction by SD; geometric mean for positive control normalization; housekeeping normalization); [2] time-series categorization. limma: [3] differential expression analysis; [4] FDR correction (Benjamini-Hochberg) | goana (limma): to determine Gene Ontology terms; kegga (limma): to find over-represented pathways in DEGs |
RNA-seq (PBMC samples) | STAR: [0] reads mapped to the human genome (GRCh38) with STAR and counted using htseq-count. edgeR: [1] DGE (normalization; DE through a quasi-likelihood F-test); [2] FDR correction (Benjamini-Hochberg) | Same as above |
RNA-seq (organoid) | SRA toolkit: [1] SRA (Sequence Read Archive) --> FASTQ; [2] QC, trimming (Trimmomatic); [3] aligned onto GRCh38 (Kallisto). edgeR: [4] count normalization; [5] statistical analysis | Same as above |
ChIP-seq | SRA toolkit: [1] SRA (Sequence Read Archive) --> FASTQ; [2] QC, trimming (Trimmomatic); [3] aligned onto GRCh38 (STAR); [4] peak calling (MACS2); [5] MACS2 broad peak file --> GRanges; [6] identify the overlapping peaks with H3K27me3/H3K4me3 | Same as above |
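* Type I error (false positive) correction for multiple testing # Type I error: judging something to be true when it is actually false
[1] Benjamini-Hochberg (BH): controls the false discovery rate (FDR) https://www.youtube.com/watch?v=K8LQSvtjcEo&t=633s
[2] Bonferroni: compare each p-value against (significance level / # of tests performed), or equivalently multiply each p-value by the # of tests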
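To make the two corrections above concrete, here is a minimal pure-Python sketch of the adjusted p-values each one produces. The function names and the example p-values are made up for illustration; this is not the code limma or edgeR runs internally, only the textbook form of each procedure.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
```

With these example values, Bonferroni keeps two genes below 0.05 and BH also keeps two, but the BH-adjusted values for the third and fourth genes are much closer to the cutoff. This is the usual reason BH is preferred when thousands of genes are tested.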
* GRCh38: Genome Reference Consortium Human Build 38
* htseq-count: counts, for each gene, how many aligned reads overlap its exons
* Given an Ensembl ID or Entrez ID, the related information can be annotated through org.Hs.eg.db.
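The htseq-count note above can be pictured with a small sketch. It is a deliberately simplified model, assuming unspliced reads given as half-open (start, end) intervals on one chromosome and one strand. The gene names, coordinates, and the `count_reads` helper are hypothetical, and the real tool handles much more (strandedness, spliced alignments, several overlap modes).

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def count_reads(exons_by_gene, reads):
    """Count reads per gene; reads hitting exons of several genes are set aside."""
    counts = Counter({gene: 0 for gene in exons_by_gene})
    for r_start, r_end in reads:
        # Collect every gene with at least one exon overlapping this read.
        hit = {
            gene
            for gene, exons in exons_by_gene.items()
            if any(overlaps(r_start, r_end, s, e) for s, e in exons)
        }
        if len(hit) == 1:
            counts[hit.pop()] += 1
        elif not hit:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return dict(counts)


# Hypothetical annotation and reads (illustration only).
exons = {"geneA": [(100, 200), (300, 400)], "geneB": [(380, 500)]}
reads = [(150, 180), (190, 310), (390, 420), (600, 650)]
result = count_reads(exons, reads)
```

Here two reads are assigned to geneA, one read that touches both genes is counted as ambiguous, and one read falls outside every exon. The per-gene totals are the raw counts that a package such as edgeR then normalizes.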
# Used Visualization
Plot | Description |
---|---|
plotPCA / plotMDS | Shows correlation or clustering through dimensionality reduction |
volcanoPlot | Scatter plot of statistical significance (p-value) vs. fold change (FC) |
plotBCV (edgeR) | Shows the estimated tagwise, common, and trended dispersions |
MA plot | Log fold change vs. average expression level |
pheatmap | Shows the magnitude of a phenomenon as color |
* Tagwise: allows a different dispersion value to be used for each gene
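The MA plot and volcano plot described above both come down to simple per-gene coordinates. The sketch below computes them in plain Python. The function names, the pseudocount of 1, and the cutoffs (|log2 FC| >= 1, p < 0.05) are assumptions chosen for illustration, not values taken from the paper or from the plotting packages.

```python
import math


def ma_point(mean_control, mean_treated, pseudocount=1.0):
    """Return (A, M): average log2 expression and log2 fold change for one gene."""
    log_c = math.log2(mean_control + pseudocount)
    log_t = math.log2(mean_treated + pseudocount)
    return (log_c + log_t) / 2, log_t - log_c


def volcano_point(log2_fc, p_value):
    """Return (x, y) for a volcano plot: log2 fold change vs. -log10(p-value)."""
    return log2_fc, -math.log10(p_value)


def is_highlighted(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Flag genes that pass both the effect-size and the significance cutoff."""
    return abs(log2_fc) >= fc_cutoff and p_value < p_cutoff


# Hypothetical gene: mean expression 3 in controls and 15 in patients.
a_value, m_value = ma_point(3, 15)
x, y = volcano_point(m_value, 0.001)
```

Genes far from M = 0 on the MA plot, or in the upper left and upper right corners of the volcano plot, are the ones usually inspected first.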
Conclusion
In each dataset, protein-coding genes such as HLA genes and immune-related genes (immunoglobulin fragments (IgG receptor, IgA, IgM), B cell receptor, interleukin genes) were more highly expressed. The overall results are as follows.
[1] Genes that were significant in common across all datasets = Cytokine (HLA-DPA1, PTGER4, NFIL3)
[2] Gene set analysis confirmed that mitochondria and oxidative phosphorylation were altered in infected patients.
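The gene set result in [2] rests on an over-representation test: given a list of DEGs, is a pathway hit more often than chance would predict? A common way to score this is a one-sided hypergeometric test, sketched below. The function name and the toy numbers are hypothetical, and this is a plain illustration of the idea rather than the goana/kegga implementation.

```python
from math import comb


def over_representation_p(n_universe, n_in_set, n_de, n_de_in_set):
    """P(overlap >= n_de_in_set) when n_de genes are drawn at random
    from a universe of n_universe genes containing n_in_set pathway genes."""
    upper = min(n_in_set, n_de)
    total = comb(n_universe, n_de)
    # Sum the hypergeometric probabilities of the observed overlap or larger.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_de_in_set, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in total, a 3-gene pathway, 3 DEGs, all 3 in the pathway.
p_all_hit = over_representation_p(10, 3, 3, 3)
# Same setting, but none of the DEGs need to fall in the pathway.
p_any = over_representation_p(10, 3, 3, 0)
```

A small p-value means the pathway contains more DEGs than expected by chance. In practice the resulting p-values across many pathways are themselves adjusted for multiple testing.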
Because the four datasets were not that compatible, the results were not as powerful as expected, but using more closely related datasets should make it possible to get the full benefit of more data, more powerful results.