
[ Multi-omic Analysis ] Immune Response Against COVID-19: an -omics approach


바이오 대표 2022. 1. 19. 00:08

 

By integrating multiple -omic datasets (more data, potentially more power), we can analyze complex biological big data.

( Genomics, Transcriptomics, Proteomics, Epigenomics )

* Epigenome: the collection of chemical compounds and proteins that bind directly to DNA or histone proteins and directly regulate gene expression

 

 

 Abstract

The aim is to understand the human response to the virus (COVID-19), especially the immune response. The goal of the paper is therefore to identify the immune genes that play a major role in the immune response caused by COVID-19.

 

Multi-omic datasets used:

  • Microarray (E-MTAB-8871)
  • RNA-seq (E-MTAB-9221)
  • RNA-seq (GSE152418)
  • ChIP-seq (GSE108881)

 

 Introduction 

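The immune response can be divided into [1] the innate immune response and [2] the adaptive immune response. The adaptive response is carried out by T cells and B cells, and T cells include CD4 and CD8 T cells (CD8 T cells are cytotoxic T cells, which are distinct from NK cells). CD4 T cells (helper T cells) secrete cytokines, which bring about the immune response. However, this process can become a problem when cytokines are secreted in large amounts.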

 

# Used -Omics Data

Microarray
 # Whole blood cells: erythrocytes, leukocytes, and platelets
 - Transcriptomic profiles of blood sampled via the NanoString Human Immunology V2 Panel
 - Negative and positive controls in the probe sets (for normalization)
 - Time-series samples

RNA-seq (PBMC samples)
 # Peripheral blood mononuclear cells (PBMCs): lymphocytes
 - Sequenced w/ a single Illumina HiSeq2000 flow cell
 - COVID patients in four stages (convalescent, moderate, severe, and intensive care unit)

RNA-seq (organoid)
 # Organoids (lung epithelial cells) infected with COVID
 - Sequenced w/ Illumina NovaSeq 6000

ChIP-seq
 # Calu3 (human lung cancer cells) infected w/ MERS-CoV (Middle East respiratory syndrome coronavirus)
 - PMS (peripheral blood smear) sonicated and immunoprecipitated w/ anti-H3K4me3 and anti-H3K27me3
   * H3K4me3: epigenomic modification on histone H3 for gene expression regulation (promotes expression)
   * H3K27me3: associated with downregulation (represses expression)

* No nuclei: erythrocytes, platelets

* Mononuclear: lymphocytes (T cells, B cells, NK cells)

* Multi-lobed nuclei: granulocytes (neutrophils, basophils, eosinophils)

* White blood cells: blood cells other than erythrocytes and platelets

 

 

 Analysis Methods  

# Used R-packages and Analysis Process

Microarray
 Preprocessing & expression analysis:
  NanoStringR (nanoR)
   [0] rcc files
   [1] QC, background correction, normalization
        * background correction by SD
        * geometric mean for positive control normalization
        * housekeeping normalization
   [2] Time-series categorization
  limma
   [3] Differential expression analysis
   [4] FDR correction (Benjamini-Hochberg)
 Gene set analysis:
  goana (limma)
   - to determine Gene Ontology terms
  kegga (limma)
   - to find over-represented pathways among the DEGs

RNA-seq (PBMC samples)
 Preprocessing & expression analysis:
  STAR
   [0] Reads mapped to the human genome (GRCh38)
        - counted w/ STAR using htseq-count
  edgeR
   [1] DGE
        - Normalization
        - DE through a quasi-likelihood F-test
   [2] FDR correction (Benjamini-Hochberg)
 Gene set analysis: same as above (goana, kegga)

RNA-seq (organoid)
 Preprocessing & expression analysis:
  SRA toolkit
   [1] SRA (Sequence Read Archive) --> Fastq
   [2] QC, trimming (Trimmomatic)
   [3] Aligned onto GRCh38 (kallisto)
  edgeR
   [4] Count normalization
   [5] Statistical analysis
 Gene set analysis: same as above (goana, kegga)

ChIP-seq
 Preprocessing & expression analysis:
  SRA toolkit
   [1] SRA (Sequence Read Archive) --> Fastq
   [2] QC, trimming (Trimmomatic)
   [3] Aligned onto GRCh38 (STAR)
   [4] Peak calling (MACS2)
   [5] MACS2 broad peak file --> GRanges
   [6] Identify the overlapping peaks between H3K27me3 and H3K4me3
 Gene set analysis: same as above (goana, kegga)

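Step [6] of the ChIP-seq workflow above is essentially an interval-intersection problem. The following is a minimal pure-Python sketch of that idea, not the GRanges-based R code used in the project. The function name `overlap_peaks` and the toy coordinates are hypothetical, and peaks are assumed to be BED-style half-open intervals.

```python
from collections import defaultdict


def overlap_peaks(peaks_a, peaks_b):
    """Return the regions shared between two peak sets.

    Each peak is a (chrom, start, end) tuple with half-open
    coordinates [start, end), as in BED-style peak files.
    """
    # Index the second peak set by chromosome so we only compare
    # peaks that live on the same chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    shared = []
    for chrom, a_start, a_end in peaks_a:
        for b_start, b_end in by_chrom[chrom]:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:  # non-empty intersection
                shared.append((chrom, lo, hi))
    return sorted(shared)


# Toy peaks with made-up coordinates (not taken from GSE108881).
h3k4me3 = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 300)]
h3k27me3 = [("chr1", 400, 1000), ("chr2", 400, 800)]
shared_regions = overlap_peaks(h3k4me3, h3k27me3)
```

The nested loop is quadratic per chromosome, which is fine for a toy example; for genome-scale peak sets an interval tree or a sorted sweep would be the usual choice.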
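* Multiple-testing correction for Type I errors    # Type I error: judging something to be true when it is actually false (a false positive)

[1] Benjamini-Hochberg (BH): controls the FDR (false discovery rate)  https://www.youtube.com/watch?v=K8LQSvtjcEo&t=633s 

[2] Bonferroni: controls the family-wise error rate (significance threshold = alpha / # of tests performed; equivalently, adjusted p-value = original p-value x # of tests performed)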
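To make the difference between the two corrections concrete, here is a small pure-Python sketch of both adjustments. It is only an illustration: in the actual analysis the correction was done inside limma and edgeR, and the function names and toy p-values below are made up.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (often called q-values).

    Each p-value is scaled by m / rank, then a running minimum is
    taken from the largest p-value down so that the adjusted values
    never decrease as the raw p-values increase.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (made up for illustration).
pvals = [0.001, 0.008, 0.02, 0.03, 0.6]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
n_sig_bonf = sum(p < 0.05 for p in bonf)
n_sig_bh = sum(p < 0.05 for p in bh)
```

With these toy values, BH keeps four tests below 0.05 while Bonferroni keeps only two, which is the usual pattern: Bonferroni is the more conservative of the two.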

 

* GRCh38: Genome Reference Consortium Human Build 38

* htseq-count: counts for each gene how many aligned reads overlap its exons

* When you know an Ensembl ID or Entrez ID, you can annotate it with the related information through org.Hs.eg.db.
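The htseq-count note above can be illustrated with a toy counter. This is a simplified sketch of the idea, not the tool's implementation: it ignores strand and alignment quality, and reads that overlap exons of more than one gene are simply skipped, loosely mirroring how htseq-count sets ambiguous reads aside. The gene names and coordinates are hypothetical.

```python
def count_reads_per_gene(exons, reads):
    """Count, for each gene, how many reads overlap its exons.

    exons: list of (gene_id, chrom, start, end), half-open intervals
    reads: list of (chrom, start, end), half-open intervals
    """
    counts = {gene: 0 for gene, _chrom, _start, _end in exons}
    for r_chrom, r_start, r_end in reads:
        # Collect every gene with at least one exon overlapping this read.
        hit_genes = {
            gene
            for gene, chrom, start, end in exons
            if chrom == r_chrom and r_start < end and start < r_end
        }
        # Count the read only when it maps to exactly one gene.
        if len(hit_genes) == 1:
            counts[hit_genes.pop()] += 1
    return counts


# Toy annotation and reads (made-up names and coordinates).
exons = [
    ("geneA", "chr1", 100, 200),
    ("geneA", "chr1", 300, 400),
    ("geneB", "chr1", 380, 500),
]
reads = [
    ("chr1", 150, 180),  # inside geneA exon 1
    ("chr1", 190, 310),  # spans both geneA exons, still one gene
    ("chr1", 390, 395),  # overlaps geneA and geneB: ambiguous, skipped
    ("chr1", 450, 480),  # inside geneB
    ("chr1", 600, 650),  # no feature
    ("chr2", 150, 180),  # different chromosome
]
gene_counts = count_reads_per_gene(exons, reads)
```

The resulting gene-by-count table is the kind of input that edgeR then normalizes and tests.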

 

# Used Visualization

plotPCA / plotMDS
 Shows correlation or clustering via dimensionality reduction
volcanoPlot
 Scatter plot of statistical significance (P-value) vs fold change (FC)
plotBCV (edgeR)
 Shows the estimated tagwise, common, and trended dispersions
 * Tagwise: allows a different dispersion value to be used for each gene
MA plot
 Log fold change vs average expression level
pheatmap
 Heatmap that shows the magnitude of a phenomenon as color
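The volcano and MA plots above are built from the same few per-gene quantities. The sketch below shows how those coordinates could be computed for one gene. It is an illustration under assumptions: the function name, the pseudocount of 1, and the toy expression values are mine, while in the project the values came from limma and edgeR.

```python
import math


def plot_coordinates(mean_a, mean_b, pvalue, pseudocount=1.0):
    """Per-gene coordinates for volcano and MA plots.

    mean_a, mean_b: mean expression in condition A (reference) and B
    pvalue: p-value from the differential expression test
    pseudocount: assumed offset to avoid log2(0)
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return {
        "logFC": log_b - log_a,                # x of volcano, y of MA plot
        "A": (log_a + log_b) / 2,              # x of MA plot: average log expression
        "neg_log10_p": -math.log10(pvalue),    # y of volcano plot
    }


# Toy values (made up): expression goes from 7 to 31 with p = 0.001.
point = plot_coordinates(mean_a=7.0, mean_b=31.0, pvalue=0.001)
```

A volcano plot draws `logFC` against `neg_log10_p`, so strongly changed and highly significant genes end up in the upper corners; an MA plot draws `A` against `logFC`.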

 

 

 Conclusion 

In each dataset, protein-coding genes such as HLA and immune-related genes (immunoglobulin fragments (IgG receptor, IgA, IgM), B cell receptor, interleukin genes) were more highly expressed. The overall results are as follows.

[1] Genes that were significant in common across all datasets: cytokine-related (HLA-DPA1, PTGER4, NFIL3)

[2] Gene set analysis confirmed that mitochondria and oxidative phosphorylation were altered in infected patients.

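Because the four datasets were not that compatible, the results were not as powerful as expected, but with more closely related datasets it should be possible to get much closer to "more data, more powerful result".

* Limitation: this project was carried out (2020.10) less than a year after COVID-19 emerged (2019.12), so there was not enough data available.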