[ Single-Cell Analysis ] Conquering 10x Cell Ranger 1

바이오 대표 2023. 3. 27. 13:28

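What is Cell Ranger? (v7.1)

Cell Ranger is a set of analysis pipelines for 10x Genomics Chromium single-cell data (sequenced on Illumina instruments). It aligns reads, generates feature-barcode matrices, and performs clustering and other secondary analysis. When handling data with Cell Ranger, the following five pipelines are the main ones available.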

cellranger mkfastq
Demultiplexes BCL (raw base call) files into FASTQ files; it wraps Illumina's bcl2fastq.
cellranger count
The most commonly used pipeline when you already have single-cell FASTQ files from an Illumina sequencer. It performs alignment, filtering, barcode counting, and UMI counting, and uses the results to generate the feature-barcode matrix (counts).
cellranger multi
A pipeline for analyzing multiplexed data (samples that were pooled together and therefore need demultiplexing).
cellranger aggr
Normalizes the count outputs of multiple runs (to equalize sequencing depth) and combines them.
cellranger reanalyze
Re-runs the analysis on existing count data with adjusted parameter settings.
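As a rough illustration of how the most common pipeline, cellranger count, is typically invoked, here is a minimal Python sketch that only assembles the command line and does not run it. The flags `--id`, `--transcriptome`, `--fastqs`, and `--sample` are standard `cellranger count` arguments, but every path and name in the usage part is a hypothetical placeholder, and the helper `build_count_command` is invented for this example.

```python
import shlex


def build_count_command(run_id, transcriptome, fastqs, sample):
    """Assemble an argument list for `cellranger count` (does not execute it)."""
    return [
        "cellranger", "count",
        f"--id={run_id}",                  # name of the output folder
        f"--transcriptome={transcriptome}",  # path to the reference transcriptome
        f"--fastqs={fastqs}",              # folder containing the FASTQ files
        f"--sample={sample}",              # sample name prefix of the FASTQ files
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not real paths.
    cmd = build_count_command(
        run_id="run_count_sample1",
        transcriptome="/path/to/reference",
        fastqs="/path/to/fastqs",
        sample="sample1",
    )
    # Print the command so it can be inspected before running it in a shell.
    print(shlex.join(cmd))
```

Building the command as a list first makes it easy to review, log, or pass to `subprocess.run` once the placeholders are replaced with real values.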

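** Usually, 1 GEM well → 1 library (or 1 sample)

** The "1 sample, multiple GEM wells" case described on the 10x website means running the experiment two or more times with a single sample. A single-cell sample contains hundreds of thousands to millions of cells, but a single GEM well typically yields data for only about 10,000 to 20,000 cells, so the same sample is loaded into several GEM wells to capture more cells. Sequencing a library several times (multiple flow cells) is a separate case: that is what increases sequencing depth.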

Input FASTQ format

[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz

Where Read Type is one of:

I1: Sample index read (optional) - 8nt i7 sample index
I2: Sample index read (optional)
R1: Read 1 - cell barcode (16nt) + UMI seq (10nt)
R2: Read 2 - cDNA (98nt 3’ to 5’)

The details can differ slightly depending on the Illumina sequencer. The description above applies to Single Cell 3’ v2.
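To make the naming convention and the Read 1 layout above concrete, here is a small Python sketch. It assumes the file name pattern shown above (generalized so the sample number after `_S` can be any integer, which is an assumption) and the 3’ v2 layout of a 16 nt barcode followed by a 10 nt UMI. The helper names `parse_fastq_name` and `split_read1` are made up for this example.

```python
import re

# Pattern for [Sample Name]_S<n>_L00<lane>_<read type>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read_type>I1|I2|R1|R2)_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split a FASTQ file name into sample, sample number, lane, and read type."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognized FASTQ file name: {filename}")
    return match.groupdict()


def split_read1(seq, barcode_len=16, umi_len=10):
    """Split a Read 1 sequence into (cell barcode, UMI); defaults follow 3' v2."""
    if len(seq) < barcode_len + umi_len:
        raise ValueError("Read 1 is shorter than barcode + UMI")
    return seq[:barcode_len], seq[barcode_len:barcode_len + umi_len]


if __name__ == "__main__":
    # Hypothetical file name and a toy Read 1 sequence.
    print(parse_fastq_name("sample1_S1_L001_R1_001.fastq.gz"))
    print(split_read1("A" * 16 + "C" * 10))
```

For other chemistries the barcode and UMI lengths can be passed in explicitly instead of relying on the defaults.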

For an explanation of the BAM files, https://davetang.org/muse/2018/06/06/10x-single-cell-bam-files/ is highly recommended!!
expression/documentation/steps/sequencing/sequencing-requirements-for-single-cell-3

Gene Expression Algorithms Overview

Let's take a closer look at the steps Cell Ranger goes through when it performs alignment, trimming, and counting.

** Introns = intragenic regions: located within genes but spliced out of the transcript before a protein is made (they can include regulatory elements such as enhancers)

** Intergenic regions: located between genes (sometimes loosely called "junk DNA")

Alignment

Read Trimming

The full-length cDNA construct carries a 30 bp template switch oligo (TSO) sequence at the 5’ end and a poly-A tail at the 3’ end. Removing the TSO and the 3’ poly-A before alignment improves the sensitivity of the assay.

** In the BAM file, the ts:i and pa:i tags show the number of trimmed TSO nucleotides and the number of trimmed poly-A nucleotides, respectively.
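The idea behind the trimming step and the two counts it reports can be sketched as follows. This is only an illustration, not Cell Ranger's actual algorithm: it trims the TSO only on an exact prefix match, treats any trailing run of A's of at least `min_poly_a` bases as poly-A (the threshold of 10 is a made-up value), and the TSO string in the usage part is a toy sequence, not the real oligo.

```python
def trim_read(seq, tso, min_poly_a=10):
    """Trim a leading TSO and a trailing poly-A run.

    Returns (trimmed sequence, ts, pa), where ts and pa mimic the
    counts of trimmed TSO and poly-A nucleotides.
    """
    ts = 0
    # Simplification: only an exact TSO match at the 5' end is trimmed.
    if tso and seq.startswith(tso):
        seq = seq[len(tso):]
        ts = len(tso)

    # Count the run of A's at the 3' end.
    without_tail = seq.rstrip("A")
    pa = len(seq) - len(without_tail)
    if pa < min_poly_a:
        # Too short to be treated as a poly-A tail; keep the read as is.
        return seq, ts, 0
    return without_tail, ts, pa


if __name__ == "__main__":
    toy_tso = "ACGTACGT"  # toy placeholder, NOT the real TSO sequence
    read = toy_tso + "GGCCTT" + "A" * 12
    print(trim_read(read, toy_tso))
```

A real implementation would also tolerate mismatches and partial TSO or poly-A matches, which this sketch deliberately leaves out.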

Genome Alignment

Reads are aligned with STAR, a splice-aware aligner, and the GTF (transcript annotation) is then used to classify each read as exonic, intronic, or intergenic. A read is classified as exonic if at least 50% of it intersects an exon. Of the remaining reads, those that intersect an intron are classified as intronic, and the rest as intergenic.
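The classification rule described above can be written down as a short Python sketch. It assumes half-open coordinates (start inclusive, end exclusive), non-overlapping exon intervals, and a read that aligns as one contiguous block; `classify_read` and the interval lists are hypothetical names for this example, not part of Cell Ranger.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def classify_read(read_start, read_end, exons, introns):
    """Classify an aligned read as 'exonic', 'intronic', or 'intergenic'.

    exons and introns are lists of (start, end) tuples on the same chromosome.
    """
    read_len = read_end - read_start
    if read_len <= 0:
        raise ValueError("read_end must be greater than read_start")

    # Exonic: at least 50% of the read's bases intersect an exon.
    exonic_bases = sum(overlap(read_start, read_end, s, e) for s, e in exons)
    if exonic_bases >= 0.5 * read_len:
        return "exonic"

    # Otherwise intronic if the read touches any intron.
    if any(overlap(read_start, read_end, s, e) > 0 for s, e in introns):
        return "intronic"

    # Everything else is intergenic.
    return "intergenic"


if __name__ == "__main__":
    exons = [(100, 200)]    # toy annotation
    introns = [(200, 300)]
    for start, end in [(150, 250), (180, 280), (400, 500)]:
        print((start, end), classify_read(start, end, exons, introns))
```

The order of the checks matters: the exon test comes first, so a read that touches both an exon and an intron is only called intronic when less than half of it lies in exons.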

MAPQ adjustment

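When a read aligns to a single exonic locus but also to one or more non-exonic loci, the exonic locus is prioritized: the read is treated as confidently mapped to the exonic locus and is given MAPQ = 255.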

Transcriptome alignment

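Only reads that map to a single gene are used for UMI counting.
Setting include-introns = false is possible, which excludes intronic reads from counting (from Cell Ranger 7.0 on, intronic reads are counted by default).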

10x barcode correction

The barcode that was read is compared against the known barcode sequences (the barcode whitelist file) and corrected. If the observed barcode is not on the whitelist but is one Hamming distance away from whitelist barcodes, it is compared against those whitelist barcodes and replaced with the one that has the highest posterior probability.

BAM tags → CR: original (as-read) barcode, CB: corrected barcode
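A minimal sketch of this kind of correction is shown below. The scoring is an assumption made for illustration: each whitelist candidate at Hamming distance 1 is weighted by how often that barcode was observed (plus a pseudocount of 1) times the error probability implied by the Phred quality of the differing base, and the weights are normalized into a posterior. The exact prior, pseudocount, and any acceptance threshold used by Cell Ranger are not taken from this post, and `correct_barcode` is a hypothetical helper.

```python
def hamming_neighbors(barcode, alphabet="ACGT"):
    """Yield (position, sequence) for every sequence one substitution away."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield i, barcode[:i] + alt + barcode[i + 1:]


def correct_barcode(raw, quals, whitelist_counts):
    """Return (corrected barcode or None, posterior probability).

    raw: observed barcode (CR); quals: Phred scores per base;
    whitelist_counts: dict mapping whitelist barcode -> observed count.
    """
    if raw in whitelist_counts:
        return raw, 1.0  # already a valid barcode, nothing to correct

    weights = {}
    for pos, candidate in hamming_neighbors(raw):
        if candidate in whitelist_counts:
            # Probability that the base at `pos` was misread.
            p_error = 10 ** (-quals[pos] / 10)
            # Assumed prior: observed abundance plus a pseudocount of 1.
            weights[candidate] = (whitelist_counts[candidate] + 1) * p_error

    if not weights:
        return None, 0.0  # no whitelist barcode within one mismatch

    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


if __name__ == "__main__":
    counts = {"AAAA": 100, "AAAT": 1, "CCCC": 50}  # toy 4 nt whitelist
    print(correct_barcode("AAAG", [30, 30, 30, 10], counts))
```

In the toy run, both AAAA and AAAT are one mismatch away from AAAG, and the far more abundant AAAA ends up with the higher posterior.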

UMI counting

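Reads mapped to the transcriptome are first collected into groups that share the same barcode, UMI, and gene annotation. If two groups have the same barcode and gene but their UMIs are one Hamming distance apart, one of the UMIs is assumed to be a sequencing error, and the UMI of the less-supported group is replaced with the UMI with higher support. After UMI correction, the reads are grouped again by barcode, UMI, and gene annotation. If reads with the same barcode and the same UMI carry different gene annotations, only the annotation with the most supporting reads is used for UMI counting. If no single annotation is clearly better supported (a tie), all of those read groups are discarded.

The two filtering steps above → unfiltered feature-barcode matrix

** The filtered feature-barcode matrix contains only the barcodes detected as cell-associated.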
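The two steps described above (UMI correction, then resolving conflicting gene annotations) can be sketched in Python as follows. It is a simplified illustration under stated assumptions: each read is represented as a `(barcode, UMI, gene)` tuple, "support" is the number of reads in a group, and a UMI is merged into its best-supported neighbor only when that neighbor has strictly more reads. `count_umis` is a hypothetical function, not Cell Ranger code.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_umis(reads):
    """Return a Counter mapping (barcode, gene) -> number of distinct UMIs."""
    groups = Counter(reads)  # (barcode, umi, gene) -> supporting reads

    # Step 1: UMI correction within each (barcode, gene).
    by_bc_gene = defaultdict(dict)
    for (bc, umi, gene), n in groups.items():
        by_bc_gene[(bc, gene)][umi] = n

    corrected = Counter()
    for (bc, gene), umi_counts in by_bc_gene.items():
        remap = {}
        for umi, n in umi_counts.items():
            better = [v for v, m in umi_counts.items()
                      if m > n and len(v) == len(umi) and hamming(umi, v) == 1]
            if better:
                remap[umi] = max(better, key=umi_counts.get)
        for umi, n in umi_counts.items():
            target = umi
            while target in remap:  # always moves to higher support, so it ends
                target = remap[target]
            corrected[(bc, target, gene)] += n

    # Step 2: for each (barcode, UMI), keep the best-supported gene only.
    by_bc_umi = defaultdict(dict)
    for (bc, umi, gene), n in corrected.items():
        by_bc_umi[(bc, umi)][gene] = n

    matrix = Counter()
    for (bc, umi), gene_counts in by_bc_umi.items():
        top = max(gene_counts.values())
        winners = [g for g, n in gene_counts.items() if n == top]
        if len(winners) == 1:  # a tie means the whole group is discarded
            matrix[(bc, winners[0])] += 1
    return matrix


if __name__ == "__main__":
    toy_reads = [("BC1", "AAAA", "G1")] * 3 + [("BC1", "AAAT", "G1")]
    print(dict(count_umis(toy_reads)))
```

In the toy input the single AAAT read is folded into the better-supported AAAA group, so the barcode ends up with one UMI for that gene instead of two.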
