[Single-cell multiome paper] “Dictionary learning for integrative, multimodal and scalable single-cell analysis” 2023

바이오 대표 2023. 6. 11. 11:08

Summary

Reference cell-type annotations are usually built from scRNA-seq gene expression. Annotating a new scRNA-seq dataset against such a reference is therefore fairly straightforward, but annotating data from other modalities, that is, data that carry no gene-expression measurements (scATAC-seq, scCUT&Tag, CyTOF protein), is difficult. The paper addresses this with "dictionary learning". Put simply, a dataset that measures RNA and ATAC in the same cells, such as [1] a 10x multiome dataset, is used as a bridge: with the help of [1] (scRNA + scATAC), an existing [2] RNA-annotated reference is used to annotate the cells of a separate [3] scATAC dataset.

Dictionary learning

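Dictionary learning is a form of representation learning. Representation learning, now widely used in image and genomic data analysis, transforms the input data into a different representation that is easier to analyze. In dictionary learning, each cell of the multiome bridge [1] (scRNA + scATAC) is treated as an "atom", and cells are expressed in a new dictionary-defined space (dictionary representation = a weighted linear combination of atoms). The reference [2] scRNA-seq data and the other-modality data to be annotated, [3] scATAC-seq, are then each expressed in this dictionary representation: [2] through the RNA measurements of the bridge and [3] through its ATAC measurements. Because [2] and [3] are now described by the same features (the atoms), they can be aligned and annotated using that shared representation.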

Here, the multiome cells, represented at the atom level in a space denoted L, are used to reconstruct the scRNA-seq data (Dx) and the scATAC-seq data (Dy), giving Lx and Ly.
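
To make the bridge idea concrete, here is a toy sketch in Python/NumPy. It assumes a simple generative model in which both modalities are linear readouts of a shared latent cell state, and it computes the dictionary representation as an ordinary (minimum-norm) least-squares fit of each cell onto the bridge atoms; the paper's exact solver, normalization and regularization may differ. The function name dictionary_representation, the matrices dict_rna and dict_atac, and all sizes are hypothetical.

```python
import numpy as np

def dictionary_representation(dictionary: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Express each column of X (features x cells) as weights over the atoms
    (columns) of `dictionary` (features x atoms) by minimum-norm least squares."""
    W, *_ = np.linalg.lstsq(dictionary, X, rcond=None)
    return W  # atoms x cells

rng = np.random.default_rng(0)
k, n_atoms, n_genes, n_peaks = 10, 50, 200, 300  # hypothetical toy sizes

# Toy assumption: each cell has a k-dim latent state and every modality is a
# different linear readout of that state (real RNA/ATAC data are not this clean).
A_rna = rng.normal(size=(n_genes, k))
A_atac = rng.normal(size=(n_peaks, k))
Z_bridge = rng.normal(size=(k, n_atoms))

# Bridge [1]: both modalities measured in the same cells -> one dictionary per modality.
dict_rna = A_rna @ Z_bridge    # genes x atoms
dict_atac = A_atac @ Z_bridge  # peaks x atoms

# A reference RNA cell [2] and a query ATAC cell [3] in the same state,
# plus a query ATAC cell in a different state.
z_same = rng.normal(size=(k, 1))
z_other = rng.normal(size=(k, 1))
x_ref = A_rna @ z_same
y_query = A_atac @ z_same
y_other = A_atac @ z_other

w_ref = dictionary_representation(dict_rna, x_ref)       # via the bridge's RNA
w_query = dictionary_representation(dict_atac, y_query)  # via the bridge's ATAC
w_other = dictionary_representation(dict_atac, y_other)

# All three cells now share one feature space (the atoms) and can be compared.
print("same state, RNA vs ATAC:", np.linalg.norm(w_ref - w_query))
print("different state:        ", np.linalg.norm(w_ref - w_other))
# The weights also reconstruct the measurement they came from.
print("reconstructs x_ref:", np.allclose(dict_rna @ w_ref, x_ref))
```

In this toy setting the reference RNA cell and the query ATAC cell receive numerically identical atom weights even though they share no features, while the cell in a different state does not; that shared atom space is what lets [2] and [3] be compared directly.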

Advantages of dictionary learning:

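Previously, annotating scATAC-seq data meant assuming that high chromatin accessibility implies high gene expression, building pseudo gene-expression values (gene activity scores) on that assumption, and mapping them onto an scRNA-seq reference. Dictionary learning works with the data without making this assumption.
Once the data are expressed in the dictionary representation, they are compatible with existing integration tools (Harmony, mnnCorrect, Seurat, Scanorama or scVI).
A multiome bridge with many cells, i.e. many atoms, inevitably carries a large computational burden. The method therefore computes an eigendecomposition and reduces the atom dimensionality to a selected set of eigenvectors.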
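
The eigendecomposition step can be sketched as follows. My understanding, which this post does not confirm, is that the paper builds a graph over the bridge cells and uses Laplacian eigenmaps, i.e. the eigenvectors of the graph Laplacian with the smallest eigenvalues; projecting an atoms × cells dictionary representation onto the top eigenvectors shrinks it from one coordinate per atom to a few dozen. The kNN graph, the normalized Laplacian, the random toy inputs and the helper names knn_affinity and laplacian_eigenvectors are all assumptions for illustration.

```python
import numpy as np

def knn_affinity(X: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 kNN adjacency between the columns (cells) of X."""
    n = X.shape[1]
    sq = (X * X).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a cell is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                        # symmetrize

def laplacian_eigenvectors(A: np.ndarray, n_dims: int) -> np.ndarray:
    """Eigenvectors of the normalized graph Laplacian with the smallest
    eigenvalues, skipping the first (trivial) one. Returns atoms x n_dims."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L_sym = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, evecs = np.linalg.eigh(L_sym)                 # eigenvalues in ascending order
    return evecs[:, 1:n_dims + 1]

rng = np.random.default_rng(1)
n_atoms, n_cells, n_dims = 300, 1000, 20             # hypothetical sizes
bridge_emb = rng.normal(size=(30, n_atoms))          # toy stand-in for bridge cell embeddings
W = rng.normal(size=(n_atoms, n_cells))              # toy dictionary representation

U = laplacian_eigenvectors(knn_affinity(bridge_emb, k=15), n_dims)
W_reduced = U.T @ W                                  # n_dims x cells instead of n_atoms x cells
print(W.shape, "->", W_reduced.shape)                # (300, 1000) -> (20, 1000)
```

The dense eigh call is fine for a toy; with many thousands of atoms, a sparse kNN graph and a sparse eigensolver would be the natural replacement.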

** The supplementary methods reportedly contain a detailed explanation, but I haven't been able to find where they are. If you find them, I'd appreciate it if you left a comment.

Example from the paper (scATAC-seq annotation)

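The mapped scATAC-seq data not only recover all of the cluster/annotation information from the original study, but also resolve rare or high-resolution subpopulations such as CD14+ vs CD15+ cells, which are not found when the scATAC-seq data are clustered unsupervised on their own.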

(a) scRNA-seq reference of human BMMCs ('Azimuth reference'; 297,627 cells). (b) scATAC-seq of human BMMCs. (c) Mapped scATAC-seq. The multiome data used as the bridge for this transformation is a 10x dataset released publicly for NeurIPS 2021.
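
Once the reference and the mapped query sit in the same space, annotation amounts to transferring labels from nearby reference cells. The sketch below uses a plain k-nearest-neighbor majority vote on toy 2-D embeddings; it is swapped in for illustration and is not Seurat's anchor-based label transfer. The function knn_label_transfer, the cell-type labels and the coordinates are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_label_transfer(ref_emb, ref_labels, query_emb, k=5):
    """Give each query cell the majority label of its k nearest reference
    cells. Embeddings are cells x dims in the shared (dictionary) space."""
    # Squared Euclidean distances, query x reference.
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for row in nn:
        votes = Counter(ref_labels[i] for i in row)
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

rng = np.random.default_rng(2)
# Toy reference: two well-separated labeled clusters of 50 cells each.
centers = {"T cell": np.array([0.0, 0.0]), "B cell": np.array([5.0, 5.0])}
ref_emb = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers.values()])
ref_labels = np.array([name for name in centers for _ in range(50)])

# Toy mapped query cells, already in the same space as the reference.
query_emb = np.array([[0.2, -0.1], [4.8, 5.3]])
preds = knn_label_transfer(ref_emb, ref_labels, query_emb)
print(preds)  # ['T cell' 'B cell']
```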

When there are millions of cells:

The sketch approach introduced in Seurat v5 is used.

1. Extract a sketch of cells (5,000 cells from each dataset).
2. Learn the dictionary representation separately for each dataset (so the datasets can be processed in parallel).
3. Combine the datasets (integration).
4. Obtain the harmonized atoms (dictionary representation).
5. Annotate.
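
Steps 1 to 4 can be sketched end to end on toy data (annotation, step 5, is left out). Several pieces are assumptions rather than details from the post: cells are sampled by leverage score (my understanding is that this is the default in Seurat v5 sketching, with uniform sampling as an alternative), each cell's dictionary weights are constrained to sum to 1 by appending a constant row before a least-squares fit, and "integration" of the atoms is a crude per-dataset mean shift standing in for Harmony or similar methods. The helper names leverage_sketch and affine_dictionary_weights and all sizes are hypothetical.

```python
import numpy as np

def leverage_sketch(X, m, rng):
    """Assumption: pick m cells (columns of X, features x cells) with
    probability proportional to their statistical leverage score."""
    M = (X - X.mean(axis=1, keepdims=True)).T        # cells x features, centered
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    lev = (U ** 2).sum(axis=1)                       # leverage score per cell
    return rng.choice(X.shape[1], size=m, replace=False, p=lev / lev.sum())

def affine_dictionary_weights(S, X):
    """Weights W (atoms x cells) with S @ W ~= X and each column summing to 1,
    enforced by appending a constant row to both matrices (our own choice)."""
    S_aug = np.vstack([S, np.ones((1, S.shape[1]))])
    X_aug = np.vstack([X, np.ones((1, X.shape[1]))])
    W, *_ = np.linalg.lstsq(S_aug, X_aug, rcond=None)
    return W

rng = np.random.default_rng(3)
p, n, m = 5, 1000, 100   # hypothetical: 5 dims, 1000 cells and 100 atoms per dataset
datasets = [rng.normal(size=(p, n)), rng.normal(size=(p, n)) + 3.0]  # batch 2 is shifted

sketches, weights = [], []
for X in datasets:       # steps 1-2: independent per dataset, so parallelizable
    S = X[:, leverage_sketch(X, m, rng)]
    sketches.append(S)
    weights.append(affine_dictionary_weights(S, X))

# Steps 3-4: integrate only the small atom sets. Crude stand-in for a real
# method: shift each dataset's atoms so all atom sets share the global atom mean.
global_mean = np.hstack(sketches).mean(axis=1, keepdims=True)
harmonized = [S - S.mean(axis=1, keepdims=True) + global_mean for S in sketches]

# Reconstruct every cell from its own dataset's harmonized atoms.
integrated = [H @ W for H, W in zip(harmonized, weights)]

gap_before = np.linalg.norm(datasets[0].mean(axis=1) - datasets[1].mean(axis=1))
gap_after = np.linalg.norm(integrated[0].mean(axis=1) - integrated[1].mean(axis=1))
print(f"batch gap before: {gap_before:.2f}, after: {gap_after:.2f}")
```

The expensive integration touches only the m atoms per dataset; every remaining cell is recovered with one matrix product against its dataset's weights, and because those weights sum to 1, a shift applied to the atoms carries over exactly to every cell.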

** For now, PCA still has to be run on all of the data.
