[Peak Annotation] Peak annotation with HOMER - annotatePeaks.pl/loadGenome.pl


바이오 대표 2023. 5. 16. 08:42

How to annotate peaks with HOMER

HOMER version: v4.11

annotatePeaks.pl <peak/BED file> <genome>  >  <output file>

<peak/BED file>: either a BED file or a HOMER peak file.

The documentation says BED files should have at least 6 columns, but a file with only 4 columns (chr, start, end, strand) is also accepted.
- Column1: chromosome
- Column2: starting position
- Column3: ending position
- Column4: Unique Peak ID
- Column5: not used
- Column6: Strand (+/- or 0/1, where 0="+", 1="-")
HOMER peak files (tab-separated text files):
- Column1: Unique Peak ID
- Column2: chromosome
- Column3: starting position
- Column4: ending position
- Column5: Strand (+/- or 0/1, where 0="+", 1="-")
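Because the two formats put the same information in a different column order, converting between them is simple. The following is a minimal Python sketch (the function name bed_to_homer_peaks is hypothetical, not part of HOMER) that rearranges BED records into the HOMER peak-file column order listed above. It assumes tab-separated input, generates a peak ID when column 4 is missing, maps 0/1 strands to +/-, and treats an unknown strand such as "." as "+", which is an assumption of this sketch rather than documented HOMER behaviour.

```python
def bed_to_homer_peaks(bed_lines):
    """Rearrange BED records into HOMER peak-file rows.

    BED:   chr, start, end, [peak ID], [not used], [strand]
    HOMER: peak ID, chr, start, end, strand
    """
    rows = []
    for i, line in enumerate(bed_lines):
        line = line.rstrip("\n")
        # skip blank lines and track/browser/comment header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], fields[1], fields[2]
        # use column 4 as the unique peak ID if present, otherwise make one up
        peak_id = fields[3] if len(fields) >= 4 and fields[3] else f"peak{i + 1}"
        strand = fields[5] if len(fields) >= 6 else "+"
        # HOMER accepts 0/1 (0="+", 1="-"); normalise to +/-
        strand = {"0": "+", "1": "-"}.get(strand, strand)
        # unknown strand (e.g. ".") defaults to "+" (assumption of this sketch)
        if strand not in ("+", "-"):
            strand = "+"
        rows.append([peak_id, chrom, start, end, strand])
    return rows


if __name__ == "__main__":
    bed = ["chr1\t100\t200\tpA\t0\t-", "chr2\t500\t800"]
    for row in bed_to_homer_peaks(bed):
        print("\t".join(row))
```

Writing each row joined by tabs, one per line, gives a file that can be passed to annotatePeaks.pl as a HOMER peak file.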

<genome> can be either one of the genomes in HOMER's own database or a custom genome. HOMER natively supports Human (hg18, hg19, hg38), Mouse (mm8, mm9, mm10), Rat (rn4, rn5, rn6), Frog (xenTro2, xenTro3), Zebrafish (danRer7), Drosophila (dm3), C. elegans (ce6, ce10), S. cerevisiae (sacCer2, sacCer3), S. pombe (ASM294v1), Arabidopsis (tair10), and Rice (msu6). The next section also shows how to use a custom genome.

To download one of HOMER's built-in genomes, run perl path-to-homer/configureHomer.pl -install <genome> (e.g. ./configureHomer.pl -install hg38).

To build the <output file>, annotatePeaks.pl first runs a separate program (assignGenomeAnnotation) to determine which part of the genome each input peak falls in:

“Basic annotation”: TSS (by default defined from -1kb to +100bp), TTS (by default defined from -100 bp to +1kb), CDS Exons, 5' UTR Exons, 3' UTR Exons, Introns, Intergenic
“Detailed annotation”: adds CpG island and repeat information on top of the basic annotation.
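To make the default TSS/TTS windows concrete, here is a simplified Python sketch, not HOMER's actual code, that labels a peak centre using the -1kb/+100bp (TSS) and -100bp/+1kb (TTS) windows quoted above. Offsets are measured relative to the gene's strand, so "upstream" means higher coordinates for a "-" strand gene. The function names and the catch-all "other" label are hypothetical; HOMER itself also assigns exon, intron, UTR, and intergenic categories and resolves overlaps between them.

```python
def strand_aware_offset(pos, site, strand):
    """Offset of pos from a gene feature (TSS or TTS); negative = upstream."""
    return pos - site if strand == "+" else site - pos


def basic_label(peak_center, tss, tts, strand,
                tss_window=(-1000, 100), tts_window=(-100, 1000)):
    """Label a peak centre as promoter-TSS, TTS, or other (simplified)."""
    d_tss = strand_aware_offset(peak_center, tss, strand)
    if tss_window[0] <= d_tss <= tss_window[1]:
        return "promoter-TSS"
    d_tts = strand_aware_offset(peak_center, tts, strand)
    if tts_window[0] <= d_tts <= tts_window[1]:
        return "TTS"
    # everything else would be split into exon/intron/intergenic by HOMER
    return "other"


if __name__ == "__main__":
    # "+" strand gene: TSS at 2000, TTS at 9000
    print(basic_label(1500, 2000, 9000, "+"))
    # "-" strand gene: TSS at 9000 (upstream = larger coordinates)
    print(basic_label(9800, 9000, 2000, "-"))
```

The window bounds are parameters so the sketch can mirror a non-default window if one is used.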

The output file contains the following columns:
- Column1: Peak ID
- Column2: Chromosome
- Column3: Peak start position
- Column4: Peak end position
- Column5: Strand
- Column6: Peak Score
- Column7: FDR/Peak Focus Ratio/Region Size
- Column8: Annotation (i.e. Exon, Intron, ...)
- Column9: Detailed Annotation (Exon, Intron etc. + CpG Islands, repeats, etc.)
- Column10: Distance to nearest RefSeq TSS
- Column11: Nearest TSS: Native ID of annotation file
- Column12: Nearest TSS: Entrez Gene ID
- Column13: Nearest TSS: Unigene ID
- Column14: Nearest TSS: RefSeq ID
- Column15: Nearest TSS: Ensembl ID
- Column16: Nearest TSS: Gene Symbol
- Column17: Nearest TSS: Gene Aliases
- Column18: Nearest TSS: Gene description
Additional columns depend on options selected when running the program.
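A common next step is to count how many peaks fall into each annotation category. The sketch below is a hypothetical helper, not a HOMER program. It reads the tab-separated output, skips the header row, takes Column8 (Annotation, index 7), and keeps only the text before any "(", assuming values look like "intron (NM_xxx, intron 2 of 5)". Empty annotations are counted as "NA".

```python
from collections import Counter


def annotation_summary(lines):
    """Count peaks per basic annotation category in annotatePeaks.pl output."""
    counts = Counter()
    for n, line in enumerate(lines):
        if n == 0:
            continue  # header row
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed or truncated row
        ann = fields[7]  # Column8: Annotation
        # keep the category name, drop the "(transcript, detail)" part
        category = ann.split("(")[0].strip() or "NA"
        counts[category] += 1
    return counts


if __name__ == "__main__":
    # "test.txt" is the output file from the annotatePeaks.pl example
    with open("test.txt") as fh:
        for category, count in annotation_summary(fh).most_common():
            print(f"{category}\t{count}")
```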

** In my experience using HOMER, ChIPpeakAnno, and similar tools, HOMER has the most extensive annotation databases. Reference: https://www.nature.com/articles/s41598-017-02464-y

Using a custom genome

loadGenome.pl

HOMER can convert your custom genome into HOMER's own data format and store it under /path-to-homer/data/genomes, so that the other HOMER programs can use it too. Details are in http://homer.ucsd.edu/homer/introduction/update.html.

Example:

loadGenome.pl -name "acahirinus" -org mouse -fasta $FASTA -gtf $GTF -tid -force > acahirinus.out
# -tid: use the transcript_id from the GTF
# -gid: use the gene_id from the GTF to identify transcripts
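Since -tid and -gid tell loadGenome.pl which GTF attribute to use, it can help to check that the exon records in your GTF actually carry those attributes before loading the genome. The helper below is a hypothetical pre-flight check written for this post, not part of HOMER. It assumes quoted attribute values (key "value";), which is the usual GTF style.

```python
import re

# matches key "value" pairs in the GTF attribute column
ATTR_RE = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')


def gtf_id_stats(lines):
    """Count exon records that carry non-empty transcript_id / gene_id."""
    stats = {"exons": 0, "transcript_id": 0, "gene_id": 0}
    transcripts = set()
    for line in lines:
        if line.startswith("#"):
            continue  # comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are inspected
        stats["exons"] += 1
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid:
            stats["transcript_id"] += 1
            transcripts.add(tid)
        if attrs.get("gene_id"):
            stats["gene_id"] += 1
    stats["distinct_transcripts"] = len(transcripts)
    return stats


if __name__ == "__main__":
    # hypothetical GTF path, same one passed to loadGenome.pl via $GTF
    with open("acahirinus.gtf") as fh:
        print(gtf_id_stats(fh))
```

If transcript_id is missing from many exon lines while gene_id is present, that is a hint to look at -gid instead of -tid.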

Running loadGenome.pl like this creates the data in a folder named “acahirinus” under /path-to-homer/data/genomes:

$ tree acahirinus/
acahirinus/
|-- acahirinus.basic.annotation
|-- acahirinus.rna
|-- acahirinus.tss
|-- acahirinus.tts
|-- annotations
|   |-- basic
|   |   |-- exons.ann.txt
|   |   |-- introns.ann.txt
|   |   |-- promoters.ann.txt
|   |   `-- tts.ann.txt
|   |-- custom
|   `-- repeats
|-- genome.fa
`-- preparsed

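The generated HOMER genome can now be used with other programs, e.g. annotatePeaks.pl <peak/BED file> acahirinus > <output file>. With this command, however, the output was not fully annotated, and HOMER reported ‘Could not find full/detailed annotation file (path-to-homer/data/genomes/acahirinus//acahirinus.annotation)’.

Suspecting a problem with the data generated by loadGenome.pl, I downloaded hg38 and compared the two folders. They are slightly different: hg38 contains an hg38.full.annotation file, which does not exist for acahirinus. Because of this, the output file contained information about TSS, TTS, exons, introns, etc., but the associated gene names, IDs, and so on were missing. After a lot of trial and error, I found the fix described in the next section. For comparison, the hg38 folder looks like this: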

$ tree -x hg38/
hg38/
|-- annotations
|   |-- basic
|   |   |-- exons.ann.txt
|   |   |-- introns.ann.txt
|   |   |-- promoters.ann.txt
|   |   `-- tts.ann.txt
|   |-- custom
|   `-- repeats
|-- chrom.sizes
|-- genome.fa
|-- hg38.aug
|-- hg38.basic.annotation
|-- hg38.full.annotation
|-- hg38.miRNA
|-- hg38.repeats
|-- hg38.rna
|-- hg38.splice3p
|-- hg38.splice5p
|-- hg38.stop
|-- hg38.tss
|-- hg38.tts
`-- preparsed
    |-- hg38.200.cgbins
    ....
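The comparison above can be automated with a small check. The sketch below is a hypothetical helper (not part of HOMER) that reports which per-genome files from the hg38 listing are absent from a genome folder. The expected list is only a subset taken from that listing; for the acahirinus folder it would flag <name>.full.annotation, which matches the warning HOMER printed.

```python
from pathlib import Path


def missing_genome_files(genome_dir, name=None):
    """List files seen in the hg38 folder that are absent from genome_dir."""
    genome_dir = Path(genome_dir)
    name = name or genome_dir.name  # folder name doubles as the genome name
    expected = [
        f"{name}.basic.annotation",
        f"{name}.full.annotation",
        f"{name}.tss",
        f"{name}.tts",
        "genome.fa",
    ]
    return [f for f in expected if not (genome_dir / f).exists()]


if __name__ == "__main__":
    # "path-to-homer" is a placeholder for your HOMER install directory
    print(missing_genome_files("path-to-homer/data/genomes/acahirinus"))
```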

Annotating with a custom genome

As it turned out, the answer was simply to follow the documentation carefully. When running annotatePeaks.pl with a custom genome, you also need to pass the GTF file with -gtf $GTF.

annotatePeaks.pl <peak/BED file> <genome> -gtf <GTF>  >  <output file>
# ex) annotatePeaks.pl test.bed acahirinus -gtf acahirinus.gtf  >  test.txt
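If annotatePeaks.pl is called from a Python pipeline, the command above can be wrapped as follows. This is a minimal sketch with hypothetical function names; it assumes annotatePeaks.pl is on your PATH and only adds -gtf when a GTF is given, matching the custom-genome usage above. Redirecting stdout to a file reproduces the shell's "> test.txt".

```python
import subprocess


def build_annotate_cmd(peaks, genome, gtf=None):
    """Argument list for annotatePeaks.pl; -gtf is added for custom genomes."""
    cmd = ["annotatePeaks.pl", str(peaks), str(genome)]
    if gtf is not None:
        cmd += ["-gtf", str(gtf)]
    return cmd


def run_annotate(peaks, genome, out_path, gtf=None):
    """Run annotatePeaks.pl and write its stdout to out_path (like `> out`)."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if HOMER exits with an error
        subprocess.run(build_annotate_cmd(peaks, genome, gtf),
                       stdout=out, check=True)


if __name__ == "__main__":
    run_annotate("test.bed", "acahirinus", "test.txt", gtf="acahirinus.gtf")
```

Passing an argument list instead of a shell string avoids quoting problems with file names.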

** To run HOMER's programs from anywhere on Linux, add HOMER's bin directory to your PATH as shown below. It is much more convenient.

export PATH="$HOME/opt/homer/bin:$PATH"   # add this line to ~/.bashrc to make it permanent
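To confirm that the PATH change took effect, you can look the scripts up the same way the shell does. The sketch below uses Python's standard shutil.which; the function name and the default list of scripts are illustrative choices, not anything defined by HOMER.

```python
import shutil


def find_homer_tools(tools=("annotatePeaks.pl", "loadGenome.pl"), path=None):
    """Map each HOMER script name to its resolved location (None if not found).

    path=None searches the current PATH environment variable.
    """
    return {tool: shutil.which(tool, path=path) for tool in tools}


if __name__ == "__main__":
    for tool, location in find_homer_tools().items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```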
