[ SMILES ] CNN based on SMILES representation of compounds for detecting chemical motif

바이오 대표 2021. 10. 12. 21:59

< Background >

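Deep learning (DL) models have advanced by using various fingerprints and graph convolution architectures.

However, these approaches may or may not be effective, depending on (1) whether they can handle the chirality of compounds and (2) whether they can discover effective features.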

< Abstract >

Model: CNN (input = SMILES notation)

Goal: compound classification (chemical motif detection with the learned features)

Dataset: Tox21 dataset

Advantage of SMILES: it represents a compound linearly, so it is a low-dimensional representation.

With the meaningful features a CNN learns, i.e. its learned filters, one can find not only important known structures (motifs) such as protein-binding sites but also unknown functional groups.

< Intro >

Common formats for giving a chemical compound to a computer include MOL, SDF, fingerprints, and SMILES. A fingerprint is a vector (e.g. Morgan). SMILES is a linear notation for chemical structures, written with a fixed grammar over ASCII characters.

* SMILES examples: O=Cc1ccc(O)c(OC)c1 and COc1cc(C=O)ccc1O (two different strings for the same molecule, vanillin)
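To make "linear notation" concrete, here is a minimal sketch that splits the two example strings into tokens. The regular expression is my own simplification (an assumption, not the full SMILES grammar and not anything from the paper): it only handles bracket atoms, `Cl`/`Br`, two-digit ring closures, and single characters.

```python
import re
from typing import List

# Simplified token pattern (an assumption, not the complete SMILES grammar):
# bracket atoms, two-letter halogens, %NN ring closures, then any single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into a left-to-right list of tokens."""
    return TOKEN_PATTERN.findall(smiles)

# The two example strings from the post.
tokens_a = tokenize_smiles("O=Cc1ccc(O)c(OC)c1")
tokens_b = tokenize_smiles("COc1cc(C=O)ccc1O")

print(tokens_a)
print(tokens_b)
```

The two strings give token lists of different lengths (18 and 16), which is one reason a model reading SMILES has to cope with variable-length input.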

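A CNN is well suited to data laid out on a 2D grid of pixels, i.e. image data (graph structures are not as easy to handle). A CNN also does representation learning, so meaningful features are extracted automatically during training. Rather than a human handing over a fingerprint up front, the computer should work the features out for itself, and representation learning is what makes that possible.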

Dataset used: Tox21, evaluated by ROC-AUC score

=> One-dimensional CNN using SMILES representations
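For reference, the ROC-AUC score mentioned above can be read as the probability that a randomly chosen positive compound gets a higher score than a randomly chosen negative one (ties count as half). A small pure-Python sketch of that definition follows; the labels and scores are made-up toy values, not Tox21 results.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

# Toy values (hypothetical): 1 = toxic, 0 = non-toxic.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 4))  # 5 of 6 positive/negative pairs are ordered correctly
```

This pairwise version is quadratic in the number of samples, so it is only meant to show the definition; in practice a library routine would be the usual choice.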

< Methods >

Main: CNN based on the SMILES notation

SMILES feature matrix --> CNN --> global max pooling --> SMILES convolution fingerprint (SCFP)

Because the SCFP is produced by global max pooling, each filter contributes exactly one dimension (1 filter == 1 SCFP dimension).

SMILES feature matrix shape == (max length of SMILES strings, 42)

* 42 = 21 features for atom symbols + 21 features for SMILES symbols
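A rough sketch of how a (max length, number of features) matrix can be built from a SMILES string. Note the assumptions: this uses a plain one-hot encoding over a tiny hypothetical vocabulary (`VOCAB`), one character per row, with zero padding. It does not reproduce the 42 features described above, since the post only gives the 21 + 21 split and not the individual features.

```python
from typing import List

# Hypothetical toy vocabulary (an assumption); the real feature set has 42 columns.
VOCAB = ["C", "c", "O", "N", "=", "(", ")", "1"]
UNKNOWN = len(VOCAB)            # extra column for symbols outside the toy vocabulary
NUM_FEATURES = len(VOCAB) + 1

def smiles_feature_matrix(smiles: str, max_len: int) -> List[List[float]]:
    """Return a (max_len, NUM_FEATURES) one-hot matrix, zero-padded after the string ends."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    matrix = [[0.0] * NUM_FEATURES for _ in range(max_len)]
    for row, symbol in enumerate(smiles):
        column = VOCAB.index(symbol) if symbol in VOCAB else UNKNOWN
        matrix[row][column] = 1.0
    return matrix

matrix = smiles_feature_matrix("COc1cc(C=O)ccc1O", max_len=20)
print(len(matrix), len(matrix[0]))  # prints: 20 9
```

Padding every string to the same `max_len` is what lets a batch of compounds share one fixed input shape.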

* SMILES convolution fingerprint (SCFP): a low-dimensional feature vector with 64 dimensions
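The convolution and global max pooling steps can be sketched in plain Python as below. This is a simplified illustration under my own assumptions: a single convolution layer, ReLU activation, filter width 3, and random (untrained) filters and input. The actual network may well have more layers. The function also returns the window position that produced each maximum, which is one plausible way a filter could be traced back to a substring (motif) of the SMILES.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def conv1d_global_max(features: Matrix, filters: List[Matrix]) -> Tuple[List[float], List[int]]:
    """Slide each filter along the sequence axis, apply ReLU, keep the max per filter.

    Returns the pooled vector (one value per filter) and, for each filter,
    the start index of the window that produced the maximum.
    """
    pooled, window_starts = [], []
    for kernel in filters:
        width = len(kernel)
        n_windows = len(features) - width + 1
        if n_windows < 1:
            raise ValueError("sequence is shorter than the filter width")
        activations = []
        for start in range(n_windows):
            total = sum(
                features[start + i][j] * kernel[i][j]
                for i in range(width)
                for j in range(len(kernel[i]))
            )
            activations.append(max(total, 0.0))  # ReLU
        best = max(range(n_windows), key=activations.__getitem__)
        pooled.append(activations[best])         # global max pooling: one number per filter
        window_starts.append(best)
    return pooled, window_starts

# Toy setup (all values hypothetical): random input and 64 random filters of width 3.
random.seed(0)
SEQ_LEN, NUM_FEATURES, NUM_FILTERS, FILTER_WIDTH = 16, 9, 64, 3
toy_features = [[random.random() for _ in range(NUM_FEATURES)] for _ in range(SEQ_LEN)]
toy_filters = [
    [[random.uniform(-1.0, 1.0) for _ in range(NUM_FEATURES)] for _ in range(FILTER_WIDTH)]
    for _ in range(NUM_FILTERS)
]

scfp, window_starts = conv1d_global_max(toy_features, toy_filters)
print(len(scfp))  # 64: one SCFP dimension per filter
```

Because pooling keeps only the maximum over all positions, the output length depends on the number of filters and not on the length of the SMILES string.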

< Conclusion >

A CNN based on SMILES strings works well.

SMILES wins against ECFP (Morgan)!

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2523-5

Convolutional neural network based on SMILES representation of compounds for detecting chemical motif - BMC Bioinformatics
