Molecule mining

Molecule mining is the application of data mining to molecules. Because molecules can be represented as molecular graphs, the topic is closely related to graph mining and structured data mining. The central problem is how to represent molecules so that data instances can be discriminated from one another. One approach is to use chemical similarity metrics, which have a long tradition in cheminformatics.

Typical approaches to calculating chemical similarity use chemical fingerprints, but these discard the underlying information about molecular topology. Mining the molecular graphs directly avoids this problem. So does the inverse QSAR problem, which is preferable for vectorial mappings.
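As an illustration of fingerprint-based similarity, the following minimal sketch computes the Tanimoto coefficient between fingerprints stored as sets of "on" bit positions. The bit sets are hypothetical and are not derived from any real fingerprinting scheme; the convention that two empty fingerprints have similarity 1 is likewise an assumption of this sketch.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints (bit positions chosen only for illustration).
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 7, 9, 12}
fp_c = {2, 3, 12}

print(tanimoto(fp_a, fp_b))  # 4 shared bits out of 5 distinct bits
print(tanimoto(fp_a, fp_c))  # no shared bits
```

Because only the set of bits is compared, two molecules with different topologies that happen to set the same bits are indistinguishable to this measure.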

Coding(Molecule_i, Molecule_{j ≠ i})

Kernel methods

Marginalized graph kernel[1]
Optimal assignment kernel[2][3][4]
Pharmacophore kernel[5]
C++ (and R) implementation combining:
- the marginalized graph kernel between labeled graphs[1]
- extensions of the marginalized kernel[6]
- Tanimoto kernels[7]
- graph kernels based on tree patterns[8]
- kernels based on pharmacophores for 3D structure of molecules[5]
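To make the idea of a graph kernel concrete, here is a minimal sketch of a walk-counting kernel between two labeled molecular graphs. It is not the marginalized graph kernel of [1], which weights label sequences by random-walk probabilities; it simply counts pairs of walks with identical atom and bond labels. The graph encoding (a list of atom labels plus an adjacency dictionary) and the function names are assumptions made for this example.

```python
import math

# A molecule is (atom_labels, adjacency); adjacency maps an atom index to a
# list of (neighbor_index, bond_label) pairs. Hydrogens are omitted.
ethanol = (["C", "C", "O"], {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-")]})
dimethyl_ether = (["C", "O", "C"], {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-")]})


def common_walk_kernel(mol_a, mol_b, steps=1):
    """Count pairs of walks with `steps` bonds whose label sequences are identical."""
    atoms_a, adj_a = mol_a
    atoms_b, adj_b = mol_b
    # counts[(i, j)]: number of matching walk pairs ending at atom i of A and atom j of B.
    counts = {(i, j): 1
              for i, label_a in enumerate(atoms_a)
              for j, label_b in enumerate(atoms_b)
              if label_a == label_b}
    for _ in range(steps):
        extended = {}
        for (i, j), n in counts.items():
            for ni, bond_a in adj_a[i]:
                for nj, bond_b in adj_b[j]:
                    if bond_a == bond_b and atoms_a[ni] == atoms_b[nj]:
                        extended[(ni, nj)] = extended.get((ni, nj), 0) + n
        counts = extended
    return sum(counts.values())


def normalized_kernel(mol_a, mol_b, steps=1):
    """Cosine-style normalization so that a molecule compared with itself scores 1."""
    k_aa = common_walk_kernel(mol_a, mol_a, steps)
    k_bb = common_walk_kernel(mol_b, mol_b, steps)
    if k_aa == 0 or k_bb == 0:
        return 0.0  # no walks of this length exist in one of the molecules
    return common_walk_kernel(mol_a, mol_b, steps) / math.sqrt(k_aa * k_bb)


print(common_walk_kernel(ethanol, dimethyl_ether, steps=1))
print(normalized_kernel(ethanol, dimethyl_ether, steps=1))
```

Unlike a fingerprint comparison, the kernel value here depends on how atoms are connected, so the two isomers in the example are not treated as identical.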

Maximum Common Graph methods

MCS-HSCS[9]: a Highest Scoring Common Substructure (HSCS) ranking strategy for a single MCS
Small Molecule Subgraph Detector (SMSD)[10]: a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules. The MCS can be used to derive a similarity or distance between two molecules. It is also used to screen for drug-like compounds by retrieving molecules that share a common subgraph (substructure).[11]
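The following sketch shows how an MCS can be turned into a similarity score. It uses a brute-force backtracking search for a maximum common induced subgraph, which is practical only for very small molecules and is not the algorithm used by SMSD or MCS-HSCS. The graph encoding, the function names, and the Tanimoto-style score are assumptions of this example; the common subgraph found is not required to be connected.

```python
# A molecule is (atom_labels, bonds); bonds maps an (i, j) atom-index pair to a
# bond label. Hydrogens are omitted.
ethanol = (["C", "C", "O"], {(0, 1): "-", (1, 2): "-"})
propanol = (["C", "C", "C", "O"], {(0, 1): "-", (1, 2): "-", (2, 3): "-"})
dimethyl_ether = (["C", "O", "C"], {(0, 1): "-", (1, 2): "-"})


def max_common_subgraph(mol_a, mol_b):
    """Return the largest atom mapping A -> B that preserves labels and bonds."""
    atoms_a, bonds_a = mol_a
    atoms_b, bonds_b = mol_b

    def bond(bonds, i, j):
        # Bond label between i and j, or None when the atoms are not bonded.
        return bonds.get((i, j)) or bonds.get((j, i))

    best = {}

    def search(i, mapping):
        nonlocal best
        # Prune: even matching every remaining atom cannot beat the best so far.
        if len(mapping) + (len(atoms_a) - i) <= len(best):
            return
        if i == len(atoms_a):
            best = dict(mapping)
            return
        used = set(mapping.values())
        for j in range(len(atoms_b)):
            if j in used or atoms_a[i] != atoms_b[j]:
                continue
            # Bonds (and non-bonds) to all previously mapped atoms must agree.
            if all(bond(bonds_a, i, p) == bond(bonds_b, j, q)
                   for p, q in mapping.items()):
                mapping[i] = j
                search(i + 1, mapping)
                del mapping[i]
        search(i + 1, mapping)  # also try leaving atom i unmatched

    search(0, {})
    return best


def mcs_similarity(mol_a, mol_b):
    """Tanimoto-style score: |MCS| / (|A| + |B| - |MCS|), counted in atoms."""
    common = len(max_common_subgraph(mol_a, mol_b))
    return common / (len(mol_a[0]) + len(mol_b[0]) - common)


print(mcs_similarity(ethanol, propanol))
print(mcs_similarity(ethanol, dimethyl_ether))
```

The search is exponential in the number of atoms, which is why dedicated MCS tools rely on more elaborate algorithms and heuristics.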

Coding(Molecule_i)

Molecular query methods

Warmr[12][13]
AGM[14][15]
PolyFARM[16]
FSG[17][18]
MolFea[19]
MoFa/MoSS[20][21][22]
Gaston[23]
LAZAR[24]
ParMol[25] (contains MoFa, FFSM, gSpan, and Gaston)
optimized gSpan[26][27]
SMIREP[28]
DMax[29]
SAm/AIm/RHC[30]
AFGen[31]
gRed[32]
G-Hash[33]
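Many of the methods listed above search a molecule database for fragments that occur in at least a minimum number of molecules. The sketch below illustrates that idea in a heavily simplified form: it enumerates only linear fragments (labeled paths) up to a fixed length and counts their support. It does not reproduce any of the listed algorithms, which mine general subgraphs and use canonical forms and pruning; the graph encoding and all names are assumptions of this example.

```python
from collections import Counter

# A molecule is (atom_labels, adjacency); adjacency maps an atom index to a
# list of (neighbor_index, bond_label) pairs. Hydrogens are omitted.
ethanol = (["C", "C", "O"], {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-")]})
propanol = (["C", "C", "C", "O"],
            {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-"), (3, "-")], 3: [(2, "-")]})
dimethyl_ether = (["C", "O", "C"], {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-")]})
database = [ethanol, propanol, dimethyl_ether]


def path_fragments(mol, max_bonds):
    """Return the set of labeled simple paths with at most `max_bonds` bonds."""
    atoms, adj = mol
    found = set()

    def extend(visited, tokens):
        # A path and its reverse describe the same fragment; keep one form.
        found.add(min(tokens, tokens[::-1]))
        if len(visited) - 1 == max_bonds:
            return
        for nxt, bond in adj[visited[-1]]:
            if nxt not in visited:
                extend(visited + (nxt,), tokens + (bond, atoms[nxt]))

    for start in range(len(atoms)):
        extend((start,), (atoms[start],))
    return found


def frequent_fragments(molecules, min_support, max_bonds=2):
    """Fragments occurring in at least `min_support` molecules, with their support."""
    support = Counter()
    for mol in molecules:
        # A set is counted, so each molecule contributes at most once per fragment.
        support.update(path_fragments(mol, max_bonds))
    return {frag: n for frag, n in support.items() if n >= min_support}


for fragment, n in sorted(frequent_fragments(database, min_support=2).items()):
    print("".join(fragment), n)
```

Raising `min_support` shrinks the result to the fragments shared by more molecules, which is the basic trade-off that frequent-fragment miners expose.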

Methods based on special architectures of neural networks

BPZ[34][35]
ChemNet[36]
CCS[37][38]
MolNet[39]
Graph machines[40]

References

[1] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML2003), 2003.
[2] H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232.
[3] Fröhlich H., Wegner J. K., Zell A. (2006). "Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression". QSAR Comb. Sci. 25 (4): 317–326. doi:10.1002/qsar.200510135.
[4] H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918.
[5] Mahe P., Ralaivola L., Stoven V., Vert J. (2006). "The pharmacophore kernel for virtual screening with support vector machines". J Chem Inf Model. 46 (5): 2003–2014. arXiv:q-bio/0603006. Bibcode:2006q.bio.....3006M. doi:10.1021/ci060138m. PMID 16995731. S2CID 15060229.
[6] P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret, J.-P. Vert (2004). "Extensions of marginalized graph kernels". Proceedings of the 21st ICML: 552–559.
[7] L. Ralaivola, S. J. Swamidass, H. Saigo, P. Baldi (2005). "Graph kernels for chemical informatics". Neural Networks. 18 (8): 1093–1110. doi:10.1016/j.neunet.2005.07.009. PMID 16157471.
[8] P. Mahé and J.-P. Vert (2009). "Graph kernels based on tree patterns for molecules". Machine Learning. 75 (1): 3–35. arXiv:q-bio/0609024. doi:10.1007/s10994-008-5086-2. ISSN 0885-6125. S2CID 5943581.
[9] Wegner J. K., Fröhlich H., Mielenz H., Zell A. (2006). "Data and Graph Mining in Chemical Space for ADME and Activity Data Sets". QSAR Comb. Sci. 25 (3): 205–220. doi:10.1002/qsar.200510009.
[10] Rahman S. A., Bashton M., Holliday G. L., Schrader R., Thornton J. M. (2009). "Small Molecule Subgraph Detector (SMSD) toolkit". Journal of Cheminformatics. 1 (1): 12. doi:10.1186/1758-2946-1-12. PMC 2820491. PMID 20298518.
[11] "Small Molecule Subgraph Detector (SMSD)".
[12] King R. D., Srinivasan A., Dehaspe L. (2001). "Warmr: a data mining tool for chemical data". J. Comput.-Aid. Mol. Des. 15 (2): 173–181. Bibcode:2001JCAMD..15..173K. doi:10.1023/A:1008171016861. PMID 11272703. S2CID 3055046.
[13] L. Dehaspe, H. Toivonen, R. D. King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, 30-36.
[14] A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001, 2, 87-92.
[15] A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
[16] A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL2003), 2003.
[17] Kuramochi M., Karypis G. (2004). "An Efficient Algorithm for Discovering Frequent Subgraphs". IEEE Transactions on Knowledge and Data Engineering. 16 (9): 1038–1051. doi:10.1109/tkde.2004.33. S2CID 242887.
[18] Deshpande M., Kuramochi M., Wale N., Karypis G. (2005). "Frequent Substructure-Based Approaches for Classifying Chemical Compounds". IEEE Transactions on Knowledge and Data Engineering. 17 (8): 1036–1050. doi:10.1109/tkde.2005.127. hdl:11299/215559.
[19] Helma C., Cramer T., Kramer S., de Raedt L. (2004). "Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds". J. Chem. Inf. Comput. Sci. 44 (4): 1402–1411. doi:10.1021/ci034254q. PMID 15272848.
[20] T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
[21] T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
[22] Meinl, T.; Berthold, M. R. (2004). "Hybrid fragment mining with MoFa and FSG". 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583). Vol. 5. pp. 4559–4564. doi:10.1109/ICSMC.2004.1401250. ISBN 0-7803-8567-5. S2CID 3248671.
[23] S. Nijssen, J. N. Kok, Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
[24] C. Helma, Predictive Toxicology, CRC Press, 2005.
[25] M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006.
[26] K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
[27] X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
[28] Karwath A., Raedt L. D. (2006). "SMIREP: predicting chemical activity from SMILES". J Chem Inf Model. 46 (6): 2432–2444. doi:10.1021/ci060159g. PMID 17125185. S2CID 1460089.
[29] Ando H., Dehaspe L., Luyten W., Craenenbroeck E., Vandecasteele H., Meervelt L. (2006). "Discovering H-Bonding Rules in Crystals with Inductive Logic Programming". Mol Pharm. 3 (6): 665–674. doi:10.1021/mp060034z. PMID 17140254.
[30] Mazzatorta P., Tran L., Schilter B., Grigorov M. (2007). "Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity". J. Chem. Inf. Model. 47 (1): 34–38. doi:10.1021/ci600411v. PMID 17238246.
[31] Wale N., Karypis G. (2006). "Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification". ICDM: 678–689.
[32] A. Gago Alonso, J. E. Medina Pagola, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, Mining Frequent Connected Subgraphs Reducing the Number of Candidates, Proc. of ECML-PKDD, pp. 365–376, 2008.
[33] Xiaohong Wang, Jun Huan, Aaron Smalter, Gerald Lushington, Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases, BMC Bioinformatics, Vol. 11 (Suppl 3): S8, 2010.
[34] Baskin, I. I.; V. A. Palyulin; N. S. Zefirov (1993). "[A methodology for searching direct correlations between structures and properties of organic compounds by using computational neural networks]". Doklady Akademii Nauk SSSR. 333 (2): 176–179.
[35] I. I. Baskin, V. A. Palyulin, N. S. Zefirov (1997). "A Neural Device for Searching Direct Correlations between Structures and Properties of Organic Compounds". J. Chem. Inf. Comput. Sci. 37 (4): 715–721. doi:10.1021/ci940128y.
[36] D. B. Kireev (1995). "ChemNet: A Novel Neural Network Based Method for Graph/Property Mapping". J. Chem. Inf. Comput. Sci. 35 (2): 175–180. doi:10.1021/ci00024a001.
[37] A. M. Bianucci; Micheli, Alessio; Sperduti, Alessandro; Starita, Antonina (2000). "Application of Cascade Correlation Networks for Structures to Chemistry". Applied Intelligence. 12 (1–2): 117–146. doi:10.1023/A:1008368105614. S2CID 10031212.
[38] A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci (2001). "Analysis of the Internal Representations Developed by Neural Networks for Structures Applied to Quantitative Structure-Activity Relationship Studies of Benzodiazepines". J. Chem. Inf. Comput. Sci. 41 (1): 202–218. CiteSeerX 10.1.1.137.2895. doi:10.1021/ci9903399. PMID 11206375.
[39] O. Ivanciuc (2001). "Molecular Structure Encoding into Artificial Neural Networks Topology". Roumanian Chemical Quarterly Reviews. 8: 197–220.
[40] A. Goulon, T. Picot, A. Duprat, G. Dreyfus (2007). "Predicting activities without computing descriptors: Graph machines for QSAR". SAR and QSAR in Environmental Research. 18 (1–2): 141–153. doi:10.1080/10629360601054313. PMID 17365965. S2CID 11759797.

External links

Small Molecule Subgraph Detector (SMSD): a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules
5th International Workshop on Mining and Learning with Graphs, 2007
Overview for 2006
Molecule mining (basic chemical expert systems)
ParMol and master thesis documentation - Java - Open source - Distributed mining - Benchmark algorithm library
TU München - Kramer group
Molecule mining (advanced chemical expert systems)
DMax Chemistry Assistant - commercial software
AFGen - Software for generating fragment-based descriptors

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

