Chemical space is a concept in cheminformatics referring to the property space spanned by all possible molecules and chemical compounds adhering to a given set of construction principles and boundary conditions. It contains millions of compounds that are readily accessible to researchers, and it serves as a source of compound libraries for molecular docking.
Theoretical spaces
A chemical space often referred to in cheminformatics is that of potential pharmacologically active molecules. Its size is estimated to be on the order of 10⁶⁰ molecules, although there are no rigorous methods for determining the precise size of this space. The assumptions behind the estimate draw on the Lipinski rules, in particular the molecular weight limit of 500 daltons. The estimate also restricts the chemical elements to carbon, hydrogen, oxygen, nitrogen and sulfur. It further assumes a maximum of 30 atoms to stay below 500 daltons, allows for branching and a maximum of 4 rings, and arrives at an estimate of 10⁶³. This number is often misquoted in subsequent publications as the estimated size of the whole organic chemistry space, which would be much larger if the halogens and other elements were included.
In addition to the drug-like space and lead-like space that are, in part, defined by Lipinski's rule of five, the concept of known drug space (KDS), which is defined by the molecular descriptors of marketed drugs, has also been introduced. KDS can be used to help predict the boundaries of chemical spaces for drug development by comparing the molecules that are being designed and synthesized against the molecular descriptor ranges that define the KDS.
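The boundary conditions behind this estimate can be illustrated with a small filter that operates on molecular formulas. The following is a minimal sketch, not the method used in the original estimate: the function names are hypothetical, the atomic masses are rounded standard values, the 30-atom limit is interpreted here as a limit on non-hydrogen atoms (an assumption), and the ring limit is not checked because ring counts cannot be derived from a formula alone.

```python
import re

# Rounded average atomic masses in daltons for the allowed elements.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
MAX_WEIGHT = 500.0      # molecular weight limit from the Lipinski rules
MAX_HEAVY_ATOMS = 30    # assumption: the atom limit counts non-hydrogen atoms


def parse_formula(formula):
    """Turn a formula such as 'C9H8O4' into a dict of element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic masses; raises KeyError for elements outside the set."""
    return sum(ATOMIC_MASS[s] * n for s, n in parse_formula(formula).items())


def within_estimate_bounds(formula):
    """Check the element, weight and atom-count constraints (rings ignored)."""
    counts = parse_formula(formula)
    if any(symbol not in ATOMIC_MASS for symbol in counts):
        return False
    heavy_atoms = sum(n for s, n in counts.items() if s != "H")
    return molecular_weight(formula) <= MAX_WEIGHT and heavy_atoms <= MAX_HEAVY_ATOMS


if __name__ == "__main__":
    for formula in ["C9H8O4", "C2H5Cl", "C40H82"]:
        print(formula, within_estimate_bounds(formula))
```

Aspirin (C9H8O4) passes all three checks, chloroethane (C2H5Cl) is rejected because chlorine lies outside the allowed element set, and C40H82 is rejected for exceeding the atom limit.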
Empirical spaces
As of October 2024, 219 million molecules had been assigned a Chemical Abstracts Service (CAS) Registry Number. Version 33 of the ChEMBL database records biological activities for 2,431,025 distinct molecules. Chemical libraries used for laboratory-based screening for compounds with desired properties are examples of small real-world chemical libraries (a few hundred to hundreds of thousands of molecules).
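To put these counts in perspective, they can be compared with a theoretical estimate on a logarithmic scale. The sketch below assumes the commonly quoted order of magnitude of 10^60 for the space of potential pharmacologically active molecules; that exponent and the function name are assumptions made for illustration, and the result is only as reliable as the underlying estimate.

```python
import math

# Counts quoted in the text above.
CAS_REGISTERED = 219_000_000   # CAS Registry, October 2024
CHEMBL_33 = 2_431_025          # ChEMBL version 33

# Assumption: order-of-magnitude exponent for the theoretical space.
THEORETICAL_LOG10 = 60


def log10_fraction(count, space_log10=THEORETICAL_LOG10):
    """Return log10 of the fraction count / 10**space_log10."""
    return math.log10(count) - space_log10


if __name__ == "__main__":
    for name, count in [("CAS Registry", CAS_REGISTERED), ("ChEMBL 33", CHEMBL_33)]:
        print(f"{name}: about 10^{math.log10(count):.1f} molecules, "
              f"fraction of the estimated space about 10^{log10_fraction(count):.1f}")
```

Working with logarithms avoids floating-point underflow concerns and makes the gap of more than fifty orders of magnitude easy to read.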
Generation
Systematic exploration of chemical space is possible by creating in silico databases of virtual molecules, which can be visualized by projecting the multidimensional property space of molecules into lower dimensions. Generation of chemical spaces may involve creating stoichiometric combinations of electrons and atomic nuclei to yield all possible topology isomers for the given construction principles. In cheminformatics, software programs called structure generators are used to generate the set of all chemical structures adhering to given boundary conditions. Constitutional isomer generators, for example, can generate all possible constitutional isomers of a given molecular formula.
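The idea of a constitutional isomer generator can be illustrated on a deliberately narrow case: acyclic alkanes with formula CnH2n+2. The sketch below is a toy example under stated assumptions, not how production structure generators work. It represents each carbon skeleton as a tree in which no carbon has more than four carbon neighbours, grows skeletons one carbon at a time, and discards duplicates using a canonical string for unrooted trees. The function names are hypothetical, and hydrogens and stereochemistry are ignored.

```python
def canonical(adj):
    """Return a canonical string for an unrooted tree given as adjacency lists."""
    n = len(adj)
    degree = [len(neighbours) for neighbours in adj]
    removed = [False] * n
    remaining = n
    leaves = [i for i in range(n) if degree[i] <= 1]
    # Peel leaves layer by layer until only the one or two centre atoms remain.
    while remaining > 2:
        next_leaves = []
        for leaf in leaves:
            removed[leaf] = True
            remaining -= 1
            for nb in adj[leaf]:
                if not removed[nb]:
                    degree[nb] -= 1
                    if degree[nb] == 1:
                        next_leaves.append(nb)
        leaves = next_leaves

    def encode(node, parent):
        # Sorting the child encodings makes the string independent of atom order.
        children = sorted(encode(c, node) for c in adj[node] if c != parent)
        return "(" + "".join(children) + ")"

    return min(encode(centre, -1) for centre in leaves)


def alkane_skeletons(n_carbons):
    """Enumerate carbon skeletons of acyclic alkanes with n_carbons carbons."""
    trees = {"()": [[]]}  # canonical string -> adjacency lists; start from one carbon
    for _ in range(n_carbons - 1):
        grown_trees = {}
        for adj in trees.values():
            for atom, neighbours in enumerate(adj):
                if len(neighbours) >= 4:  # carbon valence limit
                    continue
                grown = [list(nb) for nb in adj]
                grown[atom].append(len(adj))  # bond a new carbon to this atom
                grown.append([atom])
                grown_trees.setdefault(canonical(grown), grown)
        trees = grown_trees
    return list(trees.values())


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, len(alkane_skeletons(n)))
```

For butane (n = 4) this yields the two skeletons of n-butane and isobutane, and for pentane (n = 5) it yields three. General-purpose structure generators must also handle rings, heteroatoms and multiple bonds, which this sketch does not attempt.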
In the real world, chemical reactions allow us to move in chemical space. The mapping between chemical space and molecular properties is often not unique, meaning that there can be very different molecules exhibiting very similar properties. Materials design and drug discovery both involve the exploration of chemical space.