The compounds are not "drug-like", but the exhaustive coverage allows for systematic experimentation and simple metrics to explore the properties of the generative methods. They are available for download at http://gdb.unibe.ch/downloads/. The smaller sets support fast experimentation, and the larger ones more extended exploration. Using a training set, one can measure how much of the remaining space is recreated, as a simple percentage (probably as a function of the number of sampled molecules). One can also measure whether the networks create molecules outside the training space (e.g. molecules with 7 or 9 atoms when trained on GDB-8). The recreation of the remaining space can be tuned via hyperparameter search, and one can experiment with how much of the chemical space needs to be covered to get a decent recreation of the remainder.
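The coverage and novelty measurements above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes molecules are already represented as canonical strings (e.g. canonical SMILES), and `training_set`, `full_space`, and `generated` are hypothetical inputs.

```python
def coverage(generated, training_set, full_space):
    """Fraction of the held-out space (full_space minus training_set)
    that appears among the generated molecules."""
    held_out = full_space - training_set
    recreated = held_out & set(generated)
    return len(recreated) / len(held_out)

def outside_space(generated, full_space):
    """Generated molecules falling outside the enumerated space
    (e.g. 7- or 9-atom molecules when training on GDB-8)."""
    return set(generated) - full_space

# Toy example with placeholder strings standing in for canonical SMILES:
full_space = {"A", "B", "C", "D", "E"}
training_set = {"A", "B"}
generated = ["B", "C", "D", "X"]

print(coverage(generated, training_set, full_space))   # 2 of 3 held-out molecules recreated
print(sorted(outside_space(generated, full_space)))    # generated but not in the enumerated space
```

In practice the strings would come from a canonicalization step (e.g. RDKit's canonical SMILES), so that two writings of the same molecule compare equal.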
Tasks
To perform the benchmark, it is sensible to start with tasks already established in the literature. It is also interesting to evaluate the same model across a wide variety of tasks (to avoid overfitting to a particular task).
Multi-objective tasks are more realistic but more difficult than single-objective tasks (for example, obtaining molecules that are active, non-toxic, and synthesizable). This has been tried recently (
Peking University).
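One common way to combine several objectives into a single reward is a geometric mean of per-objective scores. The sketch below is only an illustration of that idea; the objective names, the [0, 1] scaling, and the equal weighting are all assumptions, not part of any cited method.

```python
def multi_objective_reward(activity, non_toxicity, synthesizability):
    """Combine three hypothetical per-molecule scores in [0, 1].
    The geometric mean requires a molecule to do reasonably well on
    every objective, unlike a plain average, where one strong score
    can mask a failing one."""
    return (activity * non_toxicity * synthesizability) ** (1 / 3)

print(multi_objective_reward(0.9, 0.8, 0.7))  # good on all three objectives
print(multi_objective_reward(0.9, 0.8, 0.0))  # fails one objective -> reward is 0.0
```

Weighted products or Pareto-front selection are alternatives when the objectives should not be traded off symmetrically.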
Here’s a list of tasks (tell me if I omitted your paper):
Drug discovery tasks
Organic materials tasks
Data
This DiversityNet benchmark is based on publicly available data, like all the papers cited above. In most papers, data is taken from:
Many papers use only small datasets (including
mine), and that is a weakness: model pre-training should be done on a large dataset.
Even better, different pre-training set sizes could be tested (5K, 10K, 15K, 30K, 50K, 100K, 250K, 1M) to understand how the performance of the generative model changes with data size (that's a suggestion from an anonymous referee of my paper).
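The size sweep above can be organized so that each training set is a nested subset of the next, which makes the resulting curves comparable across sizes. In this sketch, `train_model` and `evaluate` are hypothetical placeholders for whichever generative model and metric are being benchmarked.

```python
import random

def size_sweep(dataset, sizes, train_model, evaluate, seed=0):
    """Train on nested random subsets of increasing size and return
    a score per size. A fixed seed keeps the subsets reproducible."""
    rng = random.Random(seed)
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    results = {}
    for n in sorted(sizes):
        subset = shuffled[:n]          # each size extends the previous subset
        model = train_model(subset)
        results[n] = evaluate(model)
    return results

# The sizes suggested above; train_model/evaluate must be supplied by the user.
sizes = [5_000, 10_000, 15_000, 30_000, 50_000, 100_000, 250_000, 1_000_000]
```

A usage example with dummy stand-ins: `size_sweep(molecules, [10, 50], train_fn, eval_fn)` returns a dict mapping each size to its score, ready for plotting a learning curve.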
Besides small-molecule chemistry, the same generative models can be used for other tasks related to drug discovery: for RNA sequences (
University of Tokyo), for DNA sequences (
University of Toronto), and for proteins (
Harvard 4,
ETH Zurich 3). However, I think it is better to keep those non-chemistry tasks for separate benchmarks: DiversityNet-genetics and DiversityNet-proteins.