Reinvent: Data Prep
This app shows how to process, analyze, and filter data from ChEMBL or other sources. There are several reasons to pre-process the data used to train a generative model. Invalid or duplicate entries must be removed. Unusual compounds that are clearly not drug-like (too large, containing reactive groups, etc.) should be excluded, and compounds containing rare tokens can be dropped. These rare compounds can be treated as outliers, and excluding them reduces the size of the vocabulary. The input is SMILES strings in .smi format. The output is filtered SMILES in .csv format that can be used in other Reinvent apps (e.g., Reinvent: Create Initial Prior/Agent Generative Model). For more information, see the tutorial page of Reinvent Apps.
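The filtering steps described above can be sketched in plain Python. This is a minimal, stdlib-only illustration under stated assumptions, not the app's implementation. The function names `tokenize` and `prepare`, the simplified regex tokenizer, and the thresholds `max_tokens` and `min_token_count` are all hypothetical. The sketch removes blank and duplicate entries, drops overly long SMILES as a crude proxy for "too big", and drops molecules containing tokens that occur in fewer than `min_token_count` molecules. It does not check chemical validity or reactive groups. Deduplication compares raw strings, so a real pipeline would first parse and canonicalize each SMILES with a cheminformatics toolkit such as RDKit.

```python
import re
from collections import Counter

# Simplified SMILES tokenizer (an assumption, not Reinvent's own): bracket atoms
# such as [NH3+], the two-letter halogens Br and Cl, two-digit ring closures
# such as %10, and otherwise one character per token.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens with the simplified regex above."""
    return TOKEN_RE.findall(smiles)


def prepare(smi_lines, max_tokens=80, min_token_count=2):
    """Hypothetical filter: dedupe, drop long SMILES, drop rare-token molecules."""
    # 1. Read .smi-style lines: the SMILES is the first whitespace-separated
    #    field (an optional name may follow). Skip blank lines and exact
    #    duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for line in smi_lines:
        fields = line.split()
        if not fields:
            continue
        smiles = fields[0]
        if smiles not in seen:
            seen.add(smiles)
            unique.append(smiles)

    # 2. Drop molecules whose token count exceeds max_tokens
    #    (a crude stand-in for "too big").
    sized = [s for s in unique if len(tokenize(s)) <= max_tokens]

    # 3. Count in how many molecules each token appears, then keep only
    #    molecules whose tokens all reach min_token_count.
    doc_freq = Counter(tok for s in sized for tok in set(tokenize(s)))
    return [s for s in sized
            if all(doc_freq[tok] >= min_token_count for tok in tokenize(s))]


if __name__ == "__main__":
    raw = ["CCO ethanol", "CCO", "OCCN", "CCN", "CC[Se]C", ""]
    print(prepare(raw))  # ['CCO', 'OCCN', 'CCN']
```

In this example, `[Se]` appears in only one molecule, so `CC[Se]C` is dropped and the token never enters the vocabulary.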
Example Use Case: Pre-processing SMILES strings from large sources for downstream analysis.
Limitation: The filtering options are the same as in the demo notebook. Other options will be added in future versions.
Nov-04-2022