Reinvent: Data Prep

AstraZeneca

Drug Design

This app shows how to process, analyze, and filter data from ChEMBL or other sources. Data used to train a generative model should be pre-processed for several reasons. Invalid or duplicate entries need to be removed. Unusual compounds that are clearly not drug-like (for example, too large or containing reactive groups) need to be excluded. Compounds that contain rare tokens can be treated as outliers as well: excluding them removes those tokens from the vocabulary, making it smaller. Input is SMILES strings in .smi format. Output is filtered SMILES in .csv format that can be used in other Reinvent apps (e.g. Reinvent: Create Initial Prior/Agent Generative Model). For more information, please see the tutorial page of Reinvent Apps.
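The filtering steps described above can be illustrated with a minimal Python sketch. It is not the app's implementation: the tokenizer regex, the function names, the thresholds (max_tokens, min_token_count), and the "SMILES" column header are all assumptions made for illustration. It uses only the standard library, so it removes exact duplicate strings, overly long entries, and entries containing rare tokens. Checking chemical validity or drug-likeness would need a cheminformatics toolkit and is left out.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical tokenizer: bracket atoms, two-letter halogens,
# two-digit ring closures (%NN), then any single character.
TOKEN_RE = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|.)")


def tokenize(smiles):
    """Split one SMILES string into a list of tokens."""
    return TOKEN_RE.findall(smiles)


def read_smi(text):
    """Return the first whitespace-separated field of each non-empty line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            entries.append(fields[0])
    return entries


def filter_smiles(smiles_list, max_tokens=70, min_token_count=2):
    """Drop duplicates, long entries, and entries with rare tokens.

    Both thresholds are illustrative assumptions, not the app's defaults.
    """
    # Remove exact duplicate strings while preserving input order.
    unique = list(dict.fromkeys(smiles_list))
    # Use token count as a crude proxy for "too big".
    sized = [s for s in unique if len(tokenize(s)) <= max_tokens]
    # Count how many entries each token appears in.
    counts = Counter(tok for s in sized for tok in set(tokenize(s)))
    rare = {tok for tok, n in counts.items() if n < min_token_count}
    # Keep only entries that contain no rare token.
    return [s for s in sized if not rare.intersection(tokenize(s))]


def to_csv(smiles_list):
    """Serialize the filtered SMILES as single-column CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["SMILES"])  # assumed column name
    writer.writerows([s] for s in smiles_list)
    return buf.getvalue()
```

Calling to_csv(filter_smiles(read_smi(text))) on the contents of a .smi file chains the three steps. Because duplicates are detected by string comparison only, two different SMILES spellings of the same molecule are both kept; canonicalizing them first would require a toolkit such as RDKit.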

Example Use Case: Pre-processing SMILES strings from large data sources for downstream analysis.

Limitation: The filtering options are the same as in the demo notebook. Other options will be added in future versions.

Citation:

Blaschke T, Arús-Pous J, Chen H, Margreitter C, Tyrchan C, Engkvist O, et al. REINVENT 2.0 – an AI Tool for De Novo Drug Design. ChemRxiv 2020. doi:10.26434/chemrxiv.12058026.v3. This content is a preprint and has not been peer-reviewed.

Released:
Nov-04-2022
