AstraZeneca

Reinvent: Data Prep

Drug Design
Step 1: Upload your data

Upload SMILES Dataset

Drag your file(s) or upload
  • Your file can be in the following format: .smi
  • SMILES strings should be in the first column of the .smi file, without a header. Example:
      Cc1cc(S(=O)(=O)NC(C)c2nnc3ccccn23)ccc1Br
      Cc1cc(=NC(=O)c2ccc3c(-c4nc5ccccc5[nH]4)[nH]nc3c2)[nH][nH]1
      O=C1CC(c2ccc(Br)cc2)Nc2c(Br)cc(Br)cc21
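The file layout described above (one SMILES string in the first column, no header) can be read with a few lines of Python. This is only a minimal sketch, not the app's own loader: the function name `read_smiles` is hypothetical, and it assumes columns are separated by whitespace. The `mol2` identifier in the second row is made up to show that extra columns are ignored.

```python
def read_smiles(text):
    """Return the first whitespace-separated field of each non-empty line."""
    smiles = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            smiles.append(fields[0])
    return smiles


# The three example molecules from the upload instructions.
# "mol2" is a hypothetical second column (e.g. a compound ID).
example = """Cc1cc(S(=O)(=O)NC(C)c2nnc3ccccn23)ccc1Br
Cc1cc(=NC(=O)c2ccc3c(-c4nc5ccccc5[nH]4)[nH]nc3c2)[nH][nH]1 mol2

O=C1CC(c2ccc(Br)cc2)Nc2c(Br)cc(Br)cc21
"""

smiles_list = read_smiles(example)
```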
or
Don’t have a file?
Use our demo data to run
Use Demo Data
Step 2: Set Parameters

This app shows how to process, analyze, and filter data from ChEMBL or other sources. Data used to train a generative model should be pre-processed for several reasons. Invalid or duplicate entries need to be removed. Unusual compounds that are clearly not drug-like (for example, too big or containing reactive groups) need to be excluded. Rare tokens can also be removed: some rare compounds can be considered outliers, and excluding them frees up space in the vocabulary, making it smaller. The input is SMILES strings in .smi format. The output is filtered SMILES in .csv format that can be used in other Reinvent apps (e.g. Reinvent: Create Initial Prior/Agent Generative Model). For more information, please see the tutorial page of Reinvent Apps.
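The filtering idea described above can be sketched in plain Python. This is an illustration under stated assumptions, not the app's actual implementation: the tokenizer is a simplified regular expression, the function names, the `max_tokens` and `min_token_count` thresholds, and the `SMILES` column header are all hypothetical, and chemical validity checking is omitted because it needs a cheminformatics toolkit.

```python
import csv
import io
import re
from collections import Counter

# Simplified SMILES tokenizer (an assumption): bracket atoms, Br, Cl,
# two-digit ring closures, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    return TOKEN_RE.findall(smiles)


def filter_smiles(smiles_list, max_tokens=100, min_token_count=2):
    # 1. Remove duplicates while preserving input order.
    unique = list(dict.fromkeys(smiles_list))
    # 2. Drop very long strings as a crude proxy for "too big".
    sized = [s for s in unique if len(tokenize(s)) <= max_tokens]
    # 3. Count how many molecules contain each token.
    counts = Counter(tok for s in sized for tok in set(tokenize(s)))
    rare = {tok for tok, n in counts.items() if n < min_token_count}
    # 4. Drop molecules that contain any rare token.
    return [s for s in sized if not rare & set(tokenize(s))]


def to_csv(smiles_list):
    # Write one SMILES per row under a hypothetical "SMILES" header.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["SMILES"])
    writer.writerows([s] for s in smiles_list)
    return buf.getvalue()


# Toy input: one duplicate and one molecule with a rare token ([Se]).
data = ["CCO", "CCO", "CCN", "NCCO", "CC[Se]C", "OCCN"]
kept = filter_smiles(data, max_tokens=100, min_token_count=2)
# kept == ["CCO", "CCN", "NCCO", "OCCN"]
csv_text = to_csv(kept)
```

Counting each token once per molecule means a token is treated as rare when few molecules use it, regardless of how often it repeats inside one string. With real data the thresholds would need tuning to the dataset size.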

Example Use Case: Pre-processing SMILES strings from large sources for downstream analysis.

Limitation: Filtering options are kept the same as in the demo notebook. Other options will be added in future versions.

Citation:
Blaschke T, Arús-Pous J, Chen H, Margreitter C, Tyrchan C, Engkvist O, et al. REINVENT 2.0 – an AI Tool for De Novo Drug Design. ChemRxiv 2020. doi:10.26434/chemrxiv.12058026.v3. This content is a preprint and has not been peer-reviewed.
Released:
Nov-04-2022