The Natural Product Classification and Biosynthetic Intelligence Engine (NPIE) is an online inference tool for analyzing natural product molecules. Given a single SMILES string, it returns two kinds of results at once: (1) natural product classification predictions under the NPClassifier framework at the Pathway, Super_class, and Class tiers; and (2) biosynthetic gene cluster (BGC) predictions from a multi-label model based on the MIBiG 4.0 dataset, giving a probability and a final determination for labels such as PKS, NRPS, terpene, and ribosomal. The tool uses one shared molecular representation backbone: multiple task adapters and heads are attached to a single encoder, so each molecule is encoded once and every task output is derived from that encoding ("one encoding, multi-task output"). This keeps inference efficient and results consistent across tasks.
Methodologically, the tool first converts the input SMILES into an explicit, structured string representation and feeds it into a Transformer encoder pre-trained on large-scale molecular corpora. The classification component jointly predicts Pathway, Super_class, and Class using an ensemble over multiple random seeds. The BGC component uses a multi-label ensemble and reports, for each label, the probability, the decision threshold, and the final determination, giving a finer-grained view of how strongly the molecule is associated with each biosynthetic pathway type.
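The BGC ensemble step described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: it assumes the ensemble averages per-label probabilities across the seed models and that each label has its own decision threshold. The function name `ensemble_multilabel`, the probabilities, and the threshold values are all hypothetical.

```python
from statistics import mean

def ensemble_multilabel(per_seed_probs, thresholds):
    """Average per-label probabilities over seed models, then threshold.

    per_seed_probs: list of {label: probability}, one dict per seed model.
    thresholds:     {label: decision threshold} (hypothetical values).
    Returns {label: {"probability", "threshold", "positive"}}.
    """
    report = {}
    for label, threshold in thresholds.items():
        # Assumption: simple mean over seeds; the real tool may combine differently.
        prob = mean(p[label] for p in per_seed_probs)
        report[label] = {
            "probability": prob,
            "threshold": threshold,
            "positive": prob >= threshold,
        }
    return report

# Made-up outputs from three seed models for one molecule.
per_seed_probs = [
    {"PKS": 0.90, "NRPS": 0.20, "terpene": 0.05},
    {"PKS": 0.80, "NRPS": 0.40, "terpene": 0.10},
    {"PKS": 0.85, "NRPS": 0.30, "terpene": 0.03},
]
thresholds = {"PKS": 0.5, "NRPS": 0.5, "terpene": 0.5}

report = ensemble_multilabel(per_seed_probs, thresholds)
for label, row in report.items():
    print(f"{label:8s} p={row['probability']:.3f} "
          f"thr={row['threshold']:.2f} call={row['positive']}")
```

Reporting the threshold next to the probability lets a reader see how close a call was, which a bare yes/no determination would hide.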
Core outputs:
1. Natural product classification: Outputs Pathway, Super_class, and Class tier predictions, along with hierarchy consistency validation and a recommended consistent triplet.
2. BGC results: Outputs probability, threshold, and final determination for labels such as PKS, NRPS, terpene, and ribosomal, supporting analysis of structural similarity to different biosynthetic types.
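One way to picture hierarchy consistency validation and the recommended triplet is the sketch below. It is illustrative only: the parent mappings are a tiny hand-written excerpt, not the NPClassifier ontology, and ranking consistent triplets by the product of their tier probabilities is an assumption rather than the tool's documented rule. All function names and probabilities are hypothetical.

```python
# Illustrative parent links only; the real hierarchy comes from NPClassifier.
CLASS_TO_SUPER = {
    "Menthane monoterpenoids": "Monoterpenoids",
    "Erythromycins": "Macrolides",
}
SUPER_TO_PATHWAY = {
    "Monoterpenoids": "Terpenoids",
    "Macrolides": "Polyketides",
}

def is_consistent(pathway, super_class, cls):
    """True if cls belongs to super_class and super_class belongs to pathway."""
    return (CLASS_TO_SUPER.get(cls) == super_class
            and SUPER_TO_PATHWAY.get(super_class) == pathway)

def recommend_triplet(pathway_probs, super_probs, class_probs):
    """Return the hierarchy-consistent triplet with the highest joint score.

    Assumption: joint score = product of the three tier probabilities.
    """
    best, best_score = None, -1.0
    for cls, sup in CLASS_TO_SUPER.items():
        path = SUPER_TO_PATHWAY[sup]
        score = (pathway_probs.get(path, 0.0)
                 * super_probs.get(sup, 0.0)
                 * class_probs.get(cls, 0.0))
        if score > best_score:
            best, best_score = (path, sup, cls), score
    return best

# Made-up tier probabilities where the per-tier top-1 picks disagree.
pathway_probs = {"Terpenoids": 0.55, "Polyketides": 0.45}
super_probs = {"Monoterpenoids": 0.40, "Macrolides": 0.60}
class_probs = {"Menthane monoterpenoids": 0.30, "Erythromycins": 0.70}

top1 = tuple(max(p, key=p.get) for p in (pathway_probs, super_probs, class_probs))
recommended = recommend_triplet(pathway_probs, super_probs, class_probs)
print("top-1 per tier:", top1, "consistent:", is_consistent(*top1))
print("recommended   :", recommended)
```

In this toy case the independent top-1 picks mix a terpenoid pathway with a macrolide class, so the validation fails and the recommended triplet falls back to the best fully consistent path through the hierarchy.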
Usage
- This page supports single SMILES analysis. The structure preview is displayed first; click "Submit" to invoke the backend inference service.
- BGC results reflect the structural similarity of the molecule to known natural product / BGC distributions, and do not equate to experimentally validated biosynthetic conclusions.
Model Performance Summary
Multi-task NPClassifier Summary
============================================================
Tier         acc              f1               auprc
Pathway      0.9405 ± 0.0022  0.9120 ± 0.0033  0.9656 ± 0.0025
Super_class  0.8569 ± 0.0038  0.7805 ± 0.0117  0.8581 ± 0.0140
Class        0.7451 ± 0.0011  0.6139 ± 0.0047  0.6872 ± 0.0061
BGC Multi-label Summary
============================================================
Metric            Value
f1_samples        0.8415 ± 0.0084
f1_macro          0.7999 ± 0.0081
exact_match       0.7160 ± 0.0094
auprc_macro       0.8723 ± 0.0169
hamming_loss      0.0669 ± 0.0019
AUPRC PKS         0.9403 ± 0.0143
AUPRC NRPS        0.9228 ± 0.0085
AUPRC ribosomal   0.9103 ± 0.0376
AUPRC other       0.8064 ± 0.0177
AUPRC terpene     0.9095 ± 0.0528
AUPRC saccharide  0.7444 ± 0.0373
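For readers less familiar with the multi-label metrics named in the BGC summary, the sketch below computes exact_match, hamming_loss, and f1_samples on a toy example. It assumes the conventional definitions (as used, for instance, in scikit-learn); this page does not state how the reported numbers were computed, and the label matrices here are made up.

```python
def exact_match(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are wrong (lower is better)."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))

def f1_samples(y_true, y_pred):
    """F1 computed per sample over its labels, then averaged over samples."""
    scores = []
    for t, p in zip(y_true, y_pred):
        tp = sum(a and b for a, b in zip(t, p))
        denom = sum(t) + sum(p)
        # Convention chosen here: a sample with no true and no predicted labels scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data: 3 molecules x 3 labels (columns could be PKS, NRPS, terpene).
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 1]]

print(f"exact_match  {exact_match(y_true, y_pred):.4f}")
print(f"hamming_loss {hamming_loss(y_true, y_pred):.4f}")
print(f"f1_samples   {f1_samples(y_true, y_pred):.4f}")
```

The example shows why the three numbers can diverge: only one of three molecules is matched exactly, yet just two of nine label slots are wrong, so exact_match is strict while hamming_loss and f1_samples give partial credit.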
Last updated: 2026-06-25