NPIE — Natural Product Classification & Biosynthetic Intelligence Engine

The Natural Product Classification and Biosynthetic Intelligence Engine (NPIE) is an online inference tool for analyzing natural product molecules. Given a single SMILES string, it returns two sets of results at once: (1) natural product classification predictions following the NPClassifier framework, at the Pathway, Super_class, and Class tiers; and (2) biosynthetic gene cluster (BGC) predictions from a multi-label model built on the MIBiG 4.0 dataset, giving a probability and a final determination for labels such as PKS, NRPS, terpene, and ribosomal. A single molecular encoder serves as the shared backbone, with multiple task adapters and heads mounted on it, so each molecule is encoded once and every task reads from the same representation ("one encoding, multi-task output"). This balances inference efficiency with result consistency.

Methodologically, the tool first converts the input SMILES into an explicit structured string representation, then feeds it into a Transformer encoder pre-trained on large-scale molecular corpora. The classification component uses an ensemble over multiple random seeds to predict Pathway, Super_class, and Class jointly. The BGC component uses a multi-label ensemble and reports, for each label, the probability, the decision threshold, and the final determination, giving a finer-grained view of how strongly the molecule is associated with each biosynthetic pathway.
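The BGC step described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes the ensemble combines its members by averaging their per-label probabilities and that each label has its own decision threshold. The function name ensemble_bgc, the 0.5 thresholds, and all numbers are hypothetical.

```python
from statistics import mean


def ensemble_bgc(per_seed_probs, thresholds):
    """Average per-label probabilities over ensemble members,
    then compare each average with that label's threshold."""
    report = {}
    for label, threshold in thresholds.items():
        prob = mean(member[label] for member in per_seed_probs)
        report[label] = {
            "probability": prob,
            "threshold": threshold,
            "positive": prob >= threshold,
        }
    return report


# Illustrative numbers only, not real model output.
per_seed_probs = [
    {"PKS": 0.75, "NRPS": 0.25, "terpene": 0.0},
    {"PKS": 1.0, "NRPS": 0.25, "terpene": 0.0},
    {"PKS": 0.5, "NRPS": 0.5, "terpene": 0.25},
]
# Hypothetical per-label thresholds; the tool reports its own values.
thresholds = {"PKS": 0.5, "NRPS": 0.5, "terpene": 0.5}

report = ensemble_bgc(per_seed_probs, thresholds)
for label, row in report.items():
    print(label, round(row["probability"], 3), row["threshold"], row["positive"])
```

Because each label is thresholded independently, a molecule can be positive for several labels at once (for example a hybrid PKS/NRPS product) or for none.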

Core outputs:

1. Natural product classification: Outputs Pathway, Super_class, and Class tier predictions, along with a hierarchy consistency check and a recommended consistent triplet.

2. BGC results: Outputs probability, threshold, and final determination for labels such as PKS, NRPS, terpene, and ribosomal, supporting analysis of structural similarity to different biosynthetic types.
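One way the hierarchy consistency validation and the recommended triplet could work is sketched below. It is an assumption-laden illustration: the parent maps are tiny toy examples (not the full NPClassifier ontology, where a class may also have more than one parent), the helper names are hypothetical, and scoring a triplet by the product of its three tier probabilities is a choice made here, not something this page specifies.

```python
# Toy parent maps for illustration; the real ontology is much larger.
CLASS_TO_SUPER = {
    "Flavonols": "Flavonoids",
    "Menthane monoterpenoids": "Monoterpenoids",
}
SUPER_TO_PATHWAY = {
    "Flavonoids": "Shikimates and Phenylpropanoids",
    "Monoterpenoids": "Terpenoids",
}


def is_consistent(pathway, super_class, np_class):
    """True if the class belongs to the super class and the super class to the pathway."""
    return (CLASS_TO_SUPER.get(np_class) == super_class
            and SUPER_TO_PATHWAY.get(super_class) == pathway)


def recommend_triplet(pathway_probs, super_probs, class_probs):
    """Return the hierarchy-respecting triplet with the highest joint score, or None."""
    best, best_score = None, -1.0
    for np_class, p_class in class_probs.items():
        super_class = CLASS_TO_SUPER.get(np_class)
        pathway = SUPER_TO_PATHWAY.get(super_class)
        if super_class is None or pathway is None:
            continue  # class not covered by the toy maps
        score = (pathway_probs.get(pathway, 0.0)
                 * super_probs.get(super_class, 0.0)
                 * p_class)
        if score > best_score:
            best, best_score = (pathway, super_class, np_class), score
    return best


# Illustrative per-tier probabilities, not real model output.
pathway_probs = {"Terpenoids": 0.6, "Shikimates and Phenylpropanoids": 0.4}
super_probs = {"Flavonoids": 0.55, "Monoterpenoids": 0.45}
class_probs = {"Flavonols": 0.3, "Menthane monoterpenoids": 0.7}

# Taking the top label of each tier independently can break the hierarchy.
top = tuple(max(d, key=d.get) for d in (pathway_probs, super_probs, class_probs))
recommended = recommend_triplet(pathway_probs, super_probs, class_probs)
print(top, is_consistent(*top))
print(recommended)
```

In this toy case the per-tier top labels pair a terpenoid pathway with a flavonoid super class, which the check flags as inconsistent, while the recommended triplet stays within one branch of the hierarchy.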

Usage

  • This page analyzes one SMILES string at a time. The structure preview is displayed first; click "Submit" to invoke the backend inference service.
  • BGC results reflect the structural similarity of the molecule to known natural product / BGC distributions; they are not experimentally validated biosynthetic conclusions.

Model Performance Summary

Multi-task NPClassifier Summary
============================================================
Pathway          acc=0.9405±0.0022  f1=0.9120±0.0033  auprc=0.9656±0.0025
Super_class      acc=0.8569±0.0038  f1=0.7805±0.0117  auprc=0.8581±0.0140
Class            acc=0.7451±0.0011  f1=0.6139±0.0047  auprc=0.6872±0.0061

BGC Multi-label Summary
============================================================
  f1_samples            0.8415 ± 0.0084
  f1_macro              0.7999 ± 0.0081
  exact_match           0.7160 ± 0.0094
  auprc_macro           0.8723 ± 0.0169
  hamming_loss          0.0669 ± 0.0019
  AUPRC PKS             0.9403 ± 0.0143
  AUPRC NRPS            0.9228 ± 0.0085
  AUPRC ribosomal       0.9103 ± 0.0376
  AUPRC other           0.8064 ± 0.0177
  AUPRC terpene         0.9095 ± 0.0528
  AUPRC saccharide      0.7444 ± 0.0373
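For readers unfamiliar with the multi-label metrics above, the sketch below computes three of them on a toy example. It assumes the metric names follow the usual scikit-learn conventions (hamming_loss, exact match as subset accuracy, and F1 averaged over samples); the page does not state how its numbers were computed. The label matrices are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of molecules whose full label set is predicted exactly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


def f1_samples(y_true, y_pred):
    """F1 computed per molecule over its labels, then averaged over molecules."""
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        tp = sum(t * p for t, p in zip(true_row, pred_row))
        denom = sum(true_row) + sum(pred_row)
        # Score 0.0 when a molecule has no true and no predicted labels.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 4 molecules x 4 labels (hypothetical column order).
y_true = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]

print(hamming_loss(y_true, y_pred))
print(exact_match(y_true, y_pred))
print(f1_samples(y_true, y_pred))
```

Lower is better for hamming_loss, while higher is better for the other two. exact_match is the strictest of the three, since one wrong label makes the whole molecule count as a miss.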
        
[Figure panels: Pathway F1; Super Class F1]

Last updated: 2026-06-25