VHH Nativeness Assessment (VHH-Nativ)

This tool evaluates the nativeness of camelid-derived single-domain antibody (VHH) amino acid sequences. It uses antibody language models trained on large-scale antibody sequence data to compute the perplexity (PPL) of each input sequence; a lower PPL indicates that the sequence lies closer to the natural sequence distribution.
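As a minimal sketch of the quantity being reported, assume a model that supplies a natural-log probability for each observed residue. The helper name `perplexity` and its input format are hypothetical and not this tool's API. Perplexity is then the exponential of the mean negative log-likelihood per residue:

```python
import math
from typing import Sequence


def perplexity(residue_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/L) * sum(log p_i)) over L residues.

    `residue_log_probs` holds the natural-log probability the model assigned
    to each observed residue (hypothetical input format).
    """
    if not residue_log_probs:
        raise ValueError("need at least one residue")
    mean_nll = -sum(residue_log_probs) / len(residue_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.9 to every residue is less "surprised"
# than one that assigns 0.5, so the first sequence gets the lower PPL.
confident = [math.log(0.9)] * 120
uncertain = [math.log(0.5)] * 120
print(perplexity(confident))  # about 1.11
print(perplexity(uncertain))  # about 2.0
```

A PPL of 1 would mean the model predicted every residue with certainty, while uniform guessing over the 20 standard amino acids gives a PPL of 20.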

For each sequence, the output reports its PPL, its percentile within the VHH reference distribution, and a nativeness score.

Training data scale (approximate sequence counts; M = million): human VH ~15M, human VL ~17M, VHH ~18M, human heavy-chain CDR3 ~5M, plus ~3.7M human paired VH-VL records.

Input: VHH amino acid sequences in FASTA format (up to 10 entries).
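A minimal sketch of the input handling, assuming standard FASTA (a `>` header line followed by one or more sequence lines). The function name `parse_fasta` is illustrative, and the rule that only the 20 standard amino-acid letters are accepted is a hypothetical validation choice not stated by the tool. The sketch enforces the 10-entry limit and counts parsed sequences and total residues; the two example sequences are short illustrative fragments, not complete VHH domains.

```python
from typing import List, Tuple

MAX_ENTRIES = 10  # entry limit stated for this tool
# Hypothetical validation rule: accept only the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs and validate them."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))

    if len(records) > MAX_ENTRIES:
        raise ValueError(f"{len(records)} entries given; at most {MAX_ENTRIES} allowed")
    for name, seq in records:
        bad = set(seq) - STANDARD_AA
        if not seq or bad:
            raise ValueError(f"{name!r} is empty or has non-standard residues {sorted(bad)}")
    return records


example = """>vhh_1
QVQLQESGGGLVQAGGSLRLSCAASG
>vhh_2
EVQLVESGGGLVQPGGSLRLSCAASGFTFS
"""
recs = parse_fasta(example)
total = sum(len(seq) for _, seq in recs)
print(f"Parsed sequences: {len(recs)}, Total residues: {total}")
# Parsed sequences: 2, Total residues: 56
```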

Model Performance (Current Version)

Antibody Masked Language Model (MLM) Performance:
  PPL = 1.39
  Acc = 0.9162

Antibody Autoregressive (GPT) Model Performance:
  PPL = 1.47
  Acc = 0.8974
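The page does not say how the MLM figures are computed. A common convention for masked language models, assumed here, is pseudo-perplexity: mask each position in turn, score the true residue, and exponentiate the mean negative log-probability. Masked-token accuracy is then the fraction of positions where the top-ranked residue is the true one. In this sketch, `masked_eval` is a hypothetical name, and the `predict_masked` argument is a hypothetical stand-in for a real model call. A context-free uniform baseline is included for comparison.

```python
import math
from typing import Callable, Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical model interface: given a sequence and a position, return the
# model's probability for each residue at that position with it masked out.
MaskedPredictor = Callable[[str, int], Dict[str, float]]


def masked_eval(seq: str, predict_masked: MaskedPredictor) -> Tuple[float, float]:
    """Return (pseudo-perplexity, masked-token accuracy) for one sequence."""
    total_nll = 0.0
    correct = 0
    for i, true_aa in enumerate(seq):
        probs = predict_masked(seq, i)  # one forward pass per masked position
        total_nll -= math.log(probs[true_aa])
        if max(probs, key=probs.get) == true_aa:  # top-1 prediction
            correct += 1
    return math.exp(total_nll / len(seq)), correct / len(seq)


def uniform_model(seq: str, i: int) -> Dict[str, float]:
    """Toy baseline that ignores context: 1/20 for every standard residue."""
    return {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


ppl, acc = masked_eval("QVQLQESGGG", uniform_model)
print(ppl, acc)
```

The uniform baseline scores a pseudo-perplexity of 20 (one choice in twenty). If the reported MLM PPL of 1.39 is a pseudo-perplexity of this kind, it corresponds to a geometric-mean probability of about 0.72 on the true residue.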

VHH Reference PPL Distribution (random sample of 50,000 VHH sequences)

count = 50000
mean  = 1.5379514
std   = 0.4447357
min   = 1.1094235
max   = 5.7620468

percentiles:
  p1  = 1.1464990
  p5  = 1.1636765
  p10 = 1.1778821
  p25 = 1.2204022
  p50 = 1.3505365
  p75 = 1.7256708
  p90 = 2.1715568
  p95 = 2.4626958
  p99 = 3.0437380
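Statistics like those above can be produced from any sample of per-sequence PPL values with the Python standard library. This is a sketch under two assumptions the page does not confirm. First, `std` is taken as the population standard deviation; with 50,000 values, population and sample estimates differ negligibly. Second, percentiles use linear interpolation between order statistics, the same convention as NumPy's default. The function name `summarize` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def summarize(ppls: Sequence[float]) -> Dict[str, float]:
    """Summary statistics in the same layout as the reference table."""
    data = sorted(ppls)
    # 99 cut points; 'inclusive' interpolates linearly between order statistics
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    summary = {
        "count": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.pstdev(data),  # population std: an assumption
        "min": data[0],
        "max": data[-1],
    }
    for p in (1, 5, 10, 25, 50, 75, 90, 95, 99):
        summary[f"p{p}"] = cuts[p - 1]  # cuts[k] is the (k+1)-th percentile
    return summary


# Toy data (0.0, 1.0, ..., 100.0), not real PPL values.
for key, value in summarize([float(x) for x in range(101)]).items():
    print(f"{key:5} = {value}")
```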

Lower PPL indicates closer proximity to the natural VHH sequence distribution, so a sequence with a lower percentile rank in this reference distribution is more native-like. Nativeness scores and recommendations are based on these percentile ranks.
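The page does not specify how a query PPL is converted to a percentile rank. One simple approach, sketched here as an assumption, interpolates linearly between the published reference percentiles. Values below p1 or above p99 are clamped, so the tails are coarse; with access to the full 50,000-value sample, an exact empirical rank would be preferable. The name `approx_percentile` is hypothetical.

```python
from bisect import bisect_left

# (percentile, PPL) pairs copied from the VHH reference table.
REFERENCE = [
    (1, 1.1464990), (5, 1.1636765), (10, 1.1778821), (25, 1.2204022),
    (50, 1.3505365), (75, 1.7256708), (90, 2.1715568), (95, 2.4626958),
    (99, 3.0437380),
]


def approx_percentile(ppl: float) -> float:
    """Approximate percentile rank of `ppl` by linear interpolation between
    the published reference percentiles, clamped to the range [1, 99]."""
    pcts = [p for p, _ in REFERENCE]
    ppls = [v for _, v in REFERENCE]
    if ppl <= ppls[0]:
        return float(pcts[0])
    if ppl >= ppls[-1]:
        return float(pcts[-1])
    k = bisect_left(ppls, ppl)  # first index with ppls[k] >= ppl
    x0, x1 = ppls[k - 1], ppls[k]
    y0, y1 = pcts[k - 1], pcts[k]
    return y0 + (y1 - y0) * (ppl - x0) / (x1 - x0)


print(approx_percentile(1.3505365))  # 50.0, the reference median
print(approx_percentile(1.2854694))  # roughly halfway between p25 and p50
```

Under this scheme, a lower percentile means the sequence's PPL is lower than most of the reference VHH sample, which the page treats as more native-like.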

Last updated: 2026-05-26