This tool evaluates the nativeness of camelid-derived single-domain antibody (VHH) amino acid sequences. It uses antibody language models trained on large-scale antibody sequence data to compute the perplexity (PPL) of each input sequence. A lower PPL means the sequence lies closer to the natural sequence distribution.
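As a rough illustration of how a perplexity value relates to per-residue model probabilities, the sketch below uses the standard definition: PPL is the exponential of the mean negative log-probability per residue. The function name `perplexity` and the example probabilities are hypothetical. The tool's own scoring procedure (for example, how masked and autoregressive model scores are combined) is not described on this page, so this is not its implementation.

```python
import math

def perplexity(residue_log_probs):
    """Return exp of the mean negative natural-log probability per residue."""
    if not residue_log_probs:
        raise ValueError("need at least one residue log-probability")
    mean_nll = -sum(residue_log_probs) / len(residue_log_probs)
    return math.exp(mean_nll)

# Hypothetical log-probabilities a model might assign to each residue.
confident = [math.log(0.9)] * 5   # model finds every residue very likely
uncertain = [math.log(0.3)] * 5   # model finds every residue unlikely

print(perplexity(confident) < perplexity(uncertain))
```

A PPL of 1 would mean the model assigns probability 1 to every residue. Larger values mean the model is, on average, choosing among more plausible alternatives at each position.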
The output includes each sequence's PPL, percentile within the VHH reference distribution, and nativeness score.
Training data (approximate counts): human VH ~15M, human VL ~17M, VHH ~18M, and human heavy-chain CDR3 ~5M sequences, plus ~3.7M human paired (VH-VL) records.
Input: VHH amino acid sequences in FASTA format (up to 10 entries).
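A minimal sketch of checking FASTA input locally before submitting it, assuming plain multi-line FASTA with `>` header lines. The names `parse_fasta`, `VALID_RESIDUES`, and `MAX_ENTRIES` are hypothetical. Rejecting anything outside the 20 standard amino acids is an assumption, since the page does not say how the tool treats non-standard residues.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard amino acids only
MAX_ENTRIES = 10  # entry limit stated on the page

def parse_fasta(text, max_entries=MAX_ENTRIES):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))

    if len(records) > max_entries:
        raise ValueError(f"{len(records)} entries given, limit is {max_entries}")
    for name, seq in records:
        if not seq:
            raise ValueError(f"entry '{name}' has no sequence")
        bad = set(seq) - VALID_RESIDUES
        if bad:
            raise ValueError(f"entry '{name}' has unexpected characters: {sorted(bad)}")
    return records

example = ">vhh_1\nQVQLVESGGG\nLVQAGGSLRL\n>vhh_2\nEVQLVESGGGLVQPGG\n"
records = parse_fasta(example)
total_residues = sum(len(seq) for _, seq in records)
print(f"Parsed sequences: {len(records)}, Total residues: {total_residues}")
```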
Model Performance (Current Version)
Antibody MLM Model Performance: PPL = 1.39, Acc = 0.9162
Antibody Autoregressive (GPT) Model Performance: PPL = 1.47, Acc = 0.8974
VHH Reference PPL Distribution (random sample of 50,000)
count = 50000
mean = 1.5379514
std = 0.4447357
min = 1.1094235
max = 5.7620468
percentiles:
  p1 = 1.1464990
  p5 = 1.1636765
  p10 = 1.1778821
  p25 = 1.2204022
  p50 = 1.3505365
  p75 = 1.7256708
  p90 = 2.1715568
  p95 = 2.4626958
  p99 = 3.0437380
Lower PPL indicates closer proximity to the natural VHH sequence distribution. Nativeness scores and recommendations will be provided based on percentile ranks.
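To show how a PPL value can be placed within the reference distribution, the sketch below estimates a percentile rank by linear interpolation between the summary percentiles published on this page. The tool presumably ranks against the full 50,000-sequence sample, so this is only an approximation. The function name `percentile_rank` is hypothetical, and the mapping from percentile to nativeness score is not specified here, so it is left out.

```python
from bisect import bisect_right

# (percentile, PPL) pairs copied from the VHH reference distribution on this page;
# min and max are treated as the 0th and 100th percentiles.
REFERENCE = [
    (0, 1.1094235),
    (1, 1.1464990),
    (5, 1.1636765),
    (10, 1.1778821),
    (25, 1.2204022),
    (50, 1.3505365),
    (75, 1.7256708),
    (90, 2.1715568),
    (95, 2.4626958),
    (99, 3.0437380),
    (100, 5.7620468),
]

def percentile_rank(ppl):
    """Estimate the reference percentile of a PPL value by linear interpolation."""
    ranks = [r for r, _ in REFERENCE]
    values = [v for _, v in REFERENCE]
    if ppl <= values[0]:
        return 0.0
    if ppl >= values[-1]:
        return 100.0
    i = bisect_right(values, ppl)  # values[i - 1] <= ppl < values[i]
    lo_v, hi_v = values[i - 1], values[i]
    lo_r, hi_r = ranks[i - 1], ranks[i]
    return lo_r + (hi_r - lo_r) * (ppl - lo_v) / (hi_v - lo_v)

for ppl in (1.20, 1.35, 2.50):
    print(f"PPL {ppl:.2f} -> approx. percentile {percentile_rank(ppl):.1f}")
```

Because lower PPL is better, a low percentile here corresponds to a sequence that looks more native than most of the reference sample.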
Last updated: 2026-05-26