Protein Structural Property Prediction (ProtSA)

ProtSA is a protein structural property predictor that takes only the amino acid sequence as input. For each residue of a protein sequence it outputs relasa (relative solvent-accessible surface area), plddt (predicted local confidence, pLDDT) and sec (secondary structure: H = helix, E = strand, C = coil). The model is trained with a multi-stage curriculum. Stage 1 learns stable residue-level structural representations, mainly through per-residue (node) regression and secondary structure recognition. Stage 2 then adds more complete structural constraints and optimizes all objectives jointly. The design is leakage-free: model inputs are derived from the sequence alone in both training and inference, so no structural information reaches the model as input. The final model performs stably on the test set while keeping inference efficient, which makes it suitable as a front end for large-scale structural annotation of sequences.
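To make the per-residue output concrete, the sketch below models one residue's predictions as a small record. The class and helper names (`ResiduePrediction`, `sec_string`, `mean_plddt`) are hypothetical and do not describe ProtSA's actual output schema; relasa is assumed to be on a 0-1 scale, and the 0-100 pLDDT value is obtained by multiplying the normalized score by 100, matching the conversion used in the metrics table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResiduePrediction:
    """Hypothetical per-residue record; field names are illustrative, not ProtSA's real schema."""
    position: int      # 1-based index into the input sequence
    residue: str       # one-letter amino-acid code
    relasa: float      # relative solvent-accessible surface area (assumed 0-1 scale)
    plddt_norm: float  # predicted local confidence, normalized to 0-1
    sec: str           # "H" (helix), "E" (strand) or "C" (coil)

    @property
    def plddt(self) -> float:
        """Confidence on the usual 0-100 pLDDT scale."""
        return 100.0 * self.plddt_norm


def sec_string(preds: List[ResiduePrediction]) -> str:
    """Join per-residue H/E/C labels into one string aligned with the input sequence."""
    return "".join(p.sec for p in preds)


def mean_plddt(preds: List[ResiduePrediction]) -> float:
    """Average 0-100 confidence over all residues of one sequence."""
    return sum(p.plddt for p in preds) / len(preds)


# Usage with made-up values for a 4-residue fragment (not real ProtSA output).
preds = [
    ResiduePrediction(1, "M", 0.85, 0.62, "C"),
    ResiduePrediction(2, "K", 0.40, 0.91, "H"),
    ResiduePrediction(3, "T", 0.12, 0.95, "H"),
    ResiduePrediction(4, "A", 0.30, 0.93, "H"),
]
print(sec_string(preds))            # CHHH
print(round(mean_plddt(preds), 2))  # 85.25
```

Keeping the normalized score as the stored field and deriving the 0-100 value on demand avoids storing two copies that could drift apart.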

Input: protein sequence (max 1024 aa)
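Before submitting input, it can help to validate it locally. The following is a minimal sketch, assuming the input is either FASTA text or a single bare sequence, that only the 20 standard amino acids are accepted, and that the 1024 aa limit applies to each sequence. `parse_sequences` is a hypothetical helper, not part of ProtSA.

```python
import re

MAX_LEN = 1024  # length limit stated on the ProtSA input form
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: only the 20 standard residues are accepted


def parse_sequences(text):
    """Hypothetical parser: accepts FASTA records or one bare sequence; returns [(name, seq), ...]."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if chunks:  # close the previous record
                records.append((name or f"seq{len(records) + 1}", "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(re.sub(r"\s+", "", line).upper())  # drop inner spaces, normalize case
    if chunks:
        records.append((name or f"seq{len(records) + 1}", "".join(chunks)))
    for rec_name, seq in records:
        bad = set(seq) - STANDARD_AA
        if bad:
            raise ValueError(f"{rec_name}: non-standard residues {sorted(bad)}")
        if len(seq) > MAX_LEN:
            raise ValueError(f"{rec_name}: {len(seq)} aa exceeds the {MAX_LEN} aa limit")
    return records


records = parse_sequences(">p1\nMKTAYIAKQR\nQISFVKSHFS\n>p2\nGSHMLE")
print(f"Parsed sequences: {len(records)}, total residues: {sum(len(s) for _, s in records)}")
# Parsed sequences: 2, total residues: 26
```

Rejecting non-standard letters outright is a design choice for the sketch; a real front end might instead map ambiguous codes such as X to a placeholder.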




Test Set Metrics (relasa, plddt, sec)

============================================================
   Model Metrics (Stage 2, sequence-only)
============================================================

[Residue-level Regression]
------------------------------------------------------------
  Target          MAE       R²       Note
------------------------------------------------------------
  relasa          0.0952    0.7540   relative solvent accessibility
  plddt_norm      0.0327    0.7446   normalized pLDDT (0-1)
  plddt (0-100)   3.27      -        plddt_norm MAE × 100
------------------------------------------------------------
  rsa_pcc         0.8685             Pearson r for relasa
------------------------------------------------------------
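The regression columns follow the standard definitions of MAE, R² (1 − SS_res/SS_tot) and Pearson r. The plain-Python sketch below implements them (the function names are our own, not ProtSA's) and uses made-up values to show why the 0-100 pLDDT MAE is exactly 100 times the normalized MAE while R² is unchanged by the rescaling.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over residues."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up plddt_norm values for five residues (illustration only, not ProtSA data).
true_norm = [0.92, 0.85, 0.40, 0.66, 0.78]
pred_norm = [0.90, 0.88, 0.45, 0.60, 0.80]
true_100 = [100 * v for v in true_norm]  # rescale to the 0-100 pLDDT range
pred_100 = [100 * v for v in pred_norm]

print(mae(true_norm, pred_norm), mae(true_100, pred_100))  # second value is 100x the first
print(r2(true_norm, pred_norm), r2(true_100, pred_100))    # equal up to float rounding
print(pearson(true_norm, pred_norm))
```

The same definitions give a quick consistency check on the reported relasa numbers: for any predictor, R² computed as 1 − SS_res/SS_tot cannot exceed the squared Pearson r, and 0.8685² ≈ 0.7543 is indeed just above the reported R² of 0.7540.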


============================================================
   Secondary Structure Classification (sec: H/E/C)
============================================================
  Class   Support     Prec   Recall       F1      Acc
  --------------------------------------------------
  H        913987   0.9557   0.9589   0.9573   0.9589
  E        404837   0.9369   0.9107   0.9236   0.9107
  C       1064603   0.9331   0.9403   0.9367   0.9403
  --------------------------------------------------
  Macro F1                            0.9392
  Overall accuracy (Q3)                        0.9424
  (Per-class Acc is the fraction of that class's residues
   predicted correctly, i.e. equal to Recall.)

  Confusion Matrix (rows = true, cols = predicted):
           H        E        C
  H   876424      779    36784
  E     1168   368678    34991
  C    39487    24059  1001057
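Every number in the classification table can be recomputed from the confusion matrix. The plain-Python sketch below (the function name `classification_report` is our own, not a ProtSA or scikit-learn API) derives per-class precision, recall and F1, macro F1, and overall Q3 accuracy. It reproduces the reported values and shows that 0.9424 is the overall accuracy over all residues, not the mean of the per-class accuracies (which is 0.9366).

```python
# Confusion matrix from the test set above (rows = true class, cols = predicted class).
LABELS = ["H", "E", "C"]
CONFUSION = [
    [876424, 779, 36784],
    [1168, 368678, 34991],
    [39487, 24059, 1001057],
]


def classification_report(cm, labels):
    """Per-class precision/recall/F1, macro F1, macro recall and overall (Q3) accuracy."""
    n = len(labels)
    total = sum(sum(row) for row in cm)
    per_class = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        support = sum(cm[i])                         # row sum: residues whose true class is `label`
        predicted = sum(cm[r][i] for r in range(n))  # column sum: residues predicted as `label`
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"support": support, "precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(m["f1"] for m in per_class.values()) / n
    macro_recall = sum(m["recall"] for m in per_class.values()) / n
    q3 = sum(cm[i][i] for i in range(n)) / total  # fraction of all residues classified correctly
    return per_class, macro_f1, macro_recall, q3


per_class, macro_f1, macro_recall, q3 = classification_report(CONFUSION, LABELS)
for label, m in per_class.items():
    print(f"{label}  {m['support']:>8}  {m['precision']:.4f}  {m['recall']:.4f}  {m['f1']:.4f}")
print(f"macro F1     {macro_f1:.4f}")      # 0.9392
print(f"macro recall {macro_recall:.4f}")  # 0.9366
print(f"Q3 accuracy  {q3:.4f}")            # 0.9424
```

Because C and H dominate the support, the overall (micro) accuracy is pulled above the macro average by the strong H and C recalls, while the weaker E recall (0.9107) weighs more in the macro figures.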
        

Last updated: 2026-04-30