Protein Secondary Structure Prediction (ProtSS)

Protein secondary structure refers to the regular local conformations that form when the backbone of a polypeptide chain coils or folds along certain axes. It describes only the spatial arrangement of the peptide backbone atoms and does not involve amino acid side chains. Two common annotation and evaluation schemes are Q3 and Q8: Q3 is a coarse three-state classification (H/E/C), while Q8 is a fine-grained eight-state classification defined by DSSP. Secondary structure is stabilized mainly by hydrogen bonds between backbone atoms. In practice, a protein usually contains a mixture of conformations rather than a single pure alpha-helix or beta-sheet, and the proportions of these conformations differ from protein to protein.
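Because proteins mix conformations in different proportions, a simple way to summarize a secondary-structure assignment is the fraction of residues in each state. The following is a minimal sketch, assuming the assignment is available as a one-letter string (for example DSSP output or a prediction); the function name `ss_composition` and the example string are hypothetical.

```python
from collections import Counter


def ss_composition(ss: str) -> dict[str, float]:
    """Return the fraction of residues assigned to each secondary-structure state."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss)  # residues per state letter
    total = len(ss)
    # Sort by state letter so the output order is stable.
    return {state: n / total for state, n in sorted(counts.items())}


# Hypothetical 24-residue assignment: 8 coil, 5 strand, 11 helix residues.
example = "CCHHHHHHHCCEEEEECCHHHHCC"
for state, frac in ss_composition(example).items():
    print(f"{state}: {frac:.2f}")
```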

Input: protein sequence (maximum 1024 amino acids).
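The input accepts a single protein sequence of at most 1024 residues. The sketch below shows one way such input might be normalized and checked before prediction. The function name `validate_sequence`, the handling of a FASTA header line, and the restriction to the 20 standard one-letter amino-acid codes are illustrative assumptions, not the service's documented behavior.

```python
# Hypothetical input check. Only the 1024-residue limit comes from this page;
# the accepted alphabet (20 standard amino acids) is an assumption.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LEN = 1024


def validate_sequence(raw: str) -> str:
    """Strip an optional FASTA header and whitespace, uppercase, and validate."""
    lines = raw.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop a single FASTA header line
    seq = "".join("".join(lines).split()).upper()  # remove all whitespace
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) > MAX_LEN:
        raise ValueError(f"sequence has {len(seq)} residues; maximum is {MAX_LEN}")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"non-standard residue codes: {''.join(bad)}")
    return seq


print(validate_sequence(">demo\nmktayiakqr\n"))
```

Rejecting unknown letters such as X is a strict choice; a real service might instead accept them and map them to a generic residue.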




Secondary Structure Prediction

A more accurate Q3 secondary-structure prediction model is available at Protein Structural Property Prediction (ProtSA).

Secondary structure prediction is an essential step in protein machine learning, catalytic residue analysis, and protein structure prediction. The eight secondary-structure states are:

  • H: alpha-helix
  • G: 3-10 helix
  • I: pi-helix
  • B: isolated beta-bridge
  • E: extended strand (beta-sheet)
  • T: hydrogen-bonded turn
  • S: bend
  • C/blank: coil/other
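Three-state (Q3) labels are commonly derived from the eight DSSP states by collapsing them. The sketch below uses one widely used reduction (H, G, I -> H; E, B -> E; everything else -> C). This page does not state which mapping its Q3 model uses, so treat the mapping as an assumption; `q8_to_q3` is a hypothetical helper.

```python
# Assumed 8-to-3 reduction: helices H, G, I -> H; strand/bridge E, B -> E;
# turns, bends and coil (T, S, C, blank, or any other code) -> C.
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def q8_to_q3(ss8: str) -> str:
    """Collapse an 8-state DSSP string into 3 states (H/E/C)."""
    return "".join(Q8_TO_Q3.get(state, "C") for state in ss8)


print(q8_to_q3("CHHHGGGTTEEEBSS C"))  # CHHHHHHCCEEEECCCC
```

Other reductions exist (for example, mapping only H to H and only E to E), and they produce somewhat different Q3 scores on the same predictions.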

Model Performance Metrics

We trained a protein secondary-structure prediction model that supports both Q8 and Q3 prediction. Its test results on the CB513 dataset, which we consider SOTA-level, are shown below:

  • CB513 Test Accuracy (Q8): 0.7587
  • CB513 Test Accuracy (Q3): 0.8731
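Q3 and Q8 accuracy are per-residue measures: the number of residues whose predicted state matches the reference state, divided by the total number of residues. The minimal sketch below assumes accuracy is pooled over all residues of the test set rather than averaged per protein. The page does not say which convention it uses, but the pooled version is consistent with the reports below. `q_accuracy` is a hypothetical helper.

```python
def q_accuracy(true_seqs: list[str], pred_seqs: list[str]) -> float:
    """Per-residue accuracy pooled over all proteins.

    Gives Q3 for 3-state strings and Q8 for 8-state strings.
    """
    if len(true_seqs) != len(pred_seqs):
        raise ValueError("need one prediction per reference sequence")
    correct = total = 0
    for true_ss, pred_ss in zip(true_seqs, pred_seqs):
        if len(true_ss) != len(pred_ss):
            raise ValueError("reference and prediction lengths differ")
        correct += sum(t == p for t, p in zip(true_ss, pred_ss))
        total += len(true_ss)
    if total == 0:
        raise ValueError("no residues to score")
    return correct / total


# Two toy proteins: 4 of 5 and 2 of 3 residues correct -> 6 / 8
print(q_accuracy(["HHHEE", "CCC"], ["HHCEE", "CCH"]))  # 0.75
```

Pooled accuracy is mathematically equal to the support-weighted average of per-class recall. This is why each reported accuracy (0.7587 for Q8, 0.8731 for Q3) matches the weighted-avg recall in the corresponding report. Averaging accuracy per protein instead would generally give a different number.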
[Q8 Classification Report]
              precision    recall  f1-score   support

           H     0.8850    0.9424    0.9128     43037
           G     0.5375    0.4450    0.4869      5173
           I     0.7033    0.4497    0.5487       796
           E     0.8464    0.8789    0.8623     30090
           B     0.5756    0.1622    0.2531      1831
           T     0.6149    0.6481    0.6310     16457
           S     0.5892    0.4262    0.4946     13541
       C/L/      0.6908    0.6911    0.6910     33086

   micro avg     0.7661    0.7587    0.7624    144011
   macro avg     0.6803    0.5805    0.6100    144011
weighted avg     0.7562    0.7587    0.7541    144011
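Each row of the reports gives four values. Precision is correct predictions of a state divided by all predictions of that state. Recall is correct predictions divided by all residues truly in that state. F1 is the harmonic mean of precision and recall, and support is the number of residues truly in that state. The layout appears to follow scikit-learn's `classification_report`. The dependency-free sketch below reproduces those per-class definitions; `per_class_report` is a hypothetical name.

```python
from collections import Counter


def per_class_report(y_true: str, y_pred: str) -> dict[str, tuple[float, float, float, int]]:
    """Precision, recall, F1 and support for every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    n_true = Counter(y_true)  # support: residues truly in each class
    n_pred = Counter(y_pred)  # residues predicted as each class
    report = {}
    for label in sorted(set(n_true) | set(n_pred)):
        precision = tp[label] / n_pred[label] if n_pred[label] else 0.0
        recall = tp[label] / n_true[label] if n_true[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, n_true[label])
    return report


for label, (p, r, f, s) in per_class_report("HHHHEECC", "HHHCEECH").items():
    print(f"{label:>3} {p:.4f} {r:.4f} {f:.4f} {s:>6}")
```

Low recall with moderate precision, as for B in the Q8 report (recall 0.1622, precision 0.5756), means the model rarely predicts that state and misses most true instances.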


[Q3 Classification Report]
              precision    recall  f1-score   support

           H     0.8914    0.9211    0.9060     48993
           E     0.8566    0.8543    0.8554     31845
           C     0.8664    0.8447    0.8554     61789

    accuracy                         0.8731    142627
   macro avg     0.8714    0.8733    0.8723    142627
weighted avg     0.8728    0.8731    0.8728    142627
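The summary rows differ in how per-class scores are combined. The macro average is an unweighted mean over classes, while the weighted average weights each class by its support. The sketch below recomputes both for F1 from the rounded per-class values in the Q3 report. Because the inputs are rounded to four decimals, recomputed averages can differ from the report in the last digit.

```python
# Per-class F1 and support copied from the Q3 report above (rounded values).
q3_f1 = {"H": 0.9060, "E": 0.8554, "C": 0.8554}
q3_support = {"H": 48993, "E": 31845, "C": 61789}


def macro_avg(scores: dict[str, float]) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_avg(scores: dict[str, float], support: dict[str, int]) -> float:
    """Mean over classes weighted by support (true residues per class)."""
    total = sum(support.values())
    return sum(scores[k] * support[k] for k in scores) / total


print(f"macro F1:    {macro_avg(q3_f1):.4f}")
print(f"weighted F1: {weighted_avg(q3_f1, q3_support):.4f}")
```

With these inputs the sketch prints 0.8723 and 0.8728, matching the macro-avg and weighted-avg F1 in the Q3 report. The same arithmetic explains the larger gap in the Q8 report (macro F1 0.6100 vs. weighted F1 0.7541). Rare states such as I (796 residues) and B (1831 residues) count as much as H in the macro average.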


Last updated: 2026-03-19