Datasets

Datasets for training, testing, and case study

We used the S1676, S236, and S543 datasets to investigate the prediction of protein stability changes (Table S1).

Table S1. Datasets used to build, evaluate, and independently test SCpre-seq

| Dataset | Total Variants (Proteins) | Destabilizing Variants (Proteins) | Stabilizing Variants (Proteins) | Neutral Variants (Proteins) | Additional Details |
|---|---|---|---|---|---|
| S1676 | 1676 (67) | 1223 (64) | 424 (53) | 29 (4) | Unique Variants/Averaged DDG |
| S236 | 236 (22) | 192 (18) | 42 (14) | 2 (2) | Unique Variants/Averaged DDG |
| S543 | 543 (55) | 426 (48) | 107 (37) | 10 (6) | Unique Variants/Averaged DDG |
| p53 | 42 (1) | 31 (1) | 11 (1) | 0 | One Protein |
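
The "Unique Variants/Averaged DDG" note in Table S1 suggests that repeated measurements of the same variant were collapsed into a single entry with an averaged DDG. The following is a minimal sketch of that kind of preprocessing, not the authors' pipeline. The record layout, the function names, the sign convention (negative DDG = destabilizing, positive = stabilizing), and the treatment of an averaged DDG of exactly zero as a separate "neutral" class are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(records):
    """Collapse repeated (protein, mutation) measurements into one averaged DDG."""
    grouped = defaultdict(list)
    for protein, mutation, ddg in records:
        grouped[(protein, mutation)].append(ddg)
    return {key: mean(values) for key, values in grouped.items()}


def label(ddg):
    """Assign a class from the sign of DDG (assumed convention, see lead-in)."""
    if ddg < 0:
        return "destabilizing"
    if ddg > 0:
        return "stabilizing"
    return "neutral"


def summarize(records):
    """Return {class: (number of unique variants, number of proteins)}."""
    averaged = average_duplicates(records)
    variants = defaultdict(int)
    proteins = defaultdict(set)
    for (protein, _mutation), ddg in averaged.items():
        cls = label(ddg)
        variants[cls] += 1
        proteins[cls].add(protein)
    return {cls: (variants[cls], len(proteins[cls])) for cls in variants}


# Hypothetical toy records: (protein ID, mutation, measured DDG in kcal/mol).
toy_records = [
    ("P1", "A10G", -1.2),
    ("P1", "A10G", -0.8),  # repeated measurement of the same variant
    ("P1", "L25F", 0.5),
    ("P2", "K7E", -2.0),
    ("P2", "V30I", 0.0),
]

summary = summarize(toy_records)
```

Applied to a real dataset, `summarize` would yield per-class counts in the same "variants (proteins)" form as the table columns, provided the sign convention and zero threshold match those actually used.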

Databases and references

Folkman, L., et al., EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J Mol Biol, 2016. 428(6): p. 1394-1405.

Pires, D.E., D.B. Ascher, and T.L. Blundell, mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics, 2014. 30(3): p. 335-42.

Datasets with features

Downloading