Evaluation of CpG_promoter (old QDA model + new cpgplot)
Test data:
Positive dataset: 11370 human promoters (-1700 bp to +300 bp relative to the TSS)
Negative dataset: 1470 CpG islands from the UCSC Genome Browser. CpG islands near (potential) TSSs were removed, so only islands lying outside the 2 kb region around a TSS are kept. The 2 kb region centered at the centroid of each CpG island is used as input.
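The two window definitions above can be sketched in code. This is only an illustration: the function names, the 0-based half-open coordinate convention, the strand handling, and the use of the interval midpoint as the "centroid" are assumptions, not details taken from the original evaluation.

```python
def promoter_window(tss, strand, upstream=1700, downstream=300):
    """Return a 0-based half-open (start, end) interval covering
    -1700 bp to +300 bp around a TSS.

    Assumption: `tss` is the 0-based position of the TSS base, which is
    counted as the first of the 300 downstream bases.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        # On the minus strand, upstream means higher genomic coordinates.
        return tss - downstream + 1, tss + upstream + 1
    raise ValueError(f"unknown strand: {strand!r}")


def island_window(island_start, island_end, size=2000):
    """Return a `size`-bp interval centered on the midpoint of a CpG island.

    Assumption: the centroid is taken as the integer midpoint of the island.
    No clamping to chromosome boundaries is done here.
    """
    center = (island_start + island_end) // 2
    half = size // 2
    return center - half, center + half


if __name__ == "__main__":
    print(promoter_window(10000, "+"))
    print(promoter_window(10000, "-"))
    print(island_window(5000, 5600))
```

Both helpers return intervals of the same length (2 kb), so positive and negative inputs have matching sizes.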
Results:
| Ground Truth | Prediction: T | Prediction: F |
|--------------|---------------|---------------|
| T            | 8053          | 1019          |
| F            | 371           | 513           |
T: promoter-related; F: not promoter-related
Remark: Of the 11370 promoter sequences, 2298 do not have a CpG island and are therefore not included in the results, leaving 9072. Similarly, of the 1470 CpG islands from the UCSC Genome Browser that are not close to TSSs, 586 are not detected by the program cpgplot and are also not included, leaving 884.
Sensitivity = TP / (TP + FN) = 8053 / (8053 + 1019) = 88.8%
Specificity = TN / (TN + FP) = 513 / (513 + 371) = 58.0%
This performance is comparable to the result presented in the original paper.
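The sensitivity and specificity figures above can be re-derived from the four counts in the results table. The short check below is a sketch for verification only; the variable names are illustrative, and it also confirms that the table totals match the dataset sizes after the exclusions described in the remark.

```python
# Counts taken from the results table (rows: ground truth, columns: prediction).
tp, fn = 8053, 1019   # true promoters predicted T / F
fp, tn = 371, 513     # non-promoter CpG islands predicted T / F

# Bookkeeping: sequences left after the exclusions noted in the remark.
assert tp + fn == 11370 - 2298   # promoters that have a CpG island
assert fp + tn == 1470 - 586     # CpG islands detected by cpgplot

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"Sensitivity = {sensitivity:.1%}")
print(f"Specificity = {specificity:.1%}")
```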