Intra-species Protein Phosphorylation Prediction

This page provides a software implementation of the approach used by team PRB (submission Team49) in the intra-species protein phosphorylation prediction sub-challenge (SC1) of the IMPROVER Species Translation Challenge (STC).
Prerequisites:
R statistical environment version 3.0.0 or higher, on a Linux-like system, with the following packages installed:
– foreach
– doMC
– limma
– ROC
– ROCR
– class
– e1071
– caret
– MASS
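Before running the pipeline, it can help to confirm that every listed package actually loads. The snippet below is a minimal sketch using base R's requireNamespace(); the install commands in the comments assume that limma and ROC come from Bioconductor (installed with BiocManager on recent R versions) and that the remaining packages come from CRAN.

```r
# Packages listed in the prerequisites.
pkgs <- c("foreach", "doMC", "limma", "ROC", "ROCR",
          "class", "e1071", "caret", "MASS")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!available]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Assumption: CRAN packages can be installed with
  #   install.packages(setdiff(missing_pkgs, c("limma", "ROC")))
  # and Bioconductor packages (recent R versions) with
  #   BiocManager::install(intersect(missing_pkgs, c("limma", "ROC")))
} else {
  message("All required packages are available.")
}
```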

The entry point of the pipeline is the R script SC1team49.r. It reads the rat gene expression data for the training stimuli (“GEx_rat_train.txt”) and the phosphorylation levels of the 16 proteins at two time points (“Phospho_5_rat_train.txt” and “Phospho_25_rat_train.txt”). It then builds one model per protein and uses each model to predict the phosphorylation status of that protein for new stimuli from the expression data in “GEx_rat_test.txt”. The predictions (0 = not phosphorylated, 1 = phosphorylated) are written to “PRB_SC1_predictions.txt”, which follows the template provided by the STC organizers (“template_prediction_SC1_phospho_rat.txt”).
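The description does not say which classifier SC1team49.r fits for each protein, nor how the 5- and 25-minute phosphorylation levels are turned into 0/1 labels. Purely to illustrate the "one model per protein" structure, the sketch below uses toy random data, ranks genes by a two-sample t statistic, and classifies test stimuli with k-nearest neighbors from the class package. The protein names, the number of selected genes (5), the neighborhood size (3) and the output layout are all hypothetical; this is not the team's model.

```r
# Hypothetical sketch of a per-protein pipeline; NOT the model in SC1team49.r.
library(class)  # knn(); 'class' is in the prerequisites list

set.seed(1)
n_genes <- 50; n_train <- 20; n_test <- 6
genes <- paste0("gene", seq_len(n_genes))
proteins <- c("proteinA", "proteinB")  # hypothetical; the challenge has 16 proteins

# Toy expression data: rows = stimuli, columns = genes.
x_train <- matrix(rnorm(n_train * n_genes), n_train, n_genes,
                  dimnames = list(paste0("train", seq_len(n_train)), genes))
x_test <- matrix(rnorm(n_test * n_genes), n_test, n_genes,
                 dimnames = list(paste0("test", seq_len(n_test)), genes))

# Toy 0/1 phosphorylation labels for the training stimuli, one column per protein.
y_train <- sapply(proteins, function(p) sample(rep(0:1, length.out = n_train)))

# Rank genes by |Welch t statistic| between phosphorylated (1) and
# non-phosphorylated (0) training stimuli and keep the top k gene names.
select_genes <- function(x, y, k) {
  t_stat <- apply(x, 2, function(g) unname(t.test(g[y == 1], g[y == 0])$statistic))
  names(t_stat) <- colnames(x)
  names(sort(abs(t_stat), decreasing = TRUE))[seq_len(k)]
}

models <- list()
predictions <- matrix(NA_integer_, n_test, length(proteins),
                      dimnames = list(rownames(x_test), proteins))

for (p in proteins) {
  top <- select_genes(x_train, y_train[, p], k = 5)
  models[[p]] <- list(genes = top)  # keep the predictor gene names per protein
  pred <- knn(train = x_train[, top, drop = FALSE],
              test = x_test[, top, drop = FALSE],
              cl = factor(y_train[, p]), k = 3)
  predictions[, p] <- as.integer(as.character(pred))  # factor "0"/"1" -> 0/1
}

# Hypothetical tab-separated layout; the real script follows the organizers' template.
out_file <- tempfile(fileext = ".txt")
write.table(predictions, out_file, sep = "\t", quote = FALSE, col.names = NA)
print(predictions)
```

The plain t.test ranking keeps the sketch free of extra dependencies; limma's moderated statistics (lmFit followed by eBayes) would be a natural alternative for ranking genes when there are few training stimuli.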
An additional RData file, “Models_PRB_SC1.Rdata”, is created to store the model information for each protein, including the names of the genes used as predictors.
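The exact object names and layout inside “Models_PRB_SC1.Rdata” are not documented here. The sketch below assumes a hypothetical list named models with one element per protein, each holding its predictor gene names, and shows how base R's save() and load() round-trip such an object. A temporary file stands in for the real RData file.

```r
# Hypothetical structure: one list element per protein, each holding the
# names of the genes used as predictors.
models <- list(
  proteinA = list(genes = c("gene3", "gene17", "gene42")),
  proteinB = list(genes = c("gene8", "gene21"))
)

# save() writes the named objects to an .RData file.
rdata_file <- tempfile(fileext = ".Rdata")
save(models, file = rdata_file)

# load() restores the objects into an environment and returns their names.
restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)
print(loaded_names)
print(restored$models$proteinA$genes)  # predictor genes for proteinA
```

Loading the real file into a fresh environment the same way and calling ls() on that environment lists whatever objects the script actually stored, without relying on the names assumed above.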
