※Computational identification of protein kinases and site-specific kinase-substrate relations in plants
<1>. Summary

     Protein kinases (PKs) are key regulators of protein phosphorylation, a modification involved in nearly all aspects of biological activity (Manning, et al., 2002 ; Hanks, et al., 1995). Identifying PKs and their substrates, together with the phosphorylation sites (p-sites), is the foundation for understanding how protein phosphorylation regulates plant growth and development (Olsen, et al., 2006 ; Sinha, et al., 2011). Besides experimental methods, computational analysis of protein phosphorylation in plants has attracted growing interest. In this protocol, we take Vitis vinifera as an example to describe how to identify plant PKs based on Hidden Markov Model (HMM) profiles and ortholog search. First, we manually classified the 1,855 curated PKs from the kinase.com database (Manning, et al., 2002) into 10 groups comprising 149 families, following previously described classification principles (Manning, et al., 2002 ; Hanks, et al., 1995), and then used the hmmbuild program (Eddy, et al., 2009) to construct 139 family-level HMM profiles from the kinase domain sequences of the curated PKs. Next, we identified 1,243 PKs by searching the Vitis vinifera protein sequences (Flicek, et al., 2013) with the hmmsearch program (Eddy, et al., 2009). In addition, ortholog search (Tatusov, et al., 1997) identified 5 PKs belonging to families without an HMM profile. In total, we characterized 1,248 Vitis vinifera PKs, classified into 9 groups and 49 families. Further, the GPS 2.1 algorithm (Xue, et al., 2008) with GPS predictors was used to predict site-specific kinase-substrate relations (ssKSRs) in Vitis vinifera. Afterwards, following the iGPS algorithm (Song, et al., 2012), we used protein-protein interaction (PPI) data as the major factor to remove false-positive predictions. Finally, we constructed the kinase-substrate phosphorylation network with Cytoscape 2.8.3 (Shannon, et al., 2003).
More details about the process are provided in (unpublished); the relevant datasets are supplied below.
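The HMM-based identification step can be sketched as follows. This is a minimal Python illustration, not the pipeline used in this protocol: it assumes that each family HMM was searched with `hmmsearch --tblout`, that the resulting tables were concatenated, that each query (HMM) name equals the family name, and that a hit is accepted at a full-sequence E-value of at most 1e-5 (a hypothetical cutoff). Each protein is then assigned to the family whose profile gives the highest full-sequence bit score.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Hit:
    protein: str   # target sequence name
    family: str    # query HMM name, assumed to equal the family name
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def parse_tblout(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from hmmsearch --tblout lines, skipping comments and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Columns: target name, target accession, query name, query accession,
        # full-sequence E-value, full-sequence score, ...
        yield Hit(protein=fields[0], family=fields[2],
                  evalue=float(fields[4]), score=float(fields[5]))


def assign_families(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Assign each protein to its best-scoring family profile.

    max_evalue is a hypothetical inclusion cutoff, not a value from this protocol.
    """
    best: Dict[str, Hit] = {}
    for hit in parse_tblout(lines):
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    return {protein: hit.family for protein, hit in best.items()}


if __name__ == "__main__":
    # Hypothetical protein and family names, for illustration only.
    demo = [
        "# target name  accession  query name  accession  E-value  score  bias",
        "protA  -  CAMK_CDPK   -  1e-80  270.1  0.5",
        "protA  -  CAMK_CAMKL  -  1e-60  200.0  0.1",
        "protB  -  AGC_PKA     -  0.5    10.0   0.0",
    ]
    print(assign_families(demo))  # {'protA': 'CAMK_CDPK'}
```

Taking the best-scoring family is only one simple way to resolve proteins that match several family profiles; the manual curation described above may resolve such cases differently.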


<2>. Relevant datasets

(1).  Protein sequences of Vitis vinifera: the protein sequences were downloaded from EnsemblPlants (release version 21) (Flicek, et al., 2013).

(2).  Phosphopeptides of Vitis vinifera: we downloaded 927 experimentally identified Vitis vinifera phosphopeptides from P3DB (release version 3.0), a comprehensive phosphoproteomics database covering nine plant species (Yao, et al., 2013).

(3).  Phosphoproteins of Vitis vinifera: 607 experimentally identified Vitis vinifera phosphoproteins, together with their sequences, were also downloaded from P3DB (release version 3.0) (Yao, et al., 2013).

(4).  PSP(15, 15) items of Vitis vinifera: we extracted all PSP(15, 15) items, where a PSP(15, 15) (phosphorylation site peptide) consists of a phosphorylated S, T or Y residue flanked by 15 upstream and 15 downstream residues (Xue, et al., 2008). When a p-site lies within 15 residues of the N- or C-terminus of the protein sequence, we padded the peptide to the full PSP(15, 15) length with “*” characters.
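The PSP(15, 15) extraction with “*” padding can be sketched in Python as below. The helper name `extract_psp` is hypothetical, and the sketch assumes that p-site positions are given as 1-based indices into the full protein sequence.

```python
def extract_psp(sequence: str, position: int,
                upstream: int = 15, downstream: int = 15, pad: str = "*") -> str:
    """Return the PSP(upstream, downstream) window around a p-site.

    `position` is assumed to be 1-based. Flanking positions that fall
    beyond the N- or C-terminus are filled with `pad`, so the result
    always has length upstream + 1 + downstream.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    center = position - 1  # 0-based index of the p-site
    if sequence[center] not in "STY":
        raise ValueError(f"residue {sequence[center]!r} at position {position} is not S, T or Y")
    start = center - upstream
    end = center + downstream + 1
    left_pad = pad * max(0, -start)                  # missing N-terminal residues
    right_pad = pad * max(0, end - len(sequence))    # missing C-terminal residues
    return left_pad + sequence[max(0, start):min(len(sequence), end)] + right_pad


if __name__ == "__main__":
    window = extract_psp("MSTKY", 2)  # hypothetical sequence; the p-site is the S
    print(len(window), window[15])  # 31 S
```

Padding keeps every window the same length with the p-site at the center (index 15), which is what position-wise comparisons between PSPs rely on.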

(5).  GPS predictors of Vitis vinifera protein kinases: based on the hypothesis that similar PKs classified into the same group or family recognize similar short linear motifs around substrate modification sites, we manually selected GPS predictors for 1,086 kinases.
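One way to express the group/family hypothesis in code is a most-specific-first lookup: use a family-level GPS predictor when one exists, otherwise fall back to the group-level predictor. This is only an illustration of the hypothesis; the selection in this protocol was done manually, and the helper names, predictor names and dictionary layout below are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

FamilyKey = Tuple[str, str]  # (group, family); hypothetical key layout


def choose_predictor(group: str, family: str,
                     family_predictors: Dict[FamilyKey, str],
                     group_predictors: Dict[str, str]) -> Optional[str]:
    """Prefer a family-level predictor; otherwise fall back to the group level."""
    if (group, family) in family_predictors:
        return family_predictors[(group, family)]
    return group_predictors.get(group)


def assign_predictors(kinases: Iterable[Tuple[str, str, str]],
                      family_predictors: Dict[FamilyKey, str],
                      group_predictors: Dict[str, str]) -> Dict[str, str]:
    """Map kinase IDs to predictor names; kinases with no match are left out."""
    assignment: Dict[str, str] = {}
    for kinase_id, group, family in kinases:
        predictor = choose_predictor(group, family, family_predictors, group_predictors)
        if predictor is not None:
            assignment[kinase_id] = predictor
    return assignment


if __name__ == "__main__":
    # All identifiers below are hypothetical.
    families = {("CAMK", "CDPK"): "CAMK/CDPK"}
    groups = {"CAMK": "CAMK", "AGC": "AGC"}
    kinases = [("k1", "CAMK", "CDPK"), ("k2", "CAMK", "CAMKL"), ("k3", "TKL", "LRR")]
    print(assign_predictors(kinases, families, groups))
    # {'k1': 'CAMK/CDPK', 'k2': 'CAMK'}
```

Kinases whose group and family both lack a predictor (k3 above) receive none, which is consistent with predictors being available for only a subset of the identified PKs.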

(6).  Prediction results of the Vitis vinifera ssKSRs by the GPS 2.1 algorithm: the GPS 2.1 algorithm, which was developed mainly for the prediction of kinase-specific p-sites (Xue, et al., 2008), predicted 171,241 ssKSRs between 1,072 PKs and 483 substrates at 674 p-sites, an average of 254.1 upstream PKs per p-site.
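To illustrate the kind of scoring GPS performs, the sketch below implements a simplified GPS-style rule as we understand it from the GPS papers: the similarity of two PSPs is the sum of position-wise amino acid substitution scores, a negative total is set to zero, and a query PSP's score for a predictor is its average similarity to that predictor's known p-sites. It is not GPS 2.1 itself: a toy match/mismatch scorer (+1/-1) stands in for BLOSUM62, “*” padding is scored as 0 by assumption, the parameter-optimization steps of GPS 2.1 (such as motif length selection) are omitted, and the per-predictor cutoffs are hypothetical inputs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def toy_substitution(x: str, y: str) -> float:
    """Toy stand-in for BLOSUM62: +1 identical, -1 different, 0 if either is '*' padding."""
    if x == "*" or y == "*":
        return 0.0
    return 1.0 if x == y else -1.0


def psp_similarity(a: str, b: str, sub: Scorer = toy_substitution) -> float:
    """Sum position-wise substitution scores of two equal-length PSPs; clip negatives to 0."""
    if len(a) != len(b):
        raise ValueError("PSPs must have the same length")
    return max(0.0, sum(sub(x, y) for x, y in zip(a, b)))


def gps_like_score(query: str, known_sites: Sequence[str],
                   sub: Scorer = toy_substitution) -> float:
    """Average clipped similarity between a query PSP and a predictor's known p-sites."""
    if not known_sites:
        return 0.0
    return sum(psp_similarity(query, s, sub) for s in known_sites) / len(known_sites)


def predict(psps: Dict[str, str], predictor_sites: Dict[str, List[str]],
            cutoffs: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (predictor, psp_id) pairs whose score reaches that predictor's cutoff."""
    hits = []
    for psp_id, psp in psps.items():
        for predictor, sites in predictor_sites.items():
            if gps_like_score(psp, sites) >= cutoffs[predictor]:
                hits.append((predictor, psp_id))
    return hits


if __name__ == "__main__":
    # Short PSP(2, 2) windows and a cutoff of 2.0, all hypothetical.
    known = {"K1": ["RRASV", "RKASI"]}
    queries = {"p1": "RRASV", "p2": "GGGSG"}
    print(predict(queries, known, {"K1": 2.0}))  # [('K1', 'p1')]
```

Because every p-site is scored against every predictor, a permissive cutoff yields many upstream PKs per site, which is why the raw GPS 2.1 output above averages more than 250 PKs per p-site.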

(7).  Prediction results of the Vitis vinifera ssKSRs by the iGPS algorithm: the iGPS algorithm, which filters GPS 2.1 predictions with protein-protein interaction (PPI) data (Song, et al., 2012), predicted 2,574 ssKSRs between 737 PKs and 110 substrates at 129 p-sites, an average of 20.0 upstream PKs per p-site.
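The PPI filtering and the summary statistics quoted above can be reproduced on a list of predicted relations with a short script. This sketch assumes that each ssKSR is a (kinase, substrate, p-site position) triple and applies the simplest possible PPI rule, keeping a prediction only when the kinase and substrate are recorded as a direct interacting pair; the exact iGPS criteria should be taken from Song, et al., 2012. The `to_sif` helper writes kinase-substrate edges in Cytoscape's Simple Interaction Format (SIF), with `phosphorylates` as an arbitrary, hypothetical edge label.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

SsKSR = Tuple[str, str, int]  # (kinase, substrate, p-site position)


def filter_by_ppi(ssksrs: Iterable[SsKSR],
                  ppi_pairs: Iterable[Tuple[str, str]]) -> List[SsKSR]:
    """Keep predictions whose kinase and substrate form an (undirected) PPI pair."""
    interacting = {frozenset(pair) for pair in ppi_pairs}
    return [r for r in ssksrs if frozenset((r[0], r[1])) in interacting]


def summarize(ssksrs: Iterable[SsKSR]) -> Dict[str, float]:
    """Count unique ssKSRs, PKs, substrates and p-sites; average upstream PKs per p-site."""
    unique = set(ssksrs)
    upstream: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for kinase, substrate, site in unique:
        upstream[(substrate, site)].add(kinase)
    n_sites = len(upstream)
    total = sum(len(pks) for pks in upstream.values())
    return {
        "ssKSRs": len(unique),
        "PKs": len({k for k, _, _ in unique}),
        "substrates": len({s for _, s, _ in unique}),
        "p-sites": n_sites,
        "mean upstream PKs per p-site": total / n_sites if n_sites else 0.0,
    }


def to_sif(ssksrs: Iterable[SsKSR], edge_label: str = "phosphorylates") -> str:
    """Render unique kinase-substrate pairs as tab-separated Cytoscape SIF lines."""
    pairs = sorted({(k, s) for k, s, _ in ssksrs})
    return "\n".join(f"{k}\t{edge_label}\t{s}" for k, s in pairs)


if __name__ == "__main__":
    # Hypothetical predictions and PPI pairs.
    predicted = [("K1", "P1", 10), ("K2", "P1", 10), ("K1", "P2", 5), ("K3", "P2", 5)]
    ppi = [("P1", "K1"), ("K3", "P2")]
    kept = filter_by_ppi(predicted, ppi)
    print(summarize(kept))
    print(to_sif(kept))
```

Since each unique ssKSR contributes one upstream PK to one p-site, the average equals the number of ssKSRs divided by the number of p-sites: 171,241 / 674 ≈ 254.1 for GPS 2.1 and 2,574 / 129 ≈ 20.0 for iGPS, matching the figures above.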