※ Data Statistics for iEKPD 2.0 and EKPD 1.0:


    In this work, we collected 1,860 protein kinases (PKs), 439 protein phosphatases (PPs) and 400 proteins containing phosphoprotein-binding domains (PPBDs) from the literature and public databases. These regulators were further classified into 151 PK families, 36 PP families and 21 PPBD families. To computationally detect more of these proteins in eukaryotes, we constructed hidden Markov model (HMM) profiles for these families; for families without HMM profiles, we conducted orthologous searches. iEKPD (integrated annotations for Eukaryotic protein Kinases, protein Phosphatases & phosphoprotein-binding Domains, ver 2.0) now contains 197,348 unique protein entries from 164 eukaryotes, including 109,912 PKs, 23,294 PPs and 68,748 PPBD-containing proteins. We further annotated these proteins with multi-layer data integrated from up to 109 public databases, covering the following 14 aspects:
    (i) Cancer Mutation: TCGA, ICGC, COSMIC, CGAP, IntOGen, BioMuta and TumorFusions;
    (ii) Genetic Variation: dbSNP, GVM, VarCards, ActiveDriverDB, Kin-Driver, m6AVar and rSNPBase 3.0;
    (iii) Disease-associated Information: ClinVar, GWASdb, PTMD, OMIM, MSDD, DiseaseEnhancer, BRONCO, HGVTB, DisGeNET and PancanQTL;
    (iv) mRNA Expression: TCGA, ICGC, COSMIC, GEO, ArrayExpress, The Human Protein Atlas, Human Proteome Map, GXD, BioExpress, TissGDB, FFGED, SZDB and TISSUES 2.0;
    (v) DNA & RNA Element: UTRdb, circBase, circRNADb, CircNet, Circ2Traits, miRTarBase, microRNA.org, TRANSFAC, miRWalk, TargetScan, miRecords, miRNAMap, SomamiR DB 2.0, miRcode, RAID v2.0, LncRNADisease, OverGeneDB and SEA;
    (vi) DNA Methylation: TCGA, ICGC, COSMIC and MethyCancer;
    (vii) Molecular Interaction: HINT, PINA, Mentha, InWeb_IM, MIST, RISE, IID, iRefIndex, DifferentialNET, TRRUST v2, TIMBAL v2, BindingDB, PLIC, RAIN, YTRP and RegNetwork;
    (viii) Drug-target Relation: TTD, DrugBank, ADReCS-Target, ECOdrug, DGIdb 3.0, KPID, GRAC, PDTD and CTD;
    (ix) Protein 3D Structure: PDB, MMDB and SCOP;
    (x) Post-translational Modification (PTM): PLMD, dbPAF, dbPPT, PhosSNP, PhosphoSitePlus, dbPTM, HPRD, Phospho.ELM, UniProt, PHOSIDA, BioGRID, O-GlycBase, PhosphoBase and mUbiSiDa;
    (xi) Protein Expression/Proteomics: The Human Protein Atlas and Human Proteome Map;
    (xii) Subcellular Localization: NLSdb and COMPARTMENTS;
    (xiii) Protein Functional Annotation: CGDB, THANATOS and RaftProt;
    (xiv) Basic Annotation: Ensembl, UniProt, GenBank, GO, KEGG, PROSITE, InterPro, Pfam, SMART and RESID.
    The online service of iEKPD was implemented in PHP + MySQL + JavaScript. iEKPD will be continuously maintained and updated, and all data sets and annotations are freely available to all users.

Content                 | iEKPD 2.0          | EKPD 1.0
Known data              |                    |
  PKs                   | 1,860              | 1,855
  PPs                   | 439                | 347
  PPBDs                 | 400                | N/A
  Total                 | 2,643              | 2,202
Data integration        |                    |
  Data size             | ~99.8 GB           | ~0.5 GB
  Families              | 208                | 181
  HMM profiles          | 176                | 166
  Species               | 164                | 84
  Total entries         | 197,348            | 61,729
  Integrated databases  | 109                | 8
  Regulator types       | PK, PP and PPBD    | PK and PP
  Integrated information| 14 aspects (below) | Basic Annotation

Integrated information in iEKPD 2.0: Basic Annotation, Cancer Mutation, Genetic Variation, Disease-associated Information, mRNA Expression, DNA & RNA Element, DNA Methylation, Molecular Interaction, Drug-target Relation, Protein 3D Structure, Post-translational Modification (PTM), Protein Expression/Proteomics, Subcellular Localization and Protein Functional Annotation.
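For iEKPD 2.0, the per-category counts do not add up to the reported totals (for EKPD 1.0 they do). The short Python check below makes the arithmetic explicit. Reading the surplus as proteins assigned to more than one category is our assumption, consistent with the FAQ note that one protein may act in more than one system; the statistics above do not state it.

```python
# Counts copied from the statistics paragraph and comparison table.
KNOWN_IEKPD = {"PK": 1860, "PP": 439, "PPBD": 400}
KNOWN_TOTAL_IEKPD = 2643
KNOWN_EKPD1 = {"PK": 1855, "PP": 347}
KNOWN_TOTAL_EKPD1 = 2202
ENTRIES_IEKPD = {"PK": 109_912, "PP": 23_294, "PPBD": 68_748}
ENTRIES_TOTAL_IEKPD = 197_348


def implied_overlap(per_category: dict[str, int], unique_total: int) -> int:
    """Return how far the per-category sum exceeds the unique total.

    Assumption: unique_total counts each protein once, so any surplus comes
    from proteins that are counted in more than one category.
    """
    return sum(per_category.values()) - unique_total


if __name__ == "__main__":
    print(implied_overlap(KNOWN_IEKPD, KNOWN_TOTAL_IEKPD))      # 56
    print(implied_overlap(KNOWN_EKPD1, KNOWN_TOTAL_EKPD1))      # 0
    print(implied_overlap(ENTRIES_IEKPD, ENTRIES_TOTAL_IEKPD))  # 4606
```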

USAGE:

   We have tried to make iEKPD powerful and convenient to use. This USAGE section describes the online service, which provides browse, search and advanced search options.

1. Browse. Two simple ways are provided to browse the proteins in the database that play pivotal roles in phospho-signaling: by species or by family classification. Species browse lets users choose animals, fungi and plants of interest; family browse lets users select a family of interest.

EXAMPLE: First click on the species phylogenetic tree or the family classification picture to open the corresponding browse page. Then select a species or family, and the related proteins will be shown.

2. Search. Five search options are provided: simple search, batch search, advanced search, HMM search and BLAST search.

(1) Simple search. Enter a single keyword to search iEKPD. The searchable fields are iEKPD ID, Ensembl Protein ID, Ensembl Gene ID, UniProt Accession and Gene Name/Alias.

EXAMPLE: Click the "Example" button to load an example keyword (PGAM5), then click the "Submit" button to show the PGAM5 entries from all species.
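A minimal sketch of how one keyword could be matched against the searchable fields listed above. The `Entry` record, its field names and the exact, case-insensitive matching rule are hypothetical; the real iEKPD back end (PHP + MySQL) may match differently.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical record holding the fields that simple search looks at."""
    iekpd_id: str
    ensembl_protein_id: str
    ensembl_gene_id: str
    uniprot_accession: str
    gene_names: list[str] = field(default_factory=list)  # gene name plus aliases


def simple_search(entries: list[Entry], keyword: str) -> list[Entry]:
    """Return entries where any searchable field equals keyword (case-insensitive)."""
    key = keyword.strip().casefold()
    hits = []
    for entry in entries:
        fields = [entry.iekpd_id, entry.ensembl_protein_id, entry.ensembl_gene_id,
                  entry.uniprot_accession, *entry.gene_names]
        if any(value.casefold() == key for value in fields):
            hits.append(entry)
    return hits
```

Searching gene names and aliases together is what lets a query such as "PGAM5" return orthologous entries from several species at once.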

(2) Batch search. Enter several keywords at once to search for multiple proteins in iEKPD. The searchable fields are iEKPD ID, Ensembl Protein ID, Ensembl Gene ID, Ensembl Transcript ID, UniProt Accession and Gene Name/Alias.

EXAMPLE: Click the "Example" button to load a list of Ensembl Protein IDs ("ENSP00000438465; ENSP00000284273; ENSP00000322323; ENSP00000305769; ENSP00000263801"), then click the "Submit" button to show the matching entries.
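Batch input could be normalized before lookup roughly as sketched here. The accepted separators (semicolons as in the example, commas, whitespace) and the `looks_like_human_ensp` check are assumptions for illustration. The pattern "ENSP" plus 11 digits fits human Ensembl protein IDs like those in the example; other species use different prefixes, so a real check would need one pattern per species.

```python
import re

# Assumed separators: semicolons (as in the example), commas and any whitespace.
SEPARATORS = re.compile(r"[;,\s]+")
# Human Ensembl protein IDs: "ENSP" + 11 digits, with an optional ".version" suffix.
HUMAN_ENSP = re.compile(r"ENSP\d{11}(\.\d+)?")


def parse_batch_query(text: str) -> list[str]:
    """Split a batch query into unique keywords, preserving input order."""
    seen: set[str] = set()
    keywords: list[str] = []
    for token in SEPARATORS.split(text):
        if token and token not in seen:  # drop empty pieces and duplicates
            seen.add(token)
            keywords.append(token)
    return keywords


def looks_like_human_ensp(token: str) -> bool:
    """True if token has the shape of a human Ensembl protein ID."""
    return HUMAN_ENSP.fullmatch(token) is not None
```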

(3) Advanced search. This option allows you to enter up to three terms for a more specific query. Query fields can be left empty if fewer terms are needed. The terms can be connected by the following operators:

exclude: the term following this operator must not be contained in the specified field(s)
and: the term following this operator must be contained in the specified field(s)
or: either the term preceding or the term following this operator must occur in the specified field(s)
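The operators above can be read as a left-to-right Boolean combination of per-term matches. The sketch assumes that evaluation order, case-insensitive substring matching, and that the operator attached to the first non-empty term is ignored; iEKPD does not document these details, and the record layout is hypothetical.

```python
from typing import Optional

Record = dict[str, str]
# (operator, term, fields); the operator of the first non-empty term is ignored.
Clause = tuple[Optional[str], str, list[str]]


def term_matches(record: Record, term: str, fields: list[str]) -> bool:
    """Case-insensitive substring match of term in any of the given fields."""
    needle = term.casefold()
    return any(needle in record.get(name, "").casefold() for name in fields)


def advanced_match(record: Record, clauses: list[Clause]) -> bool:
    """Combine up to three terms left to right with "and", "or" and "exclude"."""
    result: Optional[bool] = None
    for op, term, fields in clauses:
        if not term:
            continue  # empty query fields are skipped
        hit = term_matches(record, term, fields)
        if result is None:
            result = hit  # first non-empty term sets the starting value
        elif op == "and":
            result = result and hit
        elif op == "or":
            result = result or hit
        elif op == "exclude":
            result = result and not hit
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return bool(result)  # no non-empty terms means no match
```

Under this reading, "exclude" behaves like "and not", so "PGAM5 exclude sapiens" keeps PGAM5 entries from every species except human.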

(4) HMM search. This option classifies a specific protein or an unknown sequence submitted by the user, reporting the family, score, E-value, domain length and alignment information. If the sequence cannot be classified into any family, the program reports "No hits found". Details of the classification method are available in the paper.

EXAMPLE: Click the "Example" button to load a protein sequence, then click the "Submit" button to view its classifications; the returned hits are ranked by score.
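Scores and E-values come from HMMER's hmmsearch. One plausible way to produce a ranked list is to parse hmmsearch's per-target table (written with `--tblout`) and sort by full-sequence score, as sketched below. The E-value cutoff is a hypothetical value, and treating the query name (the HMM profile) as the family is an assumption about how iEKPD names its profiles.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    family: str    # hmmsearch "query name", i.e. the HMM profile (assumed: one profile per family)
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def rank_tblout_hits(tblout_text: str, max_evalue: float = 1e-5) -> list[Hit]:
    """Parse hmmsearch --tblout text and return hits ranked by score, highest first.

    max_evalue is a hypothetical reporting cutoff, not an iEKPD setting.
    """
    hits: list[Hit] = []
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        cols = line.split()
        # Columns 1-6: target name, accession, query name, accession,
        # full-sequence E-value, full-sequence score.
        hit = Hit(family=cols[2], evalue=float(cols[4]), score=float(cols[5]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.score, reverse=True)


def report(hits: list[Hit]) -> str:
    """Format ranked hits, falling back to "No hits found" for empty results."""
    if not hits:
        return "No hits found"
    return "\n".join(f"{h.family}\t{h.score:.1f}\t{h.evalue:.2g}" for h in hits)
```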

(5) BLAST search. This option finds a specific protein and/or its homologues by sequence alignment, allowing the query protein to be located quickly and accurately. Only one protein sequence in FASTA format can be submitted at a time. Both the E-value threshold and the target species can be set by the user; the defaults are 0.01 and H. sapiens, respectively.

EXAMPLE: Click the "Example" button to load a human protein sequence, then click the "Submit" button to find its homologues in H. sapiens.
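The two user-facing rules of BLAST search, one FASTA protein per query and an E-value cutoff defaulting to 0.01, are sketched below. The service presumably passes the cutoff to BLAST itself; filtering the 12-column tabular output afterwards is only an equivalent illustration, and species restriction is not modeled. The accepted residue alphabet is our choice.

```python
# Residue letters accepted here: the 20 standard amino acids plus ambiguity
# codes B, Z, X, selenocysteine U, pyrrolysine O and the stop symbol "*".
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBZXUO*")


def read_single_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for exactly one protein FASTA record."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input must start with a FASTA header line ('>')")
    if sum(line.startswith(">") for line in lines) != 1:
        raise ValueError("only one protein sequence is allowed per query")
    sequence = "".join("".join(lines[1:]).split()).upper()  # drop inner whitespace
    if not sequence or not set(sequence) <= AMINO_ACIDS:
        raise ValueError("sequence is empty or contains non-amino-acid characters")
    return lines[0][1:].strip(), sequence


def filter_tabular_hits(tabular: str, max_evalue: float = 0.01) -> list[tuple[str, float, float]]:
    """Keep (subject ID, E-value, bit score) for hits at or below max_evalue.

    Expects the 12-column tab-separated format of blastall -m 8 / BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    kept = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        subject, evalue, bitscore = cols[1], float(cols[10]), float(cols[11])
        if evalue <= max_evalue:
            kept.append((subject, evalue, bitscore))
    return kept
```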

Frequently Asked Questions:

1. Q: Can the search results show a protein's roles in both the protein kinase system and the phosphoprotein-binding domain system at the same time?

A: Yes. If the query protein can potentially function in both systems, all of its possible roles are displayed.

2. Q: What do "E-value", "Score", "Start" and "End" mean? (Target sequence description)

A: "E-value": the expectation value (statistical significance) of the hit. "Score": the score (in bits) of the target hit, including the biased-composition correction. "Start": the position in the target sequence at which the hit starts. "End": the position in the target sequence at which the hit ends. These values are extracted from the output of the hmmsearch program (HMMER) or the blastall program (BLAST package).
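"Start" and "End" follow the 1-based, inclusive coordinate convention of HMMER and BLAST output, so extracting the hit region in a 0-based language needs an offset. A small sketch; the sequence is made up for illustration.

```python
def hit_region(target_seq: str, start: int, end: int) -> str:
    """Return the residues of target_seq covered by a hit.

    Start/End are 1-based and inclusive; Python slices are 0-based and
    end-exclusive, hence target_seq[start - 1:end].
    """
    if not 1 <= start <= end <= len(target_seq):
        raise ValueError(f"invalid hit coordinates {start}-{end} for a sequence of length {len(target_seq)}")
    return target_seq[start - 1:end]


print(hit_region("MKVLAAGIVG", 3, 6))  # VLAA
```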

3. Q: What is the difference between "Reviewed" and "Unreviewed"? (Status)

A: A protein's status depends on whether it has been reported in the literature and on the quality of its annotation. If a protein is "Reviewed" in UniProtKB or has been reported in the literature, we also mark it as "Reviewed". If it cannot be mapped to UniProtKB or is "Unreviewed" in UniProtKB, we mark it as "Unreviewed".
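The status rule can be written as a small decision function. The answer above leaves one case open, a protein reported in the literature that cannot be mapped to UniProtKB; this sketch assumes the literature report wins, which is our interpretation rather than a documented iEKPD rule.

```python
from typing import Optional


def review_status(uniprot_status: Optional[str], reported_in_literature: bool) -> str:
    """Return "Reviewed" or "Unreviewed".

    uniprot_status is "Reviewed", "Unreviewed", or None when the protein cannot
    be mapped to UniProtKB. Assumption: a literature report marks the protein
    "Reviewed" even without a reviewed UniProtKB entry.
    """
    if reported_in_literature or uniprot_status == "Reviewed":
        return "Reviewed"
    return "Unreviewed"
```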

4. Q: I have questions that are not listed above. How can I contact the authors of iEKPD?

A: Please contact the three main authors, Dr. Yu Xue, Wankun Deng and Yaping Guo, for details.