Identification and understanding of Kinase Activating Missense Mutations

Carlos Henrique Miranda Rodrigues

Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/BUOS-ARRG8V

Type:	Master's Dissertation
Title:	Identification and understanding of Kinase Activating Missense Mutations
Authors:	Carlos Henrique Miranda Rodrigues
First Advisor:	Douglas Eduardo Valente Pires
First Referee:	Lucas Bleicher
Second Referee:	Gisele Lobo Pappa
Third Referee:	Sandro Carvalho Izidoro
Abstract:	Protein phosphorylation and dephosphorylation play vital roles in a variety of cellular processes, and the balance between them must be tightly regulated. Disturbances of this balance through dominant activating missense mutations in protein kinases are known driver events in many cancers. Despite this, identifying potential activating mutations has proven difficult and has been limited to evolutionary and sequence-based comparisons with previously characterised mutations. This study aims to fill this gap by proposing Kinact, a novel machine learning method for predicting activating missense mutations in protein kinases. Experimental data on 384 point mutations in 42 different protein kinases were collected from the Kin-Driver, ClinVar and Ensembl databases. The resulting data set was then manually curated, and 258 mutations were mapped onto solved 3D structures from the Protein Data Bank. Each protein was classified into one group of the Kinase Classification, and a set of in silico analyses was performed on sequence and structure data. The most descriptive features were then used as input for training and testing supervised learning algorithms, generating predictive classification models that rely on sequence-level attributes alone, structure-level attributes alone, or a combination of both. The best-performing model combined structural and sequence-based features, achieving a precision of up to 90% and an Area Under the ROC Curve (AUC) of 0.96 under 10-fold cross-validation, and a precision of 81% and an AUC of 0.89 on blind tests.
We show that the best-performing Kinact model significantly outperforms (p-value < 0.01) SIFT and PolyPhen-2, the gold-standard methods used by clinical geneticists, which achieved AUCs of 0.49 and 0.63, respectively, on the training data set and 0.67 and 0.53, respectively, on the blind test. Kinact also integrates high-performance open-source web visualization tools to assist further research on how mutations affect protein kinase activity. The method is freely available as a user-friendly web server at <http://biosig.unimelb.edu.au/kinact/>.
Subject:	Bioinformatics
Language:	English
Publisher:	Universidade Federal de Minas Gerais
Publisher Initials:	UFMG
Rights:	Open Access
URI:	http://hdl.handle.net/1843/BUOS-ARRG8V
Issue Date:	25-Jul-2017
Appears in Collections:	Master's Dissertations

Files in This Item:

File	Description	Size	Format
identification_and_understanding_of_kinase_carlos_rodrigues.pdf		9.45 MB	Adobe PDF
