Please use this identifier to cite or link to this item:
http://hdl.handle.net/1843/BUOS-ARRG8V
Type: | Master's Dissertation |
Title: | Identification and understanding of Kinase Activating Missense Mutations |
Authors: | Carlos Henrique Miranda Rodrigues |
First Advisor: | Douglas Eduardo Valente Pires |
First Referee: | Lucas Bleicher |
Second Referee: | Gisele Lobo Pappa |
Third Referee: | Sandro Carvalho Izidoro |
Abstract: | Protein phosphorylation and dephosphorylation play vital roles in a variety of cellular processes, and the balance between them must be closely regulated. Disturbances of this balance through dominant activating missense mutations in protein kinases are known driver events in many cancers. Despite this, identifying potential activating mutations has proven difficult and has been limited to evolutionary and sequence-based comparisons with previously characterised mutations. This study aims to fill this gap by proposing Kinact, a novel machine learning method for predicting activating missense mutations in protein kinases. Experimental data on 384 point mutations in 42 different protein kinases were collected from the Kin-Driver, ClinVar and Ensembl databases. The resulting data set was then manually curated, and 258 mutations were mapped onto solved 3D structures from the Protein Data Bank. Each protein was classified into one group of the Kinase Classification, and a set of in silico analyses was performed on sequence and structure data. The most descriptive features were then used as input for training and testing supervised learning algorithms, generating predictive classification models that rely on sequence-level attributes alone, structure-level attributes alone, and the two in combination. The best-performing model combined structural and sequence-based features, achieving a precision of up to 90% and an Area Under ROC Curve of 0.96 under 10-fold cross-validation, and a precision of 81% and an Area Under ROC Curve of 0.89 on blind tests. 
We show that the best-performing Kinact model significantly outperforms (p-value < 0.01) SIFT and PolyPhen-2, the gold-standard methods used by clinical geneticists, which achieved Area Under ROC Curve values of 0.49 and 0.63, respectively, on the training data set and 0.67 and 0.53, respectively, on the blind test. Kinact also combines high-performance open-source web visualization tools to assist further research on how mutations affect protein kinase activity. The method is freely available as a user-friendly web server at <http://biosig.unimelb.edu.au/kinact/>. |
Subject: | Bioinformatics |
Language: | English |
Publisher: | Universidade Federal de Minas Gerais |
Publisher Initials: | UFMG |
Rights: | Open Access |
URI: | http://hdl.handle.net/1843/BUOS-ARRG8V |
Issue Date: | 25-Jul-2017 |
Appears in Collections: | Master's Dissertations |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
identification_and_understanding_of_kinase_carlos_rodrigues.pdf | | 9.45 MB | Adobe PDF | View/Open |