Prediction of alpha helices in proteins using Modified Logistic Regression Model

Carmelina Figueiredo Vieira Leite

Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/33892

Type:	Dissertation
Title:	Prediction of alpha helices in proteins using Modified Logistic Regression Model
Authors:	Carmelina Figueiredo Vieira Leite
First Advisor:	Marcos Augusto dos Santos
Abstract:	Advances in protein secondary structure prediction directly affect health research and our knowledge of biological processes. Despite these advances, predicting protein structure remains a challenge. We therefore propose a de novo method for predicting alpha helices. First, we used PISCES to build a list of proteins with low mutual sequence identity from the Protein Data Bank repository. Each protein was split into fragments of size 9 using the sliding window technique. The fragments were classified into those that consisted 100% of standard alpha helix and those that did not consist 100% of that type of secondary structure. Each fragment was then characterized with a sliding window of size 3; each resulting triplet had a value associated with the occurrence of the alpha helix structure. This made it possible to predict the alpha helix secondary structure class for an unknown query protein. To this end, we used modified logistic regression and built two prediction methods. Accuracy and specificity tests on the methods gave results above 70%; sensitivity, however, was poor. One of the methods proved very promising for the secondary structure prediction problem and potentially for other purposes. All methods were implemented in MATLAB R2015b (2015).
Abstract (translated from Portuguese):	Advances in the prediction of protein secondary structure directly impact health and the knowledge of biological processes. Despite the achievements and advances, predicting protein structure remains a challenge. In this work, we propose a de novo method for alpha helix prediction. First, we created a list of proteins with low identity among them from the Protein Data Bank database, using the PISCES tool. Each protein was separated into fragments of size 9 using the sliding window technique. The fragments obtained were classified into those that are 100% standard-type alpha helix and those that are not 100% of this type of secondary structure. For each fragment, we used a sliding window of size 3 to characterize it. These triplets have a value associated with the occurrence of the alpha helix structure. With this, it is possible to predict the secondary structure of an unknown protein. To do so, we used modified logistic regression and built two prediction methods. Tests of accuracy and specificity yielded results above 70%. Unfortunately, sensitivity did not show a good result. One of the methods created proved promising, both for this problem and for other problems. All methods were implemented in MATLAB R2015b (2015).
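The fragmenting, labelling, and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's MATLAB implementation. The function names, the use of a per-residue secondary structure string, and the code "H" for a standard alpha helix are all assumptions made for illustration. The modified logistic regression itself is left out because the abstract does not specify it.

```python
from typing import List, Sequence, Tuple

FRAGMENT_SIZE = 9  # fragment length stated in the abstract
TRIPLET_SIZE = 3   # inner window length stated in the abstract
HELIX = "H"        # assumption: DSSP-style code for a standard alpha helix


def windows(seq: str, size: int) -> List[str]:
    """Return every contiguous window of `size` characters, step 1."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def labelled_fragments(residues: str, structure: str) -> List[Tuple[str, int]]:
    """Pair each 9-residue fragment with 1 if it is 100% helix, else 0."""
    if len(residues) != len(structure):
        raise ValueError("residue and structure strings must have equal length")
    frags = windows(residues, FRAGMENT_SIZE)
    states = windows(structure, FRAGMENT_SIZE)
    return [(f, int(set(s) == {HELIX})) for f, s in zip(frags, states)]


def triplets(fragment: str) -> List[str]:
    """Characterize a fragment by its overlapping windows of size 3."""
    return windows(fragment, TRIPLET_SIZE)


def _ratio(num: int, den: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_metrics(y_true: Sequence[int],
                      y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = _ratio(tp + tn, len(pairs))
    sensitivity = _ratio(tp, tp + fn)   # share of helix fragments recovered
    specificity = _ratio(tn, tn + fp)   # share of non-helix fragments rejected
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Made-up 15-residue example, not data from the dissertation.
    residues = "MKTAYIAKQRQISFV"
    structure = "CHHHHHHHHHHCCCC"
    for fragment, label in labelled_fragments(residues, structure):
        print(fragment, label, triplets(fragment))
```

In this sketch a fragment counts as helix only when all nine positions carry the helix code, mirroring the "100%" criterion in the abstract. Sensitivity and specificity are computed separately because the abstract reports them separately.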
Subject:	Bioinformatics; Logistic Models; Forecasting; Proteins
language:	eng
metadata.dc.publisher.country:	Brazil
Publisher:	Universidade Federal de Minas Gerais
Publisher Initials:	UFMG
metadata.dc.publisher.department:	ICB - INSTITUTO DE CIÊNCIAS BIOLÓGICAS
metadata.dc.publisher.program:	Programa de Pós-Graduação em Bioinformática
Rights:	Open Access
metadata.dc.rights.uri:	http://creativecommons.org/licenses/by-nc-nd/3.0/pt/
URI:	http://hdl.handle.net/1843/33892
Issue Date:	29-Aug-2016
Appears in Collections:	Master's Dissertations

Files in This Item:

File	Description	Size	Format
PPGBioinformatica_CarmelinaFigueiredoVieiraLeite_DissertacaoMESTRADO.pdf		2.63 MB	Adobe PDF

This item is licensed under a Creative Commons License