Um algoritmo evolucionário para mineração de fluxo de dados em microblogs

Juliana Oliveira Ferreira

Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/ESBF-9GMPLV

Full metadata record

DC Field	Value	Language
dc.contributor.advisor1	Gisele Lobo Pappa	pt_BR
dc.contributor.referee1	Luiz Henrique de Campos Merschmann	pt_BR
dc.contributor.referee2	Omar Paranaiba Vilela Neto	pt_BR
dc.creator	Juliana Oliveira Ferreira	pt_BR
dc.date.accessioned	2019-08-12T11:59:03Z	-
dc.date.available	2019-08-12T11:59:03Z	-
dc.date.issued	2013-07-01	pt_BR
dc.identifier.uri	http://hdl.handle.net/1843/ESBF-9GMPLV	-
dc.description.abstract	The growing volume of information posted by users on social networks, together with other resources provided by Web 2.0, called for a paradigm shift in the way data-based systems work. A few well-behaved data instances were replaced by a continuous and non-stationary data flow. Hence, traditional mining algorithms used to extract patterns from data had to be adapted to deal with this new reality. Given the nature of data flows, algorithms had to learn how to deal with at least three challenges: (i) Which data should be kept and which should be discarded during the learning process? (ii) When should the classification model be updated? (iii) How should it be updated? In this direction, this work proposes an evolutionary algorithm (EA) for learning in data streams that is able to explore the evolution of classifiers together with the evolution of the data. One of the main reasons for using an EA is that it maintains a population of candidate solutions to the problem, which tends to evolve over time through selection of the fittest individuals and the crossover and mutation operators. This feature can be exploited so that, over time, both models and data evolve simultaneously. The proposed algorithm works with a dynamic vocabulary and tackles the three aforementioned challenges. It uses a method based on a data repository, which stores a predefined set of instances. The Page-Hinkley (PH) statistical test is used to detect changes in the performance of the classifiers, signaling when the model should be retrained. The model is updated by leveraging the evolution operators of the EA. The method was tested on four datasets of short text and extensive vocabulary collected from Twitter, each corresponding to a real-life event. The results were compared with two state-of-the-art algorithms from the literature and were equal to or better than those obtained by these algorithms.	pt_BR
dc.description.resumo	The breadth and speed with which information has come to be propagated on the Web have changed the way data-based systems work, giving rise to systems in which the data flow is continuous and knowledge is non-stationary. However, the algorithms previously used to extract patterns and information from such data are not well suited to these new characteristics. To address the challenges posed by data streams, this work proposes an evolutionary algorithm for learning in data streams that is able to explore the evolution of the classifiers together with the evolution of the data. The research focuses on datasets with short text and extensive vocabulary, such as those generated on the social network Twitter. The technique was validated on 4 datasets of worldwide events. The results were compared with two state-of-the-art algorithms from the literature and were equal to or better than those of current techniques.	pt_BR
dc.language	Portuguese	pt_BR
dc.publisher	Universidade Federal de Minas Gerais	pt_BR
dc.publisher.initials	UFMG	pt_BR
dc.rights	Open Access	pt_BR
dc.subject	Evolutionary Algorithms	pt_BR
dc.subject	Classifier	pt_BR
dc.subject	Continuous Data Streams	pt_BR
dc.subject	Data Mining	pt_BR
dc.subject	Genetic Algorithms	pt_BR
dc.subject.other	Computing	pt_BR
dc.subject.other	Genetic algorithms	pt_BR
dc.subject.other	Data mining (Computing)	pt_BR
dc.title	Um algoritmo evolucionário para mineração de fluxo de dados em microblogs	pt_BR
dc.type	Master's Dissertation	pt_BR
Appears in Collections:	Master's Dissertations
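
The abstract states that the Page-Hinkley (PH) test signals when the classifier should be retrained. The sketch below shows the textbook form of that test applied to a stream of per-instance error values (for example, 0 for a correct prediction and 1 for a mistake). It is only an illustration: the class name, the function name, and the parameter values `delta` and `threshold` are assumptions, not settings taken from the dissertation.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an increase in the mean of a stream.

    delta and threshold are illustrative defaults (assumptions), not
    values reported in the dissertation.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.reset()

    def reset(self):
        self.n = 0          # number of observations seen
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation m_t
        self.min_cum = 0.0  # smallest cumulative deviation seen, M_t

    def update(self, x):
        """Feed one observation; return True if a change is signaled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold


def detect_first_change(stream, delta=0.005, threshold=50.0):
    """Return the 0-based index of the first alarm, or None if none occurs."""
    detector = PageHinkley(delta=delta, threshold=threshold)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical error stream: the classifier is always right for
    # 200 instances, then always wrong after the concept changes.
    errors = [0.0] * 200 + [1.0] * 200
    print(detect_first_change(errors, threshold=5.0))
```

In a stream-learning loop, an alarm would be the point at which the model is retrained and the detector is reset; how the dissertation couples this signal with the evolutionary operators is described in the full text, not here.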

Files in This Item:

File	Description	Size	Format
julianaoliveiraferreira.pdf		10.74 MB	Adobe PDF
