Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/ESBF-8ZKMCP
Type: Master's Dissertation
Title: Uma abordagem baseada em fluxo de filtros para o reconhecimento de entidades em mensagens do twitter [A filter-stream-based approach to entity recognition in Twitter messages]
Authors: Diego Marinho de Oliveira
First Advisor: Alberto Henrique Frade Laender
First Co-advisor: Adriano Alonso Veloso
First Referee: Adriano Alonso Veloso
Second Referee: Gisele Lobo Pappa
Third Referee: Renato Martins Assuncao
Fourth Referee: Luiz Enrique Zarate
Abstract: The task of entity recognition consists of locating and classifying elements in unstructured text by means of natural language processing techniques appropriate to the application domain. Recently, microblogs such as Twitter have become a phenomenon on the Web, posing a new challenge for entity recognition. This work therefore proposes an alternative approach called FS-NER (Filter Stream Named Entity Recognition), which applies filters independently and quickly, and is highly scalable and suited to the Twitter environment for entity recognition. The results showed that, despite the simplicity of the filters used, the FS-NER approach outperformed the others based on Conditional Random Fields, with an average improvement of 3% in the F1 metric. In addition, the approach is an order of magnitude faster and therefore better suited to Twitter's data stream paradigm.
Abstract: The task of named entity recognition is to locate and classify elements in unstructured text using natural language processing techniques appropriate to the application domain. In the Web context, this task is critical for identifying entities such as people, organizations, and places. Recently, microblogs like Twitter and Tumblr have become a phenomenon on the Web, posing a new challenge for entity recognition. On Twitter, for example, a large volume of messages is exchanged in a short time, which complicates both the task and the extraction of information about a particular subject. Moreover, the Twitter environment is highly dynamic and driven by data streams, and thus requires tools and methods suited to its characteristics. There are, however, few works in the literature that address this issue, which leaves a wide area of research open for named entity recognition in this environment. This master's thesis therefore proposes an alternative approach to this task, called FS-NER (Filter Stream Named Entity Recognition). The FS-NER approach applies filters independently and quickly, and is highly scalable and suited to the Twitter environment for named entity recognition. To evaluate the effectiveness of the proposed approach, we carried out an exhaustive set of experiments using Twitter messages. In these experiments, we used three distinct collections: one containing messages in English, one in Portuguese, and a third in several languages. The results showed that, despite the simplicity of the filters used, the proposed approach was able to outperform the other approach based on Conditional Random Fields, with an average improvement of 3% in the F1 metric. Moreover, this approach is orders of magnitude faster and therefore better suited to the typical data stream paradigm of Twitter.
Subject: Computing
Online social networks
Twitter
Language: Portuguese
Publisher: Universidade Federal de Minas Gerais
Publisher Initials: UFMG
Rights: Open Access
URI: http://hdl.handle.net/1843/ESBF-8ZKMCP
Issue Date: 26-Oct-2012
Appears in Collections: Master's Dissertations

Files in This Item:
File: disserta__o___diegomoliveira.pdf
Size: 1.97 MB
Format: Adobe PDF

