Classificação e análise de verbetes da Enciclopédia da Conscienciologia com processamento de linguagem natural e métodos de machine learning
Publisher
Universidade Federal de Minas Gerais
Description
Type
Specialization monograph
Alternative title
Classification and analysis of entries from the Encyclopedia of Conscientiology using natural language processing and machine learning methods
First advisor
Committee members
Luiz Henrique Duczmal
Uriel Moreira Silva
Abstract
There is currently great interest in statistical text analysis. Extracting keywords and efficiently building vector representations, so that statistical methods, classification algorithms, and pattern detection can be applied, are frequent challenges in this field. Sentiment analysis, which assesses the degree of positivity, neutrality, or negativity in texts, is a growing research area. To better understand natural language processing techniques and develop sentiment analysis, this research uses the Python programming language and its libraries for text processing, data processing, and machine learning, such as PyPDF, Pandas, NumPy, spaCy, NLTK, scikit-learn, and SciPy. The method involves extracting text from PDF files; cleaning the data to eliminate noise, missing information, and duplicates; preprocessing the data into the format required as model input; and finally applying machine learning models to classify the PDF files. The dataset was created using 2019 entries from the Encyclopedia of Conscientiology, each containing information such as the title (or research topic) and a classification that can be positive, neutral, or negative. The objective of this research is to classify the entries from the Encyclopedia of Conscientiology using machine learning models such as Naïve Bayes, Logistic Regression, Support Vector Classifiers, Random Forests, and Neural Networks. A descriptive analysis of the results was also performed using statistical techniques. The models were validated with a resampling technique, stratified cross-validation, and the F1-score was used as the classification metric because the classes are imbalanced.
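The cleaning, vectorization, classification, and validation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the author's code: the toy records, the `clean` helper, the TF-IDF features, the choice of Logistic Regression with balanced class weights, the three folds, and the macro-averaged F1 are all assumptions made for the example. PDF text extraction is omitted, and the strings below stand in for already-extracted entry text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy records (text, label); the real entries are not reproduced here.
records = [
    ("assistance and cooperation promote growth", "positive"),
    ("Assistance and   cooperation promote growth", "positive"),  # duplicate
    ("lucidity supports healthy evolution", "positive"),
    ("fraternity and cooperation bring harmony", "positive"),
    ("healthy growth through assistance", "positive"),
    ("harmony and lucidity promote evolution", "positive"),
    ("cooperation supports fraternity", "positive"),
    ("the entry describes a general definition", "neutral"),
    ("a general taxonomy of the topic", "neutral"),
    ("the definition lists the topic taxonomy", "neutral"),
    ("", "neutral"),                                               # missing text
    ("conflict and manipulation cause regression", "negative"),
    ("manipulation brings stagnation and conflict", "negative"),
    ("regression caused by stagnation", "negative"),
    (None, "negative"),                                            # missing text
]


def clean(rows):
    """Drop rows with missing text and exact duplicates (hypothetical helper)."""
    seen, kept = set(), []
    for text, label in rows:
        if not text or not text.strip():
            continue  # missing information
        normalized = " ".join(text.lower().split())  # lowercase, collapse spaces
        if normalized in seen:
            continue  # duplicate after normalization
        seen.add(normalized)
        kept.append((normalized, label))
    return kept


cleaned = clean(records)
texts = [text for text, _ in cleaned]
labels = [label for _, label in cleaned]

# TF-IDF turns each text into a sparse vector; the classifier is fit on those.
model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)

# Stratified folds keep the class proportions roughly equal in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Macro-averaged F1 weights every class equally, which suits imbalanced labels.
scores = cross_val_score(model, texts, labels, cv=cv, scoring="f1_macro")
print("macro F1 per fold:", scores)
print("mean macro F1:", scores.mean())
```

Any of the other classifiers named in the abstract could be substituted for `LogisticRegression` in the pipeline without changing the validation code. Scores on a toy corpus this small carry no meaning beyond showing that the pipeline runs.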
Subject
Statistics, Logistic regression analysis, Classification (Computers), Machine learning, Natural language processing, Neural networks
Keywords
Machine learning, Natural language processing, Neural networks, Classification algorithms, Sentiment analysis
Creative Commons License
Except where otherwise noted, this item's license is described as Open access
