Classificação e análise de verbetes da Enciclopédia da Conscienciologia com processamento de linguagem natural e métodos de machine learning

Gabriel Augusto Narciso Barreiros

Files

Primary: Classificação e Análise de Verbetes da Enciclopédia da Conscienciologia com Processamento de Linguagem Natural e Métodos de Machine Learning.pdf (14.33 MB)

Date

2025-09-05

Author(s)

Gabriel Augusto Narciso Barreiros

Publisher

Universidade Federal de Minas Gerais

Type

Specialization monograph

Alternative title

Classification and analysis of entries from the Encyclopedia of Conscientiology using natural language processing and machine learning methods

First advisor

Marcos Antonio da Cunha Santos

Examining committee members

Luiz Henrique Duczmal
Uriel Moreira Silva

Abstract

Currently, there is great interest in developing statistical text analysis. Extracting keywords and efficiently converting texts into vectors, so that statistical methods, classification algorithms, and pattern-detection techniques can be applied, are frequent challenges in this field. Sentiment analysis, which assesses the degree of positivity, neutrality, or negativity in texts, is a growing research area. To study natural language processing techniques and carry out sentiment analysis, this research uses the Python programming language and several of its libraries for text processing, data processing, and machine learning, such as PyPDF, Pandas, NumPy, SpaCy, NLTK, Scikit-learn, and SciPy. The method consists of extracting text from PDF files; cleaning the data to eliminate noise, missing information, and duplicates; preprocessing the data into the format required as model input; and, finally, applying machine learning models to classify the PDF files. The dataset was built from 2019 entries of the Encyclopedia of Conscientiology, each containing information such as the title (or research topic) and a classification that can be positive, neutral, or negative. The objective of this research is to classify the entries of the Encyclopedia of Conscientiology using machine learning models such as Naïve Bayes, Logistic Regression, Support Vector Classifiers, Random Forests, and Neural Networks. In addition, a descriptive analysis of the results was performed using statistical techniques. To validate the models, stratified cross-validation was used as the resampling technique, and the F1-score was adopted as the classification metric because it is suited to imbalanced classes.
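The cleaning, vectorization, classification and validation steps summarized in the abstract can be sketched with Pandas and Scikit-learn as follows. This is a minimal sketch under stated assumptions, not the author's code: it assumes the entry texts have already been extracted from the PDFs (for example with pypdf's PdfReader and each page's extract_text()), it uses TF-IDF as the vectorization step and macro-averaged F1 as the score, and the helper names toy_corpus, clean and evaluate, as well as the toy sentences, are hypothetical placeholders rather than real Encyclopedia entries.

```python
# Minimal sketch (not the author's code): clean a table of already-extracted
# entry texts, then compare classifiers on TF-IDF vectors using stratified
# k-fold cross-validation scored by macro-averaged F1.
# ASSUMPTIONS: TF-IDF vectorization, macro averaging, and the helper names
# toy_corpus/clean/evaluate are hypothetical choices for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def toy_corpus() -> pd.DataFrame:
    """HYPOTHETICAL placeholder rows, not real Encyclopedia entries."""
    rows = [
        ("clear progress and balanced lucid growth", "positive"),
        ("healthy assistance brings lucid renewal", "positive"),
        ("positive evolution with serene lucidity", "positive"),
        ("constructive help and wise progress", "positive"),
        ("  Clear progress and balanced lucid growth ", "positive"),  # duplicate
        ("definition of the technical term in the glossary", "neutral"),
        ("the entry lists terms and definitions", "neutral"),
        ("technical synonym list for the term", "neutral"),
        ("neutral description of the glossary term", "neutral"),
        (None, "neutral"),  # missing text
        ("harmful conflict and regressive fear", "negative"),
        ("pathological anger causes regressive harm", "negative"),
        ("negative conflict and sick dependence", "negative"),
        ("regressive fear with harmful anger", "negative"),
    ]
    return pd.DataFrame(rows, columns=["text", "label"])


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing text or label, normalize case and whitespace,
    and remove duplicated texts (keeping the first occurrence)."""
    df = df.dropna(subset=["text", "label"])
    df = df.assign(text=df["text"].str.strip().str.lower())
    return df.drop_duplicates(subset="text").reset_index(drop=True)


def evaluate(df: pd.DataFrame, n_splits: int = 3, seed: int = 0) -> dict:
    """Return the mean macro-F1 of each model over stratified k-fold CV."""
    models = {
        "naive_bayes": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "linear_svc": LinearSVC(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed),
    }
    # StratifiedKFold keeps the positive/neutral/negative proportions in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        # The vectorizer lives inside the pipeline, so it is refit on each
        # training fold only and never sees the held-out fold.
        pipe = make_pipeline(TfidfVectorizer(), model)
        f1 = cross_val_score(pipe, df["text"], df["label"], cv=cv, scoring="f1_macro")
        scores[name] = float(f1.mean())
    return scores


if __name__ == "__main__":
    data = clean(toy_corpus())
    for name, score in evaluate(data).items():
        print(f"{name}: mean macro-F1 = {score:.2f}")
```

Macro-averaged F1 gives each class the same weight regardless of its size, which is one common choice when classes are imbalanced; the abstract does not state which F1 averaging the study used, so "f1_macro" here is an assumption.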

Subject

Statistics, Logistic regression analysis, Classification (Computers), Machine learning, Natural language processing, Neural networks

Keywords

Machine learning, Natural language processing, Neural networks, Classification algorithms, Sentiment analysis

URI

https://hdl.handle.net/1843/1651

Department

ICX - DEPARTAMENTO DE ESTATÍSTICA

Program

Specialization Program in Statistics (Curso de Especialização em Estatística)

Collections

Specialization in Statistics (Especialização em Estatística)

Creative Commons License

Except where otherwise noted, this item's license is described as Open Access.
