Classificação e análise de verbetes da Enciclopédia da Conscienciologia com processamento de linguagem natural e métodos de machine learning

dc.creator: Gabriel Augusto Narciso Barreiros
dc.date.accessioned: 2026-02-13T15:58:19Z
dc.date.issued: 2025-09-05
dc.description.abstract: There is currently great interest in statistical text analysis. Extracting keywords and efficiently building vector representations, so that statistical methods, classification algorithms, and pattern detection can be applied, are frequent challenges in this field. Sentiment analysis, which assesses the degree of positivity, neutrality, or negativity in texts, is a growing research area. To better understand natural language processing techniques and develop sentiment analysis, this research uses the Python programming language and its libraries for text processing, data processing, and machine learning, such as PyPDF, Pandas, NumPy, spaCy, NLTK, scikit-learn, and SciPy. The method involves extracting text from PDF files; cleaning the data to eliminate noise, missing information, and duplicates; preprocessing the data into the format required as model input; and finally applying machine learning models to classify the PDF files. The dataset was created using 2019 entries from the Encyclopedia of Conscientiology, each containing information such as the title (or research topic) and a classification that can be positive, neutral, or negative. The objective of this research is to classify the entries from the Encyclopedia of Conscientiology using machine learning models such as Naïve Bayes, Logistic Regression, Support Vector Classifiers, Random Forests, and Neural Networks. Additionally, a descriptive analysis of the results was performed using statistical techniques. The models were validated with a random sampling technique, stratified cross-validation, and the F1-score was used as the classification metric because the classes are imbalanced.
dc.identifier.uri: https://hdl.handle.net/1843/1651
dc.language: por
dc.publisher: Universidade Federal de Minas Gerais
dc.rights: Open access
dc.rights: CC0 1.0 Universal
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/
dc.subject: Statistics
dc.subject: Logistic regression analysis
dc.subject: Classification (Computers)
dc.subject: Machine learning
dc.subject: Natural language processing
dc.subject: Neural networks
dc.subject.other: Machine learning
dc.subject.other: Natural language processing
dc.subject.other: Neural networks
dc.subject.other: Classification algorithms
dc.subject.other: Sentiment analysis
dc.title: Classificação e análise de verbetes da Enciclopédia da Conscienciologia com processamento de linguagem natural e métodos de machine learning
dc.title.alternative: Classification and analysis of entries from the Encyclopedia of Conscientiology using natural language processing and machine learning methods
dc.type: Specialization monograph
local.contributor.advisor1: Marcos Antonio da Cunha Santos
local.contributor.advisor1Lattes: http://lattes.cnpq.br/7054616839592595
local.contributor.referee1: Luiz Henrique Duczmal
local.contributor.referee1: Uriel Moreira Silva
local.publisher.country: Brazil
local.publisher.department: ICX - DEPARTAMENTO DE ESTATÍSTICA
local.publisher.initials: UFMG
local.publisher.program: Curso de Especialização em Estatística
local.subject.cnpq: CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA::ESTATISTICA::INFERENCIA PARAMETRICA
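
The validation step described in the abstract (stratified cross-validation scored with the F1-score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy documents and labels are hypothetical and are not data from the monograph, the helper names are invented, and the classifier is a hand-written multinomial Naïve Bayes using only the Python standard library rather than the scikit-learn models the abstract names.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def train_nb(docs, labels):
    """Collect class counts and per-class word counts (bag of words)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest Laplace-smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y in sorted(class_counts):
        n_words = sum(word_counts[y].values())
        lp = math.log(class_counts[y] / total)
        for w in doc.lower().split():
            if w in vocab:  # words unseen in training are ignored
                lp += math.log((word_counts[y][w] + 1) / (n_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def cross_validate(docs, labels, k=3, seed=0):
    """Train on k-1 folds, score macro-F1 on the held-out fold, repeat."""
    results = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        preds = [predict_nb(model, docs[i]) for i in test_idx]
        results.append(macro_f1([labels[i] for i in test_idx], preds))
    return results


# Hypothetical toy corpus standing in for text already extracted from PDFs.
docs = [
    "calm lucid helpful", "helpful lucid assistance", "calm helpful assistance",
    "plain routine record", "routine plain note", "record note routine",
    "harmful hostile conflict", "hostile conflict damage", "harmful damage hostile",
]
labels = ["positive"] * 3 + ["neutral"] * 3 + ["negative"] * 3

scores = cross_validate(docs, labels)
print("macro-F1 per fold:", scores)
```

Stratification keeps every class represented in each held-out fold, and macro averaging gives each class equal weight regardless of its frequency, which is the usual reason to prefer the F1-score over accuracy when classes are imbalanced.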

Files

Original bundle

Name: Classificação e Análise de Verbetes da Enciclopédia da Conscienciologia com Processamento de Linguagem Natural e Métodos de Machine Learning.pdf
Size: 14.33 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.07 KB
Format: Item-specific license agreed to upon submission