Classificação e análise de verbetes da Enciclopédia da Conscienciologia com processamento de linguagem natural e métodos de machine learning
| dc.creator | Gabriel Augusto Narciso Barreiros | |
| dc.date.accessioned | 2026-02-13T15:58:19Z | |
| dc.date.issued | 2025-09-05 | |
| dc.description.abstract | There is currently great interest in statistical text analysis. Extracting keywords and efficiently building vector representations, so that statistical methods, classification algorithms, and pattern detection can be applied, are recurring challenges in this field. Sentiment analysis, which assesses the degree of positivity, neutrality, or negativity in texts, is a growing research area. To better understand natural language processing techniques and develop sentiment analysis, this research uses the Python programming language and its libraries for text processing, data processing, and machine learning, such as PyPDF, Pandas, NumPy, SpaCy, NLTK, Scikit-learn, and SciPy. The method consists of extracting text from PDF files; cleaning the data to eliminate noise, missing information, and duplicates; preprocessing the data into the format required as model input; and, finally, applying machine learning models to classify the PDF files. The dataset was created using 2019 entries from the Encyclopedia of Conscientiology, each containing information such as the title (or research topic) and a classification that can be positive, neutral, or negative. The objective of this research is to classify the entries from the Encyclopedia of Conscientiology using machine learning models such as Naïve Bayes, Logistic Regression, Support Vector Classifiers, Random Forests, and Neural Networks. Additionally, a descriptive analysis of the results was performed using statistical techniques. The models were validated with stratified cross-validation, and the F1-score was used as a classification metric suited to imbalanced classes. | |
| dc.identifier.uri | https://hdl.handle.net/1843/1651 | |
| dc.language | por | |
| dc.publisher | Universidade Federal de Minas Gerais | |
| dc.rights | Open access | |
| dc.rights | CC0 1.0 Universal | en |
| dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | |
| dc.subject | Statistics | |
| dc.subject | Logistic regression analysis | |
| dc.subject | Classification (Computers) | |
| dc.subject | Machine learning | |
| dc.subject | Natural language processing | |
| dc.subject | Neural networks | |
| dc.subject.other | Machine learning | |
| dc.subject.other | Natural language processing | |
| dc.subject.other | Neural networks | |
| dc.subject.other | Classification algorithms | |
| dc.subject.other | Sentiment analysis | |
| dc.title | Classificação e análise de verbetes da Enciclopédia da Conscienciologia com processamento de linguagem natural e métodos de machine learning | |
| dc.title.alternative | Classification and analysis of entries from the Encyclopedia of Conscientiology using natural language processing and machine learning methods | |
| dc.type | Specialization monograph | |
| local.contributor.advisor1 | Marcos Antonio da Cunha Santos | |
| local.contributor.advisor1Lattes | http://lattes.cnpq.br/7054616839592595 | |
| local.contributor.referee1 | Luiz Henrique Duczmal | |
| local.contributor.referee1 | Uriel Moreira Silva | |
| local.publisher.country | Brazil | |
| local.publisher.department | ICX - DEPARTAMENTO DE ESTATÍSTICA | |
| local.publisher.initials | UFMG | |
| local.publisher.program | Curso de Especialização em Estatística | |
| local.subject.cnpq | CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA::ESTATISTICA::INFERENCIA PARAMETRICA |
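The validation approach named in the abstract (stratified cross-validation scored with F1 on imbalanced classes) can be illustrated with a minimal, standard-library-only sketch. It is not the author's code: the helper names `stratified_folds` and `macro_f1`, the toy label counts, the majority-class baseline, and the choice of macro averaging are all assumptions made for illustration. In practice, Scikit-learn (listed in the abstract) provides `StratifiedKFold` and `f1_score` for the same purpose.

```python
from collections import Counter, defaultdict
import random


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, keeping each class's share roughly equal per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels (illustrative only, not data from the monograph).
labels = ["positive"] * 12 + ["neutral"] * 6 + ["negative"] * 3
folds = stratified_folds(labels, k=3)

# Evaluate a majority-class baseline fold by fold.
fold_scores = []
for test_idx in folds:
    held_out = set(test_idx)
    train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
    majority = Counter(train_labels).most_common(1)[0][0]
    y_true = [labels[i] for i in test_idx]
    y_pred = [majority] * len(y_true)
    fold_scores.append(macro_f1(y_true, y_pred))

print(fold_scores)
```

With these toy labels, every fold holds 4 positive, 2 neutral, and 1 negative item. The baseline that always predicts "positive" reaches an accuracy of 4/7 in each fold but a macro F1 of only about 0.24, which shows why an F1-based metric is more informative than accuracy when classes are imbalanced.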
Files
Original bundle
- Name: Classificação e Análise de Verbetes da Enciclopédia da Conscienciologia com Processamento de Linguagem Natural e Métodos de Machine Learning.pdf
- Size: 14.33 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 2.07 KB
- Format: Item-specific license agreed to upon submission