Overcoming class imbalance in drug discovery problems: graph neural networks and balancing approaches

dc.creator: Rafael Lopes Almeida
dc.creator: Vinicius G. Maltarollo
dc.creator: Frederico Gualberto Ferreira Coelho
dc.date.accessioned: 2025-05-27T14:07:16Z
dc.date.accessioned: 2025-09-09T00:22:38Z
dc.date.available: 2025-05-27T14:07:16Z
dc.date.issued: 2023
dc.identifier.doi: 10.1016/j.jmgm.2023.108627
dc.identifier.issn: 10933263
dc.identifier.uri: https://hdl.handle.net/1843/82516
dc.language: eng
dc.publisher: Universidade Federal de Minas Gerais
dc.relation.ispartof: Journal of Molecular Graphics and Modelling
dc.rights: Open Access
dc.subject: Neural networks (Computer science)
dc.subject.other: Graph neural networks, Unbalanced dataset, Drug discovery
dc.subject.other: The usage of a robust architecture can be beneficial for unbalanced datasets. Weighted loss functions and oversampling improve performance on unbalanced datasets. Oversampled models have a higher chance of attaining a high MCC score. A case-specific analysis of strategies for each dataset is recommended for better results.
dc.title: Overcoming class imbalance in drug discovery problems: graph neural networks and balancing approaches
dc.type: Journal article
local.citation.spage: 108627
local.citation.volume: 126
local.description.resumo: This research investigates the application of Graph Neural Networks (GNNs) to make drug development more cost-effective, addressing its limitations of cost and time. Class imbalance in classification datasets, such as the discrepancy between the numbers of active and inactive compounds, gives rise to difficulties that can be addressed through strategies such as oversampling, undersampling, and manipulation of the loss function. Three different GNN architectures are compared on three distinct datasets. This benchmarking research can steer future investigations and enhance the efficacy of GNNs in drug discovery and design. Three hundred models were trained for each combination of architecture and dataset using hyperparameter tuning techniques and evaluated using a range of metrics. Notably, oversampling is the best-performing technique in eight experiments, showcasing its potential. While balancing techniques improve models trained on imbalanced datasets, their efficacy depends on dataset specifics and problem type. Although oversampling aids molecular graph datasets, more research is needed to optimize its usage and explore other solutions to class imbalance.
local.publisher.country: Brazil
local.publisher.department: ENG - DEPARTMENT OF ELECTRONIC ENGINEERING
local.publisher.initials: UFMG
local.url.externa: https://www.sciencedirect.com/science/article/pii/S1093326323002255
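
The abstract names three ingredients: oversampling, a weighted loss function, and the MCC score. The following is a minimal, library-free Python sketch of each, not the authors' implementation. The function names, the inverse-frequency weighting formula, and the toy labels (1 = active, 0 = inactive) are all assumptions made for illustration; the record does not say which variants the paper uses.

```python
import math
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (illustrative only)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Per-class weights n_samples / (n_classes * class_count).
    This is one common choice for a weighted loss, assumed here."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 / 0).
    Returns 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy dataset: 2 actives, 8 inactives.
    labels = [1] * 2 + [0] * 8
    samples = [f"mol_{i}" for i in range(len(labels))]
    _, balanced_labels = random_oversample(samples, labels)
    print(Counter(balanced_labels))
    print(inverse_frequency_weights(labels))
    print(mcc(labels, [0] * len(labels)))
```

On the toy labels, a model that predicts "inactive" for every compound is 80% accurate but gets an MCC of 0.0 from this function. That illustrates why MCC is a useful metric for unbalanced datasets.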
