A fuzzy data reduction cluster method based on boundary information for large datasets

Gustavo Rodrigues Lacerda Silva; Paulo Neto; Luiz Torres; Antônio Braga

doi:https://doi.org/10.1007/s00521-019-04049-4

Date

2019

Author(s)

Gustavo Rodrigues Lacerda Silva

Paulo Neto

Luiz Torres

Antônio Braga

Publisher

Universidade Federal de Minas Gerais

Type

Journal article

Abstract

The fuzzy c-means (FCM) algorithm computes the membership degree of each data point to every cluster center. This requires computing the matrix of distances between the cluster centers and the data points. The main bottleneck of FCM is computing the membership matrix for all data points. This work presents a new clustering method, bdrFCM (boundary data reduction fuzzy c-means). The algorithm is based on the original FCM proposal, adapted to detect and remove the boundary regions of clusters. The implementation effort targets two goals: processing large datasets in less time, and reducing the data volume while maintaining the quality of the clusters. A significant volume of real application data (> 10^6 records) was used, and the bdrFCM implementation was found to scale well to datasets with millions of data points.
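To make the membership computation described above concrete, the following is a minimal sketch of the standard FCM membership update, followed by one possible way to discard boundary points. The record does not state the boundary criterion used by bdrFCM, so the function drop_boundary_points, its margin parameter, and the "top two memberships are close" rule are assumptions made for illustration only, not the authors' method.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM update: u[i][j] is the membership of point i in cluster j.

    m is the fuzzifier (m > 1); m = 2 is a common default.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers:
            # split full membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships

def drop_boundary_points(points, memberships, margin=0.2):
    """Hypothetical boundary filter (an assumption, not taken from the paper).

    A point is treated as lying in a boundary region when its two largest
    memberships differ by no more than `margin`; such points are removed.
    """
    kept = []
    for p, row in zip(points, memberships):
        top = sorted(row, reverse=True)
        if len(top) < 2 or top[0] - top[1] > margin:
            kept.append(p)
    return kept

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0), (0.25, 0.0)]
    ctrs = [(0.0, 0.0), (1.0, 0.0)]
    u = fcm_memberships(pts, ctrs)
    print(u)
    print(drop_boundary_points(pts, u))
```

The nested loop over points and centers is the cost the abstract identifies as the bottleneck: every point needs a distance to every center on each iteration, so removing points shrinks that work proportionally.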

Subject

Computing

Keywords

Data reduction techniques can be a useful strategy for handling the heterogeneity and massiveness of big datasets by reducing the high data volume to a manageable size. One way to apply data reduction to big datasets is to use sampling approaches. Usually, these methods extract a portion of the information in a big dataset without resorting to high-performance computing.
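As an illustration of a sampling approach to data reduction, here is a minimal sketch of reservoir sampling (Algorithm R), which draws a uniform random sample of fixed size from a dataset in a single pass without holding the whole dataset in memory. This is a generic technique chosen for illustration; the record does not say which sampling methods the paper considers, and the function name and parameters are hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from an iterable.

    Uses Algorithm R: the first k items fill the reservoir, and item i
    (0-based, i >= k) replaces a random slot with probability k / (i + 1).
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

if __name__ == "__main__":
    # Reduce one million records to 1,000 in a single pass.
    reduced = reservoir_sample(range(1_000_000), 1_000, seed=0)
    print(len(reduced))
```

Because the input is consumed as a stream, memory use depends only on the sample size k, which is what allows this kind of reduction on an ordinary machine.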

URI

https://hdl.handle.net/1843/82228

Department

ENG - DEPARTAMENTO DE ENGENHARIA ELETRÔNICA

External link

https://link.springer.com/article/10.1007/s00521-019-04049-4

Collections

Journal Article
