Escalonamento baseado em localidade no ambiente Watershed (Locality-based scheduling in the Watershed environment)

Bruno Cerqueira Hott

dc.creator	Bruno Cerqueira Hott
dc.date.accessioned	2019-08-09T18:05:36Z
dc.date.accessioned	2025-09-09T01:26:27Z
dc.date.available	2019-08-09T18:05:36Z
dc.date.issued	2016-07-15
dc.description.abstract	Increased connectivity and bandwidth on the Internet, combined with the falling cost of electronic equipment in general, have caused an explosion in the volume of data traveling over the network. At the same time, the resources available to store these data have been growing, which led to the appearance of systems developed specifically to process them. An early example is Google's MapReduce model, which was followed by several open source implementations such as Hadoop and by new models such as Spark. Storing such huge data sets also required a solution, and distributed file systems such as HDFS and Tachyon emerged. Because the data are now very large and are distributed over multiple machines in a cluster, the problem arises of placing applications close to their data effectively. If this is not done, the cost of moving data through the system can be very high and can impair the final performance of the application. Depending on where the data are located, the application may access them directly on the disk of the local machine, in local memory through in-memory caching, or on another machine in the cluster over the network. The trade-offs involved in terms of storage capacity, access time, and computational cost make the placement decision nontrivial. This work implements scheduling based on data locality in the Watershed processing environment. For this analysis, Watershed was integrated with the Hadoop ecosystem by creating communication channels with the HDFS and Tachyon distributed file systems. Based on the location information provided by these systems, we implemented a locality-based process scheduler for Watershed applications running on those file systems. Finally, experiments were conducted to compare the various means of handling files: through the local file system, through a distributed file system, or in memory. The results show the advantages of taking data placement into account when scheduling such applications.
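The locality-based placement idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the scheduler from the dissertation and does not use the Watershed, HDFS, or Tachyon APIs. The function name `schedule_by_locality`, the `block_hosts` mapping (a hypothetical stand-in for the replica locations a distributed file system would report), and the per-node slot limit are all assumptions made for illustration.

```python
from collections import defaultdict


def schedule_by_locality(block_hosts, nodes, slots_per_node):
    """Assign each data block to a node, preferring nodes that hold a replica.

    block_hosts: dict mapping block id -> list of hosts storing a replica
                 (hypothetical stand-in for file system location data).
    nodes: list of host names available for processing.
    slots_per_node: maximum number of blocks a node may be assigned.
    Returns a dict mapping block id -> (host, is_local).
    """
    load = defaultdict(int)  # number of blocks already assigned to each host
    assignment = {}
    for block, hosts in block_hosts.items():
        # Nodes that hold a replica of this block and still have a free slot.
        local = [h for h in hosts if h in nodes and load[h] < slots_per_node]
        if local:
            # Prefer the least-loaded node among those with a local replica.
            host = min(local, key=lambda h: load[h])
            is_local = True
        else:
            # No local option: fall back to the least-loaded node with a
            # free slot, which implies reading the block over the network.
            free = [n for n in nodes if load[n] < slots_per_node]
            if not free:
                raise RuntimeError("no free slots left for block %r" % block)
            host = min(free, key=lambda n: load[n])
            is_local = False
        load[host] += 1
        assignment[block] = (host, is_local)
    return assignment
```

Calling `schedule_by_locality({"b0": ["n1", "n2"], "b1": ["n1"]}, ["n1", "n2"], 2)` returns, for each block, the chosen host and a flag saying whether the read would be local. The greedy, one-pass policy is only one possible design; a real scheduler would also have to weigh memory caching and the cost of remote reads, as the abstract notes.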
dc.identifier.uri	https://hdl.handle.net/1843/ESBF-AEDNUF
dc.language	Portuguese
dc.publisher	Universidade Federal de Minas Gerais
dc.rights	Open Access
dc.subject	Computing
dc.subject	Big data
dc.subject	Distributed systems
dc.subject.other	Data locality
dc.subject.other	Distributed systems
dc.subject.other	Big-data
dc.title	Escalonamento baseado em localidade no ambiente Watershed
dc.type	Master's thesis
local.contributor.advisor1	Dorgival Olavo Guedes Neto
local.contributor.referee1	Italo Fernando Scota Cunha
local.contributor.referee1	Renato Antonio Celso Ferreira
local.description.resumo	The growth in the volume of data available for processing in various scenarios and the emergence of storage and processing platforms such as Hadoop have enabled new applications, but have also created new challenges. With a very large volume of data distributed across many machines, the problem arises of bringing applications close to the data in order to reduce communication costs within the system. However, there is still little understanding of how data locality affects the performance of these frameworks. This work evaluates that problem in the context of the Watershed environment. For this analysis, we integrated Watershed with the Hadoop ecosystem and implemented a scheduler for Watershed applications based on the locality information provided by the system. The results obtained confirm the advantages of taking data placement into account when scheduling applications of this kind.
local.publisher.initials	UFMG

Files

Original bundle

Name: brunohott.pdf
Size: 1.32 MB
Format: Adobe Portable Document Format

Collections

Graduate Program in Computer Science - Master's Theses