Escalonamento baseado em localidade no ambiente Watershed (Locality-based scheduling in the Watershed environment)

dc.creator: Bruno Cerqueira Hott
dc.date.accessioned: 2019-08-09T18:05:36Z
dc.date.accessioned: 2025-09-09T01:26:27Z
dc.date.available: 2019-08-09T18:05:36Z
dc.date.issued: 2016-07-15
dc.description.abstract: Increased connectivity and bandwidth on the Internet, combined with the falling cost of electronic equipment, have caused an explosion in the volume of data traveling over the network. At the same time, the resources available to store these data have been growing, which led to systems developed specifically to process them: an early example is Google's MapReduce model, which was followed by several open-source implementations such as Hadoop and by new models such as Spark. Storing these huge data sets also required a solution, and distributed file systems such as HDFS and Tachyon emerged. Because the data are now very large and spread over multiple machines in a cluster, the problem arises of placing applications close to their data effectively. If this is not done, the cost of moving data through the system can be very high and can impair the application's final performance. Depending on where the data reside, an application may access them directly on the local machine's disk, in local memory through an in-memory cache, or on another machine in the cluster over the network. The trade-offs in storage capacity, access time, and computational cost make the placement decision nontrivial. This work implements scheduling based on data locality in the Watershed processing environment. To that end, Watershed was integrated with the Hadoop ecosystem by creating communication channels to the HDFS and Tachyon distributed file systems. Based on the location information these systems provide, we implemented a locality-based process scheduler for Watershed applications running on those file systems. Finally, experiments compared the different ways of handling files: through the local file system, a distributed file system, or memory. The results show the advantages of taking data placement into account when scheduling such applications.
dc.identifier.uri: https://hdl.handle.net/1843/ESBF-AEDNUF
dc.language: Portuguese
dc.publisher: Universidade Federal de Minas Gerais
dc.rights: Open Access
dc.subject: Computing
dc.subject: Big data
dc.subject: Distributed systems
dc.subject.other: Data locality
dc.subject.other: Distributed systems
dc.subject.other: Big-data
dc.title: Escalonamento baseado em localidade no ambiente Watershed (Locality-based scheduling in the Watershed environment)
dc.type: Master's dissertation
local.contributor.advisor1: Dorgival Olavo Guedes Neto
local.contributor.referee1: Italo Fernando Scota Cunha
local.contributor.referee1: Renato Antonio Celso Ferreira
local.description.resumo: The growth in the volume of data available for processing in many scenarios, together with the emergence of storage and processing platforms such as Hadoop, has enabled new applications but has also created new challenges. With a very large volume of data distributed across many machines, the problem arises of bringing applications close to the data in order to reduce communication costs within the system. However, there is still little understanding of how data locality affects the performance of these frameworks. This work evaluates that problem in the context of the Watershed environment. For this analysis, we integrated Watershed with the Hadoop ecosystem and implemented a scheduler for Watershed applications based on the locality information provided by the system. The results confirm the advantages of taking data placement into account when scheduling applications of this kind.
local.publisher.initials: UFMG

Files

Original bundle

Name: brunohott.pdf
Size: 1.32 MB
Format: Adobe Portable Document Format