Active learning in contextual bandits: handling the uncertainty about the user's preferences in interactive recommendation systems

Nicollas de Campos Silva

dc.creator	Nicollas de Campos Silva
dc.date.accessioned	2023-12-19T19:51:02Z
dc.date.accessioned	2025-09-08T23:03:00Z
dc.date.available	2023-12-19T19:51:02Z
dc.date.issued	2023-07-03
dc.description.abstract	Recommendation Systems (RSs) are now concerned with the online environment of real-world applications, where the system must continually learn and predict new recommendations. Current work has addressed this task as a Multi-Armed Bandit (MAB) problem by proposing Contextual Bandit (CB) models. The idea is to apply usual recommendation techniques to exploit the user's preferences while the system also tries to learn new information about their tastes. However, the level of personalisation of these models is still directly tied to the information previously available about the users. After an extensive literature review on the subject, we observe that current algorithms have neglected the impact of scenarios of uncertainty about the user's preferences. By assuming that the bandit model can learn regardless of the recommended item, such models miss an opportunity to obtain more information about the users. In this sense, this dissertation addresses the challenge of handling scenarios of uncertainty in Contextual Bandit models. In particular, we investigate two common scenarios in interactive systems: (1) when the user joins for the first time and (2) when the system keeps making wrong recommendations because of earlier misleading assumptions. In both scenarios, we propose introducing concepts from Active Learning to represent the trade-off between exploitation and exploration in bandit models. Our solution consists of recommending non-personalised items based on entropy and popularity to obtain more information about the user without decreasing the model's accuracy when a scenario of uncertainty is observed. This solution is then instantiated in three traditional bandit algorithms, creating new versions of each of them.
Experiments in distinct recommendation domains show that these modified versions outperform their original versions and all other baselines, increasing the cumulative reward in the long run. Moreover, a counterfactual evaluation validates that these improvements were not simply due to the bias of offline datasets.
dc.description.sponsorship	CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
dc.identifier.uri	https://hdl.handle.net/1843/62090
dc.language	eng
dc.publisher	Universidade Federal de Minas Gerais
dc.rights	Open Access
dc.subject	Computing – Theses
dc.subject	Recommendation Systems, Multi-Armed Bandits
dc.subject.other	Recommendation Systems
dc.subject.other	Multi-Armed Bandits
dc.title	Active learning in contextual bandits: handling the uncertainty about the user's preferences in interactive recommendation systems
dc.title.alternative	Active learning em modelos bandit contextuais: lidando com a incerteza sobre a preferência dos usuários em sistemas de recomendação interativos
dc.type	Doctoral thesis
local.contributor.advisor-co1	Leonardo Chaves Dutra da Rocha
local.contributor.advisor1	Adriano César Machado Pereira
local.contributor.advisor1Lattes	http://lattes.cnpq.br/6813736989856243
local.contributor.referee1	Rodrygo Luis Teodoro Santos
local.contributor.referee1	Anísio Mendes Lacerda
local.contributor.referee1	Fernando Henrique de Jesus Mourão
local.contributor.referee1	Marcelo Garcia Manzato
local.creator.Lattes	http://lattes.cnpq.br/2903958691750105
local.description.resumo	Recommendation Systems (RSs) are increasingly concerned with the online environment of real-world applications, where the system must continually learn and predict new recommendations. Recent work has addressed this task as a Multi-Armed Bandit (MAB) problem by proposing Contextual Bandit (CB) models. The idea is to apply usual recommendation techniques to exploit the user’s preferences while the system also performs some exploration to learn new information about their tastes. The personalisation level of such models is still directly related to the information previously available about the users. However, after an extensive literature review on this topic, we observe that current algorithms have neglected the impact of scenarios of uncertainty about the user’s preferences. By assuming that the bandit model can learn regardless of the recommended item, such models waste an opportunity to get more information about the users. In this sense, this dissertation addresses the challenge of handling scenarios of uncertainty in Contextual Bandit models. In particular, we investigate two common scenarios in interactive systems: (1) when the user joins for the first time and (2) when the system continually makes wrong recommendations because of prior misleading assumptions. In both scenarios, we propose to introduce concepts from Active Learning to represent the usual trade-off between exploration and exploitation in the bandit models. Our solution consists of recommending non-personalised items based on entropy and popularity to get more information about the user without decreasing the model’s accuracy when an uncertain scenario is observed. This solution is then instantiated in three traditional bandit algorithms, creating new versions of each of them.
Experiments in distinct recommendation domains show that these modified versions outperform their original versions and all baselines by increasing the cumulative reward in the long run. Moreover, a counterfactual evaluation validates that such improvements were not simply due to the bias of offline datasets.
local.identifier.orcid	https://orcid.org/0000-0003-4393-3348
local.publisher.country	Brazil
local.publisher.department	ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
local.publisher.initials	UFMG
local.publisher.program	Programa de Pós-Graduação em Ciência da Computação

Files

Original bundle

Name: PhD thesis - final version.pdf
Size: 2.81 MB
Format: Adobe Portable Document Format

Collections

Pós-Graduação em Ciência da Computação - Teses