MARISA AFFONSO VASCONCELOS

USER GENERATED MICRO REVIEWS: CHARACTERIZATION AND POPULARITY PREDICTION

Thesis presented to the Graduate Program in Computer Science of the Institute of Exact Sciences of the Universidade Federal de Minas Gerais in partial fulfillment of the requirements for the degree of Doctor in Computer Science.

Advisor: Jussara Marques de Almeida Gonçalves

Belo Horizonte
February 2015

© 2015, Marisa Affonso Vasconcelos. All rights reserved.

Vasconcelos, Marisa Affonso
V331u User generated micro reviews: characterization and popularity prediction / Marisa Affonso Vasconcelos. — Belo Horizonte, 2015
xxv, 166 f. : il. ; 29cm
Thesis (doctorate) — Universidade Federal de Minas Gerais — Department of Computer Science
Advisor: Jussara Marques de Almeida Gonçalves
1. Computer science - Theses. 2. Online social networks - Theses. 3. Prediction (Logic) - Theses. 4. Consumer behavior - Theses. I. Advisor. II. Title.
CDU 519.6*04(043)

To my parents, Maria Aparecida and Antônio, my siblings, Mariana and Daniel, my grandmother Hercília, my friend Vanessa Vidal, and everyone who believed in this work.

"Acknowledge the fall. And do not lose heart. Get up, shake off the dust. And come out on top."
(Paulo Vanzolini, "Volta por Cima")

Acknowledgments

To Prof. Jussara Almeida, my thesis advisor: thank you for your support, availability, patience, and dedication. Thank you for your teachings, trust, and friendship throughout this period.
Being her student was an extremely enriching experience. To Prof. Marcos Gonçalves, for his collaboration, dedication, and encouragement. I would like to thank all the PPGCC faculty, in particular professors Fabrício Benevenuto, Virgílio Almeida, and Wagner Meira Jr., for always being willing to listen to me and to help. I would also like to thank the many PPGCC staff members and students who were always available to assist me. I am very proud to have been part of a graduate program of such excellence and quality. To my friends at the CAMPS laboratory, who were always ready to help me with whatever I needed and to make this journey more fun. Thank you Giovanni Comarela, Gabriel Magno, Tiago Rodrigues, Geraldo Franciscani, João Pesce, Rafael Ottoni, Matheus Santos, Evandro Cunha, Diego Las Casas, Gustavo Rauber, Emanuel Vianna, Diego Saez-Trumper, and Felipe Moraes. To my friend Tatiana Pontes, for her collaboration and for always being by my side, sharing the difficulties and achievements of this journey. To my friend Saulo Ricci, for being indispensable to this work and for his friendship. To my friend Daniel Hasan, for the discussions that contributed to the success of this work. To my friend Vanessa Vidal, who insisted that I follow my intuitions and talents while developing this work. To my parents, siblings, and grandmother, who rooted for my success: thank you for the love, affection, and encouragement given throughout this process. To FAPEMIG and CNPq, for the financial support. And to everyone who in some way contributed to the completion of this work.

"Life can only be understood backwards; but it must be lived forwards." (Søren Kierkegaard)

Resumo

Since the popularization of Web 2.0, people have become increasingly engaged in expressing their opinions through reviews of products and services. Like other types of user-generated content, online reviews come in many forms, sizes, and qualities.
Such variability in quality is particularly notable in textual reviews produced on mobile apps, usually called micro-reviews or tips, owing to their inherent conciseness. In a content-abundant environment, being able to estimate the helpfulness of an online (micro-)review, and ultimately predict its future popularity among users, as accurately and as early as possible, can greatly benefit content filtering and recommendation methods, helping users find valuable reviews and providing quick feedback to business owners and future customers. In this context, we investigate how users exploit micro-reviews, focusing in particular on Foursquare tips, an increasingly popular type of review whose high degree of informality and conciseness poses extra difficulties for the design of effective prediction methods. Using data collected from Foursquare, we also investigate how tip popularity, estimated by the number of times a tip received a like from a user, evolves over time and which factors can be combined to develop a model for predicting tip popularity at a given point in the future. Finally, we develop solutions to two different prediction tasks: predicting the popularity ranking of a set of tips, and predicting the popularity level a particular tip will reach. Experimental results show that a multidimensional set of predictor variables, which considers attributes of the user who posted the tip and of the venue where it was posted, leads to more accurate results than using each of these sets in isolation. Moreover, when applied to Foursquare tips, our models are also more robust than state-of-the-art popularity prediction models, since they can be applied to any tip, at posting time or afterwards.
Keywords: micro-reviews, popularity, prediction, social networks, user behavior.

Abstract

Since the popularization of Web 2.0, people have become increasingly engaged in expressing their opinions through reviews about products and services. As with any other type of user-generated content, online reviews come in various forms, sizes, and qualities. Such quality variability is particularly prominent in textual reviews produced on mobile apps, often called micro-reviews or tips, due to their inherent conciseness. In such a content-abundant environment, being able to estimate the helpfulness of an online (micro-)review, and ultimately predict its future popularity among users as accurately and early as possible, can greatly benefit content filtering and recommendation methods, helping users find valuable reviews and providing quick feedback to business owners and future customers.

In this context, we investigate how users exploit micro-reviews, focusing particularly on Foursquare tips, an increasingly popular type of review whose high degree of informality and briefness poses extra difficulties for the design of effective prediction methods. Using data collected from Foursquare, we also investigate how tip popularity, given by the number of times the tip received a "like" from a user, evolves over time and which factors impact this popularity evolution. Then, we explore how these factors can be combined to develop models that predict tip popularity at a given point in the future. We develop solutions to two different prediction tasks: predicting the popularity ranking of a set of tips and predicting the popularity level a particular tip will achieve. Our experimental results show that a multidimensional set of predictor variables, which considers features of both the user who posted the tip and the venue where it was posted, leads to more accurate results than using each set of features in isolation.
Our models, when applied to Foursquare tips, are also more robust than state-of-the-art popularity prediction methods, as they can be applied to any tip, at or after posting time.

Keywords: micro-reviews, popularity, prediction, social networks, user behavior.

List of Figures

3.1 Screenshot of a Foursquare Venue Page.
4.1 User Tipping Activity on Foursquare.
4.2 Number of Friends and Followers and Number of Mayorships per User.
4.3 Fraction of Likes Received from the User's Social Network (Friends and Followers).
4.4 Visiting and Tipping Activities per Venue.
4.5 Distributions per Venue Category.
4.6 Content Features of Foursquare Tips and Yelp Reviews.
4.7 Correlation between User Attributes (top 3% users with largest percentages of tips with links).
4.8 User Profiles: Attribute Distributions.
4.9 Venue Category Distributions.
4.10 Words Commonly Used in Users' Tips.
4.11 Correlation between User Attributes (only users with at least 10 tips).
4.12 Degree Distribution of the User Network (log scale).
4.13 Distribution of Tip Popularity over Time.
4.14 Distribution of Percentage of Likes Received During the First Month after Posting Time.
4.15 Distribution of Time Until x% of Total Likes are Received for the Most Popular Tips (G1).
4.16 Social vs. Non-Social Likes: Distribution of Percentage of Likes Received over Time.
4.17 Cumulative Distributions of Popularity Peak for Most Popular Tips (G1).
5.1 Monitoring Time Scheme.
5.2 Temporal Data Split into Train and Test Sets.
5.3 Correlations between the Top-10 Most Popular Tips at Time tr and at Time tr + δ (δ in months).
5.4 Effectiveness of Ranking for Varying Target Time tr + δ: NY Scenario (Avg and 95% Confidence Intervals).
5.5 Tips Ranking Example.
5.6 Effectiveness of Ranking for Varying Target Time tr + δ: NY Food Scenario (Avg and 95% Confidence Intervals).
5.7 Effectiveness of Ranking when Removing One Feature at a Time: NY Scenario (Avg and 95% Confidence Intervals for All Considered Days).
5.8 Effectiveness of Ranking When Using Only 4 Features for δ = 1 month (Avg and 95% Confidence Intervals).
6.1 Monitoring Time Scheme.
6.2 Chronological Split of Training and Test Sets: Sliding Windows Over Time.
6.3 Macro-Average Results for Two Popularity Levels.
6.4 Macro-Average Results for Three Popularity Levels.
6.5 Results for Tips in the Low Popularity Category.
6.6 Results for Tips in the High Popularity Category.
6.7 Distribution of the Most Important User Feature for Predicting a Tip's Popularity Level.
6.8 Distributions of the Most Important Venue Features for Predicting a Tip's Popularity Level.
6.9 Distribution of the Most Important Content Feature for Predicting a Tip's Popularity Level.
6.10 Macro-Average Precision and Recall for OLS Using One Feature at a Time.
6.11 Macro-Average Results for OLS After Removing Each Collinear Feature.
6.12 Recall for OLS When Removing One Feature at a Time.
6.13 Macro-Average Results for Various Monitoring Times ε (δ = 1 month).
6.14 Macro-Average Results for Various Target Times δ (ε = 0).
6.15 Model Accuracy in the Training Set (Each Point is a Sector in the 10-Dimensional Space Defined by the Top-10 Features).
6.16 Model Accuracy in the Testing Set (Each Point is a Sector in the 10-Dimensional Space Defined by the Top-10 Features).

List of Tables

3.1 Summary of Our Venue Dataset.
3.2 Summary of Our User Dataset.
4.1 Summary of Users' Tipping Activities.
4.2 Summary of Visiting and Tipping Activities at Venues.
4.3 Summary of Tip Textual Characteristics.
4.4 Summary of User Attributes Across Clusters.
4.5 Results of the Manual Inspection of a Sample of Users from Each Cluster.
4.6 Summary Statistics for User Influence Networks per Venue Category as well as for All Categories (General).
4.7 Kendall τ Correlation Values Between Ranking Lists.
4.8 Top-5 Most Influential Users Overall and per Venue Category According to Each Method.
4.9 Distribution of Likes for Groups of Tips.
4.10 Rich-Get-Richer Analysis: Coefficients α (and 95% Confidence Intervals) and R² of Linear Regressions from (log) Popularity at tr to (log) Popularity at tr + δ.
5.1 Tip's Syntactic Content Features.
5.2 Complete Set of Features for Tip Popularity Prediction.
5.3 Overview of Datasets and Scenarios of Evaluation.
5.4 Features Ranked by Information Gain.
6.1 Distribution of Candidates for Prediction Across Different Popularity Levels.
6.2 Complete Set of Features for Tip Popularity Level Prediction.
6.3 Confusion Matrix for a Three-Class Classification Task.
6.4 Examples of Confusion Matrices for a Two-Class Classification Task.
6.5 Features Ranked by Information Gain.
6.6 Features with High Collinearity with at Least One Other Feature.
6.7 Macro-Average Results of Models that Use Early Popularity Measurements (only tips with at least 1 like, ε = 168 hours, δ = 1 month).
6.8 Geographical Model Specialization: Macro-Average Results.
6.9 Categorical Model Specialization: Macro-Average Results.

Contents

Acknowledgments
Resumo
Abstract
List of Figures
List of Tables
1 Introduction
  1.1 Basic Concepts
  1.2 Dissertation Goals and Contributions
  1.3 Challenges
  1.4 Organization of this Dissertation
2 Literature Review
  2.1 Information Credibility
  2.2 Predicting the Quality of User Generated Content
    2.2.1 Helpfulness of Online Reviews
    2.2.2 Opinion Mining
    2.2.3 Spam Detection
  2.3 Analysis of Online Content Popularity
    2.3.1 Popularity Prediction Models
    2.3.2 Information Propagation and Social Influence Models
  2.4 Analyses of Location-Based Social Networks
  2.5 Summary
3 Foursquare: Case Study
  3.1 Foursquare: Key Elements and Features
  3.2 Measurement Methodology
    3.2.1 Crawling Methodology
    3.2.2 Venue Dataset (Dataset 1)
    3.2.3 User Dataset (Dataset 2)
  3.3 Summary
4 Tipping Activity on Foursquare: Characterization and User Influence
  4.1 Characterization of Tipping Activity
    4.1.1 User Analysis
    4.1.2 Venue Analysis
    4.1.3 Tip Analysis
  4.2 User Profiles
    4.2.1 Suspicious Behavior
    4.2.2 Uncovering User Profiles
  4.3 User Influence
    4.3.1 User Behavioral Patterns
    4.3.2 User Influence Network
    4.3.3 Measuring User Influence
  4.4 Dynamics of Tip Popularity Evolution
    4.4.1 Popularity Evolution
    4.4.2 The Role of the Social Network
    4.4.3 Popularity Peak
    4.4.4 The Rich-Get-Richer Phenomenon
  4.5 Summary
5 Predicting the Popularity Ranking of a Set of Tips
  5.1 Popularity Prediction Task
  5.2 Ranking Strategies
  5.3 Tip Features
    5.3.1 User Features
    5.3.2 Venue Features
    5.3.3 Tip's Content Features
  5.4 Experimental Setup
  5.5 Experimental Results
    5.5.1 Ranking Stability
    5.5.2 Prediction Results
    5.5.3 Experiments Removing Features
  5.6 Summary
6 Predicting the Popularity Level of a Tip
  6.1 Popularity Levels
  6.2 Tip Popularity Prediction: Formal Definition
  6.3 Prediction Methods
    6.3.1 Support Vector Machines (SVM)
    6.3.2 Ordinary Least Squares Regression (OLS)
    6.3.3 Support Vector Regression (SVR)
  6.4 Tip Features
  6.5 Evaluation Methodology
    6.5.1 Experimental Setup
    6.5.2 Evaluation Metrics
  6.6 Experimental Results: Predictions at Posting Time
    6.6.1 Analysis of the Groups of Features
    6.6.2 Feature Importance
  6.7 Experimental Results: Other Prediction Scenarios
    6.7.1 Prediction Results Varying the Monitoring Period ε
    6.7.2 Prediction Results Varying the Target Prediction Window δ
  6.8 Model Specialization
    6.8.1 City-Based Model Specialization
    6.8.2 Category-Based Model Specialization
  6.9 Summary
7 Conclusions and Future Work
  7.1 Main Conclusions
  7.2 Directions for Future Work
  7.3 Publications
Bibliography

Chapter 1

Introduction

In recent years, we have seen an increasing amount of data, especially about personal interests and activities, being shared on the Web.
This was possible thanks to the success of online social networks (OSNs), which have not only enhanced connectivity among people but also enabled the dissemination and visibility of user-generated content (UGC), previously restricted to a few niches. In particular, users are no longer merely the targets of information about products and services published in advertising campaigns: they often act as media and content producers themselves, commenting on and evaluating previous experiences. Usually, these recommendations are posted as online reviews of products, services, and businesses. Previously, people used to share their opinions by word of mouth, orally or through anonymous comments deposited in suggestion boxes. Nowadays, the social Web allows people to interact and freely share opinions about products, services, and companies in real time and at large scale. In fact, the number of product and service reviews available online has been increasing at many retailer websites, such as Amazon1 and Walmart2, as well as on specialized review websites such as Epinions3, TripAdvisor4, and Yelp5.

More and more people base their buying decisions on online reviews written by others [Chen et al., 2004]. Indeed, several studies and surveys have found evidence that online reviews affect product sales [Chevalier and Mayzlin, 2006; Ante, 2009; Li and Hitt, 2008]. Moreover, these user-generated reviews nourish the relationship between customers and real businesses, offering constructive criticism and possibly a competitive advantage to business owners. Reviews also work as a benchmark of the offered products and services, and may be used to drive more effective marketing strategies [Chen and Xie, 2008]. The number of reviews available for a single product or service may be large, and they vary greatly in quality.

1 http://www.amazon.com
2 http://www.walmart.com
3 http://www.epinions.com
4 http://www.tripadvisor.com/
5 http://www.yelp.com
Some reviews may contain spam or misleading and fake information [Lappas, 2012; Lin et al., 2014], which may make it hard for users to find helpful reviews. To support that task, many websites allow users to evaluate reviews by voting on their helpfulness. Unfortunately, this feedback is usually very sparse [O'Mahony and Smyth, 2009]. Moreover, ranking reviews based solely on the helpfulness votes received may not be useful for promoting recently posted reviews with few or no votes, which, regardless of their potential helpfulness, are doomed to be outranked by older reviews that have already accumulated more votes. Thus, those reviews may never gain visibility. This problem has inspired a series of studies attempting to automatically predict the quality and helpfulness of a review [Kim et al., 2006; Liu et al., 2007; T. Ngo-Ye and Sinha, 2014], estimated by the number of people who found the review useful. The same metric can also be used as a measure of the review's popularity, as it provides a lower bound on the number of people who actually read the review. To allow comparison with state-of-the-art models, we here use the terms quality, utility, helpfulness, and popularity interchangeably.

Both users and business owners can benefit from such predictions of review popularity, as they can drive the design of automatic review filtering and recommendation schemes, as well as review ranking methods, which in turn can help users find potentially more valuable reviews (or reviews that are likely to draw more attention in the future). Predicting the potential popularity of a review can also stimulate reviewers to post higher-quality reviews, as rapid feedback can be provided to review authors.
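The cold-start problem just described can be made concrete with a toy example. The sketch below is purely illustrative and not from this dissertation: all review identifiers, ages, vote counts, and model scores are hypothetical, and the "predicted" scores are a placeholder standing in for the output of a trained helpfulness model.

```python
# Toy illustration of why ranking reviews solely by accumulated helpfulness
# votes penalizes recently posted reviews. All numbers are hypothetical.

reviews = [
    {"id": "r1", "age_days": 400, "helpful_votes": 57},
    {"id": "r2", "age_days": 250, "helpful_votes": 31},
    {"id": "r3", "age_days": 2, "helpful_votes": 0},  # fresh, possibly valuable
]

# Vote-only ranking: the fresh review lands last, gains no visibility,
# and therefore never accumulates votes -- a self-reinforcing loop.
by_votes = [r["id"] for r in
            sorted(reviews, key=lambda r: r["helpful_votes"], reverse=True)]
print(by_votes)  # ['r1', 'r2', 'r3']

# With a helpfulness score predicted at posting time (here just a placeholder
# dictionary), a new review can compete immediately for visibility.
predicted_score = {"r1": 0.6, "r2": 0.4, "r3": 0.8}  # hypothetical model output
by_prediction = [r["id"] for r in
                 sorted(reviews, key=lambda r: predicted_score[r["id"]], reverse=True)]
print(by_prediction)  # ['r3', 'r1', 'r2']
```

Replacing the placeholder scores with a model's output is exactly the role the prediction methods discussed in this dissertation would play in such a ranking pipeline.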
Similarly, such predictions can also offer valuable feedback to business owners, who can more quickly identify (and fix) the aspects of their services or products that may affect revenues most, since potentially more popular reviews may contain information about how a product is seen by a larger fraction of the customers.

However, previous efforts to predict the popularity of online reviews focused on longer, more verbose, and formally structured reviews, such as those present in systems like Amazon and TripAdvisor, often exploiting textual and content-related features (e.g., review length, readability) [Kim et al., 2006; Liu et al., 2007; T. Ngo-Ye and Sinha, 2014]. Yet, with the diffusion of smartphones, new services were created targeting mainly social networking users who spend most of their time accessing information through mobile apps. In this environment, communication is usually briefer, mainly because of the limited amount of information that can be displayed on the mobile screen. This limitation may have also influenced the creation of new review services (Foursquare1, Google+ Local2) and the expansion of traditional desktop services to the mobile environment (e.g., Yelp, TripAdvisor). In these services, users write micro-reviews or tips, which are typically much more concise (e.g., up to 200 characters), often written while the information is still fresh in the user's mind, and may contain much more subjective and informal content, varying from a narrow recommendation ("You must try the apple pie.") to a general warning ("Stay away from this place"). In this dissertation, we focus on this special type of review, the micro-review (or tip)3, popularized by Foursquare.

Unlike several traditional review systems, where users are allowed to assign helpfulness (or unhelpfulness) signals to a review, tips are rated by other users simply by clicking on a "like" mark.
The number of "likes" received by a tip can then be seen as an estimate of its helpfulness or popularity. However, the lack of a "like" does not imply that a tip was not helpful or interesting, as it may simply not have been seen by any user. Moreover, utility is an abstract concept that can be broader than the fact that a user has given a "like". This further contributes to making the automatic prediction of tip popularity much harder than in systems that offer a rating scale (e.g., 1 to 5), such as Yelp and Epinions. Moreover, tips also differ from product reviews in that they may remain active for much longer periods (e.g., a restaurant or an airport tip), whereas some product reviews may become inactive when a new version of the same product is released. Finally, the problem of predicting tip popularity is also inherently different from other efforts to forecast the attention received by other types of user-generated content, such as tweets [Suh et al., 2010; Hong et al., 2011], news posts [Bandari et al., 2012], videos [Borghol et al., 2012; Brodersen et al., 2012; Figueiredo et al., 2014b], or questions in a forum [Anderson et al., 2012; Li et al., 2012], which exploited mainly aspects related to the user who posted the information or to the content itself (e.g., category). Unlike tweets, news, videos, and questions, tips are associated with specific venues and tend to be less ephemeral (particularly compared to news and tweets), as they remain associated with the venue (and thus visible to users) for a longer time.

1 https://foursquare.com/
2 http://www.google.com/+/learnmore/local/
3 We will use the words tip and micro-review interchangeably in the text.

1.1 Basic Concepts

This research focuses on Foursquare micro-reviews, known as tips. Foursquare is the most popular location-based social network (LBSN), where users can share their current location with friends and followers through check-ins.
Check-ins are performed by users via devices with GPS (Global Positioning System) capabilities when they are close to a specific (physical) location, named a venue, which has an associated page in the system. Thus, venues are virtual places, grouped into a large variety of categories such as airports, monuments, or squares, that represent real locations like John F. Kennedy International Airport (New York, United States), the Taj Mahal (Agra, India), or the Eiffel Tower (Paris, France).

In addition to check-ins, users can post tips about a given location on the corresponding venue's page. Foursquare tips are limited to 200 characters and may contain more informal or subjective content than longer reviews. They can be informative ("This place opened in 2002"), contain a recommendation ("Try the Fettuccine"), or even report users' experiences ("Best place for tacos" or "Avoid lunch time"). Unlike Yelp reviews, for example, which are much more extensive and often written "after the fact", Foursquare tips are usually written "in the moment" using a mobile device, and thus tend to be brief and direct, avoiding many details about specific characteristics of the venue. When visiting a venue's page, users may assign a "like" mark to a previously posted tip as a sign of agreement with the tip's content and/or of intention to visit the (physical) location with which the tip is associated. The aggregate number of likes is an estimate of the tip's popularity, and the same metric is used by Foursquare to rank the tips on the venue's page.

1.2 Dissertation Goals and Contributions

In this dissertation, we investigate how users exploit micro-reviews, focusing particularly on how the popularity of such pieces of content evolves over time and which factors impact this evolution. Then, we explore how these factors can be combined to develop popularity prediction models.
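The basic entities described in Section 1.1 (venues, 200-character tips, and likes as the popularity signal, with tips ranked on a venue's page by like count) can be sketched as a minimal data model. The class and field names below are ours, chosen for illustration; they do not mirror Foursquare's actual API, and the venue and tips are hypothetical examples.

```python
from dataclasses import dataclass, field

@dataclass
class Tip:
    author: str
    text: str  # Foursquare limits tips to 200 characters
    likes: int = 0  # aggregate "likes" serve as the popularity estimate

@dataclass
class Venue:
    name: str
    category: str  # e.g., Airport, Monument, Food
    tips: list = field(default_factory=list)

    def ranked_tips(self):
        # As described above, tips on a venue's page are ranked by like count.
        return sorted(self.tips, key=lambda t: t.likes, reverse=True)

# Hypothetical example data.
eiffel = Venue(name="Eiffel Tower", category="Monument")
eiffel.tips.append(Tip("alice", "Best view at sunset", likes=12))
eiffel.tips.append(Tip("bob", "Avoid lunch time", likes=30))
print([t.author for t in eiffel.ranked_tips()])  # ['bob', 'alice']
```

The prediction problems studied in this dissertation amount to estimating the future value of the `likes` field, at or shortly after posting time, before any likes have been observed.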
Towards that end, we focus our study on micro-reviews on Foursquare, also called tips. Foursquare was responsible for popularizing this feature [Vasconcelos et al., 2012b] and is currently one of the most popular systems that support micro-reviewing. We first characterize user interactions through tips, uncovering relevant user behavioral patterns that impact tip popularity. Understanding such patterns is important, as they can provide useful insights into which features should be exploited in the design of the prediction models, as well as a better comprehension of the prediction results. We then develop regression and classification methods that exploit the most important factors to predict the future popularity of a tip as soon as it is posted or, at most, a short period afterwards. We develop solutions to two different prediction tasks: (1) predicting the popularity ranking of a set of tips, and (2) predicting the popularity level of a tip. Whereas the former focuses on the relative (future) popularity of a group of tips, the latter tackles whether a particular tip will achieve a certain popularity level in the future.

A ranking of the most popular tips can be used to summarize a large set of tips, focusing on the most popular ones for a scenario of interest: for example, a list of the tips with the greatest potential to become popular among those posted at any venue in the user's home city. Moreover, estimating the popularity of a single tip can benefit both users and venue owners. For instance, the system can offer different filtering strategies to users based on the prediction, while venue owners can quickly react to opinions that may have a greater impact on decision making. Specifically, our investigation tackles the following three questions:

1. What are the most common user behavior and interaction patterns in the use of micro-reviews? How does the popularity of a tip evolve over time?
How is it affected by the social network of the tip's author? To what extent does the rich-get-richer phenomenon impact the popularity evolution of tips? (Chapter 4) First, we present a characterization of user behavior on Foursquare. Our analyses were performed over a collected Foursquare dataset consisting of more than 1.5 million users, more than 6 million tips, and 5 million likes. This study consisted of two main phases. First, we characterized venues and users with respect to number of tips, number of likes and to-dos (users can also save tips in to-do lists), as well as the percentage of tips containing links (i.e., URLs or email addresses). We also identified four groups of users with different tipping behavior, including one that is consistent with spamming. Using a larger Foursquare dataset, containing over 10 million tips and 9 million likes posted by over 13.5 million users, we modeled the user interactions through tips and likes as a graph in order to identify the most influential users. To that end, we proposed a variation of the PageRank algorithm in which each arc of the graph is weighted by the number of tips posted by each node (user). Using the modified PageRank, we were able to identify users who were influential by consistently receiving feedback on their tips. Moreover, we found users who were influential in a given venue category, which suggests that the category of the venue must be taken into account in the tip popularity prediction task. Furthermore, we characterized how the popularity of different sets of tips evolves over time, and how it is affected by the social network of the user who posted the tip (its author). We observed that tips experience a very slow popularity evolution compared to other types of user-generated content (UGC), such as news articles and photos.
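A minimal sketch of an arc-weighted PageRank of this kind is shown below. The toy "like" graph, the damping factor, and the dangling-node handling are illustrative assumptions, not the exact formulation used in our study:

```python
def weighted_pagerank(edges, damping=0.85, iters=50):
    """Weighted PageRank over a directed graph.

    edges: dict mapping (src, dst) -> weight; here the arc u -> v
    means u gave feedback (likes) to v, weighted by a tip count.
    Nodes with no outgoing arcs distribute their rank uniformly.
    """
    nodes = {n for edge in edges for n in edge}
    out_weight = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Teleportation term shared by all nodes.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        # Rank held by dangling nodes is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        for n in new:
            new[n] += damping * dangling / len(nodes)
        # Each node passes rank along its arcs, proportionally to weight.
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank

# Toy graph with invented weights: "bob" consistently receives feedback.
edges = {("alice", "bob"): 3, ("carol", "bob"): 1, ("bob", "alice"): 1}
rank = weighted_pagerank(edges)
print(max(rank, key=rank.get))  # bob
```

Users who accumulate high rank under such a weighting are precisely those who consistently receive feedback on their tips, which is the intuition behind the influence analysis above.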
Moreover, the social network of the tip's author has an important influence on the tip's popularity throughout its lifetime, but especially in the earlier periods after posting. Compared to other types of UGC, such as YouTube videos, we observe a weaker presence of the rich-get-richer phenomenon in the popularity evolution of tips, suggesting that factors other than the current popularity may significantly impact a tip's future popularity. 2. Which are the most important factors for predicting the popularity of Foursquare tips? How can we tackle the problem of predicting the future popularity of tips? (Chapters 5 and 6) We identified three important entities related to the Foursquare system that may impact a tip's popularity: the user who posted the tip, the venue where it was posted, and its content. We investigated the potential benefits of exploiting these aspects to predict the popularity that a tip (or a group of tips) will achieve at a future time. To that end, we considered two different tasks. The first prediction task aims at ranking a group of tips based on their predicted popularity at a given future time. We exploited a regression model using aspects of the three most important entities as predictors. Moreover, we evaluated the stability of the tip popularity ranking over time, assessing to what extent the current popularity ranking of a set of tips can be used to predict their popularity ranking at a future time. We found that the set of features used in our model can improve the prediction accuracy, given that enough training data is available. The second prediction task is more challenging, since it tackles the problem of predicting the popularity level of a single tip (the popularity level is defined according to a range of values specified in Chapter 6). We addressed this problem by formalizing it as a classification task.
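To make the two task formulations concrete, the sketch below fits a toy one-feature regression and applies it to both the ranking task and the popularity-level task. The follower counts, like counts, and threshold are invented for illustration; the actual models use a much richer feature set:

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit ys ~ a*xs + b using a single illustrative
    feature (e.g., the author's number of followers)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var
    return a, my - a * mx

def rank_tips(tips, model):
    """Task (1): order a set of tips by predicted future popularity."""
    a, b = model
    return sorted(tips, key=lambda t: a * t["followers"] + b, reverse=True)

def popularity_level(tip, model, threshold=1.0):
    """Task (2): will this single tip reach a given popularity level?"""
    a, b = model
    return "popular" if a * tip["followers"] + b >= threshold else "unpopular"

# Invented training pairs: (author followers, likes after some period).
model = fit_simple_regression([10, 50, 200, 400], [0, 1, 4, 9])
tips = [{"id": 1, "followers": 20}, {"id": 2, "followers": 300}]
print([t["id"] for t in rank_tips(tips, model)])  # [2, 1]
print(popularity_level(tips[1], model))           # popular
```

The ranking task only needs the relative order of the predicted scores, whereas the level task compares each score against a fixed threshold, which is why the latter is more sensitive to the heavy class imbalance discussed below.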
Since over 80% of the tips received no like at all, a great part of this dissertation is focused on predicting the popularity of a tip at posting time, when there is still no information about the tip's current popularity. To that end, we employed classification and regression methods along with an extended set of features related to the tip's author, venue and content as predictors. We investigated the relative importance of each predictor variable, finding that features extracted from both the user and the venue are among the most important ones on Foursquare. 3. To what extent can we improve prediction by monitoring the tip for a short period after posting? How do the prediction models behave as we predict further into the future? Can we improve prediction accuracy by building specialized models? (Chapter 6) The slow popularity evolution of a Foursquare tip also raises a question as to how robust our solutions are to long-term predictions. By monitoring the tips for a certain time after their creation, we expect to add to our models information about how their popularity is evolving, which may contribute to improving predictions. We also investigated how far into the future we can predict tip popularity with reasonable accuracy; that is, we analyzed how robust our prediction models are when we perform long-term predictions. Our intent is to analyze when our models become less accurate, since we expect the accuracy to drop as predictions are performed further into the future. We found significant improvements in prediction accuracy as we extend the initial monitoring time, although prediction accuracy may drop as we predict more than two months into the future.
We also investigated whether factors related to a specific geographic region (e.g., a city) or venue category impact how the popularity of a tip evolves over time. To that end, we built specialized prediction models using only tips posted in a specific city or in a specific venue category, and compared to what extent such models improve over the single general model. We found that model specialization does bring some (limited) improvements if performed at the city level, whereas category-based specialization does not bring clear and consistent gains. In summary, the key contributions of this dissertation are: (1) a solid understanding of how users exploit micro-reviews and of the factors that impact the popularity of such content on Foursquare; and (2) the design of cost-effective methods to predict the future popularity of micro-reviews. For content providers and system administrators, such prediction methods can be exploited to improve their systems through, for example, automatic review filtering, recommendation strategies and the identification of malicious behavior (e.g., spammers or detractors that can harm a user or business reputation). For marketers or advertisers, popularity prediction is valuable since a popular review may be tied directly to the revenue of a product or service, which can thus be estimated ahead of time and negotiated by all parties involved. Furthermore, the system can provide incentives to users whose contributions increase the overall value of content on the site. More broadly, the knowledge uncovered in our study may help in understanding the dynamics of the user community in the target application.

1.3 Challenges

The prediction of the popularity of micro-reviews poses several challenges.

• New content type: tips have inherent characteristics that distinguish them from other types of content and that might impact their popularity evolution.
For example, tips are associated with specific venues, and thus are visible to all users who visit the venue, including those drawn to it for other reasons (e.g., other tips). Also, tips usually contain opinions that might interest others for much longer periods of time than other types of content, such as news and tweets. Thus, tips may remain alive in the system, attracting attention (and likes), for longer periods.

• Content analysis: most previously proposed models to automatically estimate the popularity or helpfulness of a review formulate the problem as a classification or regression problem using observed features, i.e., textual or social features. Textual features are usually related to the structure, syntax, readability and sentiment of the review's content. However, most tips do not follow a formal structure (e.g., they often lack capitalization of the first letter and proper punctuation, and contain truncated sentences) and may present an informal vocabulary (word abbreviations, emoticons, slang, etc.) [Thurlow and Brown, 2003; Grinter and Eldridge, 2003; Thelwall et al., 2010]. These variations cause problems, since most content features rely on readability metrics that require well-structured text. Moreover, typical sentiment analysis algorithms assume that the text is written with standard spelling and grammar, so the current algorithms are unlikely to work well in this scenario.

• Feature selection: popularity can be affected not only by differences in tip content, but also by a multitude of factors with complex and unknown interactions, such as users, venues, the system interface and the rich-get-richer effect. Thus, one of our main challenges is to formalize these factors and assess the impact that they have on tip popularity.

• Data sparsity: both tip and user interactions are extremely sparse, as will be shown in Chapter 4.
Our analyses of the features related to the main entities (user, venue and tip content) revealed that most of them exhibit very large variability, with great concentration on a few users, venues and tips. For instance, according to our analyses, 49% of the tips were posted by only 10% of the users. The very skewed distribution of the number of likes per tip brings technical challenges to the modeling of the prediction task [He and Garcia, 2009; Liu et al., 2009] (e.g., severe class imbalance when predicting the popularity level of tips). Moreover, over 80% of the tips used in our experiments received no like at all, which means that these tips carry no information about their popularity. This limits the effectiveness of state-of-the-art methods that use early measurements, since they cannot be applied to such tips; it also highlights the robustness of our models, which are able to make predictions at posting time.

• Model evaluation: to evaluate our prediction models, our dataset has been chronologically split into training and test sets. However, this splitting scheme is more restrictive than the cross-validation scheme and may also aggravate the class imbalance problem.

1.4 Organization of this Dissertation

This dissertation is organized as follows. In Chapter 2, we survey the literature on four topics closely related to our study, namely information credibility, prediction of review helpfulness, prediction of the popularity of online content, and location-based social networks. Chapter 3 introduces the main elements and features of Foursquare. It also describes our crawling methodology and summarizes the collected datasets. Next, in Chapter 4, we present a characterization of tip usage, aiming to identify relevant user profiles and to obtain valuable insights for the prediction models. We also analyze the dynamics of tip popularity in this chapter.
Chapter 5 describes how we addressed the first prediction task (the ranking task) and the features selected for the ranking experiments. Chapter 6 describes our investigation of the second prediction task (the classification task) and the models developed to predict the popularity level of a tip. We also analyze the impact on the accuracy of our models of varying the monitoring time and the prediction target time, as well as the impact of model specialization. Finally, Chapter 7 concludes this dissertation, presenting some directions for future work.

Chapter 2
Literature Review

In this dissertation, we study the popularity of micro-reviews. We use the number of likes received as our measure of a tip's popularity. Moreover, the same measure can be seen as an estimate of the tip's helpfulness or quality, since it reflects the number of people who found the tip useful and indirectly assessed its quality. Thus, there are two groups of studies related to our problem: one aims at assessing the helpfulness or quality of reviews, whereas the other tackles the prediction of the popularity of online content. Most previous work on assessing the helpfulness of reviews has typically focused on automatically determining the quality, helpfulness or utility of reviews using textual features. However, such features are more appropriate for longer and formally structured reviews. Micro-reviews are shorter than traditional reviews, usually having their length constrained to around 200 characters so that they can be published and read on a variety of platforms. This size constraint has led users to write reviews using non-standard textual artifacts (e.g., emoticons) and informal language [Bermingham and Smeaton, 2010].
Moreover, in some micro-review systems, such as Foursquare, the micro-reviews, known as tips, are rated by other users simply by marking them as “liked”, as opposed to other review systems, where reviews are rated through star ratings or helpfulness votes. Likes are not as informative as ratings, since the absence of a like cannot be interpreted as a signal of unhelpfulness. These factors make the prediction of the helpfulness or popularity of micro-reviews a challenging task. To the best of our knowledge, no previous study has tackled the popularity of micro-reviews, but there are several threads of related research, reviewed in this chapter, that have guided us during our model design. We start by briefly discussing studies on information credibility in Section 2.1. We conjecture that the helpfulness or popularity of a micro-review can be influenced by the perceived credibility of the reviewer. Based on this conjecture, some of our proposed features (see Chapter 6) are inspired by some of these credibility directives. Next, in Section 2.2, we discuss previous analyses of the quality of various types of user-generated content, with particular focus on the assessment of the quality of online reviews. In this section we also briefly review other related efforts towards automatically detecting the polarity (positive, neutral, negative) of online reviews and detecting spam or fake reviews. Such studies can be considered complementary to this dissertation. For example, we use sentiment scores as content features exploited as input to our popularity prediction models (Chapter 6). Moreover, as a result of our characterization of user tipping activity, we were the first to uncover evidence of spamming activity on Foursquare (Chapter 4).
From the perspective of predicting the popularity of online content, we also survey recent work on popularity prediction models and on information propagation and social influence in Section 2.3. The proposed models are highly influenced by the target application and by the type of data (e.g., number of video views, number of retweets, number of Digg votes), which makes the creation of a generic prediction model unfeasible [Tatar et al., 2011]. However, some of our analyses and proposed features are inspired by and/or adapted from these previous works. Finally, we present previous analyses of location-based social networks (LBSNs) in Section 2.4.

2.1 Information Credibility

The Web 2.0 has empowered users to express their opinions by interacting with others through social networks and by publishing a wide variety of user-generated content, such as blogs, online forums, and product or service reviews, among others. However, not all information available on the Web is credible or comes from reputable sources. Credibility affects how customers perceive the quality of online services and influences their decision-making processes. In one of the first studies on the credibility of information sources, Hovland and Weiss [1951] designed an experiment in which news stories with identical content were presented to volunteers as coming from two different sources (i.e., high-credibility or low-credibility). They identified the perceived “expertise” and “trustworthiness” of the source as factors impacting credibility. Fogg et al. [2001] defined credibility as believability, applying the same two dominant factors, expertise and trustworthiness, to identify credibility on websites. They designed a questionnaire to determine which factors have the greatest impact on users' perceptions of credibility.
They found that the evaluated website attributes fall into seven dimensions: five of them increase perceptions of credibility (real-world feel, ease of use, expertise, trustworthiness, and tailoring), while the other two contribute to negative perceptions of credibility (the commercial implications of the site, and amateurism). This study was performed for websites, but we use these dimensions to guide our choice of some features used in our popularity prediction model. Credibility has also been analyzed in the social media domain. For the task of exploring trending topics on Twitter, Castillo et al. [2011] studied the information credibility of topics defined by a set of tweets. They used Amazon Mechanical Turk to gather user judgments about the credibility of a tweet, and extracted the most relevant features from each topic. They defined a complex set of features over messages, users, topics and propagations, which were used to build a classifier to automatically assess the level of credibility of a topic. Based on a credibility framework for blog post retrieval proposed by Rubin and Liddy [2006], Weerkamp and de Rijke [2012] defined two groups of credibility indicators, namely post-level (e.g., spelling, timeliness, post length) and blog-level (e.g., regularity, expertise, comments) indicators. Concerning online reviews, several studies were developed to automatically assess their helpfulness considering various credibility indicators. Those studies are discussed in the next section.

2.2 Predicting the Quality of User Generated Content

We start by discussing quality in different contexts, and then focus on online reviews, our target domain.
Our research is inspired by several previous studies that focused on analyzing the quality of socially generated content, including the quality of Wikipedia articles [Dalip et al., 2011, 2014], video or news comments [Siersdorfer et al., 2010; Hsu et al., 2009; Chen et al., 2011], and user-contributed answers on community question answering (CQA) forums [Anderson et al., 2012; Li et al., 2012]. Dalip et al. [2011] used Support Vector Regression (SVR) [Drucker et al., 1997] to estimate the quality of articles in collaborative digital libraries (e.g., Wikipedia) using features related to the text structure, citation network and article revision history. The same authors also studied the impact of feature selection on a multi-view algorithm for assessing quality in collaborative encyclopaedias [Dalip et al., 2014]. In [Siersdorfer et al., 2010], the authors proposed a Support Vector Machine (SVM) based model to predict the acceptance by the user community of a comment posted on YouTube and Yahoo! News. The proposed model uses a term-based representation of comments (TF-IDF, or term frequency-inverse document frequency) to automatically classify them as likely to obtain a high overall rating or not. With a similar goal, Hsu et al. [2009] proposed an SVR-based model to rank comments posted by users on Digg based on their quality. They exploited features such as the comment posting time, the number of articles submitted, and comment length. Chen et al. [2011] focused on user reputation in comment rating environments (Yahoo! News and Yahoo! Buzz). They showed that the quality of a comment judged editorially is almost uncorrelated with the ratings that it receives, but can be predicted using standard text features (e.g., length, spelling, and readability scores). Closely related to our target problem of estimating the popularity of a (micro-)review is the problem of predicting whether a question will have long-lasting value.
With that particular goal, Anderson et al. [2012] demonstrated that features that map the user activity related to a question (e.g., pageviews) within a short interval after it was posted can help predict the number of page views that the question will receive. Li et al. [2012] investigated the quality of questions in CQA services, defined by a combination of the following features: the number of tags-of-interest (reflecting the attractiveness of a question), the number of answers, and the amount of time taken to obtain the best answer. They also proposed a mutual reinforcement-based label propagation algorithm to predict the quality of a question using features of the question's text and of the asker's profile. Finally, Momeni et al. [2013] developed a classifier for predicting useful comments on YouTube and Flickr, exploiting not only textual features but also features that describe the author's posting and social behavior, such as the number of links posted and the size of the author's social network. In this dissertation, we also apply regression methods used in some of those studies, particularly the SVR method. However, we apply these techniques in a novel context, using other sets of features, to automatically predict the popularity of micro-reviews.

2.2.1 Helpfulness of Online Reviews

We now turn to previous efforts to predict the quality (helpfulness or utility) of online reviews, which are more closely related to our work. The quality of reviews can have a significant impact on the purchase decisions of future customers. Many currently popular websites, such as Amazon, Epinions and TripAdvisor, provide mechanisms for users to give some feedback (or review) about the products or services provided. The large amount and wide variability in the quality of
the reviews available on some websites motivate the use of filtering, reputation and personalized recommendation mechanisms to help users find useful reviews [Kim et al., 2006; Hsu et al., 2009; O'Mahony and Smyth, 2009]. Indeed, some websites, such as Amazon, allow users to indicate whether they find a review helpful. These meta-ratings help users filter relevant reviews more efficiently [Siersdorfer et al., 2010], and summarize a general opinion about a product or service. However, this type of feedback is still sparse, with many reviews, especially the most recent ones, failing to attract any feedback. This problem has inspired several research studies on the automatic prediction of the quality of reviews. The task of assessing the quality [Liu et al., 2007; Lu et al., 2010; Yu et al., 2010], utility [Zhang and Varadarajan, 2006; Liu, 2010] or helpfulness [Kim et al., 2006; Zhang and Tran, 2008; O'Mahony and Smyth, 2009; Tsur and Rappoport, 2009; Korfiatis et al., 2012; Ngo-Ye and Sinha, 2012] of a review is typically addressed by employing classification or regression-based solutions using a set of observed features, often textual features, as predictors, and the users' votes as ground truth. Liu et al. [2007] identified three types of bias in the Amazon review ranking system. The first type was observed through an imbalanced voting pattern, where users tend to evaluate others' opinions positively more often than negatively. The second one, the rich-get-richer effect, named winner circle bias by the authors, was characterized by a larger amount of votes accumulated by the top reviews, while the third type (early bird bias) reflected a clear trend that the earlier a review is posted, the more votes it gets.
Moreover, the authors proposed an SVM-based approach to detect low-quality reviews, based on a manually determined ground truth built in accordance with a proposed set of specifications for judging the quality of a review. However, the proposed model is based only on features suitable for longer (i.e., more verbose) and structured reviews (e.g., number of positive sentences, number of product features or brand names in the review). Similarly, Danescu-Niculescu-Mizil et al. [2009] found that the perceived helpfulness of a review depends not only on its content, but also on the relation of its score to other scores. The authors investigated the dependency between the helpfulness of product reviews from Amazon users, and concluded that users tend to consider reviews that agree with the average item rating as helpful. We attempt to capture a similar trend in our models by using features related to the specific venue where the tip was posted, including characteristics of previously posted tips. O'Mahony and Smyth [2009] proposed a classification-based approach to recommend helpful reviews on TripAdvisor, using features related to the user's reviewing history, as well as the scores previously assigned to the hotels by the users. As an extension of that work, the same authors considered structural features (e.g., ratio of uppercase characters, number of words, etc.) and readability features (e.g., scores indicating the difficulty of reading the text) to develop a classification technique to automatically identify the most helpful reviews [O'Mahony and Smyth, 2010]. Korfiatis et al. [2012] observed that review readability has a greater effect on the helpfulness ratio of a review than its length. We make use of some of the features defined by these three studies in our model, but we extend them by using social network information and features capturing the sentiment (or polarity) of the micro-reviews.
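Structural and readability features of this kind are inexpensive to extract. The sketch below computes an uppercase-character ratio, a word count, and the standard Flesch reading-ease score; the naive vowel-group syllable counter is an illustrative simplification (real readability tools use pronunciation dictionaries):

```python
import re

def naive_syllables(word):
    # Crude vowel-group count standing in for a proper
    # pronunciation-dictionary lookup (illustrative simplification).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def structural_features(text):
    """Structural and readability features in the spirit of
    O'Mahony and Smyth [2010] and Korfiatis et al. [2012]."""
    letters = [c for c in text if c.isalpha()]
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()] or [text]
    syllables = sum(naive_syllables(w) for w in words)
    # Flesch reading ease: 206.835 - 1.015*(words/sentences)
    #                              - 84.6*(syllables/words)
    flesch = (206.835
              - 1.015 * len(words) / len(sentences)
              - 84.6 * syllables / max(1, len(words)))
    return {
        "upper_ratio": sum(c.isupper() for c in letters) / max(1, len(letters)),
        "n_words": len(words),
        "flesch": flesch,
    }

feats = structural_features("Great hotel. The staff was VERY helpful.")
print(feats["n_words"])  # 7
print(feats["upper_ratio"], feats["flesch"])
```

Note how the sentence split relies on punctuation: an unpunctuated tip collapses into a single "sentence", which is one reason such readability features transfer poorly to informal micro-reviews, as discussed in Section 1.3.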
Other studies have considered estimating the helpfulness of a review using regression models. Basically, these studies aim at ranking reviews by their helpfulness score (defined by the ratio between positive and negative votes) or at estimating their average rating, which is usually a real value between zero and five. Ghose and Ipeirotis [2007] studied the economic impact of online reviews using product reviews from Amazon. They proposed two mechanisms for ranking product reviews: a consumer-oriented ranking mechanism, which ranks the reviews according to their expected helpfulness, and a manufacturer-oriented ranking mechanism, which ranks the reviews according to their expected effect on sales. Their experimental results showed that subjectivity analysis can give useful clues about the helpfulness of a review and about its impact on sales. Zhang and Varadarajan [2006] found that syntactic features, such as the numbers of proper nouns, comparatives and modal verbs extracted from the review text, are the most effective predictors for SVR and linear regression when predicting the utility of a product review. They observed that the perceived utility of a product review highly depends on its linguistic style. Kim et al. [2006] also used SVR to rank reviews according to their helpfulness, exploiting textual features such as length and unigrams (i.e., the TF-IDF statistic of each word occurring in the review), as well as the rating score given by the reviewers. They concluded that the review length and the number of stars in the product rating were the most useful features for the regression model. Liu et al. [2008] proposed a non-linear regression model that incorporated the reviewers' expertise, the review timeliness, and its writing style to predict the helpfulness of movie reviews. They found that timeliness was a good predictor, as the general helpfulness of a movie review declines for older reviews.
Reviewer expertise was also found to be a useful feature, motivating the exploration of features that effectively describe user preferences. The authors also used their proposed regression model as a classifier to retrieve only reviews with a predicted helpfulness higher than a certain threshold. In this dissertation, we also use regression methods to classify a tip into multiple levels of popularity based on its predicted popularity. However, the textual features proposed in [Liu et al., 2008] are once again more suitable for longer reviews, and the timeliness factor is not observed in our Foursquare dataset. Ngo-Ye and Sinha [2014] compared several text regression models for predicting the number of people who would find a review helpful, using datasets from Amazon and Yelp. Their proposed models exploit the words extracted from reviews and reviewer engagement characteristics, such as reputation, commitment and current activity, as input features. The authors found that incorporating features capturing the reviewer's engagement and using a subset of unique review words selected by a dimension reduction method (Correlation-based Feature Selection) helps predict review helpfulness. We used some of these reviewer engagement features, such as frequency (the number of reviews written before the current review) and monetary value (the average number of helpfulness votes received by all of the reviewer's previously posted reviews), as predictors in our model. We also exploit the words in the micro-review's content by using their sentiment as predictors. Tsur and Rappoport [2009] proposed an unsupervised method to rank book reviews according to their helpfulness. Their method works in stages: first, the algorithm identifies the terms that are less frequent but contribute more information relevant to a specific product (dominant terms).
These terms constitute the core of a virtual optimal review. The reviews are then converted to a feature vector representation defined by the terms in the virtual core, and ranked according to their distances from the core. Martin and Pu [2014] developed a method to predict the helpfulness of a review using emotion features extracted from the review text. The authors based their study on three product review datasets (Yelp, TripAdvisor, and Amazon) and used a general lexicon of emotion words (GALC [Scherer, 2005]) to extract words that convey emotions to the readers. They applied supervised classification algorithms (SVM, Naïve Bayes, and Random Forest) to estimate whether a given review is helpful or not. The authors' framework showed an improvement of up to 9% when compared to models that use only text statistics or readability features. Lu et al. [2010] exploited contextual information about the authors' identities and social networks to improve the prediction of review quality on Ciao, a community review website. They proposed a generic framework for incorporating social context information by adding regularization constraints to a text-based predictor. Their results show that adding social context as additional features can improve predictions significantly over text-based predictions, but neither feature set alone outperforms the combined model (using both textual and social features) when there is a sufficient amount of training data. Hong et al. [2012] built a classification system to automatically assess review helpfulness based not only on textual features but also on features that represent user preferences. Such features capture whether a review has attributes that the user prefers to know, whether the user who wrote the review was a buyer of the product, and the divergence of the polarity of the review from the mainstream opinion. Moghaddam et al.
[2012] proposed a series of probabilistic factorization models to address the problem of personalized review quality prediction. Their models are based on the assumption that the observed review ratings depend on latent features of the reviews, reviewers, raters, and products. Lee and Choeh [2014] used neural networks to predict the helpfulness of Amazon reviews. They found that characteristics of the product, such as its list price and sales rank, and textual characteristics of the review, such as the average number of words per sentence, the number of words, and the number of one-letter words, are important for estimating helpfulness. Tang et al. [2013] analyzed various types of social context (i.e., author, rater, connection and preference contexts) to predict unknown helpfulness ratings of reviews using matrix factorization based methods. As in Moghaddam et al. [2012], the authors claim that the helpfulness of a review is not necessarily the same for all users. Moreover, the dual roles of a user (i.e., author and rater) must be considered as separate contexts. We adapted some of these features to our domain. In sum, those prior studies are based mostly on content features, which are suitable for more verbose and objective reviews, and thus may not be adequate for predicting the popularity of tips, which tend to be more concise and subjective. Moreover, previous studies did not address how the helpfulness (or popularity) of reviews, as perceived by users, evolves over time, as we do in this dissertation.

2.2.2 Opinion Mining

Other related studies on customer reviews focus on opinion mining, particularly on classifying a review as positive or negative based on the sentiments of the reviewers captured from the textual content. For example, Pang et al. [2002] analyzed several supervised classification approaches using different sets of features, including unigrams, bigrams, adjectives and part-of-speech tags, to classify the sentiment of movie reviews.
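A lexicon-based polarity scorer of the kind discussed in this section can be sketched as follows. The tiny lexicon and its scores below are placeholders for illustration only, not SentiWordNet values:

```python
# Placeholder polarity lexicon mapping word -> (positive, negative)
# scores in [0, 1]; lexicons such as SentiWordNet assign scores of
# this shape per word sense.
LEXICON = {
    "best": (0.75, 0.0), "great": (0.75, 0.0), "try": (0.25, 0.0),
    "avoid": (0.0, 0.625), "terrible": (0.0, 0.875),
}

def sentiment_score(tip_text):
    """Average (positive - negative) score over the lexicon words
    found in the tip; usable as a content feature rather than as a
    final polarity label."""
    hits = [LEXICON[w] for w in tip_text.lower().split() if w in LEXICON]
    if not hits:
        return 0.0  # no lexicon coverage: neutral by default
    return sum(p - n for p, n in hits) / len(hits)

print(sentiment_score("Best place for tacos") > 0)  # True
print(sentiment_score("Avoid lunch time") < 0)      # True
```

Being dictionary-driven, a scorer like this needs no labeled training data, which is the appeal of the unsupervised approaches compared in the studies below; its weakness is coverage, since tips full of slang and abbreviations may match no lexicon entry at all.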
Bermingham and Smeaton [2010] compared the performance of supervised classifiers (Naïve Bayes, SVM) and an unsupervised lexicon-based classifier on microblog data and reviews. These two studies motivated the use of part-of-speech tags and of the same lexicon (SentiWordNet) used in [Bermingham and Smeaton, 2010] as features in our popularity prediction models. Moraes et al. [2013b] evaluated the effectiveness of four methods for automatic polarity detection of Foursquare tips: SVM, Naïve Bayes, Maximum Entropy and an unsupervised method based on SentiWordNet. The experimental results showed that the unsupervised approach produced results that were statistically tied with those of the best supervised method (Naïve Bayes), without the cost of labeling. We use the same steps of the unsupervised approach to generate the scores used in our sentiment features. Gonçalves et al. [2013] compared eight popular sentiment analysis tools for social networks in terms of coverage (i.e., the fraction of messages whose sentiment is identified) and agreement (i.e., the fraction of identified sentiments that are in tune with the ground truth). The authors found that the methods have varying degrees of coverage and agreement, and that no single method is always best across different text sources. Moreover, the problem setting in these studies differs from ours, as we use the sentiment of the review as a feature to predict its popularity rather than trying to predict the sentiment itself. Nguyen et al. [2013] proposed a heuristic to select a small set of reviews that cover as many tips as possible with as few sentences as possible. By covering the tips, the authors expected to identify the review content that is most important to provide a summary of the content of the tips. They claimed that tips are good for quickly zooming in on what is interesting about an item.
However, when there is a large collection of tips, they may be repetitive and fragmented. Thus, the authors claimed that by selecting the reviews that cover the tips, they would obtain a readable, flowing text that would summarize and expand upon the tip content. The problem tackled by Nguyen et al. [2013] differs from our target problem, as we are not aiming at selecting tips for summarization purposes.

2.2.3 Spam Detection

The identification of spam or fake reviews was also analyzed in a few prior studies. Spam reviews have characteristics distinct from those of low quality reviews [Li et al., 2011]. Low quality reviews may be biased and/or may be due to poor writing, but they reflect the user’s real opinion. A spam review, on the other hand, may be fraudulent, and is often added to the review system with a clear intention or goal to achieve [Ma and Li, 2012]. Jindal and Liu [2008] studied opinion spam in the context of product reviews. They identified three types of spam reviews: untruthful or fake reviews, which give undeserved positive reviews to promote some target objects or malicious negative reviews to defame some other object’s reputation; reviews on brands only, which do not comment on the products but only on the manufacturers or sellers of the products; and finally non-reviews, which can be advertisements or other irrelevant pieces of text containing questions, answers or random text. The second and third types of spam reviews were detected using a supervised learning technique with manually labeled training examples, while the first spam type was detected by verifying whether reviews contained many opinions opposing the majority of the other reviews. Lappas [2012] presented a study of fake reviews from the perspective of the attacker, formalizing the factors that determine the success of an attacker and exploring different attack strategies. Akoglu et al.
[2013] proposed an unsupervised network-based framework to detect fraudulent users and fake reviews in online review networks, using textual features and the reviewers’ social networks as input. Lin et al. [2014] proposed six features to detect spam based on the review content and reviewer behaviors. They applied supervised and unsupervised methods to identify review spam as early as possible. In this dissertation, our characterization study (Chapter 4) revealed the presence of spamming activity in Foursquare tips. Specifically, we revealed the existence of users who post tips whose contents are unrelated to the nature or domain of the venue where the tips were left [Vasconcelos et al., 2012b]. We discuss this further in Chapter 4. Indeed, more recent studies analyzed this problem using machine learning techniques to detect user behavior related to tip spamming in LBSNs [Costa et al., 2013; Aggarwal et al., 2013].

2.3 Analysis of Online Content Popularity

Broadly related to our task of predicting the popularity of micro-reviews is the work on assessing the popularity of online content. We review prior efforts in this direction by first describing, in Section 2.3.1, studies about popularity prediction models in several systems such as Twitter, YouTube, Digg, Boards.ie (community forums), and Facebook. Next, in Section 2.3.2, we discuss prior studies on information diffusion models using both explicit (created from users’ contacts) [Leskovec et al., 2007; Bakshy et al., 2009] and implicit (created by the users’ interactions) network links [Gruhl et al., 2004], as well as studies on the identification of influential users or experts [Zhang et al., 2007; Adamic et al., 2008; Agarwal et al., 2008; Cha et al., 2010; Bakshy et al., 2011]. Inspired by some of those prior studies, we investigate the properties of the implicit network built from the user interactions through tips, and propose a method to identify influential users on Foursquare (Chapter 4).
In our context, influential users can be seen as reputable users or experts, who post high quality or helpful reviews and are highly rated by other users. Therefore, popularity here can be considered an implicit measure of the credibility of users and tips [Abbasi and Liu, 2013].

2.3.1 Popularity Prediction Models

Several studies have addressed the problem of predicting the popularity of newly uploaded content. Most studies exploited textual features extracted from the messages (e.g., hashtags and URLs) or the topic of the message, as well as user related features, such as the number of followers and the source of the message (celebrities or organizations), to predict content popularity in several systems. For example, in the context of Twitter, Hong et al. [2011] tackled the problem of predicting the popularity of tweets as a classification task based on several types of features, including textual content, structural properties of the user graph, metadata of users and messages (e.g., number of previous retweets), as well as temporal information. Suh et al. [2010] built a predictive retweet model using a generalized linear model with content and contextual features. They also identified that, among content features, URLs and hashtags are strongly correlated with retweetability, while the numbers of followers and followees as well as the age of the user account are among the most important contextual features. Bandari et al. [2012] used regression and classification algorithms to predict the number of times a news URL was posted and shared on Twitter. They exploited features extracted from the news article, such as the source of the article, its category, the subjectivity of the language, and the named entities mentioned in the article. Similarly to our work, Hong et al. [2011] and Bandari et al.
[2012] also defined classes or levels of popularity, and developed solutions to predict which class a given tweet or article will belong to at a certain future time. Borghol et al. [2012] developed and applied a methodology to assess the impact of various content-agnostic factors on the popularity of YouTube videos. They focused on analyzing differences among videos that have essentially the same content (clones), using a multi-linear regression model to determine which factors most influence video popularity. In that study, popularity was defined by the number of views during a given week. Our methodology has some similarities with the one adopted in [Borghol et al., 2012]. For example, we model several other factors, such as the influence and activity of the user’s social network as well as specific Foursquare characteristics related to the venues, which have no counterpart on YouTube. Szabo and Huberman [2010] proposed a log-linear model for predicting the long-term popularity of YouTube and Digg content based only on early measurements of user accesses. The authors found that the long-term popularity of a piece of content is correlated with its early measured popularity. However, according to Yin et al. [2012] and to what we will show in Chapter 6, prediction suffers from inaccuracy if it is purely based on early measurements. Finally, they concluded that the social network does not affect content exposure, which contrasts with our findings (Chapter 4). Pinto et al. [2013] extended the simple log-linear model proposed by Szabo and Huberman [2010] by building multiple linear regression models to predict video popularity. Unlike the base model [Szabo and Huberman, 2010], which used the total number of views up to a reference date as single predictor, the proposed multivariate model uses daily views during the same period, with each variable representing the number of views on a given day.
The authors also proposed a second model variant that includes, in addition to daily views, Radial Basis Functions (RBF) to capture the similarity between training and test data. The authors found that their RBF model leads to an accuracy gain of 71% over the model proposed by Szabo and Huberman [2010]. In Chapter 6, we use these two models, the log-linear model [Szabo and Huberman, 2010] and the RBF model [Pinto et al., 2013], as baselines for our proposed models. Lerman and Hogg [2010] proposed a stochastic model to predict the popularity (number of votes) of user posts on Digg, based also on early user reactions to the new content. Their model considers the complex interactions among content quality, the layout of the website, and the influence among users. Yin et al. [2012] proposed a model to rank potentially popular items based on their early votes. The authors evaluated their model using a joke sharing application, where users can post jokes and other users can vote on whether they like or dislike them. Their model assumes that some users tend to conform to the opinions of the majority in the user community (conformers) while others exhibit contrary voting behavior (mavericks). Each person has a different distribution over these two patterns, which can be learned from the observed voting history. The authors pointed out that their model is more suitable for applications in which people’s pattern distributions tend to be stable, for items without complex genres (e.g., jokes) as opposed to items with multiple genres (such as movies). This method may not work in our scenario, since users are not allowed to mark a tip as disliked on Foursquare. Moreover, the approach proposed by Yin et al. [2012] is based on ranks of new items, while our goal is not only to rank but also to predict the potential popularity level that an individual item can achieve. Our proposed models explore other types of features and scenarios, in which the monitoring time is variable.
In particular, we analyze the tip popularity at posting time, while the above studies require early votes to perform predictions. Wagner et al. [2012] studied the patterns of user attention towards content shared within online communities, where attention was measured by the number of replies to a given post. One of their findings was that the purpose of a community may influence how individual factors affect the attention pattern of that community. For example, posts from advice-seeking communities that contain many links are less likely to get replies, while in content-sharing oriented communities, where posts typically have a high number of links, links may have a positive impact and make posts more likely to attract the community’s attention. They also concluded that the factors that impact whether a discussion starts tend to differ from the factors that impact the length of the discussion. Yu et al. [2011] analyzed the popularity of social marketing messages on Facebook. Using the number of likes to measure the popularity of a message, the authors evaluated the effectiveness of marketing strategies used by a number of messages from restaurants, analyzing only their textual content. The messages were grouped into “more popular” (number of likes above average) or “less popular”, and modeled using a bag-of-words representation. Two classification methods (SVM and Naïve Bayes) were used to separate messages into the two popularity classes, and to rank the most discriminative features. There are some major differences between this work and our proposal. First, their method is limited to textual content, while we also make use of several features related to the user who posted the tip and the venue where the tip was posted.
Second, their bag-of-words approach may not be effective for short and informal messages such as Foursquare tips. Finally, the authors suggest overcoming the lack of dislike votes by using the comments left by users to disclose both positive and negative sentiment about the posted marketing message. In our work, we take an alternative approach and make use of the SentiWordNet scores of each tip term to capture the polarity of its content. Tatar et al. [2014] addressed the problem of predicting the popularity of news articles based on user comments, formulating it as a ranking problem. The authors compared the ranking effectiveness of two prediction methods proposed by Szabo and Huberman [2010]: a linear regression model on a logarithmic scale and a constant scaling model. These methods were compared with several baseline methods and with learning to rank algorithms. Their results indicate that the log-linear popularity model is as effective as the learning to rank algorithms. We also approach our prediction problem as a ranking problem in Chapter 6.4. Other efforts focused on popularity prediction exploring temporal trends and time series models. Radinsky et al. [2013] developed methods for modeling the temporal dynamics of queries and click behaviors seen in a large population of Web searchers. They explored several facets of the dynamics of Web search behavior, including the detection of trends, periodicities, and surprises, by using current and past user behavioral data. Matsubara et al. [2012] proposed a unifying model for the popularity evolution of blogs and tweets, showing that it can be used for tail-part forecasts, while Yang and Leskovec [2011] developed a clustering algorithm to uncover the temporal dynamics of Twitter hashtags. As future work, we intend to explore time series techniques in our popularity prediction problem.
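The log-linear early-measurement model discussed repeatedly above reduces, in its simplest form, to an ordinary least-squares fit between the logarithm of early popularity and the logarithm of final popularity. The sketch below illustrates that core idea with synthetic view counts; the numbers are illustrative and not drawn from any of the cited datasets.

```python
import math

def fit_log_linear(early, later):
    """Least-squares fit of ln(later) = a + b * ln(early),
    the form used by log-linear early-measurement popularity models."""
    xs = [math.log(e) for e in early]
    ys = [math.log(l) for l in later]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def predict(a, b, early_count):
    """Map an early count to a predicted final count."""
    return math.exp(a + b * math.log(early_count))

# Synthetic example: items whose later views are ~3x their early views
early = [10, 40, 90, 200]
later = [30, 120, 270, 600]
a, b = fit_log_linear(early, later)
print(round(predict(a, b, 50)))  # → 150
```

When growth is exactly multiplicative, as in this toy data, the fit recovers slope b = 1 and intercept a = ln(3); real traces deviate from this, which is precisely the inaccuracy noted by Yin et al. [2012].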
All these previous efforts towards predicting the popularity of online content share a goal similar to ours. However, our type of data, micro-reviews, does not fit completely into these models. As mentioned before, some of them make assumptions that hold only for content with shorter life cycles (e.g., tweets or news), or rely on features that have no counterpart on Foursquare, such as dislike votes.

2.3.2 Information Propagation and Social Influence Models

An important aspect that may affect the popularity of a piece of content is social influence, i.e., the fact that individuals may not make decisions independently, but rather are influenced by the behavior of other individuals. There are some contrasting theories or views about how an idea, a trend or an innovation can be spread or assimilated by people. One of the views on social influence is based on a theory called “the two-step flow of communication”, in which ideas often flow from the mass media to a group of individuals (opinion leaders), who are very persuasive or well-connected, and from those to the social groups they belong to [Katz, 1957]. Since this seminal work, opinion leaders have been the subject of several studies [Gladwell, 2002; Walther et al., 2010; Wu et al., 2011a]. Moreover, technology changes with the emergence of new forms of media, such as blogs, online communities and social networks, have caused the fragmentation of the mass audience into many smaller audiences. Nowadays, people can select the information they want to be exposed to, and in some cases people can generate new information themselves [Wu et al., 2011a]. Media fragmentation has made traditional advertising strategies less effective. Consequently, marketers have turned their attention to other marketing strategies (e.g., word-of-mouth, viral and buzz marketing) that focus on opinion leaders.
Recently, there have been several research efforts focused on analyzing the interplay between social structure and information dissemination in real networks. Gruhl et al. [2004] studied the dynamics of information propagation in weblogs. They investigated characteristics of this propagation considering the topic discussed in the posts and the individuals’ posting behavior. Leskovec et al. [2007] used a person-to-person recommendation network on an e-commerce website to study how individuals are influenced as a function of how many of their contacts have recommended a product. They also presented a model that identifies communities, products and pricing categories for which viral marketing is effective. Bakshy et al. [2009] studied the diffusion of “gestures” between friends in the social network of the Second Life virtual game. By examining the cascading trees, they found that roughly 48% of data transfers occur along the social graph. We observed similar results in the high percentage of liking activity coming from the tip author’s friends or followers, as discussed in Chapter 4. Other studies explored algorithms using network structure to identify experts or influential users. For example, Zhang et al. [2007] discussed several approaches to identify and rank experts in a Java forum, using network-based algorithms such as PageRank [Page et al., 1998] and HITS [Kleinberg, 1999]. Adamic et al. [2008] investigated Yahoo! Answers forums using network and textual analysis, finding that user interactions vary depending on the forum topic. They also found that some categories of Yahoo! Answers forums are characterized by the presence of experts while others exhibit different dynamics. We employed some of the network analyses performed in that work in our characterization of tipping activity in Foursquare (Chapter 4), aiming at identifying influential users. Agarwal et al.
[2008] proposed a graph-based algorithm to identify influential bloggers based on their blog posts. The authors defined four features: recognition, activity, novelty, and eloquence, which they expected to be present in an influential post. They weigh these four features to produce a combined score for each blogger. Cha et al. [2009] studied the spread of photo bookmarks on Flickr and found that social links are a dominant method for information propagation. Moreover, their results show that even popular photos spread neither widely nor rapidly through the Flickr network, contrary to viral marketing intuition. Another view on social influence holds that the information diffusion process depends less on the properties of the influentials and more on the global conditions of the network. Watts and Dodds [2007] investigated this hypothesis using a series of computer simulations of several social network configurations. They found that, in general, most social changes are driven not by highly influential people but rather by easily influenced individuals who influence other easily influenced individuals, and so on. Thus, an epidemic outbreak of social influence occurs because individuals are receptive to it (susceptibility), not because someone is pushing for it [Smith, 2013]. Whether or not influence can spread widely depends mostly on the network structure. Therefore, if the network permits spread (global cascades), virtually anyone can start disseminating some information. The individuals who later seem to be influential may simply be accidents of circumstance, according to the authors’ study. Recently, Figueiredo et al. [2014a] used an epidemic model to capture revisits by the same user and the impact of external events on the popularity evolution of objects in social media applications. Cheng et al. [2014] examined the problem of predicting the growth of cascades over social networks.
Using a month of complete photo-resharing data from Facebook, the authors found that temporal features (e.g., cascade speed) are predictive of a cascade’s eventual shape. Other studies focused on user influence on Twitter. The Twitter graph, where nodes represent users and edges represent following relationships, has been extensively used to predict influence on this social network [Bakshy et al., 2011; Weng et al., 2010; Cha et al., 2010; Quercia et al., 2011]. These studies have shown that influentials are not accidental, but have some specific characteristics, such as the presence of homophily in follower relationships [Weng et al., 2010], great personal involvement [Cha et al., 2010], a low degree of passivity of the followers [Romero et al., 2011], or specific linguistic qualities that reflect the person’s personality and mood [Quercia et al., 2011]. Bakshy et al. [2011] studied the distribution of retweet cascades in the propagation of influence, and explored various marketing strategies governed by the cost of identifying the influential users. Weng et al. [2010] proposed a new PageRank-like algorithm to measure the topic-sensitive influence of a Twitter user, comparing it against ranks based on the number of followers and on the traditional PageRank. Their algorithm was motivated by an observation of high reciprocity among follower relationships in their dataset, which they attributed to homophily. We also adopt a similar method based on the PageRank algorithm to identify influential users on Foursquare, based on a graph built from the user interactions through tips (Chapter 4). Cha et al. [2010] compared three metrics of influence, namely, number of followers, number of retweets and number of mentions, concluding that popular users with a large number of followers are not necessarily influential in terms of spawning retweets or mentions.
They also concluded that the most influential users can hold significant influence over a variety of topics. Finally, Quercia et al. [2011] investigated the connection between the use of language, in particular the sentiment expressed in users’ tweets, and the influence of users on Twitter. They found that influential users structure their tweets in specific linguistic ways, and tend to express negative sentiment in part of their tweets. The aforementioned studies offer important insights into properties of user interactions in social networks. However, location-based social networks have other characteristics or dimensions that have not been explored by those studies. In the specific context of predicting tip popularity, not only the social network influence but also other entities (e.g., venues) should be taken into account, as they may influence how a tip propagates through the user population. Next, we survey related work on location-based social networks.

2.4 Analyses of Location-Based Social Networks

Prior studies of location-based social networks (LBSNs) tackled problems such as the identification of user mobility profiles [Li and Chen, 2009a; Noulas et al., 2011b; Rossi and Musolesi, 2014] and the characterization of the use of LBSNs [Scellato et al., 2010; Scellato and Mascolo, 2011; Noulas et al., 2011a; Scellato et al., 2011a], the design of human mobility models [Cho et al., 2011; Cheng et al., 2011b; Noulas et al., 2012; Silva et al., 2014], the development of mechanisms for recommending friends or places [Berjani and Strufe, 2011; Ye et al., 2010; Li and Chen, 2009b; Scellato et al., 2011b; Gionis et al., 2014], the design of location-based search methods [Cheng et al., 2011a], event prediction [Georgiev et al., 2014], as well as the investigation of privacy related issues [Lindqvist et al., 2011; Pontes et al., 2012b,a].
To our knowledge, we were the first to analyze the use of tips, likes and to-dos on Foursquare, as discussed in Chapter 4. By the time we started our investigation, we were aware of only two previous studies that aimed at uncovering user profiles in LBSNs. In Li and Chen [2009a], the authors applied two different clustering approaches to identify user behavior patterns on BrightKite. One approach exploited the geographic position associated with user updates (i.e., check ins, photos, and notes) to classify them into four groups according to their mobility, namely, home, home-vacation, home-work, and other. The second approach clustered users based on multiple attributes, such as total number of updates, social features, and mobility characteristics, and led to the identification of five groups, namely, inactive, normal, active, mobile, and trial (or non loyal) users. The second study was performed on Foursquare [Noulas et al., 2011b]. The authors used a spectral clustering algorithm to group users based on the categories of venues at which they had checked in, aiming at identifying communities and characterizing the type of activity in each region of a city. We here also identify user profiles on Foursquare (Chapter 4). Rossi and Musolesi [2014] proposed a trajectory-based approach where a user is identified simply by considering the trajectory of spatio-temporal points given by his/her check-in activity. However, unlike these previous studies, we focus on user profiles in terms of their tipping activity, revealing relevant patterns (including some illegitimate or spamming activity). Wang et al. [2014] proposed a framework to discover overlapping and hierarchical communities of LBSN users considering both user-venue check-ins and the attributes of users and venues (e.g., venue category). Some other studies focused on the properties of the social networks in LBSNs. For example, Scellato et al.
[2010] analyzed the social, geographic and geo-social properties of four social networks that provide location information about their users, namely BrightKite, Foursquare, LiveJournal and Twitter. They showed that LBSNs are characterized by short-distance, spatially clustered friendships, while in the other types of networks, such as Twitter and LiveJournal, users have heterogeneous connection lengths. An analysis of Gowalla users showed that the number of friends follows a double Pareto-like distribution, whereas the numbers of check ins and places are better described by log-normal distributions [Scellato and Mascolo, 2011]. The authors also analyzed the temporal variations of such distributions, observing that users tend to add new friends at a faster rate than they check in or visit new places. Noulas et al. [2011a] analyzed the dynamics of user check ins and the presence of spatio-temporal patterns on Foursquare. They observed user heterogeneity with respect to the number of friends, average distance and social triads. Another study also pointed out that users with fewer friends tend to generate social triangles on a small geographic scale, while users with more friends tend to belong to geographically wider triangles [Scellato et al., 2011a]. Modeling human mobility requires access to spatial and temporal information about the places people visit, and LBSNs are rich sources of this kind of data. Indeed, towards that goal, Noulas et al. [2012] studied urban mobility patterns in several metropolitan cities around the world by analyzing a dataset containing check ins of Foursquare users.
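Log-normal fits of the kind reported above for per-user check-in counts can be obtained, in their simplest maximum-likelihood form, from the mean and standard deviation of the logged counts. A minimal sketch on synthetic counts (the values are illustrative only, not data from any of the cited studies):

```python
import math

def fit_lognormal(counts):
    """Maximum-likelihood log-normal fit: mu and sigma of ln(counts).
    Suitable for strictly positive activity counts such as check ins."""
    logs = [math.log(c) for c in counts]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma

# Synthetic per-user check-in counts (illustrative only)
counts = [math.exp(1), math.exp(2), math.exp(3)]
mu, sigma = fit_lognormal(counts)
print(round(mu, 3), round(sigma, 3))  # → 2.0 0.816
```

Comparing such a fit against a Pareto fit on the same counts is one simple way to reproduce the distributional contrast (friends vs. check ins) reported by Scellato and Mascolo [2011].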
Human mobility patterns were also investigated on Gowalla, Brightkite and cell phone trace datasets [Cho et al., 2011], with the conclusion that humans experience a combination of strong short-range, spatially and temporally periodic movement that is not influenced by the social network structure, while long-distance travel is more influenced by social links. The authors proposed a mobility model that combines periodic daily movement patterns with the social movement effects caused by the friendship network. Finally, Cheng et al. [2011b] studied human mobility patterns by analyzing spatial, temporal, social and textual aspects associated with tweets containing check-ins. They observed that LBSN users follow reproducible patterns, and that socioeconomic factors are related to mobility. Silva et al. [2014] investigated the potential of LBSNs to build participatory social networks and exploited them to study city dynamics. The task of recommending friends or places to LBSN users has also been tackled by previous work. Using collaborative filtering techniques, Berjani and Strufe [2011] proposed a personalized recommender for places in Gowalla based on the number of check ins at spots, whereas Ye et al. [2010] proposed a collaborative recommendation algorithm that uses the number of Foursquare check ins at commonly visited places to perform recommendations. Li and Chen [2009b] proposed a three-layered recommender model using attributes of user profiles (preferences), social graphs (friendship), and mobility patterns (distance of visited places) to recommend friends on Brightkite. Another supervised learning framework to recommend places and friends was proposed by Scellato et al. [2011b] and evaluated on a longitudinal dataset collected from Gowalla.
Using check in information regarding users who visited the same place and friends of friends, the authors were able to reduce the link prediction space and thus improve prediction accuracy. Both previous studies [Scellato et al., 2011b; Li and Chen, 2009b] concluded that including information about location-based activity leads to better predictions than using only social data. Complementarily, Cheng et al. [2011a] used check in traffic patterns, extracted from the temporal dynamics of Foursquare check ins during a time period, to develop a traffic-driven location clustering algorithm, which, in turn, was used to improve the recommendation of nearby places. Gionis et al. [2014] focused on the problem of recommending customized tours in urban settings using the same dataset as Cheng et al. [2011b]. Their proposed framework recommends tours considering the different types of venues, the order in which the user wants to visit each place, budget constraints expressed in terms of distance, and the satisfaction that each venue can provide to the user. Finally, Georgiev et al. [2014] used data from Foursquare to analyze event patterns in three metropolitan cities (London, New York, and Chicago) to understand to what extent geospatial, temporal, and social factors influence users’ preferences towards events. As in other online social networks, information sharing also raises concerns about the exposure of users’ private data, and another thread of research in LBSNs has been devoted to privacy related issues. A large-scale study on inferring the home location of Foursquare users is presented by Pontes et al. [2012b,a]. The authors analyzed the potential of using publicly available features such as mayorships, tips and likes as sources of information leakage. Lindqvist et al. [2011] surveyed users about their motivations for using location sharing services as well as how they manage their privacy.
The authors observed that the majority of the interviewed users did not have privacy concerns since, reportedly, they can manage their privacy by selecting what they are willing to share. Although the aforementioned studies offer important insights into properties of user interactions in LBSNs, none of them addressed how users exploit tips and likes as a means to review or recommend a place or a service, or to give feedback regarding previously posted tips. Our characterization study, reported in Chapter 4, aims at contributing to fill this gap.

2.5 Summary

In this dissertation, we attempt to understand how users exploit micro-reviews (tips) on Foursquare, and we focus on the specific problem of predicting a tip’s future popularity. Our popularity metric is based on the number of likes received by a tip. Besides reflecting the popularity of a tip at a given point in time, this number can also be used as an estimate of the helpfulness or the quality of the tip. Thus, studies on assessing the helpfulness or quality of reviews, as well as studies on popularity prediction of online content, are also related to our work. We started this chapter by describing some related work on information credibility in Section 2.1. Some of those studies pointed out that the quality (or the popularity) of a piece of content is related to the perceived credibility of its author or of the information itself. Some of the credibility directives proposed in those studies guided us in the design of features used in our prediction models. Next, in Section 2.2, we discussed previous work focused on automatically determining the quality (or helpfulness, or utility) of reviews, which primarily made use of content features and targeted longer pieces of content. In contrast to those reviews, Foursquare tips tend to be more concise, subjective and informal.
Thus, we exploit, in addition to content features, attributes related to the user who posted the tip and the venue where it was posted. Moreover, most previously studied review systems are based on helpfulness scales (e.g., ratings from 1 to 5, or votes of unhelpfulness), while Foursquare tips are evaluated only by like marks. The lack of a clear helpfulness scale makes the prediction task more complicated. Other studies closely related to our prediction problem focus on predicting the popularity of online content (Section 2.3). However, the models proposed in these studies are very specific to the scenarios or types of data (e.g., tweets, videos, and news) on which they were evaluated. In addition, some models require early popularity measurements of the same content that is the target of the prediction. Our prediction models explore other types of features and scenarios in which we vary the monitoring time, including a scenario where prediction is performed at posting time (so no early popularity measurements are available). In the second part of the same section, we discussed relevant studies about information propagation and social influence models. Those studies guided us in some of our analyses described in Chapter 4. In particular, we have adopted an approach already exploited by Weng et al. [2010] and by Adamic et al. [2008] to characterize and estimate user influence using the network built from user interactions through tips. Finally, in Section 2.4 we surveyed related work on location-based social networks. However, most of those studies focus on check-in dynamics, on properties of the social graph, and on related geographical information. Thus, to the best of our knowledge, there is no previous analysis of how users exploit tips and no previous study on assessing tip popularity.
Next, we present the main elements and features of Foursquare and describe the methodology we adopted to crawl it, as well as a summary of the collected datasets.

Chapter 3
Foursquare: Case Study

Social networks have become increasingly present in our everyday habits. The advances in mobile communication and the development of geographic information systems, such as GPS (Global Positioning System) technology, have allowed the design of a variety of context-aware applications. Sharing one’s current location (check in), uploading location-tagged photos, and commenting on real-time events or services are examples of features available to users of a new type of online social network: the location-based social networks (LBSNs). Foursquare and Google+, currently the most popular LBSNs¹, and social networks that have incorporated location-based services (e.g., Facebook and Yelp) have facilitated new types of relationships between users and places in the physical world that are registered in the application. One of the contributions of this dissertation is a comprehensive characterization of one of the currently most popular location-based social networks, namely Foursquare, focusing on the user-generated micro-reviews (called tips) that users post on the system. In this chapter, we first review the main elements and features of Foursquare (Section 3.1). Next, we present the methodology we adopted to crawl the system, as well as the datasets collected (Section 3.2), which are used in our analyses in the following chapters.

¹ Gowalla and Brightkite are also examples of LBSNs, but they no longer exist.

3.1 Foursquare: Key Elements and Features

Foursquare is a prime example of an LBSN on which users share their locations with friends. As videos and images are the main objects on YouTube and Flickr, respectively, the main objects on Foursquare are venues. A venue represents any physical location, like a store, a restaurant, a university, an airport or a monument, where users can check in. Users may check in at venues when they are physically close to those venues, using GPS-equipped mobile devices. Once users check in, they may choose to share their locations with friends. Every time a user checks in at a venue, she collects points, namely badges, on Foursquare. If a user has more check ins at a certain venue than any other user in the past 60 days, she becomes the venue’s mayor. Venues are created by Foursquare users, who become owners of those places. However, venues can be claimed by the real business owners. In this case, venues are verified by Foursquare and, if approved, the real owners of the venue can start offering promotions and special deals to users who frequently check in at that venue. Foursquare also maintains a set of nine pre-defined venue categories, namely “Arts & Entertainment”, “Colleges & Universities”, “Food”, “Great Outdoors”, “Nightlife Spots”, “Professional & Other Places”, “Travel & Transport”, “Shops & Services”, and “Residences”. Foursquare users are categorized into standard (or regular) users, celebrities, and brand pages¹. Standard users automatically become celebrities when they reach more than 1,000 friends in the system²; brand pages, in turn, are users that represent companies or businesses (e.g., History Channel, Starbucks). Foursquare allows two types of social relationship among its users, namely friendship and the follower-followee relationship. The type of relationship a user may establish depends on the user’s category: standard users can only have friends and followees, brand pages can have only followers and followees, whereas celebrities can have friends, followees and followers. In addition to check ins, users can also post tips (i.e., comments or reviews) at specific venues, commenting on their previous experiences when visiting the corresponding places.
Tips can contain helpful information to guide others in their choices, such as suggestions, recommendations or disapprovals (e.g., “I love this apple pie”, “The bathrooms are not clean”), practical advice or directions (e.g., “where is the closest ATM machine from the museum?”), and factual comments that reveal fun and surprising facts about a location (“This place was founded in 1788”) [Sarah Best, 2012]. Foursquare tips are typically much more concise (their length is restricted to 200 characters) and often more informal and subjective than reviews in other reviewing systems. For example, on systems like TripAdvisor, Amazon and Yelp, reviews are often longer and more formally structured, and often carry very specific information about a product/service. Nevertheless, tips may nourish the relationship between users and real businesses, offering valuable feedback that business owners can benefit from to improve their products. Moreover, tips may be key features to attract future visitors to both the venue and the corresponding physical place. We note that, unlike user check ins, which are visible only to the user’s friends, tips are visible to everyone. Thus, tips have the potential to significantly impact online information sharing and business marketing.

¹ http://aboutfoursquare.com/user-type-comparison/
² http://aboutfoursquare.com/foursquare-converts-most-popular-users-to-celebrity-accounts/

Figure 3.1: Screenshot of a Foursquare Venue Page.

Users can also evaluate previously posted tips by clicking on a “like” mark or saving them in a to-do list, as a sign of agreement with the posted content or of interest in the information provided in it. “Like” and to-do marks ultimately serve as feedback from other users with regard to the helpfulness or interestingness of the tip.
Examples of the most popular tips in our dataset are: “The park opened in 1971 & is the world’s largest and most visited recreational resort.”, “Go on a Sunday - it’s a nonstop party all day and make sure you get a pitcher of Mojitos all for yourself.”, and “You can shop days and nights. They don’t sleep.”. A newly posted tip goes to the venue page, where tips are displayed sorted either by the number of likes received or by posting time, although only the former ordering is available in the mobile application (Figure 3.1). The order in which tips are displayed to a user may also vary depending on the authors of the tips. For example, tips posted by the user’s friends and followees are shown first to her. Moreover, the user also receives notifications when any friend or followee posts a tip¹. A number of factors may impact a tip’s future popularity, including: (1) the website layout, which gives more visibility to already popular tips, thus contributing to the rich-get-richer effect [Liu et al., 2007]; (2) the popularity of the venue where the review was posted; (3) characteristics of the user who posted it (including the number of active friends/followers); and (4) the content itself. Most of these factors are analyzed in Chapter 4, and mapped into features exploited by our popularity prediction models in Chapter 6.

¹ Unlike on Facebook, likes are not notified to the user’s social network.

3.2 Measurement Methodology

We study how users exploit micro-reviews (tips) by observing certain properties of each entity that plays a central role in the problem – namely, the user, the venue, and the tip’s textual content – in order to estimate the number of likes received by a tip during a given time period. We use two datasets collected from Foursquare: one was collected from a set of venues, whereas the other was gathered from a set of users. In this section, we briefly describe the strategy adopted to crawl Foursquare (Section 3.2.1).
We then summarize the two datasets in Sections 3.2.2¹ and 3.2.3².

3.2.1 Crawling Methodology

Our study is based on two datasets collected from Foursquare using the system API. We start by describing how the venue dataset was collected. Our crawling strategy, which relies on a set of worker processes and a master process, exploits the fact that each venue in Foursquare receives a unique and sequential numeric identifier (ID)³. Given M, (an estimate of) the largest ID assigned to a venue in the system, the master process randomly selects an ID according to a uniform distribution in the [0, M] range, and gives it to the next idle worker. We chose to perform a random selection of IDs, as opposed to sequentially trying each possible value, to minimize the chance of a bias towards older venues (which, we conjecture, have smaller IDs). The worker then sends a request to the Foursquare API to gather information about the corresponding venue. In particular, for each collected venue, the crawler collects all its tips, the identifications of the users who posted each of them, the number of likes and to-dos each tip received, the number of users who checked in at the venue, the venue category, as well as its geographic coordinates. A series of initial experiments, consisting of sending HTTP GET requests to the pages of specific venues identified by their IDs, was executed to verify their existence. We tried increasing values of IDs, starting with 0. The largest ID for which we got a response corresponding to a valid webpage was 20 million. We experimented with many IDs greater than that value, but in all those cases the response was “Not Found”. Thus, we speculate that, at the time of our crawling, 20 million was the largest venue ID in the system.

¹ We named this dataset dataset 1.
² We named this dataset dataset 2.
³ We observe that Foursquare venue IDs are no longer sequentially assigned.
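The master/worker random-ID strategy described above can be sketched in a few lines. This is a minimal illustration, not the dissertation's crawler: the thread-based workers, the `fetch_venue` placeholder and the queue sizes are our own assumptions, standing in for the real Foursquare API requests.

```python
import queue
import random
import threading

M = 20_000_000  # estimate of the largest venue ID at crawl time (Section 3.2.1)

def fetch_venue(venue_id):
    # Placeholder for a Foursquare API request; here we simply echo the ID.
    return {"id": venue_id}

def master(task_queue, num_tasks):
    """Master: draw venue IDs uniformly at random from [0, M]."""
    for _ in range(num_tasks):
        task_queue.put(random.randint(0, M))

def worker(task_queue, results):
    """Worker: take the next assigned ID and gather the venue's data."""
    while True:
        try:
            venue_id = task_queue.get_nowait()
        except queue.Empty:
            return
        results.append(fetch_venue(venue_id))

tasks = queue.Queue()
results = []
master(tasks, num_tasks=100)
workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Sampling IDs uniformly (rather than scanning them in order) is what avoids favoring the low, older end of the ID space if the crawl is interrupted.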
Thus, we set M equal to this value, and used it as input to our crawler. The venue crawling process ran from May 23rd to July 19th 2011, gathering data corresponding to more than 1.6 million venues, as further described in Section 3.2.2. After this crawling, we learned that the Foursquare API made available information about its users. We then decided to stop the venue collection, and started collecting users (instead of venues) from August to October 2011. This decision was made because the number of users on Foursquare (around 10 million¹) at that time was smaller than the number of venues, and we were interested only in venues with at least one tip. Thus, it would be more cost-effective to collect the users. We used the same crawling methodology used for the venues to collect data about the users. We collected profile data for each user, including name, user type, home city, total number of check ins, list of friends, list of mayorships, and list of tips. For each tip posted by the user, the worker also collected the total number of likes received, the set of users who marked it as liked, as well as the tip’s content, timestamp and the identifier of the venue where the tip was posted. Finally, for each venue associated with a tip, the crawler collected the total number of user check ins, the total number of unique visitors, and its category. At the end of this period, we were able to collect more than 13 million users. We believe that this represents a large fraction of the total user population since, as previously reported², the total number of registered users varied from 10 million in June 2011 to 15 million in December of the same year. Next, we describe each collected dataset.

3.2.2 Venue Dataset (Dataset 1)

Our venue crawler ran from May 23rd to July 19th 2011, gathering data corresponding to more than 1.6 million venues. Table 3.1 summarizes the collected dataset.
Associated with the crawled venues, we were able to identify almost 1 million tips and more than half a million unique users, out of whom 1,248 are brand users. Moreover, 3.8% of the venues are verified, while 18.5% of them had received, by the time of the crawling, at least one tip. Since our focus is on understanding how users exploit tips, likes and to-dos, our analyses in Section 4.2 are performed over the venues with at least one tip.

¹ https://foursquare.com/infographics/10million
² http://www.socialmedianews.com.au/foursquare-reaches-15-million-users/

Table 3.1: Summary of Our Venue Dataset.

  Number of venues                          1,601,412
  Number of venues with at least one tip      296,217
  Number of verified venues                    61,378
  Number of users                             526,651
  Number of brand (pages) users                 1,248
  Number of tips                              984,251
  Total number of likes for all tips        1,407,835
  Total number of to-dos for all tips         393,574

3.2.3 User Dataset (Dataset 2)

Our complete user dataset, collected from August to October 2011, contains almost 16 million venues and over 10 million tips. However, to avoid introducing biases towards tips that are either too old or too recent, we restricted our analyses to tips and likes created between January 1st 2010 and May 31st 2011. After applying this filter, we ended up with over 6 million tips and over 5 million likes, posted at slightly more than 3 million venues by more than 1.8 million users.

Table 3.2: Summary of Our User Dataset.

  Number of tips                                         6,817,992
  Number of tips in English                              4,374,922
  Number of likes¹                                       5,740,954
  Number of tips with at least one like                  2,341,579
  Number of venues with at least one tip                 3,194,556
  Number of users who posted at least one tip            1,831,747
  Number of users who marked at least one tip as like      756,734
  Number of users with at least one tip marked as like     910,486
  Number of venues with at least one tip marked as like  1,254,843
  Number of verified venues                                219,418
Table 3.2 provides some statistics about our analyzed user dataset. Note that around 34% of the tips received at least one like during the considered period. As shown in Table 3.2, more than 4 million tips in our user dataset are in English. To identify them, we used a Linux dictionary (myspell), filtering out tips with fewer than 60% of their words in English.

¹ Some likes were filtered out since they had timestamps inconsistent with (earlier than) the timestamps of the associated tips.

3.3 Summary

In this chapter, we discussed the main elements and features of Foursquare, currently the most popular location-based social network. We also presented our crawling methodology and a summary of our two datasets. These datasets capture the three dimensions (user, venue and tip textual content) that are relevant for our study. Next, we use these datasets to analyze how users interact through tips, in order to understand which factors affect tip popularity.

Chapter 4
Tipping Activity on Foursquare: Characterization and User Influence

Understanding how users behave when they interact with each other through tips or related features (e.g., likes and to-dos) is important to derive insights into which factors impact tip popularity, as well as insights that can help the interpretation of the prediction results. In this chapter, we first study several factors impacting the popularity of a tip, including attributes related to the three entities that play a central role in the tip popularity prediction problem, namely users (Section 4.1.1), venues (Section 4.1.2), and tips (Section 4.1.3). These factors are incorporated as features in our prediction models, introduced in Chapter 6. Next, we identify four groups of users with very different behaviors regarding their usage of tips and likes (Section 4.2). One of the identified groups consists of potential spammers, as those users post tips that are unrelated to the venue.
Towards better understanding user interactions, notably the presence of influential (or expert) users, we discuss methods to automatically infer a user’s influence level on Foursquare in Section 4.3. Finally, in Section 4.4, we analyze the dynamics of tip popularity on Foursquare.

4.1 Characterization of Tipping Activity

In this section, we analyze how users exploit tips and likes on Foursquare using the larger user dataset (dataset 2) presented in Section 3.2.3¹. We start by focusing on the users, analyzing how they interact through tips and how their tips are evaluated (Section 4.1.1). Next, we discuss how users interact with venues by posting tips and marking them as liked (Section 4.1.2). Finally, we analyze features extracted from the tips’ contents (Section 4.1.3).

¹ Since dataset 1 is a subset of dataset 2, the findings of this section are also valid for dataset 1.

4.1.1 User Analysis

We start our characterization by focusing on the users and analyzing the total number of tips posted, as well as the total number of likes received and given by each user. The number of likes received by a user refers to the total number of times that any tip posted by the user was marked as liked by others. Thus, it reflects the popularity of the collection of tips posted by the user. The number of likes given by a user reflects the feedback given by her on other users’ tips. Our characterization is performed over all users in dataset 2 (see the description in Section 3.2.3). We note that these are, in the vast majority (99.8%), standard users.

Figure 4.1: User Tipping Activity on Foursquare. (a) Number of Tips; (b) Number of Likes.

The complementary cumulative distribution functions (CCDF) of these measures are shown in Figure 4.1, with both axes in logarithmic scale.
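The two summary tools used throughout this characterization — the empirical CCDF plotted in the figures and the coefficient of variation (the ratio of the standard deviation to the mean) reported in the tables — can both be computed with a short, self-contained sketch. The toy data below is illustrative only and does not come from our datasets.

```python
def ccdf(values):
    """Empirical CCDF: for each distinct x, the fraction of observations > x."""
    xs = sorted(set(values))
    n = len(values)
    return [(x, sum(1 for v in values if v > x) / n) for x in xs]

def coefficient_of_variation(values):
    """CV = (population) standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return (var ** 0.5) / mean

# Toy example of a skewed distribution (e.g., tips per user): CV well above 1
# signals high dispersion, consistent with a heavy tail.
tips_per_user = [1, 1, 1, 2, 3, 50]
print(ccdf(tips_per_user))
print(coefficient_of_variation(tips_per_user))
```

Plotting the (x, P(X > x)) pairs on log-log axes yields exactly the kind of curve shown in the figures of this section.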
Complementary data is also shown in Table 4.1, which provides the maximum, mean, and median values, as well as the coefficient of variation (CV)¹, for these three metrics². Clearly, all three distributions are heavy-tailed: most users posted very few tips and/or received/gave few likes, while a small fraction of the users posted many tips and/or received/gave a lot of feedback on previously posted tips. For instance, 46% of the users posted only one tip, whereas 19% and 48% of them received and gave only one like, respectively. In contrast, 1,499, 2,318 and 2,935 users posted more than 100 tips, received more than 100 likes, and gave more than 100 likes, respectively. The maximum number of tips posted by a single user is 5,791, whereas a single user received almost 209 thousand likes, as shown in Table 4.1. The heavy-tailed nature of these distributions indicates that most likes are concentrated on tips posted by few users, suggesting that the tips posted by such users may experience the rich-get-richer phenomenon, by which a tip attracts new likes at a rate proportional to the number of likes already acquired [Borghol et al., 2012]. We further analyze this in Section 4.4. Note also that the median number of likes per user is, on average, only 0.48, which implies that many users have this feature equal to 0. This will impact our prediction results, as discussed in Chapter 6.

¹ Ratio of the standard deviation to the mean.
² Recall that, since we are analyzing the filtered user dataset, all users have at least one tip.

We also analyze whether users who post more tips tend to give/receive more likes. Figure 4.1b shows that some users tend to receive a proportionally much larger number of likes on their tips, indicating the high popularity of their tips.
The correlation between variables was assessed by the non-parametric Spearman’s rank correlation coefficient (ρ) test [Zwillinger and Kokoska, 2000], defined as:

    ρ = 1 − (6 Σᵢ dᵢ²) / (n(n² − 1))        (4.1)

where n is the number of paired ranks, and dᵢ is the difference between the paired ranks. The correlation (ρ) computed over the number of tips posted and the number of likes received is moderate (0.54), while that between the number of tips and the number of likes given by each user is lower (0.37). We also analyzed the correlation between the number of likes received and the number of likes given by each user, finding an even lower correlation (0.35) between them. Thus, in general, users who tip more do not necessarily receive more likes, and users who give more likes do not always receive more likes.

Table 4.1: Summary of Users’ Tipping Activities.

  Metric                              Maximum      Mean   Median      CV
  # of tips per user                    5,791      3.72      2.0     3.25
  # of likes received per user        208,619      3.13      0      63.40
  # of likes given per user            14,090      4.38      2       4.50
  Median of the # of likes per user       657      0.48      0       2.77
  Mean of the # of likes per user         858      0.58      0       2.70
  Std of the # of likes per user       632.81      0.34      0       3.88
  # of friends/followers per user     318,890     44.79     17      15.95
  # of mayorships per user                325      1.21      0       2.42

Next, we analyze the total number of friends or followers of each user, the total number of mayorships won by each user, as well as the number of tips or likes posted or received by a user at the venues where she was a mayor. Figure 4.2 shows the distributions of these measures, with both axes in logarithmic scale.

Figure 4.2: Number of Friends and Followers and Number of Mayorships per User. (a) Number of Friends and Followers; (b) Numbers of Mayorships, Tips and Likes at the Target Venue.
The distribution of the number of friends and followers indicates that the social network is quite sparse among users who post tips: a user has, on average, only 44 friends or followers, and 37% of the users have at most 10 friends/followers, although the maximum reaches 318,890 (the MTV user). Moreover, we analyze the user’s familiarity with the venue from two points of view: the number of mayorships accomplished, and whether the user was a mayor of the venue where she posted a tip. We conjecture that a user who frequently visits the same venue has a higher probability of writing more interesting/helpful/popular tips about the place. For instance, 41% of the users who post a tip had at least one mayorship in the same venue. In contrast, 37 users are mayors of more than 100 venues, 47 users have more than 10 posted tips, and 32 users have more than 20 received likes at venues where they were mayors. We note that users who post tips have at least one mayorship on average, but the correlation between the number of tips posted and the number of mayorships is relatively low (0.36). Similarly, the correlation between the number of likes received and the number of mayorships won by each user is also relatively low (0.20). Thus, our conjecture is not as strong as we expected. Yet, it is not negligible; thus, we capture this information in some of the features exploited by our popularity prediction models (Chapter 6). Finally, we analyze the cumulative distribution of the fraction of likes received by a user that comes from her social network, that is, likes that were given by her friends or followers. Figure 4.3 shows that, for 70% of the users, at most 50% of the likes come from their friends and followers. In other words, the social network has influence on the popularity of the tips posted by a user.

Figure 4.3: Fraction of Likes Received from the User’s Social Network (Friends and Followers).
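The Spearman coefficient of Equation 4.1, used in all the correlation analyses of this section, can be implemented directly. This is a sketch under the simplifying assumption that there are no tied values (the closed form in Equation 4.1 is exact only in that case; with ties, average ranks and a Pearson correlation over the ranks are needed); the toy inputs are illustrative.

```python
def spearman_rho(x, y):
    """Spearman's rank correlation via Equation 4.1 (assumes no ties)."""
    n = len(x)

    def rank(values):
        # Map each value to its 1-based rank in ascending order.
        return {v: i + 1 for i, v in enumerate(sorted(values))}

    rx, ry = rank(x), rank(y)
    # Sum of squared rank differences over the paired observations.
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Perfectly monotone pairs give rho = 1; one swapped pair lowers it.
print(spearman_rho([1, 2, 3], [5, 6, 7]))
print(spearman_rho([10, 20, 30, 40], [1, 3, 2, 4]))
```

Because it operates on ranks rather than raw values, the coefficient is robust to the heavy-tailed, highly skewed distributions observed throughout this chapter.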
4.1.2 Venue Analysis

We now turn to the characterization of the venues. Figure 4.4a shows the CCDFs of the number of posted tips and the number of likes accumulated by all tips per venue, while Figure 4.4b shows the distributions of the total number of check-ins and the number of unique visitors. Complementary data is shown in Table 4.2, which provides the maximum, mean and median values for these metrics. Recall that, since we are analyzing the filtered dataset, all venues have at least one tip.

Table 4.2: Summary of Visiting and Tipping Activities at Venues.

  Metric                                      Maximum      Mean   Median      CV
  # of tips per venue                           2,419      2.13      1       2.23
  # of likes per venue                          7,103      1.80      0      11.60
  Median # of likes per venue                     390      0.45      0       2.81
  Mean # of likes per venue                       390      0.52      0       2.79
  Standard deviation of # of likes per venue   774.44      0.26      0       8.27
  # of check ins per venue                    484,683    217.33     48       6.18
  # of unique visitors per venue              167,125     87.35     15       5.87

Figure 4.4: Visiting and Tipping Activities per Venue. (a) Tipping Activity; (b) Visiting Activity.

Once again, Figure 4.4 shows that the four distributions are heavy-tailed: most venues, when tipped, receive only a few likes, as most likes are concentrated on tips posted at few venues. Note that the maximum number of likes per venue exceeds 7,000, but the mean is only 1.80. The Spearman’s rank correlation coefficient computed between the total number of tips and the total number of likes per venue is moderate (ρ = 0.50), implying that tipping, in general, tends to be somewhat effective in attracting visibility to a venue: the larger the number of tips, the higher the amount of user feedback the venue tends to receive. Some venues also concentrate most of the check-ins, visitors and tips. For instance, while the median number of check ins per venue is only 48, the mean exceeds 217.
Moreover, around 70 venues have more than 100,000 check-ins each, and one venue (Los Angeles International Airport) has almost half a million check ins. Similarly, around 10 venues have more than 100,000 unique visitors. Regarding the total number of tips per venue, we find that 66% of the analyzed venues have only one tip, while 667 of them received more than 100 tips each. One venue in particular (the Super Bowl in Arlington) received more than 2,000 tips. The correlation between the total number of check ins and the number of unique visitors is 0.86, revealing a very strong correlation, as one might expect. Moreover, we find that the correlation between either the number of check ins or the number of unique visitors and the total number of tips posted at the venue is moderate to high (around 0.52), indicating that, to some extent, popular venues do tend to attract more tips. The same moderate to high correlation is also observed between the total number of likes and the number of check ins (ρ = 0.50) and the number of unique visitors (ρ = 0.46). As temporal information related to check ins is not available in our dataset, we are not able to tell whether the larger number of check ins is due to the larger amount of feedback received by the tips posted at the venue, or vice-versa. However, regarding the median number of likes per venue, we found a low correlation (ρ = 0.3) with the total number of check ins, which means that not all tips posted at the same popular venue receive comparable numbers of likes. Whereas this is probably due to the various levels of interestingness of tips posted at the same venue, it might also reflect the rich-get-richer effect.

Figure 4.5: Distributions per Venue Category. (a) Number of Venues, Tips, and Likes; (b) Number of Check ins and Unique Visitors.
Foursquare also maintains a set of nine pre-defined venue categories, namely “Food”, “Travel & Transport” (Travel), “Great Outdoors” (Outdoors), “Nightlife Spots” (Nightlife), “Professional & Other Places” (Professional), “Residences”, “Shops & Services” (Shops), “Colleges & Universities” (Education), and “Arts & Entertainment” (Entertainment). Figure 4.5 shows histograms of the number of venues, tips, likes, check ins and unique visitors in each category in our dataset. Approximately 43% of the venues in our filtered dataset (i.e., venues with at least one tip) are from the Food and Shops categories. These categories are also the ones that attract most tips and receive most feedback on their tips. For instance, the two venues that received the largest number of tips are the Super Bowl Sunday event and the Soekarno-Hatta International Airport in Jakarta, whereas the venues that have the largest number of likes on their tips are Madison Square Garden and Disney World’s Magic Kingdom Park. In contrast, the categories that receive the smallest numbers of tips and likes are Residences and Colleges & Universities.

4.1.3 Tip Analysis

Finally, we analyze the tips, characterizing not only the total number of likes received by each tip but also three properties of the tip’s content, namely the number of characters, the number of words, and the number of URLs and e-mail addresses.

Figure 4.6: Content Features of Foursquare Tips and Yelp Reviews. (a) Content and Feedback Received by Foursquare Tips; (b) Content Features of Yelp Reviews.

Figure 4.6a shows that, as expected, most tips receive a very small number of likes (66% of the tips receive no like at all), whereas some tips achieve high popularity among users. For instance, 95 tips received more than 1,000 likes. One such example is a tip posted at the Magic Kingdom Park venue giving historical facts about the place.
We also observe that most tips are very short, with, on average, approximately 60 characters and 10 words. The maximum numbers of characters and words are, respectively, 200 (a limit imposed by the application) and 66, as shown in Table 4.3. Also, the vast majority (98%) of the tips carry no URL or e-mail address. Moreover, we find no strong correlation between the size of a tip and the number of likes it receives (ρ under 0.07). This is a preliminary analysis; other features that capture additional textual properties, such as readability, informativeness, the part-of-speech tags of each tip word, and the polarity of the tip’s sentiment (positive, neutral, negative), will be exploited in Chapter 6.4 and 6.

Table 4.3: Summary of Tip Textual Characteristics.

  Metric                     Maximum     Mean   Median      CV
  # of words per tip              66    10.25        8    0.78
  # of characters per tip        200    59.78       46    0.75
  # of URLs per tip                9     0.02        0    8.27
  # of likes per tip            5352     0.84        0   10.74

Finally, we also observe that Foursquare tips are much shorter than reviews on Yelp (Figure 4.6b), which confirms our hypothesis that tips are different in nature from other types of online reviews previously studied¹.

In this section, we observed that most users post very few tips and/or receive few likes, while most tips and likes are concentrated on a small fraction of the users (Figure 4.1). Next, we further analyze user tipping behavioral patterns by first identifying and characterizing relevant and typical user profiles (Section 4.2), and then characterizing the properties of the network that emerges from user interactions through tips and can be used to assess user influence (Section 4.3).

4.2 User Profiles

In the previous section, we found that the correlation between the number of tips posted by a user and the total number of likes those tips receive is only moderate (0.54). Thus, users who tip more do not necessarily receive more feedback on their tips.
In other words, if we take the total number of likes received by all tips posted by a user as an estimate of that user’s influence in the system, such influence is only moderately correlated with her degree of tipping activity, and seems much more related to the tipped venue².

In this section, we discuss various user behavior patterns observed in our Foursquare dataset with respect to the use of tips, likes and to-dos. We start by analyzing users who exhibit suspicious behavior, in particular those who post tips with links (Section 4.2.1). Next, based on the patterns of users’ tipping activity, we classify users into four tipping profiles in Section 4.2.2. The analyses in this section were performed over dataset 1, described in Section 3.2.2³. The results of this section were published in [Vasconcelos et al., 2012b]⁴.

4.2.1 Suspicious Behavior

According to Foursquare’s terms of service, the introduction of links to unrelated sites across various venues is considered spamming, and users who are caught doing it should have their accounts deactivated [Foursquare, 2011]. We found some users for whom the majority of tips contained links, i.e., URLs and e-mail addresses. This finding raises a concern about suspicious behavior, particularly because only links included in the tip’s text were accounted for. In other words, links placed in the “More info” field, expected to be directly related to the target venue (e.g., the venue’s website), as well as related pictures, placed in a separate field, were disregarded.

¹This analysis was performed using a Yelp dataset published at http://www.yelp.com/academic_dataset.
²We define other ways to measure influence in Section 4.3.
³The total number of to-dos was also used as tip feedback, since to-dos were publicly available in that dataset.
⁴Likes were referred to as “dones” in that work.
To delve deeper into this issue, we selected users with at least 10 tips and at least 60% of their tips containing links for further analysis. This selection corresponds to 3% of all users who posted at least 10 tips. Figure 4.7a plots the percentage of tips with links versus the number of tipped venues for these users, whereas Figure 4.7b shows the percentage of tips with links versus the total number of likes and to-dos. In these plots, users are grouped into two sets based on the maximum geographical distance between any pair of venues tipped by them. We refer to this distance as the diameter of the venues tipped by the user, and use it to assess the scale (local or global) of the user’s tipping activity. The graphs show no clear correlation between the percentage of tips with links and the total number of likes and to-dos or the number of tipped venues. Indeed, the Spearman coefficients are -0.17 and 0.13, respectively. In other words, there are many users with a large percentage of tips with links who posted tips at only a few venues, which does not necessarily constitute spamming according to Foursquare’s rules. Moreover, there are also users who, despite the large percentage of tips with links, did receive a large number of likes and to-dos (see the discussion below). However, Figure 4.7a also shows that several of the selected users post tips with links at a large number of different venues. Take for instance “User 5”, “User 6” and “User 7” in that figure. They post tips at more than 100 different venues, and all of their tips contain links. These numbers reveal a behavior pattern that is consistent with spamming and violates Foursquare’s terms of service. Moreover, the total numbers of likes and to-dos for “User 5” and “User 7” are only 12 and 32, respectively, whereas “User 6” receives no feedback. Interestingly, we found no clear correlation between suspicious behavior and the diameter of the tipped venues.
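The selection rule and the venue diameter can be sketched as follows. The 40 km threshold is the one used above; the data structures and the use of great-circle (haversine) distance are assumptions, since the thesis does not state how distances were computed:

```python
# Sketch: selecting users for the suspicious-behavior analysis (at least
# 10 tips, at least 60% containing links) and computing the "diameter" of
# a user's tipped venues as the maximum pairwise great-circle distance.
from math import radians, sin, cos, asin, sqrt
from itertools import combinations

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def diameter_km(venues):
    if len(venues) < 2:
        return 0.0
    return max(haversine_km(p, q) for p, q in combinations(venues, 2))

def is_suspicious(tips):
    """tips: list of dicts with a boolean 'has_link' field (hypothetical schema)."""
    if len(tips) < 10:
        return False
    return sum(t["has_link"] for t in tips) / len(tips) >= 0.60

# Toy example: a user whose tipped venues span New York and Los Angeles
venues = [(40.7128, -74.0060), (34.0522, -118.2437)]
print(diameter_km(venues) > 40)   # global-scale activity under the 40 km rule
```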
In other words, our results reveal potential spamming activity both locally and globally. We note that not all users who posted tips with links at many venues are necessarily engaged in spamming activity, as the linked webpage might be somewhat related to the tipped venue. Take, for instance, “User 8” in Figure 4.7a, who posts 286 tips, 90% of which contain links, at 261 different venues. Despite the large number of tips with links, those tips receive a total of 92 likes and to-dos. We manually investigated this user, finding that it corresponds to a large business chain that placed the same tip at all of its stores advertising a promotion. The tip contains a link to an external webpage that should be visited by those interested in participating to learn more about it. A reasonably large number of users (92) marked the tip as liked or added it to their to-do lists. In this case, the link pointed to content that was related to the tipped venue. Indeed, as we further discuss in Section 4.2.2.2, some users who post many tips containing links at many venues aiming at spamming might still receive many likes and to-dos from others. In other words, they might succeed in triggering the interest of many users.

Figure 4.7: Correlation between User Attributes (top 3% of users with the largest percentages of tips with links). (a) # Tipped Venues vs. % of Tips with Links; (b) # Likes and To-Dos vs. % of Tips with Links.

4.2.2 Uncovering User Profiles

In the previous sections, we discussed various user behavior patterns observed in our Foursquare datasets with respect to the use of tips and likes.
We now go one step further and identify user profiles. We do so by applying a clustering algorithm to group users based on three attributes, namely, the number of tipped venues, the total number of likes and to-dos, and the percentage of tips with links. We selected the Expectation-Maximization (EM) clustering algorithm, a well-known algorithm for clustering in the context of mixture models [Dempster et al., 1977]. We ran the EM implementation in Weka [Weka Machine Learning Project, 2012], which has a built-in iterative mechanism to determine the number of clusters. The mechanism is based on ten-fold cross-validation: for each candidate number of clusters, it breaks the data into 10 folds, nine of which are used for training and one for testing. It builds the clusters on the training folds and, given those clusters, computes the log-likelihood of each instance in the test fold. The log-likelihood values are summed up and then averaged over all 10 folds. The number of clusters selected is the one with the maximum (average) log-likelihood.

Table 4.4: Summary of User Attributes Across Clusters.

  Attribute                       Cluster 0       Cluster 1       Cluster 2       Cluster 3
                                  Avg      CV     Avg      CV     Avg      CV     Avg       CV
  Number of Venues                21.99    0.94   1.97     0.52   13.23    0.52   43.81     1.41
  Percentage of Tips with Links   83.11    0.20   3.88     2.35   0.62     5.21   7.02      1.71
  Number of Likes and To-Dos      20.41    1.82   7.35     1.52   29.53    2.09   1350.58   5.48
  Number of Users                 222             190            5660            477

Because of the large variability observed in the values of the user attributes, particularly the number of tipped venues and the total number of likes and to-dos, which makes the clustering task harder, we converted all values to a log scale and normalized the results afterwards. Next, we first present the clustering results (Section 4.2.2.1), and then discuss some findings of a manual inspection of selected users (Section 4.2.2.2).
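The preprocessing step described above can be sketched as below. The log10(1 + x) transform and min-max normalization are assumptions, since the thesis names neither the log base nor the exact normalization; the cluster-count selection itself is done by Weka's cross-validated EM, as described in the text.

```python
# Sketch of the attribute preprocessing before EM clustering: values with
# large variability are moved to a log scale, then normalized to [0, 1].
from math import log10

def preprocess(values):
    """log10(1 + x) followed by min-max normalization (assumed variant)."""
    logged = [log10(1 + v) for v in values]
    lo, hi = min(logged), max(logged)
    span = hi - lo or 1.0       # avoid division by zero for constant columns
    return [(v - lo) / span for v in logged]

# Toy "number of likes and to-dos" column with the heavy skew seen in Table 4.4
likes = [0, 3, 27, 1350, 20, 29]
print(preprocess(likes))
```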
4.2.2.1 Clustering Results

We applied the EM clustering algorithm over all users with at least 10 tips. The algorithm identified 4 clusters, referred to throughout this section as clusters 0, 1, 2 and 3. Table 4.4 shows, for each cluster, the average and CV of each user attribute, as well as the number of users in the cluster. Complementarily, Figure 4.8 shows, for each cluster, the cumulative distribution function (CDF) of each attribute.

Cluster 0, which includes around 3% of all clustered users, is characterized by a much larger percentage of tips with links (83% on average). This is consistent across most users of the cluster, as shown in Figure 4.8c and summarized by the low CV. Indeed, this attribute clearly distinguishes these users from the others. The number of tipped venues also tends to be large, though smaller than for users in cluster 3. These patterns are consistent with the suspicious, potentially spamming, behavior discussed in Section 4.2.1. Moreover, in general, users in cluster 0 do not tend to receive a large number of likes and to-dos.

Cluster 1 consists of focused users who are neither very active nor influential: they tend to post tips at only a few venues, and do not receive many likes and to-dos from others. These are mostly occasional users¹. Users of cluster 2, on the contrary, are much more active: they tend to leave tips at a larger number of venues, mostly with no links, getting many more likes and to-dos in return. Around 86% of all clustered users are in this group.

¹We did observe, among users of cluster 1, many tips posted at the same venue within a very short time. For instance, around 10% of the users of cluster 1 had an average inter-tip time below 2 seconds, suggesting that the user might have posted the same tip multiple times without knowing. This might reflect lack of experience with the application.
Figure 4.8: User Profiles: Attribute Distributions. (a) Number of Tipped Venues; (b) Number of Likes and To-Dos; (c) Percentage of Tips with Links.

Finally, cluster 3, containing around 7% of the considered users, is characterized by the largest total number of likes and to-dos. These users also tend to post tips at a large number of venues. Therefore, we expect that most very influential users who target a large number of venues fall into this cluster. Moreover, as shown in Figure 4.8a, this cluster also contains users who post tips at only a few venues, indicating that it also contains some very influential but focused users.

We also analyzed the distribution of the venues tipped by users in each cluster across the various venue categories maintained by Foursquare. Figure 4.9 shows the distributions. The fractions of venues tipped by users from both clusters 0 and 3 vary only slightly across categories, except for “Colleges & Universities” (Education), which clearly attracts far fewer tips than the other categories. Indeed, it is the least popular category among users of all four clusters. In other words, neither users who seem engaged in suspicious activity (cluster 0) nor those who tend to be very influential in the system (cluster 3) are focused, collectively, on specific categories. In contrast, the venues tipped by the occasional users (cluster 1) are more concentrated in the “Food” category. The same can be said, to a lesser extent, for users of cluster 2.

Figure 4.9: Venue Category Distributions.

4.2.2.2 Manual Inspection

As discussed in Section 4.2.1, we did find evidence of suspicious behavior, consistent with spamming activity. Note that this evidence was based only on the percentage of tips posted by a user containing links to external sites and e-mail addresses and, to a lesser degree, on the feedback those tips received from other users.
However, an interested spammer may find other ways of reaching users. For instance, a spammer interested in selling a product may write a text advertising it and post it as a tip. One such case was a tip posted at various venues, including a Japanese restaurant and a university, whose content advertised a fitness center. The tip’s text is completely unrelated to the nature and business domain of the target venue. We here consider as spam a tip whose content is unrelated to the tipped venue, typically an advertisement for a product that is, in nature, unrelated to that kind of venue. Given this definition, we note that some spam tips might indeed be successful in getting a large number of likes and to-dos, since many users may find the advertised product interesting despite it being unrelated to the venue where the tip was posted.

To further investigate the existence of tip spamming on Foursquare, we manually inspected a sample of users from each cluster. For each sampled user, we inspected the contents of her tips and the venues at which those tips were posted. In case a tip contained a link, the contents of the page pointed to by the link were also inspected. Each sampled user was inspected independently by three volunteers, who labeled the user as either spammer or not. A user was labeled a spammer if the contents of at least 50% of her tips were not related to the nature or domain of the tipped venues¹. The volunteers were instructed to be conservative: when in doubt, they should label the user as not a spammer. Majority voting was used for the final classification, although the volunteers agreed in the vast majority (93%) of the cases. The volunteers also counted the number of inspected users who are brand users. Table 4.5 presents our results, showing, for each cluster, the number of inspected users, the number and percentage of them labeled as spammers by the volunteers, and the number and percentage of them identified as brands.
The sample size for each cluster was defined so as to have a maximum error in our estimates of 5% with 95% confidence [Jain, 1991]. Most of the users labeled as spammers are, as expected, in the cluster 0 sample. Indeed, all users sampled from that cluster were labeled as spammers, mostly because they posted tips with links pointing to unrelated content. However, our results also show the presence of spammers in clusters 1, 2 and 3. Most of them were classified as such because the text of their tips advertised a product unrelated to the nature of the venue. Moreover, some users of cluster 3 labeled as spammers receive a large number of likes and to-dos, indicating that, despite posting unrelated content, they trigger the interest of many users².

¹Recall that, given our data collection methodology for dataset 1, our analysis is based only on a subset of all tips posted by each user.
²We note that most of these users receive only a couple of likes and to-dos on each tip. However, because they posted a large number of tips at various venues, collectively, those tips ended up attracting a large number of users.

Table 4.5 also shows that there are brand users in all four clusters, although the vast majority of them are in cluster 3. As discussed in Chapter 3, brand users are special Foursquare users who are expected to use the system to reach their followers and potential customers. The brand users of cluster 3 indeed succeeded in promoting themselves on Foursquare: they are very influential in the system, receiving a large number of likes and to-dos. Interestingly, we also found 4 brand users among the 127 users sampled from cluster 0 who were labeled as spammers.

We also analyzed the words commonly used in the tips of users labeled as spammers, contrasting them with the vocabulary of the remaining users. Figure 4.10 shows the word clouds of both user groups. Note that words related to service or product
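A standard way to obtain such per-cluster sample sizes is the normal-approximation formula for estimating a proportion, with a finite-population correction. The thesis cites Jain [1991] without spelling out the formula, so the parameters below (z = 1.96 for 95% confidence, worst-case p = 0.5) are assumptions:

```python
# Sketch: sample size for estimating a proportion with bounded absolute
# error, plus finite-population correction. One common reading of the
# procedure cited above; the exact parameters used are not stated.

def sample_size(population, error=0.05, z=1.96, p=0.5):
    n0 = z * z * p * (1 - p) / (error * error)      # infinite-population size
    return round(n0 / (1 + (n0 - 1) / population))  # finite correction

for n_users in (222, 190, 5660, 477):   # cluster sizes from Table 4.4
    print(n_users, sample_size(n_users))
```

With these assumptions the resulting sizes fall in the same range as the samples reported in Table 4.5, although they do not reproduce every entry exactly.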
advertisement, such as business, apartment, cellular, franchise, gadget, and iphone, are more frequently used by users labeled as spammers.

Table 4.5: Results of the Manual Inspection of a Sample of Users from Each Cluster.

  Cluster   Sampled Users   Spammers     Brands
  0         127             127 (100%)   4 (3.2%)
  1         181             13 (7.2%)    2 (1.1%)
  2         237             37 (15.6%)   7 (2.9%)
  3         203             41 (20.2%)   58 (28.6%)

Figure 4.10: Words Commonly Used in Users’ Tips. (a) Users Labeled as Spammers; (b) Users Labeled as Not Spammers.

Finally, we note that the concept of spamming on Foursquare is a very subjective and thus controversial matter, possibly even more so than in other systems. What one person classifies as spam, another might interpret as a creative marketing strategy. Take for instance the case, reported in Catalyst Marketers Blog [2010], of a tip advertising a product sold by a certain venue vi, posted at another venue vj, which is a business competitor of vi. When customers check the tips left at vj, they will see the advertisement and might indeed be drawn towards the competitor. In other words, the tip might contribute to stealing business away from the venue at which it was left¹. Moreover, oblivious to any marketing game, users might find the tip interesting and mark it as a to-do or as liked. This might happen even for tips advertising products completely unrelated to the tipped venues, as discussed above. Thus, identifying and dealing with tip spamming and spammers is a hard task. Some recent efforts to address this problem were reported by Costa et al. [2013] and Aggarwal et al. [2013].

In this section, we observed the presence of four user profiles defined based on tipping activity. In the next section, we investigate the properties of the network built from user interactions through tips.

¹During our manual inspection, we took a conservative approach and did not consider such cases as spamming.
4.3 User Influence

We now focus on the interactions established among users through the use of tips and likes. Recall that user influence is important for estimating the popularity of the tips posted by a user. To that end, we present an empirical analysis of user influence patterns on Foursquare. We start by analyzing influence measured as the number of likes and to-dos received by the tips of each user. Focusing on the geographical location of the venues where the tips were posted, we present a preliminary analysis of how influence is geographically distributed (Section 4.3.1). Next, we present other ways to measure influence, defining a graph composed of users who interact with one another by posting tips and liking them (Section 4.3.2). Then, we propose a method based on PageRank to rank influential users in Section 4.3.3. In this analysis we use dataset 2.

4.3.1 User Behavioral Patterns

Figure 4.11a shows the number of tipped venues versus the total number of likes and to-dos received by each user. Note the logarithmic scale on both axes. As in the previous sections, we group users into those with diameter shorter than or larger than 40 kilometers. Figure 4.11b shows a similar graph, plotting the number of tips versus the total number of likes and to-dos of each user. To improve readability, both figures only show users with at least 10 tips. Note that, given the strong correlation between the number of tips and the number of tipped venues per user, the two graphs are very similar.
Figure 4.11: Correlation between User Attributes (only users with at least 10 tips). (a) Number of Venues vs. Number of Likes and To-Dos; (b) Number of Tips vs. Number of Likes and To-Dos.

Both graphs show the existence of different classes of users. On one hand, we find users who, despite posting a large number of tips and/or posting tips at a large number of venues, receive only a comparatively small number of likes and to-dos. One such user, marked as “User 1” in both graphs, posted approximately 10⁴ tips at 100 different venues, but received only 4 to-dos and likes. By manually inspecting a sample of those users, located in the bottom right corner of the graphs, we found that they tend to be ordinary users who, in spite of being very active in the system and contributing a lot of tips regarding many different venues, do not get much feedback from others.

In contrast, the top right corner of the plots shows users who not only post a large number of tips at a large number of different venues but also receive a lot of feedback on them. Those users, many of which are famous brands such as “Bravo”, “History Channel”, “The Wall Street Journal” and “National Post”, seem engaged in providing many recommendations (tips) on a large variety of venues, and clearly succeed in reaching and attracting the attention of many users. Clearly, those users are very influential in the system. Note that, interestingly, the graphs show users with large numbers of tips (and tipped venues) as well as large numbers of likes and to-dos in both user sets, suggesting the presence of both local and global activity.

Both graphs also show some users who, despite posting only a few tips at a few venues, receive a comparatively very large amount of feedback, and thus can also be considered very influential. Examples are “User 2” and “User 3” in Figure 4.11a, who received 2,708 and 1,337 likes and to-dos, respectively, despite targeting only a couple of venues with a few tips.
Some of these highly focused influential users are brands, such as “Six Flags”, while others are ordinary users (e.g., “User 4”). Once again, we found focused users (i.e., users who post tips at the same venues) with strong activity both locally and globally. As a side note, we point out that, as expected, both graphs show a slight trend for users with larger numbers of tips and tipped venues to also have larger diameters. Perhaps more interesting is the presence of some very active users, with tens to hundreds of tips and tipped venues, who are focused on venues in a local region (diameter under 40 kilometers).

4.3.2 User Influence Network

We now further analyze user influence through the interactions established among users via tips and likes¹. To that end, we focus on the venues of each category separately, and build a user network representing the relationships established when a user likes a tip posted by another user at a venue of a specific category. More precisely, we build a directed weighted graph where each node represents a user, and an arc from a node u_i to a node u_j indicates that u_i liked a tip posted by u_j. The weight of an arc indicates the number of likes from u_i to tips posted by u_j at venues of the category. We also build a user network considering tips and likes at venues of all categories, referring to it as General. Table 4.6 summarizes the main characteristics of the graph built for each venue category and of the general graph. Some basic graph properties, such as the number of nodes (i.e., the number of users who received or gave at least one like) and the number of arcs in each graph, are reported. The table also shows the average node degree, the number of reciprocal arcs (two users who liked each other’s tips), and the size of the largest strongly connected component (SCC) of each graph.

¹We disregard to-do marks here.
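The network construction just described can be sketched in a few lines; the like records below are hypothetical:

```python
# Sketch: building the directed weighted "like" network. An arc u -> v
# with weight w means user u liked w tips posted by v.
from collections import defaultdict

likes = [("ana", "bob"), ("ana", "bob"), ("bob", "ana"), ("cal", "bob")]

weight = defaultdict(int)          # (liker, author) -> number of likes
for liker, author in likes:
    if liker != author:            # ignore self-likes
        weight[(liker, author)] += 1

nodes = {u for arc in weight for u in arc}
arcs = set(weight)
# Each reciprocal pair is counted once per direction here
reciprocal = sum(1 for (u, v) in arcs if (v, u) in arcs)
avg_degree = len(arcs) / len(nodes)  # average out-degree (= average in-degree)

print(len(nodes), len(arcs), reciprocal // 2, avg_degree)
```

The SCC sizes in Table 4.6 would then follow from a standard strongly-connected-components algorithm (e.g., Tarjan's) run over these arcs.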
The SCC is the largest set of nodes in which every node can be reached from every other, following the arcs from the user who liked a tip to the tip’s author. A larger SCC indicates the presence of a community where users interact by posting and liking each other’s tips. Complementary data is shown in Figure 4.12, which provides the indegree and outdegree distributions of the user network built for each venue category. The degree distribution is a function describing the number of users in the network with a given degree (number of neighbors).

Table 4.6: Summary Statistics for User Influence Networks per Venue Category as well as for All Categories (General).

  Category       # Nodes     # Arcs      Avg degree   Reciprocal arcs   SCC
  General        1,143,914   2,283,949   2.0          61,233 (2.75%)    147,282
  Arts           177,876     222,952     1.25         1,288 (0.60%)     116
  Education      95,274      94,608      0.99         1,766 (1.90%)     16
  Food           650,543     964,012     1.48         17,497 (1.85%)    34,117
  Outdoors       140,479     140,559     1.00         979 (0.70%)       28
  Nightlife      253,631     309,081     1.22         5,421 (1.79%)     1,559
  Professional   221,091     199,578     0.90         4,104 (2.10%)     26
  Residences     138,320     98,319      0.71         5,336 (5.74%)     9
  Shops          321,049     362,310     1.00         4,094 (1.14%)     31
  Travel         186,970     223,881     1.20         1,451 (0.65%)     361

As we observe in Figure 4.1b, a few users receive a proportionally large number of likes on their tips (high indegree) and a few users give a significant number of likes (high outdegree). Although all categories exhibit heavy-tailed distributions, we see that the Food graph is the largest in terms of nodes, arcs and average node degree, which shows that this category has the highest level of user activity in terms of likes. Indeed, analyzing the indegree distributions, we observe that Food and Nightlife, which have the distributions with the longest tails, are the categories where tip authors receive the most feedback, i.e., the largest numbers of likes on their tips.
Moreover, the outdegree distributions suggest that Foursquare users have more interest in tips from the Entertainment and Food categories. In contrast, the Residences category has one of the smallest graphs, confirmed by the shorter tails of its indegree and outdegree distributions (Figure 4.12) and its small numbers of nodes and arcs. This reflects the lower user activity in this category, and can be explained by the fact that the venues in this category represent private locations, usually the user’s residence (whose geographical location is only revealed to the user’s friends). Even though the tips are publicly available to all users, the contents of the tips in this category may not be of great interest to other users, which is also reflected in the small size of the strongly connected component and the relatively large fraction of reciprocal arcs.

Figure 4.12: Degree Distribution of the User Network (log scale). (a) Indegree; (b) Outdegree.

4.3.3 Measuring User Influence

The simplest and most intuitive approach to estimate user influence on Foursquare based on tipping activity is to compute the number of likes received by each user. Under this strategy, a user who received 100 likes on a single tip would be considered as influential as a user who posted 100 tips and received one like on each. One could argue that the former should be considered more influential than the latter. An alternative approach is PageRank, a method used in other contexts to identify experts or influential users, which is explained in Section 4.3.3.1.
We also propose a new method based on PageRank that addresses the shortcomings of the number-of-likes approach (baseline) and of traditional PageRank by considering both the structure of the network and the number of tips posted by each user. We refer to this method as Normalized PageRank (Section 4.3.3.2). We compare our proposed approach against these two strategies (number of likes and traditional PageRank) using the graphs of each venue category in Section 4.3.3.3.

4.3.3.1 Measuring User Influence using PageRank

PageRank is a link analysis algorithm proposed by Page et al. [1998] for ranking web pages. The ranking system, based on a random walk, evaluates the probability of finding a random surfer on any given page. The algorithm assumes that the presence of a hyperlink from page i to page j is evidence of the importance of page j. In addition, this importance is determined by the importance of i itself, and is inversely proportional to the number of pages i points to.

Intuitively, we can interpret the distribution of PageRank values in terms of a random walk [Henzinger et al., 1999]. Consider the case of a network of web pages connected by hyperlinks. The PageRank value of a page can be interpreted as the fraction of time a random surfer would spend visiting the page by iteratively following links from page to page [Zhang et al., 2007]. In other words, if the surfer visits page (node) i, the random walk is in state i. At each step, the surfer either follows a link chosen uniformly at random from those on the current page, with probability α (the damping factor), or jumps (teleports) to any other page on the Web chosen uniformly at random, with probability 1 − α.
By regarding pages as nodes and hyperlinks as arcs between nodes, Equation 4.2 computes the PageRank P(j) of a node j belonging to a network of size N:

P(j) = (1 − α)/N + α ∑_{i ∈ M(j)} P(i)/N_i    (4.2)

where M(j) is the set of nodes that have a direct arc to j, P(i) and P(j) are the PageRank values of nodes i and j, respectively, and N_i is the number of arcs coming out of i. The PageRank formula consists of two components weighted by the damping factor α (0 ≤ α ≤ 1), usually set to 0.85¹. The first component, 1/N, represents the probability that the surfer jumps to j from any other random node of the network, while the second component, ∑_{i ∈ M(j)} P(i)/N_i, models the contribution of the arcs coming into node j, each normalized by the out-degree of its source node (1/N_i). The key idea of PageRank is to allow the propagation of influence along the network of web pages, instead of simply counting the number of web pages pointing at a page. The algorithm has proven to be a useful tool for ranking nodes in a graph in many contexts, such as the identification of potential experts in specialized forums [Zhang et al., 2007], influential users on Twitter [Haveliwala, 2002; Weng et al., 2010; Kwak et al., 2010], and spam detection [Gyöngyi et al., 2004].

In our context, PageRank can be applied to identify influential users in terms of their tipping activity, using the same user network defined in Section 4.3.2. As already stated, PageRank computes the influence of a user from the direct and indirect influence of all other users by propagating the individual influences over the network. Thus, a user has a high probability of being influential if an influential user has liked her tip. A drawback of this approach, in the specific case of Foursquare, is that the number of tips posted by the user is not considered.

¹We tried various values around 0.85, but it did not make a significant difference.
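A minimal implementation of the iteration behind Equation 4.2, applied to a toy like-network (an arc i → j means i liked a tip by j, so influence flows from likers to authors). The dangling-node handling below is one common choice, not something the thesis specifies:

```python
# Sketch: power-iteration PageRank per Equation 4.2, on a toy graph.

def pagerank(out_links, alpha=0.85, iters=100):
    nodes = set(out_links) | {v for vs in out_links.values() for v in vs}
    n = len(nodes)
    p = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - alpha) / n for u in nodes}   # teleport term (1 - alpha)/N
        for i in nodes:
            outs = out_links.get(i, [])
            if outs:
                share = alpha * p[i] / len(outs)    # alpha * P(i) / N_i
                for j in outs:
                    new[j] += share
            else:                                   # dangling node: spread uniformly
                for j in nodes:
                    new[j] += alpha * p[i] / n
        p = new
    return p

# "ana" and "cal" like tips by "bob"; "bob" likes a tip by "ana"
pr = pagerank({"ana": ["bob"], "cal": ["bob"], "bob": ["ana"]})
print(max(pr, key=pr.get))  # "bob", the user whose tips attract the likes
```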
For instance, if one user received 100 likes on 10 tips while another received the same number of likes on 200 tips, the method would rank them similarly, even though the former attracts the same amount of feedback with far fewer tips. Once again, one might argue that an influential user should consistently produce popular content, i.e., their influence should hold across all their tips. Therefore, we propose an alternative strategy based on the original PageRank algorithm, which we call Normalized PageRank, described in the next section. Our algorithm takes into account the number of likes received, the number of tips posted by the user, and the PageRank of the users who liked the tips of this user. We assume that a user should be considered influential if she has a large average number of likes per tip and receives likes from other influential users.

4.3.3.2 Normalized PageRank

The Normalized PageRank proposed here differs from the traditional PageRank by taking into account the number of tips posted by each user. We weigh the arcs by how frequently one user likes tips from another. Thus, for the Normalized PageRank, the weight I of each arc (i, j) is given by the number of tips posted by user j that received a like from user i, normalized by j's total number of tips:

I(i, j) = \frac{w(i, j)}{t(j)} \qquad (4.3)

where w(i, j) is the number of likes given by user i to tips posted by user j, and t(j) is the total number of tips posted by j. This definition captures and quantifies the success of the tips posted by user j in terms of the amount of feedback (likes) received from others. Therefore, the larger the fraction of tips posted by j that are liked by i, the greater the influence of j on i. The Normalized PageRank is computed as follows:

P(j) = \frac{1-\alpha}{2} \left[ \frac{1}{N} + \frac{t(j)}{\sum_{k \in V} t(k)} \right] + \alpha \sum_{i \in M(j)} I(i, j) \, P(i) \qquad (4.4)

where α is the damping factor; N and V are, respectively, the total number and the set of users represented in the graph; and M(j) is the set of users with arcs pointing to j, i.e., the set of users who liked at least one tip posted by j.
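A minimal sketch of Equations 4.3 and 4.4 might look as follows (the user names and like counts are invented for illustration; this is not the implementation used in the thesis):

```python
from collections import defaultdict

def normalized_pagerank(likes, tips, alpha=0.85, iters=100):
    """Sketch of Equation 4.4.  likes[(i, j)] is w(i, j), the number of likes
    user i gave to tips posted by j; tips[j] is t(j), the number of tips
    posted by j.  Arc weight I(i, j) = w(i, j) / t(j)  (Equation 4.3)."""
    users = list(tips)
    n = len(users)
    total_tips = sum(tips.values())
    in_arcs = defaultdict(list)          # j -> [(i, I(i, j)), ...]
    for (i, j), w in likes.items():
        in_arcs[j].append((i, w / tips[j]))
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        rank = {
            j: (1.0 - alpha) / 2 * (1.0 / n + tips[j] / total_tips)
               + alpha * sum(w_ij * rank[i] for i, w_ij in in_arcs[j])
            for j in users
        }
    return rank

# Invented example: carl likes both of ann's 2 tips (I = 1.0) but only
# 2 of bob's 20 tips (I = 0.1); dan likes ann's 2 tips and carl's 3 tips.
tips = {"ann": 2, "bob": 20, "carl": 3, "dan": 1}
likes = {("carl", "ann"): 2, ("dan", "ann"): 2,
         ("carl", "bob"): 2, ("dan", "carl"): 3}
npr = normalized_pagerank(likes, tips)
```

In this toy network, ann ends up ranked highest: her tips are consistently liked (arc weights of 1.0 from both likers), which outweighs bob's larger tip-volume term in the random-jump component.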
As in Equation 4.2, the rank score P(j) of a user is composed of a part corresponding to the rank contribution from the other users linked to j and a part corresponding to the probability of a random jump to j from any other user represented in the network. The probability of selecting a user during a random jump is itself divided into two components: the first (1/N) has the same role as in the traditional PageRank, while the second makes the probability of a random jump to j proportional to the number of tips posted by j, i.e., t(j). The factor 1/\sum_{k \in V} t(k) normalizes this probability by the total number of tips used to build the entire graph. Thus, this part of the formula gives users who publish many tips a higher probability of being visited by the random surfer. Finally, the factor I(i, j), defined in Equation 4.3, weighs the PageRank P(i) of each user i who likes at least one tip posted by j.

4.3.3.3 Experimental Results

In the previous section, we proposed a novel ranking model, Normalized PageRank, describing the modifications performed on the traditional PageRank to capture the number of tips posted by each user and the frequency with which a user likes tips posted by another user. Any measure of influence is necessarily subjective [Romero et al., 2011], so there is no clear ground truth for what the ranking of users by influence should be. Thus, as in previous work focused on social influence [Zhang et al., 2007; Cha et al., 2010; Weng et al., 2010], we assess the performance of our proposed method by comparing it against other algorithms: a baseline method based solely on the number of likes received, and the traditional PageRank method (defined in Section 4.3.3.1). Our goal here is not to point out which is the best method, but to show differences among the three approaches. Moreover, as in previous studies [Zhang et al., 2007; Weng et al., 2010], our main
evaluation metric is the correlation between the rank lists generated by the different methods, measured by the Kendall τ coefficient [Kendall and Gibbons, 1990]. This coefficient lies in the range −1 ≤ τ ≤ 1: τ = 1 means that the two lists are exactly the same, whereas τ = −1 implies that one list is the reverse of the other. Table 4.7 lists the τ values between the rank lists generated by each pair of methods, whereas Table 4.8 lists the top-5 most influential Foursquare users, using each of the graphs described in Table 4.6, according to each method.

Table 4.7: Kendall τ Correlation Values Between Ranking Lists.

Category      | # likes vs. PageRank | # likes vs. Normalized PageRank | PageRank vs. Normalized PageRank
General       | -0.0048 |  0.0985 | 0.5041
Entertainment | -0.0421 | -0.0819 | 0.5100
Education     | -0.0086 | -0.0617 | 0.5412
Food          | -0.0371 |  0.0220 | 0.4859
Outdoors      | -0.0861 | -0.1503 | 0.4758
Nightlife     | -0.0253 | -0.0349 | 0.5134
Professional  | -0.1236 | -0.1793 | 0.4654
Residences    | -0.3187 | -0.3101 | 0.4296
Shops         | -0.0702 | -0.0829 | 0.4637
Travel        | -0.0677 | -0.0967 | 0.4711

Table 4.8: Top-5 Most Influential Users Overall and per Venue Category According to Each Method ((c) marks celebrity users, (u) ordinary users; unmarked names are brand users).

General — # likes: History Channel, Bravo, MTV, Wall Street J., VisitPA | PageRank: History Channel, Bravo, MTV, Wall Street J., Zagat | Normalized PageRank: History Channel, Bravo, MTV, Wall Street J., Zagat
Entertainment — # likes: History Channel, Explore Chicago, VisitPA, Wall Street J., NHL | PageRank: History Channel, Wall Street J., NHL, VisitPA, Explore Chicago | Normalized PageRank: History Channel, Wall Street J., La Vida (u), TLC, Explore Chicago
Education — # likes: Arizona State U, Mizzou, UW-Madison, Cal, Stanford U | PageRank: History Channel, Sports Authority, Mizzou, Stanford U., Graphic Master (c) | Normalized PageRank: bookrenter.com, Northeastern U., Sports Authority, Mizzou, old main (u)
Food — # likes: Bravo, Zagat, Eater.com, Wall Street J., Thrillist | PageRank: Bravo, Zagat, Thrillist, VisitPA, Wall Street J. | Normalized PageRank: Bravo, Foodspotting (c), Zagat, Britt r. (u), Wall Street J.
Outdoors — # likes: History Channel, Wall Street J., Explore Chicago, Windows Live P.G., VisitPA | PageRank: History Channel, Wall Street J., Explore Chicago, Bravo, Windows Live P.G. | Normalized PageRank: History Channel, Wall Street J., Explore Chicago, Healthy Pools (u)
Nightlife — # likes: MTV, Bravo, LogoTV, Thrillist, History Channel | PageRank: MTV, Bravo, Thrillist, History Channel, LogoTV | Normalized PageRank: Bravo, MTV, Thrillist, History Channel, LogoTV
Professional — # likes: History Channel, Wall Street J., Explore Chicago, National Post, USF Athletics (u) | PageRank: History Channel, Wall Street J., Explore Chicago, VisitPA, Huffpost | Normalized PageRank: History Channel, oregonvotes.org (u), Graphic Master (c), Wall Street J., Liam B. (u)
Residences — # likes: History Channel, Greensboro NC (u), Huffpost, Lee (c), Mizzou | PageRank: History Channel, Huffpost, Lee (c), Mizzou, Supermodelme | Normalized PageRank: David K. (u), Beth D. (u), Lee (c), sarah w. (u), capture the market (u)
Shops — # likes: Bravo, VisitPA, Graphic Master (c), AT&T, Mazda | PageRank: Bravo, VisitPA, Graphic Master (c), History Channel, Mazda | Normalized PageRank: Bravo, A bag's life (u), (red), cew (u), VisitPA
Travel — # likes: History Channel, National Post, Wall Street J., KLM | PageRank: History Channel, Wall Street J., Bravo, KLM, Explore Chicago | Normalized PageRank: History Channel, Wall Street J., KLM, AT&T, Tubus (u)

We note that the baseline method generates a ranked list different from those generated by the other two methods, since the values of τ are very far from 1 for all graphs. The Normalized PageRank has a higher agreement with the traditional PageRank than with the baseline method. Considering the rankings built for users across all categories (general), four of the five most influential users appear in all three rankings (see Table 4.8). However, the two PageRank-based strategies identify one user (Zagat) that was not listed among the top-5 users by the baseline. Moreover, some of the top-5 most influential users across all categories do not appear in the top-5 of some venue categories, which suggests that some of them are more influential in certain categories. For instance, the MTV user was identified by the three methods as a top influential in the Nightlife category. We also observe that different methods identify different users as top-5 in some categories. For example, the Foodspotting user is among the top-5 most influential users in the Food category only under the Normalized PageRank method, while Greensboro NC (Residences) and History Channel (Shops) are listed only by the baseline and traditional PageRank approaches, respectively. We also note that the differences between the top-5 most influential users identified by each method are larger for categories with a smaller number of users who post tips (e.g., Education, Residences, and Professional). We also observe that most of the top-5 influential users in any list are brand users.
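The agreement scores reported in Table 4.7 are Kendall τ coefficients; a minimal sketch of how such a coefficient can be computed over two rank lists (the tau-a variant, without tie handling; the user names are invented):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings of the same items, given as
    dicts mapping item -> position (1 = most influential).  No tie handling."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) / 2
    return (concordant - discordant) / n_pairs

a = {"u1": 1, "u2": 2, "u3": 3, "u4": 4}
reversed_a = {k: 5 - v for k, v in a.items()}
tau_same = kendall_tau(a, a)           # identical lists -> 1.0
tau_rev = kendall_tau(a, reversed_a)   # fully reversed -> -1.0
```

The values near 0 in the "# likes vs. …" columns of Table 4.7 thus indicate essentially no agreement with the baseline ranking, while values around 0.5 indicate moderate agreement between the two PageRank variants.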
This is expected, since these users are engaged in promoting their businesses, providing many tips at a large variety of venues. Their presence in the top positions of the rankings suggests that they are succeeding in attracting attention to their posted content. In addition, we note that some methods are able to identify ordinary users as influential in some of the Foursquare categories. For instance, the Greensboro NC user is listed by the baseline method among the top-5 most influential users in the Residences category. Our results show that there are differences between the general ranking lists (all categories) and the lists generated for each venue category. Thus, users who are influential in general are not as influential in certain categories. This observation is important for the task of automatically predicting a tip's popularity, since a tip posted by a user in a venue category in which he is influential/an expert has more potential to become popular. Moreover, while most of the top-5 listed users are brand users, the PageRank-based approaches are able to identify ordinary users and celebrities as top influentials in some categories. Finally, the proposed method offers a few other contributions that are not entirely related to our scope of popularity prediction. The ranking generated by the baseline is more susceptible to attacks by malicious users than the PageRank-based approaches, since malicious users can easily create many identities to inflate the number of likes received by a user. Even though PageRank is more robust to attacks than the baseline, there are studies [Du et al., 2007; Adal et al., 2012] that describe methods for artificially boosting PageRank scores (e.g., link farms). Moreover, the presented techniques can be exploited to improve search and content recommendation services (e.g., by prioritizing content posted by influential users), as well as the detection of malicious users.
Foursquare has a limited number of moderators who are responsible for filtering malicious activities. The Normalized PageRank method minimizes the recommendation of spammers, since it takes into account the user's relative influence and the amount of feedback received from other users. We believe that this methodology can be applied to other social networks where influence is measured based on the interaction between users (e.g., recommendation of posts on Facebook, photos on Instagram, etc.). As future work, an interesting direction is the comparison with other link-based ranking algorithms, for example the HITS method [Kleinberg, 1999].

4.4 Dynamics of Tip Popularity Evolution

In this section, we analyze the dynamics of tip popularity in Foursquare. We start by discussing how the number of likes of a tip evolves over time (Section 4.4.1), and how it is affected by the social network of the tip's author (Section 4.4.2). We then analyze tip popularity at and around the peak (Section 4.4.3), and assess to which extent the rich-get-richer phenomenon is present in the popularity evolution of tips (Section 4.4.4). For these analyses, we used dataset 2. For the sake of analyzing tip popularity dynamics, we group tips with at least one like by breaking their popularity distribution (Figure 4.6a) into 10 slices, each containing tips whose popularity falls into a certain range of the distribution. For example, slice 0-10% contains the top-10% most popular tips, while slice 10%-20% contains the tips whose popularity falls between the 10th and 20th percentiles of the popularity distribution. This partitioning is the same used by van Zwol [2007] for analyzing Flickr photos, since it is more balanced and less biased towards the more popular tips. Table 4.9 shows the number of tips as well as the total number of likes per slice.
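The percentile-based partitioning described above can be sketched as follows (the like counts are hypothetical; the real slice boundaries come from the full popularity distribution):

```python
def popularity_slices(likes_per_tip, n_slices=10):
    """Split tips with at least one like into n_slices equal-size groups,
    ranked by popularity: slice 0 holds the top-10% most liked tips."""
    ranked = sorted((t for t, l in likes_per_tip.items() if l >= 1),
                    key=lambda t: likes_per_tip[t], reverse=True)
    n = len(ranked)
    return [ranked[s * n // n_slices:(s + 1) * n // n_slices]
            for s in range(n_slices)]

# Toy data: 20 tips, "t1" the most liked (20 likes) down to "t20" (1 like).
tips = {f"t{i}": 21 - i for i in range(1, 21)}
slices = popularity_slices(tips)
```

Each slice then holds the same number of tips, so comparisons across slices are not dominated by the very skewed popularity distribution.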
We also examine the fraction of likes coming from the social network (friends and followers) of the user who posted the tip (i.e., the tip's author). Note that tips with no likes are excluded from these slices.

Table 4.9: Distribution of Likes for Groups of Tips

Slice    | # of Tips | Total # of Likes | % Social Likes | Group
0-10%    | 23,746 | 202,804 | 30.8% | G1
10-20%   | 23,746 |  72,824 | 48.4% | G2
20-30%   | 23,746 |  47,492 | 49.0% | G3
30-40%   | 23,746 |  47,492 | 49.0% | G3
40-50%   | 23,746 |  24,163 | 48.2% | G4
50-60%   | 23,746 |  23,746 | 49.1% | G4
60-70%   | 23,746 |  23,746 | 48.5% | G4
70-80%   | 23,746 |  23,746 | 48.2% | G4
80-90%   | 23,746 |  23,746 | 48.5% | G4
90-100%  | 23,750 |  23,750 | 48.4% | G4

Table 4.9 also shows the percentage of likes coming from the social network, referred to as social likes, for tips in each slice. We note that for all slices but the first, almost half of the likes received by tips come from the user's social network, highlighting the importance of friends and followers to the popularity of those tips. In contrast, for the most popular tips, the fraction of social likes is smaller (31%), suggesting that most likes probably come from venue visitors. We further analyze the importance of the social network to tip popularity in Section 4.4.2.

Figure 4.13: Distribution of Tip Popularity over Time. (a) Fraction of tips that received at least one like; (b) fraction of total likes.

We aggregate the slices into 4 major groups, as shown in Table 4.9. Groups 3 and 4 contain tips that received, on average, 2 and 1 likes, respectively. We analyze tip popularity separately for each slice. However, as the same conclusions hold for tips in different slices of the same group, we present results for each group only.
4.4.1 Popularity Evolution

We start by analyzing how the popularity of tips in each group of slices defined in Table 4.9 evolves over time. We focus on the first six months after the tip is posted. Figure 4.13a plots the fraction of unique tips in each group that received at least one like within the first x hours (h), weeks (w) or months (m) after posting time. We observe that within the first 48 hours, 29% of the tips in the most popular group (G1) had already received at least one like, while within one and two months this fraction grows to 80% and 92%, respectively. That is, 20% of the top-10% most popular tips take more than one month to attract their first likes. This slow popularity evolution is even clearer for tips in the other (less popular) groups. Figure 4.13b shows the cumulative fraction of the total number of likes (as observed in our dataset) received by tips in each group over time. Note that, for all four groups, between 41% and 48% of the likes are received more than 2 months after posting time. Thus, in general, tips tend to live long in the system, presenting a gradual increase of interest. Indeed, tip popularity evolves much more slowly than that of other types of content, even for tips that end up becoming very popular. For example, news articles have a very short lifespan [Tatar et al., 2014], acquiring all comments within the first day of publication, while a large fraction of the views of Flickr photos are generated within the first two days after upload [van Zwol, 2007]. In contrast, we here find a significant fraction of tips that can take quite a few months to attract likes and become popular. A longer lifecycle was also observed in the acquisition of fans by Flickr photos [Cha et al., 2009].
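The fractions plotted in Figure 4.13a can be derived from the delay until each tip's first like; the following is a minimal sketch with invented delays (not the actual dataset):

```python
def fraction_liked_by(first_like_hours, horizons):
    """Fraction of tips whose first like arrived within each horizon (hours).
    first_like_hours maps tip -> hours from posting to first like."""
    n = len(first_like_hours)
    return {h: sum(d <= h for d in first_like_hours.values()) / n
            for h in horizons}

# Invented delays: first likes after 2h, 30h, ~3 weeks and ~3 months.
delays = {"t1": 2, "t2": 30, "t3": 500, "t4": 2200}
frac = fraction_liked_by(delays, horizons=[48, 24 * 30, 24 * 180])
# Within 48h half of these tips have a like; within ~6 months, all of them.
```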
We further analyze the popularity evolution of tips in each group by showing in Figure 4.14 the curves of the 10th and 90th percentiles, as well as the median, of the number of likes over time during the first month after the tip was posted. For all groups, the 10th percentile curve is equal to zero throughout the whole period, implying that 10% of the tips in each group did not receive any like within their first month in the system. Around half of the most popular tips (G1) start receiving likes 7 days after posting time, achieving only 20% of their total likes after a month. For the second most popular group (G2), we note that half of the tips start receiving likes after 15 days. In contrast, tips in groups G3 and G4 take more than 20 and 30 days, respectively, to start attracting likes. We also analyze the amount of time it takes for a tip to receive at least X% of its total likes, for X equal to 10, 50, 70, 90 and 100%. Figure 4.15 shows those distributions for the most popular tips (G1). Note that 57% of the tips in this group take at least 2 (3) months to reach 50% (70%) of their total observed popularity.

Figure 4.14: Distribution of Percentage of Likes Received During the First Month after Posting Time, for groups G1-G4 (median, 10th and 90th percentiles).

In sum, many tips do take a few months to attract likes, even those that end up being the most popular ones.
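Percentile curves like those of Figure 4.14 are computed column-wise over the per-tip cumulative-like curves; a simplified sketch (nearest-rank percentiles, toy data invented for illustration):

```python
def percentile_curves(curves, pcts=(10, 50, 90)):
    """Across-tip percentile curves.  Each entry of `curves` is one tip's
    cumulative fraction of likes per day; the result gives, per day, the
    requested percentiles over all tips (simple nearest-rank rule)."""
    days = len(curves[0])
    out = {p: [] for p in pcts}
    for d in range(days):
        column = sorted(c[d] for c in curves)
        for p in pcts:
            idx = min(len(column) - 1, int(p / 100 * len(column)))
            out[p].append(column[idx])
    return out

# Toy data: five tips observed for two days.
data = [[0.0, 0.0], [0.0, 0.5], [0.1, 0.6], [0.2, 0.8], [1.0, 1.0]]
pcurves = percentile_curves(data)
```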
The great variability in popularity evolution across tips, allied to the somewhat slow popularity dynamics observed in the figures above, motivates the use of prediction methods that exploit early popularity measurements. Yet, it also raises the question of whether the joint use of other features along with such measurements can improve prediction accuracy over exploiting only the latter (as in Szabo and Huberman [2010]; Pinto et al. [2013]). Similarly, the slow popularity evolution raises the question of how robust the prediction models are to long-term predictions. We address these questions in Chapter 6.4.

Figure 4.15: Distribution of Time Until x% of Total Likes are Received for the Most Popular Tips (G1).

4.4.2 The Role of the Social Network

The popularity evolution of a tip is directly related to how users find the tip: either by visiting the venue page or through activity notifications from their friends and followees. Thus, the number of likes received by a tip depends on a combination of its visibility and the interest of the social network of the tip's author and of other users. Recall that in Section 4.1.1 we briefly analyzed the role of the social network, concluding that the network influences tip popularity, but there we focused on aggregate popularity. Now, we assess the role of the social network of the tip's author in how the tip's popularity evolves over time, for tips in different popularity groups. To that end, we revisit Figure 4.13b by separating likes coming from the author's social network (social likes) from likes coming from other users (non-social likes). Figure 4.16 shows the cumulative fraction of likes, in both categories, for tips in each group.
Note that the author's social network has an important influence on the popularity of a tip throughout its lifetime: at least half of all likes received in any period of time (up to 6 months after posting) come from the author's social network, for tips in all four groups. This fraction is higher in the earlier periods after posting time, and tends to decrease with time as the tip becomes visible to other users (e.g., venue visitors). For example, social likes correspond to 62% of all likes received by the most popular tips (G1) in the first hour after posting time, decreasing to 54% after 6 hours. Interestingly, the social network seems to have an even more important role for the least popular tips. For example, for tips in G2, G3 and G4, social likes correspond to more than 70% of all likes received by a tip in its first week in the system. These results indicate that the social network of a tip's author may be responsible for boosting its popularity, particularly during the early periods after posting.

Figure 4.16: Social vs. Non-Social Likes: Distribution of Percentage of Likes Received over Time, for groups G1-G4.

As a consequence, they also suggest that it might be possible for a recently posted tip to become more popular than other tips that had already attracted many likes and thus gained visibility in the system.

4.4.3 Popularity Peak

We further analyze tip popularity evolution by focusing on the popularity peak. Considering the daily popularity time series of each tip, we define the peak k_{p_i} of tip p_i as the largest number of likes received by p_i on any single day. We then compute the time (in number of days) it takes for p_i to reach its popularity peak (in case of ties, we pick the first day with k_{p_i} likes). We also measure the fraction of the total likes p_i received at, before and after the peak. For this analysis, we focus on the most popular tips (G1). Figure 4.17a shows the cumulative distribution of the time until the popularity peak. Around 18% of the tips experience their popularity peak one day after posting time, and around 72% of the tips reach their popularity peak within a month of posting. This implies that most tips do not take long (less than a month) to reach their daily popularity peak. Yet, we observe that, for many tips, this peak represents only a small fraction of the total observed popularity. This is illustrated in Figure 4.17b, which presents the cumulative distributions of the median, 10th and 90th percentiles of the fraction of likes received at and after the peak day. As a complement, Figure 4.17c shows the cumulative distribution of the fraction of likes received before the peak day. As observed for other types of online content (e.g., videos and news [Crane and Sornette, 2008; Pinto et al., 2013; Tatar et al., 2014]), some tips do experience heavy bursts of popularity on the peak day: for 10% of the tips, the daily peak corresponds to at least 67% of their total popularity (see the 90th percentile curve in Figure 4.17b). However, for half of the tips (median curve), the peak corresponds to only 25% of all likes. Moreover, Figure 4.17c shows that most tips (82%) receive their first like on the peak day, and only a very small fraction of the tips (3.3%) receive more than 50% of their likes before the peak day.
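The peak-day statistics above can be computed directly from a tip's daily like series; a minimal sketch with an invented series:

```python
def peak_stats(daily_likes):
    """Peak day (first day with the maximum count) and the fractions of a
    tip's likes received before, at and after that day."""
    total = sum(daily_likes)
    peak = max(daily_likes)
    peak_day = daily_likes.index(peak)          # ties -> earliest day
    before = sum(daily_likes[:peak_day]) / total
    after = sum(daily_likes[peak_day + 1:]) / total
    return {"peak_day": peak_day, "before": before,
            "at": peak / total, "after": after}

# Invented series: a burst on day 3 followed by a long tail of likes.
stats = peak_stats([0, 1, 0, 5, 2, 2, 2, 2, 2, 2, 2])
```

In this toy series the peak day holds only a quarter of the likes, with most of the popularity arriving in the tail after the peak, mirroring the pattern described above.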
Thus, a large fraction of tips receive most of their likes after the peak day, suggesting, once again, that tips experience a slow popularity evolution.

Figure 4.17: Cumulative Distributions of Popularity Peak for the Most Popular Tips (G1). (a) Time until peak; (b) % likes at/after peak; (c) % likes before peak.

Contrasting our findings with the acquisition of fans by Flickr photos [Cha et al., 2009], we observe that both photo fans and tip likes tend to be acquired over a long period after posting/upload, compared to, for example, tweets. Also, as in Cha et al. [2009], we do not observe an exponential growth in popularity as suggested by existing models of information diffusion [Valente, 1995; Figueiredo et al., 2014a]. However, comparing our results (particularly Figure 4.14) with similar ones presented in Cha et al. [2009], we find that tip popularity seems to increase even more slowly than photo fans on Flickr. For example, we do not observe a period of steady linear popularity growth during the first month, as observed for photos.

4.4.4 The Rich-Get-Richer Phenomenon

Most online systems offer their users the option to see different pieces of content (or objects) sorted by posting date or by some estimate of popularity. The adopted strategy may have a direct impact on the visibility of different objects. For example, by displaying objects sorted in decreasing order of popularity, a website may contribute to further increasing the popularity of an object that is already very popular, a phenomenon known as rich-get-richer [Barabasi and Albert, 1999].
Indeed, prior work has already suggested that the popularity of some types of online content (e.g., YouTube videos) evolves according to this phenomenon [Borghol et al., 2012; Szabo and Huberman, 2010]. Foursquare tips may be sorted by the number of likes (in increasing/decreasing order) or by posting time, but only the former is available in the mobile application. Thus, we here assess to which extent the rich-get-richer phenomenon can explain tip popularity evolution. The rich-get-richer, or preferential attachment, models state that the probability of a tip p_i experiencing an increase in popularity is directly proportional to p_i's current popularity [Barabasi and Albert, 1999]. As in Borghol et al. [2012], we consider a model where the probability that a tip p_i with l_{p_i} likes receives a new like is a power law, i.e., Prob(p_i) ∝ l_{p_i}^α. We analyze the rich-get-richer effect using a univariate linear regression to observe the impact of the number of likes of a tip after a monitoring time t_r (predictor variable) on the total number of likes of the tip at target time t_r + δ (response variable), using log-transformed data. The case α = 1 corresponds to linear preferential selection [Barabasi and Albert, 1999], and α > 1 implies a case where the rich get much richer with time.

Table 4.10: Rich-get-Richer Analysis: Coefficients α (and 95% Confidence Intervals) and R² of Linear Regressions from (log) Popularity at t_r to (log) Popularity at t_r + δ.

                     |       Tips in G1      |        All Tips
t_r + δ  | t_r   |  α            | R²   |  α            | R²
1 month  | 1 day | 0.763 ± 0.016 | 0.26 | 0.822 ± 0.006 | 0.21
1 month  | 1 wk  | 0.838 ± 0.009 | 0.57 | 0.887 ± 0.004 | 0.49
2 months | 1 day | 0.594 ± 0.017 | 0.17 | 0.673 ± 0.007 | 0.13
2 months | 1 wk  | 0.681 ± 0.011 | 0.40 | 0.753 ± 0.004 | 0.31
2 months | 1 mo  | 0.834 ± 0.006 | 0.74 | 0.856 ± 0.003 | 0.65
6 months | 1 day | 0.309 ± 0.015 | 0.07 | 0.397 ± 0.007 | 0.05
6 months | 1 wk  | 0.394 ± 0.010 | 0.20 | 0.489 ± 0.005 | 0.16
6 months | 1 mo  | 0.504 ± 0.008 | 0.40 | 0.562 ± 0.003 | 0.33
The sublinear case (α < 1) results in a (stretched) exponential popularity distribution, which reflects a much weaker presence of the rich-get-richer effect [Krapivsky et al., 2000]. We perform this analysis separately for tips in each popularity group as well as for all tips. Table 4.10 shows the coefficients α (along with the corresponding 95% confidence intervals) and the coefficients of determination R² of the univariate regressions performed using various predictor and response variables (defined by different values of t_r and δ), for tips in G1 as well as for all tips. For all considered cases, we find α < 1, which indicates an exponential popularity evolution that could result in a much less skewed popularity distribution than suggested by pure (linear) rich-get-richer dynamics. This has also been observed for a set of YouTube videos [Borghol et al., 2012], although the values of α found in that case (0.93 on average) are much larger than those we observed in all considered scenarios. This suggests that the rich-get-richer effect might be weaker for Foursquare tips than for YouTube videos, even considering all tips jointly. It also implies that other factors might strongly impact tip popularity. Indeed, as discussed in Section 4.4.2, the social network of the tip's author is responsible for a significant fraction of the likes received by the tip, and thus might contribute to reducing the impact of the rich-get-richer effect. The univariate regression model has also been proposed as a means to predict the future popularity of YouTube videos and Digg stories [Szabo and Huberman, 2010]. This prediction strategy was motivated by a strong linear correlation observed between the (log-transformed) popularity of objects and earlier (also log-transformed) measures of user accesses.
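The exponent α and the R² reported in Table 4.10 come from an ordinary least-squares fit on log-transformed popularities; a self-contained sketch on synthetic data (not the thesis dataset):

```python
import math

def fit_alpha(early, final):
    """OLS of log(final) on log(early) popularity: returns the slope alpha
    (the rich-get-richer exponent) and the coefficient of determination R^2."""
    xs = [math.log(v) for v in early]
    ys = [math.log(v) for v in final]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Synthetic sanity check: if final popularity is exactly early**0.8,
# the fitted exponent is 0.8 with a perfect fit (R^2 = 1).
early = [1, 2, 4, 8, 16, 32]
final = [v ** 0.8 for v in early]
alpha, r2 = fit_alpha(early, final)
```

On real data the fit is noisy, which is exactly what the low R² values in Table 4.10 capture: early popularity explains only part of the long-term popularity.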
For example, Szabo and Huberman [2010] observed Pearson linear correlations above 0.90 between the popularity of Digg stories measured at 1 hour and at 30 days after upload, as well as between the popularity of YouTube videos measured at 7 and 30 days after upload. These correlations are stronger than those observed for tips. For example, the R² value of the regression from popularity at 1 week to popularity at 1 month is only 0.57 (for tips in G1) and 0.49 (for all tips), which correspond to linear correlations of 0.75 and 0.7, respectively (the R² is the square of the linear correlation between predictor and response variables). For shorter monitoring periods t_r or larger values of δ, the R² values are much lower, indicating that popularity at time t_r can explain only a small fraction of the total popularity acquired by the tip at t_r + δ. This result motivates the development of more sophisticated prediction models, such as those proposed in Chapter 6.4, which exploit other factors (e.g., characteristics of the user who posted the tip and of the venue where it was posted) to estimate the future popularity of a given tip.

4.5 Summary

In this chapter, we analyzed how Foursquare users behave when they interact with each other through tips and likes. This comprehension is important to derive useful insights about the design of prediction models, as well as to guide us in interpreting prediction results. Our analyses of selected features related to the three main entities that are key to the tip popularity prediction problem (user, venue and tip content) revealed that most of them exhibit very large variability, with great concentration on a few users, venues and tips. In particular, the very skewed distribution of the number of likes per tip brings extra challenges to the prediction task, as it leads to great imbalance in the training data. In the next chapter, we try to minimize the detrimental impact of such imbalance on prediction accuracy by employing undersampling on the training data.
Furthermore, we found that there are low correlations between the number of tips posted by a user and the number of likes the user gives and/or receives, implying that users who tip more do not necessarily receive or give more feedback on previous tips. Moreover, in Section 4.2, we identified four user profiles that differ in terms of their tipping behavior. Two profiles correspond to regular users who differ in terms of their levels of tipping activity in the system. A third profile consists of users who seem engaged in posting tips at a large variety of venues. These users, some of which are famous businesses and brands, typically receive a large amount of feedback from others regarding their tips. Finally, we identified a group of users characterized by posting tips containing links at many different venues. A manual inspection of a sample of these users confirmed them as potential spammers, since they posted tips that are unrelated to the venue. However, we also showed that some spammers do succeed in attracting the attention of many users. We also provided the first pieces of evidence of spamming activity in Foursquare. Spam has been observed in many other online social systems, including Facebook [Gao et al., 2010], YouTube [Benevenuto et al., 2009], and Twitter [Grier et al., 2010]. As a result, a number of efforts towards designing effective strategies to detect and remove spam from these systems are available [Costa et al., 2013; Aggarwal et al., 2013; Thomas et al., 2011]. Although it is debatable whether the kind of spamming activity we uncovered here corresponds to malicious/opportunistic acts that deserve punishment, we hope that our analyses serve as motivation for future discussions on the matter. In Section 4.3, we modeled user interactions through tips using a graph to identify the most influential users.
We proposed a variation of the traditional PageRank algorithm, previously applied to this purpose [Zhang et al., 2007; Weng et al., 2010], which is more adequate to the present context as it weighs the arcs by the number of tips posted by each node (user). Our method performed very similarly to PageRank overall, but it was able to identify some expert/influential users not indicated by the traditional method in some of the venue categories. These findings suggest that the category of the venue where a tip is posted must be taken into account in the tip popularity prediction task. Finally, in Section 4.4, we analyzed the tip popularity dynamics. Although prior work has tackled the popularity dynamics of various types of user generated content, we are not aware of any prior temporal analysis of online reviews. We found that most tips have a slow popularity evolution, acquiring most of their likes only after a few months, and that the social network of the tip's author plays an important role in drawing attention to the tip, particularly soon after posting time. We also found that most tips reach their daily popularity peak within a month in the system, although most of their likes are received after the peak. Moreover, compared to other types of content, we observed a weaker presence of the rich-get-richer phenomenon, indicating a lower correlation between the early and long-term popularity of a tip. This suggests that tip popularity prediction may require more sophisticated models, exploring other factors related to the tip besides its current popularity. In the next chapter, we explore several features analyzed here to design models to predict a tip's future popularity.
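As a rough sketch of the idea behind the weighted PageRank variation of Section 4.3, the power iteration below ranks nodes of a directed interaction graph whose arcs carry weights. The specific graph construction and weighting used in the thesis (arcs weighted by the number of tips posted by each node) are only approximated here; the data layout and parameter values are our assumptions:

```python
def weighted_pagerank(out_edges, damping=0.85, iters=100):
    """Power-iteration PageRank on a weighted directed graph.

    out_edges: {u: {v: weight}}, where the (hypothetical) weight of arc
    u -> v encodes interaction intensity. Returns {node: score}; the
    scores sum to 1.
    """
    nodes = set(out_edges) | {v for nbrs in out_edges.values() for v in nbrs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u, nbrs in out_edges.items():
            total_w = float(sum(nbrs.values()))
            if total_w == 0:
                continue
            for v, w in nbrs.items():
                # mass flows proportionally to arc weight, not uniformly
                nxt[v] += damping * rank[u] * (w / total_w)
        # nodes without out-arcs spread their mass uniformly
        dangling = sum(rank[u] for u in nodes if not out_edges.get(u))
        for u in nodes:
            nxt[u] += damping * dangling / n
        rank = nxt
    return rank
```

With uniform arc weights this reduces to the traditional PageRank, which is the baseline the thesis compares against.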
Chapter 5

Predicting the Popularity Ranking of a Set of Tips

In Chapter 4, we observed that tips have longer lifespans than other types of online content (e.g., tweets, photos), and that tip popularity dynamics may be more strongly influenced by factors other than simply their current popularity (e.g., the author's social network). We now further analyze this issue by assessing to which extent the relative popularity of a set of Foursquare tips can be predicted using only their popularity at prediction time, and to which extent the use of other attributes may improve prediction accuracy. The popularity of a tip is estimated as the number of likes received by the tip, which reflects the number of users who agreed with the tip's author (only approximately, though, as some users may have chosen not to click on "Like" regardless of their opinions). There are several different prediction tasks, depending mainly on the application domain and the user's interest. In this chapter, we model the prediction task as a ranking problem, which aims at ranking a group of tips based on their predicted popularity at a future time. A ranking of the most popular tips helps to summarize a large set of tips for a scenario of interest (e.g., a city, a venue), focusing on the most popular ones instead of looking at the tips individually. Another way of tackling the popularity prediction problem is to predict the popularity level of a single tip, formulating it as a classification task. We leave the discussion of the classification task to Chapter 6. In both chapters, we use dataset 2 to evaluate our proposed solutions. We first formally define our prediction task (Section 5.1), and present the ranking strategies (Section 5.2) and the features used as input to them (Section 5.3). We then discuss our experimental setup (Section 5.4), followed by our experimental results (Section 5.5).

5.1 Popularity Prediction Task

Our goal in this chapter is to develop models that take a set Pd of tips posted in the previous d time units (d ∈ (0,∞)) that meet a certain criterion c, and rank those tips according to their expected popularity, measured in terms of the total number of likes they will receive up to time tr + δ, where tr is the time when the ranking (i.e., prediction) is performed. Thus, δ defines the prediction window. Criterion c may be, for example, tips posted at venues of a given city and/or category (e.g., "Food"), or even at a given venue. An empty criterion implies no further constraint on the set of tips. Figure 5.1 illustrates the prediction scenarios considered in our study.

Figure 5.1: Monitoring Time Scheme (the set of tips is monitored up to time tr, when the ranking is performed; popularity is predicted for time tr + δ).

As in other prediction tasks, the prediction model is learned using a training set, which consists of a subset of tips along with associated information about the users who posted them, the venues where they were posted, and textual characteristics extracted from their content. The learned prediction model is then evaluated using a different set of tips (test set). Both training and test sets are built considering three different types of entities: a set P = {p1, ..., pK} of K tips, a set U = {u1, ..., uN} of N users (tip authors), and a set V = {v1, ..., vO} of O venues. Each tip is represented by a tuple (p, u, v), and each entity (p, u and v) has a set of attributes (or features) F associated with it. The features represent the inputs (predictors) associated with a given instance. There are also relationships between these sets of entities: a function L : P → V maps each tip pi to a unique venue vi, and an authorship function A : P → U maps each tip pi to a unique user ui.
Thus, given the input data (p, u, v), we want to learn a prediction model M that, for a set Pd of tips posted in the previous d time units, ranks those tips according to their expected popularity at tr + δ. A tip is represented as an f-dimensional real vector p over a feature space F built from the information in P, U and V. The model is thus a function M : Rf → R that maps a tip's feature vector to a numerical popularity (number of likes). Our proposed solution to this problem consists of: (1) determining the set of features used to represent the tips, and (2) applying a learning algorithm to predict the ranking of a set of tips (Pd), given d, tr and δ. Note that different tips in Pd may have been posted at different times within the time window [tr − d, tr]. Thus, we associate a posting time tpi with each tip pi in Pd. For evaluation purposes, we consider that each tip pi ∈ Pd is labeled with a numeric value that represents the number of likes received by pi in the time interval [tpi, tr + δ] (i.e., the true popularity acquired by pi up to tr + δ), as further discussed in Section 5.4. The values of the features of a tip pi are computed considering all the information available up to the time when the ranking is performed (tr). The choice of criterion c allows for different scenarios where the tip ranking problem becomes relevant. One scenario is that of a user who is interested in quickly finding tips with a greater potential of becoming popular, and thus of containing valuable information, posted at any venue in her home city. A different scenario is that of a user who is particularly interested in retrieving tips regarding restaurants in her home city (or neighborhood). A business owner can also benefit from a ranking restricted to tips posted at venues of a specific category to get feedback about her business and about her competitors.
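The entities and mappings of this formulation can be sketched with minimal types. This is a hypothetical layout for illustration, not the thesis's actual data structures:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Tip:
    tip_id: str
    author_id: str              # authorship function A: P -> U
    venue_id: str               # function L: P -> V
    posted_at: float            # posting time t_pi within [tr - d, tr]
    features: List[float] = field(default_factory=list)  # f-dimensional vector
    likes: int = 0              # label: likes accumulated up to tr + delta

def rank_tips(tips: List[Tip], model: Callable[[List[float]], float]) -> List[Tip]:
    """Rank a candidate set P_d by a learned model M: R^f -> R."""
    return sorted(tips, key=lambda t: model(t.features), reverse=True)
```

Any function from feature vector to a real score can play the role of M here; the regression model of Section 5.2 is one instance.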
Also, changes in the current and future tip popularity rankings can help with indirect analyses, such as estimating the influence of certain users whose tips get promoted in the future, and the potential market share gains or losses for certain venues or venue categories.

5.2 Ranking Strategies

Recall that our goal is to assess to which extent using only the tips' current popularity ranking is enough to accurately predict their ranking at a future time. Thus, we consider two ranking strategies here. The first approach simply uses the ranking of the tips at prediction time (tr) as an estimate of their ranking at the future time tr + δ. If the popularity ranking is stable, this approach should lead to perfect predictions. Thus, by analyzing the effectiveness of this approach, we are indirectly assessing the stability of the tip popularity ranking. We use this approach as our baseline. In order to assess the potential benefit of exploiting other factors in this prediction task, we consider a second approach that combines multiple features. To that end, we rely on an ordinary least squares (OLS) multivariate regression model to predict the popularity of each tip pi in Pd at time tr + δ, and then rank the tips by their predictions. In this approach, the logarithm of the number of likes of a tip pi, Rt, is estimated as a linear function of k predictor variables or features (presented in the next section), i.e.:

Rt = β0 + β1x1 + β2x2 + · · · + βkxk.

We note that various other algorithms could be used to exploit multiple features to predict the popularity ranking of a set of tips. Indeed, we also experimented with Support Vector Regression (SVR) [Drucker et al., 1997] with a radial basis function kernel, as well as with a state-of-the-art learning-to-rank algorithm based on Random Forests [Breiman, 2001].
However, when applied with the same set of features, their results are similar to (or, in some cases, even worse than) those obtained with the simpler OLS regression (as we will see in Chapter 6, we also found OLS to be as good as, if not better than, SVR when applied to the different task of predicting the popularity level of a given tip). Thus, in order to avoid hurting readability and to focus our discussion on the benefits of adding the other features, we present only the OLS results.

5.3 Tip Features

Having presented our prediction models, we now turn to the predictor variables x1, x2, · · · , xk. These variables are features of the tip pi whose popularity we intend to predict. One of the primary goals of this dissertation is to understand which factors impact the popularity achieved by a tip. This involves defining a set of features and then assessing their relative importance to the tip's popularity. We explore several features related to the three central entities (tip, user and venue) which, intuitively, should be related to the tip's popularity. Some of these features have also been explored to discuss the helpfulness of online reviews [Kim et al., 2006; Zhang and Varadarajan, 2006; O'Mahony and Smyth, 2009] and to predict the ratings of (long) reviews [Hsu et al., 2009; Lu et al., 2010; Siersdorfer et al., 2010]. As a type of online content, tips can also be evaluated for their credibility as a source of information. Fogg et al. [2001] described credibility as a perceived quality composed of multiple dimensions. The authors investigated how seven web site design elements, namely Real-World Feel, Ease of Use, Expertise, Trustworthiness, Tailoring, Commercial Implications, and Amateurism, impact a site's credibility. The study revealed that four of these elements (Real-World Feel, Ease of Use, Expertise, and Trustworthiness) have a relevant impact on increasing credibility. Some of our proposed features are inspired by these elements. For example, the Real-World Feel element refers to aspects indicating that a web site has a physical location and can be contacted.
In our specific context, it indicates to the users that real people wrote the tip and can be reached for questions. Expertise is related to how respected the user is in the system, which can be reflected by the number of likes received by that user. The Ease of Use element did not fit any of our features, since it reflects usability aspects, which are the same for all Foursquare users. Finally, Trustworthiness was mapped into features that characterize reputable users and venues. For the ranking task, we exploit k = 53 features related to the user ui who posted tip pi, the venue vi where pi was posted, and the textual content of pi. The values of these features are computed at the time when the ranking is performed (tr). Our selected features, which are grouped into user, venue and tip's content features, are introduced in the next sections.

5.3.1 User Features

User features describe the tip author's past behavior, aiming to identify key behavioral attributes that may impact the tip's future popularity. In other words, our goal is to identify factors that are useful to assess the user's credibility. The full set of user features we consider is the following:

• Number of tips (user_tip_num): total number of tips posted by the user up to tpi + ε.

• Number of distinct venues (user_tipped_venues): number of venues where the author posted tips up to time tpi + ε.

• Number of received likes (user_total_likes, inspired by Fogg's Expertise element): total number of likes received by tips previously posted by the author up to time tpi + ε. Variations of this feature also included in the feature set are the median (user_median_likes), average (user_avg_likes) and standard deviation (user_std_likes) of the same metric.
• Number of given likes (user_given_likes): total number of likes given by the tip's author up to time tpi + ε.

• Number of friends/followers (user_sn_size): number of Foursquare users who follow or are friends of the tip's author.

• Social likes (user_sn_likes): fraction of all likes received by the author that come from his/her social network (i.e., friends and followers) up to time tpi + ε.

• Tips by the social network (user_tip_by_sn): total number of tips posted by the author's social network up to time tpi + ε. Other variations included are the median (user_median_tip_by_sn) and average (user_avg_tip_by_sn) of the same metric.

• Likes by the social network (user_like_by_sn): total number of likes given by the author's social network (to any tip) up to time tpi + ε. Other variations included are the median (user_median_like_by_sn), average (user_avg_like_by_sn) and standard deviation (user_std_like_by_sn) of the same metric.

• User visibility at venue (user_venue_visibility_total): fraction of all likes received by the tip's author that are associated with tips posted at the same venue as the current tip pi, but after pi was posted. This feature tries to capture an estimate of the user's visibility at the venue where the tip was posted. Other variations included are the median (user_venue_visibility_median), average (user_venue_visibility_avg) and standard deviation (user_venue_visibility_std) of the same metric.

• User type (user_type): user category defined by Foursquare. Celebrity and brand accounts may represent respectable people or organizations, an aspect also listed by Fogg et al. [2001].

• Number of mayorships (user_mayorships_num): total number of mayorships won by the author.

• Mayor (user_mayor, inspired by Fogg's Expertise element): binary feature indicating whether the tip's author was a mayor of the venue where the tip was posted.

5.3.2 Venue Features

The second set of features considered is related to the venue where the tip was posted.
The selected features capture the activity at the venue or its visibility to other users. For example, a tip may have a higher chance of becoming popular if it is posted at a venue that has more visibility. Another piece of information that we try to capture is the strategy adopted by Foursquare to display the tips posted at the same venue, which may also impact the visibility of a tip. Foursquare tips may be sorted by their number of likes or by posting time, but only the former option is available in the mobile application. Because of this placement, the top tips may accumulate more and more likes, while recent tips may be rarely read and thus not rated, a manifestation of the rich-get-richer effect. Indeed, as we discussed in Chapter 4, the rich-get-richer effect on Foursquare is present, but it is weaker than for other types of content (e.g., videos and photos). The following venue features are considered (the first four, inspired by Fogg's Trustworthiness element, can be interpreted as estimates of the venue's popularity, which can reflect recommendation by other users):

• Number of tips (venue_tip_num): total number of tips posted at the venue up to tpi + ε.

• Number of received likes (venue_total_likes): total number of likes received by all tips previously posted at the venue until tpi + ε. Other variations of this metric included are the median (venue_median_likes), average (venue_avg_likes) and standard deviation (venue_std_likes).

• Number of check-ins (venue_cks_num): total number of check-ins at the venue.

• Number of visitors (venue_visitors_num): total number of unique visitors.

• Verified venue (venue_verified, inspired by Fogg's Real-World Feel element): binary feature indicating whether the tipped venue was verified by Foursquare. A venue is verified when it has been claimed by a real business owner, and thus exists outside the Internet, in the real world.

• Category (venue_category): venue category defined by Foursquare.
• Position in the ranking by likes: position of the tip in the ranking of the venue's tips sorted by number of received likes, in both ascending (venue_like_rk_pos_asc) and descending (venue_like_rk_pos_dsc) order.

• Position in the ranking by date: position of the tip in the ranking of the venue's tips sorted by posting date, in both ascending (venue_date_rk_pos_asc) and descending (venue_date_rk_pos_dsc) order.

5.3.3 Tip's Content Features

Finally, we consider various features related to the textual content of the tip. Some of these features are the number of characters (tip_char_num), the number of words (tip_wrd_num), and the number of URLs or e-mail addresses (tip_url_num) in the content of the tip, which were analyzed in Section 4.1.3. Note that, by considering tip_char_num and tip_wrd_num, we aim at analyzing to what extent the size of the tip may impact its popularity. We also consider various other features that capture linguistic and semantic characteristics of the textual content, as discussed next. A number of studies [Liu et al., 2008; Lu et al., 2010; O'Mahony and Smyth, 2010; Wagner et al., 2012] have shown that linguistic style can be a good indicator of the utility/helpfulness of a review or of the quality of other user generated content [Agichtein et al., 2008; Chen et al., 2011; Dalip et al., 2011; Momeni et al., 2013].
As in [Liu et al., 2008; Lu et al., 2010; Momeni et al., 2013], we have chosen to model syntactic features using the Part-Of-Speech (POS) tags of the words in the tip's text. In this step, each word is assigned a label that represents its role in the grammatical context (e.g., noun, adjective, verb). For our experiments, we used the Stanford Part-of-Speech tagger [Stanford NLP Group, 2012], which uses probabilistic methods to build parse trees for sentences, aiming at representing their grammatical structure. Thus, we parse each tip and count the number of words with each tag. We then divide each count by the total number of words in the tip, for normalization purposes. The features created based on the POS tags are shown in Table 5.1.

Table 5.1: Tip's Syntactic Content Features.
• tip_pos_nn: ratio of nouns
• tip_pos_adj: ratio of adjectives
• tip_pos_adv: ratio of adverbs
• tip_pos_comp: ratio of comparatives
• tip_pos_ver: ratio of verbs
• tip_pos_fw: ratio of foreign words
• tip_pos_num: ratio of numbers
• tip_pos_sup: ratio of superlatives
• tip_pos_sym: ratio of symbols
• tip_pos_pp: ratio of punctuation symbols

Furthermore, we also included three scores (positive, negative and neutral) that capture the tip's sentiment. These scores are computed using SentiWordNet [Esuli and Sebastiani, 2006], an established sentiment lexicon for supporting opinion mining in English texts. SentiWordNet is a lexical resource built on top of WordNet [Fellbaum, 1998], a lexical database of English that groups nouns, verbs, adjectives and adverbs into sets of synonyms, each expressing a distinct concept. In SentiWordNet, each term is associated with a numerical score in the [0, 1] range for positive, negative and objective (neutral) sentiment.
We compute the scores of a tip (tip_pos_score, tip_neg_score and tip_neu_score) by averaging the corresponding scores over all words in the tip that appear in SentiWordNet. We adopted SentiWordNet because it proved to be the best and the cheapest method among several state-of-the-art supervised strategies to detect the polarity of tips [Moraes et al., 2013b]. To handle negation, we adapted a technique proposed by Pang et al. [2002] that reverses the polarity of the words between a negation word ("no", "didn't", etc.) and the next punctuation mark (another option would be to use the negation detection tool of the Stanford POS tagger). Since different tips in the set Pd may have been posted at different times, we also add the age of the tip (in hours since posting time tpi) and the number of likes it has already received (tip_age_hours and tip_likes_current, respectively). All features are summarized in Table 5.2.

5.4 Experimental Setup

We build two scenarios to evaluate the prediction strategies: ranking all tips recently posted at venues located in New York, the city for which we have the largest number of tips, and ranking tips posted at New York venues of a specific category, Food (also the largest category). Other scenarios, such as ranking tips posted at a single venue, are also possible; however, the highly skewed distribution of tips per venue leads to severe data sparsity, which poses a challenge to the training of the regression model. In both scenarios, we consider only tips posted in the previous month (i.e., d = 30 days), and produce rankings based on their predicted popularity δ days later. We compare the effectiveness of both prediction strategies for various values of δ. Table 5.3 summarizes these two datasets, presenting the total numbers of tips, venues and users in each of them (the two rightmost columns are discussed below). As we did for the classification task, we split the tips chronologically into training and test sets. Figure 5.2 illustrates the chronological split used. For comparison purposes, we also evaluate the baseline only on the test sets.
Table 5.2: Complete Set of Features for Tip Popularity Prediction

User features:
• user_tip_num: total number of tips
• user_tipped_venues: number of distinct venues
• user_total_likes: total number of received likes*
• user_given_likes: total number of given likes
• user_sn_size: number of friends/followers
• user_sn_likes: social likes
• user_tip_by_sn: total number of tips by the social network*
• user_like_by_sn: total number of likes given by the social network*
• user_venue_visibility_total: user visibility at venue*
• user_type: Foursquare user category
• user_mayorships_num: total number of mayorships
• user_mayor: whether the author was mayor of the venue

Venue features:
• venue_tip_num: total number of tips
• venue_total_likes: total number of received likes*
• venue_cks_num: total number of check-ins
• venue_visitors_num: total number of visitors
• venue_verified: whether the venue was verified
• venue_category: Foursquare venue category
• venue_like_rk_pos_asc: like ranking position in ascending order
• venue_like_rk_pos_dsc: like ranking position in descending order
• venue_date_rk_pos_asc: date ranking position in ascending order

Content features:
• tip_likes_current: total number of likes received until time tr
• tip_age_hours: hours since posting time until time tr
• tip_char_num: total number of characters
• tip_wrd_num: total number of words
• tip_url_num: total number of URLs or e-mail addresses
• tip_pos_nn: fraction of nouns
• tip_pos_adj: fraction of adjectives
• tip_pos_adv: fraction of adverbs
• tip_pos_comp: fraction of comparatives
• tip_pos_ver: fraction of verbs
• tip_pos_fw: fraction of non-English words
• tip_pos_num: fraction of numbers
• tip_pos_sup: fraction of superlatives
• tip_pos_sym: fraction of symbols
• tip_pos_pp: fraction of punctuation
• tip_pos_score: positive tip score
• tip_neg_score: negative tip score
• tip_neu_score: neutral tip score

* Median, average and standard deviation are also included.

Table 5.3: Overview of Datasets and Scenarios of Evaluation
Scenario | # of tips | # of users | # of venues | # of tips in training sets | Avg. # of tips in test sets
NY | 169,393 | 55,149 | 31,737 | 516 | 4,697.87
NY Food | 81,742 | 32,961 | 8,927 | 244 | 2,365.0

Figure 5.2: Temporal Data Split into Train and Test Sets (train set from 12/01/10 to 12/30/10; test sets thereafter, up to 07/26/11).

The training set is composed of all tips posted from December 1st to 30th, 2010. These tips are used to learn the (regression-based) ranking model. We assume the ranking of the training instances is done on December 30th, and thus use the total number of likes received by these tips at the target date (i.e., δ days later) as the ground truth to build the regression model. Recall that the distribution of the number of likes per tip is highly skewed towards very small numbers of likes (Chapter 4), which might bias the regression model and ultimately hurt its accuracy (great imbalance in the training set is known to have a detrimental impact on the effectiveness of classification and regression algorithms). Thus, we adopt the following approach, the same under-sampling strategy used for the classification task, to reduce this skew. We group tips in the training set according to a threshold ω for the number of likes received by the tip at the target date. Two classes are defined: all tips with at least ω likes are grouped into the high popularity class, and the others are grouped into the low popularity class. We then build balanced training sets according to the two popularity classes by performing under-sampling (for illustration, the original training set for the NY scenario had 5,225 tips in the low popularity class and only 258 tips in the other, smaller class). We repeat this process 5 times, thus building multiple (balanced) training sets and allowing us to assess the variability of our results. We note that this under-sampling approach (and threshold ω) is applied only to the training set. The test sets (described next) remain unchanged (imbalanced). Table 5.3 (5th column) presents the total number of tips in the training sets for each scenario. We then use tips posted from December 31st until February 27th, 2011 to build 30 different test sets, as follows.
Since tips can be continually liked, the predicted ranking may become stale. Thus, we evaluate the effectiveness of the ranking methods by using them to build a new ranking by the end of each day (starting on January 29th), always considering the tips posted in the previous d = 30 days. Thus, 30 test sets are built by considering a window of 30 days and sliding it 1 day at a time, 30 times. Table 5.3 (6th column) shows the average number of tips in each test set. For each test set, we report the average results produced over all 5 training sets, along with the corresponding 95% confidence intervals. For both training and test sets, the features of each tip are computed using all data collected up to the time when the ranking is performed (tr), including (for the regression model) information associated with tips posted before the beginning of each training set. Moreover, feature values are computed by first applying a logarithmic transformation to the raw numbers, to reduce their large variability, and then scaling the results to between 0 and 1. We note that, in order to have enough historical data about the users who posted tips, for both training and test sets we consider only tips posted by users with at least five tips. We determine the best parameters of the regression models by minimizing the least squared errors of the predictions for the candidate tips in the training set.
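The under-sampling step used to balance the training sets (Section 5.4) can be sketched as follows. This is a minimal version; the tip representation, the seed, and the exact sampling details are our assumptions:

```python
import random

def balanced_training_sets(tips, omega, n_sets=5, seed=0):
    """Build n_sets balanced training samples by under-sampling.

    tips: list of (features, n_likes) pairs. Tips with at least omega
    likes form the high popularity class; the rest form the low
    popularity class. Only training data is balanced; the test sets
    stay imbalanced, as in the text.
    """
    high = [t for t in tips if t[1] >= omega]
    low = [t for t in tips if t[1] < omega]
    minority, majority = (high, low) if len(high) <= len(low) else (low, high)
    rng = random.Random(seed)
    samples = []
    for _ in range(n_sets):
        # keep all minority tips, sample an equal number of majority tips
        balanced = minority + rng.sample(majority, len(minority))
        rng.shuffle(balanced)
        samples.append(balanced)
    return samples
```

Repeating the sampling n_sets times yields the multiple balanced training sets used to assess the variability of the results.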
We evaluate each ranking method by computing the Kendall τ rank distance between the top-k tips in the ranking produced by the method and the top-k tips in the ideal ranking, defined by the number of likes accumulated by each tip until tr + δ (i.e., the tip's label). We refer to it as Kτ@k. Since we are comparing two top-k lists (τ1 and τ2), we use a modified Kendall τ metric [Konagurthu and Collier, 2013] that uses a penalty parameter p, with 0 ≤ p ≤ 1, to account for the distances between non-overlapping tips in τ1 and τ2 (we use p = 0.5, as recommended by Fagin et al. [2003]). The modified Kendall τ is defined as follows:

Kτ(τ1, τ2)@k = (k − |τ1 ∩ τ2|)((2 + p)k − p|τ1 ∩ τ2| + 1 − p) + Σ_{i,j ∈ τ1 ∩ τ2} κ_{i,j}(τ1, τ2) − Σ_{i ∈ τ1 − τ2} τ1(i) − Σ_{i ∈ τ2 − τ1} τ2(i)   (5.1)

where τ1(i) (or τ2(i)) is the position of item i in the corresponding ranking, and κ_{i,j}(τ1, τ2) = 0 if τ1(i) < τ1(j) and τ2(i) < τ2(j), and κ_{i,j}(τ1, τ2) = 1 otherwise. Kτ@k ranges from 0 to 1, with values close to 1 indicating greater disagreement between the predicted ranking and the ideal ranking.

5.5 Experimental Results

We discuss our results by first assessing how the popularity ranking of tips varies over time (Section 5.5.1), and then comparing the prediction based only on the current ranking (baseline) against the regression-based prediction that uses a richer set of features (Section 5.5.2). Finally, in Section 5.5.3, we analyze the impact on model accuracy of removing features one at a time, according to the Information Gain metric.

5.5.1 Ranking Stability

Using the experimental setup described in Section 5.4, we investigate the differences between the true popularity rankings of tips at times tr and tr + δ, for various values of δ. To that end, we quantify the correlation between these two rankings using Kendall's τ coefficient. Recall that the closer to 1 the value of Kτ is, the larger the disagreement between the two rankings.
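For intuition, the following sketch computes a p-penalty top-k Kendall distance in the spirit of Fagin et al. [2003]. It may differ in detail from Equation 5.1; the normalization constant k² + p·k(k−1), the distance between two disjoint lists, is our assumption for scaling the result to [0, 1]:

```python
def kendall_p_at_k(tau1, tau2, p=0.5):
    """p-penalty Kendall distance between two top-k lists.

    Returns a value in [0, 1]: 0 for identical rankings, 1 for
    completely disjoint top-k lists.
    """
    k = len(tau1)
    pos1 = {item: r for r, item in enumerate(tau1)}  # rank positions in tau1
    pos2 = {item: r for r, item in enumerate(tau2)}
    union = list(dict.fromkeys(list(tau1) + list(tau2)))
    total = 0.0
    for a in range(len(union)):
        for b in range(a + 1, len(union)):
            i, j = union[a], union[b]
            i1, j1, i2, j2 = i in pos1, j in pos1, i in pos2, j in pos2
            if i1 and j1 and i2 and j2:
                # both lists rank both items: penalize a discordant pair
                if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0:
                    total += 1.0
            elif i1 and j1 and (i2 or j2):
                # both in tau1, one missing from tau2 (implicitly ranked below)
                present, absent = (i, j) if i2 else (j, i)
                if pos1[absent] < pos1[present]:
                    total += 1.0
            elif i2 and j2 and (i1 or j1):
                present, absent = (i, j) if i1 else (j, i)
                if pos2[absent] < pos2[present]:
                    total += 1.0
            elif (i1 and j2) or (j1 and i2):
                # i and j each appear in only one list, and in different lists
                total += 1.0
            else:
                # the pair appears in only one of the two lists: penalty p
                total += p
    max_dist = k * k + p * k * (k - 1)  # distance between disjoint lists
    return total / max_dist
```

As in the text, identical top-k lists score 0 and non-overlapping lists score 1, with p controlling the penalty for pairs observed in only one of the lists.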
Figure 5.3 shows Kτ@k for each day in the test fold of both the NY and NY Food scenarios, for values of δ varying from 1 to 5 months. We focus on the top-10 most popular tips (k = 10). Focusing first on the NY scenario, Figure 5.3a shows that the disagreement between the two rankings increases as we increase δ. Indeed, for a fixed test day (i.e., a fixed set of tips), Kτ@10 varies from 0.26 to 0.72 as we increase δ from 1 to 5 months. Moreover, we can still observe some discrepancies even if we predict only one month ahead into the future (δ = 1 month). Indeed, as discussed in Section 4.4, over 40% of the likes of most tips arrive more than two months after posting time. Since the tips in each test fold are at most 1 month old, most of them are still at very early stages of their popularity curves, and the popularity ranking, even considering only the top-10 tips, will change. Very similar results were observed for the NY Food scenario, as shown in Figure 5.3b, although the values of Kτ@10 (and thus the disagreements between current and future rankings) seem somewhat smaller on some days, particularly for larger values of δ.

Examining the top most popular tips in each test fold for the NY scenario, we found that some of them referred to special events occurring in the city. Examples are a venue created to be a "promoted" venue during the 2010 Super Bowl game and a "meme" venue, Snowpocalypse 2010, created to celebrate the major New York City snow storm [Seward, 2011; Barbierri, 2011]. These tips exhibit a somewhat different pattern: all of their likes are received before the event occurs. Thus, once they reach the top of the ranking, they tend to remain there for a while, which contributes to lowering the discrepancies between current and future rankings.
Figure 5.3: Correlations between the top-10 most popular tips at time tr and at time tr + δ (δ in months). Panels: (a) NY; (b) NY Food.

Overall, these results corroborate our discussion in Section 4.4.4, and suggest that there are noticeable discrepancies between the current and the long-term popularity of tips (even within the top-10 most popular tips). Thus, models that use only early measurements, such as those proposed by Szabo and Huberman [2010] and by Pinto et al. [2013], may lead to inaccurate predictions not only of popularity measures (as will be discussed in Chapter 6) but also of the popularity ranking. Next, we assess to what extent such ranking predictions can be improved by exploiting a multidimensional set of predictors.

5.5.2 Prediction Results

We now compare the prediction results using only the popularity ranking at time tr (baseline) against the predictions produced by the OLS regression model jointly with the features defined in Section 5.3. Figure 5.4 shows the average daily Kτ@10, along with 95% confidence intervals, for the two ranking methods and each value of δ, for the NY scenario. For δ equal to 1 month, both methods produce Kτ@10 results below 0.4, showing a high correlation between the predicted ranking and the true popularity ranking at tr + δ. However, the OLS regression model produces results that are significantly better (lower Kτ@10) than those produced by the baseline in 67% of the days (with reductions of up to 69%). Moreover, as we predict further into the future, increasing δ to 2, 5 and 6 months, we observe increasing values of Kτ@10 for both methods. This implies that the discrepancies with respect to the true ranking tend to increase as both methods start using outdated
and possibly inaccurate data. Yet, the gap between the baseline and the OLS regression model tends to increase (reaching up to 65% for δ equal to 6 months). This result shows that taking factors other than simply the current popularity of the tips into account is important and can improve the prediction accuracy of the long-term popularity ranking.

Figure 5.4: Effectiveness of Ranking for Varying Target Time tr + δ: NY Scenario (Avg and 95% Confidence Intervals). Panels: (a) δ = 1 month; (b) δ = 2 months; (c) δ = 5 months; (d) δ = 6 months.

We note, however, that there are some cases where the baseline performs as well as the more sophisticated OLS model. These cases are explained as follows: some of the most popular tips (which referred to real events) acquired most of their likes very early on, before time tr (i.e., before the event). Figure 5.5 shows a top-5 ranking of tips taken from our experimental datasets. The first two tips in the ranking, T1 and T2, had already achieved 95% and 90% of their likes, respectively, at time tr. These two tips referred to the Snowpocalypse 2010 event. Thus, they quickly reached the top positions of the ranking, remaining there until tr + δ. For such cases, the use of other features produces only marginal improvements in prediction.

Figure 5.5: Tips Ranking Example.
Tip | Tip age (days) | # likes at tr | # likes at tr + δ
T1  |  4             | 128           | 135
T2  |  4             |  85           |  94
T3  | 28             |  16           |  57
T4  | 23             |  25           |  54
T5  | 28             |   5           |  47

Figure 5.6 shows similar results for the NY Food scenario. In this case, we see smaller differences between the two methods.
In most cases, the baseline is just as good as the more sophisticated OLS method, although the use of the extra features does provide improvements (of up to 30%) on some days for large values of δ. These results reflect the higher stability of the tip popularity ranking in the NY Food scenario, discussed in Section 5.5.1. Moreover, as shown in Table 5.3, the number of tips in the training set of this scenario is almost half of that used in the NY scenario, which also impacts the accuracy of the regression model. That is, the benefits of using more features are constrained by the limited amount of data available to train an accurate model.¹

These results highlight that accurately predicting the popularity ranking of a set of tips is a challenging task. Although the tip popularity ranking remains roughly stable over short periods of time (e.g., 1 month), significant discrepancies still occur at the top of the ranking. Moreover, the use of features related to the tip's author, venue and content can improve prediction accuracy to some extent, provided that enough information about the features is available to train the model.

5.5.3 Experiments Removing Features

Focusing on the OLS prediction strategy, we now assess the relative importance of each feature using the well-known Information Gain feature selection technique [Yang and Pedersen, 1997]. Information Gain, originally used to compute splitting criteria for decision trees, is often used as a measure of how well a given feature separates a given dataset [Janecek et al., 2008].

¹ Recall that we did experiment with other prediction strategies based on SVR and Random Forests, but OLS provided the best results across all scenarios.
Figure 5.6: Effectiveness of Ranking for Varying Target Time tr + δ: NY Food Scenario (Avg and 95% Confidence Intervals). Panels: (a) δ = 1 month; (b) δ = 2 months; (c) δ = 5 months; (d) δ = 6 months.

Before computing the Information Gain, we must compute the overall entropy I of the dataset S, defined as:

I(S) = − ∑_{i=1}^{C} p_i log2 p_i    (5.2)

where C is the number of classes and p_i is the fraction of instances of class i in S. The entropy takes its maximum value when the instances are equally distributed among the C classes. Information Gain is the expected reduction in entropy caused by partitioning the instances according to a given feature. Thus, the Information Gain IG(S, F) of a given feature F is defined as:

IG(S, F) = I(S) − ∑_{v ∈ values(F)} (|S_{F,v}| / |S|) I(S_{F,v})    (5.3)

where values(F) is the set of all possible values of feature F, and S_{F,v} is the set of instances in which F has value v. Note that the first term in the equation is just the entropy of the whole dataset S, and the second term is the expected value of the entropy after S is partitioned using feature F.

Table 5.4: Features Ranked by Information Gain
Pos. | Feature                     | Pos. | Feature
1    | tip_likes_current           | 28   | venue_std_likes
2    | user_total_likes            | 29   | tip_pos_pp
3    | user_avg_likes              | 30   | tip_pos_nn
4    | user_std_likes              | 31   | venue_avg_likes
5    | user_median_likes           | 32   | tip_pos_adj
6    | user_tipped_venues          | 33   | tip_pos_adv
7    | user_tip_num                | 34   | venue_tip_num
8    | user_avg_tip_by_sn          | 35   | tip_neg_score
9    | user_venue_visibility_total | 36   | venue_date_rk_pos_asc
10   | user_sn_size                | 37   | venue_total_likes
11   | user_tip_by_sn              | 38   | tip_pos_ver
12   | user_like_by_sn             | 39   | user_venue_visibility_median
13   | user_type                   | 40   | tip_neu_score
14   | user_std_like_by_sn         | 41   | tip_pos_score
15   | user_avg_like_by_sn         | 42   | tip_pos_num
16   | user_sn_likes               | 43   | user_median_tip_by_sn
17   | user_venue_visibility_std   | 44   | tip_pos_fw
18   | user_venue_visibility_avg   | 45   | tip_pos_sup
19   | user_given_likes            | 46   | venue_category
20   | user_mayorships_num         | 47   | venue_median_likes
21   | tip_char_num                | 48   | tip_pos_comp
22   | tip_wrd_num                 | 49   | user_median_like_by_sn
23   | tip_age_hours               | 50   | venue_verified
24   | venue_cks_num               | 51   | user_mayor
25   | venue_like_rk_pos_asc       | 52   | tip_url_num
26   | venue_like_rk_pos_dsc       | 53   | tip_pos_sym
27   | venue_visitors_num          |      |

Table 5.4 shows the features ranked by Information Gain. We found that the most important feature is, unsurprisingly, the current popularity of the tip. It is followed by features related to the user's popularity, such as the total number of likes received by the user's previous tips (user_total_likes). Features related to the social network of the tip's author (the average number of tips posted by his social network, and the number of followers and friends) are also among the top-10 most important features (8th and 10th positions, respectively).

Figure 5.7: Effectiveness of Ranking when Removing One Feature at a Time: NY Scenario (Avg and 95% Confidence Intervals for All Considered Days).

The most important venue feature is the total number of check-ins, followed by the current position of the tip in the ranking of the venue's tips sorted by increasing number of likes.
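The entropy and Information Gain computations behind Table 5.4 (Equations 5.2 and 5.3) can be sketched as follows, assuming discrete (or pre-binned) feature values; real-valued features would first need discretization.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy I(S) of a list of class labels (Eq. 5.2)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information Gain IG(S, F) of a discrete feature (Eq. 5.3)."""
    n = len(labels)
    gain = entropy(labels)                       # I(S)
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)   # weighted entropy of S_{F,v}
    return gain
```

A feature that perfectly separates the classes attains the full entropy of the dataset as its gain, while an uninformative feature yields a gain of zero.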
However, these features, like the other venue-related features, are much less important than the user features, occupying only the 24th and 25th positions of the ranking. Similarly, the most important content feature is the number of characters in the tip, but it occupies only the 21st position of the ranking. Thus, unlike in other efforts to assess the helpfulness of online reviews [Kim et al., 2006; Zhang and Varadarajan, 2006], textual features play a much less important role in the tip popularity ranking prediction task, possibly due to the inherently different nature of these pieces of content.

Finally, we extend our evaluation of the relative importance of each feature by assessing the accuracy of the OLS strategy as we remove one feature at a time, in increasing order of importance given by the Information Gain. Figure 5.7 shows the impact on the average Kτ@10 as each feature is removed, starting with the complete set of user, venue and content features. For example, the second bar shows the results after removing the least discriminative feature (the fraction of symbols in the tip text). Note that the removal of many of the least discriminative features has no significant impact on Kτ@10, indicating that these features are redundant. However, we observe statistically significant losses (of up to 18.8%) when we remove the features related to the number of check-ins at the venue (24th), the Foursquare user type (13th) and the user's total number of likes (2nd). After each of these losses, we also observe periods of stability (no losses) between the removal of some features, for example, the period between the removal of the 24th and the 13th features. We thus built a model that focuses only on the four features for which we observed the largest losses (number of check-ins at the venue, user type, user's total number of likes, and current popularity of the tip).
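The backward-removal experiment can be sketched as a loop that repeatedly drops the least important remaining feature and re-evaluates the model; here `evaluate` stands in for retraining the OLS model and computing the average Kτ@10, and both names are illustrative assumptions rather than the thesis code.

```python
def ablation_curve(features_by_importance, evaluate):
    """Remove one feature at a time, least important first, recording the
    evaluation metric after each removal.

    `features_by_importance` is sorted by decreasing Information Gain;
    `evaluate` maps a feature subset to a quality metric.
    """
    remaining = list(features_by_importance)
    curve = [(tuple(remaining), evaluate(remaining))]
    while len(remaining) > 1:
        remaining.pop()                       # drop current least important feature
        curve.append((tuple(remaining), evaluate(remaining)))
    return curve
```

Plotting the metric along this curve reveals which removals cause significant losses, as in Figure 5.7.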
Figure 5.8 shows the same curves as Figure 5.4a, now including the new OLS model with only four features. The model with four features also produces results that are statistically better than those produced by the baseline (in 76.7% of the days) and, on average, is statistically tied with the model using all features. In sum, using only four features, we are able to produce predictions that are as accurate as those using the complete set of features.

Figure 5.8: Effectiveness of Ranking When Using Only 4 Features for δ = 1 month (Avg and 95% Confidence Intervals). Curves: Baseline, OLS with only 4 features, OLS.

5.6 Summary

In this chapter, we addressed the popularity prediction problem as a ranking task, which consists in estimating the ranking by popularity of a given set of tips at a future time. We evaluated the stability of the tip popularity ranking over time, observing that there are noticeable disagreements between the current and future popularity rankings, even when considering only the top-10 most popular tips and a time window of only 1 month. This suggests that predicting the future ranking based only on the current ranking may not be accurate.

We thus investigated to what extent we can improve such predictions by using a regression model and exploiting a multidimensional set of features related to the tip's author, the venue where it was posted, and its content. Our results showed that the use of these features can improve the prediction accuracy to some extent, provided that enough training data is available. Finally, we assessed the relative importance of each feature of the regression model using the Information Gain feature selection technique.
We observed that a model using only four features (number of check-ins at the venue, user type, user's total number of likes, and current popularity of the tip) is as accurate as one using the complete set of features.

Chapter 6: Predicting the Popularity Level of a Tip

In the previous chapter, we tackled the popularity prediction problem as a ranking task, which aims at ranking a group of tips based on their predicted popularity at a future time. In this chapter, we investigate a different prediction task, modeling it as a classification task. Using some of the features defined in the previous chapter, this task aims to predict the popularity level of a single tip.

The ranking and classification tasks support different applications. Tip ranking supports filtering and recommendation at a finer granularity than a small number of popularity levels, which is useful to users and venue owners. The classification task, in turn, focuses on the popularity of a given tip that has just been posted: users and venue owners who can predict ahead of time whether a tip will gain enough visibility can react quickly to promote it, if needed.

In this chapter, we also focus on the hardest prediction scenario, i.e., prediction at posting time, when the only information available about the tip consists of its content and of historical patterns related to the user and the venue associated with it. In contrast, many prior efforts to predict the popularity of online content exploit information about how the user population reacts to the content during an initial monitoring period (e.g., its early popularity measures). Another challenge is that we estimate the popularity of a single tip, instead of the relative popularity output by the ranking task, which is also a harder task.
As in previous related efforts, we here tackle the problem of predicting the popularity level a tip will reach using both regression [Chen et al., 2011; Hsu et al., 2009; Kim et al., 2006] and classification [Liu et al., 2008, 2007] techniques. Specifically, as in Liu et al. [2008], when applying regression, we use the predicted value to classify a tip into different popularity levels, defined based on ranges of the number of likes received. We do so by determining into which range the predicted number of likes falls.

We start this chapter by discussing the popularity levels we consider (Section 6.1). Next, we formally define the problem of predicting the popularity level of a tip on Foursquare (Section 6.2). We then discuss the techniques adopted to design our prediction models (Section 6.3) and the features used as input to them (Section 6.4). We present our evaluation methodology (Section 6.5), followed by our main experimental results (Section 6.6). We then investigate the accuracy of our models when varying the monitoring time and the prediction target time. Finally, in Section 6.8, we discuss the impact of model specialization.

6.1 Popularity Levels

In this chapter, we investigate models to predict the level of popularity of a tip at a certain point in time in the future. For a given tip, its popularity is defined as the expected number of times the tip will be marked as liked. Moreover, as in previous work [Hong et al., 2011; Bandari et al., 2012; Anderson et al., 2012], instead of predicting the exact number of likes a tip will receive, we categorize tips into various levels of popularity, defined by ranges of the number of likes received, and predict this level instead. We choose to predict the level of popularity instead of the exact number of likes because the latter is harder, particularly given the very skewed distribution of the number of likes per tip (shown in Figure 4.6a).
Moreover, the former should be good enough for various purposes (e.g., early identification and highlighting of future popular tips as soon as they are posted, so they can be made more visible on the site, or the definition of revenue schemes for ads embedded in tips). In the previous chapter, we discussed solutions to an alternative popularity prediction task, which consists of ranking a group of tips based on their predicted popularity at a future time.

We here consider two scenarios of popularity levels: two levels (low or high popularity), and three levels (no, low or high popularity). Table 6.1 presents, for each scenario, the ranges of the number of likes per tip in each level, as well as a numerical representation of each category (used for computation purposes). The rightmost column will be discussed in Section 6.5.1. In the first scenario, tips with at most 4 likes are in the low popularity level, whereas the others are grouped into the high popularity level. In the second scenario, tips with low popularity receive from 1 to 4 likes, whereas tips with high popularity receive at least 5 likes (as in scenario 1).

Table 6.1: Distribution of Candidates for Prediction Across Different Popularity Levels
Scenario | Popularity Level | Category Number | # Likes per Tip | # Candidate Tips
1        | Low              | 0               | < 5             | 703,827
1        | High             | 1               | ≥ 5             | 3,427
2        | No               | 0               | 0               | 589,044
2        | Low              | 1               | 1-4             | 114,783
2        | High             | 2               | ≥ 5             | 3,427

We note that we also experimented with other numbers of categories (e.g., 4 different categories) as well as different range definitions. Fundamentally, the limiting factor here is the imbalance between the tips. So, even if we consider two levels, one composed of tips that did not receive any like and another composed of tips that received at least one like, the imbalance would still be severe and influential in the results.
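A minimal sketch of the categorization in Table 6.1, mapping a tip's like count to a numeric popularity level; the function name is an illustrative assumption.

```python
def popularity_level(num_likes, scenario=2):
    """Map a like count to the numeric level of Table 6.1."""
    if scenario == 1:                 # two levels: low (< 5 likes) vs. high
        return 0 if num_likes < 5 else 1
    if num_likes == 0:                # three levels: no / low / high popularity
        return 0
    return 1 if num_likes <= 4 else 2
```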
Another point to consider when defining these categories is the availability of enough examples from each category for training, i.e., for learning the model parameters. We here show results for the two scenarios presented in Table 6.1, although, in general, the main conclusions hold for all other categorizations we experimented with. We also note that we tried to cluster tips using a set of attributes (those discussed in Section 6.4) and various types of clustering techniques (e.g., k-means [Hartigan and Wong, 1979], x-means [Pelleg and Moore, 2000], spectral clustering [Shi and Malik, 2000] and density-based clustering [Ester et al., 1996]). However, the resulting clusters were not stable, in the sense that the clustering results varied with the seeds provided to the algorithm, and no single number of clusters could be determined as the best one. The same clustering behavior was previously observed in the prediction of the quality of questions in question answering services [Li et al., 2012]. This may be due to the lack of attributes that are discriminative specifically for the clustering process, which does not imply that the attributes cannot be useful for the prediction task, our main goal here.

6.2 Tip Popularity Prediction: Formal Definition

The problem we tackle in this chapter can be formally defined as follows. Given a tip pi, posted at time tpi, predict the popularity level of pi at a given future target time tpi + δ, where the popularity of the tip is estimated by the total number of likes received until tpi + δ (the prediction window). Moreover, we consider various scenarios where the predictions are performed at time tpi + ε, where ε < δ defines a period after posting time during which the tip is monitored up until the prediction. Note that ε may be zero, corresponding to predictions at posting time. Figure 6.1 illustrates the prediction scenarios considered in our study.
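The monitoring scheme just described can be sketched as splitting a tip's like timestamps at tpi + ε (what the predictor may observe) and at tpi + δ (what defines the label); the function and parameter names are illustrative assumptions.

```python
def monitoring_split(like_times, t_post, epsilon, delta):
    """Return (likes observable at prediction time, likes defining the label).

    `like_times` holds the timestamps of the tip's likes; predictions are
    made at t_post + epsilon for a target time t_post + delta (epsilon < delta).
    """
    assert epsilon < delta
    observed = sum(1 for t in like_times if t <= t_post + epsilon)
    label = sum(1 for t in like_times if t <= t_post + delta)
    return observed, label
```

With epsilon = 0, the predictor sees no likes at all, which is the posting-time scenario emphasized in this chapter.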
We also consider the sets V and U of venues and users, respectively, where ui ∈ U is the user who posted pi, and vi ∈ V is the venue where it was posted. Each tip pi, user ui, and venue vi has a set of features associated with it. Collectively, the features associated with pi, ui, and vi are used as inputs to a prediction model (see below), representing the given tip instance. For evaluation purposes, each tip pi ∈ P is labeled with a numeric value that represents the popularity level of pi at time tpi + δ. The values of the features associated with each entity are computed using information available up to tpi + ε, which defines when the prediction is done. Next, we define the learning methods used in our experiments to predict the popularity level of a tip.

Figure 6.1: Monitoring Time Scheme.

6.3 Prediction Methods

We explore the use of two techniques, regression and classification, in the design of our models to predict the popularity level of a tip. We experiment with one classification algorithm, Support Vector Machine (SVM) [Joachims, 1998], and two types of regression algorithms: ordinary least squares (OLS) multivariable linear regression and the Support Vector Regression (SVR) algorithm [Drucker et al., 1997]. These methods are described in the following sections.

As already mentioned, we focus on predicting the tip's popularity level instead of the total number of likes received by it. Previous work, mentioned in Chapter 2, has predicted the helpfulness of a review, defined as a fraction x/y (x out of y people found the review helpful) [Liu et al., 2008; Hong et al., 2012], the review's average rating (a real number between 0 and 5) [Lu et al., 2010; Moghaddam et al., 2012], and the number of tweets about an article [Bandari et al., 2012]. All these previous efforts were pursued using regression approaches. We here experiment with two types of regression algorithms.
Both produce as output a real value, which is rounded to the nearest integer representing a popularity level, as done for predicting the helpfulness of movie reviews in Liu et al. [2008]. We chose this approach after performing initial experiments comparing the use of regression models to predict the number of likes of a tip versus its popularity level. These experiments indicated that using regression to predict the popularity level leads to better results.

6.3.1 Support Vector Machines (SVM)

A classification task usually involves separating data into training and test sets. Each instance in the training set contains one target value (the class label) and several observed variables or features. The goal of the classification algorithm is to learn, based on the training data, a model that predicts the class labels of the test data given only the test data features. The support vector machine (SVM) learning algorithm [Joachims, 1998] is among the most popular supervised classification methods, and has proven successful across many domains, such as document classification [Manevitz and Yousef, 2002], detection of malicious users [Benevenuto et al., 2012], and prediction of the localization of proteins [Hua and Sun, 2001].

Consider a training set of instance-label pairs (xi, yi), i = 1, ..., k, where xi ∈ Rk is a vector representing the observed features and yi is the class to which the instance belongs, which, in our case, corresponds to the tip's popularity level. A function φ maps the training vectors xi into a higher dimensional space, such that the data under consideration becomes separable by a hyperplane with maximum margin. SVM selects, from the infinite set of possible separating hyperplanes, the one that lies furthest from all defined classes (the maximal margin hyperplane). Since the vector xi can have many dimensions, or even be infinite-dimensional, computing the function φ becomes non-trivial and computationally costly.
To simplify this computation, SVM uses a class of functions called kernel functions. In particular, in our experiments, we used both the linear and the radial basis function (RBF) kernels, available in the LIBSVM open source package [Chang and Lin, 2001]. We note that the latter (RBF) can handle non-linear relationships between the target value and the predictor variables (features).

6.3.2 Ordinary Least Squares Regression (OLS)

We also considered an ordinary least squares (OLS) multivariable linear regression model to estimate the popularity level of a tip pi at a given point in time t, R(pi, t),¹ as a linear function of k predictor variables x1, x2, ..., xk, i.e.:

R(pi, t) = β0 + β1·x1 + β2·x2 + ... + βk·xk    (6.1)

The model parameters β0, β1, ..., βk are determined by minimizing the least squared errors [Jain, 1991] on the training data, as will be discussed in Section 6.5.1.

6.3.3 Support Vector Regression (SVR)

We also consider the more sophisticated Support Vector Regression (SVR) algorithm [Drucker et al., 1997], a state-of-the-art method for regression learning. SVR has been applied to several problems, including estimating the quality of articles in collaborative digital libraries [Dalip et al., 2011] and regional logistics demand forecasting [Yang et al., 2010]. Unlike our OLS model, SVR does not penalize errors that are within a certain distance of the true value (i.e., within the margin). Moreover, like SVM, SVR allows the use of different kernel functions, which helps solve a larger set of problems compared to linear regression. As for SVM, we explore both the linear and RBF kernels, also available in the same package.
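The OLS strategy of Equation (6.1), with its output rounded to the nearest popularity level, can be sketched as follows for the one-predictor case (the thesis uses k predictors, fitted analogously); function names are illustrative assumptions.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = b0 + b1*x, the one-predictor special
    case of Eq. (6.1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1           # (b0, b1)

def predict_level(b0, b1, x, num_levels=3):
    """Round the regression output to the nearest valid popularity level,
    as done in the text for mapping regression output to a level."""
    raw = b0 + b1 * x
    return max(0, min(num_levels - 1, round(raw)))
```

Clamping to the valid range keeps extreme regression outputs from producing non-existent levels.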
In addition to the SVM, OLS and SVR models, we also consider a very simple strategy that predicts the popularity level of a tip pi posted by a user ui using the median number of likes received by the tips previously posted by ui as an estimate of the number of likes the new tip will receive. Our interest is to assess the improvement in prediction accuracy obtained by using the SVM, OLS or SVR models over this simpler (baseline) approach, which takes into account only the popularity of the user's previously posted tips. We refer to it as the median strategy.

6.4 Tip Features

For the classification task, we exploit the same set of features proposed for the ranking task (Chapter 5), with the exception of two features, the number of hours since posting time (tip_age_hours) and the number of likes the tip has already received (tip_likes_current),² which are more specific to scenarios in which we consider a set of tips posted at different times. Recall that here we are predicting the popularity of a single tip at posting time.

¹ As already mentioned, the tip's popularity level is obtained by rounding R(pi, t) to the nearest integer.
² This feature is used when we evaluate scenarios where ε > 0.

We considered two scenarios when computing the user features, namely: (1) all tips posted by the user, and (2) only tips posted by the user at venues of the same category as the venue where pi was posted. To distinguish between these two sets of user features, we refer to the latter as user/category features.

In addition to the features listed in Section 5.3, we also considered other linguistic features extracted from the tip's textual content. These features are based on aggregate statistics that capture the readability,¹ informativeness, and structural properties¹ of the tip's text. Readability features, or tests, are used to estimate the difficulty readers may have in reading and comprehending a text.
These tests produce scores that combine several factors affecting the clarity of the text, such as the numbers of words, syllables and sentences [Dalip et al., 2013]. We consider six such tests:

• Automated Readability Index (ARI) (tip_read_ari) [Senter and Smith, 1967]: derived from ratios representing word difficulty (number of characters per word) and sentence difficulty (number of words per sentence).

• Flesch Reading Ease (tip_read_flesch) [Flesch, 1948]: measures the difficulty of reading a text, on a 100-point scale, based on the average number of syllables per word and of words per sentence. The higher the Flesch Reading Ease score, the easier it is to understand the document.

• Flesch-Kincaid Grade Level (tip_read_kincaid) [Ressler, 1993]: translates the Flesch Reading Ease score into the U.S. grade level of education required to understand the text.²

• Coleman-Liau (tip_read_coliau) [Coleman and Liau, 1975]: based on the average number of characters per word and the number of sentences in a fragment of 100 words.²

• Gunning Fog (tip_read_fog) [Gunning, 1952]: indicates the number of years of education required for a reader to understand the text.² It uses the average number of words per sentence and the average number of complex words (words with more than 3 syllables).

• SMOG (Simple Measure of Gobbledygook) (tip_read_smog) [McLaughlin, 1969]: estimates the years of education a person needs to understand a piece of writing using the average number of polysyllabic words (words with more than two syllables) taken from a sample of 30 sentences.²

¹ Extracted with the style readability tool (http://www.gnu.org/software/diction/).
² It outputs approximately the U.S. grade level necessary to comprehend the text.

Informativeness (tip_informat) of a tip measures the novelty of the tip's terms with respect to other tips posted at the same venue.
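Two of these readability tests can be computed directly from raw counts using their standard published coefficients; in the thesis the scores come from the style tool, so this sketch is illustrative rather than a reimplementation of the pipeline.

```python
def ari(chars, words, sentences):
    """Automated Readability Index from raw character/word/sentence counts."""
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

For instance, a fragment with longer words and longer sentences yields a higher ARI (harder text) and a lower Flesch score.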
This metric was used in Hsu et al. [2009], Wagner et al. [2012] and Momeni et al. [2013] to predict the quality or helpfulness of comments or reviews. We derive informativeness using the Term Frequency-Inverse Document Frequency (TF-IDF) measure, summing the TF-IDF values of all terms in a single tip pj (Equation 6.2):

inform(pj) = ∑_{wi ∈ pj} TF_{i,j} × IDF_i    (6.2)

The Term Frequency (TF) term captures the concept of popularity, being defined as (Equation 6.3):

TF_{i,j} = n_{i,j} / ∑_k n_{k,j}    (6.3)

where n_{i,j} is the number of occurrences of the term wi in the tip pj, and the denominator is the total number of occurrences of all terms in the tip pj. The Inverse Document Frequency (IDF) term captures the concept of specificity, valuing terms that are infrequent across all tips in the dataset. It is defined as (Equation 6.4):

IDF_i = log(|K| / |K(wi)|)    (6.4)

where |K(wi)| is the number of tips in which the term wi is found and |K| is the total number of tips in the dataset.

We also consider a range of structural features used in previous studies of online reviews [O'Mahony and Smyth, 2010; Castillo et al., 2011; Dalip et al., 2013; Momeni et al., 2013]. In particular, the following structural features are extracted from the tip's text: fraction of capitalized words (tip_cap_wrd), fraction of unique words (tip_uniq_wrd), fraction of capitalized characters (tip_cap_char), entropy of word sizes (tip_ent_wrd), fraction of characters that are spaces (tip_frac_space), number of sentences (tip_sent_num), number of syllables (tip_syl_num), average number of syllables per word (tip_avg_syl_wrd), average number of characters per word (tip_avg_char_wrd), number of words in the longest sentence (tip_long_sent_wrd), number of words with 3 or more syllables (tip_comp_wrd_num), number of sentences beginning with a conjunction, article, interrogative pronoun, preposition, pronoun or subordinating conjunction (tip_beg_conj, tip_beg_art, tip_beg_int_pron, tip_beg_prep, tip_beg_pron, tip_beg_subor_conj), total number of conjunctions (tip_conj_num), total number of sentences that are questions (tip_quest_sent), total number of prepositions (tip_prep_num), number of passive voice sentences (tip_pass_sent), number of uses of the verb "to be" (tip_tobe_num), and total number of pronouns (tip_pron_num) in the tip's text.

Complementary to the above features, we also use semantic and topical features. This set includes the total number of named entities (tip_num_entities),¹ the number of distinct types of named entities (tip_num_dist_entities),¹ as well as psychological characteristics and sentiment polarities, discussed next. As in Momeni et al. [2013], we use the psychological dimensions defined by LIWC. LIWC [Tausczik and Pennebaker, 2010] is a dictionary that includes more than 2,300 English words classified into psychological categories that reflect people's emotional and cognitive perceptions. It has been widely used in many contexts, especially for sentiment analysis in social networks [Wu et al., 2011b; Quercia et al., 2011; Park et al., 2012; Momeni et al., 2013].
In this work, we focused on the following categories [Linguistic Inquiry and Word Count, 2007]: first, second and third person (tip_liwc_first, tip_liwc_second, tip_liwc_third), positive and negative emotions (tip_liwc_pos, tip_liwc_neg), cognitive processes (tip_liwc_cog, tip_liwc_inhib, tip_liwc_cause), perceptual processes (tip_liwc_perc), time or temporal context (tip_liwc_time), verb tenses (tip_liwc_past, tip_liwc_pres, tip_liwc_fut), quantifiers (tip_liwc_quant), numbers (tip_liwc_numb), personal concerns (tip_liwc_work, tip_liwc_leisure, tip_liwc_money, tip_liwc_home, tip_liwc_relig), swear words (tip_liwc_swear), social processes (tip_liwc_social, tip_liwc_family, tip_liwc_friend, tip_liwc_humans), negative feelings (tip_liwc_anxiety, tip_liwc_anger, tip_liwc_sad), biological processes (tip_liwc_body, tip_liwc_health, tip_liwc_sex, tip_liwc_ingst), relativity (tip_liwc_relativ, tip_liwc_space, tip_liwc_motion), achievement (tip_liwc_achieve), affective processes (tip_liwc_affect), and the spoken category (tip_liwc_nonfl). We also used a feature that counts the number of words matching any word in LIWC (tip_liwc_match).

Finally, we extract three more content features from the tip, also inspired by studies about content quality [Agichtein et al., 2008; Chen et al., 2011; Dalip et al., 2013]: the number of bad words in the tip text (tip_bad_wrd; the list was extracted from https://gist.github.com/jamiew/1112488), the number of words in the tip that are not in the English lexical database WordNet (tip_not_wordnet), and the number of words present in a list of common misspellings (tip_miss_wrd; http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings).

Thus, we represent each tip p_i by k = 125 features related to the user u_i who posted tip p_i, the venue v_i where p_i was posted, and the textual content of p_i. The values of these features are computed up to time (t_{p_i} + ε). All features are summarized in Table 6.2.
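The LIWC dictionary itself is proprietary, but the mechanism behind these features is simple word matching against per-category word lists. The sketch below uses a toy two-category dictionary (the category names and words are illustrative, not LIWC's):

```python
# Toy stand-in for the LIWC dictionary: category name -> set of words.
TOY_DICT = {
    "pos_emotion": {"love", "great", "nice"},
    "neg_emotion": {"hate", "bad", "awful"},
}

def category_counts(text, dictionary=TOY_DICT):
    """Count, per category, how many words of the text match the
    dictionary (analogous to features such as tip_liwc_pos), plus the
    total number of matches (analogous to tip_liwc_match)."""
    words = text.lower().split()
    counts = {cat: sum(w in vocab for w in words)
              for cat, vocab in dictionary.items()}
    counts["match"] = sum(counts.values())
    return counts
```

The real features normalize these counts by tip length; the matching step itself is as simple as shown.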
6.5 Methodology Evaluation

Having defined the features used by our strategies for predicting the popularity level of a tip, we now discuss our evaluation methodology. We start by presenting our experimental setup in Section 6.5.1, and then discuss our evaluation metrics in Section 6.5.2.

6.5.1 Experimental Setup

In general terms, our experimental setup consists of dividing the available data into training and test sets, learning model parameters using the training set, and evaluating the accuracy of the learned model on the test set. We split the tips chronologically into training and test sets, rather than performing a random split, to guarantee that the dataset used to build the model that estimates a given tip's popularity level includes no data related to tips posted after that tip was created (or after its monitoring period t_{p_i} + ε expired). Next, to generate multiple runs, we slide both training and test windows along the time axis (an alternative would be a rolling forecast validation scheme, increasing the training set in each run). We do so 5 times, thus producing 5 results, as illustrated in Figure 6.2. For the training sets, we considered the most recent tip posted by each user as a candidate for popularity prediction, using the user's other tips to compute the features used as predictor variables. For the test set, we considered all tips posted during the month following the end of the training set. For each candidate for prediction, in both training and test sets, we computed the feature values by first applying a logarithm transformation to the raw numbers, to reduce their large variability (as discussed in Section 4.1), and then scaling the
results between 0 and 1. Moreover, in order to have enough historical data about the users who posted tips, we considered only users who posted at least 5 tips. We also considered only tips posted at least one month before the end of the training/test sets (otherwise, the tip would not have had time to receive much attention by the time of prediction). We further focused on tips written in English, since the textual features are computed using tools that are available only for that language. After applying these filters, we ended up with roughly 700 thousand tips that are candidates for prediction. The distribution of these tips across the different popularity levels is shown in Table 6.1 (rightmost column).

The OLS model parameters were defined by minimizing the squared errors of the predictions for the candidate tips in the training data, which is possible because their popularity levels (i.e., numbers of likes) are known. The SVR model parameters were defined in a similar manner, as performed by the LIBSVM package.

Table 6.2: Complete Set of Features for Tip Popularity Level Prediction

User features:
user_tip_num: total number of tips
user_tipped_venues: number of distinct venues
user_total_likes: total number of received likes¹
user_given_likes: total number of given likes
user_sn_size: number of friends/followers
user_sn_likes: social likes
user_tip_by_sn: total number of tips by the social network¹
user_like_by_sn: total number of likes given by the social network¹
user_venue_visibility_total: user visibility at the venue¹
user_type: Foursquare user category
user_mayorships_num: total number of mayorships
user_mayor: whether the author was mayor of the venue

Venue features:
venue_tip_num: total number of tips
venue_total_likes: total number of received likes¹
venue_cks_num: total number of check-ins
venue_visitors_num: total number of visitors
venue_verified: whether the venue was verified
venue_category: Foursquare venue category
venue_like_rk_pos_*: like ranking position *(ascending/descending order)
venue_date_rk_pos_asc: date ranking position in ascending order

Content features:
tip_char_num: total number of characters
tip_wrd_num: total number of words
tip_url_num: total number of URLs or email addresses
tip_pos_nn: fraction of nouns
tip_pos_adj: fraction of adjectives
tip_pos_adv: fraction of adverbs
tip_pos_comp: fraction of comparatives
tip_pos_ver: fraction of verbs
tip_pos_fw: fraction of non-English words
tip_pos_num: fraction of numbers
tip_pos_sup: fraction of superlatives
tip_pos_sym: fraction of symbols
tip_pos_pp: fraction of punctuation
tip_read_*: readability index *(ARI, Flesch, Kincaid, Coleman-Liau, Gunning Fog and SMOG)
tip_informat: informativeness
tip_cap_wrd: percentage of capitalized words
tip_cap_char: percentage of capitalized characters
tip_ent_wrd: entropy of word sizes
tip_frac_space: space density
tip_sent_num: number of sentences
tip_syl_num: number of syllables
tip_avg_syl_wrd: average syllables per word
tip_avg_char_wrd: average characters per word
tip_long_sent_wrd: longest sentence size
tip_comp_wrd_num: number of complex words
tip_beg_*: number of sentences beginning with a *(conjunction, article, interrogative pronoun, preposition, pronoun or subordinating conjunction)
tip_conj_num: total number of conjunctions
tip_quest_sent: total number of questions
tip_prep_num: total number of prepositions
tip_pass_sent: total number of passive sentences
tip_tobe_num: total number of uses of the verb "to be"
tip_pron_num: total number of pronouns
tip_num_entities: total number of named entities
tip_num_dist_entities: total number of distinct types of named entities
tip_liwc_*: psychological characteristics *(LIWC classes)
tip_*_score: *(positive, negative, neutral) tip score
tip_bad_wrd: number of bad words
tip_not_wordnet: number of words not in WordNet
tip_miss_wrd: number of misspellings

¹ Median, average and standard deviation are also included.

Figure 6.2: Chronological Split of Training and Test Sets: Sliding Windows Over Time.
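A minimal sketch of the chronological sliding-window scheme illustrated in Figure 6.2 (window sizes and the integer timestamps below are illustrative, not the actual month-long windows):

```python
def sliding_splits(tips, n_runs=5, train_size=4, test_size=2):
    """tips: list of (timestamp, payload) pairs. Returns up to n_runs
    (train, test) splits; the windows slide forward in time, so no
    training tip is more recent than any test tip in the same run."""
    tips = sorted(tips, key=lambda t: t[0])
    splits = []
    for run in range(n_runs):
        start = run * test_size
        train = tips[start:start + train_size]
        test = tips[start + train_size:start + train_size + test_size]
        if len(train) < train_size or len(test) < test_size:
            break  # not enough data left for another full window
        splits.append((train, test))
    return splits
```

Because each test window starts where the training window ends, the split never lets future information leak into model training, unlike a random split.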
Moreover, SVR and SVM have some parameters that impact the error minimization process (e.g., parameters C and ε) [Drucker et al., 1997]. As in previous work [Dalip et al., 2011; Hsu et al., 2009], the best values of these parameters were selected using cross-validation within the training set, with the parameter selection tool provided by the LIBSVM package.

As shown in Table 6.1, the distribution of tips across the popularity levels (i.e., categories) is quite unbalanced. Such imbalance, particularly in the training data, poses great challenges to regression and classification accuracy [He and Garcia, 2009]. Indeed, results from initial experiments using all available data were very poor. A widely used strategy to cope with the effects of class imbalance in the training data is under-sampling [Liu et al., 2009]. Suppose there are n tips in the smallest category in the training set. We produce a balanced training set by randomly selecting equal-sized samples from each category, each with n tips. Note that under-sampling is performed only in the training set; the test set remains unchanged (although under-sampling changes the distribution of tip categories between the training and test sets, this technique has been shown to be effective in other applications with skewed data [Dalip et al., 2011; Benevenuto et al., 2012]). Because of the random selection of tips for under-sampling, we performed this operation 5 times for each sliding window, thus producing 25 different results in total. The results reported in this chapter are thus averages of 25 results, along with corresponding 95% confidence intervals.

6.5.2 Evaluation Metrics

Many metrics have been used for evaluating the performance of classification tasks. A common technique is to evaluate the confusion matrix produced as a result of the testing phase of the classifier. A confusion matrix M summarizes all information about the actual classes and the predictions made by the classifier. Each matrix entry M[i, j] is computed as the number of testing samples belonging to class i which were assigned by the classifier to class j.
The diagonal elements M[i, i] show the numbers of correct classifications made for each class, whereas the off-diagonal elements show the errors. Table 6.3 illustrates a three-by-three confusion matrix. From these entries, simple measures can be directly obtained. The true positives of each class i, tp_i, are the number of samples correctly assigned to class i. The false positives of class i, fp_i, are the number of samples that do not belong to class i but were incorrectly assigned to it by the classifier; for example, fp_A = e_BA + e_CA in Table 6.3. The false negatives of class i, fn_i, are the number of samples that were not assigned to class i by the classifier but actually belong to it; in Table 6.3, fn_A = e_AB + e_AC.

Table 6.3: Confusion Matrix for a Three-Class Classification Task

                 Predicted class
                 A      B      C
Known    A     tp_A   e_AB   e_AC
class    B     e_BA   tp_B   e_BC
         C     e_CA   e_CB   tp_C

The most commonly used metric derived from a confusion matrix is the overall accuracy, computed as the sum of correct classifications divided by the total number of classifications. However, with a highly skewed data distribution, as found in our dataset, the accuracy metric does not always provide the full picture. For instance, a model that predicts all samples as negative has high accuracy, but is unable to detect rare positive samples [Tang et al., 2009]. Thus, the performance of a classifier in applications with class imbalance should not be expressed in terms of average accuracy alone. The precision of each class i, defined as tp_i / (tp_i + fp_i), is also sensitive to changes in the data distribution, as will be discussed in Section 6.6.
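These per-class counts can be read directly off the matrix. A minimal sketch, where M is a list of rows indexed as in Table 6.3:

```python
def class_counts(M, i):
    """True positives, false positives and false negatives of class i,
    given a confusion matrix M with M[i][j] = number of samples of
    class i predicted as class j."""
    tp = M[i][i]
    fp = sum(M[k][i] for k in range(len(M)) if k != i)  # column i, off-diagonal
    fn = sum(M[i][k] for k in range(len(M)) if k != i)  # row i, off-diagonal
    return tp, fp, fn
```

For the matrix of Table 6.3, class_counts(M, 0) returns (tp_A, e_BA + e_CA, e_AB + e_AC), matching the definitions above.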
In contrast, the recall metric, defined as tp_i / (tp_i + fn_i), provides a less sensitive measure of classifier performance [He and Garcia, 2009]. We also use macro-averaged precision and recall over all the experiments. Macro-average scores are calculated by first computing precision and recall for each class i ∈ C, where C is the set of class labels, and then taking the average of these values (Equations 6.5 and 6.6). These macro-averaged measures also prevent the results from being biased towards the larger class.

Precision_macro = (1/|C|) Σ_{i=1}^{|C|} tp_i / (tp_i + fp_i)    (6.5)

Recall_macro = (1/|C|) Σ_{i=1}^{|C|} tp_i / (tp_i + fn_i)    (6.6)

6.6 Experimental Results: Predictions at Posting Time

We now discuss representative results for predictions performed at the time the tip is posted. That is, we fix ε = 0 and evaluate our proposed models to predict, at time t_{p_i}, the popularity level that tip p_i will achieve at time t_{p_i} + δ. We set δ equal to 1 month and leave the evaluation of other values of δ and ε to the following sections. We start by investigating how the sets of features related to the three central entities (user, venue and tip) affect the prediction of tip popularity (Section 6.6.1). We then analyze the importance of each feature individually (Section 6.6.2).

6.6.1 Analysis of the Groups of Features

Figures 6.3 and 6.4 show the macro-average precision and recall results, along with corresponding 95% confidence intervals, for 30 different strategies for predicting a tip's popularity level, considering two and three popularity levels, respectively. These strategies emerge from the combination of five prediction algorithms with alternative sets of predictor variables. In particular, for the OLS, SVR (linear and RBF kernels) and SVM algorithms (for SVM, we show only results for the RBF kernel, as the linear kernel produced similar results), we consider the following sets of predictors: only user features, only venue
features, only content features, all venue and user features, and all user, venue and content features. We also consider only user features restricted to the category of the venue where the tip was posted (user/cat features). For the predictions using the median number of likes of the user, here referred to as the median strategy, we compute this number both over all tips of the user and only over the tips posted at venues of the same (target) category (Figures 6.3, 6.4, 6.5 and 6.6 show results of the median prediction model only for the user and user/cat sets of predictor variables, since they are the same for the other sets). The significance of the results was assessed using both one-way ANOVA and Kruskal-Wallis [Allen, 1990] tests with 95% confidence.

Figure 6.3: Macro-Average Results for Two Popularity Levels. (a) Macro-Average Precision; (b) Macro-Average Recall.

Figure 6.4: Macro-Average Results for Three Popularity Levels. (a) Macro-Average Precision; (b) Macro-Average Recall.

We start by analyzing the two different scenarios of popularity levels. Comparing precision and recall results for both scenarios, we note that the two-level scenario produces the best results (gains of up to 93% in average precision and up to 68% in average recall). This result is not surprising, since it may be simpler to build a classifier that distinguishes between only two classes, as the decision boundaries can be simpler than with more classes [Galar et al., 2011]. Analyzing the confusion matrices for the three-level scenario, we note that most misclassifications occur between the no popularity and low popularity classes, which suggests that the proposed features cannot distinguish between them very well.
Thus, in the following analyses, we focus only on the scenario with 2 popularity levels. We observe that there is little difference in macro-average precision across the prediction algorithms, except for SVR with linear kernel using only content features, which performs worse (Figure 6.3a). The same degradation of SVR is observed in terms of macro-average recall (Figure 6.3b). Moreover, the superiority of SVM, SVR and OLS over the simpler median strategy, in terms of recall, is clear. For any of those strategies, the best macro-average recall was obtained using user and venue features, with gains from 1% (SVR-RBF) up to 36% (OLS) over the other sets of predictors. Comparing the different techniques using user and venue features, we see small but statistically significant gains in macro-average recall for SVM and OLS over SVR (up to 1.15% and 0.84%, respectively), while OLS and SVM produce statistically tied results. However, OLS is much simpler than both SVM and SVR, as will be discussed later in this section.

We also note that recall results are higher than precision results in Figures 6.3a,b. This is because the precision for the smaller class (high popularity) is very low, even with a high recognition rate (large number of true positives), which can be better analyzed by looking at the confusion matrices. Table 6.4 shows two confusion matrices taken from our experimental results with two popularity levels. Recall that under-sampling is applied only to the training set, and thus the class distribution in the test set is still very unbalanced and dominated by the larger class (low popularity). Hence, even small false negative rates for the larger class result in very low precision for the other (smaller) class (see Table 6.4a). Macro-average precision scores by themselves, therefore, do not provide a completely fair picture of the number of correct predictions of each strategy.
For that reason, towards a deeper analysis of the prediction strategies, we also discuss precision and recall results for each class (low and high popularity).

Table 6.4: Examples of Confusion Matrices for a Two-Class Classification Task

(a) OLS Method with User + Venue + Textual Features
                 Predicted class
                 Low       High
Known    Low   74,054    30,733
class    High     115       529

(b) Median Method
                 Predicted class
                 Low       High
Known    Low   89,237    15,550
class    High     558        86

Figures 6.5 and 6.6 show the average precision and recall, along with corresponding 95% confidence intervals, for tips in the low and high popularity categories, respectively. As discussed above, the precision for tips in the low popularity class is much higher than that for tips in the high popularity category (shown in Figure 6.6). For tips with low popularity, the best average recall results are produced by the median method. However, this result is a side effect of the large number of users whose median number of likes equals zero (as discussed in Chapter 4), which favors the median strategies. That is, the median strategies predict that most tips will receive few likes, as illustrated in Table 6.4b. This leads to high precision and recall for the larger class (low popularity) but very poor results for the smaller class. Comparing the considered sets of predictor variables, we find a reasonable gain in recall (up to 15% for SVR with linear kernel) for tips in the low popularity class when using both user and venue features over using either set of features separately.

Figure 6.5: Results for Tips in the Low Popularity Category. (a) Average Precision; (b) Average Recall.

Figure 6.6: Results for Tips in the High Popularity Category. (a) Average Precision; (b) Average Recall.
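The macro-averaged measures of Equations 6.5 and 6.6 can be sketched as a small function over such a confusion matrix (rows are known classes, columns are predictions):

```python
def macro_scores(M):
    """Macro-averaged precision and recall from a confusion matrix M,
    giving every class equal weight regardless of its size."""
    n = len(M)
    prec = rec = 0.0
    for i in range(n):
        tp = M[i][i]
        fp = sum(M[k][i] for k in range(n) if k != i)
        fn = sum(M[i][k] for k in range(n) if k != i)
        prec += tp / (tp + fp) if tp + fp else 0.0
        rec += tp / (tp + fn) if tp + fn else 0.0
    return prec / n, rec / n
```

Applied to the matrix of Table 6.4a, the recall of the high popularity class is 529/644 while its precision is only 529/31,262, which is exactly the asymmetry between recall and precision discussed above.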
For tips with high popularity, there are no significant differences in recall across the various sets of predictors, provided that user features are included. Moreover, the use of venue features jointly with user features improves the precision of the high popularity class (Figure 6.6a) by up to 46% (35% for the OLS algorithm). That is, adding venue features reduces the amount of noise when trying to retrieve potentially popular tips, with no significant impact on the recall of that class, and is thus preferable over exploring only user features. Note also that including content features as predictors leads to no further (statistically significant) gains.

Comparing OLS, SVR and SVM with user and venue features as predictors, we find only limited gains (if any) of the more sophisticated SVR and SVM algorithms over the simpler OLS strategy. In particular, SVR (with RBF kernel) leads to a small improvement (2% on average) in the recall of the high popularity class, being statistically tied with SVM (for the low popularity level, OLS outperforms SVR with RBF kernel by 4% on average, being statistically tied with SVM and SVR with linear kernel). Yet, in terms of the precision of that class, OLS outperforms SVR by 13.5% on average, being statistically tied with SVM. We note that SVR tends to overestimate the popularity of a tip, slightly improving the true positives of the high popularity class (and thus its recall), but also increasing the false negatives of the low popularity class, which ultimately introduces a lot of noise into the predictions for the high popularity class (hurting its precision). Note that the limited gains in recall of SVR over OLS come at the expense of a model learning process that is 30 times longer for a fixed training set. Thus, the simpler OLS model produces results that, from a practical perspective, are very competitive with (if not better than) those obtained with SVM and SVR.
6.6.2 Feature Importance

In the previous section, we concluded that the simpler linear OLS model using both user and venue features produces the best trade-off between prediction accuracy and the computational cost of model training. In this section, we evaluate the relative importance of all features, including the textual ones. Focusing on the OLS prediction strategy, we use a very popular feature selection technique, Information Gain [Yang and Pedersen, 1997], to analyze the importance of each feature to the predictions of our model (Section 6.6.2.1). Next, we examine the relationship between the response variable (popularity level) and each feature used as predictor (Section 6.6.2.2). We also investigate whether multicollinearity occurs in the regression and how it affects the prediction accuracy (Section 6.6.2.3). Finally, we analyze the impact on the model accuracy of removing features one at a time according to the Information Gain metric (Section 6.6.2.4).

6.6.2.1 Ranking of Features

As we did when evaluating the features of the ranking task (Section 5.5.3), we sorted the features used by the OLS method according to the Information Gain feature selection technique [Yang and Pedersen, 1997].

Table 6.5: Features Ranked by Information Gain.

Pos. Feature                       Pos. Feature                  Pos. Feature
1 user_avg_likes                   44 tip_liwc_match             87 tip_conj_num
2 user_total_likes                 45 tip_num_entities           88 tip_sent_num
3 user_std_likes                   46 tip_ent_wrd                89 tip_liwc_affect
4 user_sn_size                     47 tip_pos_ver                90 tip_url_num
5 user_median_likes                48 venue_verified             91 tip_liwc_pos
6 venue_visitors_num               49 tip_num_dist_entities      92 tip_liwc_pres
7 venue_cks_num                    50 tip_pos_adv                93 tip_tobe_num
8 venue_category                   51 tip_pos_adj                94 tip_liwc_quant
9 user_like_by_sn                  52 tip_comp_wrd_num           95 tip_liwc_perc
10 user_type                       53 user_median_tip_by_sn      96 tip_liwc_first
11 venue_total_likes               54 tip_cap_char               97 tip_liwc_past
12 venue_std_likes                 55 tip_liwc_cog               98 tip_liwc_neg
13 venue_like_rk_pos_dsc           56 tip_pos_nn                 99 tip_liwc_money
14 venue_avg_likes                 57 tip_pos_comp               100 tip_liwc_body
15 venue_tip_num                   58 tip_liwc_social            101 tip_liwc_cause
16 venue_date_rk_pos_asc           59 tip_pos_sup                102 tip_beg_prep
17 user_venue_visibility_total     60 tip_read_ari               103 tip_liwc_humans
18 user_sn_likes                   61 tip_liwc_leisure           104 tip_pass_sent
19 user_tip_num                    62 tip_pos_pp                 105 tip_beg_pron
20 user_tipped_venues              63 user_mayor                 106 tip_liwc_home
21 user_tip_by_sn                  64 tip_liwc_time              107 tip_beg_art
22 venue_like_rk_pos_asc           65 tip_read_smog              108 tip_bad_wrd
23 user_std_like_by_sn             66 tip_uniq_wrd               109 tip_liwc_inhib
24 user_venue_visibility_std       67 tip_read_kincaid           110 tip_liwc_swear
25 user_venue_visibility_avg       68 tip_cap_wrd                111 tip_liwc_anger
26 tip_informat                    69 tip_pos_num                112 tip_liwc_relig
27 venue_median_likes              70 tip_frac_space             113 tip_liwc_fut
28 tip_prep_num                    71 tip_pron_num               114 tip_liwc_health
29 user_venue_visibility_median    72 tip_pos_fw                 115 tip_liwc_anxiety
30 tip_long_sent_wrd               73 user_median_like_by_sn     116 tip_quest_sent
31 tip_syl_num                     74 tip_neg_score              117 tip_liwc_sex
32 tip_pos_score                   75 tip_read_fog               118 tip_beg_subor_conj
33 user_given_likes                76 tip_read_coliau            119 tip_liwc_sad
34 tip_liwc_relativ                77 tip_liwc_numb              120 tip_liwc_family
35 tip_char_num                    78 tip_avg_syl_wrd            121 tip_beg_int_pron
36 tip_wrd_num                     79 tip_liwc_work              122 tip_beg_conj
37 user_mayorships_num             80 tip_liwc_second            123 tip_liwc_friend
38 user_avg_like_by_sn             81 tip_liwc_ingst             124 tip_liwc_nonfl
39 tip_neu_score                   82 tip_read_flesch            125 tip_miss_wrd
40 tip_liwc_space                  83 tip_liwc_motion
41 user_avg_tip_by_sn              84 tip_liwc_third
42 tip_avg_char_wrd                85 tip_pos_sym
43 tip_not_wordnet                 86 tip_liwc_achieve

Table 6.5 shows the ranking of the considered features according to their Information Gain, computed over all tips in our dataset. Consistent with the ranking task (Section 5.5.3), we find that the most important features are related to the popularity of the tip's author: the average, total and standard deviation of the number of likes received by tips posted by the user (user_avg_likes, user_total_likes and user_std_likes, respectively). Thus, the feedback received on the user's previous tips is the most important factor for predicting the popularity level of her future tips. Figure 6.7a, which shows the complementary cumulative distribution function (CCDF) of the best of these features (average number of likes) for tips in each class, clearly indicates that it is very discriminative of tips with different (future) popularity levels. Similar gaps exist between the distributions of the other two aforementioned features. Features related to the social network of the tip's author are also important. The number of friends/followers of the author and the total number of likes given by them occupy the 4th and 9th positions of the ranking, respectively. Moreover, we find that authors of tips that achieve high popularity tend to have more friends/followers (Figure 6.7b). Thus, the social network does play a more important role for tips with very high popularity.

Figure 6.7: Distributions of the Most Important User Features for Predicting a Tip's Popularity Level. (a) Average Number of Likes per Tip of the User; (b) Number of Friends/Followers per User.
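The Information Gain score behind this ranking is the entropy of the class labels minus the expected entropy after splitting on a (discretized) feature. A minimal sketch, with the discretization step omitted:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information Gain of a discretized feature w.r.t. the class labels."""
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain
```

A feature that perfectly separates the classes has gain equal to the label entropy, while a feature independent of the classes has gain close to zero, which is what pushes features like user_avg_likes to the top of Table 6.5 and most content features to the bottom.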
The best venue feature, which occupies the 6th position in the ranking, is the number of unique visitors (Figure 6.8a). Moreover, the total number of check-ins (7th position), the venue category (8th position), and the total number of likes received by tips posted at the venue (11th position) (Figure 6.8b) are also very discriminative, appearing above other user features, such as the number of tips (19th position), in the ranking. Finally, note that the content features are the least important ones, which is consistent with our discussion, in the previous chapter, of the low correlation between content features and the number of likes per tip. For comparison purposes, Figure 6.9 shows the distributions of the informativeness scores (defined in Section 5.3.3) and of the number of prepositions per tip, the two most discriminative content features (26th and 28th positions, respectively). Clearly, these features cannot discriminate tips with different levels of popularity very well. These results are consistent with those in Section 5.5.3. Thus, unlike in other efforts to predict the popularity or assess the helpfulness of online reviews [Kim et al., 2006; Zhang and Varadarajan, 2006], textual features are less important in the tip popularity prediction task. In contrast, user-related features, particularly those that capture the popularity of the tip's author in her previous tips and her social network, are much more important.

Figure 6.8: Distributions of the Most Important Venue Features for Predicting a Tip's Popularity Level. (a) Number of Unique Visitors; (b) Total Number of Likes per Tip.

Figure 6.9: Distributions of the Most Important Content Features for Predicting a Tip's Popularity Level. (a) Informativeness; (b) Number of Prepositions.
Figure 6.10: Macro-Average Precision and Recall for OLS Using One Feature at a Time.

6.6.2.2 Effects of Individual Features

To understand the effect of each individual feature on the response variable (popularity level), we performed separate univariate linear OLS regressions with each feature, as in Borghol et al. [2012]. Figure 6.10 presents the macro-average precision and recall results along with corresponding 95% confidence intervals. Each point on the x-axis represents the result of the OLS regression with the given feature, sorted by Information Gain (Table 6.5), used as the single predictor. The first point (x = 0) refers to the original model with all features. As we can see, the precision values are similar across all strategies, except at the end, where the regressions were performed using the least discriminative features. Looking at the macro-average recall values, we note that they are larger for the most discriminative features, which, as discussed in the previous section, are related to the number of likes received by the tips previously posted by the user (user_avg_likes, user_std_likes, user_median_likes, user_total_likes). Finally, we observe that the highest recall value is obtained by the full model, which reinforces the importance of a multivariate model. Next, we present a collinearity analysis to identify correlated features, which might have a negative effect on the regression results.

6.6.2.3 Multicollinearity Analysis

Multicollinearity occurs in linear regression models when two or more predictor variables are linearly correlated.
Although the SVM and SVR methods are capable of handling a high degree of collinearity [Howley et al., 2006], the OLS model might be severely impacted [Jain, 1991], as multicollinearity may increase the variance of the OLS coefficient estimates, degrading model predictability.

Table 6.6: Features with High Collinearity with at Least One Other Feature.

ID¹   VIF               Tolerance
16    507.83 ± 44.80    0.004 ± 0.00
15    215.71 ± 15.94    0.01 ± 0.00
11    101.03 ± 4.78     0.02 ± 0.00
36    96.01 ± 9.53      0.02 ± 0.00
35    80.60 ± 7.81      0.03 ± 0.00
31    59.83 ± 4.17      0.03 ± 0.00
13    49.46 ± 3.08      0.04 ± 0.00
39    45.92 ± 2.66      0.04 ± 0.00
25    44.03 ± 1.25      0.05 ± 0.00
34    43.59 ± 2.76      0.05 ± 0.00
22    40.11 ± 1.51      0.05 ± 0.00
9     37.20 ± 6.03      0.06 ± 0.01
32    34.99 ± 2.15      0.06 ± 0.00
56    34.33 ± 1.74      0.06 ± 0.00
42    32.69 ± 2.29      0.06 ± 0.01
21    27.19 ± 3.47      0.08 ± 0.01
47    21.92 ± 0.99      0.09 ± 0.00
93    21.36 ± 1.19      0.09 ± 0.01
104   21.02 ± 1.24      0.09 ± 0.01
88    19.85 ± 1.47      0.10 ± 0.01
45    19.10 ± 1.17      0.10 ± 0.01
49    18.60 ± 1.21      0.11 ± 0.01
14    18.57 ± 1.11      0.11 ± 0.01
51    18.53 ± 0.72      0.11 ± 0.00
17    18.50 ± 0.57      0.11 ± 0.00
1     18.49 ± 0.71      0.11 ± 0.00
19    16.52 ± 2.07      0.13 ± 0.01
38    16.52 ± 2.10      0.13 ± 0.02
20    15.33 ± 1.67      0.13 ± 0.01
30    15.19 ± 0.38      0.13 ± 0.00
4     13.95 ± 1.67      0.15 ± 0.02
62    12.85 ± 0.60      0.15 ± 0.01
23    12.64 ± 0.91      0.16 ± 0.01
74    12.56 ± 0.80      0.16 ± 0.01
34    12.52 ± 0.58      0.15 ± 0.01
50    11.86 ± 0.54      0.16 ± 0.01
6     11.84 ± 0.53      0.16 ± 0.01
44    10.23 ± 0.43      0.19 ± 0.01

¹ The feature IDs are defined by the position in the Information Gain ranking shown in Table 6.5.

There are several methods to test for multicollinearity. As in Borghol et al. [2012], we test whether our OLS prediction model is impacted by multicollinearity using two of them: variance inflation factors (VIF) [Stevens, 2002] and tolerance [O'Brien, 2007].
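For illustration, these two diagnostics can be sketched in the two-predictor special case, where the R² of the auxiliary regression reduces to the squared Pearson correlation between the two predictors (the thesis computes the general multi-predictor version over all 125 features):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_and_tolerance(x1, x2):
    """VIF = 1 / (1 - R^2) and tolerance = 1 - R^2 for predictor x1,
    where R^2 is its squared correlation with the only other predictor x2."""
    tolerance = 1.0 - pearson(x1, x2) ** 2
    return 1.0 / tolerance, tolerance
```

Near-collinear predictors yield a very large VIF and a tolerance close to zero, while unrelated predictors stay near VIF = 1 and tolerance = 1.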
The VIF for a predictor i, VIF_i, indicates whether there is a strong linear association between i and all the remaining predictors. If multicollinearity exists, the variance of an estimated regression coefficient b_i is inflated by VIF_i because of the correlation among the predictor variables in the model. The variance inflation factor for the i-th predictor is computed as:

    VIF_i = 1 / (1 − R_i²)

where R_i² is the coefficient of determination obtained by a regression model built using the i-th predictor as the response variable and the remaining predictors as inputs. Stevens [2002] recommends, as a heuristic, treating a VIF greater than 10 as an indication of multicollinearity requiring correction. The tolerance is measured as 1 − R_i², where R_i² is computed in the same way as for the VIF. A tolerance of less than 0.10 indicates a multicollinearity problem [O'Brien, 2007].

We compute the VIFs and tolerances for all features of our complete OLS model. Table 6.6 lists the features with VIF values above 10, with their corresponding average VIF and tolerance values (and 95% confidence intervals). Several features are collinear with at least one other feature in the model. This is not totally unexpected, as most collinear features are derived from other features (e.g., total and average numbers of likes of the user). Note also that some of these features have high Information Gain.

Next, we perform an experiment with the OLS method, eliminating one collinear feature at a time.

Figure 6.11: Macro-Average Results for OLS After Removing Each Collinear Feature. (a) Macro-Average Results (macro-averaged recall and precision); (b) Average Recall (low and high popularity classes); x-axis: feature ranking by VIF.

We also compare the OLS model without the collinear feature and the
original complete model. Figure 6.11a shows the macro-average results, while Figure 6.11b shows the average recall values for each popularity category. Each point on the x-axis represents a scenario in which one collinear feature is eliminated: the eliminated feature is identified by its position in the ranking of VIF values (see Table 6.6), starting with the feature with the largest VIF (i.e., feature ID 16). The first point (x = 0) refers to the original method with all features. The figures show that multicollinearity does not impact any of our metrics: the gains from eliminating each collinear feature are not statistically significant, either for macro-average precision and recall or for average per-class recall.

6.6.2.4 Feature Removal using Information Gain

In our first experiments, we evaluated the relative importance of each group of features, but we were not able to see which features are redundant within each group. Next, we perform an experiment removing one feature at a time, cumulatively, in increasing order of importance given by the Information Gain, as we did for the ranking task. Figure 6.12 shows the impact on the macro-average recall and on the recall of the high popularity class as each feature is removed, starting with the complete set of user, venue and content features. For example, the second point in each graph shows results after removing the least discriminative feature (number of common misspellings per tip). We omit results for macro-average precision as they are not affected by the feature removal, mainly because of the imbalance problem discussed in Section 6.5. Note that, as observed for the ranking task (Section 5.5.3), the classification task also has several features that are redundant, which means that the removal of many of
Figure 6.12: Recall for OLS When Removing One Feature at a Time. (a) Macro-Average Recall; (b) Average Recall for High Popular Tips; x-axis: number of remaining features.

the least discriminative features has no significant impact on recall. Significant losses in recall are observed only after we start removing features in the top-10 positions of the ranking. Among those, the largest losses are observed when the number of check-ins at the venue (consistent with the ranking task) and the size of the user's social network are removed, which reinforces the importance of including venue and social network features in the prediction task. In sum, using the top 10 most important features produces predictions that are as accurate as those obtained using the complete set of features.

6.7 Experimental Results: Other Prediction Scenarios

Ideally, one would like to predict the future popularity of a tip immediately after it is posted. This was the scenario considered in the previous section. However, tips exhibit different popularity evolution patterns, as observed in Chapter 4 as well as for other types of content (e.g., YouTube videos [Crane and Sornette, 2008; Pinto et al., 2013], Digg news [Szabo and Huberman, 2010], and tweets [Hong et al., 2011]). By monitoring a tip for a certain (short) time interval (ε units of time), we may be able to gather useful information about how its popularity is evolving, which may improve prediction accuracy. Similarly, so far we have considered only predictions targeting one month ahead. However, depending on how the popularity of a tip evolves over time, we may be able to make accurate predictions further into the future (i.e., larger values of δ). One
concern in this case is that, as we predict further into the future, the information used as predictors may get outdated, hurting prediction accuracy. Thus, it is interesting to evaluate how far into the future our models are still able to produce reasonably accurate predictions. In order to address these questions, we analyze the impact of varying the monitoring time ε and the target prediction window δ on prediction accuracy in Sections 6.7.1 and 6.7.2, respectively.

6.7.1 Prediction Results Varying the Monitoring Period ε

We investigated the accuracy of our prediction models for durations of the monitoring period (ε) of 1, 12, 24, 72 and 168 hours (one week), fixing the target time δ at 1 month. Recall that, in these scenarios, we use as predictors all the features listed in Section 6.4 as well as the number of likes already received by the tip p_i. For a given ε, the values of the predictors are computed taking all past history up to ε hours after the tip's posting time. Figure 6.13 presents macro-average precision and recall results for each prediction method: SVM, SVR (with both linear and RBF kernels) and OLS. These results are produced using the complete set of features. The figure also shows results for the median baseline using all user features as well as only user/cat features (referred to as median-cat). Note that the results for ε=0 are the same results presented in Section 6.6.1. We note that extending the monitoring time to only 1 hour after the tip is posted only slightly improves the macro-average recall (up to 1.60% for the SVM method), which is expected given the slow evolution of tip popularity observed (see discussion in Chapter 4). Yet, by monitoring the tip for one week (ε=168 hours) we can improve the macro-average recall by up to 13% (SVM method) and the macro-average precision by up to 7% (SVR linear method) over using features computed at posting time.
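The monitoring setup can be sketched as follows. The helper below only illustrates how predictors and the response are separated in time; the like-count cutoff defining the "high" level is a placeholder, not the thesis's class definition:

```python
def tip_snapshot(like_hours, epsilon, delta, high_cutoff=5):
    """like_hours: hours after posting at which each like arrived.
    Returns (early_likes, level): the like count observed during the
    monitoring window [0, epsilon], used as an extra predictor, and the
    popularity level defined by the like count at the target time delta.
    high_cutoff is an illustrative threshold, not the thesis's definition."""
    early = sum(1 for h in like_hours if h <= epsilon)
    total = sum(1 for h in like_hours if h <= delta)
    return early, ("high" if total >= high_cutoff else "low")

likes = [2, 30, 100, 500, 600, 650]  # slow accumulation, in hours
early, level = tip_snapshot(likes, epsilon=168, delta=24 * 30)
```

With ε = 168 hours, only three of the six likes fall inside the monitoring window, while the δ = 1 month label already sees all of them, mirroring why longer monitoring helps for slowly evolving tips.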
Moreover, such improvements are observed for all methods, although the median baselines remain much worse than our solutions. In particular, the OLS model remains the most cost-effective prediction method, as it produces results that are statistically as good as those obtained with the other (more costly) methods for all considered values of ε. Moreover, out of all considered features, the total number of likes received during the monitoring period (ε) is the most discriminative feature according to the Information Gain criterion. This indicates the importance of taking the early popularity evolution as evidence for prediction.

Figure 6.13: Macro-Average Results for Various Monitoring Times ε (δ = 1 month). (a) Macro-Average Precision; (b) Macro-Average Recall; x-axis: ε = 0, 1, 12, 24, 72, 168 hours; curves: Median, Median-cat, OLS, SVR-rbf, SVM, SVR-linear.

In order to assess to what extent the other features contribute to prediction accuracy, we also compare our OLS strategy against three other state-of-the-art baseline models that exploit only early popularity measurements. The first one, proposed by Szabo and Huberman [2010] and referred to as the S-H model, is a univariate linear regression model on a logarithmic scale that uses only the total number of likes received during the monitoring period as predictor. The other two baselines, proposed by Pinto et al. [2013], are extensions of the S-H model. One is a multivariable linear regression model that uses early popularity measures sampled at regular intervals (e.g., per day) during the monitoring period as predictors. The other builds on this model by also using Radial Basis Functions (RBFs) to capture the similarity (in the early popularity
measurements) between the target tip and selected examples from the training set. These variations are referred to as ML and MRBF, respectively. We use the S-H, ML and MRBF models as originally proposed, that is, to predict the total number of likes of a tip. We then use the predicted number to infer the corresponding popularity level. We evaluate all models in the same scenario adopted by Szabo and Huberman [2010] and Pinto et al. [2013], i.e., ε equal to 168 hours and δ equal to 1 month. As discussed in Section 6.5.1, model parameters are defined using cross-validation in the training set. The number of RBF functions in the MRBF model was set to 100, as in Pinto et al. [2013].

We start by noting that although the S-H, ML and MRBF models can be directly applied to predict tip popularity, their use is constrained to tips with at least one like at the target time, since the models are solved by minimizing the mean relative squared error (MRSE) over the training set, which is undefined for tips with zero likes. Thus, in order to favor the baselines in our evaluation, we disregarded tips with zero likes, corresponding to 83% of all tips in our dataset¹.

Table 6.7: Macro-Average Results of Models that Use Early Popularity Measurements (only tips with at least 1 like, ε=168 hours, δ=1 month).

Metric                      OLS              ML               MRBF             S-H
Macro-average recall        0.8263 ± 0.0129  0.7257 ± 0.0064  0.8003 ± 0.0085  0.8395 ± 0.0100
Recall, low popularity      0.8305 ± 0.0186  0.9960 ± 0.0005  0.9619 ± 0.0022  0.9240 ± 0.0010
Recall, high popularity     0.8220 ± 0.0189  0.4553 ± 0.0131  0.6386 ± 0.0176  0.7549 ± 0.0205
Macro-average precision     0.5650 ± 0.0061  0.8799 ± 0.0145  0.6674 ± 0.0129  0.6158 ± 0.0112
Precision, low popularity   0.9930 ± 0.0012  0.9827 ± 0.0021  0.9879 ± 0.0017  0.9913 ± 0.0015
Precision, high popularity  0.1369 ± 0.0126  0.7770 ± 0.0308  0.3469 ± 0.0271  0.2403 ± 0.0237
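For concreteness, an S-H-style log-scale regression can be sketched as below. This is a toy illustration with synthetic counts, not the authors' code; note how it must drop zero counts, which is exactly the constraint discussed above:

```python
import numpy as np

def fit_sh(early, target):
    """Fit ln(target) = a + b * ln(early) by least squares and return a
    predictor for the target-time count. Items with zero counts must be
    dropped: the log (and the relative-error objective) is undefined."""
    early, target = np.asarray(early, float), np.asarray(target, float)
    keep = (early > 0) & (target > 0)
    A = np.column_stack([np.log(early[keep]), np.ones(int(keep.sum()))])
    (b, a), *_ = np.linalg.lstsq(A, np.log(target[keep]), rcond=None)
    return lambda x: float(np.exp(a + b * np.log(x)))

# synthetic training pairs: the final count is exactly 3x the early count
early = [1, 2, 4, 8, 16, 0]    # the zero-count item is dropped
target = [3, 6, 12, 24, 48, 0]
predict = fit_sh(early, target)
```

On this perfectly log-linear toy data the fit recovers the 3x relation, so, for example, an early count of 10 maps to a predicted final count of 30.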
Macro-average recall and precision results obtained with all models over the 17% remaining tips are shown in Table 6.7². Focusing first on recall, our primary metric of interest, we note that our OLS model produces the best average recall results for the high popularity class, with gains of 80%, 29% and 9% over the ML, MRBF and S-H models, respectively. The baselines are more biased towards the less popular tips, favoring the recall of that class. Yet, in terms of macro-average recall, the OLS model still outperforms both the ML and MRBF models, being statistically tied with the S-H model. The gains of OLS in recall, particularly for the high popularity class, come at the cost of a decrease in precision. The baselines, especially ML, are better able to filter out false positives, leading to higher precision.

¹ In this setup, the low popularity class is defined as tips with number of likes ranging from 1 to 4.
² Note that the OLS results shown in this table are different from those in Figure 6.13, which were computed over all tips.

In sum, we find that the baselines, particularly the ML model, perform quite well in the considered scenario, confirming previous results, except in terms of the recall of the high popularity class. However, this metric is of particular interest if one is aiming at retrieving most of the potentially more popular tips, even if this comes at the expense of some noise. In such a case, our OLS model is preferable. Moreover, we emphasize that our solution is more robust since, unlike the baselines, it can be applied to any tip, at or after posting time. In particular, it can be applied to tips that have not received any like yet (i.e., unpopular tips or tips that have just been posted). The baselines, instead, are more suitable to types of content that exhibit a faster popularity evolution (e.g., news, videos)¹.
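The macro-averaged metrics used throughout this comparison are the unweighted means of the per-class values, so the small high popularity class weighs as much as the large low popularity one. A minimal sketch with made-up labels:

```python
def macro_metrics(true, pred, classes=("low", "high")):
    """Macro-averaged precision and recall: per-class values averaged
    without weighting by class size."""
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)

# made-up example: every high tip is caught, at the cost of two
# low tips being misclassified as high (false positives)
true = ["low"] * 8 + ["high"] * 2
pred = ["low"] * 6 + ["high"] * 2 + ["high"] * 2
macro_p, macro_r = macro_metrics(true, pred)
```

The toy example mirrors the trade-off above: recall of the high popularity class is perfect, while its precision drops to 0.5, pulling the macro-average precision down to 0.75.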
Given the results discussed in Chapter 4, it is very likely that the baselines will not be applicable to the vast majority of the tips, even for other values of δ. Our solution is thus more general and preferable.

6.7.2 Prediction Results Varying the Target Prediction Window δ

We now analyze how prediction accuracy is affected as we vary the target prediction window δ from 1 to 6 months. In all scenarios, we set ε to 0, thus focusing on predictions at posting time. We evaluate the OLS, SVM and SVR (with both kernels) models using all features. As baselines, we consider once again the median and median-cat models, since the S-H, ML and MRBF methods are not suitable for predictions at posting time. Figure 6.14 shows macro-average precision and recall results for all methods and values of δ². Note that the results for δ = 1 are the same as those presented in Section 6.6.1. We find that, for any given value of δ, OLS, SVR (with both kernels) and SVM produce statistically tied results in terms of macro-average precision (Figure 6.14a). Moreover, the gains (if any) in macro-average recall (Figure 6.14b) of the SVR and SVM methods over OLS are limited (up to 3.15% when δ = 5 months). Thus, once again, OLS is a cost-effective solution, from a practical perspective, for various values of δ. Moreover, we find the same trend for all methods: as δ increases, precision increases slightly, but at the cost of a (small) reduction in recall. For example, comparing the predictions made by OLS for δ equal to 1 and 6 months, we observe a small improvement in macro-average precision (3.37%) for the latter, but also a decrease in

¹ Pinto et al. [2013] reported that less than 1.5% of the videos in their datasets had not received any view in the first month in the system. This is in sharp contrast to the 83% of tips with no likes in the same period in our dataset.
² The tip distribution across classes may not be the same for experiments with different values of δ, since tips from the low popularity class may move to the high popularity class as δ increases.

Figure 6.14: Macro-Average Results for Various Target Times δ (ε=0). (a) Macro-Average Precision; (b) Macro-Average Recall; x-axis: δ = 1 to 6 months; curves: Median, Median-cat, OLS, SVM, SVR-rbf, SVR-linear.

macro-average recall (5.9%). As shown in Figure 4.15, for a large fraction of tips, most likes are gained after 3 months, which suggests that a tip may take a long time to settle into its popularity level. Yet, we still observe a loss in macro-average recall of 3.7% when predictions are made for δ=3. The losses in recall occur in both classes, reaching 6.35% and 5.47% for the low and high popularity classes, respectively, for δ equal to 6. The gains in macro-average precision observed as δ increases come from a higher precision for the high popularity class, which, in turn, is due to the reduction of class imbalance (which severely hurts the precision of the smaller class) as more tips migrate from the low to the high popularity class. This migration might also partially explain the losses in recall. More generally, as we predict further ahead, model inputs (feature values) become outdated and less effective for prediction purposes. Given our results, we find that predictions for up to 2 months ahead are mostly unaffected by outdated features. For longer periods, the reduction in recall starts becoming significant.

6.8 Model Specialization

So far, we have built and evaluated prediction models that were trained using all tips in the training set.
This approach produces a single general prediction model that aggregates and summarizes the relationships between the predictors (feature values) and the response (popularity level) across all tips. In this section, we analyze whether we can improve prediction accuracy by building models that are specialized to particular groups of tips, such as tips posted at venues located in the same geographical region or of the same category. Model specialization might bring out patterns that are inherent to a particular group of tips but are masked when all tips are treated jointly. For example, venues in different categories might exhibit different patterns: while "Travel & Transport" is the most popular venue category in terms of number of check-ins, "Food" is the category that attracts the largest number of tips [Li et al., 2013]. Model specialization might improve accuracy as fewer noisy instances are used to train the models. On the other hand, specialization might also suffer from the lack of enough training instances, which hurts prediction accuracy as it affects the model's capacity to generalize, or from a more severe class imbalance which, due to the need for undersampling in the training set, ends up severely restricting the amount of training examples.

We here assess the benefits of building specialized models for specific cities (Section 6.8.1) and venue categories (Section 6.8.2). To that end, we compare specialized and general models, built using the same method (OLS, SVM or SVR), on the same test set, when performing predictions at posting time for one month into the future (i.e., ε = 0 and δ = 1 month). We adopt the same general experimental setup described in Section 6.5.1, learning model parameters through cross-validation in the training set. There are two key differences, though. First, the set of tips used as input to the experimental procedure is
restricted to tips posted at venues of the target city or category in the case of specialization. Second, when training either the general or the specialized models, we here consider multiple tips posted by the same user as candidates for prediction since using only the most recent tip of each user, as discussed in Section 6.5.1, severely constrains the amount of data available for training.

6.8.1 City-Based Model Specialization

We start by assessing the benefits of building specialized models for specific cities. To that end, we build models using tips posted at venues located in four selected cities, namely New York (NY), Los Angeles (LA), San Francisco (SF), and Chicago (CHI)¹. For each city, we compare the specialized model against a general model using the same test set, composed only of tips posted at venues of the target city. Moreover, for a fair comparison, the general model is built using a training set of the same size as the one used to learn the specialized model. However, the general model's training set consists of tips posted at venues of all cities in the dataset, randomly sampled from the original (global) training set. Specifically, the training sets used to learn the models for the NY, LA, SF and CHI scenarios contain 1141, 293, 308, and 183 tips, respectively. Similarly, the learned models were applied on test sets including 5085, 1551, 1695 and 1443 tips, respectively.
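The equal-size sampling just described can be sketched as follows (a toy illustration; the dictionaries stand in for real tips and the city split is fabricated):

```python
import random

def matched_general_training_set(global_pool, specialized_train, seed=7):
    """Sample the global training pool, without replacement, down to the
    size of the specialized training set, so that general and specialized
    models are learned from the same amount of data."""
    rng = random.Random(seed)
    return rng.sample(global_pool, len(specialized_train))

# fabricated pool: every 4th tip is from the target city (NY)
global_pool = [{"id": i, "city": "NY" if i % 4 == 0 else "other"}
               for i in range(5000)]
ny_train = [t for t in global_pool if t["city"] == "NY"]
general_train = matched_general_training_set(global_pool, ny_train)
```

Both training sets end up with the same number of tips, but the matched general sample mixes cities, which is what makes the comparison isolate the effect of specialization rather than training-set size.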
Table 6.8: Geographical Model Specialization: Macro-Average Results.

Macro-Average Recall
Scenario  Model        OLS               SVM               SVR-rbf           SVR-linear
NY        General      0.8123 ± 0.0290   0.8175 ± 0.0278   0.8286 ± 0.0253   0.8267 ± 0.0277
NY        Specialized  0.8147 ± 0.0169   0.8099 ± 0.0176   0.8242 ± 0.0189   0.8069 ± 0.0287
SF        General      0.7817 ± 0.0271   0.8395 ± 0.0313   0.8520 ± 0.0197   0.8519 ± 0.0167
SF        Specialized  0.8041 ± 0.0430   0.8315 ± 0.0486   0.8185 ± 0.0556   0.8177 ± 0.0465
CHI       General      0.7379 ± 0.0295   0.8038 ± 0.0370   0.8161 ± 0.0328   0.8188 ± 0.0337
CHI       Specialized  0.7206 ± 0.0476   0.7701 ± 0.0501   0.7767 ± 0.0504   0.7734 ± 0.0506
LA        General      0.7860 ± 0.0309   0.8421 ± 0.0314   0.8545 ± 0.0313   0.8638 ± 0.0326
LA        Specialized  0.7983 ± 0.0504   0.8489 ± 0.0383   0.8594 ± 0.0326   0.8540 ± 0.0330

Macro-Average Precision
Scenario  Model        OLS                 SVM                 SVR-rbf             SVR-linear
NY        General      0.5336 ± 0.0067     0.5345 ± 0.0072     0.5332 ± 0.0075     0.5372 ± 0.0083
NY        Specialized  0.5537 ± 0.0090 ↑   0.5499 ± 0.0081 ↑   0.5485 ± 0.0091 ↑   0.5429 ± 0.0093
SF        General      0.5121 ± 0.0022     0.5204 ± 0.0065     0.5201 ± 0.0053     0.5244 ± 0.0074
SF        Specialized  0.5221 ± 0.0040 ↑   0.5338 ± 0.0070 ↑   0.5369 ± 0.0087 ↑   0.5369 ± 0.0094 ↑
CHI       General      0.5098 ± 0.0015     0.5158 ± 0.0034     0.5163 ± 0.0031     0.5189 ± 0.0036
CHI       Specialized  0.5118 ± 0.0023     0.5259 ± 0.0051 ↑   0.5269 ± 0.0046 ↑   0.5272 ± 0.0050 ↑
LA        General      0.5169 ± 0.0035     0.5242 ± 0.0049     0.5260 ± 0.0051     0.5314 ± 0.0064
LA        Specialized  0.5265 ± 0.0072 ↑   0.5407 ± 0.0078 ↑   0.5426 ± 0.0078 ↑   0.5423 ± 0.0083 ↑

Table 6.8 shows macro-average recall and precision results, along with the corresponding 95% confidence intervals, for each model and method. For each scenario (city), the best methods (including statistical ties) are shown in bold. A ↑ (or ↓) sign is used to indicate a statistical improvement (or loss) of the specialized model over the corresponding general model. The lack of a sign indicates a statistical tie.

¹ These cities were selected as they have the largest number of tips in our dataset.
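The ↑/↓/tie marks can be reproduced mechanically. The sketch below uses a simple confidence-interval overlap test, a conservative stand-in for a proper significance test (the thesis does not specify this exact procedure), applied to two precision entries from the NY row of Table 6.8:

```python
def compare(spec, gen):
    """spec/gen: (mean, ci_halfwidth) pairs for the specialized and
    general models. Returns '↑', '↓', or 'tie' based on whether the
    95% confidence intervals overlap (a conservative criterion)."""
    (ms, hs), (mg, hg) = spec, gen
    if ms - hs > mg + hg:   # specialized CI entirely above general CI
        return "↑"
    if ms + hs < mg - hg:   # specialized CI entirely below general CI
        return "↓"
    return "tie"

# NY precision, OLS: specialized 0.5537 ± 0.0090 vs general 0.5336 ± 0.0067
ols_mark = compare((0.5537, 0.0090), (0.5336, 0.0067))
# NY precision, SVR-linear: 0.5429 ± 0.0093 vs 0.5372 ± 0.0083
svr_mark = compare((0.5429, 0.0093), (0.5372, 0.0083))
```

On these two entries the overlap test reproduces the marks in the table: an improvement for OLS and a tie for SVR-linear.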
We observe that the specialized models outperform (or at least are statistically tied with) the corresponding general models in the vast majority of the cases. The improvements occur in terms of macro-average precision, varying from 1.60% to 3.76%. Such improvements in precision occur in the high popularity class. Indeed, the gains in average precision for that class reach 84.87% for the SF scenario with the SVR (RBF kernel) model. In terms of macro-average recall, the specialized and general models are statistically tied. Moreover, unlike in the previous sections, we do find cases where SVM and SVR (with RBF kernel) significantly outperform the simpler OLS in terms of macro-average precision (by up to 3.07%). Note, for example, the difference between these methods for SF, CHI and LA. Such gains are not directly related to the specialization, but rather to the greater robustness of SVM and SVR to smaller training sets [Alwee et al., 2013]. Indeed, such gains are observed for both general and specialized models.

In sum, we find that city-based model specialization does bring some improvements, particularly in terms of precision, as specialized models are able to more accurately capture patterns that are specific to the target city, reducing the number of false positives. One point to note, though, is that the amount of information available to learn a specialized model is inevitably smaller compared to a general model, and may require the use of techniques that are more robust to the lack of training instances.

As a final observation in this analysis, we note that by evaluating specialized models for each city, we are also analyzing the benefits of adding spatial (i.e., geographic) factors to our prediction models.
We consider spatial factors at the city level because, on Foursquare, the geographic information associated with each user, one of the central entities of the popularity prediction problem, is available only at the city level. We also note that spatial information at a finer granularity (i.e., latitude and longitude coordinates) is available only for venues, and our models do capture patterns associated with particular venues by taking venue-specific features into account. The city-based model specialization captures new factors that may exist due to spatial locality of tip popularity patterns at the city level. We leave to future work a more thorough investigation of other spatial factors, as well as of other strategies to introduce them into the popularity prediction models.

6.8.2 Category-Based Model Specialization

Finally, we analyze the benefits of building specialized models for each venue category. Recall that Foursquare defines nine top-level venue categories, namely "Arts &

Table 6.9: Categorical Model Specialization: Macro-Average Results.
Macro-Average Recall
Scenario  Model        OLS                 SVM                 SVR-rbf             SVR-linear
Arts      General      0.8010 ± 0.0132     0.7865 ± 0.0139     0.7959 ± 0.0153     0.7848 ± 0.0371
Arts      Specialized  0.7832 ± 0.0108 ↓   0.7937 ± 0.0124     0.7896 ± 0.0112     0.7795 ± 0.0351
Food      General      0.7884 ± 0.0193     0.7972 ± 0.0188     0.7853 ± 0.0196     0.7682 ± 0.0153
Food      Specialized  0.8140 ± 0.0164 ↑   0.8250 ± 0.0153 ↑   0.8262 ± 0.0137 ↑   0.7627 ± 0.0096
Night     General      0.7921 ± 0.0102     0.8048 ± 0.0079     0.8036 ± 0.0081     0.7707 ± 0.0167
Night     Specialized  0.8065 ± 0.0113 ↑   0.8134 ± 0.0102     0.8008 ± 0.0086     0.7627 ± 0.0075
Shops     General      0.8119 ± 0.0347     0.8130 ± 0.0310     0.8124 ± 0.0337     0.7042 ± 0.0359
Shops     Specialized  0.8187 ± 0.0331     0.8156 ± 0.0353     0.8092 ± 0.0355     0.7609 ± 0.0366 ↑

Macro-Average Precision
Scenario  Model        OLS                 SVM                 SVR-rbf             SVR-linear
Arts      General      0.5279 ± 0.0059     0.5237 ± 0.0058     0.5255 ± 0.0065     0.5627 ± 0.0026
Arts      Specialized  0.5470 ± 0.0076 ↑   0.5497 ± 0.0089 ↑   0.5494 ± 0.0095 ↑   0.5658 ± 0.0025 ↑
Food      General      0.5258 ± 0.0036     0.5288 ± 0.0048     0.5271 ± 0.0053     0.5234 ± 0.0034
Food      Specialized  0.5194 ± 0.0026 ↓   0.5186 ± 0.0027 ↓   0.5212 ± 0.0040 ↓   0.5360 ± 0.0047 ↑
Night     General      0.5256 ± 0.0049     0.5275 ± 0.0050     0.5314 ± 0.0064     0.5450 ± 0.0067
Night     Specialized  0.5292 ± 0.0032     0.5328 ± 0.0047     0.5383 ± 0.0066     0.5448 ± 0.0071
Shops     General      0.5232 ± 0.0042     0.5244 ± 0.0047     0.5246 ± 0.0054     0.5285 ± 0.0094
Shops     Specialized  0.5271 ± 0.0041     0.5261 ± 0.0044     0.5261 ± 0.0044     0.5363 ± 0.0061

Entertainment", "Colleges & Universities", "Food", "Great Outdoors", "Nightlife Spots", "Travel & Transport", "Shops", "Professional & Other Places" and "Residences". We built specialized models for four selected categories, namely "Arts & Entertainment" (Arts), "Food", "Shops & Services" (Shops), and "Nightlife Spots" (Night)¹. As before, we compare each specialized model with the corresponding general model (considering all categories) on the same test set of tips posted at venues of the target category. Moreover, both models are learned with training sets of the same size.
Specifically, the training sets used to learn the models for the Arts, Food, Shops and Night scenarios consist of 1390, 2093, 1326, and 1086 tips, respectively, whereas the corresponding test sets include 5790, 49341, 17965, and 9685 tips. Table 6.9 shows macro-average recall and precision results, along with the corresponding 95% confidence intervals, for each model and method. As in Table 6.8, results for the best methods (including statistical ties) for each category are shown in bold, and ↑ and ↓ signs are used to indicate gains and losses due to specialization.

Our first observation is that category-based model specialization does not bring improvements over the general model as large and clear as those of the city-based specialization. On the one hand, we do observe some statistically significant improvements in macro-average recall and precision of up to 8.06% and 4.98%, respectively. Yet, statistically significant losses in the same metrics of up to 2.22% and 1.92% are also observed. Overall, we find that the results of the specialized models are only marginally different from (if not tied with) those produced by the corresponding general models.

Such small differences are mainly due to the fact that the venue category is already somewhat exploited by our (general) model as a feature. Indeed, as discussed in Section 6.6.2, venue category is one of the top 10 most important features for popularity prediction, implying that different patterns may exist across different categories. Yet, since this feature is already part of the general model, specialization for each category does not bring as much new information to the model as the city-based specialization does (which, as discussed, introduces factors related to city-level spatial locality into the model). This is why we observed a tie between general and specialized models in various scenarios.

¹ These categories were selected as they have the largest numbers of tips in our dataset.
The statistically significant differences (improvements and losses) observed in a few cases (e.g., OLS on the Food and Arts categories) are caused by differences in the distributions of the training sets used to build the general and specialized models. These differences, in turn, are an indirect result of the specialization, as we explain next. Recall that, in our experiments, for a given category, the training sets used to learn the specialized and general models have the same number of tips. However, the training set of the general model contains tips from all categories. Thus, if we consider only tips of the target category, the number of tips in each class (and the class imbalance) may differ between the two training sets.

Take, for example, the case of the Food category. Table 6.9 shows that, for OLS, SVM and SVR (with RBF kernel), the specialization does bring some improvements in macro-average recall, but losses in macro-average precision. We manually investigated the tips in the training sets (some folds) used to build both models. Focusing only on tips in Food venues, the training set of the specialized model includes a proportionally much larger number of tips of high popularity than the training set of the general model. This favors the classification of tips into the high popularity class by the specialized model, which leads to improvements in recall for that particular class. However, as a side effect, the number of tips in the low popularity class that are incorrectly classified also increases, which leads to losses in precision (once again for the high popularity class, as it is smaller and more sensitive to changes). For the Arts category, we observe the opposite trend: specialization leads to losses in macro-average recall but improvements in macro-average precision. Once again, this can be explained by the different numbers of tips in each class in the two training sets, considering only venues in the Arts category.
Compared to the training set of the general model, the training set of the specialized model includes a proportionally much larger number of tips of the low popularity class. This favors the classification of tips into that class, which hurts the recall of the high popularity class but also improves its precision. We note that different techniques (OLS, SVM, SVR) may be more or less robust to such differences in the training set.

We also emphasize that it was not obvious beforehand that category-based model specialization would fail to bring consistent improvements over the general model, despite the latter including venue category as a feature. The two prediction models exploit venue category very differently. In the general model, this information is just another dimension of the feature space considered. Given the high dimensionality of this space (125 features), using only tips of the target category to learn the model could help reduce noise and significantly improve prediction accuracy. Yet, our results revealed that the specialization is not consistently beneficial and may even hurt prediction accuracy. Thus, given such inconsistent performance, and considering that the differences, when significant, are small, we argue that category-based model specialization is not worthwhile, because the main additional information, venue category, is already considered by the general model (though in a different way).

As a final observation, we also note that the gains of the more sophisticated SVM and SVR over OLS are not as large as those observed for the city-based specialization. They are constrained to at most 3.77% and 3.44% for macro-average recall and precision, respectively, which limits their benefits over the simpler OLS from a practical perspective.
The limited benefits of SVM and SVR are probably due to the amounts of training examples available in the category-based scenarios, which, unlike in some of the city-based scenarios, are large enough for OLS to produce reasonably accurate results.

6.9 Summary

In this chapter, we tackled the problem of predicting the popularity level of a tip in Foursquare. To that end, we investigated the use of various classification and regression-based strategies, notably SVM, SVR, and OLS, along with different sets of features to build prediction models. This is a challenging problem which, in comparison with other types of tasks (e.g., assessing the helpfulness of longer reviews, estimating the number of views of a video, or predicting the ranking of a group of tips), has unique aspects and an inherently different nature, and may depend on a non-trivial combination of various user, venue and content features. Nevertheless, some of our proposed models were able to produce good results (over 80%) in terms of recall, though precision results were limited, in particular due to the severe imbalance across different popularity levels. We note that, despite having covered a very extensive set of features, we may have left out other factors that also influence tip popularity, such as characteristics of the interface design and other types of interaction (e.g., through search engines). In order to gather more evidence to support this insight and further analyze the complexity of our popularity prediction problem, we evaluated whether the values of the features considered exhibit "locality". In other words, we analyzed whether the tips can be grouped by similar feature patterns and eventually correlated with a popularity category. For this experiment, we considered the top-10 most discriminative features according to Information Gain which, as discussed in Section 6.6.2.4, produce prediction results as good as those produced by the complete set of 125 features.
We defined a 10-dimensional space based on the selected features. The values of each feature (already normalized between 0 and 1) were discretized into four intervals, namely, [0; 0.25], (0.25; 0.5], (0.5; 0.75], and (0.75; 1]. We then divided the 10-dimensional space into 1,048,576 sectors, each one defined by a tuple of 10 (discretized) feature values (one for each feature). Our first analysis focused on how tips are spread across such sectors. Considering only sectors with at least 2 tips (38% of the sectors contain only one tip), we analyzed how the popularity classes are spread across the selected sectors. For this analysis, we considered the true popularity of each tip measured 1 month after posting time. We observe that the majority of the selected sectors (83%) contain only low popularity tips, and the tips in those sectors account for over 59% of the total number of tips. This shows that tips of the low popularity level have very small locality in terms of feature values, suggesting that these tips do not have a clear "signature" (i.e., one or a few sectors that contain most of them). In contrast, we find that the high popularity tips are spread across a smaller fraction of sectors (18%). However, the vast majority of those sectors (90%) contain a clear majority of tips of the low popularity level. That is, most sectors containing high popularity tips are in fact dominated by low popularity tips. We find that over 84% of the high popularity tips are in those sectors. These fractions reflect the severe class imbalance that challenges our solutions, and provide further evidence of the complexity of the tip popularity prediction problem. Our second analysis focused on the accuracy of our predictions. We considered our best prediction model (OLS), running it in two scenarios: an ideal scenario in which the tips from the training set were also used as the testing set (Figure 6.15), and a more realistic scenario in which the model was evaluated on a separate set of tips (Figure 6.16).
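The sector construction and the first (purity) analysis described above can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual code; the helper names `sector_of` and `sector_purity` and the toy feature vectors are ours.

```python
import math
from collections import Counter, defaultdict

def sector_of(features, bins=4):
    """Map a feature vector (values normalized to [0, 1]) to a sector tuple,
    using the intervals [0, .25], (.25, .5], (.5, .75], (.75, 1]."""
    # ceil(v * bins), with the left edge 0 folded into the first bin
    return tuple(max(1, math.ceil(v * bins)) for v in features)

def sector_purity(tips, bins=4, min_tips=2):
    """Group (features, label) tips into sectors and report, per sector,
    the dominant popularity class and its share of the sector's tips."""
    sectors = defaultdict(list)
    for features, label in tips:
        sectors[sector_of(features, bins)].append(label)
    purity = {}
    for sec, labels in sectors.items():
        if len(labels) < min_tips:  # drop near-empty sectors, as in the analysis
            continue
        top_label, top_count = Counter(labels).most_common(1)[0]
        purity[sec] = (top_label, top_count / len(labels))
    return purity

# Toy 2-feature example (the thesis uses 10 features, giving 4**10 sectors):
tips = [((0.1, 0.9), "low"), ((0.2, 0.8), "low"),
        ((0.15, 0.85), "high"), ((0.9, 0.1), "high")]
purity = sector_purity(tips)  # one retained sector, dominated by "low"
```

With 10 features and 4 bins per feature, `4 ** 10` yields exactly the 1,048,576 sectors mentioned in the text.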
In both cases, we considered predictions for 1 month ahead, performed at posting time (ε = 0 and δ = 1 month). We measured the accuracy rate (the fraction of correct classifications made by OLS) for each sector, and correlated it with the dominant (true) popularity category in the sector. If the model is capturing the most important aspects of the problem, we expect it to have higher accuracy in sectors where the majority of tips belong to the same popularity category. Indeed, this is true for various sectors in both scenarios. Take, for instance, the sectors labeled as Region 1 in both Figures 6.15 and 6.16. These are sectors containing only tips from the same popularity category, where the model obtained 100% accuracy. Note that this also happens if we look at Region 3 (both figures), which consists of sectors containing only highly popular tips. However, there are also sectors where the model behaves poorly: they contain only tips from the low popularity category but have zero accuracy (Region 2). When we ran the same experiment for the second scenario (a different set of tips for the test set), we observed, in particular for the low popularity tips, a greater variation in the model accuracy even when the sectors are composed only of tips from the same category. Interestingly, in the same scenario, if we focus on the high popularity tips, we observed a clearer trend of better model accuracy in sectors containing a larger fraction of tips of the same class. These results illustrate the complexity of the problem we tackled. There seem to be other factors, not explored in this dissertation, or possibly hidden factor interactions that also affect tip popularity.
Identifying and characterizing such factors and interactions can lead to further improvements in prediction accuracy, motivating further investigations in this direction.

Figure 6.15: Model Accuracy in the Training Set (each point is a sector in the 10-dimensional space defined by the top-10 features; x-axis: % of tips from the dominant category; y-axis: accuracy rate (%); panel (a): tips in the low popularity category, highlighting Regions 1 and 2; panel (b): tips in the high popularity category, highlighting Region 3).

Figure 6.16: Model Accuracy in the Testing Set (each point is a sector in the 10-dimensional space defined by the top-10 features; x-axis: % of tips from the dominant category; y-axis: accuracy rate (%); panel (a): tips in the low popularity category, highlighting Regions 1 and 2; panel (b): tips in the high popularity category, highlighting Region 3).

Chapter 7 Conclusions and Future Work

Online reviews have enabled customers to interact and share information and opinions about products and services they experience. With the diffusion of smartphones, reviews expanded to the mobile environment, incorporating new forms and features, and addressing new challenges. Micro-reviews, or tips, a more concise type of review (usually up to 200 characters), have emerged from this environment. Tips often capture the immediate reaction of users, while the information is still fresh in the user's mind, and usually contain much more subjective and informal content. As in longer review systems, the abundance of micro-reviews makes it hard for customers to find helpful reviews, especially on a mobile device. Thus, rather than helping users in their purchasing decisions, this large volume of reviews can make the user experience overwhelming and misleading.
As soon as a micro-review is posted, it has the potential to influence other customers' purchase decisions via systems like Yelp and Foursquare. Thus, it is very important to understand how tips are explored by users in order to understand the consequences for the businesses. That can provide valuable input to learn a model that automatically predicts the quality of a review, which can greatly benefit the future design of content filtering and recommendation methods. Our dissertation presents a combination of (a) an extensive characterization of the main entities that may impact tip popularity, along with an analysis of popularity dynamics over time, (b) the identification of a rich set of features related to tip popularity, and (c) an investigation of solutions to two popularity prediction tasks using that set of features. Next, we summarize our main conclusions and present some possible directions for future work. We then wrap up with a list of publications derived, directly or indirectly, from this dissertation.

7.1 Main Conclusions

The main focus of our dissertation is to investigate the problem of predicting the popularity of micro-reviews. To support our study, we collected two datasets from Foursquare, consisting of over 10 million tips posted by 13 million users. We presented a large-scale characterization of the three main entities related to the Foursquare system that may impact tip popularity: the user who posted the tip, the venue where it was posted, and its textual content. Our analyses also uncovered four different user behavior profiles and identified the presence of spamming activity. Towards better understanding user interactions, notably the presence of influential users, we also investigated methods to automatically infer a user's influence level on Foursquare.
Moreover, we studied how the popularity of a tip evolves over time by performing an extensive analysis of the popularity dynamics of Foursquare. After performing the characterization, we were able to identify a rich set of over 120 features related to the user, venue and tip content that may impact tip popularity. We then investigated the potential benefits of exploiting these aspects to estimate the popularity of a tip in the future. To that end, we formulated the prediction problem as two different prediction tasks. The first task addressed the prediction problem as a ranking task, aiming at ranking a group of tips based on their predicted popularity at a given future time. Towards addressing this task, we first evaluated the stability of the tip popularity ranking over time, assessing to which extent the current popularity ranking of a set of tips can be used to predict their popularity ranking at a future time. Overall, the ranking is stable, corroborating the characterization results, but we observed opportunities for improvement by exploiting a multidimensional set of predictors. Our results showed that the use of the richer set of features can indeed improve the prediction accuracy, provided that enough data is available to train the regression model. Moreover, our experimental results showed that the use of only four features (the total number of likes received by the tips previously posted by the user, the total number of check-ins at the venue, the type of the Foursquare user, and the current popularity of the tip) produces results that are as good as when all features are used as predictors. The other prediction task we tackled relates to the problem of predicting the popularity level of a single tip. This turned out to be a more challenging task, as it focuses on absolute values of popularity, as opposed to relative measures (i.e., ranking).
Moreover, we focused first on predictions at tip posting time, when only the tip's textual content and historical patterns related to the user and the venue associated with it can be exploited as predictor variables. In particular, unlike the vast majority of previous content popularity prediction efforts, no early measures of the popularity of the tip are considered. Nevertheless, some of our proposed models were able to produce good results (over 80% in terms of recall). We also observed that the top-10 most discriminative features, which include features that capture the prior popularity of the user who posted the tip and of the venue where it was posted, as well as characteristics of the user's social network, produce results that are as good as when all 125 features considered are used as predictors. We found significant improvements for all prediction methods when we relax the restriction of performing predictions at posting time. In other words, by monitoring the tip for a (short) time, we are able to gather information about its early popularity, which improves prediction accuracy. Moreover, although state-of-the-art prediction methods that use early popularity measures as the only predictors do perform reasonably well in such scenarios, our models are more robust, as they can be applied to any tip, at or after posting time (unlike the other methods), besides producing much higher recall for the high popularity class. Finally, we found that model specialization does bring some (limited) improvements if performed at the city level, whereas category-based specialization does not bring clear and consistent gains. Predicting the popularity of micro-reviews (tips in particular) is a challenging problem which, in comparison with previous related efforts, has unique aspects and inherently different characteristics, and may depend on a non-trivial combination of various user, venue and content features.
We expect that the knowledge derived from the present effort may bring valuable insights into the design of more cost-effective automatic tip filtering and recommendation strategies.

7.2 Directions for Future Work

We envision several directions in which our work can be extended in the future. One possible direction consists of investigating the introduction of new features into the prediction models. For example, we explored the geographical location of the venues to build specialized models, finding some improvements in prediction accuracy. We believe that there is room for other spatial data analyses. For example, the model could have a feature that captures the correlation between the tip's geographic location and the geographic locations of previous likes received by the tip's author. Other factors that were not explored in this dissertation and may also influence tip popularity, such as the characteristics of the interface design and referrals from external sources (e.g., search engines, the impact of newspaper articles, other recommendation sites), should also be investigated towards improving prediction accuracy. Moreover, based on our experimental results, we observed that textual features used for longer reviews were not effective in improving the accuracy of our prediction models. We believe that a study of new textual features, more specific to short and informal texts, is a promising approach to develop an understanding of which information attracts more people. In this study, we focused on tips written in English. A possible direction to be explored is to investigate tips written in other languages, which would allow us to analyze the patterns of different countries or cultures.
That would require the use of language-specific textual tools, or the analysis of these tips using only the other features (i.e., user and venue features), since the textual features used in our models were not as relevant for the popularity prediction tasks. It would also be interesting to evaluate whether there are invariants in the text of the most popular tips. Another group of possible directions is related to how we model the problem. In this work, we observed that most of our features were non-stationary (i.e., their statistical properties vary with time), and we explored them as aggregate numbers (e.g., total, average, median). One extension of this work is to consider the evolution of these metrics over the monitoring time. This is similar to approaches that explore time series for predicting queries in a search engine [Radinsky et al., 2013] or trends in the financial market [Ruiz et al., 2012]. By exploring such models, we would also be able to capture seasonal or trend components of the prediction associated with events (e.g., Christmas or Thanksgiving holidays) that are not accounted for when we use models based on aggregate metrics. Other interesting investigations include studying other ways to represent historical data and the possibility of a specific model for each type of prediction (e.g., for the next hour, day, or week, or to predict the number of likes received in a single day). Another way to formulate our popularity prediction problem is to build multivariate models that take several dependent variables into account simultaneously. A multivariate study may reveal other variables that are related to tip popularity. For example, the total number of likes received by a tip may not be the only important variable related to popularity; an interesting feature to be investigated is the mean number of likes received by the user's tips.
We also envision further investigations on technical aspects of the prediction techniques to help improve our predictions. Recall that, to cope with the effects of class imbalance in the training data, we used the under-sampling technique. However, in some scenarios, this strategy is not ideal, since it reduces the size of the training dataset. Thus, there are other approaches that should be investigated, such as over-sampling (e.g., SMOTE [Chawla et al., 2002]), boosting [Seiffert et al., 2007] and bagging [Breiman, 1996] techniques. Moreover, active learning approaches [Harpale and Yang, 2008] can also be used to identify the most informative set of training instances. Even though we used state-of-the-art methods, other classification and ranking techniques could also be investigated. Another possible direction for future developments consists of exploring our prediction models in various applications and services, including the recommendation and filtering of tips. Systems can make use of such predictions to instantly identify reviews that are expected to be helpful to users and adjust their presentation efficiently, ultimately improving the user experience. Moreover, both review systems and business owners would like to be able to predict whether a tip has the potential to become popular. For example, such information can be used to estimate the revenue of promoted tips ahead of time, as well as to provide quick feedback about the marketing strategy used. An evaluation with real users is also important to better define the popularity measure and to understand the mapping between helpfulness and popularity among Foursquare users, as done for Amazon reviews in Liu et al. [2007]. This evaluation can be performed via Mechanical Turk. Moreover, a deeper analysis of tips in particular categories or subcategories of venues can be used to understand the impact of public services on the population of a given city.
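The random under-sampling strategy mentioned above can be sketched in a few lines. This is an illustrative implementation of the general technique, not the thesis's actual code; the `undersample` helper and the seed value are ours.

```python
import random

def undersample(examples, seed=42):
    """Randomly under-sample the majority class(es) so that every popularity
    class keeps only as many examples as the rarest class has.
    `examples` is a list of (features, label) pairs."""
    by_class = {}
    for x, y in examples:
        by_class.setdefault(y, []).append((x, y))
    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducible folds
    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, n_min))  # keep n_min per class
    rng.shuffle(balanced)
    return balanced

# Hypothetical imbalanced training set: 90 low popularity tips, 10 high.
examples = [(f"x{i}", "low") for i in range(90)] + \
           [(f"y{i}", "high") for i in range(10)]
balanced = undersample(examples)  # 10 tips per class, 20 in total
```

The downside noted in the text is visible here: 80 of the 100 training examples are discarded, which is exactly what over-sampling approaches such as SMOTE avoid.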
For example, monitoring citizens' reviews about bus stations gives an idea of what citizens expect from the service provided. This analysis can also point to problems with the service, or to its underutilization by part of the users. Moreover, since Foursquare is a worldwide social network, this analysis can also be used to compare the most common problems faced by different countries (e.g., developed vs. developing countries). Finally, an interesting direction would be to evaluate our prediction models in other micro-review systems (e.g., Yelp). Such an evaluation would require additional work to map our proposed features to the new system, as well as investigating new features that are specific to the application. It would be interesting to investigate whether the same features would have similar explanatory power in the new system as they have in Foursquare, and whether other features emerge as relevant to popularity prediction in those systems. Furthermore, since our data was collected in 2011, it would be interesting to assess the impact of the current Foursquare system on tip popularity. In 2015, the Foursquare app was divided into two apps, separating the check-in functionality from the recommendation tool. It will be interesting to evaluate how the popularity of a tip evolves in the new app compared with our results.

7.3 Publications

The main results of this dissertation generated the following publications:

• [Vasconcelos et al., 2012b] published in the 5th ACM International Conference on Web Search and Data Mining (WSDM'12).

• [Vasconcelos et al., 2012a] published in the XXX Brazilian Symposium on Computer Networks and Distributed Systems (SBRC'12).

• [Vasconcelos et al., 2014b] published in the 29th ACM Symposium on Applied Computing (SAC'14).
• [Vasconcelos et al., 2014c] published in the ACM Conference on Online Social Networks (COSN'14).

• [Vasconcelos et al., 2014a] submitted to Elsevier Information Sciences (2nd round of review).

During the development of this dissertation, we were also involved in other studies that are indirectly related to its topic. They generated the following publications:

• [Moraes et al., 2013b] published in the 19th Brazilian Symposium on Multimedia and the Web (WebMedia'13). In this work, we evaluated the effectiveness of four polarity classification strategies on subsets of our Foursquare dataset.

• [Moraes et al., 2013a] published in the 5th International Conference on Social Informatics (SocInfo'13). In this work, we compared the same four polarity detection methods with a hybrid approach that combines all techniques using stacking.

• [Pontes et al., 2012b] published in the 4th International Workshop on Location-Based Social Networks (LBSN'12). In this work, we investigated how much information about a user can be inferred from her tips, likes and mayorships.

• [Pontes et al., 2012a] published in the International Workshop on Privacy in Social Data (PinSoDa'12). In this work, we performed a large-scale inference study in three of the currently most popular social networks: Foursquare, Google+ and Twitter.

Bibliography

Abbasi, M. A. and Liu, H. (2013). Measuring User Credibility in Social Media. In Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction (SBP).

Adali, S., Liu, T., and Magdon-Ismail, M. (2012). An Analysis of Optimal Link Bombs. Theoretical Computer Science, 437:1--20.

Adamic, L., Zhang, J., Bakshy, E., and Ackerman, M. (2008). Knowledge Sharing and Yahoo Answers: Everyone Knows Something. In Proceedings of the 17th International World Wide Web Conference (WWW).

Agarwal, N., Liu, H., Tang, L., and Yu, P. S. (2008). Identifying the Influential Bloggers in a Community.
In Proceedings of the International Conference on Web Search and Data Mining (WSDM).

Aggarwal, A., Almeida, J., and Kumaraguru, P. (2013). Detection of Spam Tipping Behaviour on Foursquare. In Proceedings of the 2nd International Workshop on Mining Social Network Dynamics (MSND).

Agichtein, E., Castillo, C., Donato, D., Gionis, A., and Mishne, G. (2008). Finding High-Quality Content in Social Media. In Proceedings of the First International Conference on Web Search and Data Mining (WSDM).

Akoglu, L., Chandy, R., and Faloutsos, C. (2013). Opinion Fraud Detection in Online Reviews by Network Effects. In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM).

Allen, A. (1990). Probability, Statistics, and Queueing Theory with Computer Science Applications. Academic Press Professional, Inc., San Diego, CA, USA. ISBN 0-12-051051-0.

Alwee, R., Shamsuddin, S., and Sallehuddin, R. (2013). Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators. The Scientific World Journal, 2013(951475).

Anderson, A., Huttenlocher, D., Kleinberg, J., and Leskovec, J. (2012). Discovering Value from Community Activity on Focused Question Answering Sites: a Case Study of Stack Overflow. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

Ante, S. (2009). Amazon: Turning Consumer Opinions into Gold. http://www.businessweek.com/magazine/content/09_43/b4152047039565.htm.

Bakshy, E., Hofman, J. M., Mason, W. A., and Watts, D. J. (2011). Everyone's an Influencer: Quantifying Influence on Twitter. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM).

Bakshy, E., Karrer, B., and Adamic, L. A. (2009). Social Influence and the Diffusion of User-Created Content. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC).
Bandari, R., Asur, S., and Huberman, B. (2012). The Pulse of News in Social Media: Forecasting Popularity. In Proceedings of the 6th International Conference on Weblogs and Social Media (ICWSM).

Barabasi, A.-L. and Albert, R. (1999). Emergence of Scaling in Random Networks. Science, 286(5439):509--512.

Barbierri, C. (2011). Foursquare Reveals New "Promoted" Check-in for Super Bowl Sunday. http://venturebeat.com/2011/02/03/foursquare-super-bowl/.

Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., and Ross, K. (2009). Video Interactions in Online Video Social Networks. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 5(4):1--25.

Benevenuto, F., Rodrigues, T., Veloso, A., Almeida, J., Gonçalves, M., and Almeida, V. (2012). Practical Detection of Spammers and Content Promoters in Online Video Sharing Systems. IEEE Transactions on Systems, Man and Cybernetics - Part B, PP(99):1--14.

Berjani, B. and Strufe, T. (2011). A Recommendation System for Spots in Location-Based Online Social Networks. In Proceedings of the 4th Workshop on Social Network Systems (SNS).

Bermingham, A. and Smeaton, A. F. (2010). Classifying Sentiment in Microblogs: is Brevity an Advantage? In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM).

Borghol, Y., Ardon, S., Carlsson, N., Eager, D., and Mahanti, A. (2012). An Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

Breiman, L. (1996). Bagging Predictors. Machine Learning, 24(2):123--140.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1):5--32.

Brodersen, A., Scellato, S., and Wattenhofer, M. (2012). YouTube Around the World: Geographic Popularity of Videos. In Proceedings of the 21st International Conference on World Wide Web (WWW).
Castillo, C., Mendoza, M., and Poblete, B. (2011). Information Credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web (WWW).

Catalyst Marketers Blog (2010). Catalyst Marketers Blog. http://www.catalystmarketers.com/foursquare-spam.

Cha, M., Haddadi, H., Benevenuto, F., and Gummadi, K. (2010). Measuring User Influence in Twitter: The Million Follower Fallacy. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM).

Cha, M., Mislove, A., and Gummadi, K. P. (2009). A Measurement-driven Analysis of Information Propagation in the Flickr Social Network. In Proceedings of the 18th International Conference on World Wide Web (WWW).

Chang, C. and Lin, C. (2001). LIBSVM: a Library for Support Vector Machines.

Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16(1):321--357.

Chen, B.-C., Guo, J., Tseng, B., and Yang, J. (2011). User Reputation in a Comment Rating Environment. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

Chen, P.-Y., Wu, S.-y., and Yoon, J. (2004). The Impact of Online Recommendations and Consumer Feedback on Sales. In Proceedings of the International Conference on Information Systems (ICIS).

Chen, Y. and Xie, J. (2008). Online Consumer Review: Word-of-Mouth as A New Element of Marketing Communication Mix. Management Science, 54(3):477--491.

Cheng, J., Adamic, L., Dow, P. A., Kleinberg, J. M., and Leskovec, J. (2014). Can Cascades Be Predicted? In Proceedings of the 23rd International Conference on World Wide Web (WWW).

Cheng, Z., Caverlee, J., Kamath, K., and Lee, K. (2011a). Toward Traffic-Driven Location-Based Web Search. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM).

Cheng, Z., Caverlee, J., Lee, K., and Sui, D. Z. (2011b).
Exploring Millions of Footprints in Location Sharing Services. In Proceedings of the 5th International Conference on Weblogs and Social Media (ICWSM).

Chevalier, J. A. and Mayzlin, D. (2006). The Effect of Word of Mouth on Sales: Online Book Reviews. Journal of Marketing Research, 43(3):345--354.

Cho, E., Myers, S., and Leskovec, J. (2011). Friendship and Mobility: User Movement in Location-Based Social Networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

Coleman, M. and Liau, T. (1975). A Computer Readability Formula Designed for Machine Scoring. Journal of Applied Psychology, 60(2):283--284.

Costa, H., Benevenuto, F., and de Campos Merschmann, L. H. (2013). Detecting Tip Spam in Location-based Social Networks. In Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC).

Crane, R. and Sornette, D. (2008). Robust Dynamic Classes Revealed by Measuring the Response Function of a Social System. In Proceedings of the National Academy of Sciences (PNAS).

Dalip, D., Gonçalves, M., Cristo, M., and Calado, P. (2011). Automatic Assessment of Document Quality in Web Collaborative Digital Libraries. Journal of Data and Information Quality, 2(3):14:1--14:30.

Dalip, D., Lima, H., Gonçalves, M., Cristo, M., and Calado, P. (2014). Quality Assessment of Collaborative Content With Minimal Information. In Joint Conference on Digital Libraries and the Theory and Practice of Digital Libraries.

Dalip, D. H., Gonçalves, M. A., Cristo, M., and Calado, P. (2013). Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: A Case Study with Stack Overflow. In Proceedings of the 36th International ACM Conference on Research and Development in Information Retrieval (SIGIR).

Danescu-Niculescu-Mizil, C., Kossinets, G., Kleinberg, J., and Lee, L. (2009). How Opinions are Received by Online Communities: a Case Study on Amazon.com Helpfulness Votes.
In Proceedings of the 18th International Conference on World Wide Web (WWW).

Dempster, A., Laird, N., and Rubin, D. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1--38.

Drucker, H., Burges, C., Kaufman, L., Smola, A., and Vapnik, V. (1997). Support Vector Regression Machines. In Advances in Neural Information Processing Systems 9.

Du, Y., Shi, Y., and Zhao, X. (2007). Using Spam Farm to Boost PageRank. In Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb).

Ester, M., Kriegel, H. P., Sander, J., and Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

Esuli, A. and Sebastiani, F. (2006). SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC), pages 417--422.

Fagin, R., Kumar, R., and Sivakumar, D. (2003). Comparing Top K Lists. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).

Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA.

Figueiredo, F., Almeida, J., Matsubara, Y., Ribeiro, B., and Faloutsos, C. (2014a). Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML/PKDD).

Figueiredo, F., Gonçalves, M., and Almeida, J. (2014b). Improving the Effectiveness of Content Popularity Prediction Methods using Time Series Trends. In ECML/PKDD Predictive Analytics Challenge.

Flesch, R. (1948). A New Readability Yardstick. Journal of Applied Psychology, 32(3):221--233.

Fogg, B.
J., Marshall, J., Laraki, O., Osipovich, A., Varma, C., Fang, N., Paul, J., Rangnekar, A., Shon, J., Swani, P., and Treinen, M. (2001). What Makes Web Sites Credible?: a Report on a Large Quantitative Study. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI).
Foursquare (2011). Foursquare House Rules. http://support.foursquare.com/entries/386768-foursquare-house-rules.
Galar, M., Fernández, A., Barrenechea, E., Bustince, H., and Herrera, F. (2011). An Overview of Ensemble Methods for Binary Classifiers in Multi-Class Problems: Experimental Study on One-vs-One and One-vs-All Schemes. Pattern Recognition, 44(8):1761--1776.
Gao, H., Hu, J., Wilson, C., Li, Z., Chen, Y., and Zhao, B. Y. (2010). Detecting and Characterizing Social Spam Campaigns. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC).
Georgiev, P., Noulas, A., and Mascolo, C. (2014). The Call of the Crowd: Event Participation in Location-based Social Services. In Proceedings of the 8th International Conference on Weblogs and Social Media (ICWSM).
Ghose, A. and Ipeirotis, P. G. (2007). Designing Novel Review Ranking Systems: Predicting the Usefulness and Impact of Reviews. In Proceedings of the 9th International Conference on Electronic Commerce (ICEC).
Gionis, A., Lappas, T., Pelechrinis, K., and Terzi, E. (2014). Customized Tour Recommendations in Urban Areas. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining (WSDM).
Gladwell, M. (2002). The Tipping Point: How Little Things Can Make a Big Difference. Back Bay Books.
Gonçalves, P., Araujo, M., Benevenuto, F., and Cha, M. (2013). Comparing and Combining Sentiment Analysis Methods. In Proceedings of ACM Conference on Online Social Networks (COSN).
Grier, C., Thomas, K., Paxson, V., and Zhang, M. (2010). @Spam: the Underground on 140 Characters or Less.
In Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS).
Grinter, R. and Eldridge, M. (2003). Wan2Tlk?: Everyday Text Messaging. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), CHI ’03.
Gruhl, D., Guha, R., Liben-Nowell, D., and Tomkins, A. (2004). Information Diffusion Through Blogspace. In Proceedings of the 13th International World Wide Web Conference (WWW).
Gunning, R. (1952). The Technique of Clear Writing. McGraw-Hill, New York.
Gyöngyi, Z., Garcia-Molina, H., and Pedersen, J. (2004). Combating Web Spam with Trustrank. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB).
Harpale, A. S. and Yang, Y. (2008). Personalized Active Learning for Collaborative Filtering. In Proceedings of the 31st International ACM Conference on Research and Development in Information Retrieval (SIGIR).
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society (Applied Statistics), 28(1):100--108.
Haveliwala, T. H. (2002). Topic-sensitive PageRank. In Proceedings of the 11th International World Wide Web Conference (WWW).
He, H. and Garcia, E. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263--1284.
Henzinger, M. R., Heydon, A., Mitzenmacher, M., and Najork, M. (1999). Measuring Index Quality using Random Walks on the Web. In Proceedings of the 8th International World Wide Web Conference (WWW).
Hong, L., Dan, O., and Davison, B. D. (2011). Predicting Popular Messages in Twitter. In Proceedings of the 20th International World Wide Web Conference (WWW).
Hong, Y., Lu, J., Yao, J., Zhu, Q., and Zhou, G. (2012). What Reviews are Satisfactory: Novel Features for Automatic Helpfulness Voting. In Proceedings of the 35th International ACM Conference on Research and Development in Information Retrieval (SIGIR).
Hovland, C. I. and Weiss, W.
(1951). The Influence of Source Credibility on Communication Effectiveness. The Public Opinion Quarterly, 15(4):635--650.
Howley, T., Madden, M. G., O’Connell, M.-L., and Ryder, A. G. (2006). The Effect of Principal Component Analysis on Machine Learning Accuracy with High-Dimensional Spectral Data. Knowledge-Based Systems, 19(5):363--370.
Hsu, C.-F., Khabiri, E., and Caverlee, J. (2009). Ranking Comments on the Social Web. In Proceedings of the International Conference on Computational Science and Engineering (CSE).
Hua, S. and Sun, Z. (2001). Support Vector Machine Approach for Protein Subcellular Localization Prediction. Oxford Journal of Bioinformatics, 17(8):721--728.
Linguistic Inquiry and Word Count (2007). LIWC2007 Output Variable Information. http://www.liwc.net/descriptiontable1.php.
Jain, R. (1991). The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. Wiley.
Janecek, A., Gansterer, W. N., Demel, M., and Ecker, G. (2008). On the Relationship Between Feature Selection and Classification Accuracy. Journal of Machine Learning Research - Proceedings Track, 4:90--105.
Jindal, N. and Liu, B. (2008). Opinion Spam and Analysis. In Proceedings of the International Conference on Web Search and Data Mining (WSDM).
Joachims, T. (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Proceedings of the 10th European Conference on Machine Learning (ECML).
Katz, E. (1957). The Two-Step Flow of Communication: An Up-to-Date Report of an Hypothesis. Public Opinion Quarterly, 21(1):61--78.
Kendall, M. and Gibbons, J. D. (1990). Rank Correlation Methods. A Charles Griffin Title, 5 edition.
Kim, S.-M., Pantel, P., Chklovski, T., and Pennacchiotti, M. (2006). Automatically Assessing Review Helpfulness. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
Kleinberg, J. M. (1999).
Hubs, Authorities, and Communities. ACM Computing Surveys (CSUR), 31(4es). ISSN 0360-0300.
Konagurthu, A. and Collier, J. (2013). An Information Measure for Comparing Top k Lists. CoRR, abs/1310.0110.
Korfiatis, N., García-Bariocanal, E., and Sánchez-Alonso, S. (2012). Evaluating Content Quality and Helpfulness of Online Product Reviews: The Interplay of Review Helpfulness vs. Review Content. Electronic Commerce Research and Applications, 11(3):205--217.
Krapivsky, P. L., Redner, S., and Leyvraz, F. (2000). Connectivity of Growing Random Networks. Physical Review Letters, 85(21):4629--4632.
Kwak, H., Lee, C., Park, H., and Moon, S. (2010). What is Twitter, a Social Network or a News Media? In Proceedings of the 19th International World Wide Web Conference (WWW).
Lappas, T. (2012). Fake Reviews: The Malicious Perspective. In Natural Language Processing and Information Systems, volume 7337 of Lecture Notes in Computer Science, pages 23--34. Springer Berlin Heidelberg.
Lee, S. and Choeh, J. Y. (2014). Predicting the Helpfulness of Online Reviews Using Multilayer Perceptron Neural Networks. Expert Systems with Applications: An International Journal, 41(6):3041--3046.
Lerman, K. and Hogg, T. (2010). Using a Model of Social Dynamics to Predict Popularity of News. In Proceedings of the 19th International Conference on World Wide Web (WWW).
Leskovec, J., Adamic, L. A., and Huberman, B. A. (2007). The Dynamics of Viral Marketing. ACM Transactions on the Web (TWEB), 1(1).
Li, B., Jin, T., Lyu, M., King, I., and Mak, B. (2012). Analyzing and Predicting Question Quality in Community Question Answering Services. In Proceedings of the 21st International Conference on World Wide Web (WWW).
Li, F., Huang, M., Yang, Y., and Zhu, X. (2011). Learning to Identify Review Spam. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI).
Li, N. and Chen, G. (2009a). Analysis of a Location-based Network.
In Proceedings of the International Conference on Computational Science and Engineering (CSE).
Li, N. and Chen, G. (2009b). Multi-layer Friendship Modeling of Location-based Mobile Social Networks. In Proceedings of the 6th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous).
Li, X. and Hitt, L. M. (2008). Self-Selection and Information Role of Online Product Reviews. Information Systems Research, 19(4):456--474.
Li, Y., Steiner, M., Wang, L., Zhang, Z., and Bao, J. (2013). Exploring Venue Popularity in Foursquare. In Proceedings of the 5th IEEE International Workshop on Network Science for Communication Networks (NetSciCom).
Lin, Y., Zhu, T., Wang, X., Zhang, J., and Zhou, A. (2014). Towards Online Review Spam Detection. In Proceedings of the 23rd International Conference on World Wide Web (WWW).
Lindqvist, J., Cranshaw, J., Wiese, J., Hong, J., and Zimmerman, J. (2011). I’m the Mayor of my House: Examining why People use Foursquare - a Social-Driven Location Sharing Application. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI).
Liu, B. (2010). Sentiment Analysis and Subjectivity. In Indurkhya, N. and Damerau, F. J., editors, Handbook of Natural Language Processing, Second Edition. CRC Press, Taylor and Francis Group.
Liu, J., Cao, Y., Lin, C.-Y., Huang, Y., and Zhou, M. (2007). Low-Quality Product Review Detection in Opinion Summarization. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
Liu, X.-Y., Wu, J., and Zhou, Z.-H. (2009). Exploratory Undersampling for Class-Imbalance Learning. Transactions on Systems, Man and Cybernetics - Part B, 39(2).
Liu, Y., Huang, X., An, A., and Yu, X. (2008). Modeling and Predicting The Helpfulness of Online Reviews. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM).
Lu, Y., Tsaparas, P., Ntoulas, A., and Polanyi, L. (2010). Exploiting Social Context for Review Quality Prediction. In Proceedings of the 19th International Conference on World Wide Web (WWW).
Ma, Y. and Li, F. (2012). Detecting Review Spam: Challenges and Opportunities. In 8th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom).
Manevitz, L. M. and Yousef, M. (2002). One-class SVMs for Document Classification. Journal of Machine Learning Research, 2:139--154.
Martin, L. and Pu, P. (2014). Prediction of Helpful Reviews Using Emotions Extraction. In Proceedings of the 28th AAAI Conference on Artificial Intelligence.
Matsubara, Y., Sakurai, Y., Prakash, B. A., Li, L., and Faloutsos, C. (2012). Rise and Fall Patterns of Information Diffusion: Model and Implications. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).
McLaughlin, H. G. (1969). SMOG Grading - A New Readability Formula. Journal of Reading, 12(8):639--646.
Moghaddam, S., Jamali, M., and Ester, M. (2012). ETF: Extended Tensor Factorization Model for Personalizing Prediction of Review Helpfulness. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM).
Momeni, E., Cardie, C., and Ott, M. (2013). Properties, Prediction, and Prevalence of Useful User-generated Comments for Descriptive Annotation of Social Media Objects. In Proceedings of the 7th International Conference on Weblogs and Social Media (ICWSM).
Moraes, F., Vasconcelos, M., Prado, P., Almeida, J., and Gonçalves, M. (2013a). Polarity Analysis of Micro Reviews in Foursquare. In Proceedings of the 19th Brazilian Symposium on Multimedia and the Web (WebMedia).
Moraes, F., Vasconcelos, M., Prado, P., Dalip, D., Almeida, J., and Gonçalves, M. (2013b). Polarity Detection of Foursquare Tips. In Social Informatics, volume 8238 of Lecture Notes in Computer Science, pages 153--162.
Springer International Publishing.
Ngo-Ye, T. L. and Sinha, A. P. (2012). Analyzing Online Review Helpfulness Using a Regressional ReliefF-Enhanced Text Mining Method. ACM Transactions on Management Information Systems (TMIS), 3(2):10:1--10:20.
Nguyen, T.-S., Lauw, H. W., and Tsaparas, P. (2013). Using Micro-reviews to Select an Efficient Set of Reviews. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM).
Noulas, A., Scellato, S., Lambiotte, R., Pontil, M., and Mascolo, C. (2012). A Tale of Many Cities: Universal Patterns in Human Urban Mobility. PLoS ONE, 7(5):e37027.
Noulas, A., Scellato, S., Mascolo, C., and Pontil, M. (2011a). An Empirical Study of Geographic User Activity Patterns in Foursquare. In Proceedings of 5th International AAAI Conference on Weblogs and Social Media (ICWSM).
Noulas, A., Scellato, S., Mascolo, C., and Pontil, M. (2011b). Exploiting Semantic Annotations for Clustering Geographic Areas and Users in Location-based Social Networks. In Proceedings of 3rd Workshop Social Mobile Web (SMW).
O’Brien, R. (2007). A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity: International Journal of Methodology, 41(5):673--690.
O’Mahony, M. and Smyth, B. (2009). Learning to Recommend Helpful Hotel Reviews. In Proceedings of the 3rd ACM Conference on Recommender Systems (RecSys).
O’Mahony, M. P. and Smyth, B. (2010). Using Readability Tests to Predict Helpful Product Reviews. In Adaptivity, Personalization and Fusion of Heterogeneous Information (RIAO).
Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The PageRank Citation Ranking: Bringing Order to the Web. Stanford Digital Library Technologies Project.
Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs Up? Sentiment Classification Using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
Park, J., Cha, M., Kim, H., and Jeong, J. (2012). Managing Bad News in Social Media: A Case Study on Domino’s Pizza Crisis. In Proceedings of 6th International AAAI Conference on Weblogs and Social Media (ICWSM).
Pelleg, D. and Moore, A. W. (2000). X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In Proceedings of the 17th International Conference on Machine Learning (ICML).
Pinto, H., Almeida, J. M., and Gonçalves, M. A. (2013). Using Early View Patterns to Predict the Popularity of YouTube Videos. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM).
Pontes, T., Magno, G., Vasconcelos, M., Gupta, A., Almeida, J., Kumaraguru, P., and Almeida, V. (2012a). Beware of What You Share: Inferring Home Location in Social Networks. In International Workshop on Privacy in Social Data (PinSoDa).
Pontes, T., Vasconcelos, M., Almeida, J., Kumaraguru, P., and Almeida, V. (2012b). We Know Where You Live: Privacy Characterization of Foursquare Behavior. In Proceedings of 4th International Workshop on Location-Based Social Networks (LBSN).
Quercia, D., Ellis, J., Capra, L., and Crowcroft, J. (2011). In the Mood for Being Influential on Twitter. In Proceedings of the 3rd IEEE International Conference on Social Computing (SOCIALCOM).
Radinsky, K., Svore, K. M., Dumais, S. T., Shokouhi, M., Teevan, J., Bocharov, A., and Horvitz, E. (2013). Behavioral Dynamics on the Web: Learning, Modeling, and Prediction. ACM Transactions on Information Systems (TOIS), 31(3):16:1--16:37.
Ressler, S. (1993). Perspectives on Electronic Publishing - Standards, Solutions, and More. Prentice Hall.
Romero, D., Galuba, W., Asur, S., and Huberman, B. (2011). Influence and Passivity in Social Media. In Proceedings of the 20th International World Wide Web Conference (WWW).
Rossi, L. and Musolesi, M. (2014). It’s the Way you Check-in: Identifying Users in Location-Based Social Networks.
In Proceedings of ACM Conference on Online Social Networks (COSN).
Rubin, V. L. and Liddy, E. D. (2006). Assessing Credibility of Weblogs. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.
Ruiz, E. J., Hristidis, V., Castillo, C., Gionis, A., and Jaimes, A. (2012). Correlating Financial Time Series with Micro-blogging Activity. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM).
Sarah Best (2012). The Building Blocks of a Great Foursquare List. http://www.mightybytes.com/blog/entry/the_building_blocks_of_a_great_foursquare_list/.
Scellato, S. and Mascolo, C. (2011). Measuring User Activity on an Online Location-based Social Network. In Proceedings of 3rd International Workshop on Network Science for Communication Networks (NetSciCom).
Scellato, S., Mascolo, C., Musolesi, M., and Latora, V. (2010). Distance Matters: Geo-social Metrics for Online Social Networks. In Proceedings of the 3rd Workshop on Online Social Networks (WOSN).
Scellato, S., Noulas, A., Lambiotte, R., and Mascolo, C. (2011a). Socio-spatial Properties of Online Location-based Social Networks. In Proceedings of 5th International AAAI Conference on Weblogs and Social Media (ICWSM).
Scellato, S., Noulas, A., and Mascolo, C. (2011b). Exploiting Place Features in Link Prediction on Location-based Social Networks. In Proceedings of 17th ACM Conference on Knowledge Discovery and Data Mining (KDD).
Scherer, K. (2005). What are Emotions? And How Can They be Measured? Social Science Information, 44(4):695--729.
Seiffert, C., Khoshgoftaar, T. M., Hulse, J. V., and Napolitano, A. (2007). Mining Data with Rare Events: A Case Study. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI).
Senter, R. and Smith, E. A. (1967). Automated Readability Index. Technical report AMRL-TR-6620, Wright-Patterson Air Force Base.
Seward, Z. (2011). Checking In to the Snowpocalypse.
http://blogs.wsj.com/digits/2011/05/19/checking-into-snowpocalypse/.
Shi, J. and Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888--905.
Siersdorfer, S., Chelaru, S., Nejdl, W., and San Pedro, J. (2010). How Useful are Your Comments?: Analyzing and Predicting YouTube Comments and Comment Ratings. In Proceedings of the 19th International Conference on World Wide Web (WWW).
Siersdorfer, S., Chelaru, S., Pedro, J. S., Altingovde, I. S., and Nejdl, W. (2014). Analyzing and Mining Comments and Comment Ratings on the Social Web. ACM Transactions on the Web (TWEB), 8(3).
Silva, T., de Melo, P., Almeida, J., Salles, J., and Loureiro, A. (2014). Revealing the City that We Cannot See. ACM Transactions on Internet Technology.
Smith, W. (2013). Brands and the New View Of Social Influence. http://www.brandingstrategyinsider.com/2013/06/brands-and-the-new-view-of-social-influence.html.
Stanford NLP Group (2012). Stanford Part-Of-Speech Tagger. http://nlp.stanford.edu/software/tagger.shtml.
Stanford NLP Group (2013). Stanford Named Entity Recognizer. http://nlp.stanford.edu/software/CRF-NER.shtml.
Stevens, J. (2002). Applied Multivariate Statistics for The Social Sciences. L. Erlbaum Associates Inc., Hillsdale, NJ, USA. ISBN 0-898-59568-1.
Suh, B., Hong, L., Pirolli, P., and Chi, E. (2010). Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network. In Proceedings of the 2nd IEEE International Conference on Social Computing (SOCIALCOM).
Szabo, G. and Huberman, B. A. (2010). Predicting the Popularity of Online Content. Communications of the ACM, 53(8):80--88.
Tang, J., Gao, H., Hu, X., and Liu, H. (2013). Context-aware Review Helpfulness Rating Prediction. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys).
Tang, Y., Zhang, Y.-Q., Chawla, N., and Krasser, S. (2009). SVMs Modeling for Highly Imbalanced Classification.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(1):281--288.
Tatar, A., Antoniadis, P., de Amorim, M. D., and Fdida, S. (2014). From Popularity Prediction to Ranking Online News. Social Network Analysis and Mining, 4(1).
Tatar, A., Leguay, J., Antoniadis, P., Limbourg, A., de Amorim, M. D., and Fdida, S. (2011). Predicting the Popularity of Online Articles Based on User Comments. In Proceedings of the International Conference on Web Intelligence, Mining and Semantics (WIMS).
Tausczik, Y. R. and Pennebaker, J. W. (2010). The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, 29(1):24--54.
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., and Kappas, A. (2010). Sentiment Strength Detection in Short Informal Text. Journal of the American Society for Information Science and Technology, 61(12):2544--2558.
Thomas, K., Grier, C., Ma, J., Paxson, V., and Song, D. (2011). Design and Evaluation of a Real-Time URL Spam Filtering Service. In IEEE Symposium on Security and Privacy (S&P).
Thurlow, C. and Brown, A. (2003). Generation Txt? The Sociolinguistics of Young People’s Text-Messaging. Discourse Analysis Online.
Ngo-Ye, T. L. and Sinha, A. P. (2014). The Influence of Reviewer Engagement Characteristics on Online Review Helpfulness: A Text Regression Model. Decision Support Systems, 61:47--58.
Tsur, O. and Rappoport, A. (2009). RevRank: a Fully Unsupervised Algorithm for Selecting the Most Helpful Book Reviews. In Proceedings of the 3rd International Conference on Weblogs and Social Media (ICWSM).
Valente, T. (1995). Network Models of the Diffusion of Innovations. Hampton Press, Cresskill, NJ.
van Zwol, R. (2007). Flickr: Who is Looking? In ACM International Conference on Web Intelligence (WIS).
Vasconcelos, M., Almeida, J., and Gonçalves, M. (2014a). Predicting the Popularity of Micro-Reviews: A Foursquare Case Study. Elsevier Information Sciences (2nd review round).
Vasconcelos, M., Almeida, J., and Gonçalves, M. (2014b). What Makes your Opinion Popular? Predicting the Popularity of Micro-Reviews in Foursquare. In Proceedings of the 29th Annual ACM Symposium on Applied Computing (SAC).
Vasconcelos, M., Almeida, J., Gonçalves, M., Souza, D., and Gomes, G. (2014c). Popularity Dynamics of Foursquare Micro Reviews. In Proceedings of the ACM Conference on Online Social Networks (COSN).
Vasconcelos, M., Ricci, S., Almeida, J., Benevenuto, F., and Almeida, V. (2012a). Caracterização e Influência do Uso de Tips e Dones no Foursquare. In Proceedings of the 30th Brazilian Symposium on Computer Networks and Distributed Systems (SBRC).
Vasconcelos, M., Ricci, S., Almeida, J., Benevenuto, F., and Almeida, V. (2012b). Tips, Dones and Todos: Uncovering User Profiles in Foursquare. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM).
Wagner, C., Rowe, M., Strohmaier, M., and Alani, H. (2012). Ignorance isn’t Bliss: An Empirical Analysis of Attention Patterns in Online Communities. In Proceedings of the 4th IEEE International Conference on Social Computing (SOCIALCOM).
Walther, J., Carr, C., Choi, S., DeAndrea, D., Kim, J., Tong, S., and Heide, B. V. D. (2010). Interaction of Interpersonal, Peer, and Media Influence Sources Online. A Networked Self: Identity, Community, and Culture on Social Network Sites, pages 17--38.
Wang, Z., Zhang, D., Zhou, X., Yang, D., Yu, Z., and Yu, Z. (2014). Discovering and Profiling Overlapping Communities in Location-Based Social Networks. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 44(4):499--509.
Watts, D. and Dodds, P. (2007). Influentials, Networks, and Public Opinion Formation. Journal of Consumer Research, 34(4):441--458.
Weerkamp, W. and de Rijke, M. (2012). Credibility-inspired Ranking for Blog Post Retrieval. Information Retrieval, 15(3-4):243--277.
Weka Machine Learning Project (2012). Weka. http://www.cs.waikato.ac.nz/~ml/weka.
Weng, J., Lim, E.-P., Jiang, J., and He, Q. (2010). TwitterRank: Finding Topic-Sensitive Influential Twitterers. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM).
Wu, S., Hofman, J. M., Mason, W. A., and Watts, D. J. (2011a). Who Says What to Whom on Twitter. In Proceedings of the 20th International Conference on World Wide Web (WWW).
Wu, S., Tan, C., Kleinberg, J. M., and Macy, M. W. (2011b). Does Bad News Go Away Faster? In Proceedings of 5th International AAAI Conference on Weblogs and Social Media (ICWSM).
Yang, H., Zhou, Y., and Liu, H. (2010). Chaos Optimization SVR Algorithm with Application in Prediction of Regional Logistics Demand. In Proceedings of the First International Conference on Advances in Swarm Intelligence (ICSI).
Yang, J. and Leskovec, J. (2011). Patterns of Temporal Variation in Online Media. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM).
Yang, Y. and Pedersen, J. (1997). A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the 14th International Conference on Machine Learning (ICML).
Ye, M., Yin, P., and Lee, W.-C. (2010). Location Recommendation for Location-based Social Networks. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS).
Yin, P., Luo, P., Wang, M., and Lee, W.-C. (2012). A Straw Shows Which Way the Wind Blows: Ranking Potentially Popular Items from Early Votes. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM).
Yu, B., Chen, M., and Kwok, L. (2011). Toward Predicting Popularity of Social Marketing Messages. In Proceedings of the 4th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction (SBP).
Yu, X., Liu, Y., Huang, X., and An, A. (2010). A Quality-aware Model for Sales Prediction Using Reviews.
In Proceedings of the 19th International Conference on World Wide Web (WWW).
Zhang, J., Ackerman, M., and Adamic, L. (2007). Expertise Networks in Online Communities: Structure and Algorithms. In Proceedings of the 16th International World Wide Web Conference (WWW).
Zhang, R. and Tran, T. (2008). An Entropy-Based Model for Discovering the Usefulness of Online Product Reviews. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT).
Zhang, Z. and Varadarajan, B. (2006). Utility Scoring of Product Reviews. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM).
Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall.