# Recommender Systems

## Bayesian Recommender Systems

Recommender systems increasingly drive user experiences on the Internet. This personalization is often achieved through the factorization of a large but sparse observation matrix of user-item feedback signals. We present a probabilistic model for generating personalised recommendations of items to users. The system combines content information, in the form of user and item metadata, with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional trait space, in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences. Efficient inference is achieved by approximate message passing, combining Expectation Propagation (EP) with Variational Message Passing. By using Assumed-Density Filtering (ADF) for training, the model requires only a single pass through the training data.

We also include a dynamics model which allows an item’s popularity, a user’s taste, or a user’s personal rating scale to drift over time. The resulting on-line learning algorithm incrementally takes account of new data, so the system can immediately reflect the latest user preferences. In instances where the user’s social network is known, its inclusion can significantly improve recommendations for cold-start users. We propose and investigate two ways of including a social network: either as a Markov Random Field that encodes user similarity in the prior over user features, or as an explicit model that treats social links as observations.
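As a minimal sketch of the trait-space idea (all sizes and feature encodings here are hypothetical), user and item feature vectors are mapped by two linear maps into a shared low-dimensional space, and the predicted affinity is the inner product of the resulting trait vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): 4 user meta-data features, 5 item features,
# and a K = 2 dimensional trait space.
n_user_feat, n_item_feat, K = 4, 5, 2

U = rng.normal(0.0, 0.1, size=(K, n_user_feat))  # maps user features -> traits
V = rng.normal(0.0, 0.1, size=(K, n_item_feat))  # maps item features -> traits

def predicted_affinity(x_user, x_item):
    """Affinity = inner product of the trait vectors s = U x and t = V y."""
    s = U @ x_user
    t = V @ x_item
    return float(s @ t)

x = np.zeros(n_user_feat); x[1] = 1.0   # e.g. a one-hot user-ID feature
y = np.zeros(n_item_feat); y[3] = 1.0   # e.g. a one-hot item-ID feature
affinity = predicted_affinity(x, y)
```

In the full model the maps carry Gaussian posteriors updated by message passing; here they are fixed random matrices purely to show the bilinear form.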

• M. Gartrell, U. Paquet, and R. Herbrich, "A Bayesian Treatment of Social Links in Recommender Systems," CU Technical Report CU-CS-1092-12, 2012.

Recommender systems are increasingly driving user experiences on the Internet. This personalization is often achieved through the factorization of a large but sparse observation matrix of user-item feedback signals. In instances where the user’s social network is known, its inclusion can significantly improve recommendations for cold start users. There are numerous ways in which the network can be incorporated into a probabilistic graphical model. We propose and investigate two ways for including a social network, either as a Markov Random Field that describes a user similarity in the prior over user features, or an explicit model that treats social links as observations. State of the art performance is reported on the Flixster online social network dataset.

@techreport{gartrell2012bayesian,
abstract = {Recommender systems are increasingly driving user experiences on the Internet. This personalization is often achieved through the factorization of a large but sparse observation matrix of user-item feedback signals. In instances where the user's social network is known, its inclusion can significantly improve recommendations for cold start users. There are numerous ways in which the network can be incorporated into a probabilistic graphical model. We propose and investigate two ways for including a social network, either as a Markov Random Field that describes a user similarity in the prior over user features, or an explicit model that treats social links as observations. State of the art performance is reported on the Flixster online social network dataset.},
author = {Gartrell, Mike and Paquet, Ulrich and Herbrich, Ralf},
institution = {CU Technical Report CU-CS-1092-12},
title = {A {Bayesian} Treatment of Social Links in Recommender Systems},
url = {https://www.herbrich.me/papers/ucb511101092internet.pdf},
year = {2012},
}

• D. Stern, R. Herbrich, and T. Graepel, "Matchbox: Large Scale Online Bayesian Recommendations," in Proceedings of the 18th International Conference on World Wide Web, 2009, pp. 111–120.

We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form of user and item meta data in combination with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional trait space in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences. Here we present three alternatives: direct observation of an absolute rating each user gives to some items, observation of a binary preference (like/don’t like) and observation of a set of ordinal ratings on a user-specific scale. Efficient inference is achieved by approximate message passing involving a combination of Expectation Propagation (EP) and Variational Message Passing. We also include a dynamics model which allows an item’s popularity, a user’s taste or a user’s personal rating scale to drift over time. By using Assumed-Density Filtering (ADF) for training, the model requires only a single pass through the training data. This is an on-line learning algorithm capable of incrementally taking account of new data so the system can immediately reflect the latest user preferences. We evaluate the performance of the algorithm on the MovieLens and Netflix data sets consisting of approximately 1,000,000 and 100,000,000 ratings respectively. This demonstrates that training the model using the on-line ADF approach yields state-of-the-art performance with the option of improving performance further if computational resources are available by performing multiple EP passes over the training data.
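To illustrate the single-pass, on-line flavour of training, here is a hedged sketch in which plain stochastic gradient updates stand in for the paper's ADF/EP message passing; the sizes, learning rate, and synthetic rating stream are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, K = 50, 40, 2

# Ground-truth trait vectors, used only to synthesize a rating stream.
true_U = rng.normal(size=(n_users, K))
true_V = rng.normal(size=(n_items, K))

U = rng.normal(0.0, 0.1, (n_users, K))   # learned user traits
V = rng.normal(0.0, 0.1, (n_items, K))   # learned item traits
b = np.zeros(n_items)                    # per-item popularity bias
lr = 0.05

def online_update(u, i, rating):
    """Incorporate one (user, item, rating) event; returns the prediction error."""
    err = rating - (U[u] @ V[i] + b[i])
    u_old = U[u].copy()
    U[u] += lr * err * V[i]
    V[i] += lr * err * u_old
    b[i] += lr * err
    return err

errors = []
for _ in range(5000):                    # a single pass over the event stream
    u = int(rng.integers(n_users)); i = int(rng.integers(n_items))
    rating = true_U[u] @ true_V[i] + 0.1 * rng.normal()
    errors.append(abs(online_update(u, i, rating)))
```

Each event is seen exactly once, mirroring the ADF property that no stored history is revisited; the prediction error shrinks as the stream is consumed.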

@inproceedings{stern2009matchbox,
abstract = {We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form of user and item meta data in combination with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional trait space in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences. Here we present three alternatives: direct observation of an absolute rating each user gives to some items, observation of a binary preference (like/don't like) and observation of a set of ordinal ratings on a user-specific scale. Efficient inference is achieved by approximate message passing involving a combination of Expectation Propagation (EP) and Variational Message Passing. We also include a dynamics model which allows an item's popularity, a user's taste or a user's personal rating scale to drift over time. By using Assumed-Density Filtering (ADF) for training, the model requires only a single pass through the training data. This is an on-line learning algorithm capable of incrementally taking account of new data so the system can immediately reflect the latest user preferences. We evaluate the performance of the algorithm on the MovieLens and Netflix data sets consisting of approximately 1,000,000 and 100,000,000 ratings respectively. This demonstrates that training the model using the on-line ADF approach yields state-of-the-art performance with the option of improving performance further if computational resources are available by performing multiple EP passes over the training data.},
author = {Stern, David and Herbrich, Ralf and Graepel, Thore},
booktitle = {Proceedings of the 18th International Conference on World Wide Web},
pages = {111--120},
title = {{Matchbox}: Large Scale Online {Bayesian} Recommendations},
url = {https://www.herbrich.me/papers/www09.pdf},
year = {2009}
}

## Large-Scale Approximate Recommender Systems

In general, collaborative filtering (CF) systems show their real strength when supplied with enormous data sets. To handle massive amounts of information, we investigated sketching techniques. We demonstrate how to use fingerprinting methods to compute a family of rank correlation coefficients with high accuracy and confidence. We examine the suggested methods empirically through a recommender system for the Netflix dataset, showing that the required fingerprint sizes are even smaller than the theoretical analysis suggests. We also explore the use of standard hash functions rather than min-wise independent hashes, and the relation between the quality of the final recommendations and the fingerprint size.

• Y. Bachrach and R. Herbrich, "Fingerprinting Ratings for Collaborative Filtering – Theoretical and Empirical Analysis," in Proceedings of the 17th International Symposium on String Processing and Information Retrieval, 2010, pp. 25–36.

We consider fingerprinting methods for collaborative filtering (CF) systems. In general, CF systems show their real strength when supplied with enormous data sets. Earlier work already suggests sketching techniques to handle massive amounts of information, but most prior analysis has so far been limited to non-ranking application scenarios and has focused mainly on a theoretical analysis. We demonstrate how to use fingerprinting methods to compute a family of rank correlation coefficients. Our methods allow identifying users who have similar rankings over a certain set of items, a problem that lies at the heart of CF applications. We show that our method allows approximating rank correlations with high accuracy and confidence. We examine the suggested methods empirically through a recommender system for the Netflix dataset, showing that the required fingerprint sizes are even smaller than the theoretical analysis suggests. We also explore the use of standard hash functions rather than min-wise independent hashes and the relation between the quality of the final recommendations and the fingerprint size.

@inproceedings{bachrach2010fingerprinting,
abstract = {We consider fingerprinting methods for collaborative filtering (CF) systems. In general, CF systems show their real strength when supplied with enormous data sets. Earlier work already suggests sketching techniques to handle massive amounts of information, but most prior analysis has so far been limited to non-ranking application scenarios and has focused mainly on a theoretical analysis. We demonstrate how to use fingerprinting methods to compute a family of rank correlation coefficients. Our methods allow identifying users who have similar rankings over a certain set of items, a problem that lies at the heart of CF applications. We show that our method allows approximating rank correlations with high accuracy and confidence. We examine the suggested methods empirically through a recommender system for the Netflix dataset, showing that the required fingerprint sizes are even smaller than the theoretical analysis suggests. We also explore the use of standard hash functions rather than min-wise independent hashes and the relation between the quality of the final recommendations and the fingerprint size.},
author = {Bachrach, Yoram and Herbrich, Ralf},
booktitle = {Proceedings of the 17th International Symposium on String Processing and Information Retrieval},
pages = {25--36},
title = {Fingerprinting Ratings for Collaborative Filtering - Theoretical and Empirical Analysis},
url = {https://www.herbrich.me/papers/spire10.pdf},
year = {2010}
}

• Y. Bachrach, R. Herbrich, and E. Porat, "Sketching Algorithms for Approximating Rank Correlations in Collaborative Filtering Systems," in Proceedings of the 16th International Symposium on String Processing and Information Retrieval, 2009, pp. 344–352.

Collaborative filtering (CF) shares information between users to provide each with recommendations. Previous work suggests using sketching techniques to handle massive data sets in CF systems, but only allows testing whether users have a high proportion of items they have both ranked. We show how to determine the correlation between the rankings of two users, using concise sketches of the rankings. The sketches allow approximating Kendall’s Tau, a known rank correlation, with high accuracy $\epsilon$ and high confidence $1-\delta$. The required sketch size is logarithmic in the confidence and polynomial in the accuracy.
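The flavour of the sketch-size bound can be seen with a simple Monte Carlo pair-sampling estimator, a hedged stand-in for the paper's min-wise-hashing construction: each sampled item pair contributes ±1 with mean equal to Kendall's Tau, so Hoeffding's inequality needs a number of samples logarithmic in $1/\delta$ and polynomial in $1/\epsilon$:

```python
import math
import random

def exact_tau(rank_a, rank_b, items):
    """Exact Kendall's Tau over a shared, strictly ranked item set."""
    conc = disc = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            s = ((rank_a[items[i]] - rank_a[items[j]])
                 * (rank_b[items[i]] - rank_b[items[j]]))
            conc += s > 0
            disc += s < 0
    return (conc - disc) / (conc + disc)

def sampled_tau(rank_a, rank_b, items, eps, delta, rng):
    """Monte Carlo estimate of Tau from m sampled item pairs.

    Each sample lies in [-1, 1] with mean Tau, so Hoeffding's inequality
    gives m = 2 ln(2/delta) / eps^2 samples for accuracy eps with
    confidence 1 - delta: logarithmic in the confidence, polynomial in
    the accuracy, matching the shape of the paper's sketch-size bound.
    """
    m = math.ceil(2.0 * math.log(2.0 / delta) / eps ** 2)
    total = 0
    for _ in range(m):
        x, y = rng.sample(items, 2)
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        total += 1 if s > 0 else -1
    return total / m

rng = random.Random(0)
items = list(range(30))
rank_a = {it: pos for pos, it in enumerate(items)}
shuffled = items[:]
for k in range(0, 28, 2):                      # lightly perturb the ordering
    shuffled[k], shuffled[k + 1] = shuffled[k + 1], shuffled[k]
rank_b = {it: pos for pos, it in enumerate(shuffled)}

tau = exact_tau(rank_a, rank_b, items)
tau_hat = sampled_tau(rank_a, rank_b, items, eps=0.2, delta=0.05, rng=rng)
```

The paper's sketches achieve this with concise fingerprints rather than access to the full rankings; the estimator above only mirrors the accuracy/confidence trade-off.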

@inproceedings{bachrach2009sketching,
abstract = {Collaborative filtering (CF) shares information between users to provide each with recommendations. Previous work suggests using sketching techniques to handle massive data sets in CF systems, but only allows testing whether users have a high proportion of items they have both ranked. We show how to determine the correlation between the rankings of two users, using concise sketches of the rankings. The sketches allow approximating Kendall's Tau, a known rank correlation, with high accuracy $\epsilon$ and high confidence $1-\delta$. The required sketch size is logarithmic in the confidence and polynomial in the accuracy.},
author = {Bachrach, Yoram and Herbrich, Ralf and Porat, Ely},
booktitle = {Proceedings of the 16th International Symposium on String Processing and Information Retrieval},
pages = {344--352},
title = {Sketching Algorithms for Approximating Rank Correlations in Collaborative Filtering Systems},
url = {https://www.herbrich.me/papers/spire09.pdf},
year = {2009}
}

## Active Learning in Recommender Systems

Recommender systems are trained to predict user behavior; the recommendations they produce then elicit new user behavior, which is fed back in as training data. This creates a feedback loop: as long as the low-cost way to interact with the service is through the recommender, the recommender will only ever see behavioral data on the items it chooses. This process can lead to hidden biases, as it effectively limits how much information the recommender system will ever see. We explore the notion that recommender systems are a special kind of active learning agent, with the peculiarity that the cost of asking for the label of an instance depends on its true label: showing a bad recommendation while exploring costs more than showing a good one.

• A. Passos, J. Van Gael, R. Herbrich, and U. Paquet, "A Penny for Your Thoughts? The Value of Information in Recommendation Systems," in NIPS Workshop on Bayesian Optimization, Experimental Design, and Bandits, 2011, pp. 9–14.

Most recommendation systems are trained to predict behavioral data and then used to generate more such data by recommending items and receiving feedback on the quality of these recommendations. This data is then fed back into the training process. This creates a feedback loop: as long as the low-cost way to interact with the service is through the recommender, the recommender will only ever see behavioral data on the items it chooses. This process can lead to hidden biases, as it effectively limits how much information the recommender system will ever see. On the other hand, there is a cost to making exploratory recommendations, as they should, myopically, be worse than the best bets of a recommendation system. In this paper we explore the notion that recommender systems are a special kind of active learning agents, with the peculiarity that the cost of asking for the label of an instance depends on its true label, as the cost of showing a bad recommendation when exploring is higher than the cost of showing a good recommendation.
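The label-dependent query cost can be made concrete with a toy expected-cost calculation; all numbers and the `info_value` term are hypothetical illustrations, not quantities from the paper:

```python
def expected_exploration_cost(p_good, cost_good, cost_bad):
    """Expected cost of showing an item whose true label is unknown.

    Unlike classical active learning, the cost of the query depends on the
    true label: a bad recommendation (probability 1 - p_good) hurts the
    user experience more than a good one.
    """
    return p_good * cost_good + (1.0 - p_good) * cost_bad

# Illustrative numbers: showing a good item costs nothing, a bad one
# costs 1 unit of user goodwill.
safe_bet = expected_exploration_cost(0.9, 0.0, 1.0)    # a well-known item
long_shot = expected_exploration_cost(0.5, 0.0, 1.0)   # an unexplored item

# Myopically the long shot is worse; exploring it only pays off if the
# (hypothetical) value of the information gained covers the cost gap.
info_value = 0.3
worth_exploring = (long_shot - info_value) < safe_bet
```

With these numbers the information value does not cover the gap, so the myopically safe item wins; a larger `info_value` flips the decision.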

@inproceedings{passos2011informationvalue,
abstract = {Most recommendation systems are trained to predict behavioral data and then used to generate more such data by recommending items and receiving feedback on the quality of these recommendations. This data is then fed back into the training process. This creates a feedback loop: as long as the low-cost way to interact with the service is through the recommender, the recommender will only ever see behavioral data on the items it chooses. This process can lead to hidden biases, as it effectively limits how much information the recommender system will ever see. On the other hand, there is a cost to making exploratory recommendations, as they should, myopically, be worse than the best bets of a recommendation system. In this paper we explore the notion that recommender systems are a special kind of active learning agents, with the peculiarity that the cost of asking for the label of an instance depends on its true label, as the cost of showing a bad recommendation when exploring is higher than the cost of showing a good recommendation.},
author = {Passos, Alexandre and Van Gael, Juergen and Herbrich, Ralf and Paquet, Ulrich},
booktitle = {{NIPS} Workshop on {Bayesian} Optimization, Experimental Design, and Bandits},
pages = {9--14},
title = {A Penny for Your Thoughts? The Value of Information in Recommendation Systems},
url = {https://www.herbrich.me/papers/passos11penny.pdf},
year = {2011}
}

## Transparent User Models

Personalization is a ubiquitous phenomenon in our daily online experience. While such technology is critical for helping us combat the overload of information we face, in many cases, we may not even realize that our results are being tailored to our personal tastes and preferences. Worse yet, when such a system makes a mistake, we have little recourse to correct it. In this work, we developed a new user-interpretable feature set upon which to base personalized recommendations. These features, which we call badges, represent fundamental traits of users (e.g., “vegetarian” or “Apple fanboy”) inferred by modeling the interplay between a user’s behavior and self-reported identity.

• K. El-Arini, U. Paquet, R. Herbrich, J. Van Gael, and B. Agüera y Arcas, "Transparent User Models for Personalization," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 678–686.

Personalization is a ubiquitous phenomenon in our daily online experience. While such technology is critical for helping us combat the overload of information we face, in many cases, we may not even realize that our results are being tailored to our personal tastes and preferences. Worse yet, when such a system makes a mistake, we have little recourse to correct it. In this work, we propose a framework for addressing this problem by developing a new user-interpretable feature set upon which to base personalized recommendations. These features, which we call badges, represent fundamental traits of users (e.g., “vegetarian” or “Apple fanboy”) inferred by modeling the interplay between a user’s behavior and self-reported identity. Specifically, we consider the microblogging site Twitter, where users provide short descriptions of themselves in their profiles, as well as perform actions such as tweeting and retweeting. Our approach is based on the insight that we can define badges using high precision, low recall rules (e.g., “Twitter profile contains the phrase ‘Apple fanboy”’), and with enough data, generalize to other users by observing shared behavior. We develop a fully Bayesian, generative model that describes this interaction, while allowing us to avoid the pitfalls associated with having positive-only data. Experiments on real Twitter data demonstrate the effectiveness of our model at capturing rich and interpretable user traits that can be used to provide transparency for personalization.
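The core generalization step can be caricatured with a single-behavior Bayes update; the probabilities below are illustrative stand-ins, not values from the paper, whose model is a full generative treatment of many badges and behaviors:

```python
# Illustrative probabilities (hypothetical): a high-precision, low-recall
# profile rule supplies positive badge labels; a behavioral signal
# (e.g. retweeting vegan-recipe accounts) generalizes to other users.
p_badge = 0.05                   # prior: fraction of users with the trait
p_act_given_badge = 0.60         # P(behavior | badge)
p_act_given_no_badge = 0.05      # P(behavior | no badge)

def posterior_badge(behavior_observed):
    """P(badge | behavior) by Bayes' rule for a single binary behavior."""
    if behavior_observed:
        num = p_act_given_badge * p_badge
        den = num + p_act_given_no_badge * (1.0 - p_badge)
    else:
        num = (1.0 - p_act_given_badge) * p_badge
        den = num + (1.0 - p_act_given_no_badge) * (1.0 - p_badge)
    return num / den

with_behavior = posterior_badge(True)      # posterior pushed above the prior
without_behavior = posterior_badge(False)  # posterior pushed below the prior
```

Observing the shared behavior raises the badge posterior well above the prior even for users whose profiles never fire the rule, which is exactly how the high-precision rules generalize.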

@inproceedings{elarini2012transparent,
abstract = {Personalization is a ubiquitous phenomenon in our daily online experience. While such technology is critical for helping us combat the overload of information we face, in many cases, we may not even realize that our results are being tailored to our personal tastes and preferences. Worse yet, when such a system makes a mistake, we have little recourse to correct it. In this work, we propose a framework for addressing this problem by developing a new user-interpretable feature set upon which to base personalized recommendations. These features, which we call badges, represent fundamental traits of users (e.g., “vegetarian” or “Apple fanboy”) inferred by modeling the interplay between a user’s behavior and self-reported identity. Specifically, we consider the microblogging site Twitter, where users provide short descriptions of themselves in their profiles, as well as perform actions such as tweeting and retweeting. Our approach is based on the insight that we can define badges using high precision, low recall rules (e.g., “Twitter profile contains the phrase ‘Apple fanboy”’), and with enough data, generalize to other users by observing shared behavior. We develop a fully Bayesian, generative model that describes this interaction, while allowing us to avoid the pitfalls associated with having positive-only data. Experiments on real Twitter data demonstrate the effectiveness of our model at capturing rich and interpretable user traits that can be used to provide transparency for personalization.},
author = {El-Arini, Khalid and Paquet, Ulrich and Herbrich, Ralf and Van Gael, Jurgen and Ag{\"u}era y Arcas, Blaise},
booktitle = {Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
pages = {678--686},
title = {Transparent User Models for Personalization},
url = {https://www.herbrich.me/papers/kdd2012-personalization.pdf},
organization = {ACM},
year = {2012}
}

## Collaborative Expert Portfolio Management

We consider the task of assigning experts from a portfolio of specialists to solve a set of tasks, combining collaborative filtering with feature-based descriptions of tasks and experts to yield a general framework for managing a portfolio of experts. The model learns an embedding of tasks and experts into a latent space in which affinity is measured by the inner product. It can be trained incrementally and can track non-stationary data, following potentially changing expert and task characteristics. The approach admits a principled decision-theoretic framework for expert selection and recommendation, allowing the user to choose a utility function that best suits their objectives. We apply the model to manage a portfolio of algorithms for hard combinatorial problems, a well-studied area: we demonstrate a large improvement over the state of the art in one domain (constraint solving), and in a second domain (combinatorial auctions) the learned portfolio performs significantly better than any single algorithm.
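A hedged sketch of the decision-theoretic selection step, where the embeddings and predictive variances are made-up stand-ins for the model's learned posteriors:

```python
import numpy as np

rng = np.random.default_rng(2)
n_experts, K, n_task_feat = 6, 2, 3

# Hypothetical learned embeddings: experts and task features share a latent
# space in which affinity (predicted performance) is the inner product.
E = rng.normal(size=(n_experts, K))        # expert embeddings
W = rng.normal(size=(K, n_task_feat))      # maps task features into the space

def select_expert(task_features, utility, score_var):
    """Choose the expert with the highest expected utility, integrating
    the utility over a Gaussian predictive distribution of each score."""
    means = E @ (W @ task_features)
    best, best_eu = 0, -np.inf
    for i, mu in enumerate(means):
        samples = rng.normal(mu, np.sqrt(score_var[i]), size=2000)
        eu = float(utility(samples).mean())
        if eu > best_eu:
            best, best_eu = i, eu
    return best

task = np.array([1.0, 0.0, 0.5])
score_var = np.full(n_experts, 0.25)       # hypothetical predictive variances
risk_averse = lambda s: -np.exp(-s)        # concave utility penalizes risk
chosen = select_expert(task, risk_averse, score_var)
```

Because the utility is integrated over the predictive distribution rather than applied to a point estimate, a risk-averse (concave) utility can prefer a slightly weaker but more reliable expert, which is the point of making the utility user-selectable.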

• D. Stern, H. Samulowitz, R. Herbrich, T. Graepel, L. Pulina, and A. Tacchella, "Collaborative Expert Portfolio Management," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010.
@inproceedings{stern2010collaborativeexperts,
author = {Stern, David and Samulowitz, Horst and Herbrich, Ralf and Graepel, Thore and Pulina, Luca and Tacchella, Armando},
booktitle = {Proceedings of the 24th {AAAI} Conference on Artificial Intelligence},
title = {Collaborative Expert Portfolio Management},
year = {2010}
}