Computational Advertising

Click-Through Rate Prediction

Online advertising allows advertisers to bid and pay only for measurable user responses, such as clicks on ads. As a consequence, click prediction systems are central to most online advertising systems. Our work in this area includes a Bayesian click-through rate (CTR) prediction algorithm based on a probit regression model that maps discrete or real-valued input features to probabilities; it maintains Gaussian beliefs over the weights of the model and performs Gaussian online updates derived from approximate message passing, with scalability ensured through a principled weight pruning procedure and an approximate parallel implementation. In a separate line of work, we combined decision trees with logistic regression, outperforming either of these methods on its own. The most important lesson is to have the right features: those capturing historical information about the user or ad dominate other types of features. Picking the optimal handling for data freshness, learning rate schema and data sampling improves the model slightly, though much less than adding a high-value feature, or picking the right model to begin with.
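The tree-plus-logistic-regression hybrid can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the production system: the two hand-written "trees" below stand in for learned boosted trees, the toy click rule and all names are invented for the example, and the logistic regression is fit by plain batch gradient descent.

```python
import numpy as np

# Stand-ins for learned boosted trees: each maps a raw feature vector
# to a leaf index. In the real system these would be trained GBDT trees.
def tree1(x):
    return 0 if x[0] < 0.5 else 1           # 2 leaves

def tree2(x):
    if x[1] < 0.5:
        return 0                             # 3 leaves
    return 1 if x[0] < 0.5 else 2

def to_leaf_features(X):
    """Re-encode each example as the one-hot vector of its leaf indices."""
    Z = np.zeros((len(X), 2 + 3))
    for i, x in enumerate(X):
        Z[i, tree1(x)] = 1.0
        Z[i, 2 + tree2(x)] = 1.0
    return Z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(Z, y, lr=0.5, steps=2000):
    """Plain batch gradient descent on the logistic loss."""
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        w -= lr * Z.T @ (sigmoid(Z @ w) - y) / len(y)
    return w

# Toy clicks: a click happens iff both raw features exceed 0.5 -- an AND
# that is not linearly separable in the raw features, but is separable in
# the leaf encoding (leaf 2 of tree2 fires exactly on that region).
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(float)

Z = to_leaf_features(X)
w = fit_logreg(Z, y)
accuracy = np.mean((sigmoid(Z @ w) > 0.5) == y)
```

The leaf encoding turns each tree into a learned, non-linear binning of the raw features, which is what lets a linear model on top capture the AND-shaped click rule.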

  • X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. Candela, "Practical Lessons from Predicting Clicks on Ads at Facebook," in Proceedings of 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2014, pp. 1–9.

    Online advertising allows advertisers to only bid and pay for measurable user responses, such as clicks on ads. As a consequence, click prediction systems are central to most on-line advertising systems. With over 750 million daily active users and over 1 million active advertisers, predicting clicks on Facebook ads is a challenging machine learning task. In this paper we introduce a model which combines decision trees with logistic regression, outperforming either of these methods on its own by over 3%, an improvement with significant impact to the overall system performance. We then explore how a number of fundamental parameters impact the final prediction performance of our system. Not surprisingly, the most important thing is to have the right features: those capturing historical information about the user or ad dominate other types of features. Once we have the right features and the right model (decision trees plus logistic regression), other factors play small roles (though even small improvements are important at scale). Picking the optimal handling for data freshness, learning rate schema and data sampling improve the model slightly, though much less than adding a high-value feature, or picking the right model to begin with.

    @inproceedings{he2014advertising,
    abstract = {Online advertising allows advertisers to only bid and pay for measurable user responses, such as clicks on ads. As a consequence, click prediction systems are central to most on-line advertising systems. With over 750 million daily active users and over 1 million active advertisers, predicting clicks on Facebook ads is a challenging machine learning task. In this paper we introduce a model which combines decision trees with logistic regression, outperforming either of these methods on its own by over 3%, an improvement with significant impact to the overall system performance. We then explore how a number of fundamental parameters impact the final prediction performance of our system. Not surprisingly, the most important thing is to have the right features: those capturing historical information about the user or ad dominate other types of features. Once we have the right features and the right model (decision trees plus logistic regression), other factors play small roles (though even small improvements are important at scale). Picking the optimal handling for data freshness, learning rate schema and data sampling improve the model slightly, though much less than adding a high-value feature, or picking the right model to begin with.},
    author = {He, Xinran and Pan, Junfeng and Jin, Ou and Xu, Tianbing and Liu, Bo and Xu, Tao and Shi, Yanxin and Atallah, Antoine and Herbrich, Ralf and Bowers, Stuart and Candela, Joaquin Qui\~{n}onero},
    booktitle = {Proceedings of 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
    pages = {1--9},
    title = {Practical Lessons from Predicting Clicks on Ads at {F}acebook},
    url = {https://www.herbrich.me/papers/adclicksfacebook.pdf},
    organization = {ACM},
    year = {2014}
    }

  • T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich, "Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine," in Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 13–20.

    We describe a new Bayesian click-through rate (CTR) prediction algorithm used for Sponsored Search in Microsoft’s Bing search engine. The algorithm is based on a probit regression model that maps discrete or real-valued input features to probabilities. It maintains Gaussian beliefs over weights of the model and performs Gaussian online updates derived from approximate message passing. Scalability of the algorithm is ensured through a principled weight pruning procedure and an approximate parallel implementation. We discuss the challenges arising from evaluating and tuning the predictor as part of the complex system of sponsored search where the predictions made by the algorithm decide about future training sample composition. Finally, we show experimental results from the production system and compare to a calibrated Naive Bayes algorithm.

    @inproceedings{graepel2010bayesianctr,
    abstract = {We describe a new Bayesian click-through rate (CTR) prediction algorithm used for Sponsored Search in Microsoft's Bing search engine. The algorithm is based on a probit regression model that maps discrete or real-valued input features to probabilities. It maintains Gaussian beliefs over weights of the model and performs Gaussian online updates derived from approximate message passing. Scalability of the algorithm is ensured through a principled weight pruning procedure and an approximate parallel implementation. We discuss the challenges arising from evaluating and tuning the predictor as part of the complex system of sponsored search where the predictions made by the algorithm decide about future training sample composition. Finally, we show experimental results from the production system and compare to a calibrated Naive Bayes algorithm.},
    author = {Graepel, Thore and Candela, Joaquin Qui\~{n}onero and Borchert, Thomas and Herbrich, Ralf},
    booktitle = {Proceedings of the 27th International Conference on Machine Learning},
    pages = {13--20},
    title = {Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's {B}ing Search Engine},
    url = {https://www.herbrich.me/papers/adpredictor.pdf},
    year = {2010}
    }
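The Gaussian online update at the core of such a probit regression model can be sketched as follows. This is an illustrative implementation of the standard truncated-Gaussian update equations only; the production system additionally involves weight pruning and a parallel implementation, and the class name, feature strings, and parameter values below are assumptions made for the example.

```python
import math

def gauss_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def gauss_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

class OnlineProbit:
    """Independent Gaussian beliefs N(mu, s2) over one weight per feature."""

    def __init__(self, beta=1.0, prior_var=1.0):
        self.beta2 = beta * beta          # noise variance of the probit link
        self.prior_var = prior_var
        self.mu = {}                      # feature -> posterior mean
        self.s2 = {}                      # feature -> posterior variance

    def _belief(self, f):
        return self.mu.get(f, 0.0), self.s2.get(f, self.prior_var)

    def predict(self, feats):
        """Predicted click probability for a set of active features."""
        m = sum(self._belief(f)[0] for f in feats)
        v = self.beta2 + sum(self._belief(f)[1] for f in feats)
        return gauss_cdf(m / math.sqrt(v))

    def update(self, feats, y):
        """Online Gaussian update for one observation, y in {+1, -1}."""
        m = sum(self._belief(f)[0] for f in feats)
        v = self.beta2 + sum(self._belief(f)[1] for f in feats)
        sv = math.sqrt(v)
        t = y * m / sv
        vf = gauss_pdf(t) / gauss_cdf(t)  # truncated-Gaussian corrections
        wf = vf * (vf + t)
        for f in feats:
            mu, s2 = self._belief(f)
            self.mu[f] = mu + y * (s2 / sv) * vf
            self.s2[f] = s2 * (1.0 - (s2 / v) * wf)
```

Each observation both shifts the means of the active weights toward the observed outcome and shrinks their variances, so the step size adapts automatically as the model grows more certain.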

Keyword Suggestions

We developed an efficient Bayesian online learning algorithm for clustering advertisements based on the keywords they have been subscribed to. The proposed approach scales well for large datasets, and compares favorably to other clustering algorithms on advertisements. As a concrete application to online advertising, we show how the learned model can be used to recommend new keywords for given advertisements.

  • A. Schwaighofer, J. Q. Candela, T. Borchert, T. Graepel, and R. Herbrich, "Scalable Clustering and Keyword Suggestion for Online Advertisements," in Proceedings of 3rd Annual International Workshop on Data Mining and Audience Intelligence for Advertising, 2009, pp. 27–36.

    We present an efficient Bayesian online learning algorithm for clustering vectors of binary values based on a well known model, the mixture of Bernoulli profiles. The model includes conjugate Beta priors over the success probabilities and maintains discrete probability distributions for cluster assignments. Clustering is then formulated as inference in a factor graph which is solved efficiently using online approximate message passing. The resulting algorithm has three key features: a) it requires only a single pass across the data and can hence be used on data streams, b) it maintains the uncertainty of parameters and cluster assignments, and c) it implements an automatic step size adaptation based on the current model uncertainty. The model is tested on an artificially generated toy dataset and applied to a large scale real-world data set from online advertising, the data being online ads characterized by the set of keywords to which they have been subscribed. The proposed approach scales well for large datasets, and compares favorably to other clustering algorithms on the ads dataset. As a concrete application to online advertising we show how the learned model can be used to recommend new keywords for given ads.

    @inproceedings{schwaighofer2009clustering,
    abstract = {We present an efficient Bayesian online learning algorithm for clustering vectors of binary values based on a well known model, the mixture of Bernoulli profiles. The model includes conjugate Beta priors over the success probabilities and maintains discrete probability distributions for cluster assignments. Clustering is then formulated as inference in a factor graph which is solved efficiently using online approximate message passing. The resulting algorithm has three key features: a) it requires only a single pass across the data and can hence be used on data streams, b) it maintains the uncertainty of parameters and cluster assignments, and c) it implements an automatic step size adaptation based on the current model uncertainty. The model is tested on an artificially generated toy dataset and applied to a large scale real-world data set from online advertising, the data being online ads characterized by the set of keywords to which they have been subscribed. The proposed approach scales well for large datasets, and compares favorably to other clustering algorithms on the ads dataset. As a concrete application to online advertising we show how the learned model can be used to recommend new keywords for given ads.},
    author = {Schwaighofer, Anton and Candela, Joaquin Qui\~{n}onero and Borchert, Thomas and Graepel, Thore and Herbrich, Ralf},
    booktitle = {Proceedings of 3rd Annual International Workshop on Data Mining and Audience Intelligence for Advertising},
    pages = {27--36},
    title = {Scalable Clustering and Keyword Suggestion for Online Advertisements},
    url = {https://www.herbrich.me/papers/kdd2009.pdf},
    year = {2009}
    }
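The single-pass clustering and keyword suggestion can be sketched as follows. This is a simplification made for illustration: it scores clusters with posterior-mean likelihoods and makes hard cluster assignments, in place of the paper's message-passing inference over full distributions, and all names and data are invented for the example.

```python
import math
import random

class BernoulliMixture:
    """Online clustering of binary keyword vectors with a mixture of
    Bernoulli profiles; each cluster/keyword success probability carries
    a Beta(a, b) posterior."""

    def __init__(self, K, D, seed=0):
        rng = random.Random(seed)
        # Jittered priors break the initial symmetry between clusters.
        self.a = [[1.0 + rng.random() for _ in range(D)] for _ in range(K)]
        self.b = [[1.0 for _ in range(D)] for _ in range(K)]
        self.n = [1.0] * K                  # pseudo-counts for cluster sizes

    def responsibilities(self, x):
        """Normalized cluster probabilities for a binary keyword vector."""
        logs = []
        for k in range(len(self.n)):
            lp = math.log(self.n[k])
            for d, xd in enumerate(x):
                p = self.a[k][d] / (self.a[k][d] + self.b[k][d])
                lp += math.log(p if xd else 1.0 - p)
            logs.append(lp)
        mx = max(logs)
        ws = [math.exp(lp - mx) for lp in logs]
        s = sum(ws)
        return [w / s for w in ws]

    def update(self, x):
        """Assign x to its most likely cluster and update that cluster's
        Beta counts (one pass per data point, suitable for streams)."""
        r = self.responsibilities(x)
        k = r.index(max(r))
        self.n[k] += 1.0
        for d, xd in enumerate(x):
            if xd:
                self.a[k][d] += 1.0
            else:
                self.b[k][d] += 1.0
        return k

    def suggest(self, x, top=3):
        """Recommend keywords: rank by posterior mean within x's cluster."""
        r = self.responsibilities(x)
        k = r.index(max(r))
        order = sorted(range(len(x)), reverse=True,
                       key=lambda d: self.a[k][d] / (self.a[k][d] + self.b[k][d]))
        return order[:top]
```

Because each data point is processed once and only the Beta counts are stored, memory and time grow with the number of clusters and keywords rather than with the number of ads seen, which is what makes the single-pass formulation attractive for streams.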