Bayesian Learning

Approximate Bayesian Inference

Bayesian inference provides a powerful mechanism for data analysis and learning. However, in real-world situations it is rarely possible to perform exact inference; in fact, exact Bayesian inference is in general NP-hard. One of the most successful approaches to this problem is to exploit a factorised structure of both the sampling distribution and the prior. A variety of methods then exploit this factorisation to perform efficient approximate inference. The Expectation Propagation (EP) algorithm is a powerful tool for inference in such factor graphs; for discrete distributions, the EP algorithm is also known as Belief Propagation.
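
To give a flavour of how factorisation makes inference tractable, the following minimal sketch runs sum-product Belief Propagation on a chain of discrete variables; the toy potentials and all names are illustrative assumptions, not part of the project.

import numpy as np

def chain_marginals(unaries, pairwise):
    """Marginals of p(x) proportional to prod_i unary_i(x_i) * prod_i pair_i(x_i, x_{i+1})."""
    n = len(unaries)
    # Forward messages: m_f[i] is the message arriving at node i from the left.
    m_f = [np.ones_like(unaries[0]) for _ in range(n)]
    for i in range(1, n):
        m_f[i] = pairwise[i - 1].T @ (unaries[i - 1] * m_f[i - 1])
    # Backward messages: m_b[i] arrives at node i from the right.
    m_b = [np.ones_like(unaries[0]) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        m_b[i] = pairwise[i] @ (unaries[i + 1] * m_b[i + 1])
    marg = [unaries[i] * m_f[i] * m_b[i] for i in range(n)]
    return [m / m.sum() for m in marg]

rng = np.random.default_rng(0)
K, n = 3, 5                                     # 3 states, 5 variables
unaries = [rng.random(K) for _ in range(n)]
pairwise = [rng.random((K, K)) for _ in range(n - 1)]
print(chain_marginals(unaries, pairwise)[2])    # exact marginal of x_2

Because the joint distribution factorises along the chain, the cost is linear in the number of variables rather than exponential.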

Poisson Networks

Modelling structured multivariate point process data has wide-ranging applications, such as understanding neural activity, developing faster file access systems, and learning dependencies among servers in large networks. In this project, we develop the Poisson network model for representing multivariate structured Poisson processes. In our model each node of the network represents a Poisson process. The novelty of our work is that the waiting times of a process are modelled by an exponential distribution with a piecewise constant rate function that depends on the event counts of its parents in the network in a generalised linear way. This choice of model allows exact sampling from networks of arbitrary structure. We adopt a Bayesian approach for learning the network structure, and we develop fixed-point and sampling-based approximations for inferring the rate functions in Poisson networks.
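
As a minimal sketch of the exact-sampling idea, the code below simulates a two-node network: node A is a homogeneous Poisson process, and node B has a piecewise constant rate given by a generalised-linear function of the number of A events in a trailing window. The window length, weights and rates are illustrative assumptions, not the project's actual parameters.

import numpy as np

rng = np.random.default_rng(1)
T, rate_a, window, w0, w1 = 10.0, 2.0, 1.0, -1.0, 0.5

# Sample the parent process A on [0, T].
a_events = []
t = rng.exponential(1.0 / rate_a)
while t < T:
    a_events.append(t)
    t += rng.exponential(1.0 / rate_a)
a_events = np.array(a_events)

def rate_b(t):
    # Piecewise constant rate: generalised-linear in the trailing event count.
    count = np.sum((a_events > t - window) & (a_events <= t))
    return np.exp(w0 + w1 * count)

# The rate of B can only change at parent events and their window expiries.
breaks = np.unique(np.concatenate(([0.0, T], a_events, a_events + window)))
breaks = breaks[(breaks >= 0.0) & (breaks <= T)]

# Exact sampling of B by time rescaling: accumulate the integrated rate
# segment by segment and place an event whenever it reaches an Exp(1) draw.
b_events, target = [], rng.exponential(1.0)
for lo, hi in zip(breaks[:-1], breaks[1:]):
    lam = rate_b(0.5 * (lo + hi))               # constant within the segment
    t = lo
    while t < hi:
        if target <= lam * (hi - t):            # next event falls in this segment
            t += target / lam
            b_events.append(t)
            target = rng.exponential(1.0)
        else:                                   # carry the remaining Exp(1) mass forward
            target -= lam * (hi - t)
            t = hi

print(f"{len(a_events)} parent events, {len(b_events)} child events")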

Informative Vector Machine

We have developed a framework for sparse Gaussian process methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. In contrast to most previous work on sparse GPs, our goal is not only to learn sparse predictors (which can be evaluated in O(d) rather than O(n), where d < n and n is the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n d^2), and in large real-world classification experiments we show that it can match the prediction performance of the popular support vector machine (SVM) while requiring only a fraction of the training time. In contrast to the SVM, our approximation produces estimates of predictive probabilities (‘error bars’), allows for Bayesian model selection, and is simpler to implement.
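
The sketch below illustrates the flavour of the forward-selection step: at each round, the candidate point whose inclusion yields the largest differential-entropy reduction (i.e. the largest predictive variance under the current active set) is added. The RBF kernel, noise level and the naive, non-incremental variance computation are simplifying assumptions; the actual method maintains incremental updates (giving the O(n d^2) scaling) and site approximations for classification.

import numpy as np

def rbf(X, Z, ell=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def ivm_select(X, d, noise=0.1):
    n = X.shape[0]
    active, remaining = [], list(range(n))
    for _ in range(d):
        scores = []
        for i in remaining:
            if active:
                K_aa = rbf(X[active], X[active]) + noise * np.eye(len(active))
                k_ia = rbf(X[[i]], X[active])[0]
                var = 1.0 - k_ia @ np.linalg.solve(K_aa, k_ia)
            else:
                var = 1.0
            scores.append(0.5 * np.log1p(var / noise))   # entropy reduction
        best = remaining[int(np.argmax(scores))]
        active.append(best)
        remaining.remove(best)
    return active

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
print("active set:", ivm_select(X, d=10))

Prediction then only involves the d selected points, which is where the O(d) evaluation cost comes from.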

Bayesian Transduction

We consider the case of binary classification by linear discriminant functions. The simplification of the transduction problem stems from the fact that the infinitely many linear discriminants are reduced to a finite number of equivalence classes on the working set. The number of equivalence classes is bounded from above by the growth function, and each equivalence class corresponds to a polyhedron in parameter space. In a PAC-style setting we consider only the region of parameter space with zero training error, often referred to as the version space. From a Bayesian point of view, we suggest measuring the prior probability of a labelling of the working set as the volume of the corresponding polyhedron with respect to the a-priori distribution in parameter space. The maximum a-posteriori (MAP) scheme then recommends choosing the labelling of maximum volume.
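
The volume-based idea can be illustrated with a simple Monte Carlo sketch: sample weight vectors from a spherical prior, keep those lying in version space, record the labelling each induces on the working set, and pick the labelling with the largest estimated volume. The toy data, prior and sample size are illustrative assumptions.

import numpy as np
from collections import Counter

rng = np.random.default_rng(3)

# Toy linearly separable training set and an unlabelled working set.
X_train = np.array([[1.5, 1.0], [2.0, 1.5], [-1.5, -1.0], [-2.0, -0.5]])
y_train = np.array([1, 1, -1, -1])
X_work = np.array([[1.0, 0.2], [-0.8, -0.3], [0.1, 1.2]])

counts = Counter()
for _ in range(20000):
    w = rng.normal(size=2)                      # draw from the prior
    w /= np.linalg.norm(w)
    if np.all(y_train * (X_train @ w) > 0):     # inside version space?
        labelling = tuple(np.sign(X_work @ w).astype(int))
        counts[labelling] += 1

# The MAP labelling is the one whose polyhedron captured the most prior mass.
for labelling, c in counts.most_common(3):
    print(labelling, c)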

Bayes Point Machines

From a Bayesian perspective, Support Vector Machines choose the hypothesis corresponding to the largest possible hypersphere that can be inscribed in version space, i.e. the space of all hypotheses consistent with a given training set. The boundaries of version space that are tangent to this hypersphere define the support vectors. An alternative and potentially better approach is to construct the hypothesis using the whole of version space. This is achieved by the Bayes Point Machine, which finds the midpoint of the region of intersection of all hyperplanes bisecting version space into two halves of equal volume (the Bayes point). It is known that the centre of mass of version space approximates the Bayes point. We investigate estimating the centre of mass by averaging over the trajectory of a billiard ball bouncing in version space. Experimental results indicate that Bayes Point Machines consistently outperform Support Vector Machines.
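
As a minimal sketch of the Bayes point idea, the code below estimates the centre of mass of version space by rejection sampling from a spherical prior rather than by the billiard trajectory described above, which is more involved; the toy data and sample size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
X = np.array([[1.2, 0.8], [2.0, 1.1], [-1.0, -1.3], [-1.8, -0.4]])
y = np.array([1, 1, -1, -1])

samples = []
while len(samples) < 2000:
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)                      # hypotheses live on the unit sphere
    if np.all(y * (X @ w) > 0):                 # consistent with all training examples?
        samples.append(w)

w_bp = np.mean(samples, axis=0)                 # approximate Bayes point (centre of mass)
w_bp /= np.linalg.norm(w_bp)
print("Bayes point estimate:", w_bp, "predictions:", np.sign(X @ w_bp))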