Kernel Methods

Semidefinite Programming for Classification Learning

Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Approaches are either based on regularisation or on the generation of virtual (transformed) examples. We develop a new framework for learning linear classifiers under known transformations based on semidefinite programming. We present a new learning algorithm – the Semi-Definite Programming Machine (SDPM) – which is able to find a maximum margin hyperplane when the training examples are polynomial trajectories instead of single points. The solution is sparse in the dual variables and allows us to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one transformation parameter, and to learning with kernels are conceivable.
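
As an illustration of the kind of optimization involved, the sketch below encodes a hard-margin SDPM-style problem in Python with cvxpy: each training example is a polynomial trajectory of even degree, and the requirement that the real-valued output stays on the correct side for every value of the transformation parameter is certified by a sum-of-squares (positive semidefinite) constraint. The toy trajectories, the variable names, and the use of cvxpy are illustrative assumptions, not the original formulation.

    import numpy as np
    import cvxpy as cp

    # Toy data: each example is a degree-2 trajectory in R^2,
    #   x(theta) = C[0] + C[1]*theta + C[2]*theta**2,
    # with label y in {-1, +1}. Purely illustrative.
    trajectories = [
        (np.array([[ 2.0,  2.0], [ 1.0, -1.0], [ 0.2,  0.2]]), +1),
        (np.array([[-2.0, -2.0], [-1.0,  1.0], [-0.2, -0.2]]), -1),
    ]
    dim, deg = 2, 2
    half = deg // 2          # monomial basis (1, theta, ..., theta^half)

    w = cp.Variable(dim)
    b = cp.Variable()
    constraints = []
    for C, y in trajectories:
        # p(theta) = y*(w.x(theta) + b) - 1 must be >= 0 for ALL theta.
        # A univariate polynomial of even degree is nonnegative on R iff
        # it is a sum of squares, i.e. its coefficients match z'Qz, Q PSD.
        Q = cp.Variable((half + 1, half + 1), PSD=True)
        for k in range(deg + 1):
            coeff = y * (w @ C[k]) + (y * b - 1 if k == 0 else 0)
            sos = sum(Q[i, k - i]
                      for i in range(max(0, k - half), min(k, half) + 1))
            constraints.append(coeff == sos)

    prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
    prob.solve()
    print("w =", w.value, "b =", b.value)

The parameter values minimising p(theta) on each trajectory then play the role of the virtual support vectors mentioned above; restricting theta to a segment of the trajectory would require a different positivity certificate.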

Adaptive Margin Machines

Previous approaches to learning kernel classifiers, such as Quadratic Programming Machines (SVMs) and Linear Programming Machines, are based on the minimization of a regularized margin loss in which the margin is treated identically for each training pattern. We propose a reformulation of the minimization problem such that an adaptive margin is utilized for each training pattern (Adaptive Margin Machines, AMMs). Furthermore, we give bounds on the generalization error of AMMs which justify their robustness against outliers. We show experimentally that the generalization error of AMMs is comparable to that of QP- and LP-Machines on benchmark datasets from the UCI repository.
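
One plausible instantiation of the adaptive-margin idea is the linear program sketched below: minimize the sum of slacks subject to y_i sum_j alpha_j y_j k(x_j, x_i) >= 1 - xi_i + lambda alpha_i k(x_i, x_i), so that each pattern's margin requirement adapts to its own dual weight. The RBF kernel, the fit_amm helper, and the toy data are assumptions made for illustration; the published AMM formulation may differ in detail.

    import numpy as np
    from scipy.optimize import linprog

    def rbf_kernel(X, gamma=1.0):
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def fit_amm(K, y, lam=1.0):
        # Sketch of an adaptive-margin LP over variables (alpha, xi):
        #   min sum(xi)
        #   s.t. y_i * sum_j alpha_j y_j K_ij >= 1 - xi_i + lam*alpha_i*K_ii
        #        alpha >= 0, xi >= 0
        n = len(y)
        c = np.concatenate([np.zeros(n), np.ones(n)])   # minimize sum(xi)
        A = np.zeros((n, 2 * n))
        for i in range(n):
            A[i, :n] = -y[i] * y * K[i]                 # -y_i y_j K_ij
            A[i, i] += lam * K[i, i]                    # adaptive margin term
            A[i, n + i] = -1.0                          # slack xi_i
        b = -np.ones(n)
        res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * (2 * n))
        return res.x[:n]

    # Toy usage (no bias term in this sketch)
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(+1, 1, (20, 2))])
    y = np.array([-1] * 20 + [+1] * 20)
    K = rbf_kernel(X)
    alpha = fit_amm(K, y)
    pred = np.sign(K @ (alpha * y))
    print("training accuracy:", (pred == y).mean())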

Ordinal Regression

In contrast to the standard machine learning tasks of classification and metric regression, we investigate the problem of predicting variables of ordinal scale, a setting referred to as ordinal regression. This problem arises frequently in the social sciences and in information retrieval, where human preferences play a major role. Whilst approaches proposed in statistics rely on a probability model of a latent (unobserved) variable, we present a distribution-independent risk formulation of ordinal regression which allows us to derive uniform convergence bounds. Applying these bounds, we present a large margin algorithm that is based on a mapping from objects to scalar utility values and learns by classifying pairs of objects.
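
The pairwise construction can be made concrete with a short sketch: learn a linear utility U(x) = w.x by training an ordinary soft-margin classifier on difference vectors x_i - x_j of objects with different ranks, so that U(x_i) > U(x_j) whenever x_i is ranked higher. The synthetic data and the use of scikit-learn's LinearSVC as the pair classifier are illustrative assumptions; recovering the rank thresholds from the utilities is a separate step not shown here.

    import numpy as np
    from itertools import combinations
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 3))
    # Ordinal labels 0..3 obtained by thresholding a hidden utility.
    ranks = np.digitize(X @ np.array([1.0, -0.5, 0.2]), bins=[-1.0, 0.0, 1.0])

    pairs, labels = [], []
    for i, j in combinations(range(len(X)), 2):
        if ranks[i] != ranks[j]:
            pairs.append(X[i] - X[j])
            labels.append(1 if ranks[i] > ranks[j] else -1)

    clf = LinearSVC(fit_intercept=False)  # no bias: pairs are translation-free
    clf.fit(np.array(pairs), np.array(labels))
    w = clf.coef_.ravel()                 # learned utility direction
    print("utility/rank correlation:", np.corrcoef(X @ w, ranks)[0, 1])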

Classification Learning on Proximity Data

We investigate the problem of learning a classification task on data represented in terms of their pairwise proximities. This representation does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors, from which pairwise proximities can always be calculated. Our approach is based on a linear threshold model in the proximity values themselves, which is optimized using Structural Risk Minimization. We show that prior knowledge about the problem can be incorporated through the choice of distance measure, and we examine different metrics with respect to their generalization performance.
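
A minimal sketch of the underlying idea, assuming a generic linear large-margin learner in place of the exact Structural Risk Minimization machinery: represent every item by its vector of proximities to the training items and fit a linear threshold function f(x) = sum_i alpha_i p(x, x_i) + b in those values. The synthetic data and the cityblock metric below are illustrative; the choice of metric is precisely where prior knowledge about the problem enters.

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(2)
    X_train = np.vstack([rng.normal(-1, 1, (25, 4)), rng.normal(+1, 1, (25, 4))])
    y_train = np.array([-1] * 25 + [+1] * 25)
    X_test = np.vstack([rng.normal(-1, 1, (10, 4)), rng.normal(+1, 1, (10, 4))])
    y_test = np.array([-1] * 10 + [+1] * 10)

    # Proximity representation: row i holds the distances of item i
    # to all training items; no explicit feature vectors are needed
    # beyond this point.
    P_train = cdist(X_train, X_train, metric="cityblock")
    P_test = cdist(X_test, X_train, metric="cityblock")

    clf = LinearSVC().fit(P_train, y_train)  # linear model in proximity values
    print("test accuracy:", clf.score(P_test, y_test))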