Knowledge Corroboration
Current knowledge bases suffer from either low coverage or low accuracy, yet user feedback can greatly improve the quality of automatically extracted knowledge bases. Such feedback can help quantify the uncertainty associated with the stored statements and enables mechanisms for searching, ranking and reasoning at the entity-relationship level. Most importantly, a principled model for exploiting user feedback to learn the truth values of statements in the knowledge base would be a major step forward in addressing the issue of knowledge base curation. We present a family of probabilistic graphical models that builds on user feedback and logical inference rules derived from the popular Semantic-Web formalism of RDFS. Through internal inference and belief propagation, these models can learn both the truth values of the statements in the knowledge base and the reliabilities of the users who give feedback. We demonstrate the viability of our approach in extensive experiments on real-world datasets, with feedback collected from Amazon Mechanical Turk.
1.
Kasneci, Gjergji; Van Gael, Jurgen; Herbrich, Ralf; Graepel, Thore
Bayesian Knowledge Corroboration with Logical Rules and User Feedback Proceedings Article
In: Proceedings of European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 1–18, 2010.
@inproceedings{kasneci2010knowledgecorroboration,
title = {Bayesian Knowledge Corroboration with Logical Rules and User Feedback},
author = {Gjergji Kasneci and Jurgen Van Gael and Ralf Herbrich and Thore Graepel},
url = {https://www.herbrich.me/papers/ecml10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of European Conference on Machine Learning and Knowledge Discovery in Databases},
pages = {1--18},
abstract = {Current knowledge bases suffer from either low coverage or low accuracy. The underlying hypothesis of this work is that user feedback can greatly improve the quality of automatically extracted knowledge bases. The feedback could help quantify the uncertainty associated with the stored statements and would enable mechanisms for searching, ranking and reasoning at entity-relationship level. Most importantly, a principled model for exploiting user feedback to learn the truth values of statements in the knowledge base would be a major step forward in addressing the issue of knowledge base curation. We present a family of probabilistic graphical models that builds on user feedback and logical inference rules derived from the popular Semantic-Web formalism of RDFS. Through internal inference and belief propagation, these models can learn both the truth values of the statements in the knowledge base and the reliabilities of the users who give feedback. We demonstrate the viability of our approach in extensive experiments on real-world datasets, with feedback collected from Amazon Mechanical Turk.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
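The corroboration idea of learning statement truth values and user reliabilities jointly can be illustrated with a small sketch. This is not the paper's graphical model; it is a simplified iterative re-estimation scheme (all function and variable names are illustrative) in which truth beliefs are updated from reliability-weighted votes and reliabilities from expected agreement with those beliefs.

```python
import math

# Hypothetical sketch: alternate between updating statement truth
# probabilities from reliability-weighted votes and updating each
# user's reliability as their expected agreement rate.

def corroborate(feedback, n_iters=50):
    """feedback: dict mapping (user, statement) -> bool vote."""
    users = {u for u, _ in feedback}
    statements = {s for _, s in feedback}
    truth = {s: 0.5 for s in statements}    # belief that statement is true
    reliability = {u: 0.8 for u in users}   # belief that user votes correctly

    for _ in range(n_iters):
        # Update truth beliefs: each vote contributes log-odds weighted
        # by the voter's current reliability.
        for s in statements:
            log_odds = 0.0
            for (u, s2), vote in feedback.items():
                if s2 != s:
                    continue
                r = min(max(reliability[u], 1e-6), 1 - 1e-6)
                log_odds += math.log(r / (1 - r)) * (1 if vote else -1)
            truth[s] = 1 / (1 + math.exp(-log_odds))
        # Update reliabilities: expected fraction of a user's votes that
        # agree with the current truth beliefs.
        for u in users:
            votes = [(s, v) for (u2, s), v in feedback.items() if u2 == u]
            agree = sum(truth[s] if v else 1 - truth[s] for s, v in votes)
            reliability[u] = agree / len(votes)
    return truth, reliability
```

Under this scheme a user who consistently disagrees with the emerging consensus is assigned low reliability, so their votes are automatically down-weighted in later iterations.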
2.
Paquet, Ulrich; Van Gael, Jurgen; Stern, David; Kasneci, Gjergji; Herbrich, Ralf; Graepel, Thore
Vuvuzelas & Active Learning for Online Classification Proceedings Article
In: Proceedings of Computational Social Science and the Wisdom of Crowds Workshop, 2010.
@inproceedings{paquet2010,
title = {Vuvuzelas \& Active Learning for Online Classification},
author = {Ulrich Paquet and Jurgen Van Gael and David Stern and Gjergji Kasneci and Ralf Herbrich and Thore Graepel},
url = {https://www.herbrich.me/papers/vuvuzela.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of Computational Social Science and the Wisdom of Crowds Workshop},
abstract = {Many online service systems leverage user-generated content from Web 2.0 style platforms such as Wikipedia, Twitter, Facebook, and many more. Often, the value lies in the freshness of this information (e.g. tweets, event-based articles, blog posts, etc.). This freshness poses a challenge for supervised learning models as they frequently have to deal with previously unseen features. In this paper we address the problem of online classification for tweets, namely, how can a classifier be updated in an online manner, so that it can correctly classify the latest hype on Twitter? We propose a two-step strategy to solve this problem. The first step follows an active learning strategy that enables the selection of tweets for which a label would be most useful; the selected tweet is then forwarded to Amazon Mechanical Turk where it is labeled by multiple users. The second step builds on a Bayesian corroboration model that aggregates the noisy labels provided by the users by taking their reliabilities into account.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
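The two-step strategy described above can be sketched compactly. The snippet below is an illustration, not the paper's model: step one is plain uncertainty sampling (pick the tweet whose predicted probability is closest to 0.5), and step two is a reliability-weighted vote standing in for the Bayesian corroboration model; all names are assumptions.

```python
# Hypothetical sketch of the two-step strategy for online tweet
# classification with crowdsourced labels.

def select_for_labeling(tweets, predict_proba):
    """Uncertainty sampling: return the tweet whose predicted
    probability of the positive class is closest to 0.5."""
    return min(tweets, key=lambda t: abs(predict_proba(t) - 0.5))

def aggregate_labels(labels, reliability):
    """Reliability-weighted vote over noisy crowd labels.
    labels: dict worker -> bool; reliability: dict worker -> P(correct).
    A worker at reliability 0.5 contributes nothing; above 0.5 their
    vote counts positively, below 0.5 it counts against itself."""
    score = sum((1 if vote else -1) * (2 * reliability[w] - 1)
                for w, vote in labels.items())
    return score > 0
```

In a full system, the aggregated label would then be fed back to update the online classifier before the next selection round.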
Features from Knowledge Bases
The prediction accuracy of learning algorithms depends heavily on the quality of the selected features, but the task of feature construction and selection is often tedious and non-scalable. Over the years, however, numerous projects have constructed general-purpose or domain-specific knowledge bases of entity-relationship-entity triples extracted from various Web sources or collected from user communities, e.g., YAGO, DBpedia, Freebase, UMLS. We introduce an expressive graph-based language for extracting features from such knowledge bases and a theoretical framework for constructing feature vectors from the extracted features. The experimental evaluation on different learning scenarios provides evidence that the features derived through our framework can considerably improve the prediction accuracy, especially when the labeled data at hand is sparse.
1.
Cheng, Weiwei; Kasneci, Gjergji; Graepel, Thore; Stern, David H; Herbrich, Ralf
Automated Feature Generation From Structured Knowledge Proceedings Article
In: Proceedings of the 20th ACM Conference on Information and Knowledge Management, pp. 1395–1404, 2011.
@inproceedings{cheng2011featuresfromknowledge,
title = {Automated Feature Generation From Structured Knowledge},
author = {Weiwei Cheng and Gjergji Kasneci and Thore Graepel and David H Stern and Ralf Herbrich},
url = {https://www.herbrich.me/papers/cikmfp0337-cheng.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 20th ACM Conference on Information and Knowledge Management},
pages = {1395--1404},
abstract = {The prediction accuracy of any learning algorithm highly depends on the quality of the selected features; but often, the task of feature construction and selection is tedious and non-scalable. In recent years, however, there have been numerous projects with the goal of constructing general-purpose or domain-specific knowledge bases with entity-relationship-entity triples extracted from various Web sources or collected from user communities, e.g., YAGO, DBpedia, Freebase, UMLS, etc. This paper advocates the simple and yet far-reaching idea that the structured knowledge contained in such knowledge bases can be exploited to automatically extract features for general learning tasks. We introduce an expressive graph-based language for extracting features from such knowledge bases and a theoretical framework for constructing feature vectors from the extracted features. Our experimental evaluation on different learning scenarios provides evidence that the features derived through our framework can considerably improve the prediction accuracy, especially when the labeled data at hand is sparse.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
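A minimal sketch can convey the flavor of deriving features from entity-relationship-entity triples. The paper's actual feature language is far more expressive; the toy below (all names illustrative) simply enumerates relation paths of bounded length starting from an entity and turns them into binary feature vectors.

```python
# Hypothetical sketch: derive binary features for an entity by walking
# relation paths of bounded length in a triple store, then vectorize.

def path_features(entity, triples, max_depth=2):
    """triples: set of (subject, relation, object). Returns feature
    strings such as 'bornIn/locatedIn=Germany' for each relation path
    of length <= max_depth starting at entity."""
    feats = set()
    frontier = [(entity, "")]
    for _ in range(max_depth):
        next_frontier = []
        for node, path in frontier:
            for s, r, o in triples:
                if s == node:
                    p = f"{path}/{r}" if path else r
                    feats.add(f"{p}={o}")
                    next_frontier.append((o, p))
        frontier = next_frontier
    return feats

def vectorize(entities, triples, max_depth=2):
    """Map each entity to a binary vector over the union of features."""
    per_entity = {e: path_features(e, triples, max_depth) for e in entities}
    vocab = sorted(set().union(*per_entity.values()))
    vectors = {e: [int(f in per_entity[e]) for f in vocab] for e in entities}
    return vocab, vectors
```

The resulting vectors can be fed directly to any standard learner, which is the point of the paper's framework: the knowledge base does the feature engineering.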
Efficient Graph Matching
The θ-subsumption problem, finding a match for a sub-graph with variables in its node or edge labels, is crucial to the efficiency of learning systems on structured knowledge bases. We discuss two θ-subsumption algorithms based on strategies for preselecting suitable matching literals for the variables. We further map the general problem of θ-subsumption to the problem of finding a clique of fixed size in a graph, and in turn show that a specialization of the pruning strategy of the Carraghan and Pardalos clique algorithm dramatically reduces the subsumption search space.
1.
Scheffer, Tobias; Herbrich, Ralf; Wysotzki, Fritz
Efficient Θ-Subsumption Based on Graph Algorithms Proceedings Article
In: Lecture Notes in Artificial Intelligence: 6th International Workshop on Inductive Logic Programming, pp. 212–228, 1996.
@inproceedings{scheffer1996subsumption,
title = {Efficient $\Theta$-Subsumption Based on Graph Algorithms},
author = {Tobias Scheffer and Ralf Herbrich and Fritz Wysotzki},
url = {https://www.herbrich.me/papers/scheffer96.pdf},
year = {1996},
date = {1996-01-01},
booktitle = {Lecture Notes in Artificial Intelligence: 6th International Workshop on Inductive Logic Programming},
volume = {1314},
pages = {212--228},
abstract = {The $\theta$-subsumption problem is crucial to the efficiency of ILP learning systems. We discuss two $\theta$-subsumption algorithms based on strategies for preselecting suitable matching literals. The class of clauses for which subsumption becomes polynomial is a superset of the deterministic clauses. We further map the general problem of $\theta$-subsumption to a certain problem of finding a clique of fixed size in a graph, and in turn show that a specialization of the pruning strategy of the Carraghan and Pardalos clique algorithm provides a dramatic reduction of the subsumption search space. We also present empirical results for the mesh design data set.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
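To make the problem concrete, here is a toy θ-subsumption checker (all names are illustrative, and this is a plain backtracking formulation rather than the paper's clique construction). In the clique view, each vertex pairs a clause literal with a compatible ground literal and edges connect substitution-consistent pairs; the consistency check on shared variables below corresponds to those clique edges, and abandoning a branch as soon as a clause literal has no consistent match is the pruning idea.

```python
# Hypothetical sketch: does `clause` theta-subsume `example`, i.e. is
# there a variable substitution mapping every clause literal onto some
# ground literal of the example?

def subsumes(clause, example):
    """clause/example: lists of (predicate, args) tuples. Terms starting
    with an uppercase letter are variables; others are constants."""
    def is_var(term):
        return term[0].isupper()

    def unify(lit, ground, subst):
        # Try to match one clause literal against one ground literal,
        # extending the current substitution; return None on conflict.
        pred, args = lit
        gpred, gargs = ground
        if pred != gpred or len(args) != len(gargs):
            return None
        s = dict(subst)
        for a, g in zip(args, gargs):
            if is_var(a):
                if s.get(a, g) != g:   # variable already bound elsewhere
                    return None
                s[a] = g
            elif a != g:               # constant mismatch
                return None
        return s

    def extend(i, subst):
        if i == len(clause):
            return True
        # Prune: if no ground literal is consistent with clause[i] under
        # the current substitution, this branch cannot be completed.
        for ground in example:
            s = unify(clause[i], ground, subst)
            if s is not None and extend(i + 1, s):
                return True
        return False

    return extend(0, {})
```

The worst case remains exponential, which is exactly why the preselection and clique-pruning strategies of the paper matter in practice.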