COLT & UAI 2008, Invited Speakers

Peter Grünwald: The Catch-Up Phenomenon in Bayesian Inference

Standard Bayesian model selection/averaging sometimes learn too slowly: there exist other learning methods that lead to better predictions based on less data. We give a novel analysis of this "catch-up" phenomenon. Based on this analysis, we propose the switching method, a modification of Bayesian model averaging that never learns slower, but sometimes learns much faster than Bayes. The method is related to expert-tracking algorithms developed in the COLT literature, and has time complexity comparable to Bayes.

The switching method resolves a long-standing debate in statistics, known as the AIC-BIC dilemma: model selection/averaging methods like BIC, Bayes, and MDL are consistent (they eventually infer the correct model) but, when used for prediction, the rate at which predictions improve can be suboptimal. Methods like AIC and leave-one-out cross-validation are inconsistent but typically converge at the optimal rate. Our method is the first that provably achieves both. Experiments with nonparametric density estimation confirm that these large-sample theoretical results also hold in practice in small samples.

Joint work with T. van Erven and S. de Rooij.

Photo Peter Grünwald (Ph.D. 1998) heads the information-theoretic learning group at the CWI, the national research institute for mathematics and computer science in the Netherlands, located in Amsterdam. He is also affiliated with EURANDOM. His research interests lie where statistics, computer science and information theory meet: theories of learning from data. He is author of the book The Minimum Description Length Principle (MIT Press, June 2007). Recently, he has been involved in exposing the flawed statistics that were used in the trial against Lucia de B., nurse and alleged serial killer.

Robin Hanson: Combinatorial Prediction Markets

Several hundred organizations are now using prediction markets to forecast sales, project completion dates, and more. This number has been doubling annually for several years. Most, however, are simple prediction markets, with one market per number forecast, and several traders per market. In contrast, a single combinatorial prediction market lets a few traders manage an entire combinatorial space of forecasts. For millions of numbers or less, implementation is easy, and lab experiments have confirmed feasibility and accuracy. For larger spaces, however, many open computational problems remain.

Photo Robin Hanson is an Associate Professor of Economics at George Mason University; he received his Ph.D. in 1998 from California Institute of Technology. He is one of the founders of information markets (also known as 'idea futures' or 'prediction markets'), speculative markets created for the purpose of making predictions. These markets have been the subject of considerable academic research and attention in the popular media.

Dan Klein: Unsupervised Learning for Natural Language Processing

Given the abundance of text data, unsupervised approaches are very appealing for natural language processing. We present three latent variable systems which achieve state-of-the-art results in domains previously dominated by fully supervised systems. For syntactic parsing, we describe a grammar induction technique which begins with coarse syntactic structures and iteratively refines them in an unsupervised fashion. The resulting coarse-to-fine grammars admit efficient coarse-to-fine inference schemes and have produced the best parsing results in a variety of languages. For coreference resolution, we describe a discourse model in which entities are shared across documents using a hierarchical Dirichlet process. In each document, entities are repeatedly rendered into mention strings by a sequential model of attentional state and anaphoric constraint. Despite being fully unsupervised, this approach is competitive with the best supervised approaches. Finally, for machine translation, we present a model which learns translation lexicons from non-parallel corpora. Alignments between word types are modeled by a prior over matchings. Given any fixed alignment, a joint density over word vectors derives from probabilistic canonical correlation analysis. This approach is capable of discovering high-precision translations, even when the underlying corpora and languages are divergent.

Photo Dan Klein received his PhD from Stanford in 2004. He is currently Assistant Professor of Computer Science at Berkeley. He achieved international notoriety with his PhD thesis on the automatic induction of Grammars for English from unlabeled English text. More recently he has achieved considerable success in the automatic learning of lexical subcategorization as well as a variety of other results in the general area of statistical computational linguistics.

Gabor Lugosi: Concentration Inequalities

In this talk by concentration inequalities we mean inequalities that bound the deviations of a function of independent random variables from its mean. Due to their generality and elegance, many of such results have served as standard tools in a variety of areas, including statistical learning theory, probabilistic combinatorics, and the geometry of Banach spaces. To illustrate some of the basic ideas, we start by showing simple ways of bounding the variance of a general function of several independent random variables. We show how to use these inequalities on a few key quantities in statistical learning theory. In the past two decades several techniques have been introduced to improve such variance inequalities to exponential tail inequalities. We focus on a particularly elegant and effective method, the so-called "entropy method", based on logarithmic Sobolev inequalities and their modifications. Similar ideas appear in a variety of areas of mathematics, including discrete and Gaussian isoperimetric problems, and estimation of mixing times of Markov chains. We intend to shed some light to some of these connections. In particular, we mention some closely related results on influences of variables of Boolean functions, phase transitions, and threshold phenomena.

Photo Gabor Lugosi is a professor at the economics department of Pompeu Fabra University, Barcelona, Spain. He is a leading expert in statistical learning theory, and has published numerous papers as well as several highly influential books on the subject.