<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel> 
<atom:link href="http://jmlr.csail.mit.edu/jmlr.xml" rel="self" type="application/rss+xml" />
<link>http://www.jmlr.org</link>
<title>JMLR</title>
<description></description>

<item>
<title>
Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions; Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray, David Page; 10(Oct):2374--2411, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/hellerstein09a.html
</guid>
<description>
A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both
</description>
</item>

<item>
<title>
Estimating Labels from Label Proportions; Novi Quadrianto, Alex J. Smola, Tib&#x00E9;rio S. Caetano, Quoc V. Le; 10(Oct):2349--2374, 2009
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/quadrianto09a.html
</guid>
<description>
Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, possibly with known label proportions. This problem occurs in areas like e-commerce, politics, spam filtering and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice.
</description>
</item>

<item>
<title>
Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors; Mathias Drton, Michael Eichler, Thomas S. Richardson; 10(Oct):2329--2348, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/drton09a.html
</guid>
<description>
In recursive linear models, the multivariate normal joint distribution of all variables exhibits a dependence structure induced by a recursive (or acyclic) system of linear structural equations. These linear models have a long tradition and appear in seemingly unrelated regressions, structural equation modelling, and approaches to causal inference. They are also related to Gaussian graphical models via a classical representation known as a path diagram. Despite the models' long history, a number of problems remain open. In this paper, we address the problem of computing maximum likelihood estimates in the subclass of 'bow-free' recursive linear models. The term 'bow-free' refers to
</description>
</item>

<item>
<title>
The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs; Han Liu, John Lafferty, Larry Wasserman; 10(Oct):2295--2328, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/liu09a.html
</guid>
<description>
Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula---or "nonparanormal"---for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method's theoretical properties, and show that it works well in many examples.
</description>
</item>

<item>
<title>
Learning Nondeterministic Classifiers; Juan Jos&#x00E9; del Coz, Jorge D&#x00ED;ez, Antonio Bahamonde; 10(Oct):2273--2293, 2009
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/delcoz09a.html
</guid>
<description>
Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and the number of classes predicted should be as small as possible, these kinds of classifiers can be considered as Information Retrieval (IR) procedures. In this paper, we propose a family of IR loss functions to measure the performance of nondeterministic learners. After discussing such measures, we derive an algorithm for learning optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss. From a general point of view, nondeterministic classifiers provide
</description>
</item>

<item>
<title>
The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List; Cynthia Rudin; 10(Oct):2233--2271, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/rudin09b.html
</guid>
<description>
We are interested in supervised ranking algorithms that perform especially well near the top of the ranked list, and are only required to perform sufficiently well on the rest of the list. In this work, we provide a general form of convex objective that gives high-scoring examples more importance. This "push" near the top of the list can be chosen arbitrarily large or small, based on the preference of the user. We choose lp-norms to provide a specific type of push; if the user sets p larger, the objective concentrates harder on the top of the list. We derive a generalization bound based on the p-norm objective, working around
</description>
</item>

<item>
<title>
Margin-based Ranking and an Equivalence between AdaBoost and RankBoost; Cynthia Rudin, Robert E. Schapire; 10(Oct):2193--2232, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/rudin09a.html
</guid>
<description>
We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modification of RankBoost, analogous to "approximate coordinate ascent boosting." Finally, we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and classification in terms of their asymptotic behavior on the training set. Under natural conditions, AdaBoost achieves an area under the ROC curve that is equally as good as
</description>
</item>

<item>
<title>
Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization; Vojt&#x011B;ch Franc, S&#246;ren Sonnenburg; 10(Oct):2157--2192, 2009
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/franc09a.html
</guid>
<description>
We have developed an optimized cutting plane algorithm (OCA) for solving large-scale risk minimization problems. We prove that the number of iterations OCA requires to converge to an &#949;-precise solution is approximately linear in the sample size. We also derive OCAS, an OCA-based linear binary Support Vector Machine (SVM) solver, and OCAM, a linear multi-class SVM solver. In an extensive empirical evaluation we show that OCAS outperforms current state-of-the-art SVM solvers like SVM^light, SVM^perf and BMRM, achieving a speedup factor of more than 1,200 over SVM^light on some data sets and a speedup factor of 29 over SVM^perf, while obtaining the same precise support vector solution.
</description>
</item>

<item>
<title>
Discriminative Learning Under Covariate Shift; Steffen Bickel, Michael Br&#252;ckner, Tobias Scheffer; 10(Sep):2137--2155, 2009
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/bickel09a.html
</guid>
<description>
We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution---problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither the training nor the test distribution is modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under
</description>
</item>

<item>
<title>
RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments; Brian Tanner, Adam White; 10(Sep):2133--2136, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/tanner09a.html
</guid>
<description>
RL-Glue is a standard, language-independent software package for reinforcement-learning experiments. The standardization provided by RL-Glue facilitates code sharing and collaboration. Code sharing reduces the need to re-engineer tasks and experimental apparatus, both common barriers to comparatively evaluating new ideas in the context of the literature. Our software features a minimalist interface and works with several languages and computing platforms. RL-Glue compatibility can be extended to any programming language that supports network socket communication. RL-Glue has been used to teach classes and to run international competitions, and is currently used by several other open-source software and hardware projects.
</description>
</item>

<item>
<title>
Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods; Christian Rieger, Barbara Zwicknagl; 10(Sep):2115--2132, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/rieger09a.html
</guid>
<description>
We introduce a new technique for the analysis of kernel-based regression problems. The basic tools are sampling inequalities which apply to all machine learning problems involving penalty terms induced by kernels related to Sobolev spaces. They lead to explicit deterministic results concerning the worst case behaviour of &#949;- and &#957;-SVRs. Using these, we show how to adjust regularization parameters to get best possible approximation orders for regression. The results are illustrated by some numerical examples.
</description>
</item>

<item>
<title>
An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems; Luciana Ferrer, Kemal S&#246;nmez, Elizabeth Shriberg; 10(Sep):2079--2114, 2009
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/ferrer09a.html
</guid>
<description>
We present a method for training support vector machine (SVM)-based classification systems for combination with other classification systems designed for the same task. Ideally, a new system should be designed such that, when combined with existing systems, the resulting performance is optimized. We present a simple model for this problem and use the understanding gained from this analysis to propose a method to achieve better combination performance when training SVM systems. We include a regularization term in the SVM objective function that aims to reduce the average
</description>
</item>

<item>
<title>
Evolutionary Model Type Selection for Global Surrogate Modeling; Dirk Gorissen, Tom Dhaene, Filip De Turck; 10(Sep):2039--2078, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/gorissen09a.html
</guid>
<description>
Due to the scale and computational complexity of currently used simulation codes, global surrogate models (metamodels) have become indispensable tools for exploring and understanding the design space. Due to their compact formulation they are cheap to evaluate and thus readily facilitate visualization, design space exploration, rapid prototyping, and sensitivity analysis. They can also be used as accurate building blocks in design packages or larger simulation environments. Consequently, there is great interest in techniques that facilitate the construction of such approximation models while minimizing the computational cost and maximizing model accuracy. Many surrogate model types exist
</description>
</item>

<item>
<title>
Ultrahigh Dimensional Feature Selection: Beyond The Linear Model; Jianqing Fan, Richard Samworth, Yichao Wu; 10(Sep):2013--2038, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/fan09a.html
</guid>
<description>
Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently-used techniques are based on independence screening; examples include correlation ranking (Fan &#38; Lv, 2008) or feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan &#38; Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iteratively sure independent screening (ISIS), is needed when
</description>
</item>

<item>
<title>
Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection; Jie Chen, Haw-ren Fang, Yousef Saad; 10(Sep):1989--2012, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/chen09b.html
</guid>
<description>
Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes &#920;(dn^2) time for n data points in the d-dimensional Euclidean space. We propose two divide-and-conquer methods for computing an approximate kNN graph in &#920;(dn^t) time for high dimensional data (large d). The exponent t &#8712; (1,2) is an increasing function of an internal parameter &#945; which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained
</description>
</item>

<item>
<title>
Provably Efficient Learning with Typed Parametric Models; Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy; 10(Aug):1955--1988, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/brunskill09a.html
</guid>
<description>
To quickly achieve good performance, reinforcement-learning algorithms for acting in large continuous-valued domains must use a representation that is both sufficiently powerful to capture important domain characteristics, and yet simultaneously allows generalization, or sharing, among experiences. Our algorithm balances this tradeoff by using a stochastic, switching, parametric dynamics representation. We argue that this model characterizes a number of significant, real-world domains, such as robot navigation across varying terrain. We prove that this representational assumption allows our algorithm to be probably approximately correct with a sample complexity that scales polynomially with all problem-specific quantities including the state-space dimension. We also explicitly incorporate
</description>
</item>

<item>
<title>
Hybrid MPI/OpenMP Parallel Linear Support Vector Machine Training; Kristian Woodsend, Jacek Gondzio; 10(Aug):1937--1953, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/woodsend09a.html
</guid>
<description>
Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally challenging. A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. The new implementation has been applied to solve problems from
</description>
</item>

<item>
<title>
Learning Approximate Sequential Patterns for Classification; Zeeshan Syed, Piotr Indyk, John Guttag; 10(Aug):1913--1936, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/syed09a.html
</guid>
<description>
In this paper, we present an automated approach to discover patterns that can distinguish between sequences belonging to different labeled groups. Our method searches for approximately conserved motifs that occur with varying statistical properties in positive and negative training examples. We propose a two-step process to discover such patterns. Using locality sensitive hashing (LSH), we first estimate the frequency of all subsequences and their approximate matches within a given Hamming radius in labeled examples. The discriminative ability of each pattern is then assessed from the estimated frequencies by concordance and rank sum testing. The use of LSH to identify approximate matches for each candidate pattern helps reduce the runtime of our method. Space requirements are reduced by decomposing the search problem into an iterative method that uses a single LSH table in memory. We propose two further optimizations to
</description>
</item>

<item>
<title>
Learning Acyclic Probabilistic Circuits Using Test Paths; Dana Angluin, James Aspnes, Jiang Chen, David Eisenstat, Lev Reyzin; 10(Aug):1881--1911, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/angluin09a.html
</guid>
<description>
We define a model of learning probabilistic acyclic circuits using value injection queries, in which fixed values are assigned to an arbitrary subset of the wires and the value on the single output wire is observed. We adapt the approach of using test paths from the Circuit Builder algorithm (Angluin et al., 2009) to show that there is a polynomial time algorithm that uses value injection queries to learn acyclic Boolean probabilistic circuits of constant fan-in and log depth. We establish upper and lower bounds on the attenuation factor for general and transitively reduced Boolean probabilistic circuits of test paths versus general experiments. We give computational evidence that
</description>
</item>

<item>
<title>
CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning; Roberto Esposito, Daniele P. Radicioni; 10(Aug):1851--1880, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/esposito09a.html
</guid>
<description>
The growth of information available to learning systems and the increasing complexity of learning tasks determine the need for devising algorithms that scale well with respect to all learning parameters. In the context of supervised sequential learning, the Viterbi algorithm plays a fundamental role, by allowing the evaluation of the best (most probable) sequence of labels with a time complexity linear in the number of time events, and quadratic in the number of labels.  In this paper we propose CarpeDiem, a novel algorithm allowing the evaluation of the best possible sequence of labels with a sub-quadratic time complexity. We provide theoretical grounding together with solid empirical results supporting
</description>
</item>

<item>
<title>
Nonlinear Models Using Dirichlet Process Mixtures; Babak Shahbaba, Radford Neal; 10(Aug):1829--1850, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/shahbaba09a.html
</guid>
<description>
We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on
</description>
</item>

<item>
<title>
Distributed Algorithms for Topic Models; David Newman, Arthur Asuncion, Padhraic Smyth, Max Welling; 10(Aug):1801--1828, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/newman09a.html
</guid>
<description>
We describe distributed algorithms for two widely-used topic models, namely the Latent Dirichlet Allocation (LDA) model, and the Hierarchical Dirichlet Process (HDP) model. In our distributed algorithms the data is partitioned across separate processors and inference is done in a parallel, distributed fashion. We propose two distributed algorithms for LDA. The first algorithm is a straightforward mapping of LDA to a distributed processor setting. In this algorithm processors concurrently perform Gibbs sampling over local data followed by a global update of topic counts. The algorithm is simple to implement and can be viewed as an approximation to Gibbs-sampled LDA. The second version is a model that uses a hierarchical Bayesian extension of LDA to directly account for distributed data. This model has a theoretical guarantee of convergence but is
</description>
</item>

<item>
<title>
Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning; Halbert White, Karim Chalak; 10(Aug):1759--1799, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/white09a.html
</guid>
<description>
Judea Pearl's Causal Model is a rich framework that provides deep insight into the nature of causal relations. As yet, however, the Pearl Causal Model (PCM) has had a lesser impact on economics or econometrics than on other disciplines. This may be due in part to the fact that the PCM is not as well suited to analyzing structures that exhibit features of central interest to economists and econometricians: optimization, equilibrium, and learning. We offer the settable systems framework as an extension of the PCM that permits causal discourse in systems embodying optimization, equilibrium, and learning. Because these are common features of physical, natural, or social systems, our framework may prove generally useful for
</description>
</item>

<item>
<title>
Dlib-ml: A Machine Learning Toolkit; Davis E. King; 10(Jul):1755--1758, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/king09a.html
</guid>
<description>
There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy
</description>
</item>

<item>
<title>
SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent; Antoine Bordes, L&#233;on Bottou, Patrick Gallinari; 10(Jul):1737--1754, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/bordes09a.html
</guid>
<description>
The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of second-order information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires fewer iterations to achieve the same accuracy. This algorithm won the "Wild Track" of the first PASCAL Large Scale Learning Challenge (Sonnenburg et al., 2008).
</description>
</item>

<item>
<title>
Learning Permutations with Exponential Weights; David P. Helmbold, Manfred K. Warmuth; 10(Jul):1705--1736, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/helmbold09a.html
</guid>
<description>
We give an algorithm for the on-line learning of permutations. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic weight matrix, and makes predictions using an efficient method for decomposing the weight matrix into a convex combination of permutations. The weight matrix is updated by multiplying the current matrix entries by exponential factors, and an iterative procedure is needed to restore double stochasticity. Even though the result of this procedure does not have a closed form, a new analysis approach allows us to prov
</description>
</item>

<item>
<title>
Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification; Eitan Greenshtein, Junyong Park; 10(Jul):1687--1704, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/greenshtein09a.html
</guid>
<description>
We consider the problem of classification using a high dimensional feature space. In a paper by Bickel and Levina (2004), it is recommended to use naive-Bayes classifiers, that is, to treat the features as if they are statistically independent. Consider now a sparse setup, where only a few of the features are informative for classification. Fan and Fan (2008) suggested a variable selection and classification method, called FAIR. The FAIR method improves the design of naive-Bayes classifiers in sparse setups. The improvement is due to reducing the noise in estimating the features' means. This reduction arises because only the means of a few selected variables need to be estimated. We also consider the design of naive Bayes classifiers. We show that
</description>
</item>

<item>
<title>
Transfer Learning for Reinforcement Learning Domains: A Survey; Matthew E. Taylor, Peter Stone; 10(Jul):1633--1685, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/taylor09a.html
</guid>
<description>
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework 
</description>
</item>

<item>
<title>
Marginal Likelihood Integrals for Mixtures of Independence Models; Shaowei Lin, Bernd Sturmfels, Zhiqiang Xu; 10(Jul):1611--1631, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/lin09a.html
</guid>
<description>
Inference in Bayesian statistics involves the evaluation of marginal likelihood integrals. We present algebraic algorithms for computing such integrals exactly for discrete data of small sample size. Our methods apply to both uniform priors and Dirichlet priors. The underlying statistical models are mixtures of independent distributions, or, in geometric language, secant varieties of Segre-Veronese varieties.
</description>
</item>

<item>
<title>
Learning Linear Ranking Functions for Beam Search with Application to Planning; Yuehua Xu, Alan Fern, Sungwook Yoon; 10(Jul):1571--1610, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/xu09c.html
</guid>
<description>
Beam search is commonly used to help maintain tractability in large search spaces at the expense of completeness and optimality. Here we study supervised learning of linear ranking functions for controlling beam search. The goal is to learn ranking functions that allow for beam search to perform nearly as well as unconstrained search, and hence gain computational efficiency without seriously sacrificing optimality. In this paper, we develop theoretical aspects of this learning problem and investigate the application of this framework to learning in the context of automated planning. We first study the computationa
</description>
</item>

<item>
<title>
Bayesian Network Structure Learning by Recursive Autonomy Identification; Raanan Yehezkel, Boaz Lerner; 10(Jul):1527--1570, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/yehezkel09a.html
</guid>
<description>
We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous sub-structure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulting undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means and due to structure decomposition, learning a structure using RAI requires
</description>
</item>

<item>
<title>
Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks; Nikolai Slobodianik, Dmitry Zaporozhets, Neal Madras; 10(Jul):1511--1526, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/slobodianik09a.html
</guid>
<description>
In the machine learning community, the Bayesian scoring criterion is widely used for model selection problems. One of the fundamental theoretical properties justifying the usage of the Bayesian scoring criterion is its consistency. In this paper we refine this property for the case of binomial Bayesian network models. As a by-product of our derivations we establish strong consistency and obtain the law of iterated logarithm for the Bayesian scoring criterion.
</description>
</item>

<item>
<title>
Robustness and Regularization of Support Vector Machines; Huan Xu, Constantine Caramanis, Shie Mannor; 10(Jul):1485--1510, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/xu09b.html
</guid>
<description>
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new
</description>
</item>

<item>
<title>
Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks; Jean Hausser, Korbinian Strimmer; 10(Jul):1469--1484, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/hausser09a.html
</guid>
<description>
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by
</description>
</item>

<item>
<title>
Classification with Gaussians and Convex Loss; Dao-Hong Xiang, Ding-Xuan Zhou; 10(Jul):1447--1468, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/xiang09a.html
</guid>
<description>
This paper considers binary classification algorithms generated from Tikhonov regularization schemes associated with general convex loss functions and varying Gaussian kernels. Our main goal is to provide fast convergence rates for the excess misclassification error. Allowing varying Gaussian kernels in the algorithms improves learning rates measured by regularization error and sample error. Special structures of Gaussian kernels enable us to construct, by a nice approximation scheme with a Fourier analysis technique, uniformly bounded regularizing functions achieving polynomial decays of the regularization error under a Sobolev smoothness condition. The sample error is
</description>
</item>

<item>
<title>
A Least-squares Approach to Direct Importance Estimation; Takafumi Kanamori, Shohei Hido, Masashi Sugiyama; 10(Jul):1391--1445, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/kanamori09a.html
</guid>
<description>
We address the problem of estimating the ratio of two probability density functions, which is often referred to as the importance. The importance values can be used for various succeeding tasks such as covariate shift adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally highly efficient and simple to implement. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bounds. Numerical experiments show
</description>
</item>

<item>
<title>
Model Monitor (M2): Evaluating, Comparing, and Monitoring Models; Troy Raeder, Nitesh V. Chawla; 10(Jul):1387--1390, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/raeder09a.html
</guid>
<description>
This paper presents Model Monitor (M2), a Java toolkit for robustly evaluating machine learning algorithms in the presence of changing data distributions. M2 provides a simple and intuitive framework in which users can evaluate classifiers under hypothesized shifts in distribution and therefore determine the best model (or models) for their data under a number of potential scenarios. Additionally, M2 is fully integrated with the WEKA machine learning environment, so that a variety of commodity classifiers can be used if desired.
</description>
</item>

<item>
<title>
Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination; Eugene Tuv, Alexander Borisov, George Runger, Kari Torkkola; 10(Jul):1341--1366, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/tuv09a.html
</guid>
<description>
Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and actual examples illustrate the effectiveness of the approach.
</description>
</item>

<item>
<title>
A Parameter-Free Classification Method for Large Scale Learning; Marc Boull&#233;; 10(Jul):1367--1385, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/boulle09a.html
</guid>
<description>
With the rapid growth of computer storage capacities, available data and demand for scoring models both follow an increasing trend, sharper than that of the processing power. However, the main limitation to a wide spread of data mining solutions is the non-increasing availability of skilled data analysts, who play a key role in data preparation and model selection. In this paper, we present a parameter-free scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models
</description>
</item>

<item>
<title>
Robust Process Discovery with Artificial Negative Events; Stijn Goedertier, David Martens, Jan Vanthienen, Bart Baesens; 10(Jun):1305--1340, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/goedertier09a.html
</guid>
<description>
Process discovery is the automated construction of structured process models from information system event logs. Such event logs often contain positive examples only. Without negative examples, it is a challenge to strike the right balance between recall and specificity, and to deal with problems such as expressiveness, noise, incomplete event logs, or the inclusion of prior knowledge. In this paper, we present a configurable technique that deals with these challenges by representing process discovery as a multi-relational classification problem on event logs supplemented with Artificially Generated Negative Events (AGNEs). This problem formulation allows
</description>
</item>

<item>
<title>
Perturbation Corrections in Approximate Inference: Mixture Modelling Applications; Ulrich Paquet, Ole Winther, Manfred Opper; 10(Jun):1263--1304, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/paquet09a.html
</guid>
<description>
Bayesian inference is intractable for many interesting models, making deterministic algorithms for approximate inference highly desirable. Unlike stochastic methods, which are exact in the limit, the accuracy of these approaches cannot be reasonably judged. In this paper we show how low order perturbation corrections to an expectation-consistent (EC) approximation can provide the necessary tools to ameliorate inference accuracy, and to give an indication of the quality of approximation without having to resort to Monte Carlo methods. Further comparisons are given with
</description>
</item>

<item>
<title>
Incorporating Functional Knowledge in Neural Networks; Charles Dugas, Yoshua Bengio, Fran&#231;ois B&#233;lisle, Claude Nadeau, Ren&#233; Garcia; 10(Jun):1239--1262, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/dugas09a.html
</guid>
<description>
Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in its two arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks that (1) has those properties and (2) is a universal approximator of Lipschitz functions with these and other properties. We apply this new class of functions to the task of modelling the price of call options. Experiments show improvements on
</description>
</item>

<item>
<title>
The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models; Ricardo Silva, Zoubin Ghahramani; 10(Jun):1187--1238, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/silva09a.html
</guid>
<description>
Directed acyclic graphs (DAGs) have been widely used as a representation of conditional independence in machine learning and statistics. Moreover, hidden or latent variables are often an important component of graphical models. However, DAG models suffer from an important limitation: the family of DAGs is not closed under marginalization of hidden variables. This means that in general we cannot use a DAG to represent the independencies over a subset of variables in a larger DAG. Directed mixed graphs (DMGs) are a representation that includes DAGs as a special case, and overcomes this limitation. This paper introduces algorithms for performing Bayesian inference in Gaussian and probit DMG models. An important requirement for
</description>
</item>

<item>
<title>
Multi-task Reinforcement Learning in Partially Observable Stochastic Environments; Hui Li, Xuejun Liao, Lawrence Carin; 10(May):1131--1186, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/li09b.html
</guid>
<description>
We consider the problem of multi-task reinforcement learning (MTRL) in multiple partially observable stochastic environments. We introduce the regionalized policy representation (RPR) to characterize the agent's behavior in each environment. The RPR is a parametric model of the conditional distribution over current actions given the history of past actions and observations; the agent's choice of actions is directly based on this conditional distribution, without an intervening model to characterize the environment itself. We propose off-policy batch algorithms to learn the parameters of the RPRs, using episodic data collected when following a behavior policy, and show their linkage to policy iteration. We employ the Dirichlet process as a nonparametric prior over
</description>
</item>

<item>
<title>
Universal Kernel-Based Learning with Applications to Regular Languages; Leonid (Aryeh) Kontorovich, Boaz Nadler; 10(May):1095--1129, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/kontorovich09a.html
</guid>
<description>
We propose a novel framework for supervised learning of discrete concepts. Since the 1970s, the standard computational primitive has been to find the most consistent hypothesis in a given complexity class. In contrast, in this paper we propose a new basic operation: for each pair of input instances, count how many concepts of bounded complexity contain both of them. Our approach maps instances to a Hilbert space, whose metric is induced by a universal kernel coinciding with our computational primitive, and identifies concepts with half-spaces. We prove that all concepts are linearly separable under this mapping. Hence, given a labeled sample and
</description>
</item>

<item>
<title>
An Algorithm for Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity; Jose M. Pe&#241;a, Roland Nilsson, Johan Bj&#246;rkegren, Jesper Tegn&#233;r; 10(May):1071--1094, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/pena09a.html
</guid>
<description>
We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisfies weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that
</description>
</item>

<item>
<title>
Fourier Theoretic Probabilistic Inference over Permutations; Jonathan Huang, Carlos Guestrin, Leonidas Guibas; 10(May):997--1070, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/huang09a.html
</guid>
<description>
Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the "low-frequency" terms of a Fourier decomposition to represent distributions over permutations compactly. We present Kronecker conditioning, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for
</description>
</item>

<item>
<title>
On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality; Wenxin Jiang; 10(Apr):977--996, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/jiang09a.html
</guid>
<description>
The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of the empirical risks. Classical probability bounds using Hoeffding's inequality cannot accommodate more general situations with unbounded loss and dependent data. The current paper introduces an inequality that extends Hoeffding's inequality to handle these more general situations. We will apply this inequality to provide probability bounds for uniform deviations in a very general framework, which can involve discrete decision rules, unbounded loss, and a dependence structure that can be more general than either martingale or strong mixing. We will consider two examples with high dimensional predictors: autoregression (AR) with l1-loss, and ARX model with variable selection for sign classification, which uses both lagged responses and exogenous predictors.
</description>
</item>

<item>
<title>
Nonextensive Information Theoretic Kernels on Measures; Andr&#233; F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, M&#225;rio A. T. Figueiredo; 10(Apr):935--975, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/martins09a.html
</guid>
<description>
Positive definite kernels on probability measures have been recently applied to classification problems involving text, images, and other types of structured data. Some of these kernels are related to classic information theoretic quantities, such as (Shannon's) mutual information and the Jensen-Shannon (JS) divergence. Meanwhile, there have been recent advances in nonextensive generalizations of Shannon's information theory. This paper bridges these two trends by introducing nonextensive information theoretic kernels on probability measures, based on new JS-type divergences. These new divergences result from extending the two building blocks of the classical JS divergence: convexity and Shannon's entropy. The notion of convexity is extended to the wider concept of q-convexity, for which we prove a Jensen q-inequality. Based on this inequality, we introduce
</description>
</item>

<item>
<title>
Java-ML: A Machine Learning Library; Thomas Abeel, Yves Van de Peer, Yvan Saeys; 10(Apr):931--934, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/abeel09a.html
</guid>
<description>
Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license.
</description>
</item>

<item>
<title>
Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods; Holger H&#246;fling, Robert Tibshirani; 10(Apr):883--906, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/hoefling09a.html
</guid>
<description>
We consider the problems of estimating the parameters as well as the structure of binary-valued Markov networks. For maximizing the penalized log-likelihood, we implement an approximate procedure based on the pseudo-likelihood of Besag (1975) and generalize it to a fast exact algorithm. The exact algorithm starts with the pseudo-likelihood solution and then adjusts the pseudo-likelihood criterion so that each additional iteration moves it closer to the exact solution. Our results show that this procedure is faster than the competing exact method proposed by Lee, Ganapathi, and Koller (2006a). However, we also find that
</description>
</item>

<item>
<title>
Stable and Efficient Gaussian Process Calculations; Leslie Foster, Alex Waagen, Nabeela Aijaz, Michael Hurley, Apolonio Luis, Joel Rinsky, Chandrika Satyavolu, Michael J. Way, Paul Gazis, Ashok Srivastava; 10(Apr):857--882, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/foster09a.html
</guid>
<description>
The use of Gaussian processes can be an effective approach to prediction in a supervised learning environment. For large data sets, the standard Gaussian process approach requires solving very large systems of linear equations and approximations are required for the calculations to be practical. We will focus on the subset of regressors approximation technique. We will demonstrate that there can be numerical instabilities in a well known implementation of the technique. We discuss alternate implementations that have better numerical stability properties and can lead to better predictions. Our results will be illustrated by looking at an application involving prediction of galaxy redshift from broadband spectrum data.
</description>
</item>

<item>
<title>
Consistency and Localizability; Alon Zakai, Ya'acov Ritov; 10(Apr):827--856, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/zakai09a.html
</guid>
<description>
We show that all consistent learning methods---that is, methods that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)---are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when we show them only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that
</description>
</item>

<item>
<title>
A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization; Jacob Abernethy, Francis Bach, Theodoros Evgeniou, Jean-Philippe Vert; 10(Mar):803--826, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/abernethy09a.html
</guid>
<description>
We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of "users" to a set of possibly desired "objects". In particular, several recent low-rank type matrix-completion methods for CF are shown to be special cases of our proposed framework. Unlike existing regularization-based CF, our approach can be used to incorporate additional information such as attributes of the users/objects---a feature currently lacking in existing regularization-based CF approaches---using popular and well-known kernel methods. We provide novel representer theorems that we use to develop new estimation methods. We then provide learning
</description>
</item>

<item>
<title>
Sparse Online Learning via Truncated Gradient; John Langford, Lihong Li, Tong Zhang; 10(Mar):777--801, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/langford09a.html
</guid>
<description>
We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss functions. This method has several essential properties: (1) The degree of sparsity is continuous---a parameter controls the rate of sparsification from no sparsification to total sparsification. (2) The approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. (3) The approach works well empirically. We apply the approach to several data sets and find that, for data sets with large numbers of features, substantial sparsity is discoverable.
</description>
</item>

<item>
<title>
Similarity-based Classification: Concepts and Algorithms; Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, Luca Cazzanti; 10(Mar):747--776, 2009.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/chen09a.html
</guid>
<description>
This paper reviews and extends the field of similarity-based classification, presenting new analyses, algorithms, data sets, and a comprehensive set of experimental results for a rich collection of classification problems. Specifically, the generalizability of using similarities as features is analyzed, design goals and methods for weighting nearest-neighbors for similarity-based learning are proposed, and different methods for consistently converting similarities into kernels are compared. Experiments on eight real data sets compare eight approaches and their variants to similarity-based learning.
</description>
</item>

<item>
<title>
Nieme: Large-Scale Energy-Based Models; Francis Maes; 10(Mar):743--746, 2009. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v10/maes09a.html
</guid>
<description>
In this paper we introduce NIEME, a machine learning library for large-scale classification, regression and ranking. NIEME relies on the framework of energy-based models (LeCun et al., 2006) which unifies several learning algorithms ranging from simple perceptrons to recent models such as the Pegasos support vector machine or l1-regularized maximum entropy models. This framework also unifies batch and stochastic learning which are both seen as energy minimization problems. NIEME can hence be used in a wide range of
</description>
</item>

</channel>
</rss>
