<?xml version="1.0"?>
<rss version="2.0">
<channel> 
<link>http://www.jmlr.org</link>
<title>JMLR</title>
<description></description>

<item>
<title>
Bayesian Inference and Optimal Design for the Sparse Linear Model; Matthias W. Seeger; 9(Apr):759--813, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/seeger08a.html
</guid>
<description>
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how
</description>
</item>

<item>
<title>
Multi-class Discriminant Kernel Learning via Convex Programming; Jieping Ye, Shuiwang Ji, Jianhui Chen; 9(Apr):719--758, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/ye08b.html
</guid>
<description>
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least square problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming 
</description>
</item>

<item>
<title>
Learning Control Knowledge for Forward Search Planning; Sungwook Yoon, Alan Fern, Robert Givan; 9(Apr):683--718, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/yoon08a.html
</guid>
<description>
A number of today's state-of-the-art planners are based on forward state-space search. The impressive performance can be attributed to progress in computing domain independent heuristics that perform well across many domains. However, it is easy to find domains where such heuristics provide poor guidance, leading to planning failure. Motivated by such failures, the focus of this paper is to investigate mechanisms for learning domain-specific knowledge to better control forward search in a given domain. While there has been a large body of work on inductive learning of control knowledge for AI planning, there is a void of work aimed at forward-state-space search. One reason for this may be that
</description>
</item>

<item>
<title>
Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns; Shann-Ching Chen, Geoffrey J. Gordon, Robert F. Murphy; 9(Apr):651--682, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/chen08a.html
</guid>
<description>
In structured classification problems, there is a direct conflict between expressive models and efficient inference: while graphical models such as Markov random fields or factor graphs can represent arbitrary dependences among instance labels, the cost of inference via belief propagation in these models grows rapidly as the graph structure becomes more complicated. One important source of complexity in belief propagation is the need to marginalize large factors to compute messages. This operation takes time exponential in the number of variables in the factor, and can limit the expressiveness of the models we can use. In this paper, we study a new class of potential functions, which we call decomposable
</description>
</item>

<item>
<title>
Trust Region Newton Method for Logistic Regression; Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi; 9(Apr):627--650, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/lin08b.html
</guid>
<description>
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also extend the proposed method to large-scale L2-loss linear support vector machines (SVM).
</description>
</item>

<item>
<title>
A Library for Locally Weighted Projection Regression; Stefan Klanke, Sethu Vijayakumar, Stefan Schaal; 9(Apr):623--626, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/klanke08a.html
</guid>
<description>
In this paper we introduce an improved implementation of locally weighted projection regression (LWPR), a supervised learning algorithm that is capable of handling high-dimensional input data. As the key features, our code supports multi-threading, is available for multiple platforms, and provides wrappers for several programming languages.
</description>
</item>

<item>
<title>
Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2; Giorgio Corani, Marco Zaffalon; 9(Apr):581--621, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/corani08a.html
</guid>
<description>
In this paper, the naive credal classifier, which is a set-valued counterpart of naive Bayes, is extended to a general and flexible treatment of incomplete data, yielding a new classifier called naive credal classifier 2 (NCC2). The new classifier delivers classifications that are reliable even in the presence of small sample sizes and missing values. Extensive empirical evaluations show that, by issuing set-valued classifications, NCC2 is able to isolate and properly deal with instances that are hard to classify (on which naive Bayes accuracy drops considerably), and to perform as well as naive Bayes on the other instances. The experiments point 
</description>
</item>

<item>
<title>
Closed Sets for Labeled Data; Gemma C. Garriga, Petra Kralj, Nada Lavra&#269;; 9(Apr):559--580, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/garriga08a.html
</guid>
<description>
Closed sets have been proven successful in the context of compacted data representation for association rule learning. However, their use is mainly descriptive, dealing only with unlabeled data. This paper shows that when considering labeled data, closed sets can be adapted for classification and discrimination purposes by conveniently contrasting covering properties on positive and negative examples. We formally prove that these sets characterize the space of relevant combinations of features for discriminating the target class. In practice, identifying 
</description>
</item>

<item>
<title>
An Information Criterion for Variable Selection in Support Vector Machines (Special Topic on Model Selection); Gerda Claeskens, Christophe Croux, Johan Van Kerckhoven; 9(Mar):541--558, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/claeskens08a.html
</guid>
<description>
Support vector machines for classification have the advantage that the curse of dimensionality is circumvented. It has been shown that a reduction of the dimension of the input space leads to even better results. For this purpose, we propose two information criteria which can be computed directly from the definition of the support vector machine. We assess the predictive performance of the models selected by our new criteria and compare them to existing variable selection techniques in a simulation study. The simulation results show that the new criteria are competitive in terms of generalization error rate while being much easier to compute. We arrive at the same findings for comparison on some real-world benchmark data sets.
</description>
</item>

<item>
<title>
Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers; Bo Jiang, Xuegong Zhang, Tianxi Cai; 9(Mar):521--540, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/jiang08a.html
</guid>
<description>
Support vector machine (SVM) is one of the most popular and promising classification algorithms. After a classification rule is constructed via the SVM, it is essential to evaluate its prediction accuracy. In this paper, we develop procedures for obtaining both point and interval estimators for the prediction error. Under mild regularity conditions, we derive the consistency and asymptotic normality of the prediction error estimators for SVM with finite-dimensional kernels. A perturbation-resampling procedure is proposed to obtain interval estimates for the prediction error in practice. With numerical studies on simulated data and a benchmark
</description>
</item>

<item>
<title>
Comments on the Complete Characterization of a Family of Solutions to a Generalized Fisher Criterion; Jieping Ye; 9(Mar):517--519, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/ye08a.html
</guid>
<description>
Loog (2007) provided a complete characterization of the family of solutions to a generalized Fisher criterion. We show that this characterization is essentially equivalent to the original characterization proposed in Ye (2005). The computational advantage of the original characterization over the new one is discussed, which justifies its practical use.
</description>
</item>

<item>
<title>
Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data; Onureena Banerjee, Laurent El Ghaoui, Alexandre d'Aspremont; 9(Mar):485--516, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/banerjee08a.html
</guid>
<description>
We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added l1-norm penalty term. The problem as formulated is convex but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent, and can be interpreted as recursive l1-norm penalized regression. Our second algorithm, based on Nesterov's first order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log determinant relaxation 
</description>
</item>

<item>
<title>
A Recursive Method for Structural Learning of Directed Acyclic Graphs; Xianchao Xie, Zhi Geng; 9(Mar):459--483, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/geng08a.html
</guid>
<description>
In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until no subset can be decomposed further. In our approach, search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method.
</description>
</item>

<item>
<title>
Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective; Liviu Panait, Karl Tuyls, Sean Luke; 9(Mar):423--457, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/panait08a.html
</guid>
<description>
This paper presents the dynamics of multiple learning agents from an evolutionary game theoretic perspective. We provide replicator dynamics models for cooperative coevolutionary algorithms and for traditional multiagent Q-learning, and we extend these differential equations to account for lenient learners: agents that forgive possible mismatched teammate actions that resulted in low rewards. We use these extended formal models to study the convergence guarantees for these algorithms, and also to visualize the basins of attraction to optimal and suboptimal solutions in two benchmark coordination problems. The paper demonstrates that lenience provides learners 
</description>
</item>

<item>
<title>
A Tutorial on Conformal Prediction; Glenn Shafer, Vladimir Vovk; 9(Mar):371--421, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/shafer08a.html
</guid>
<description>
Conformal prediction uses past experience to determine precise levels of confidence in new predictions.  Given an error probability &#949;, together with a method that makes a prediction &#375; of a label y, it produces a set of labels, typically containing &#375;, that also contains y with probability 1-&#949;. Conformal prediction can be applied to any method for producing &#375;: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an on-line setting in which labels are predicted successively, each one being revealed before the next is predicted.  The most novel and valuable feature
</description>
</item>

<item>
<title>
Generalization from Observed to Unobserved Features by Clustering; Eyal Krupka, Naftali Tishby; 9(Mar):339--370, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/krupka08a.html
</guid>
<description>
We argue that when objects are characterized by many attributes, clustering them on the basis of a random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove finite sample generalization theorems for this novel learning scheme that extends analogous results from the supervised learning setting. We use our framework
</description>
</item>

<item>
<title>
Algorithms for Sparse Linear Classifiers in the Massive Data Setting; Suhrid Balakrishnan, David Madigan; 9(Feb):313--337, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/balakrishnan08a.html
</guid>
<description>
Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive data sets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.
</description>
</item>

<item>
<title>
Support Vector Machinery for Infinite Ensemble Learning; Hsuan-Tien Lin, Ling Li; 9(Feb):285--312, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/lin08a.html
</guid>
<description>
Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that 
</description>
</item>

<item>
<title>
Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies; Andreas Krause, Ajit Singh, Carlos Guestrin; 9(Feb):235--284, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/krause08a.html
</guid>
<description>
When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within (1-1/e) of the optimum by exploiting the submodularity of mutual information. We also show how 
</description>
</item>

<item>
<title>
Optimization Techniques for Semi-Supervised Support Vector Machines; Olivier Chapelle, Vikas Sindhwani, Sathiya S. Keerthi; 9(Feb):203--233, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/chapelle08a.html
</guid>
<description>
Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VMs algorithms is studied together, under a common experimental setting.
</description>
</item>

<item>
<title>
Evidence Contrary to the Statistical View of Boosting; David Mease, Abraham Wyner; 9(Feb):131--156, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/mease08a.html
</guid>
<description>
The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many unanswered important questions. Furthermore, we provide examples that reveal crucial flaws in the many practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences.
</description>
</item>

<item>
<title>
Active Learning by Spherical Subdivision; Falk-Florian Henrich, Klaus Obermayer; 9(Jan):105--130, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/henrich08a.html
</guid>
<description>
We introduce a computationally feasible, "constructive" active learning method for binary classification. The learning algorithm is initially formulated for separable classification problems, for a hyperspherical data space with constant data density, and for great spheres as classifiers. In order to reduce computational complexity, the version space is restricted to spherical simplices and learning proceeds by subdividing the edges of maximal length. We show that this procedure optimally reduces a tight upper bound on the generalization error. The method is then extended to other separable classification problems using products of spheres as data spaces and isometries induced by charts of the sphere. An upper bound is provided
</description>
</item>

<item>
<title>
Discriminative Learning of Max-Sum Classifiers; Vojt&#283;ch Franc, Bogdan Savchynskyy; 9(Jan):67--104, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/franc08a.html
</guid>
<description>
The max-sum classifier predicts n-tuple of labels from n-tuple of observable variables by maximizing a sum of quality functions defined over neighbouring pairs of labels and observable variables. Predicting labels as MAP assignments of a Random Markov Field is a particular example of the max-sum classifier. Learning parameters of the max-sum classifier is a challenging problem because even computing the response of such classifier is NP-complete in general. Estimating parameters using the Maximum Likelihood approach is feasible only for a subclass of max-sum classifiers with an acyclic structure of neighbouring pairs. Recently, the discriminative methods represented by the perceptron and the Support Vector Machines, originally designed for binary linear classifiers, have been extended for learning some subclasses of the max-sum classifier. Besides the max-sum classifiers with the acyclic neighbouring structure, it has been shown that the discriminative learning is possible even with arbitrary neighbouring structure provided the quality functions fulfill some additional constraints. In this article, we extend
</description>
</item>

<item>
<title>
On the Suitable Domain for SVM Training in Image Coding; Gustavo Camps-Valls, Juan Guti&#233;rrez, Gabriel G&#243;mez-P&#233;rez, Jes&#250;s Malo; 9(Jan):49--66, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/camps-valls08a.html
</guid>
<description>
Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation.  Geometrically, this implies allowing arbitrary signal distortions in an n-dimensional rectangle defined by the &#949;-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into
account
</description>
</item>

<item>
<title>
Linear-Time Computation of Similarity Measures for Sequential Data; Konrad Rieck, Pavel Laskov; 9(Jan):23--48, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/rieck08a.html
</guid>
<description>
Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures.
</description>
</item>


<item>
<title>
Max-margin Classification of Data with Absent Features; Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller; 9(Jan):1--21, 2008. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/chechik08a.html
</guid>
<description>
We consider the problem of learning classifiers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, here we focus on the case where a feature may not even exist (structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to first complete their unknown values, and then use a standard classification procedure over the completed data. This paper focuses on features that are known to be non-existing, rather than have an unknown value. We show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. In this formulation, 
</description>
</item>

<item>
<title>
Volume 8 completed; Volume 9 begun.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v9/
</guid>
<description>
</description>
</item>

<item>
<title>
Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts; J. Zico Kolter, Marcus A. Maloof; 8(Dec):2755--2790, 2007. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v8/kolter07a.html
</guid>
<description>
We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation---consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like---we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts.
</description>
</item>

<item>
<title>
A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning; Carine Hue, Marc Boull&#233;; 8(Dec):2727--2754, 2007.
</title>
<guid isPermaLink="true">
http://www.jmlr.org/papers/v8/hue07a.html
</guid>
<description>
In this paper, we consider the supervised learning task which consists in predicting the normalized rank of a numerical variable. We introduce a novel probabilistic approach to estimate the posterior distribution of the target rank conditionally to the predictors. We turn this learning task into a model selection problem. For that, we define a 2D partitioning family obtained by discretizing numerical variables and grouping categorical ones and we derive an analytical criterion to select the partition with the highest posterior probability. We show how these partitions can be used to build univariate predictors and multivariate ones under a naive Bayes assumption.
</description>
</item>

<item>
<title>
Stagewise Lasso; Peng Zhao, Bin Yu; 8(Dec):2701--2726, 2007.
</title>
<guid isPermaLink="true">
http://www.jmlr.org/papers/v8/zhao07a.html
</guid>
<description>
Many statistical machine learning algorithms minimize either an empirical loss function as in AdaBoost, or a penalized empirical loss as in Lasso or SVM. A single regularization tuning parameter controls the trade-off between fidelity to the data and generalizability, or equivalently between bias and variance. When this tuning parameter changes, a regularization "path" of solutions to the minimization problem is generated, and the whole path is needed to select a tuning parameter to optimize the prediction or interpretation performance. Algorithms such as homotopy-Lasso or LARS-Lasso and Forward Stagewise Fitting (FSF) (aka e-Boosting) are of great interest because of their resulted sparse models for interpretation in addition to prediction.  In this paper, we propose the BLasso algorithm that ties the FSF (e-Boosting) algorithm with the Lasso method that minimizes
</description>
</item>

<item>
<title>
Ranking the Best Instances; St&#233;phan Cl&#233;men&#231;on, Nicolas Vayatis; 8(Dec):2671--2699, 2007.
</title>
<guid isPermaLink="true">
http://www.jmlr.org/papers/v8/clemencon07a.html
</guid>
<description>
We formulate a local form of the bipartite ranking problem where the goal is to focus on the best instances. We propose a methodology based on the construction of real-valued scoring functions. We study empirical risk minimization of dedicated statistics which involve empirical quantiles of the scores. We first state the problem of finding the best instances which can be cast as a classification problem with mass constraint. Next, we develop special performance measures for the local ranking problem which extend the Area Under an ROC Curve (AUC) criterion and describe the optimal elements of these new criteria. We also highlight 
</description>
</item>

<item>
<title>
Hierarchical Average Reward Reinforcement Learning; Mohammad Ghavamzadeh, Sridhar Mahadevan; 8(Nov):2629--2669, 2007. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v8/ghavamzadeh07a.html
</guid>
<description>
Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL including HAMs, options, MAXQ, and PHAMs has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized to be more appropriate for a wide class of continuing tasks than the discounted framework. Although average reward RL has been studied for decades, prior work has been largely limited to flat policy representations.  In this paper, we develop a framework for HRL based on the average reward optimality criterion. We investigate two formulations of HRL
</description>
</item>

<item>
<title>
Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners; Marlon N&#250;&#241;ez, Ra&#250;l Fidalgo, Rafael Morales; 8(Nov):2595--2628, 2007.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v8/nunez07a.html
</guid>
<description>
In the process of concept learning, target concepts may have portions with short-term changes, other portions may support long-term changes, and yet others may not change at all. For this reason several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows which can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts
</description>
</item>

<item>
<title>
VC Theory of Large Margin Multi-Category Classifiers; Yann Guermeur; 8(Nov):2551--2594, 2007. 
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v8/guermeur07a.html
</guid>
<description>
In the context of discriminant analysis, Vapnik's statistical learning theory has mainly been developed in three directions: the computation of dichotomies with binary-valued functions, the computation of dichotomies with real-valued functions, and the computation of polytomies with functions taking their values in finite sets, typically the set of categories itself... In this paper, a VC theory of large margin multi-category classifiers is introduced. Central in this theory are generalized VC dimensions called the ...
</description>
</item>

<item>
<title>
Revised Loss Bounds for the Set Covering Machine and Sample-Compression Loss Bounds for Imbalanced Data; Zakria Hussain, Fran&#231;ois Laviolette, Mario Marchand, John Shawe-Taylor, Spencer Charles Brubaker, Matthew D. Mullin; 8(Nov):2533--2549, 2007.
</title>
<guid isPermaLink="true">
http://jmlr.csail.mit.edu/papers/v8/hussain07a.html
</guid>
<description>
Marchand and Shawe-Taylor (2002) have proposed a loss bound for the set covering machine that has the property to depend on the observed fraction of positive examples and on what the classifier achieves on the positive training examples. We show that this loss bound is incorrect. We then propose a loss bound, valid for any sample-compression learning algorithm (including the set covering machine), that depends on the observed fraction of positive examples and on what the classifier achieves on them. We also compare numerically the loss bound proposed in this paper with the incorrect bound
</description>
</item>

</channel>
</rss>
