Treelets -- A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data

Ann B. Lee, Boaz Nadler; JMLR W&P 2:259-266, 2007.

Abstract

In many modern data mining applications, such as analysis of gene expression or word-document data sets, the data is high-dimensional with hundreds or even thousands of variables, unstructured with no specific order of the original variables, and noisy. Despite the high dimensionality, the data is typically redundant with underlying structures that can be represented by only a few features. In such settings and specifically when the number of variables is much larger than the sample size, standard global methods may not perform well for common learning tasks such as classification, regression and clustering. In this paper, we present treelets -- a new tool for multi-resolution analysis that extends wavelets on smooth signals to general unstructured data sets. By construction, treelets provide an orthogonal basis that reflects the internal structure of the data. In addition, treelets can be useful for feature selection and dimensionality reduction prior to learning. We give a theoretical analysis of our algorithm for a linear mixture model, and present a variety of situations where treelets outperform classical principal component analysis, as well as variable selection schemes such as supervised (sparse) PCA.
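The abstract does not spell out the construction, so the following is only a minimal sketch of how such an orthogonal, data-driven basis could be built: repeatedly pick the two most correlated active variables, decorrelate them with a local two-variable PCA (a Jacobi rotation), keep the higher-variance "sum" coordinate active, and retire the "difference" coordinate. The names `treelet_basis`, `n_levels`, and `merges` are hypothetical, and the choice of absolute correlation as the similarity measure is an assumption, not necessarily the authors' exact procedure.

```python
import numpy as np

def treelet_basis(X, n_levels=None):
    """Sketch of a treelet-style basis. X has shape (n_samples, n_variables).

    Returns (basis, merges): an orthogonal p x p matrix whose columns are the
    basis vectors, and the list of (sum_index, difference_index) pairs merged.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    if n_levels is None:
        n_levels = p - 1
    cov = np.cov(X, rowvar=False)   # covariance in the current coordinates
    basis = np.eye(p)               # start from the standard basis
    active = list(range(p))         # indices still eligible for merging
    merges = []

    for _ in range(min(n_levels, p - 1)):
        # Absolute correlation among active variables (assumed similarity).
        sub = cov[np.ix_(active, active)]
        sd = np.sqrt(np.diag(sub))
        corr = np.abs(sub) / np.maximum(np.outer(sd, sd), 1e-12)
        np.fill_diagonal(corr, -1.0)  # never pair a variable with itself
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        a, b = active[i], active[j]

        # Rotation angle that zeroes cov[a, b]; with arctan2 the rotated
        # coordinate `a` ends up with the larger variance (the "sum" variable).
        theta = 0.5 * np.arctan2(2.0 * cov[a, b], cov[a, a] - cov[b, b])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(p)
        J[a, a], J[a, b] = c, -s
        J[b, a], J[b, b] = s, c

        cov = J.T @ cov @ J         # covariance in the rotated coordinates
        basis = basis @ J           # accumulate the orthogonal transform
        active.remove(b)            # the "difference" variable is retired
        merges.append((a, b))

    return basis, merges
```

Because every step is a plane rotation, the returned matrix stays orthogonal regardless of the data, which matches the "orthogonal basis" property the abstract emphasizes. Stopping early via `n_levels` would give a coarser, partially merged basis.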



Page last modified on Sat Oct 27 18:32:47 BST 2007.

webmaster@jmlr.org. Copyright © JMLR 2000. All rights reserved.