An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes

Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, Chun-Sung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin, Shou-de Lin
Proceedings of KDD-Cup 2009 Competition, PMLR 7:57-64, 2009.

Abstract

This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using a linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third in the slow track of KDD Cup 2009.
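
As a rough illustration of three steps the abstract mentions (the joint multi-class encoding, score averaging, and AUC), the following is a minimal sketch rather than the authors' implementation. It assumes the three binary labels are coded 0/1 and are mutually exclusive, so the joint problem has four classes; the function names `to_joint_label`, `average_scores`, and `auc` are hypothetical, and the unweighted mean stands in for the validated averaging the paper describes.

```python
from typing import List, Sequence


def to_joint_label(labels: Sequence[int]) -> int:
    """Map three exclusive 0/1 labels to one class in {0, 1, 2, 3}.

    Class 0 means no task is positive; class k means task k (1-indexed)
    is the single positive one. The task order is an assumption.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if len(positives) > 1:
        raise ValueError("labels are assumed to be mutually exclusive")
    return positives[0] + 1 if positives else 0


def average_scores(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of per-example scores across models (an assumption)."""
    n_models = len(per_model_scores)
    return [sum(column) / n_models for column in zip(*per_model_scores)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This is the O(n^2) pairwise definition, fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In this sketch each task's final score would be the average over the three classifiers' scores for that task, and the reported figure would correspond to the mean of the three per-task AUC values.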

Cite this Paper


BibTeX
@InProceedings{pmlr-v7-lo09,
  title = {An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes},
  author = {Lo, Hung-Yi and Chang, Kai-Wei and Chen, Shang-Tse and Chiang, Tsung-Hsien and Ferng, Chun-Sung and Hsieh, Cho-Jui and Ko, Yi-Kuang and Kuo, Tsung-Ting and Lai, Hung-Che and Lin, Ken-Yi and Wang, Chia-Hsuan and Yu, Hsiang-Fu and Lin, Chih-Jen and Lin, Hsuan-Tien and Lin, Shou-de},
  booktitle = {Proceedings of KDD-Cup 2009 Competition},
  pages = {57--64},
  year = {2009},
  editor = {Dror, Gideon and Boullé, Marc and Guyon, Isabelle and Lemaire, Vincent and Vogel, David},
  volume = {7},
  series = {Proceedings of Machine Learning Research},
  address = {New York, New York, USA},
  month = {28 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v7/lo09/lo09.pdf},
  url = {https://proceedings.mlr.press/v7/lo09.html},
  abstract = {This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.}
}
Endnote
%0 Conference Paper
%T An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes
%A Hung-Yi Lo
%A Kai-Wei Chang
%A Shang-Tse Chen
%A Tsung-Hsien Chiang
%A Chun-Sung Ferng
%A Cho-Jui Hsieh
%A Yi-Kuang Ko
%A Tsung-Ting Kuo
%A Hung-Che Lai
%A Ken-Yi Lin
%A Chia-Hsuan Wang
%A Hsiang-Fu Yu
%A Chih-Jen Lin
%A Hsuan-Tien Lin
%A Shou-de Lin
%B Proceedings of KDD-Cup 2009 Competition
%C Proceedings of Machine Learning Research
%D 2009
%E Gideon Dror
%E Marc Boullé
%E Isabelle Guyon
%E Vincent Lemaire
%E David Vogel
%F pmlr-v7-lo09
%I PMLR
%P 57--64
%U https://proceedings.mlr.press/v7/lo09.html
%V 7
%X This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.
RIS
TY - CPAPER
TI - An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes
AU - Hung-Yi Lo
AU - Kai-Wei Chang
AU - Shang-Tse Chen
AU - Tsung-Hsien Chiang
AU - Chun-Sung Ferng
AU - Cho-Jui Hsieh
AU - Yi-Kuang Ko
AU - Tsung-Ting Kuo
AU - Hung-Che Lai
AU - Ken-Yi Lin
AU - Chia-Hsuan Wang
AU - Hsiang-Fu Yu
AU - Chih-Jen Lin
AU - Hsuan-Tien Lin
AU - Shou-de Lin
BT - Proceedings of KDD-Cup 2009 Competition
DA - 2009/12/04
ED - Gideon Dror
ED - Marc Boullé
ED - Isabelle Guyon
ED - Vincent Lemaire
ED - David Vogel
ID - pmlr-v7-lo09
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 7
SP - 57
EP - 64
L1 - http://proceedings.mlr.press/v7/lo09/lo09.pdf
UR - https://proceedings.mlr.press/v7/lo09.html
AB - This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.
ER -
APA
Lo, H., Chang, K., Chen, S., Chiang, T., Ferng, C.S., Hsieh, C., Ko, Y., Kuo, T., Lai, H., Lin, K., Wang, C., Yu, H., Lin, C., Lin, H. & Lin, S. (2009). An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes. Proceedings of KDD-Cup 2009 Competition, in Proceedings of Machine Learning Research 7:57-64. Available from https://proceedings.mlr.press/v7/lo09.html.
