Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism

Si-Chi Chin, W. Nick Street
Proceedings of ICML Workshop on Unsupervised and Transfer Learning, PMLR 27:133-144, 2012.

Abstract

The paper applies knowledge transfer methods to the problem of Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.
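The segmented-transfer idea described above can be sketched as a simple pipeline: cluster the labeled source data, train one classifier per cluster, then route each unlabeled target instance to its most related source cluster and classify it with that cluster's model. The sketch below is an illustrative approximation under assumed synthetic data, not the paper's exact method; the cluster count, model choice, and fallback to a global model are all assumptions for the example.

```python
# Illustrative segmented-transfer sketch (assumptions: synthetic data,
# k-means segmentation, logistic-regression base learners).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a heterogeneous labeled source task and an
# unlabeled target task drawn over the same feature space.
X_src = rng.normal(size=(300, 5))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(size=(50, 5))

# 1. Segment the heterogeneous source task into clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_src)

# 2. Train one model per source cluster; fall back to a global model
#    if a cluster happens to contain only one class.
global_model = LogisticRegression().fit(X_src, y_src)
models = {}
for c in range(3):
    mask = kmeans.labels_ == c
    if len(np.unique(y_src[mask])) < 2:
        models[c] = global_model
    else:
        models[c] = LogisticRegression().fit(X_src[mask], y_src[mask])

# 3. Map each target instance to its most related source cluster and
#    classify it with that cluster's model.
assignments = kmeans.predict(X_tgt)
preds = np.array([models[c].predict(x.reshape(1, -1))[0]
                  for c, x in zip(assignments, X_tgt)])
print(preds.shape)  # (50,) — one prediction per target instance
```

Restricting each prediction to the single most related cluster's model is one way to realize the paper's stated goal of avoiding negative transfer from unrelated parts of the source task.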

Cite this Paper


BibTeX
@InProceedings{pmlr-v27-chin12a,
  title = {Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism},
  author = {Chin, Si-Chi and Street, W. Nick},
  booktitle = {Proceedings of ICML Workshop on Unsupervised and Transfer Learning},
  pages = {133--144},
  year = {2012},
  editor = {Guyon, Isabelle and Dror, Gideon and Lemaire, Vincent and Taylor, Graham and Silver, Daniel},
  volume = {27},
  series = {Proceedings of Machine Learning Research},
  address = {Bellevue, Washington, USA},
  month = {02 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v27/chin12a/chin12a.pdf},
  url = {https://proceedings.mlr.press/v27/chin12a.html},
  abstract = {The paper applies knowledge transfer methods to the problem of Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.}
}
Endnote
%0 Conference Paper
%T Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism
%A Si-Chi Chin
%A W. Nick Street
%B Proceedings of ICML Workshop on Unsupervised and Transfer Learning
%C Proceedings of Machine Learning Research
%D 2012
%E Isabelle Guyon
%E Gideon Dror
%E Vincent Lemaire
%E Graham Taylor
%E Daniel Silver
%F pmlr-v27-chin12a
%I PMLR
%P 133--144
%U https://proceedings.mlr.press/v27/chin12a.html
%V 27
%X The paper applies knowledge transfer methods to the problem of Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.
RIS
TY  - CPAPER
TI  - Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism
AU  - Si-Chi Chin
AU  - W. Nick Street
BT  - Proceedings of ICML Workshop on Unsupervised and Transfer Learning
DA  - 2012/06/27
ED  - Isabelle Guyon
ED  - Gideon Dror
ED  - Vincent Lemaire
ED  - Graham Taylor
ED  - Daniel Silver
ID  - pmlr-v27-chin12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 27
SP  - 133
EP  - 144
L1  - http://proceedings.mlr.press/v27/chin12a/chin12a.pdf
UR  - https://proceedings.mlr.press/v27/chin12a.html
AB  - The paper applies knowledge transfer methods to the problem of Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.
ER  -
APA
Chin, S.-C. & Street, W.N. (2012). Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism. Proceedings of ICML Workshop on Unsupervised and Transfer Learning, in Proceedings of Machine Learning Research 27:133-144. Available from https://proceedings.mlr.press/v27/chin12a.html.
