spark-crowd: A Spark Package for Learning from Crowdsourced Big Data

Enrique G. Rodrigo, Juan A. Aledo, José A. Gámez; 20(19):1−5, 2019.

Abstract

As data sets grow in size, manual labeling by small groups of experts becomes infeasible. It is therefore common to rely on crowdsourcing platforms, which provide inexpensive but noisy labels. Although implementations of algorithms for learning from such noisy labels exist, none of them focuses on scalability, which limits their application to relatively small data sets. In this paper, we present spark-crowd, an Apache Spark package for learning from crowdsourced data with scalability in mind.

© JMLR 2019.