Graduate student: 李唯
Lee, Wei
Thesis title: 無作答指定限制的序列性群眾外包預算分配策略
Strategies of Sequential Budget Allocation without Worker Designation in Crowdsourcing
Advisor: 莊坤達
Chuang, Kun-Ta
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 35
Chinese keywords: 群眾外包, 任務指派, 預算分配
English keywords: Crowdsourcing, Task Assignment, Budget Allocation
  • With the emergence of crowdsourcing systems, collecting human-labeled data has become easier, faster, and more efficient. Because crowdsourcing platforms are anonymous, the quality of crowd workers is difficult to guarantee. Variance among workers and noisy annotators may lead to incorrect labels. A conventional approach is to collect repeated labels for each task from several different workers. Truth inference techniques then play a vital role in handling the noise in the collected labels. However, a higher-quality training dataset usually requires more labels and therefore a larger budget. Moreover, current crowdsourcing platforms do not support worker designation, which makes it more challenging to save money by distinguishing good workers from noisy ones. In this thesis, we propose a framework that leverages worker qualification groups and assigns tasks based on the fitness between the qualification groups and the tasks. We anticipate that our framework can serve as a practical strategy for solving the budget allocation problem on crowdsourcing platforms.
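The repeated-label approach mentioned in the abstract can be illustrated with a minimal sketch. It uses plain majority voting as the truth inference step, which is a common baseline and not the thesis's Greedy-CG method. The function name `majority_vote`, the task identifiers, the label values, and the one-unit-per-label cost are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels_by_task):
    """Infer one label per task as the most frequent label among its workers.

    Hypothetical helper for illustration; ties resolve to the label seen first.
    """
    inferred = {}
    for task_id, labels in labels_by_task.items():
        counts = Counter(labels)
        # most_common(1) returns a one-element list of (label, count)
        inferred[task_id] = counts.most_common(1)[0][0]
    return inferred


# Made-up repeated labels: each list holds answers from different workers.
collected = {
    "task_a": [1, 1, 0],
    "task_b": [0, 0, 0],
    "task_c": [1, 0, 1, 1, 0],
}

result = majority_vote(collected)

# Assumed cost model: every collected label costs one budget unit.
budget_used = sum(len(labels) for labels in collected.values())

print(result)
print(budget_used)
```

Under this assumed cost model, every extra label on a task spends budget, which is why deciding where to request the next label matters.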

    Chinese Abstract
    Abstract
    Acknowledgment
    Contents
    List of Tables
    List of Figures
    1 Introduction
    2 Related Work
      2.1 Task Assignment
        2.1.1 Worker-Based Task Assignment
        2.1.2 Task-Based Task Assignment
      2.2 Truth Inference
      2.3 Budget Allocation
    3 Problem Formulation
      3.1 Framework Overview
      3.2 Problem Definition
    4 Methodology
      4.1 Uniform Assignment
      4.2 Greedy Algorithm based on Confidence Gain (Greedy-CG)
      4.3 Extensions
        4.3.1 Working with Prior Worker Group Knowledge
        4.3.2 Greedy Algorithm based on Confidence Gain with Fitness Decay (Greedy-CGFD)
    5 Experimental Results
      5.1 Experiment Setup and Data Description
      5.2 Compared Method
      5.3 Comparison of Difference Scenario
        5.3.1 Traditional Inference Methods with All Collected Labels
        5.3.2 Experiments on Literature Comparison
    6 Conclusions
    Bibliography


    Full text availability: on campus, immediately public; off campus, immediately public