Evaluating human versus machine learning performance in classifying research abstracts

Khiam Aik Khor, Giovanni Ko, Walter Theseira, Xin Qing Cai, Yeow Chong Goh

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Machine Learning (ML) methods are now applied to many problems in Scientometrics. Given sufficiently large training datasets, ML can efficiently complete natural language processing tasks such as classifying research abstracts and outputs, which otherwise require extensive manpower. But what are the relative strengths and limitations of ML methods versus human research assistance when training data is limited? Our study compares the performance of 63 student research assistants to that of an ML model. The task is classifying a research grant abstract into one of nineteen scientific funding areas in physical and life sciences defined by the European Research Council. We find that ML models, even when trained on relatively small datasets, outperform the average human research assistant. While some research assistants perform at levels just below that of the ML models, the research assistants display lower inter-rater reliability. Crucially, human classification performance and reliability appear fixed over moderate levels of training and task exposure, suggesting that selecting research assistants based on pre-existing ability could be superior to relying on task-specific training. These results suggest ML classification may be superior to human research assistance for natural language processing tasks even when training datasets are limited.
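This record does not state which metrics underlie the accuracy and inter-rater reliability comparisons. The sketch below is a hedged illustration only, assuming (1) accuracy is the share of abstracts whose assigned panel matches a reference label, and (2) reliability is measured with Fleiss' kappa across raters who each label the same abstracts. The function names and the panel codes used as example labels (ERC-style "PE1", "LS2", and so on) are hypothetical and not taken from the paper.

```python
from collections import Counter


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of abstracts whose predicted panel matches the reference panel."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa, where ratings[i] holds the labels raters gave abstract i.

    Assumption: every abstract is labelled by the same number (>= 2) of raters.
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]  # n_ij: raters putting item i in category j

    # Observed agreement per item: fraction of rater pairs that agree.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from pooled category proportions across all ratings.
    totals: Counter = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        return 1.0  # convention for the degenerate case: all ratings in one category
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical reference labels and model predictions for four abstracts.
    gold = ["PE1", "LS2", "PE3", "LS2"]
    model = ["PE1", "LS2", "PE3", "LS1"]
    print(accuracy(model, gold))  # 0.75

    # Hypothetical labels from three research assistants for the same abstracts.
    raters = [
        ["PE1", "PE1", "PE1"],
        ["LS2", "LS2", "LS1"],
        ["PE3", "PE2", "PE3"],
        ["LS2", "LS2", "LS2"],
    ]
    print(round(fleiss_kappa(raters), 3))  # 0.538
```

Kappa corrects raw agreement for the agreement expected by chance given how often each panel is used, which is why it can differ from simple percent agreement when raters favour a few of the nineteen panels.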

Original language: English
Title of host publication: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
Editors: Giuseppe Catalano, Cinzia Daraio, Martina Gregori, Henk F. Moed, Giancarlo Ruocco
Publisher: International Society for Scientometrics and Informetrics
Pages: 2157-2162
Number of pages: 6
ISBN (Electronic): 9788833811185
Publication status: Published - 2019
Externally published: Yes
Event: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Rome, Italy
Duration: Sept 2, 2019 to Sept 5, 2019

Publication series

Name: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
Volume: 2

Conference

Conference: 17th International Conference on Scientometrics and Informetrics, ISSI 2019
Country/Territory: Italy
City: Rome
Period: 9/2/19 to 9/5/19

Bibliographical note

Publisher Copyright:
© 2019 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. All rights reserved.

ASJC Scopus Subject Areas

  • Statistics and Probability
  • Computer Science Applications
  • Management Science and Operations Research
  • Applied Mathematics
  • Modelling and Simulation
