Evaluating human versus machine learning performance in classifying research abstracts

Khiam Aik Khor; Giovanni Ko; Walter Theseira; Xin Qing Cai; Yeow Chong Goh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Machine Learning (ML) methods are now applied to many problems in Scientometrics. Given sufficiently large training datasets, ML can efficiently complete natural language processing tasks such as classifying research abstracts and outputs, which otherwise require extensive manpower. But what are the relative strengths and limitations of ML methods versus human research assistance when training data is limited? Our study compares the performance of 63 student research assistants to that of an ML model. The task is classifying a research grant abstract into one of nineteen scientific funding areas in physical and life sciences defined by the European Research Council. We find that ML models, even trained on relatively small datasets, outperform the average human research assistant. While some research assistants perform at levels just below those of the ML models, the research assistants display lower inter-rater reliability. Crucially, human classification performance and reliability appear fixed over moderate levels of training and task exposure, suggesting that selecting research assistants based on pre-existing ability could be superior to relying on task-specific training. These results suggest ML classification may be superior to human research assistance for natural language processing tasks even when training datasets are limited.
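The abstract reports that the research assistants show lower inter-rater reliability, but this record does not say which reliability statistic the study used. As an illustration only, the sketch below assumes Fleiss' kappa, a common chance-corrected agreement measure for several raters sorting items into nominal categories (here standing in for the nineteen ERC funding areas). The function name `fleiss_kappa` and the toy count matrices are hypothetical and are not data from the paper.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Chance-corrected agreement among many raters (Fleiss' kappa).

    counts[i][j] is the number of raters who assigned item i (e.g. one
    grant abstract) to category j (e.g. one funding area). Every item
    must be rated by the same number of raters, at least two.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")

    # p[j]: share of all assignments that went to category j
    total = n_items * n_raters
    p = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # per_item[i]: fraction of rater pairs that agree on item i
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items
    p_expected = sum(pj * pj for pj in p)  # agreement expected by chance
    if p_expected == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: 3 abstracts, 3 raters, 3 categories.
perfect = [[3, 0, 0], [0, 3, 0], [0, 0, 3]]
chance_level = [[2, 1, 0], [0, 2, 1], [1, 0, 2]]
print(round(fleiss_kappa(perfect), 3))  # 1.0
print(fleiss_kappa(chance_level))
```

Perfect agreement gives 1.0; the second matrix, whose pairwise agreement equals what the category shares predict by chance, gives a value at (or within floating-point error of) 0. A lower kappa across a pool of human raters is one way "lower inter-rater reliability" could be quantified.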

Original language	English
Title of host publication	17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
Editors	Giuseppe Catalano, Cinzia Daraio, Martina Gregori, Henk F. Moed, Giancarlo Ruocco
Publisher	International Society for Scientometrics and Informetrics
Pages	2157-2162
Number of pages	6
ISBN (Electronic)	9788833811185
Publication status	Published - 2019
Externally published	Yes
Event	17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Rome, Italy; Duration: 2 Sep 2019 → 5 Sep 2019

Publication series

Name	17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
Volume	2

Conference

Conference	17th International Conference on Scientometrics and Informetrics, ISSI 2019
Country/Territory	Italy
City	Rome
Period	2 Sep 2019 → 5 Sep 2019

Bibliographical note

Publisher Copyright:
© 2019 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. All rights reserved.

ASJC Scopus Subject Areas

Statistics and Probability
Computer Science Applications
Management Science and Operations Research
Applied Mathematics
Modelling and Simulation

Cite this

Khor, K. A., Ko, G., Theseira, W., Cai, X. Q., & Goh, Y. C. (2019). Evaluating human versus machine learning performance in classifying research abstracts. In G. Catalano, C. Daraio, M. Gregori, H. F. Moed, & G. Ruocco (Eds.), 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings (pp. 2157-2162). (17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings; Vol. 2). International Society for Scientometrics and Informetrics.

Khor, Khiam Aik; Ko, Giovanni; Theseira, Walter et al. / Evaluating human versus machine learning performance in classifying research abstracts. 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. editor / Giuseppe Catalano; Cinzia Daraio; Martina Gregori; Henk F. Moed; Giancarlo Ruocco. International Society for Scientometrics and Informetrics, 2019. pp. 2157-2162 (17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings).

@inproceedings{5b2adc006dd44d8eb261defdcf73cb09,
  title = "Evaluating human versus machine learning performance in classifying research abstracts",
  abstract = "Machine Learning (ML) methods are now applied to many problems in Scientometrics. Given sufficiently large training datasets, ML can efficiently complete natural language processing tasks such as classifying research abstracts and outputs, which otherwise require extensive manpower. But what are the relative strengths and limitations of ML methods versus human research assistance when training data is limited? Our study compares the performance of 63 student research assistants to that of an ML model. The task is classifying a research grant abstract into one of nineteen scientific funding areas in physical and life sciences defined by the European Research Council. We find that ML models, even trained on relatively small datasets, outperform the average human research assistant. While some research assistants perform at levels just below those of the ML models, the research assistants display lower inter-rater reliability. Crucially, human classification performance and reliability appear fixed over moderate levels of training and task exposure, suggesting that selecting research assistants based on pre-existing ability could be superior to relying on task-specific training. These results suggest ML classification may be superior to human research assistance for natural language processing tasks even when training datasets are limited.",
  author = "Khor, {Khiam Aik} and Giovanni Ko and Walter Theseira and Cai, {Xin Qing} and Goh, {Yeow Chong}",
  note = "Publisher Copyright: {\textcopyright} 2019 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. All rights reserved.; 17th International Conference on Scientometrics and Informetrics, ISSI 2019 ; Conference date: 02-09-2019 Through 05-09-2019",
  year = "2019",
  language = "English",
  isbn = "9788833811185",
  series = "17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings",
  volume = "2",
  publisher = "International Society for Scientometrics and Informetrics",
  pages = "2157--2162",
  editor = "Giuseppe Catalano and Cinzia Daraio and Martina Gregori and Moed, {Henk F.} and Giancarlo Ruocco",
  booktitle = "17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings",
}

Khor, KA, Ko, G, Theseira, W, Cai, XQ & Goh, YC 2019, Evaluating human versus machine learning performance in classifying research abstracts. in G Catalano, C Daraio, M Gregori, HF Moed & G Ruocco (eds), 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings, vol. 2, International Society for Scientometrics and Informetrics, pp. 2157-2162, 17th International Conference on Scientometrics and Informetrics, ISSI 2019, Rome, Italy, 9/2/19.

Evaluating human versus machine learning performance in classifying research abstracts. / Khor, Khiam Aik; Ko, Giovanni; Theseira, Walter et al.
17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. ed. / Giuseppe Catalano; Cinzia Daraio; Martina Gregori; Henk F. Moed; Giancarlo Ruocco. International Society for Scientometrics and Informetrics, 2019. p. 2157-2162 (17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings; Vol. 2).


TY  - GEN
T1  - Evaluating human versus machine learning performance in classifying research abstracts
AU  - Khor, Khiam Aik
AU  - Ko, Giovanni
AU  - Theseira, Walter
AU  - Cai, Xin Qing
AU  - Goh, Yeow Chong
PY  - 2019
Y1  - 2019
N2  - Machine Learning (ML) methods are now applied to many problems in Scientometrics. Given sufficiently large training datasets, ML can efficiently complete natural language processing tasks such as classifying research abstracts and outputs, which otherwise require extensive manpower. But what are the relative strengths and limitations of ML methods versus human research assistance when training data is limited? Our study compares the performance of 63 student research assistants to that of an ML model. The task is classifying a research grant abstract into one of nineteen scientific funding areas in physical and life sciences defined by the European Research Council. We find that ML models, even trained on relatively small datasets, outperform the average human research assistant. While some research assistants perform at levels just below those of the ML models, the research assistants display lower inter-rater reliability. Crucially, human classification performance and reliability appear fixed over moderate levels of training and task exposure, suggesting that selecting research assistants based on pre-existing ability could be superior to relying on task-specific training. These results suggest ML classification may be superior to human research assistance for natural language processing tasks even when training datasets are limited.
AB  - Machine Learning (ML) methods are now applied to many problems in Scientometrics. Given sufficiently large training datasets, ML can efficiently complete natural language processing tasks such as classifying research abstracts and outputs, which otherwise require extensive manpower. But what are the relative strengths and limitations of ML methods versus human research assistance when training data is limited? Our study compares the performance of 63 student research assistants to that of an ML model. The task is classifying a research grant abstract into one of nineteen scientific funding areas in physical and life sciences defined by the European Research Council. We find that ML models, even trained on relatively small datasets, outperform the average human research assistant. While some research assistants perform at levels just below those of the ML models, the research assistants display lower inter-rater reliability. Crucially, human classification performance and reliability appear fixed over moderate levels of training and task exposure, suggesting that selecting research assistants based on pre-existing ability could be superior to relying on task-specific training. These results suggest ML classification may be superior to human research assistance for natural language processing tasks even when training datasets are limited.
UR  - http://www.scopus.com/inward/record.url?scp=85073879937&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85073879937&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85073879937
T3  - 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
SP  - 2157
EP  - 2162
BT  - 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
A2  - Catalano, Giuseppe
A2  - Daraio, Cinzia
A2  - Gregori, Martina
A2  - Moed, Henk F.
A2  - Ruocco, Giancarlo
PB  - International Society for Scientometrics and Informetrics
T2  - 17th International Conference on Scientometrics and Informetrics, ISSI 2019
Y2  - 2 September 2019 through 5 September 2019
ER  - 

Khor KA, Ko G, Theseira W, Cai XQ, Goh YC. Evaluating human versus machine learning performance in classifying research abstracts. In Catalano G, Daraio C, Gregori M, Moed HF, Ruocco G, editors, 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. International Society for Scientometrics and Informetrics. 2019. p. 2157-2162. (17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings).
