Evaluating human versus machine learning performance in classifying research abstracts

Yeow Chong Goh, Xin Qing Cai, Walter Theseira, Giovanni Ko, Khiam Aik Khor*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

40 Citations (Scopus)

Abstract

We study whether humans or machine learning (ML) classification models are better at classifying scientific research abstracts according to a fixed set of discipline groups. We recruit both undergraduate and postgraduate assistants for this task in separate stages, and compare their performance against a support vector machine (SVM) algorithm at classifying European Research Council Starting Grant project abstracts to their actual evaluation panels, which are organised by discipline groups. On average, ML is more accurate than human classifiers across a variety of training and test datasets, and across evaluation panels. ML classifiers trained on different training sets are also more reliable than human classifiers, meaning that different ML classifiers are more consistent in assigning the same classifications to any given abstract than different human classifiers are. While human classifiers in the top five percentiles can outperform ML in limited cases, selecting and training such classifiers is likely costly and difficult compared to training ML models. Our results suggest ML models are a cost-effective and highly accurate method for addressing problems in comparative bibliometric analysis, such as harmonising the discipline classifications of research from different funding agencies or countries.
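For illustration, the sketch below shows the kind of supervised text-classification setup the abstract describes: abstracts mapped to discipline panels by an SVM. It assumes a TF-IDF plus linear SVM pipeline in scikit-learn and uses hypothetical toy abstracts and panel labels in place of the ERC Starting Grant data; it is not the authors' actual implementation.

```python
# Minimal sketch (not the authors' pipeline): classify research abstracts
# into discipline panels with a support vector machine, using scikit-learn
# and toy data standing in for the ERC Starting Grant abstracts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Hypothetical abstracts and their evaluation-panel labels.
train_texts = [
    "We study superconducting qubits for scalable quantum computation.",
    "This project examines wage inequality using matched employer-employee data.",
    "We develop new gene-editing tools to probe neural circuit development.",
]
train_panels = ["Physical Sciences", "Social Sciences", "Life Sciences"]

test_texts = [
    "We analyse household consumption responses to unemployment insurance.",
]
test_panels = ["Social Sciences"]

# TF-IDF features feeding a linear SVM, a common setup for text classification.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_texts, train_panels)

predictions = model.predict(test_texts)
print("Predicted panels:", list(predictions))
print("Accuracy:", accuracy_score(test_panels, predictions))
```

In the study's setting, accuracy would be measured against the panel that actually evaluated each grant, and reliability by how consistently classifiers trained on different training sets assign the same panel to a given abstract.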

Original language: English
Pages (from-to): 1197-1212
Number of pages: 16
Journal: Scientometrics
Volume: 125
Issue number: 2
DOIs
Publication status: Published - Nov 2020
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2020, The Author(s).

ASJC Scopus Subject Areas

  • General Social Sciences
  • Computer Science Applications
  • Library and Information Sciences

Keywords

  • Discipline classification
  • Supervised classification
  • Text classification
