Abstract
Machine Learning (ML) methods are now applied to many problems in Scientometrics. Given sufficiently large training datasets, ML can efficiently complete natural language processing tasks such as classifying research abstracts and outputs, which otherwise require extensive manpower. But what are the relative strengths and limitations of ML methods versus human research assistance when training data is limited? Our study compares the performance of 63 student research assistants to that of an ML model. The task is to classify research grant abstracts into one of nineteen scientific funding areas in the physical and life sciences, as defined by the European Research Council. We find that ML models, even when trained on relatively small datasets, outperform the average human research assistant. While some research assistants perform at levels just below those of the ML models, the research assistants display lower inter-rater reliability. Crucially, human classification performance and reliability appear fixed over moderate levels of training and task exposure, suggesting that selecting research assistants based on pre-existing ability could be superior to relying on task-specific training. These results suggest ML classification may be superior to human research assistance for natural language processing tasks even when training datasets are limited.
Original language | English |
---|---|
Title of host publication | 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings |
Editors | Giuseppe Catalano, Cinzia Daraio, Martina Gregori, Henk F. Moed, Giancarlo Ruocco |
Publisher | International Society for Scientometrics and Informetrics |
Pages | 2157-2162 |
Number of pages | 6 |
ISBN (Electronic) | 9788833811185 |
Publication status | Published - 2019 |
Externally published | Yes |
Event | 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Rome, Italy. Duration: Sept 2, 2019 → Sept 5, 2019 |
Publication series
Name | 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings |
---|---|
Volume | 2 |
Conference
Conference | 17th International Conference on Scientometrics and Informetrics, ISSI 2019 |
---|---|
Country/Territory | Italy |
City | Rome |
Period | Sept 2, 2019 → Sept 5, 2019 |
Bibliographical note
Publisher Copyright: © 2019 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. All rights reserved.
ASJC Scopus Subject Areas
- Statistics and Probability
- Computer Science Applications
- Management Science and Operations Research
- Applied Mathematics
- Modelling and Simulation