Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy

Sung Yang Ho; Limsoon Wong; Wilson Wen Bin Goh

doi:10.1016/j.patter.2020.100025

Corresponding authors: Limsoon Wong; Wilson Wen Bin Goh

Research output: Contribution to journal (review article, peer-reviewed)

22 Citations (Scopus)

Abstract

Class-prediction accuracy provides a quick but superficial way of determining classifier performance. It reveals nothing about the reproducibility of the findings or about whether the selected or constructed features are meaningful and specific. Furthermore, class-prediction accuracy oversummarizes and does not show how training and learning were accomplished: two classifiers with the same performance in one validation can disagree on many future validations. It offers no explanation of the decision-making process, and it is not objective, because its value also depends on the class proportions in the validation set. These issues do not mean we should omit class-prediction accuracy. Instead, it needs to be enriched with accompanying evidence and tests that supplement and contextualize the reported accuracy. This additional evidence augments the accuracy and can help us perform machine learning better while avoiding naive reliance on oversimplified metrics.

There is huge potential for machine learning, but blind reliance on oversimplified metrics can mislead. Class-prediction accuracy is a common metric for determining classifier performance. This article provides examples showing how class-prediction accuracy is superficial and even misleading. We propose some augmentative measures to supplement it, which in turn help us better understand the quality of the classifier's learning.

Class-prediction accuracy is an evaluative method for machine-learning classifiers. However, this method is simple and may produce spurious interpretations when used without caution. Contextualization, dimensionality-reduction approaches, and bootstrapping with Jaccard coefficients are possible strategies for better informing the learning outcome.
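Three claims in the abstract can be made concrete in a few lines: equal accuracy does not imply equal predictions, accuracy shifts with class proportions, and feature-selection stability can be summarized with Jaccard coefficients over bootstrap resamples. The Python sketch below (standard library only) is an illustration under stated assumptions, not the authors' procedure. All function names, the toy labels, and the synthetic data are hypothetical. `select_top_k` is a made-up selector (ranking features by the absolute difference of class means), and stability is taken here as the mean pairwise Jaccard coefficient of the feature sets chosen on stratified bootstrap resamples, which is one common reading of "bootstrapping with Jaccard coefficients".

```python
"""Toy checks that go beyond a single accuracy number (illustrative sketch only)."""
import random
from itertools import combinations
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of positions where two label sequences agree.

    With two prediction lists as arguments, this is their agreement rate.
    """
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def expected_accuracy(sensitivity, specificity, positive_fraction):
    """Accuracy implied by fixed per-class accuracies at a given class proportion."""
    return positive_fraction * sensitivity + (1 - positive_fraction) * specificity


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_top_k(X, y, k):
    """Hypothetical toy selector: top-k features by |mean(class 1) - mean(class 0)|."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(n_features), key=scores.__getitem__, reverse=True)
    return set(ranked[:k])


def bootstrap_stability(X, y, k, n_boot=30, seed=0):
    """Mean pairwise Jaccard coefficient of feature sets chosen on bootstrap resamples."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    selections = []
    for _ in range(n_boot):
        # Stratified resampling with replacement keeps both classes present.
        idx = [rng.choice(pos_idx) for _ in pos_idx] + [rng.choice(neg_idx) for _ in neg_idx]
        selections.append(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    return mean([jaccard(a, b) for a, b in combinations(selections, 2)])


def make_toy_data(n=40, n_features=10, seed=1):
    """Synthetic data: features 0 and 1 shift with the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < 2 else 0.0, 1.0) for j in range(n_features)])
        y.append(label)
    return X, y


# Two hypothetical classifiers with identical accuracy on the same validation labels.
Y_TRUE = [1, 1, 0, 0, 1, 0, 1, 0]
PRED_A = [1, 1, 0, 0, 0, 1, 1, 0]
PRED_B = [0, 1, 0, 1, 1, 0, 1, 0]

if __name__ == "__main__":
    print("accuracy A:", accuracy(Y_TRUE, PRED_A))     # 0.75
    print("accuracy B:", accuracy(Y_TRUE, PRED_B))     # 0.75
    print("A/B agreement:", accuracy(PRED_A, PRED_B))  # 0.5
    # Same sensitivity (0.6) and specificity (0.95), different class proportions.
    print(expected_accuracy(0.6, 0.95, 0.5), expected_accuracy(0.6, 0.95, 0.1))
    X, y = make_toy_data()
    print("feature-selection stability:", bootstrap_stability(X, y, k=2))
```

In this toy setting both classifiers score 0.75 yet agree on only half of the cases, and the same per-class behaviour yields roughly 0.775 accuracy at a 50% positive fraction but roughly 0.915 at 10%. A stability value near 1 would suggest the same features keep being selected across resamples; the selector, data, and resampling scheme are placeholders to be replaced by the actual pipeline being evaluated.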

Original language: English
Article number: 100025
Journal: Patterns
Volume: 1
Issue number: 2
DOI: https://doi.org/10.1016/j.patter.2020.100025
Publication status: Published 8 May 2020
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2020 The Authors

ASJC Scopus Subject Areas

General Decision Sciences

Keywords

artificial intelligence
data science
DSML 5: Mainstream: Data science output is well understood and (nearly) universally adopted
machine learning
validation

Cite this

@article{d1d65e55c2ef4cd389242cdcab37b6cc,

title = "Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy",

keywords = "artificial intelligence, data science, DSML 5: Mainstream: Data science output is well understood and (nearly) universally adopted, machine learning, validation",

author = "Ho, {Sung Yang} and Limsoon Wong and Goh, {Wilson Wen Bin}",

note = "Publisher Copyright: {\textcopyright} 2020 The Authors",

year = "2020",

month = may,

day = "8",

doi = "10.1016/j.patter.2020.100025",

language = "English",

volume = "1",

journal = "Patterns",

issn = "2666-3899",

publisher = "Cell Press",

number = "2",

}

TY - JOUR

T1 - Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy

AU - Ho, Sung Yang

AU - Wong, Limsoon

AU - Goh, Wilson Wen Bin

PY - 2020/5/8

Y1 - 2020/5/8

KW - artificial intelligence

KW - data science

KW - DSML 5: Mainstream: Data science output is well understood and (nearly) universally adopted

KW - machine learning

KW - validation

UR - http://www.scopus.com/inward/record.url?scp=85088865069&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85088865069&partnerID=8YFLogxK

U2 - 10.1016/j.patter.2020.100025

DO - 10.1016/j.patter.2020.100025

M3 - Review article

AN - SCOPUS:85088865069

SN - 2666-3899

VL - 1

JO - Patterns

JF - Patterns

IS - 2

M1 - 100025

ER -
