What can scatterplots teach us about doing data science better?

Wilson Wen Bin Goh; Reuben Jyong Kiat Foo; Limsoon Wong

doi:10.1007/s41060-022-00362-9

Wilson Wen Bin Goh^*, Reuben Jyong Kiat Foo, Limsoon Wong^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

A scatterplot is often the graph of choice for displaying the relationship between two variables. Scatterplots are useful for exploratory analysis, but can do much more than just identify correlations. As data sets get larger and more complex, relying solely on “eye power” may cause us to miss interesting associations or, worse, make wrong interpretations. We show that by combining scatterplots with statistical and logical reasoning (the sliding window and two-axis median bisection), we may identify interesting associations in a case study of Graduate Record Examination admission versus graduation outcomes, and determine whether low detectability of proteins in a biological sample is truly associated with low abundance. Because visual interpretation is subjective, we recommend graphing the data using a multitude of visual variables and graph types before concluding the absence of an association. Finally, even if associations are demonstrable, developing causal models that could explain the observed fuzziness and lack of apparent correlations in the scatterplot is helpful for better decision-making and interpretation.

Original language	English
Pages (from-to)	111-125
Number of pages	15
Journal	International Journal of Data Science and Analytics
Volume	17
Issue number	1
DOIs	https://doi.org/10.1007/s41060-022-00362-9
Publication status	Published - Jan 2024
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Nature Switzerland AG.

ASJC Scopus Subject Areas

Information Systems
Modelling and Simulation
Computer Science Applications
Computational Theory and Mathematics
Applied Mathematics

Keywords

Data science
Education
Graph literacy
Scatterplots
Visualization

Cite this

@article{c6e7af0f2a894f658f8870b13237fd6f,

title = "What can scatterplots teach us about doing data science better?",

abstract = "A scatterplot is often the graph of choice for displaying the relationship between two variables. Scatterplots are useful for exploratory analysis, but can do much more than just identify correlations. As data sets get larger and more complex, relying solely on “eye power” may cause us to miss interesting associations or, worse, make wrong interpretations. We show that by combining scatterplots with statistical and logical reasoning (the sliding window and two-axis median bisection), we may identify interesting associations in a case study of Graduate Record Examination admission versus graduation outcomes, and determine whether low detectability of proteins in a biological sample is truly associated with low abundance. Because visual interpretation is subjective, we recommend graphing the data using a multitude of visual variables and graph types before concluding the absence of an association. Finally, even if associations are demonstrable, developing causal models that could explain the observed fuzziness and lack of apparent correlations in the scatterplot is helpful for better decision-making and interpretation.",

keywords = "Data science, Education, Graph literacy, Scatterplots, Visualization",

author = "Goh, {Wilson Wen Bin} and Foo, {Reuben Jyong Kiat} and Wong, Limsoon",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to Springer Nature Switzerland AG.",

year = "2024",

month = jan,

doi = "10.1007/s41060-022-00362-9",

language = "English",

volume = "17",

pages = "111--125",

journal = "International Journal of Data Science and Analytics",

issn = "2364-415X",