What can Venn diagrams teach us about doing data science better?

Sung Yang Ho; Sophia Tan; Chun Chau Sze; Limsoon Wong; Wilson Wen Bin Goh

doi:10.1007/s41060-020-00230-4

Sung Yang Ho, Sophia Tan, Chun Chau Sze, Limsoon Wong^*, Wilson Wen Bin Goh^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Data science is about deriving insight, learning and understanding from data. This process may be automated via the use of advanced algorithms or scaffolded cognitively via the use of graphs. While much emphasis is currently placed on machine learning, there is still much to learn about the role of the data scientist, in particular the thinking process by which he reaches conclusions. The thinking process of the data scientist needs to be scaffolded as the human brain is easily overwhelmed by many variables. Graphs are a form of data abstraction and constitute an essential part of the data scientist’s toolkit. Graphs are also a viable scaffold on which the data scientist may gain familiarity with data. But the process of extracting insight from graphs is not always a trivial or straightforward process; it requires interpretative logic as well. Generalizing from the example of a simple graph type, the Venn diagram, we discuss various logical fallacies that can be committed when interpreting a Venn diagram. Amidst various considerations that dictate how a graph should be tackled, we explain why context is most important, and should form the first guiding principle during data analysis.

Original language	English
Pages (from-to)	1-10
Number of pages	10
Journal	International Journal of Data Science and Analytics
Volume	11
Issue number	1
DOIs	https://doi.org/10.1007/s41060-020-00230-4
Publication status	Published - Jan 2021
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2020, Springer Nature Switzerland AG.

ASJC Scopus Subject Areas

Information Systems
Modelling and Simulation
Computer Science Applications
Computational Theory and Mathematics
Applied Mathematics

Keywords

Data science
Exploratory data analysis
Graph literacy
Visualization

Cite this

@article{f8d2d3e3491a48129f8dcef37559a509,

title = "What can Venn diagrams teach us about doing data science better?",

abstract = "Data science is about deriving insight, learning and understanding from data. This process may be automated via the use of advanced algorithms or scaffolded cognitively via the use of graphs. While much emphasis is currently placed on machine learning, there is still much to learn about the role of the data scientist, in particular the thinking process by which he reaches conclusions. The thinking process of the data scientist needs to be scaffolded as the human brain is easily overwhelmed by many variables. Graphs are a form of data abstraction and constitute an essential part of the data scientist{\textquoteright}s toolkit. Graphs are also a viable scaffold on which the data scientist may gain familiarity with data. But the process of extracting insight from graphs is not always a trivial or straightforward process; it requires interpretative logic as well. Generalizing from the example of a simple graph type, the Venn diagram, we discuss various logical fallacies that can be committed when interpreting a Venn diagram. Amidst various considerations that dictate how a graph should be tackled, we explain why context is most important, and should form the first guiding principle during data analysis.",

keywords = "Data science, Exploratory data analysis, Graph literacy, Visualization",

author = "Ho, {Sung Yang} and Sophia Tan and Sze, {Chun Chau} and Limsoon Wong and Goh, {Wilson Wen Bin}",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature Switzerland AG.",

year = "2021",

month = jan,

doi = "10.1007/s41060-020-00230-4",

language = "English",

volume = "11",

pages = "1--10",

journal = "International Journal of Data Science and Analytics",

issn = "2364-415X",