Programming Data Visualisation with R

R is a powerful tool for data science and visualisation. The tidyverse, a collection of R packages, is highly useful and recommended: all of its packages share an underlying design philosophy, grammar, and data structures. This post explores data visualisation using the tidyverse.

Alicia Zhang https://www.linkedin.com/in/alicia-zhang-22a1a6140/ (School of Computing and Information Systems, Singapore Management University) https://scis.smu.edu.sg/
06-26-2021

Installing and loading the required libraries

Use the code below to install (if necessary) and load the packages. To use additional packages, add their names to the packages vector: the loop installs any that are missing and then loads each one, which is easier than loading packages one by one.

packages = c('DT','ggiraph','patchwork',
             'plotly','tidyverse')
for (p in packages){
  # install the package only if it cannot be loaded
  if(!require(p,character.only = TRUE)){
    install.packages(p)
  }
  library(p,character.only = TRUE)
}

Importing Data

The code chunk below imports Exam_data.csv into the R environment using the read_csv() function from the readr package.

exam_data <- read_csv("data/Exam_data.csv")
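
Before plotting, it helps to confirm that the import produced the column types you expect. The sketch below is only a stand-in: it writes three made-up rows to a temporary CSV (the real Exam_data.csv is not reproduced here), using the column names that appear later in this post, and reads them back with read_csv(). The object names sample_rows and exam_sample are hypothetical.

```r
library(readr)

# Made-up rows for illustration only; column names follow those used later in the post.
sample_rows <- data.frame(
  ID      = c("Student001", "Student002", "Student003"),
  CLASS   = c("3A", "3A", "3B"),
  GENDER  = c("Male", "Female", "Female"),
  RACE    = c("Chinese", "Malay", "Indian"),
  MATHS   = c(72, 65, 88),
  ENGLISH = c(70, 81, 79),
  SCIENCE = c(68, 74, 90)
)

# Write to a temporary file so the example does not depend on data/Exam_data.csv.
tmp <- tempfile(fileext = ".csv")
write_csv(sample_rows, tmp)

# read_csv() guesses column types from the data: text columns become
# character, score columns become double.
exam_sample <- read_csv(tmp, show_col_types = FALSE)

# Print the guessed column specification and a numeric summary of the scores.
print(spec(exam_sample))
print(summary(exam_sample[, c("MATHS", "ENGLISH", "SCIENCE")]))
```

With the real file, the same two print() calls can be run on exam_data directly.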

Introducing ggplot2

ggplot2 is a powerful way of thinking about visualisation: a graphic is a mapping between variables in the data and the visual properties of geometric objects that you can perceive.
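
To make the idea of mapping concrete, here is a minimal sketch that builds a plot in three steps: data plus aesthetic mappings, then a geometric object, then a second layer that reuses the same mappings. The data frame scores holds made-up values, and the names base, p and p_trend are hypothetical.

```r
library(ggplot2)

# Made-up scores for illustration only.
scores <- data.frame(
  MATHS   = c(45, 58, 63, 71, 80, 92),
  ENGLISH = c(50, 61, 59, 75, 78, 88),
  GENDER  = c("Female", "Male", "Female", "Male", "Female", "Male")
)

# 1. Data plus aesthetic mappings: variables -> visual properties.
base <- ggplot(scores, aes(x = MATHS, y = ENGLISH, colour = GENDER))

# 2. A geometric object that draws one point per row using those mappings.
p <- base + geom_point(size = 3)

# 3. A further layer reuses the same mappings; here a linear trend line per gender.
p_trend <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p_trend)
```

Because colour is mapped inside aes(), both the points and the trend lines are split by GENDER without any extra code.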

bar chart

ggplot(exam_data, 
       aes(x=RACE)) +
  geom_bar()

dotplot

ggplot(exam_data, 
       aes(x=MATHS,
           fill = RACE)) +
  geom_dotplot(binwidth=2.5, #change the binwidth to 2.5
               dotsize=0.5) +
  scale_y_continuous(NULL,breaks = NULL) #turn off the y-axis

histogram

ggplot(exam_data, 
       aes(x=MATHS))+
  geom_histogram(bins=20, #default bin is 30
                 color='black',
                 fill = 'light blue')

histogram by gender

ggplot(exam_data, 
       aes(x=MATHS,
           fill = GENDER))+
  geom_histogram(bins=20, #default bin is 30
                 color='grey30')

boxplot

ggplot(exam_data, 
       aes(y = MATHS,
           x = GENDER))+
  geom_boxplot() +
  geom_point(position='jitter', #randomly distribute the data points
             size = 0.5)

Interactive visualisation with R using ggiraph and plotly

ggiraph Methods

Create an interactive dotplot

p <- ggplot(exam_data,
            aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = ID),
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL, breaks = NULL)

girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6*0.618
)

Hover effects

p <- ggplot(exam_data,
            aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(data_id = CLASS),
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL, breaks = NULL)

girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6*0.618
)

Combine tooltip and hover effect

p <- ggplot(exam_data,
            aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(data_id = CLASS,
        tooltip = CLASS),
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL, breaks = NULL)

girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6*0.618
)
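
The hover effect uses ggiraph's default styling unless you say otherwise. As a hedged sketch, the example below passes opts_hover() to girafe() through its options argument to change the hover colour. The data frame scores is made up, and the CSS values are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggiraph)

# Made-up data for illustration only.
scores <- data.frame(
  ID    = paste0("S", 1:8),
  CLASS = rep(c("3A", "3B"), each = 4),
  MATHS = c(55, 55, 60, 62, 62, 62, 70, 71)
)

p <- ggplot(scores, aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(data_id = CLASS, tooltip = ID),
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL, breaks = NULL)

# opts_hover() sets the CSS applied to every element that shares
# the data_id of the element under the pointer.
g <- girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6 * 0.618,
  options = list(opts_hover(css = "fill:orange;stroke:black;"))
)
g
```

Here data_id groups dots by class while tooltip shows the individual ID, so hovering one student highlights the whole class.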

Plotly Methods

Plotly has richer features than ggiraph. One important and useful feature is “Select”.

plot_ly(data=exam_data,
        x=~MATHS,
        y=~ENGLISH)

Adding another visual element: colour. Colour can be mapped to either a qualitative or a quantitative variable, and the legend can be used as a selection filter.

plot_ly(data=exam_data,
        x=~MATHS, 
        y=~ENGLISH,
        color=~RACE)
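
The chunk above maps colour to a qualitative variable. For comparison, here is a minimal sketch of the quantitative case, assuming a numeric SCIENCE column like the one used later in this post. The data frame scores is made up; type and mode are set explicitly so plotly does not have to guess the trace type.

```r
library(plotly)

# Made-up scores for illustration only.
scores <- data.frame(
  MATHS   = c(45, 58, 63, 71, 80, 92),
  ENGLISH = c(50, 61, 59, 75, 78, 88),
  SCIENCE = c(40, 55, 70, 65, 85, 95)
)

# A numeric variable mapped to color gives a continuous colour scale
# with a colour bar instead of a discrete legend.
fig <- plot_ly(data = scores,
               x = ~MATHS,
               y = ~ENGLISH,
               color = ~SCIENCE,
               type = "scatter",
               mode = "markers")
fig
```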

Changing colour palette

plot_ly(data=exam_data,
        x=~MATHS, 
        y=~ENGLISH,
        color=~RACE,
        colors="Set1")

Customise colour scheme

pal <- c("red","yellow","blue","green")
plot_ly(data=exam_data,
        x=~MATHS, 
        y=~ENGLISH,
        color=~RACE,
        colors=pal)

Customise tooltip

The tooltip in Plotly has two useful properties. First, the tooltip box repositions itself automatically based on the available space. Second, the text colour changes automatically to provide better contrast; for example, the text becomes white when the background is red or blue.

plot_ly(data=exam_data,
        x=~MATHS, 
        y=~ENGLISH,
        text =~paste("Student ID:", ID, 
                     "<br>Class:",CLASS), #<br>: start a new line
        color=~RACE,
        colors="Set1")

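Working with layout

Plotly applies default inking to the chart; for example, the x-axis and y-axis are black and the grid lines are grey. The layout() call in the chunk below sets the chart title and fixes both axis ranges to 0-100.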

plot_ly(data=exam_data,
        x=~MATHS, 
        y=~ENGLISH,
        text =~paste("Student ID:", ID, 
                     "<br>Class:",CLASS), #<br>: start a new line
        color=~RACE,
        colors="Set1") %>%
  layout(title='English Score versus Math Score',
         xaxis=list(range=c(0,100)),
         yaxis=list(range=c(0,100)))

Creating an interactive scatter plot: ggplotly() method

p <- ggplot(data=exam_data,
            aes(x=MATHS,
                y=ENGLISH))+
  geom_point(size=1) +
  coord_cartesian(xlim = c(0,100),
                  ylim=c(0,100))
ggplotly(p)

Side-by-side views with plotly: subplot()

p1 <- ggplot(data=exam_data,
        aes(x=MATHS, 
        y=ENGLISH))+
  geom_point(size=1) +
  coord_cartesian(xlim = c(0,100),
                  ylim=c(0,100))

p2 <- ggplot(data=exam_data,
        aes(x=MATHS, 
        y=SCIENCE))+
  geom_point(size=1) +
  coord_cartesian(xlim = c(0,100),
                  ylim=c(0,100))
subplot(ggplotly(p1),
        ggplotly(p2))

Coordinated multiple views with plotly

d <- highlight_key(exam_data) #shared data

p1 <- ggplot(data=d,
        aes(x=MATHS, 
        y=ENGLISH))+
  geom_point(size=1) +
  coord_cartesian(xlim = c(0,100),
                  ylim=c(0,100))

p2 <- ggplot(data=d,
        aes(x=MATHS, 
        y=SCIENCE))+
  geom_point(size=1) +
  coord_cartesian(xlim = c(0,100),
                  ylim=c(0,100))
subplot(ggplotly(p1),
        ggplotly(p2))
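
By default, linked views respond to clicks. As a sketch of one possible variation, the example below wraps the subplot in highlight() so that the selection is driven by box or lasso select instead. The data frame scores is made up, and the choice of on and off events is an assumption about what suits this chart, not something the original analysis specifies.

```r
library(ggplot2)
library(plotly)

# Made-up scores for illustration only.
scores <- data.frame(
  MATHS   = c(45, 58, 63, 71, 80, 92),
  ENGLISH = c(50, 61, 59, 75, 78, 88),
  SCIENCE = c(40, 55, 70, 65, 85, 95)
)

# highlight_key() wraps the data so that both plots share one selection state.
d <- highlight_key(scores)

p1 <- ggplot(d, aes(x = MATHS, y = ENGLISH)) +
  geom_point(size = 1)

p2 <- ggplot(d, aes(x = MATHS, y = SCIENCE)) +
  geom_point(size = 1)

# highlight() takes the finished plotly object and controls which
# events turn the linked selection on and off.
linked <- highlight(
  subplot(ggplotly(p1), ggplotly(p2)),
  on = "plotly_selected",   # select points with box or lasso select
  off = "plotly_deselect"   # clear when the selection is removed
)
linked
```

Note the division of labour: highlight_key() prepares the shared data before plotting, while highlight() configures the behaviour of the finished plot.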

Interactive Data Table: DT package

DT::datatable(exam_data)
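
The default table can be adjusted through datatable() arguments. The sketch below, on a made-up data frame scores, hides the row names, adds per-column filters above the header, and shows five rows per page. The specific settings are illustrative choices only.

```r
library(DT)

# Made-up rows for illustration only.
scores <- data.frame(
  ID      = paste0("S", 1:8),
  CLASS   = rep(c("3A", "3B"), each = 4),
  MATHS   = c(55, 55, 60, 62, 62, 62, 70, 71),
  ENGLISH = c(60, 58, 66, 70, 64, 72, 75, 80)
)

tbl <- datatable(
  scores,
  rownames = FALSE,               # hide the automatic row-number column
  filter = "top",                 # add a filter box above each column
  options = list(pageLength = 5)  # show five rows per page
)
tbl
```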