R is a powerful tool for data science and visualization. Tidyverse which is an collection of R packages is higly useful and recommended. All packages share an underlying design philosophy, grammar, and data structures. We will explore data visualisation using tidyverse in this blog.
Use the below code to install and load packages. Whenever you have new packages, you can add them in the list of packages. It’s an easier way to load multiple packages.
packages = c('DT','ggiraph','patchwork',
'plotly','tidyverse')
for (p in packages){
if(!require(p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The code chunk below imports exam_data.csv into R environment using read_csv() function from readr package.
exam_data <- read_csv("data/Exam_data.csv")
ggplot 2 graphic ggplot 2 is powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.
ggplot(exam_data,
aes(x=RACE)) +
geom_bar()

ggplot(exam_data,
aes(x=MATHS,
fill = RACE)) +
geom_dotplot(binwidth=2.5, #change the binwidth to 2.5
dotsize=0.5) +
scale_y_continuous(NULL,breaks = NULL) #turn off the y-axis

ggplot(exam_data,
aes(x=MATHS))+
geom_histogram(bins=20, #default bin is 30
color='black',
fill = 'light blue')

ggplot(exam_data,
aes(x=MATHS,
fill = GENDER))+
geom_histogram(bins=20, #default bin is 30
color='grey30')

ggplot(exam_data,
aes(y = MATHS,
x = GENDER))+
geom_boxplot() +
geom_point(position='jitter', #randomly distribute the data points
size = 0.5)

create an interactive dotplot
p <- ggplot(exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL, breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618
)
p <- ggplot(exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
(aes(data_id = CLASS)),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL, breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618
)
p <- ggplot(exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
(aes(data_id = CLASS,
tooltip = CLASS)),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL, breaks = NULL)
girafe(
ggobj = p,
width_svg = 6,
height_svg = 6*0.618
)
Plotly has richer features than girraph. It has an important and useful function “Select”.
plot_ly(data=exam_data,
x=~MATHS,
y=~ENGLISH)
Adding in anther visual element color. Can map with either qualitative or quantitative variables.Legend can be used as the selection criteria.
plot_ly(data=exam_data,
x=~MATHS,
y=~ENGLISH,
color=~RACE)
Changing colour pallete
plot_ly(data=exam_data,
x=~MATHS,
y=~ENGLISH,
color=~RACE,
colors="Set1")
Customise colour scheme
pal <- c("red","yellow","blue","green")
plot_ly(data=exam_data,
x=~MATHS,
y=~ENGLISH,
color=~RACE,
colors=pal)
Customise tooltip Two good things about the tooltip in Plotly. One is that tooltip box will change the position automatically based on the space. Another is it will change the text colors automatically to provide better contrast. For example, text color become white if the background is red or blue.
plot_ly(data=exam_data,
x=~MATHS,
y=~ENGLISH,
text =~paste("Student ID:", ID,
"<br>Class:",CLASS), #<br>: start a new line
color=~RACE,
colors="Set1")
Working with Layout Plotly provides inking feature. For example, x-axis and y-axis are black and the grid lines are grey.
Creating an unteractive scatter plot: ggplotly() method
Coordinated multiple veiws with plotly
d <- highlight(exam_data) #shared data
p1 <- ggplot(data=d,
aes(x=MATHS,
y=ENGLISH))+
geom_point(size=1) +
coord_cartesian(xlim = c(0,100),
ylim=c(0,100))
p2 <- ggplot(data=d,
aes(x=MATHS,
y=SCIENCE))+
geom_point(size=1) +
coord_cartesian(xlim = c(0,100),
ylim=c(0,100))
subplot(ggplotly(p1),
ggplotly(p2))
Interactive Data Table: DT package
DT::datatable(exam_data)