# R Visualizations – ggplot2

• August 28, 2019
• Big Data

# R Visualizations – ggplot2  (PART-1)

This post walks through the types of visualization you can build with ggplot2 and how to implement them in R.

The plots you can construct fall into the following categories:

1. Correlation: Scatterplot, Scatterplot With Encircling, Jitter Plot, Counts Chart, Bubble Plot, Animated Bubble Plot, Marginal Histogram / Boxplot, Correlogram.
2. Deviation: Diverging Bars, Diverging Lollipop Chart, Diverging Dot Plot, Area Chart.
3. Ranking: Ordered Bar Chart, Lollipop Chart, Dot Plot, Slope Chart, Dumbbell Plot.
4. Distribution: Histogram, Density Plot, Box Plot, Dot + Box Plot, Tufte Boxplot, Violin Plot, Population Pyramid.
5. Composition: Waffle Chart, Pie Chart, Treemap, Bar Chart.
6. Change: Time Series Plots, Stacked Area Chart, Calendar Heat Map, Slope Chart, Seasonal Plot.
7. Groups: Dendrogram, Clusters.

## 1. Correlation

The plots in this category examine how two variables are correlated.

Scatterplot

The scatterplot is the most frequently used plot in data analysis. It helps you understand the nature of the relationship between two variables.

    library(ggplot2)
    theme_set(theme_bw())
    data("midwest", package = "ggplot2")

    # Scatterplot
    sample <- ggplot(midwest, aes(x = area, y = poptotal)) +
      geom_point(aes(col = state, size = popdensity)) +
      geom_smooth(method = "loess", se = FALSE) +
      xlim(c(0, 0.1)) + ylim(c(0, 500000)) +
      labs(subtitle = "Area Vs Population", y = "Population", x = "Area",
           title = "Scatterplot", caption = "Source: midwest")

    plot(sample)

Scatterplot With Encircling

You may want to encircle a specific group of points in the chart to draw attention to those cases. This is done with geom_encircle() from the ggalt package.

Within geom_encircle(), set data to a new data frame that contains only the rows (points) of interest. You can also expand the curve so that it passes just outside the points. The color and size (thickness) of the curve can be changed as well.

    library(ggplot2)
    library(ggalt)

    # Rows to encircle: counties with a large population inside the plotted range
    midwest_select <- midwest[midwest$poptotal > 350000 &
                                midwest$poptotal <= 500000 &
                                midwest$area > 0.01 &
                                midwest$area < 0.1, ]

    ggplot(midwest, aes(x = area, y = poptotal)) +
      geom_point(aes(col = state, size = popdensity)) +
      geom_smooth(method = "loess", se = FALSE) +
      xlim(c(0, 0.1)) + ylim(c(0, 500000)) +
      geom_encircle(aes(x = area, y = poptotal), data = midwest_select,
                    color = "red", size = 2, expand = 0.08) +
      labs(subtitle = "Area Vs Population", y = "Population", x = "Area",
           title = "Scatterplot + Encircle", caption = "Source: midwest")

Jitter Plot

Plot city mileage (cty) against highway mileage (hwy).

    library(ggplot2)
    data(mpg, package = "ggplot2")
    theme_set(theme_bw())

    sample <- ggplot(mpg, aes(cty, hwy))

    sample + geom_point() +
      geom_smooth(method = "lm", se = FALSE) +
      labs(subtitle = "mpg: city vs highway mileage", y = "hwy", x = "cty",
           title = "Scatterplot with overlapping points", caption = "Source: mpg")

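This scatterplot suggests that city mileage (cty) and highway mileage (hwy) are well correlated. However, it hides something: the dataset has 234 rows, yet far fewer points are visible. Because cty and hwy are stored as integers, many points share the same coordinates and are drawn as a single dot. geom_jitter() makes the hidden points visible by adding a small random offset to each one. The number of rows can be confirmed with:

    dim(mpg)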
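The amount of overlap can be checked directly. The following is a minimal sketch, assuming only that ggplot2 is installed; it compares the number of rows in mpg with the number of distinct (cty, hwy) pairs. The variable names n_rows and n_distinct are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Total observations versus distinct (cty, hwy) combinations.
n_rows     <- nrow(mpg)
n_distinct <- nrow(unique(mpg[, c("cty", "hwy")]))

cat("rows:", n_rows, " distinct points:", n_distinct, "\n")

# Any gap between the two numbers means several cars share the same
# coordinates and are drawn on top of each other by geom_point().
n_rows - n_distinct
```

A positive difference is the number of observations hidden behind another point in the plain scatterplot.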

    library(ggplot2)
    data(mpg, package = "ggplot2")
    theme_set(theme_bw())

    sample <- ggplot(mpg, aes(cty, hwy))

    sample + geom_jitter(width = .5, size = 1) +
      labs(subtitle = "mpg: city vs highway mileage", y = "hwy", x = "cty",
           title = "Jittered Points")

Counts Chart

A counts chart is another way to deal with overlapping data points. The more points overlap at a location, the larger the circle drawn there.

    library(ggplot2)
    data(mpg, package = "ggplot2")
    theme_set(theme_bw())

    sample <- ggplot(mpg, aes(cty, hwy))

    sample + geom_count(col = "tomato3", show.legend = FALSE) +
      labs(subtitle = "mpg: city vs highway mileage", y = "hwy", x = "cty",
           title = "Counts Plot")
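To see what the circle sizes encode, the counts computed by geom_count() can be inspected with layer_data(), which returns the data ggplot2 computed for a layer. This is a sketch, assuming ggplot2 is installed; the names p and counts are illustrative, and the column n holds the number of observations at each location.

```r
library(ggplot2)

p <- ggplot(mpg, aes(cty, hwy)) + geom_count()

# layer_data() exposes the per-location counts behind the circle sizes.
counts <- layer_data(p, 1)
head(counts[order(-counts$n), c("x", "y", "n")])

# Every observation is counted exactly once.
sum(counts$n) == nrow(mpg)
```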

Bubble plot

While a scatterplot compares two continuous variables, a bubble chart is useful when you also want to understand the relationship within underlying groups, based on a categorical variable (shown as color) and another continuous variable (shown as point size).

In other words, bubble charts suit four-dimensional data: two numeric variables on X and Y, one categorical variable as color, and one more numeric variable as size.

    library(ggplot2)
    data(mpg, package = "ggplot2")

    sample_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]

    theme_set(theme_bw())

    sample <- ggplot(sample_select, aes(displ, cty)) +
      labs(subtitle = "mpg: Displacement vs City Mileage", title = "Bubble chart")

    sample + geom_jitter(aes(col = manufacturer, size = hwy)) +
      geom_smooth(aes(col = manufacturer), method = "lm", se = FALSE)

Animated Bubble chart

An animated bubble chart can be implemented with the gganimate package.

The plot is built like any other bubble chart, except that aes(frame) is set to the column over which you want to animate. Once the plot is constructed, pass it to gganimate() with the chosen interval. Note that the frame aesthetic and the gganimate() function belong to gganimate releases before 1.0, so this code requires one of those older versions.

    library(ggplot2)
    library(gganimate)  # legacy (pre-1.0) API
    library(gapminder)
    theme_set(theme_bw())

    sample <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, frame = year)) +
      geom_point() +
      geom_smooth(aes(group = year), method = "lm", show.legend = FALSE) +
      facet_wrap(~continent, scales = "free") +
      scale_x_log10()

    gganimate(sample, interval = 0.2)
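With gganimate 1.0 and later, an animation is declared by adding a transition_*() function to the plot and rendered with animate(). The following is a sketch of a comparable plot under that newer API, assuming gganimate (1.0 or later) and gapminder are installed. The fps value is an arbitrary choice, the smoothing layer is left out for simplicity, and the name p is illustrative.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

theme_set(theme_bw())

p <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = "Year: {frame_time}", x = "GDP per capita", y = "Life expectancy") +
  transition_time(year)   # animate over the values of the year column

# Rendering needs a renderer backend (for example the gifski package),
# so the call is left commented out here.
# animate(p, fps = 5)
```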

Marginal Histogram / Boxplot

A marginal histogram shows the relationship and the distribution in the same chart: histograms of the X and Y variables are drawn in the margins of the scatterplot.

It is implemented with ggMarginal() from the ggExtra package. Besides a histogram, you can draw a marginal boxplot or density plot by setting the type option accordingly.

    library(ggplot2)
    library(ggExtra)
    data(mpg, package = "ggplot2")
    theme_set(theme_bw())

    # Plot the full dataset: the subset hwy >= 35 & cty > 27 keeps only a
    # handful of rows, too few for a meaningful marginal distribution.
    sample <- ggplot(mpg, aes(cty, hwy)) +
      geom_count() +
      geom_smooth(method = "lm", se = FALSE)

    ggMarginal(sample, type = "histogram", fill = "transparent")
    ggMarginal(sample, type = "boxplot", fill = "transparent")
    ggMarginal(sample, type = "density", fill = "transparent")

Correlogram

A correlogram lets you examine the correlation between multiple continuous variables at once. It can be drawn with the ggcorrplot package.

    library(ggplot2)
    library(ggcorrplot)
    data(mtcars)

    # Correlation matrix, rounded to one decimal for the labels
    corr_sample <- round(cor(mtcars), 1)

    ggcorrplot(corr_sample, hc.order = TRUE, type = "lower", lab = TRUE,
               lab_size = 3, method = "circle",
               colors = c("tomato2", "white", "springgreen3"),
               title = "Correlogram of mtcars", ggtheme = theme_bw)
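A correlogram is a picture of a correlation matrix. The small sketch below uses base R only and shows what cor() returns for a few mtcars columns (the column choice is arbitrary, and corr_small is an illustrative name). The matrix is symmetric with ones on the diagonal, which is why plotting only the lower triangle loses no information.

```r
data(mtcars)

# Correlation matrix for a handful of columns, rounded to one decimal.
corr_small <- round(cor(mtcars[, c("mpg", "disp", "hp", "wt")]), 1)
print(corr_small)

# Symmetric with a unit diagonal: the upper triangle duplicates the lower one.
isSymmetric(corr_small)
all(diag(corr_small) == 1)
```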

## 2. Deviation

Diverging bars

Diverging bars handle both negative and positive values. They are implemented with geom_bar(), which can produce both a bar chart and a histogram.

By default, geom_bar() has stat set to count: when you provide only an X variable, it plots the number of observations for each value, much like a histogram.

To plot a bar chart with given heights instead, set stat = "identity" and provide both x and y inside aes().
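The difference between the two stat settings is easy to see on a small data frame. This is a sketch, assuming ggplot2 is installed; toy is a made-up data frame used only for illustration.

```r
library(ggplot2)

# Hypothetical data: three categories with precomputed values.
toy <- data.frame(group = c("a", "b", "c"), value = c(-1.5, 0.5, 2))

# Default stat = "count": bar height is the number of rows per group.
p_count <- ggplot(toy, aes(x = group)) + geom_bar()

# stat = "identity": bar height is taken from the y aesthetic, so negative
# values extend below zero, which is what a diverging bar chart needs.
p_identity <- ggplot(toy, aes(x = group, y = value)) +
  geom_bar(stat = "identity")

# Each group has exactly one row, so every count is 1.
layer_data(p_count)$count
```

geom_col() is shorthand for geom_bar(stat = "identity") and can be used interchangeably in the examples that follow.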

    library(ggplot2)
    theme_set(theme_bw())
    data("mtcars")

    # Data preparation
    mtcars$`car name` <- rownames(mtcars)
    # Normalised mileage (z-score)
    mtcars$sample_z <- round((mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg), 2)
    # Above / below average flag
    mtcars$sample_type <- ifelse(mtcars$sample_z < 0, "below", "above")
    # Sort, then fix the factor levels so the sorted order is kept in the plot
    mtcars <- mtcars[order(mtcars$sample_z), ]
    mtcars$`car name` <- factor(mtcars$`car name`, levels = mtcars$`car name`)

    ggplot(mtcars, aes(x = `car name`, y = sample_z, label = sample_z)) +
      geom_bar(stat = "identity", aes(fill = sample_type), width = .5) +
      scale_fill_manual(name = "Mileage",
                        labels = c("Above Average", "Below Average"),
                        values = c("above" = "#00ba38", "below" = "#f8766d")) +
      labs(subtitle = "Normalised mileage from 'mtcars'", title = "Diverging Bars") +
      coord_flip()
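The sample_z column is a z-score: each mileage minus the mean, divided by the standard deviation. As a quick check, the sketch below (base R only, with illustrative names z_manual and z_scale) compares the manual formula with the built-in scale() function.

```r
data(mtcars)  # reload the unmodified dataset

# Manual z-score, before rounding.
z_manual <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# scale() centers on the mean and divides by the standard deviation by default.
z_scale <- as.numeric(scale(mtcars$mpg))

all.equal(z_manual, z_scale)   # TRUE

# A z-score has mean 0 and standard deviation 1 (up to floating-point error).
round(c(mean = mean(z_manual), sd = sd(z_manual)), 10)
```

Negative values mark cars below the average mileage and positive values mark cars above it, which is what the bars diverge around.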

Diverging Lollipop Chart

A lollipop chart conveys the same information as diverging bars but looks more modern. It uses geom_point() and geom_segment() instead of geom_bar().

    library(ggplot2)
    theme_set(theme_bw())

    # Uses the `car name` and sample_z columns prepared in the Diverging Bars example
    ggplot(mtcars, aes(x = `car name`, y = sample_z, label = sample_z)) +
      geom_point(stat = "identity", fill = "black", size = 6) +
      geom_segment(aes(y = 0, x = `car name`, yend = sample_z, xend = `car name`),
                   color = "black") +
      geom_text(color = "white", size = 2) +
      labs(title = "Diverging Lollipop Chart",
           subtitle = "Normalized mileage from 'mtcars': Lollipop") +
      ylim(-2.5, 2.5) +
      coord_flip()

Diverging Dot Plot

    library(ggplot2)
    theme_set(theme_bw())

    # Uses the columns prepared in the Diverging Bars example
    ggplot(mtcars, aes(x = `car name`, y = sample_z, label = sample_z)) +
      geom_point(stat = "identity", aes(col = sample_type), size = 6) +
      scale_color_manual(name = "Mileage",
                         labels = c("Above Average", "Below Average"),
                         values = c("above" = "#00ba38", "below" = "#f8766d")) +
      geom_text(color = "white", size = 2) +
      labs(title = "Diverging Dot Plot",
           subtitle = "Normalized mileage from 'mtcars': Dotplot") +
      ylim(-2.5, 2.5) +
      coord_flip()

Area Chart

An area chart plots a single metric over time and fills the region between the line and the baseline (zero here). It is implemented with geom_area().

    library(ggplot2)
    # lubridate must be installed; it is called below as lubridate::year()
    data("economics", package = "ggplot2")

    # Relative change of the personal savings rate from one month to the next
    economics$sample_perc <- c(0, diff(economics$psavert) /
                                    economics$psavert[-length(economics$psavert)])

    # One axis break and label per year
    brks <- economics$date[seq(1, length(economics$date), 12)]
    lbls <- lubridate::year(economics$date[seq(1, length(economics$date), 12)])

    ggplot(economics[1:100, ], aes(date, sample_perc)) +
      geom_area() +
      scale_x_date(breaks = brks, labels = lbls) +
      theme(axis.text.x = element_text(angle = 90)) +
      labs(title = "Area Chart", subtitle = "Perc Returns for Personal Savings",
           y = "% Returns for Personal savings", caption = "Source: economics")
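The sample_perc column is the relative change from one observation to the next. Here is a tiny worked sketch in base R, with made-up numbers chosen so the arithmetic is easy to follow; x and perc are illustrative names.

```r
# Hypothetical values, for illustration only.
x <- c(10, 12, 9, 9)

# diff(x) gives x[t] - x[t-1]; dividing by x[-length(x)] (all but the last
# value) turns each difference into a change relative to the previous value.
# The leading 0 pads the first observation, which has no predecessor.
perc <- c(0, diff(x) / x[-length(x)])
perc
# 0.00  0.20 -0.25  0.00
```

Going from 10 to 12 is a change of +0.20, from 12 to 9 a change of -0.25, and from 9 to 9 no change.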

## 3. Ranking

Ordered Bar Chart

    library(ggplot2)
    theme_set(theme_bw())

    # Average city mileage per manufacturer
    sample_cty_mpg <- aggregate(mpg$cty, by = list(mpg$manufacturer), FUN = mean)
    colnames(sample_cty_mpg) <- c("make", "mileage")
    # Sort, then fix the factor levels so the sorted order is kept in the plot
    sample_cty_mpg <- sample_cty_mpg[order(sample_cty_mpg$mileage), ]
    sample_cty_mpg$make <- factor(sample_cty_mpg$make, levels = sample_cty_mpg$make)

    ggplot(sample_cty_mpg, aes(x = make, y = mileage)) +
      geom_bar(stat = "identity", width = .5, fill = "tomato3") +
      labs(title = "Ordered Bar Chart", subtitle = "Make Vs Avg. Mileage",
           caption = "source: mpg") +
      theme(axis.text.x = element_text(angle = 65, vjust = 0.6))
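Sorting the rows alone does not change the order of the bars, because ggplot2 orders a discrete axis by factor levels, and a character column gets alphabetical levels. That is why the example converts make to a factor whose levels follow the sorted rows. A base R sketch with made-up values (d is an illustrative data frame):

```r
# Hypothetical values, for illustration only.
d <- data.frame(make = c("a", "b", "c"), mileage = c(3, 1, 2))
d <- d[order(d$mileage), ]          # rows now run b, c, a

levels(factor(d$make))              # "a" "b" "c": alphabetical, sort order lost

d$make <- factor(d$make, levels = d$make)
levels(d$make)                      # "b" "c" "a": axis order follows mileage
```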

Lollipop Chart

    library(ggplot2)
    theme_set(theme_bw())

    # Uses sample_cty_mpg from the Ordered Bar Chart example
    ggplot(sample_cty_mpg, aes(x = make, y = mileage)) +
      geom_point(size = 3) +
      geom_segment(aes(x = make, xend = make, y = 0, yend = mileage)) +
      labs(title = "Lollipop Chart", subtitle = "Make Vs Avg. Mileage",
           caption = "source: mpg") +
      theme(axis.text.x = element_text(angle = 65, vjust = 0.6))

Dot Plot

    library(ggplot2)
    library(scales)
    theme_set(theme_classic())

    # Uses sample_cty_mpg from the Ordered Bar Chart example
    ggplot(sample_cty_mpg, aes(x = make, y = mileage)) +
      geom_point(col = "tomato2", size = 3) +
      geom_segment(aes(x = make, xend = make, y = min(mileage), yend = max(mileage)),
                   linetype = "dashed", size = 0.1) +
      labs(title = "Dot Plot", subtitle = "Make Vs Avg. Mileage",
           caption = "source: mpg") +
      coord_flip()

Slope Chart

    library(ggplot2)
    library(scales)
    theme_set(theme_classic())

    # data_f is not defined in this post. The code assumes a data frame with one
    # row per continent and the mean GDP per capita for 1952 and 1957.
    colnames(data_f) <- c("continent", "1952", "1957")
    left_label <- paste(data_f$continent, round(data_f$`1952`), sep = ", ")
    right_label <- paste(data_f$continent, round(data_f$`1957`), sep = ", ")
    data_f$class <- ifelse((data_f$`1957` - data_f$`1952`) < 0, "red", "green")

    # Slopes between the two time points
    sample <- ggplot(data_f) +
      geom_segment(aes(x = 1, xend = 2, y = `1952`, yend = `1957`, col = class),
                   size = .75, show.legend = FALSE) +
      geom_vline(xintercept = 1, linetype = "dashed", size = .1) +
      geom_vline(xintercept = 2, linetype = "dashed", size = .1) +
      scale_color_manual(labels = c("Up", "Down"),
                         values = c("green" = "#00ba38", "red" = "#f8766d")) +
      labs(x = "", y = "Mean GdpPerCap") +
      xlim(.5, 2.5) + ylim(0, (1.1 * (max(data_f$`1952`, data_f$`1957`))))

    # Labels on both sides and column titles
    sample <- sample + geom_text(label = left_label, y = data_f$`1952`, x = rep(1, NROW(data_f)), hjust = 1.1, size = 3.5)
    sample <- sample + geom_text(label = right_label, y = data_f$`1957`, x = rep(2, NROW(data_f)), hjust = -0.1, size = 3.5)
    sample <- sample + geom_text(label = "Time 1", x = 1, y = 1.1 * (max(data_f$`1952`, data_f$`1957`)), hjust = 1.2, size = 5)
    sample <- sample + geom_text(label = "Time 2", x = 2, y = 1.1 * (max(data_f$`1952`, data_f$`1957`)), hjust = -0.1, size = 5)

    # Strip the background, grid, and axis decoration
    sample + theme(panel.background = element_blank(), panel.grid = element_blank(),
                   axis.ticks = element_blank(), axis.text.x = element_blank(),
                   panel.border = element_blank(),
                   plot.margin = unit(c(1, 2, 1, 2), "cm"))
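The slope chart code expects a data frame data_f that is not created anywhere in the post. One way to obtain a compatible table, assuming the gapminder package used earlier is installed, is to average gdpPercap per continent for 1952 and 1957. This is an assumption about the intended data, not necessarily the author's source; run it before the slope chart code.

```r
library(gapminder)

# Mean GDP per capita by continent (rows) and year (columns).
means <- tapply(gapminder$gdpPercap,
                list(gapminder$continent, gapminder$year),
                mean)

# Keep the two years the chart compares; check.names = FALSE preserves
# the numeric column names "1952" and "1957".
data_f <- data.frame(continent = rownames(means),
                     `1952` = means[, "1952"],
                     `1957` = means[, "1957"],
                     check.names = FALSE,
                     row.names = NULL)
data_f
```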

Dumbbell Plot

    library(ggplot2)
    library(ggalt)
    library(scales)  # provides percent() for the axis labels
    theme_set(theme_classic())

    # health_sample is not defined in this post. The code assumes a data frame
    # with columns Area, pct_2013 and pct_2014 (proportions between 0 and 1).
    health_sample$Area <- factor(health_sample$Area, levels = as.character(health_sample$Area))

    gg_sample <- ggplot(health_sample, aes(x = pct_2013, xend = pct_2014, y = Area, group = Area)) +
      geom_dumbbell(color = "#a3c4dc", size = 0.75, point.colour.l = "#0e668b") +
      scale_x_continuous(label = percent) +
      labs(x = NULL, y = NULL, title = "Dumbbell Chart",
           subtitle = "Pct Change: 2013 vs 2014",
           caption = "Source: https://github.com/hrbrmstr/ggalt") +
      theme(plot.title = element_text(hjust = 0.5, face = "bold"),
            plot.background = element_rect(fill = "#f7f7f7"),
            panel.background = element_rect(fill = "#f7f7f7"),
            panel.grid.minor = element_blank(),
            panel.grid.major.y = element_blank(),
            panel.grid.major.x = element_line(),
            axis.ticks = element_blank(), legend.position = "top",
            panel.border = element_blank())

    plot(gg_sample)

Sample Plots:

    # 1. Stacked bar plot: cylinder counts grouped by number of gears
    Sample_Numbers <- table(mtcars$cyl, mtcars$gear)
    barplot(Sample_Numbers, main = 'Automobile cylinder number grouped by number of gears',
            col = c('red', 'orange', 'steelblue'), legend = rownames(Sample_Numbers),
            xlab = 'Number of Gears', ylab = 'count')

    # 2. Histogram
    hist(airquality$Temp, col = 'steelblue', main = 'Maximum Daily Temperature',
         xlab = 'Temperature (degrees Fahrenheit)')

    # 3. Heat map (seed set before the random draws so the result is reproducible)
    set.seed(143)
    Sample_x <- rnorm(10, mean = rep(1:5, each = 2), sd = 0.7)
    Sample_y <- rnorm(10, mean = rep(c(1, 9), each = 5), sd = 0.1)
    data <- data.frame(x = Sample_x, y = Sample_y)
    data_Sample <- as.matrix(data)[sample(1:10), ]
    heatmap(data_Sample)

    # 4. Scatterplot
    with(subset(airquality, Month == 9), plot(Wind, Ozone, col = 'steelblue', pch = 20, cex = 1.5))
    title('Wind and Ozone in NYC in September of 1973')

    # 5. Box plot (sample_cars is a copy of mtcars with cyl as a factor)
    data(mtcars)  # reload the unmodified dataset
    sample_cars <- transform(mtcars, cyl = factor(cyl))
    class(sample_cars$cyl)
    boxplot(mpg ~ cyl, sample_cars, xlab = 'Number of Cylinders', ylab = 'miles per gallon',
            main = 'miles per gallon for varied cylinders in automobiles', cex.main = 1.2)

    # 6. Correlation plot (cor() needs numeric columns, so use mtcars itself)
    library(corrplot)
    corr_sample <- cor(mtcars)
    corrplot(corr_sample)
    corrplot(corr_sample, method = 'number', type = "lower")

    # 7. Area chart
    library(dplyr)
    library(ggplot2)
    airquality %>%
      group_by(Day) %>%
      summarise(mean_wind = mean(Wind)) %>%
      ggplot() +
      geom_area(aes(x = Day, y = mean_wind)) +
      labs(title = "Area Chart of Average Wind per Day",
           subtitle = "using airquality data", y = "Mean Wind")

Author: Rahul Pund
