For this ggplot demonstration, I will be using the R data set “USarrest”. Here is the structure:
library(ggplot2)
library(patchwork)
library(maps)
## Warning: package 'maps' was built under R version 3.4.4
library(readxl)
library(openxlsx)
head(USArrests)
## Murder Assault UrbanPop Rape
## Alabama 13.2 236 58 21.2
## Alaska 10.0 263 48 44.5
## Arizona 8.1 294 80 31.0
## Arkansas 8.8 190 50 19.5
## California 9.0 276 91 40.6
## Colorado 7.9 204 78 38.7
As we can see this is state / location data. It would be nice to view this data on a map to observe general trends. For this we utilize the “maps” package.
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)
states_map <- map_data("state")
map1 <- ggplot(crimes, aes(map_id = state)) + geom_map(aes(fill = Murder), map = states_map) + expand_limits(x = states_map$long, y = states_map$lat)
last_plot() + coord_map()
states_map2 <- map_data("state")
map2 <- ggplot(crimes, aes(map_id = state)) + geom_map(aes(fill = Assault), map = states_map2) + expand_limits(x = states_map$long, y = states_map$lat)
last_plot() + coord_map()
To see the initial trend we will plot geom_smooth using ‘loess’
Plot1 <- ggplot(USArrests, mapping = aes(x = UrbanPop, y = Assault)) +
geom_point()+
geom_smooth()
Plot1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
We can view the relationship between Population and Assualt Rates for each state. However I acknowledge that this is a bad method since we cannot pinpoint each state. It does demonstrate the power of ggplot!
Data <- read_excel("Data.xlsx")
gg <- ggplot(Data, aes(x=UrbanPop, y=Assault)) +
geom_point(aes(col=State)) +
geom_smooth(method="loess", se=F) +
labs(subtitle="Population Vs Assault",
y="Assalt",
x="UrbanPopulation",
title="Scatterplot")
plot(gg)
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).
Furthermore we could make each data point a function of Rape levels as well. To do this we map the points to size = Rape.
g3 <- ggplot(Data, aes(x=UrbanPop, y=Assault)) +
geom_point(aes(col=State, size = Rape)) +
geom_smooth(method="loess", se=F) +
labs(subtitle="Population Vs Assault",
y="Assalt",
x="UrbanPopulation",
title="Scatterplot")
plot(g3)
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).
As we discussed in class, this is information overload! But we have managed to represent four different variables: State, Population, Assault and Rape.
Maybe a simplier way to view these relationships is by using text to represent each point. We can easily change the test size as well. The command “check_overlap = TRUE” is useful to make sure the text does not over lap so it is clearly visible!
g4 <- ggplot(Data,
aes(UrbanPop, Murder, label = as.character(State))) + theme_classic() +
geom_text(check_overlap = TRUE, size = 3)
g4
## Warning: Removed 4 rows containing missing values (geom_text).
These types of graphs can be manipulated very easily. If we ever wanted to angle the text we simply indicate the angle we want. In this case its a 45 degree angle.
g5 <- ggplot(Data,
aes(UrbanPop, Murder, label = as.character(State))) + theme_classic() +
geom_text(angle =45, size = 3)
g5
## Warning: Removed 4 rows containing missing values (geom_text).
Using ‘Patchwork’ we can combine multiple graphs
map1 + Plot1 - g5 + plot_layout(ncol = 1)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 4 rows containing missing values (geom_text).