The Power of GGPlot

In the post GGPlot for starters we learned the basic syntax of GGPlot, and I outlined that learning R becomes pretty easy when you start with GGPlot, because it follows the simple logic of adding layers and components to the graph. We started with a simple bar chart and adjusted the title, the theme and a few more options to illustrate what can be done with GGPlot. So, we took the first steps towards a graph that tells the story we’d like to tell our audience. Don’t get me wrong: of course I know that we made a bar chart that every other software package is perfectly capable of producing, and I agree, we definitely do not need to learn another programming language to make a simple bar chart.

But there is a good reason why we should start learning R, in particular (but not solely) for the many powerful applications of GGPlot. The power of GGPlot comes from the grammar of graphics and, to a great extent, from the many GGPlot extensions out there. What does it all mean?

In many instances we just need to adjust a few lines of code in order to change the whole graphical appearance of the plot, thanks to the many applications and extensions of GGPlot. For example, we can start with the code of our bar chart and simply use that code to generate a line graph. Of course, this doesn’t work in every instance. Sometimes you have to change a few more lines, especially if you want to visualize more advanced stuff, say a choropleth map (values displayed based on a map). Nevertheless, the principles work the same way even if you call ggmap instead of ggplot to produce a map, which we will cover in a later post. Plus, most of the code to adjust your graph (title, labs, background, and so on) stays the same, regardless of whether you make a bar chart, a line graph or a map of a country.

So, hands on! Let’s start with a few examples.

A Bar Chart it is …?

In the last session we learned our basic syntax for the bar chart. Let’s recall this from the last session, generate some made-up data and plot these values as we already know.

# Load the package first
library(ggplot2)

# Let's make up some data to illustrate:

dat1 <- data.frame(
  Sex = factor(c("Female","Female","Male","Male")),
  time = factor(c("Experimental","Controlgroup","Experimental","Controlgroup"),
                levels=c("Experimental","Controlgroup")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
  )

# Reduced bar chart from the last R session
ggplot(data=dat1, aes(x=time, y=total_bill)) +
  geom_bar(stat="identity")

Here it is, our bar chart. In principle that’s where we stopped in the last post: just a plain bar chart to compare the values of groups. This time, let’s focus on the basic syntax first. In the first line we call GGPlot and hand over the variables we’d like to visualize. In the second line we define how GGPlot will actually depict the data graphically, in our case with the bar geom. Now, we can adjust this second line to display our data in another way. Instead of calling geom_bar, we can call geom_line and geom_point to make a very simple line graph. Let’s see …

# Let's try to make a line graph by adjusting the geom that GGPlot uses

ggplot(data=dat1, aes(x=time, y=total_bill, group=Sex)) +
  geom_line() +
  geom_point()

Looking at that graph you may wonder why that guy makes a line graph that connects the measurement points from a (made-up) treatment group with a control group. The simple answer: because we can. Even this nonsense graph shows you how easy it is to change the graphical appearance by adjusting only the geom. That’s exactly where the flexibility of GGPlot comes from. Each graph can be decomposed into its elements. That’s why we can apply many aspects from one graph to another.
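To make the idea of decomposing a graph a bit more tangible, here is a minimal sketch. It stores the data and the aesthetics once and then adds a different geom each time. The object names base, p_bar and p_line are my own picks for illustration and nothing GGPlot requires.

```r
library(ggplot2)

# Same made-up data as above
dat1 <- data.frame(
  Sex = factor(c("Female","Female","Male","Male")),
  time = factor(c("Experimental","Controlgroup","Experimental","Controlgroup"),
                levels=c("Experimental","Controlgroup")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Store data and aesthetics once (the name "base" is just an illustration)
base <- ggplot(data=dat1, aes(x=time, y=total_bill, group=Sex))

# Adding a different geom to the same base gives a different chart
p_bar  <- base + geom_bar(stat="identity")
p_line <- base + geom_line() + geom_point()

# Typing the object name draws the plot
p_bar
p_line
```

Nothing but the geom differs between the two plots, which is the whole point.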

For example, we already know how to adjust the axis, provide a title, choose a preferred theme and so on. In most instances we do not need to adjust that code. We can simply copy and paste it from our project (or in our case from the post). Let’s have a look.

# Filled bar chart with graphical adjustments (theme, labs, etc.)

ggplot(data=dat1, aes(x=time, y=total_bill, fill=Sex)) +
  geom_bar(stat="identity")+
  theme_minimal()+
  scale_fill_brewer(palette="Set1")+
  theme(legend.position="bottom")+
  ggtitle("Main Result")+
  ylab("Outcome")+
  xlab("Condition")

So, we improved our basic bar chart and switched the theme to theme_minimal(), provided a title with ggtitle("Main Result") and corresponding labels for each axis with ylab("Outcome") and xlab("Condition"). Let’s copy and paste those lines to make our line graph a bit more reasonable.

# Let's try to make a line graph by adjusting the geom that GGPlot uses

ggplot(data=dat1, aes(x=time, y=total_bill, group=Sex)) +
  geom_line() +
  geom_point()+
  theme_minimal()+
  scale_fill_brewer(palette="Set1")+
  theme(legend.position="bottom")+
  ggtitle("Main Result")+
  ylab("Outcome")+
  xlab("Condition")

Now this plot looks a bit more like a serious line graph, even though the connecting line still doesn’t make any sense. Nevertheless, you see that I just copied and pasted the code to change the appearance of the graph, and most of it worked fine. Only one line of code seems to have had no effect at all. To make this clearer, let’s switch back to our bar chart.

But this time, we additionally want to compare whether there are any differences between groups of participants. For example, we can use participant’s sex to fill the bars and compare those results by using the fill= option in the aesthetic. All we need to adjust is the fill option: fill=Sex.

# Filled barchart based on participant's sex

ggplot(data=dat1, aes(x=time, y=total_bill, fill=Sex)) +
  geom_bar(stat="identity")

Analogous to the male vs. female bar chart, we can use the same data to produce a line chart whose points and lines are colored by participant’s sex, by providing the color indicator colour=Sex. Hopefully, the line graph now makes more sense to you ;-)

# Linegraph with participant's sex to color

ggplot(data=dat1, aes(x=time, y=total_bill, group=Sex, colour=Sex)) +
  geom_line() +
  geom_point() +
  theme_minimal()+
  scale_colour_brewer(palette="Set1")+
  theme(legend.position="bottom")+
  ggtitle("Main Result")+
  ylab("Outcome")+
  xlab("Condition")

It looks like participants’ sex interacts in some way with our made-up treatment. Nevertheless, in both instances we highlighted a subgroup by color. Unfortunately, the syntax is not completely identical, because the fill option doesn’t make much sense in a line graph, does it? (Is there anything to fill in a line graph?)

So, in some instances we cannot simply recycle our old syntax. You may need to adjust it if the new chart is completely different due to the nature of the data, as I said before. But most of the time you can simply use the lines of code we already learned. That’s why GGPlot is powerful. At first we can often set aside what the graph looks like, which colors to choose and all the other graphical details, because we already know how they work. This gives us the chance to focus on the harder questions: Is the bar chart (or line graph) the best way to make our case? Do we communicate all the information that is necessary to understand our graph well? This sounds ridiculous in the case of a simple bar chart, but we’ll learn that this point is worth discussing regardless of the type of visualization, because an unclear label may confuse the audience even in the simplest graph. We have to discuss this point in a later session.
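If you get tired of copying and pasting, here is one possible shortcut, as a minimal sketch: collect the recurring styling lines in a list and add that list to every plot. GGPlot accepts a list of components after the plus sign. The name my_style is hypothetical, and I left the brewer scale out of the list because it has to match the aesthetic (fill for bars, colour for lines).

```r
library(ggplot2)

# Same made-up data as above
dat1 <- data.frame(
  Sex = factor(c("Female","Female","Male","Male")),
  time = factor(c("Experimental","Controlgroup","Experimental","Controlgroup"),
                levels=c("Experimental","Controlgroup")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Recurring styling, stored once (the name "my_style" is hypothetical)
my_style <- list(
  theme_minimal(),
  theme(legend.position="bottom"),
  ggtitle("Main Result"),
  ylab("Outcome"),
  xlab("Condition")
)

# Bar chart: subgroups are highlighted via fill
p_bar <- ggplot(dat1, aes(x=time, y=total_bill, fill=Sex)) +
  geom_bar(stat="identity") +
  scale_fill_brewer(palette="Set1") +
  my_style

# Line graph: subgroups are highlighted via colour
p_line <- ggplot(dat1, aes(x=time, y=total_bill, group=Sex, colour=Sex)) +
  geom_line() +
  geom_point() +
  scale_colour_brewer(palette="Set1") +
  my_style

p_bar
p_line
```

Whether a list like this is worth it depends on how many plots share the same look. For two graphs, copy and paste does the job just as well.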

Nevertheless, hopefully you agree that the appearance can quickly be adjusted with only a few snippets of code. And far more important for the moment, we only got a first glimpse of the real power of GGPlot, which comes from its many applications and extensions. So give it a try, there is no need to know all aspects from scratch. You will learn by writing your own code. But before you leave this post behind and delve into your own graphics projects, let’s consider a few more examples, just in case the previous one was not convincing.

Histograms

This time we want to display a numerical variable with a histogram, so we first have to generate some random data for illustration purposes. Based on the metric indicator rating we can go through the same adjustment cycle to produce different graphs from scratch. So in this case we can skip all other variables, provide ggplot only with our x variable and call geom_histogram() for the histogram.

# Generate some numerical random data to display a histogram
dat <- data.frame(Sex = factor(rep(c("Male","Female"), each=200)), 
                  rating = c(rnorm(200),rnorm(200, mean=.8)))

#Let's plot it :)
ggplot(dat, aes(x=rating)) + 
  geom_histogram()

Let’s recall the fill option to highlight a subgroup analysis. We can now make a histogram that overlays one histogram per subgroup, depending on your subgroup variable. Thus, we can stick to the very same example and generate a histogram for men and women, like we did in the bar chart and line graph. Most tricks work with very different charts because ggplot relies on a coherent syntax.

So, providing the fill option in the aesthetic, fill=Sex, already does the job.

# Filled histogram based on participants' sex
ggplot(dat, aes(x=rating, fill=Sex)) +
  geom_histogram(binwidth=.5, alpha=.5, position="identity")

So you see, in our made-up data men tend to give lower ratings, even though some overlap is clearly visible. Of course, we can do the same trick over and over again. For example, plot the density curves with geom_density() instead of the histogram. Or compare the distributions based on box plots, as you can see in the output below, but I guess the point is clear.

#Simple Box-plot? geom_boxplot()!
ggplot(dat, aes(y=rating, fill=Sex)) + 
  geom_boxplot()
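In case you want to see the density curves mentioned above as well, here is a minimal sketch. The set.seed() call and the object name p_density are my additions: the seed only makes the random numbers reproducible, and the alpha value is simply copied from the histogram.

```r
library(ggplot2)

# Assumption: a fixed seed, so the made-up data is the same on every run
set.seed(1)
dat <- data.frame(Sex = factor(rep(c("Male","Female"), each=200)),
                  rating = c(rnorm(200), rnorm(200, mean=.8)))

# Same aesthetics as the filled histogram, only the geom changes
p_density <- ggplot(dat, aes(x=rating, fill=Sex)) +
  geom_density(alpha=.5)

p_density
```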

Switching position

However, one last point is still open. Only a few minutes ago we started with a simple bar chart, adjusted it and filled the bars with different colors for men and women. Then we switched from the bar chart to a line chart, a histogram and a box plot, and I’m sure that you now know how to generate a plot with density curves for men and women. That’s the power of ggplot, but it is still just the tip of the iceberg, because all we did was adjust the geom that graphically depicts our data.

However, GGPlot comes with a powerful syntax that works with almost all GGPlot applications and extensions. Believe me, there are a few ;) For instance, assume you have many subgroups or conditions and you want to compare the distribution of each group in a row. Try coord_flip(), which swaps the two axes and often makes such a comparison much easier.

# Use coord_flip() to switch the x and y axes
ggplot(dat, aes(y=rating, fill=Sex)) + 
  geom_boxplot()+
  coord_flip()
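The same call is not tied to box plots. As a minimal sketch, here is the filled bar chart from the beginning of this post with flipped coordinates. The object name p_flip is just an illustration.

```r
library(ggplot2)

# Made-up data from the bar chart section
dat1 <- data.frame(
  Sex = factor(c("Female","Female","Male","Male")),
  time = factor(c("Experimental","Controlgroup","Experimental","Controlgroup"),
                levels=c("Experimental","Controlgroup")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Filled bar chart, drawn with horizontal bars
p_flip <- ggplot(data=dat1, aes(x=time, y=total_bill, fill=Sex)) +
  geom_bar(stat="identity") +
  coord_flip()

p_flip
```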

I’m sure it’s no longer a surprise if I tell you that you can flip the coordinates in many GGPlot graphs. So it’s up to you to decide if it makes any sense. In the very same way, assume that you want to split your graph into subgraphs of different units and plot each unit in its own panel. Use facet_grid(vertical ~ horizontal) for that: a variable on the left of the tilde stacks the panels vertically (as rows), a variable on the right arranges them horizontally (as columns), and a dot means no split in that direction. Let’s try a horizontal arrangement.

#Make subgraphs with facet_grid 
ggplot(dat, aes(y=rating, fill=Sex)) + 
  geom_boxplot()+
  facet_grid(. ~ Sex)

Of course, in this minimalistic example it doesn’t make much of a difference. You can plot a plain box plot or use the facet grid, the result is essentially the same. But with real data it makes a difference. You can fully adjust the graph the way you want, in a way that tells the story your data implies. For example, instead of a vertical or horizontal split, you can also use facet_wrap( ~ Sex, ncol=2) to wrap your graphs by a variable and determine with ncol how many columns are used to display and arrange your subgraphs. Each subgroup gets its own labelled panel. This makes it really easy to interpret the graph:

#Even more fancy, try facet_wrap to wrap your subgraphs based on the number of columns
ggplot(dat, aes(y=rating, fill=Sex)) + 
  geom_boxplot()+
  facet_wrap( ~ Sex, ncol=2)

You see that in our made-up data women give higher ratings than men on average, even though you may not even have a clue what all this stuff is about. And guess what? Facet grid and facet wrap work for bar charts, histograms, line graphs and so on. You didn’t see that coming, did you?
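To back up that claim with at least one case, here is a minimal sketch of a faceted histogram. This time the variable sits on the left of the tilde, so the panels are stacked on top of each other and share the same x axis. The seed and the name p_facet are my additions for illustration.

```r
library(ggplot2)

# Assumption: a fixed seed, so the made-up data is the same on every run
set.seed(1)
dat <- data.frame(Sex = factor(rep(c("Male","Female"), each=200)),
                  rating = c(rnorm(200), rnorm(200, mean=.8)))

# One histogram panel per sex, stacked vertically
p_facet <- ggplot(dat, aes(x=rating, fill=Sex)) +
  geom_histogram(binwidth=.5) +
  facet_grid(Sex ~ .)

p_facet
```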

As always, now it is up to you to explore the standard graphics in GGPlot (like box plots, density plots, etc.) to make your case. Adjust them to your needs, and you will find that most aspects you learn for one particular graphic can be applied to another. For example, can you make a histogram for men and women and add a dashed line to indicate the mean rating of each group?
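If you get stuck, here is one possible solution, as a minimal sketch and certainly not the only way to do it. It computes the group means with aggregate() from base R and draws them with geom_vline(). The seed and the names means and p_mean are my own picks.

```r
library(ggplot2)

# Assumption: a fixed seed, so the made-up data is the same on every run
set.seed(1)
dat <- data.frame(Sex = factor(rep(c("Male","Female"), each=200)),
                  rating = c(rnorm(200), rnorm(200, mean=.8)))

# Mean rating per group: one row per sex
means <- aggregate(rating ~ Sex, data=dat, FUN=mean)

# Overlaid histograms plus one dashed vertical line per group mean
p_mean <- ggplot(dat, aes(x=rating, fill=Sex)) +
  geom_histogram(binwidth=.5, alpha=.5, position="identity") +
  geom_vline(data=means, aes(xintercept=rating, colour=Sex),
             linetype="dashed")

p_mean
```

The vertical lines get their own small data frame, so GGPlot draws exactly one line per group instead of one per observation.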