GitHub-style waffle plots in R

I have a little python script on my work computer that tracks hours I’ve spent in the office (assuming that those hours begin with a computer login, and end in a computer shutdown–sadly this is almost always true.) I don’t really do anything with the information, and it is really a remnant of my time as a contract worker. However, for shared or billed projects, such automation of data gathering can be very valuable. For example, GitHub does a nice job of displaying a user’s contributions to code repositories over time: center

This heatmap represents the activity (some function of number of contributions) of a user over time, plotting more activity as darker tones for each day, represented as little squares. Each light gray square indicates a day with no activity.

In this post, I’ll show how to recreate this “waffle” plot in R with the ggplot2 plotting package.

Simulate activity data

First, I’ll create a data frame for the simulated data, initializing the data types:

library(dplyr)
d <- data_frame(
    date = as.Date(1:813, origin = "2014-01-01"),
    year = format(date, "%Y"),
    week = as.integer(format(date, "%W")) + 1,  # Week starts at 1
    day = factor(weekdays(date, T), 
                 levels = rev(c("Mon", "Tue", "Wed", "Thu",
                                "Fri", "Sat", "Sun"))),
    hours = 0)

And then simulate hours worked for each date. I’ll simulate hours worked separately for weekends and weekdays to make the resulting data a little more realistic, and also simulate missing values to data (that is, days when no work occurred).

set.seed(1)
# Simulate weekends
weekends <- filter(d, grepl("S(at|un)", day))
# Hours worked are (might be) poisson distributed
weekends$hours <- rpois(nrow(weekends), lambda = 4)
# Simulate missing days with probability .7
weekends$na <- rbinom(nrow(weekends), 1, 0.7)
weekends$hours <- ifelse(weekends$na, NA, weekends$hours)

# Simulate weekdays
weekdays <- filter(d, !grepl("S(at|un)", day))
weekdays$hours <- rpois(nrow(weekdays), lambda = 8)  # Greater lambda
weekdays$na <- rbinom(nrow(weekdays), 1, 0.1)  # Smaller p(missing)
weekdays$hours <- ifelse(weekdays$na, NA, weekdays$hours)

# Concatenate weekends and weekdays and arrange by date
d <- bind_rows(weekends, weekdays) %>% 
    arrange(date) %>%  # Arrange by date
    select(-na)  # Remove na column
d
# # A tibble: 813 x 5
#          date  year  week    day hours
#        <date> <chr> <dbl> <fctr> <int>
#  1 2014-01-02  2014     1    Thu     6
#  2 2014-01-03  2014     1    Fri    13
#  3 2014-01-04  2014     1    Sat    NA
#  4 2014-01-05  2014     1    Sun    NA
#  5 2014-01-06  2014     2    Mon     9
#  6 2014-01-07  2014     2    Tue     5
#  7 2014-01-08  2014     2    Wed    NA
#  8 2014-01-09  2014     2    Thu     9
#  9 2014-01-10  2014     2    Fri     8
# 10 2014-01-11  2014     2    Sat    NA
# # ... with 803 more rows

Waffle-plot function

Then I’ll create a function that draws the waffle plot. If you have similarly structured data, you can copy-paste the function and use it on your data (but make sure the following packages are installed.)

library(ggplot2)
library(viridis)  # Color palette
library(ggthemes)
gh_waffle <- function(data, pal = "D", dir = -1){
    
    p <- ggplot(data, aes(x = week, y = day, fill = hours)) +
        scale_fill_viridis(name="Hours", 
                           option = pal,  # Variable color palette
                           direction = dir,  # Variable color direction
                           na.value = "grey93",
                           limits = c(0, max(data$hours))) +
        geom_tile(color = "white", size = 0.4) +
        facet_wrap("year", ncol = 1) +
        scale_x_continuous(
            expand = c(0, 0),
            breaks = seq(1, 52, length = 12),
            labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) +
        theme_tufte(base_family = "Helvetica") +
        theme(axis.title = element_blank(),
              axis.ticks = element_blank(),
              legend.position = "bottom",
              legend.key.width = unit(1, "cm"),
              strip.text = element_text(hjust = 0.01, face = "bold", size = 12))
    
    print(p)
}

Using the waffle plot function

gh_waffle() takes three arguments, the first, data is a data frame with columns date (type: Date), year (number or character), week (number), day (an ordered factor to make days run from top to bottom on the graph), and hours (number). The second option to gh_waffle(), pal specifies one of four color palettes used by the viridis color scale, and can be "A", "B", "C", or "D". The default is “D”, which is also what GitHub uses. The last option, dir specifies the direction of the color scale, and can be either -1 or 1. The GitHub default is -1.

Using gh_waffle() with the default settings, only providing the data frame d, gives the following result:

gh_waffle(d)

Here’s the same plot with the three other color palettes:

for (pal in c("A", "B", "C")) {
    gh_waffle(d, pal)
}

Unless you’re plotting some really awful events, I think it’s best to stick with the default color palette. That’s it for today folks.

Further reading & references

comments powered by Disqus