Casual readers of this blog will know that I’ve been using R (a statistical programming language) to sort out some data and plot some graphs. It’s only minor dabbling, and often frustrating. I’m not a natural coder, and I certainly don’t relish the challenges it throws up. But, having used R on and off for a few years now, I’m starting to get a grip on a few things.
Take, for example, a diagram I produced in R for the upcoming GHGT-12 conference (Figure 1). I’ve deliberately removed anything that might reveal what the data relate to, partly because it’s not relevant, and partly to keep things under wraps until the conference paper is published in a few months. Not that it’s of great importance, mind you, but at this point in my PhD I’m not sure which cats I can let out of the bag yet.
Anyway, as you can see, it’s a bunch of histograms of three sets of concentration data (green, purple, pink), subdivided by some other properties using the facet_grid() function in ggplot2. Using facet_grid() is very straightforward: you set up a single plot, and then use facet_grid() to split your data into a grid of panels based on one or two variables (one for the rows, one for the columns), as in the snippet of code below:
library(ggplot2) #for plotting graph
library(scales) #for log scale labelling

#Sets the resolution of the printed PNG image file
ppi=600
#sets up to plot as PNG image
png("Facet Histogram.png", width=7.2*ppi, height=7.2*ppi, res=ppi)

#plot function. Note variable1 is a substitute for what I actually called it
#(filename is the data frame holding the concentration data)
p <- ggplot(filename, aes(x=Concentration, fill=variable1)) +
  #binwidth is in log10 units, because the x scale below is log-transformed
  geom_histogram(position="identity", binwidth=0.25) +
  scale_fill_manual(values=c("#16c922", "#e587ff", "#c724f4"),
                    labels=c("1", "2", "3")) + #I removed these labels from the diagram
  #scales the x axis on a log 10 scale, and formats the numbers with superscript powers
  scale_x_log10(breaks=trans_breaks("log10", function(x) 10^x),
                labels=trans_format("log10", math_format(10^.x))) +
  #The magic facet_grid function, variable names are substitutes
  facet_grid(variable2 ~ variable3) +
  ylab("Counts") +
  theme_bw() +
  #My own theme settings, not included here
  theme_ghgt12
#ggplot objects are not drawn automatically when a script is source()d, so print explicitly
print(p)
dev.off()
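The snippet above depends on the real data frame (filename) and a custom theme (theme_ghgt12) that aren’t shared here, so it won’t run on its own. As a rough sketch, here is a self-contained version using made-up lognormal data; the data frame conc, its factor levels and the random values are all hypothetical stand-ins, not the conference data.

```r
library(ggplot2)
library(scales)

set.seed(42)  # reproducible fake data
n <- 600
# Hypothetical stand-in for the real data: lognormal "concentrations" plus three grouping factors
conc <- data.frame(
  Concentration = rlnorm(n, meanlog = 0, sdlog = 1.5),
  variable1 = factor(sample(c("1", "2", "3"), n, replace = TRUE)),  # fill colour groups
  variable2 = factor(sample(c("A", "B", "C"), n, replace = TRUE)),  # facet rows
  variable3 = factor(sample(c("X", "Y", "Z"), n, replace = TRUE))   # facet columns
)

p <- ggplot(conc, aes(x = Concentration, fill = variable1)) +
  # binwidth is in log10 units, because stats are computed after the scale transform
  geom_histogram(position = "identity", binwidth = 0.25) +
  scale_fill_manual(values = c("#16c922", "#e587ff", "#c724f4")) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  # rows ~ columns: one panel per combination of variable2 and variable3 (3 x 3 = 9)
  facet_grid(variable2 ~ variable3) +
  ylab("Counts") +
  theme_bw()

print(p)
```

To split by a single variable instead, facet_grid(. ~ variable3) gives one row of panels, and facet_grid(variable2 ~ .) gives one column.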
So, pleased with the output, I showed it to my supervisor, who suggested that I add an indicator of the analytical detection limits, since lots of values in the data sit at those limits. “Bugger” was my first thought. “I’m going to have to draw in 27 different lines or arrows by hand” was the second. But then I remembered that ggplot can draw vertical lines with geom_vline(), and that the line positions can be taken from a data frame rather than typed in one by one. So, the moment of satisfaction for me was working out how to add that data in, and making sure each line landed in the right panel of the facet. As long as the new data (in its own data frame) has the same column names as the variables used in facet_grid(), job’s a good ’un: each line is drawn only in the panel whose facet values match its row. You only need to add geom_vline() to the histogram plot code, and facet_grid() takes care of the rest (Figure 2). The code is below the figure.
#I've chopped off the top and bottom parts to show where the geom_vline was inserted
  scale_x_log10(breaks=trans_breaks("log10", function(x) 10^x),
                labels=trans_format("log10", math_format(10^.x))) +
  #In it goes... (filename2 is the data frame of detection limits, column LOD)
  geom_vline(aes(xintercept=LOD), #adds vertical dotted line with detection limit
             data=filename2, linetype="dotted",
             size=0.4) + #ggplot2 3.4.0 and later use linewidth=0.4 instead of size
  #The lod data frame has the same variable names as previously used
  facet_grid(variable2 ~ variable3) +
  #etc.
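Since that snippet is cut down, here is a minimal, self-contained sketch of the same idea with made-up data. The data frames conc and lod, the facet levels and the detection-limit values are all hypothetical. What matters is that lod carries the same facet columns (variable2, variable3) as the plotted data, so facet_grid() places each dotted line in its matching panel.

```r
library(ggplot2)

set.seed(1)  # reproducible fake data
n <- 600
# Hypothetical concentration data with the same column names as the main plot
conc <- data.frame(
  Concentration = rlnorm(n, meanlog = 0, sdlog = 1.5),
  variable1 = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  variable2 = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  variable3 = factor(sample(c("X", "Y", "Z"), n, replace = TRUE))
)

# Hypothetical detection limits: one row per panel, keyed by the facet columns
lod <- expand.grid(variable2 = c("A", "B", "C"), variable3 = c("X", "Y", "Z"))
lod$LOD <- rep(c(0.05, 0.1, 0.2), times = 3)  # given in original concentration units

p <- ggplot(conc, aes(x = Concentration, fill = variable1)) +
  geom_histogram(position = "identity", binwidth = 0.25) +
  scale_x_log10() +
  # geom_vline() does not inherit the global aes, so lod needs no Concentration/variable1
  # (linewidth requires ggplot2 >= 3.4.0; older versions use size instead)
  geom_vline(data = lod, aes(xintercept = LOD),
             linetype = "dotted", linewidth = 0.4) +
  # lod's variable2/variable3 columns route each line to its matching panel
  facet_grid(variable2 ~ variable3) +
  ylab("Counts") +
  theme_bw()

print(p)
```

Two details worth knowing: xintercept is transformed by scale_x_log10() like any other x value, so the limits go in as raw concentrations rather than log10 values; and if lod left out variable3, each line would simply be repeated across every column of its row.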
So it turns out this was very satisfying for me, mostly because it shows that I’m actually learning something about coding, even if it’s something fairly simple and nondescript!