Introduction
You are reading the free online version of this book. If you’d like to purchase a physical or electronic copy, you can buy it from No Starch Press, Powell’s, Barnes and Noble or Amazon.
In early 2020, as the world struggled to contain the spread of COVID-19, one country succeeded where others did not: New Zealand. There are many reasons New Zealand was able to tackle COVID-19. One was the R programming language (yes, really).
This humble tool for data analysis helped New Zealand fight COVID-19 by enabling a Ministry of Health team to generate daily reports on cases throughout New Zealand. Based on the information in these reports, officials were able to develop policies that kept the country largely free of COVID-19. The team was small, however, so producing the reports every day with a tool like Excel wouldn’t have been feasible. As team leader Chris Knox told me, “Trying to do what we did in a point-and-click environment is not possible.”
Instead, a few staff members wrote R code that they could run every day to produce updated reports. These reports did not involve any complicated statistics; they were literally counts of COVID-19 cases. Their value came from everything else that R can do: data analysis and visualization, report creation, and workflow automation.
This book explores the many ways that people use R to communicate and automate tasks. You’ll learn how to do the following:
- Make professional-quality data visualizations, maps, and tables
- Replace a clunky multi-tool workflow to create reports with R Markdown
- Use parameterized reporting to generate multiple reports at once
- Produce slideshow presentations and websites using R Markdown
- Automate the process of importing online data from Google Sheets and the US Census Bureau
- Create your own functions to automate tasks you do repeatedly
- Bundle your functions into a package that you can share with others
Best of all, you’ll do all of this without performing any statistical analysis more complex than calculating averages.
Isn’t R Just for Statistical Analysis?
Many people think of R as simply a tool for hardcore statistical analysis, but it can do much more than manipulate numerical values. After all, every R user must illuminate their findings and communicate their results somehow, whether that’s via data visualizations, reports, websites, or presentations. Also, the more you use R, the more you’ll find yourself wanting to automate tasks you currently do manually.
As a qualitatively trained anthropologist without a quantitative background, I used to feel ashamed about using R for my visualization and communication tasks. But the fact is, R is good at these jobs. The ggplot2
package is the tool of choice for many top information designers. Users around the world have taken advantage of R’s ability to automate reporting to make their work more efficient. Rather than simply replacing other tools, R can perform tasks that you’re probably already doing, like generating reports and tables, better than your existing workflow.
Who This Book Is For
No matter your background, using R can transform your work. This book is for you if you’re either a current R user keen to explore its uses for visualization and communication or a non-R user wondering if R is right for you. I’ve written R for the Rest of Us so that it should make sense whether or not you’ve ever written a line of R code. But even if you’ve written entire R programs, the book should help you learn plenty of new techniques to up your game.
R is a great tool for anyone who works with data. Maybe you’re a researcher looking for a new way to share your results. Perhaps you’re a journalist looking to analyze public data more efficiently. Or maybe you’re a data analyst tired of working in expensive, proprietary tools. If you have to work with data, you will get value from R.
About This Book
Each chapter focuses on one use of the R language and includes examples of real R projects that employ the techniques covered. I’ll dive into the project code, breaking the programs down to help you understand how they work, and suggest ways of going beyond the example. The book has three parts, outlined here.
In Part I, you’ll learn how to use R to visualize data.
Chapter 1: An R Programming Crash Course Introduces the RStudio programming environment and the foundational R syntax you’ll need to understand the rest of the book.
Chapter 2: Principles of Data Visualization Breaks down a visualization created for Scientific American on drought conditions in the United States. In doing so, this chapter introduces the ggplot2 package for data visualization and addresses important principles that can help you make high-quality graphics.
Chapter 3: Custom Data Visualization Themes Describes how journalists at the BBC made a custom theme for the ggplot2 data visualization package. As the chapter walks you through the package they created, you’ll learn how to make your own theme.
Chapter 4: Maps and Geospatial Data Explores the process of making maps in R using simple features data. You’ll learn how to write map-making code, find geospatial data, choose appropriate projections, and apply data visualization principles to make your map appealing.
Chapter 5: Designing Effective Tables Shows you how to use the gt package to make high-quality tables in R. With guidance from R table connoisseur Tom Mock, you’ll learn the design principles to present your table data effectively.
Part II focuses on using R Markdown to communicate efficiently. You’ll learn how to incorporate visualizations like the ones discussed in Part I into reports, slideshow presentations, and static websites generated entirely using R code.
Chapter 6: R Markdown Reports Introduces R Markdown, a tool that allows you to generate a professional report in R. This chapter covers the structure of an R Markdown document, shows you how to use inline code to automatically update your report’s text when data values change, and discusses the tool’s many export options.
Chapter 7: Parameterized Reporting Covers one of the advantages of using R Markdown: the ability to produce multiple reports at the same time using a technique called parameterized reporting. You’ll see how staff members at the Urban Institute used R to generate fiscal briefs for all 50 US states. In the process, you’ll learn how parameterized reporting works and how you can use it.
Chapter 8: Slideshow Presentations Explains how to use R Markdown to make slides with the xaringan package. You’ll learn how to make your own presentations, adjust your content to fit on a slide, and add effects to your slideshow.
Chapter 9: Websites Shows you how to create your own website with R Markdown and the distill package. By examining a website about COVID-19 rates in Westchester County, New York, you’ll see how to create pages on your site, add interactivity through R packages, and deploy your website in multiple ways.
Chapter 10: Quarto Explains how to use Quarto, the next-generation version of R Markdown. You’ll learn how to use Quarto for all of the projects you previously used R Markdown for (reports, parameterized reporting, slideshow presentations, and websites).
Part III focuses on ways you can use R to automate your work and share it with others.
Chapter 11: Automatically Accessing Online Data Explores two R packages that let you automatically import data from the internet: googlesheets4 for working with Google Sheets and tidycensus for working with US Census Bureau data. You’ll learn how the packages work and how to use them to automate the process of accessing data.
Chapter 12: Creating Functions and Packages Shows you how to create your own functions and packages and share them with others, which is one of R’s major benefits. Bundling your custom functions into a package can enable other R users to streamline their work, as you’ll read about with the packages that a group of R developers built for researchers working at the Moffitt Cancer Center.
By the end of this book, you should be able to use R for a wide range of nonstatistical tasks. You’ll know how to effectively visualize data and communicate your findings using maps and tables. You’ll be able to integrate your results into reports using R Markdown, as well as efficiently generate slideshow presentations and websites. And you’ll understand how to automate many tedious tasks using packages others have built or ones you develop yourself. Let’s dive in!