Web Scraping in R

Date & Local Time: 2017-04-19 10:30:00 AM
Location: University of Virginia | Brown Science and Engineering Library, Room 133 (38.0335529, -78.5079772)
Website: http://data.library.virginia.edu/endangered-data-week/

Sometimes data we find on the internet isn’t formatted for downloading and easy importing into our statistical program of choice. It’s simply displayed on a static web page as a table (if we’re lucky) or scattered about the page in various locations. Getting this data requires “web scraping”: pulling out the specific parts of a web page that we want to keep and wrangling them into a structure suitable for further analysis. A recently developed R package called rvest makes this process easier. In this workshop we’ll introduce rvest for scraping web pages by way of several examples. We’ll also present a general strategy for web scraping and demonstrate some basic programming approaches to scraping multi-page web sites. Bring a laptop if you want to work along during the workshop.
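
As a rough illustration of the kind of scraping described above, here is a minimal sketch using rvest. It is not workshop material: the HTML snippet and the names html, page, title, and hours are made up for illustration, and the sketch assumes rvest 1.0.0 or later (for html_element()).

```r
library(rvest)

# Hypothetical page content, inlined so the sketch runs without network access.
# In practice you would pass a URL to read_html() instead.
html <- '<html><body>
  <h1>Library Hours</h1>
  <table>
    <tr><th>Day</th><th>Opens</th></tr>
    <tr><td>Monday</td><td>8</td></tr>
    <tr><td>Tuesday</td><td>9</td></tr>
  </table>
</body></html>'

# Parse the HTML into a document we can query.
page <- read_html(html)

# Pull out one specific part of the page with a CSS selector.
title <- html_text(html_element(page, "h1"))

# Convert the first <table> on the page into a data frame for analysis.
hours <- html_table(html_element(page, "table"))

print(title)
print(hours)
```

For a multi-page site, the same steps could be wrapped in a function and applied to a vector of URLs, for example with lapply().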

Contact Ricky Patterson