Chapter 3 APIs

Application Programming Interfaces (APIs) are a great way to access data. If a website or data source has an API, you can usually request access instead of building a custom web scraper. Scraping is often against a website’s terms of service anyway.

3.1 What is it?

An API is a way to ask a program or a website for information. The “call” to the API must conform to the API’s specifications. The application or web server then returns its response in a stable and repeatable format that you can parse. A website is really designed for a human user and not necessarily to transfer information. An API, on the other hand, is designed to transfer information from the web server where the data is held directly to your computer. That makes it a great way to collect large amounts of data efficiently.
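To make the idea of a structured, parseable response concrete, here is a minimal sketch. The JSON text is made up for illustration (it is not output from any real API), and the example assumes the jsonlite package is installed.

```r
library(jsonlite)

# A made-up response body, shaped like what many APIs send back.
response_text <- '{"name": "example", "count": 3, "tags": ["a", "b"]}'

# fromJSON() converts the JSON text into ordinary R objects.
parsed <- fromJSON(response_text)

parsed$name   # a character string
parsed$count  # a number
parsed$tags   # a character vector
```

Because the layout of the response is the same on every call, the same parsing code keeps working from one request to the next.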

3.2 Knock, Knock

Typically websites require that you have an API key. This key is a unique identifier for a user that lets the data holder track what data you pull and how much of it. Often the key is free, but if you pull a lot of data or pull data too frequently you may have to pay. For instance, pulling data from the Google transit API is free up to a point, and then you have to pay for the pulls. This typically isn’t a big deal for one-time pulls, but it is something that you should watch. It is also a good idea to be a good citizen of the internet and not get too aggressive with your API calls. Many websites are not set up to handle huge traffic, and if you hit the API with a flood of requests the effect is similar to a denial-of-service (DoS) attack. This could get your API token revoked or your IP address blocked. Neither of these things is good, so don’t do it!
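One simple way to stay polite is to pause between requests. The sketch below is only an illustration: `fetch_politely`, `fetch_one`, and `pause_seconds` are hypothetical names, and `fake_fetch` stands in for a real API call so that the example runs offline. The right pause length depends on the rate limit documented by the API you are using.

```r
# Call `fetch_one` once per element of `ids`, pausing between calls.
fetch_politely <- function(ids, fetch_one, pause_seconds = 1) {
  results <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    results[[i]] <- fetch_one(ids[[i]])
    # Wait before the next request, but not after the last one.
    if (i < length(ids)) Sys.sleep(pause_seconds)
  }
  results
}

# Stand-in for a real API call (hypothetical; no network needed).
fake_fetch <- function(id) paste("record", id)

out <- fetch_politely(c("a", "b", "c"), fake_fetch, pause_seconds = 0.1)
```

With a real API, `fetch_one` would wrap the actual request, and the loop guarantees that requests never go out back to back.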

3.3 Where Do I Find Them?

If a website has an API, you will typically find it in the “for developers” section. This section specifies which APIs are available (if any), often what data you can access through them, and the terms of service.

A good example of such a page is on the US Census Bureau’s website, available at https://www.census.gov/developers/. There they specify what information is available through their API, the terms of service, and the rest of the required information.

3.4 How to use it

The package to use for dealing with APIs in R is typically the httr package. It provides functions for the HTTP requests (such as GET) that an API is familiar with. The tutorial below is inspired by Hadley Wickham’s vignette, available at https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html.

So let’s start by loading the library and setting up the initial URL for the API. As a reminder, an API call basically directs your computer to a computer-only website.

library(httr)

url <- modify_url("https://api.github.com", path = "/users/medewitt")

Typically the API’s documentation specifies where the API token goes in the URL. In this case one isn’t needed. If it were required as part of the path, building the URL would look something like this (the key is a placeholder and the layout is purely illustrative):

api_key <- "AbcdEfG12345678"
# Put the key in front of the rest of the path
key_path <- paste(api_key, "users/medewitt", sep = "/")
url <- modify_url("https://api.github.com", path = key_path)
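Many APIs take the key as a query parameter or a request header rather than as part of the path, and hard-coding a key in a script makes it easy to leak. The sketch below shows one common alternative under stated assumptions: the environment variable name `MY_API_KEY`, the host `api.example.com`, the path `/v1/data`, and the parameter name `key` are all hypothetical, so check the documentation of the API you are actually using.

```r
library(httr)

# Read the key from an environment variable instead of hard-coding it.
# "MY_API_KEY" is a hypothetical variable name.
api_key <- Sys.getenv("MY_API_KEY", unset = "")
if (api_key == "") {
  message("MY_API_KEY is not set; using a placeholder for illustration.")
  api_key <- "placeholder-key"
}

# Attach the key as a query parameter. The host, path, and parameter
# name are assumptions; real APIs document their own.
url <- modify_url(
  "https://api.example.com",
  path = "/v1/data",
  query = list(key = api_key)
)
url
```

Keeping the key in the environment (for example in a `.Renviron` file) means the script can be shared without sharing the key.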

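The next step is to send the URL to the API with a GET request and parse the JSON that comes back. The function below wraps those steps and returns the result as a small S3 object with its own print method.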

library(httr)

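# Request `path` from the GitHub API and return the parsed result
github_api <- function(path) {
  url <- modify_url("https://api.github.com", path = path)

  # Send the GET request and make sure the server answered with JSON
  resp <- GET(url)
  if (http_type(resp) != "application/json") {
    stop("API did not return json", call. = FALSE)
  }

  # Convert the JSON text into nested R lists
  parsed <- jsonlite::fromJSON(content(resp, "text"), simplifyVector = FALSE)

  # Bundle the parsed content, the path, and the raw response in an S3 object
  structure(
    list(
      content = parsed,
      path = path,
      response = resp
    ),
    class = "github_api"
  )
}

# Print method: show the requested path and the structure of the content
print.github_api <- function(x, ...) {
  cat("<GitHub ", x$path, ">\n", sep = "")
  str(x$content)
  invisible(x)
}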

github_api("/users/medewitt")
## <GitHub /users/medewitt>
## List of 30
##  $ login              : chr "medewitt"
##  $ id                 : int 25038837
##  $ avatar_url         : chr "https://avatars0.githubusercontent.com/u/25038837?v=4"
##  $ gravatar_id        : chr ""
##  $ url                : chr "https://api.github.com/users/medewitt"
##  $ html_url           : chr "https://github.com/medewitt"
##  $ followers_url      : chr "https://api.github.com/users/medewitt/followers"
##  $ following_url      : chr "https://api.github.com/users/medewitt/following{/other_user}"
##  $ gists_url          : chr "https://api.github.com/users/medewitt/gists{/gist_id}"
##  $ starred_url        : chr "https://api.github.com/users/medewitt/starred{/owner}{/repo}"
##  $ subscriptions_url  : chr "https://api.github.com/users/medewitt/subscriptions"
##  $ organizations_url  : chr "https://api.github.com/users/medewitt/orgs"
##  $ repos_url          : chr "https://api.github.com/users/medewitt/repos"
##  $ events_url         : chr "https://api.github.com/users/medewitt/events{/privacy}"
##  $ received_events_url: chr "https://api.github.com/users/medewitt/received_events"
##  $ type               : chr "User"
##  $ site_admin         : logi FALSE
##  $ name               : chr "Michael DeWitt"
##  $ company            : NULL
##  $ blog               : chr ""
##  $ location           : NULL
##  $ email              : NULL
##  $ hireable           : logi TRUE
##  $ bio                : NULL
##  $ public_repos       : int 33
##  $ public_gists       : int 1
##  $ followers          : int 3
##  $ following          : int 17
##  $ created_at         : chr "2017-01-10T18:16:58Z"
##  $ updated_at         : chr "2018-04-10T17:38:05Z"
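Requests do not always succeed: the path may be wrong, the key may be rejected, or a rate limit may have been reached. A minimal sketch of a status check follows, using httr’s `http_error()` and `status_code()`. The helper name `check_response` is hypothetical, and the wording of the error message is just one possibility.

```r
library(httr)

# Stop with a readable message when the server reports an HTTP error
# (status 400 or above); otherwise return the response unchanged.
check_response <- function(resp) {
  if (http_error(resp)) {
    stop(
      sprintf("API request failed with status %s", status_code(resp)),
      call. = FALSE
    )
  }
  invisible(resp)
}

# Usage (needs network access):
# resp <- GET("https://api.github.com/users/medewitt")
# check_response(resp)
```

Calling a check like this right after `GET()` surfaces problems at the request, rather than later when parsing a response that does not contain the expected data.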

3.5 Commonly Used APIs

  • Google
    • Transit
    • Maps
    • CloudML
  • US Census Bureau
  • Twitter
  • Zillow
  • US Election Board
  • Amazon Web Services