The Junar API is the basis for a number of Open Data initiatives in Latin America and the USA. The junr package is a wrapper that makes it easier to access data published through the Junar API. Examples of implementations include the City of Pasadena and the City of San Jose; others are listed on the Junar website.
As an example, we will use the data from the Costa Rican President’s Office. The first step is to visit the website offering the open data, identify the base URL, and obtain an API key for the Junar API that hosts the data. You will find both on the developers page of the Open Data Costa Rica site.
Below we use a test API key so that all the examples will run. You may want to get your own API key instead. Note that with Junar each base URL has its own API key.
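library(junr)
# Base URL of the Junar API for the Costa Rican President's Office
base_url <- "http://api.datosabiertos.presidencia.go.cr/api/v2/datastreams/"
# Public test key; replace it with your own key if you have one
api_key <- "0bd55e858409eefabc629b28b2e7916361ef20ff"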
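Because every base URL has its own key, you may prefer not to hard-code keys in your scripts. The minimal base-R sketch below reads the key from an environment variable and falls back to the test key. The variable name JUNAR_API_KEY is a hypothetical choice for this example; junr does not look it up on its own.

```r
# JUNAR_API_KEY is a hypothetical variable name chosen for this example.
# If it is not set, fall back to the public test key used in this vignette.
test_key <- "0bd55e858409eefabc629b28b2e7916361ef20ff"
api_key <- Sys.getenv("JUNAR_API_KEY", unset = test_key)

# Sys.getenv() returns "" when the variable exists but is empty
if (!nzchar(api_key)) {
  stop("No API key found: set JUNAR_API_KEY or use the test key.")
}
```

You can set the variable for a session with Sys.setenv(JUNAR_API_KEY = "your-key"), or more permanently in your .Renviron file.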
Now that we have the basic information for a connection, we can quickly check what data are available behind this URL.
get_index(base_url, api_key)
The get_index function returns the complete list of available data, with all metadata included, as a data frame.
To get only a list of the globally unique identifiers (GUIDs) of the data sets, you can use list_guid.
list_guid(base_url, api_key)
## [1] "PLANI-DEL-MINIS" "DATOS-CORRE-AL-PAGO-DE"
## [3] "COMPR-PUBLI-DEL-MINIS" "LICIT-ADJUD-POR-LAS-81483"
## [5] "LICIT-ADJUD-POR-LOS-MINIS" "LICIT-ADJUD-POR-LAS-INSTI"
## [7] "LICIT-ADJUD-DE-LAS-INSTI" "DATOS-CORRE-AL-PAGO-32327"
## [9] "DESCR-DE-ABREV-DE-LAS" "EJECU-DE-PRESU-DE-50724"
## [11] "EJECU-DE-PRESU-DE-INSTI" "COMPR-PUBLI-DE-PRESI"
This has the benefit that you can refer to a GUID by its position in the list. For example:
pres_list <- list_guid(base_url, api_key)
pres_list[3]
## [1] "COMPR-PUBLI-DEL-MINIS"
The same position numbers apply to the full titles returned by list_titles.
list_titles(base_url, api_key)
## [1] "Ministerio de la Presidencia"
## [2] "Datos correspondientes al pago de planilla de Presidencia"
## [3] "Compras públicas del Ministerio de la Presidencia"
## [4] "Licitaciones adjudicadas por las Instituciones Públicas según tipo de trámite"
## [5] "Licitaciones adjudicadas por los Ministerios"
## [6] "Licitaciones adjudicadas por las Instituciones Públicas según año"
## [7] "Licitaciones Adjudicadas de las Instituciones Públicas para el período 2014-2015"
## [8] "Datos correspondientes al pago de planilla del Ministerio"
## [9] "Descripción de abreviaturas de las ejecuciones "
## [10] "Ejecución de presupuesto de Instituciones para el 2014"
## [11] "Ejecución de presupuesto de Instituciones para el 2015"
## [12] "Compras públicas de Presidencia"
Both list_guid and list_titles were set up purely for convenience: their results tend to fit in the console window, which makes them easier to read.
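Because GUIDs and titles share the same positions, the two lists can be combined into a small lookup table and searched by title. In the sketch below, the first three entries printed above stand in for the live calls so that it runs offline; the column names guid and title are our own choice.

```r
# In a live session you would build the table from the API instead:
#   lookup <- data.frame(guid  = list_guid(base_url, api_key),
#                        title = list_titles(base_url, api_key),
#                        stringsAsFactors = FALSE)
# Here the first three entries printed above stand in for those calls.
lookup <- data.frame(
  guid = c("PLANI-DEL-MINIS", "DATOS-CORRE-AL-PAGO-DE", "COMPR-PUBLI-DEL-MINIS"),
  title = c("Ministerio de la Presidencia",
            "Datos correspondientes al pago de planilla de Presidencia",
            "Compras p\u{00fa}blicas del Ministerio de la Presidencia"),
  stringsAsFactors = FALSE
)

# GUIDs of all data sets whose title mentions "Compras" (purchases)
lookup$guid[grepl("Compras", lookup$title, fixed = TRUE)]
```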
Of course, if you already have the GUID of the data that interest you, you can pass it directly to get_data to read all the data. For example, to view all public purchases of the presidential office in Costa Rica:
data_guid <- "COMPR-PUBLI-DEL-MINIS"
purchasing_data <- get_data(base_url, api_key, data_guid)
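When you need several data sets, the same call can be repeated over a vector of GUIDs. The sketch below uses a hypothetical helper, fetch_all, which is not part of junr. It takes the download function as an argument so that it can be tried offline with a stand-in; against the live API you would pass function(guid) get_data(base_url, api_key, guid).

```r
# fetch_all() is a hypothetical helper, not part of junr.
# `fetch` is any function that takes one GUID and returns a data frame.
fetch_all <- function(guids, fetch) {
  results <- lapply(guids, function(guid) {
    # Return NULL with a warning instead of stopping if one download fails
    tryCatch(fetch(guid), error = function(e) {
      warning("Could not download ", guid, ": ", conditionMessage(e))
      NULL
    })
  })
  names(results) <- guids
  results
}

# Offline stand-in for get_data(), so the sketch runs without network access
fake_fetch <- function(guid) data.frame(guid = guid, n = 1L)

purchases <- fetch_all(c("COMPR-PUBLI-DEL-MINIS", "COMPR-PUBLI-DE-PRESI"), fake_fetch)
names(purchases)
```

Keeping the results in a named list means each data set can later be retrieved by its GUID, for example purchases[["COMPR-PUBLI-DE-PRESI"]].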
With View(purchasing_data) you can check whether the data have been downloaded correctly and take a quick look at the column types (see below for converting currency data from text to numeric).
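Besides View, the column classes can be listed directly, which helps spot amounts that arrived as text. The data frame below is a small made-up stand-in with hypothetical column names (Proveedor, Monto), and the check assumes that currency values contain the colón sign (U+20A1); adjust the pattern if your data mark currency differently.

```r
# Made-up stand-in for a downloaded data set: the amounts arrive as text
purchasing_sample <- data.frame(
  Proveedor = c("A", "B"),
  Monto = c("\u{20a1}1,500,000", "\u{20a1}320,000"),
  stringsAsFactors = FALSE
)

# Class of every column; currency columns show up as "character"
sapply(purchasing_sample, class)

# Names of text columns containing the colón sign (assumed currency marker)
looks_like_currency <- function(col) {
  is.character(col) && any(grepl("\u{20a1}", col, fixed = TRUE))
}
names(purchasing_sample)[vapply(purchasing_sample, looks_like_currency, logical(1))]
```

With real data you would run the same two calls on purchasing_data.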
On data platforms that run Junar, many data sets are just tables of data that have already been analyzed and summarized. It is not immediately obvious which sets contain many data points and which contain only a few rows.
The function get_dimensions downloads every data set offered through the base URL and determines how many rows and columns each one has. This is useful for a quick assessment of the data available. Note, however, that the function may take a while to finish, especially if there are many GUIDs.
get_dimensions(base_url, api_key)
## GUID NROW NCOL DIM
## 2 PLANI-DEL-MINIS 5561 8 44488
## 21 DATOS-CORRE-AL-PAGO-DE 2472 10 24720
## 3 COMPR-PUBLI-DEL-MINIS 324 4 1296
## 4 LICIT-ADJUD-POR-LAS-81483 7 2 14
## 5 LICIT-ADJUD-POR-LOS-MINIS 10 2 20
## 6 LICIT-ADJUD-POR-LAS-INSTI 3 2 6
## 7 LICIT-ADJUD-DE-LAS-INSTI 103471 7 724297
## 8 DATOS-CORRE-AL-PAGO-32327 5561 10 55610
## 9 DESCR-DE-ABREV-DE-LAS 27 4 108
## 10 EJECU-DE-PRESU-DE-50724 9249 40 369960
## 11 EJECU-DE-PRESU-DE-INSTI 8867 39 345813
## 12 COMPR-PUBLI-DE-PRESI 427 4 1708
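The DIM column is simply the number of rows times the number of columns, and the result can be filtered and sorted like any data frame to find the larger data sets. So that it runs offline, the sketch below rebuilds the table printed above by hand. With live access you would start from dims <- get_dimensions(base_url, api_key), assuming the columns are named as printed and NROW and NCOL are numeric. The 1,000-row threshold is an arbitrary choice.

```r
# Stand-in for: dims <- get_dimensions(base_url, api_key)
# Values are copied from the output printed above.
dims <- data.frame(
  GUID = c("PLANI-DEL-MINIS", "DATOS-CORRE-AL-PAGO-DE", "COMPR-PUBLI-DEL-MINIS",
           "LICIT-ADJUD-POR-LAS-81483", "LICIT-ADJUD-POR-LOS-MINIS",
           "LICIT-ADJUD-POR-LAS-INSTI", "LICIT-ADJUD-DE-LAS-INSTI",
           "DATOS-CORRE-AL-PAGO-32327", "DESCR-DE-ABREV-DE-LAS",
           "EJECU-DE-PRESU-DE-50724", "EJECU-DE-PRESU-DE-INSTI",
           "COMPR-PUBLI-DE-PRESI"),
  NROW = c(5561, 2472, 324, 7, 10, 3, 103471, 5561, 27, 9249, 8867, 427),
  NCOL = c(8, 10, 4, 2, 2, 2, 7, 10, 4, 40, 39, 4),
  stringsAsFactors = FALSE
)

# DIM in the output above is rows times columns
dims$DIM <- dims$NROW * dims$NCOL

# Keep data sets with at least 1,000 rows, largest first
large <- dims[dims$NROW >= 1000, ]
large <- large[order(large$NROW, decreasing = TRUE), ]
large[, c("GUID", "NROW", "NCOL")]
```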
In the example data above, and possibly in other Junar implementations, we need to clean up any data related to currency values. In our case we need to find all currency symbols (Costa Rican colón) and all the commas separating thousands. As they stand, these values are text strings and cannot be converted directly to numeric without removing the symbols and commas.
There are two utilities to help clean the currency data: clean_currency and get_currency_symbol. For example:
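# Download the contracts awarded by the ministries
currency_data <- get_data(base_url, api_key, "LICIT-ADJUD-POR-LOS-MINIS")
# Remove the currency symbols and thousands separators from the awarded amounts
currency_data$`Monto Adjudicado` <- clean_currency(currency_data$`Monto Adjudicado`)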
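For readers curious about what such cleaning involves, here is a rough base-R equivalent. It is an illustrative sketch, not the code behind clean_currency. The helper name strip_currency and the sample values are made up, and it assumes that commas only separate thousands and that a period marks decimals.

```r
# Illustrative only: not the implementation of junr's clean_currency().
# Assumes commas are thousands separators and "." marks decimals.
strip_currency <- function(x) {
  # Remove everything except digits, the decimal point and a minus sign,
  # then convert the remaining text to numbers
  as.numeric(gsub("[^0-9.-]", "", x))
}

# Made-up example values using the colón sign (U+20A1)
amounts <- c("\u{20a1}1,234,567", "\u{20a1}250,000.50")
strip_currency(amounts)
```

Values that contain no digits at all become NA, so it is worth checking sum(is.na(...)) after the conversion.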