Open access copies of scholarly publications are sometimes hard to find. Some are published in open access journals. Others are made freely available as preprints before publication, and others are deposited in institutional repositories, digital archives maintained by universities and research institutions. This document introduces roadoi, an R client that makes it easy to search for these open access copies by interfacing with the oaDOI.org service, where DOIs are matched with full-text links in open access journals and archives.
oaDOI.org, developed and maintained by the team of Impactstory, is a non-profit service that finds open access copies of scholarly literature simply by looking up a DOI (Digital Object Identifier). It not only returns open access full-text links, but also helpful metadata about the open access status of a publication such as licensing or provenance information.
oaDOI uses different data sources to find open access full-texts including:
There is one major function to talk with oaDOI.org, oadoi_fetch(), which takes DOIs and your email address as required arguments.
library(roadoi)
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
"10.1016/j.cognition.2014.07.007"),
email = "name@example.com")
## # A tibble: 2 x 22
## `_best_open_url`
## <chr>
## 1 http://doi.org/10.1186/s12864-016-2566-9
## 2 http://pubman.mpdl.mpg.de/pubman/item/escidoc:2070098/component/escidoc:207
## # ... with 21 more variables: `_closed_base_ids` <list>,
## # `_closed_urls` <list>, `_green_base_collections` <list>,
## # `_open_base_ids` <list>, `_open_urls` <list>, doi <chr>,
## # doi_resolver <chr>, evidence <chr>, found_green <lgl>,
## # found_hybrid <lgl>, free_fulltext_url <chr>, is_boai_license <lgl>,
## # is_free_to_read <lgl>, is_subscription_journal <lgl>, license <chr>,
## # oa_color <chr>, oa_color_long <chr>,
## # reported_noncompliant_copies <list>, url <chr>, version <lgl>,
## # year <int>
According to the oaDOI.org API specification, the following variables with the following definitions are returned:
_best_open_url
: Link to free full-text
doi
: The requested DOI
doi_resolver
: Possible values:
evidence
: A phrase summarizing the step of the open access detection process where the free_fulltext_url was found.
found_green
: Logical indicating whether a self-archived copy in a repository was found
found_hybrid
: Logical indicating whether an open access article was published in a toll-access journal
free_fulltext_url
: The URL where a free-to-read version of the DOI was found. None when no free-to-read version was found.
_green_base_collections
: Internal collection ID from the Bielefeld Academic Search Engine (BASE)
is_boai_license
: TRUE whenever the license indicates Creative Commons Attribution (CC BY), Creative Commons Universal (CC 0) or Public Domain. These permissive licenses comply with the highly regarded BOAI definition of open access.
is_free_to_read
: TRUE whenever the free_fulltext_url is not None.
is_subscription_journal
: TRUE whenever the journal is not in the Directory of Open Access Journals or DataCite. Please note that there might be a time lag between the first publication of an open access journal and its registration in the DOAJ.
license
: Contains the name of the Creative Commons license associated with the free_fulltext_url, whenever one was found. Example: “cc-by”.
oa_color
: Possible values:
_open_base_ids
: IDs of OAI metadata records with open access full-text links collected by the Bielefeld Academic Search Engine (BASE)
_open_urls
: Full-text URLs
reported_noncompliant_copies
: Links to free full-texts provided by services that are often considered non-compliant with open access policies and guidelines
url
: The canonical DOI URL
year
: Year of publication

Note that the fields returned might change according to the oaDOI.org API specs.
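As a minimal sketch of how these variables can be combined, the following base R snippet filters a toy data frame on is_free_to_read and is_boai_license. The data frame `oa_toy` and its values are made up for illustration and only mimic a few of the documented columns; a real result from oadoi_fetch() may contain different fields.

```r
# Toy data frame mimicking a few of the documented columns
# (DOIs and values are made up for illustration).
oa_toy <- data.frame(
  doi = c("10.1000/a", "10.1000/b", "10.1000/c"),
  is_free_to_read = c(TRUE, FALSE, TRUE),
  is_boai_license = c(TRUE, FALSE, FALSE),
  license = c("cc-by", NA, NA),
  stringsAsFactors = FALSE
)

# Records for which a free full-text link was found
free <- oa_toy[oa_toy$is_free_to_read, ]

# Of those, records under a BOAI-compliant license
boai <- free[free$is_boai_license, ]
boai$doi
```

Filtering in two steps keeps the distinction between "free to read" and "openly licensed" visible, which matters when a policy requires a specific license.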
There are no API restrictions. However, oaDOI.org requires that you provide your email address when using this client. To avoid typing your email address for every call to oaDOI, set it once in your .Rprofile file with the option roadoi_email:
options(roadoi_email = "name@example.com")
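If you want to reuse the stored address in your own scripts, a sketch along the following lines may help. The helper `get_roadoi_email()` is hypothetical and not part of roadoi; it relies only on base R's `getOption()` and stops early when no address has been configured.

```r
# Hypothetical helper (not part of roadoi): look up the email address
# stored in the `roadoi_email` option and stop with a clear message
# when it has not been set.
get_roadoi_email <- function() {
  email <- getOption("roadoi_email")
  if (is.null(email) || !nzchar(email)) {
    stop("Please set options(roadoi_email = \"you@example.com\")",
         call. = FALSE)
  }
  email
}

options(roadoi_email = "name@example.com")
get_roadoi_email()
```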
To follow your API call and to estimate the time until completion, use the .progress parameter, inherited from plyr, to display a progress bar.
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
"10.1016/j.cognition.2014.07.007"),
email = "name@example.com",
.progress = "text")
##
|
| | 0%
|
|================================ | 50%
|
|=================================================================| 100%
## # A tibble: 2 x 22
## `_best_open_url`
## <chr>
## 1 http://doi.org/10.1186/s12864-016-2566-9
## 2 http://pubman.mpdl.mpg.de/pubman/item/escidoc:2070098/component/escidoc:207
## # ... with 21 more variables: `_closed_base_ids` <list>,
## # `_closed_urls` <list>, `_green_base_collections` <list>,
## # `_open_base_ids` <list>, `_open_urls` <list>, doi <chr>,
## # doi_resolver <chr>, evidence <chr>, found_green <lgl>,
## # found_hybrid <lgl>, free_fulltext_url <chr>, is_boai_license <lgl>,
## # is_free_to_read <lgl>, is_subscription_journal <lgl>, license <chr>,
## # oa_color <chr>, oa_color_long <chr>,
## # reported_noncompliant_copies <list>, url <chr>, version <lgl>,
## # year <int>
oaDOI is a reliable API. However, this client follows Hadley Wickham’s Best practices for writing an API package and throws an error when the API does not return valid JSON or is not available. To catch these errors, you may want to use plyr’s failwith() function:
random_dois <- c("ldld", "10.1038/ng.3260", "§dldl ")
purrr::map_df(random_dois,
plyr::failwith(f = function(x) roadoi::oadoi_fetch(x, email ="name@example.com")))
## Warning: oaDOI request failed [404]
## 'ldld' is an invalid doi. See http://doi.org/ldld
## # A tibble: 1 x 22
## `_best_open_url`
## <chr>
## 1 https://dash.harvard.edu/bitstream/handle/1/25290367/mallet%202015%20polyte
## # ... with 21 more variables: `_closed_base_ids` <list>,
## # `_closed_urls` <list>, `_green_base_collections` <list>,
## # `_open_base_ids` <list>, `_open_urls` <list>, doi <chr>,
## # doi_resolver <chr>, evidence <chr>, found_green <lgl>,
## # found_hybrid <lgl>, free_fulltext_url <chr>, is_boai_license <lgl>,
## # is_free_to_read <lgl>, is_subscription_journal <lgl>, license <chr>,
## # oa_color <chr>, oa_color_long <chr>,
## # reported_noncompliant_copies <list>, url <chr>, version <lgl>,
## # year <int>
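The same idea can be sketched with base R's `tryCatch()` if you prefer not to depend on plyr. The wrapper `safely_null()` and the stand-in `fake_fetch()` below are hypothetical names used for offline illustration; `fake_fetch()` does not call oaDOI and merely rejects strings that do not start with "10.".

```r
# Wrap a function so that an error yields NULL plus a message instead of
# aborting the whole loop (similar in spirit to plyr::failwith()).
safely_null <- function(f) {
  function(...) {
    tryCatch(
      f(...),
      error = function(e) {
        message("Request failed: ", conditionMessage(e))
        NULL
      }
    )
  }
}

# Stand-in for an API call (assumption, for offline illustration only):
# rejects anything that does not start with "10."
fake_fetch <- function(doi) {
  if (!startsWith(doi, "10.")) stop("'", doi, "' is an invalid doi")
  data.frame(doi = doi, stringsAsFactors = FALSE)
}

results <- lapply(c("ldld", "10.1038/ng.3260"), safely_null(fake_fetch))

# Drop the failed (NULL) entries and bind the remaining rows
ok <- do.call(rbind, Filter(Negate(is.null), results))
ok
```

In real use you would pass a function that calls roadoi::oadoi_fetch() in place of `fake_fetch()`.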
An increasing number of universities, research organisations and funders have launched open access policies in recent years. Using roadoi together with other R-packages makes it easy to examine how and to what extent researchers comply with these policies in a reproducible and transparent manner. In particular, the rcrossref package, maintained by rOpenSci, provides many helpful functions for this task.
DOIs have become essential for referencing scholarly publications, and thus many digital libraries and institutional databases keep track of these persistent identifiers. For the sake of this vignette, instead of starting with a pre-defined set of publications originating from these sources, we simply generate a random sample of 100 DOIs registered with Crossref by using the rcrossref package.
library(dplyr)
library(rcrossref)
# get a random sample of DOIs and metadata describing these works
random_dois <- rcrossref::cr_r(sample = 100) %>%
rcrossref::cr_works() %>%
.$data
random_dois
## # A tibble: 100 x 34
## alternative.id container.title created
## <chr> <chr> <chr>
## 1 Progress of Theoretical Physics Supplement 2007-12-13
## 2 45 Petroleum Science 2008-08-06
## 3 2016-09-27
## 4 Technometrics 2006-05-09
## 5 Shokubutsugaku Zasshi 2014-07-15
## 6 2016-08-30
## 7 Science 2002-07-27
## 8 ChemInform 2010-09-09
## 9 7629 Applied Physics A 2013-02-21
## 10 BF01535702 Genetica 2005-04-19
## # ... with 90 more rows, and 31 more variables: deposited <chr>,
## # DOI <chr>, funder <list>, indexed <chr>, ISBN <chr>, ISSN <chr>,
## # issued <chr>, link <list>, member <chr>, page <chr>, prefix <chr>,
## # publisher <chr>, reference.count <chr>, score <chr>, source <chr>,
## # subject <chr>, title <chr>, type <chr>, URL <chr>, volume <chr>,
## # assertion <list>, author <list>, `clinical-trial-number` <list>,
## # issue <chr>, license_date <chr>, license_URL <chr>,
## # license_delay.in.days <chr>, license_content.version <chr>,
## # update.policy <chr>, subtitle <chr>, archive <chr>
Let’s see when these random publications were published:
random_dois %>%
  # parse the mixed date formats, then convert to years
  mutate(issued = lubridate::parse_date_time(issued, c('y', 'ymd', 'ym'))) %>%
  mutate(issued = lubridate::year(issued)) %>%
  group_by(issued) %>%
  summarize(pubs = n()) %>%
  arrange(desc(pubs))
## # A tibble: 48 x 2
## issued pubs
## <dbl> <int>
## 1 2015 8
## 2 2016 7
## 3 2010 5
## 4 NA 5
## 5 2000 4
## 6 2005 4
## 7 2013 4
## 8 1984 3
## 9 1997 3
## 10 1998 3
## # ... with 38 more rows
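The conversion above has to cope with dates given as a year only, as year and month, or as a full date. If only the year is needed, a lighter alternative is to take the first four characters. The sketch below assumes that every non-missing value starts with a four-digit year; the sample values are made up.

```r
# Made-up examples of the three date formats, plus a missing value
issued <- c("2015", "2016-08", "2010-09-09", NA)

# Take the leading four characters as the year; NA stays NA
issued_year <- as.integer(substr(issued, 1, 4))

# Count publications per year, keeping missing years visible
table(issued_year, useNA = "ifany")
```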
and of what type they are:
random_dois %>%
group_by(type) %>%
summarize(pubs = n()) %>%
arrange(desc(pubs))
## # A tibble: 7 x 2
## type pubs
## <chr> <int>
## 1 journal-article 75
## 2 book-chapter 14
## 3 proceedings-article 5
## 4 component 2
## 5 report 2
## 6 book 1
## 7 journal-issue 1
Now let’s call oaDOI.org:
oa_df <- roadoi::oadoi_fetch(dois = random_dois$DOI, email = "name@example.com")
## Warning: oaDOI request failed [404]
## '10.4028/0-87849-436-7.813' is an invalid doi. See http://doi.org/10.4028/0-87849-436-7.813
and merge the resulting information about open access full-text links with our Crossref metadata set:
my_df <- dplyr::left_join(oa_df, random_dois, by = c("doi" = "DOI"))
my_df
## # A tibble: 99 x 55
## `_best_open_url` `_closed_base_ids`
## <chr> <list>
## 1 http://arxiv.org/pdf/hep-ph/9612217v1.pdf <list [0]>
## 2 <NA> <list [0]>
## 3 http://doi.org/10.1371/journal.ppat.1005883.g005 <list [0]>
## 4 <NA> <list [0]>
## 5 <NA> <list [0]>
## 6 <NA> <list [0]>
## 7 <NA> <list [0]>
## 8 <NA> <list [0]>
## 9 <NA> <list [0]>
## 10 <NA> <list [0]>
## # ... with 89 more rows, and 53 more variables: `_closed_urls` <list>,
## # `_green_base_collections` <list>, `_open_base_ids` <list>,
## # `_open_urls` <list>, doi <chr>, doi_resolver <chr>, evidence <chr>,
## # found_green <lgl>, found_hybrid <lgl>, free_fulltext_url <chr>,
## # is_boai_license <lgl>, is_free_to_read <lgl>,
## # is_subscription_journal <lgl>, license <chr>, oa_color <chr>,
## # oa_color_long <chr>, reported_noncompliant_copies <list>, url <chr>,
## # version <lgl>, year <int>, alternative.id <chr>,
## # container.title <chr>, created <chr>, deposited <chr>, funder <list>,
## # indexed <chr>, ISBN <chr>, ISSN <chr>, issued <chr>, link <list>,
## # member <chr>, page <chr>, prefix <chr>, publisher <chr>,
## # reference.count <chr>, score <chr>, source <chr>, subject <chr>,
## # title <chr>, type <chr>, URL <chr>, volume <chr>, assertion <list>,
## # author <list>, `clinical-trial-number` <list>, issue <chr>,
## # license_date <chr>, license_URL <chr>, license_delay.in.days <chr>,
## # license_content.version <chr>, update.policy <chr>, subtitle <chr>,
## # archive <chr>
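The merged table has 99 rows although 100 DOIs were requested, because one request failed. A short sketch of how to list the DOIs that did not come back is shown below. The vectors `requested` and `returned` are made-up stand-ins for random_dois$DOI and oa_df$doi. DOIs are case-insensitive, so both sides are lowercased before comparing.

```r
# Made-up stand-ins for random_dois$DOI and oa_df$doi
requested <- c("10.1000/A1", "10.1000/b2", "10.1000/c3")
returned  <- c("10.1000/a1", "10.1000/c3")

# DOIs that were requested but are absent from the result,
# compared without regard to letter case
missing_dois <- requested[!tolower(requested) %in% tolower(returned)]
missing_dois
```

The join itself matches strings exactly, so if the two sources differ in letter case it may be worth lowercasing both key columns before calling dplyr::left_join().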
After gathering the data, reporting with R is very straightforward. You can even generate dynamic reports using R Markdown and related packages, thus making your study reproducible and transparent for others.
To display how many full-text links were found and which sources were used, you can render a nicely formatted markdown table with the knitr package:
my_df %>%
group_by(evidence) %>%
summarise(Articles = n()) %>%
mutate(Proportion = Articles / sum(Articles)) %>%
arrange(desc(Articles)) %>%
knitr::kable()
evidence | Articles | Proportion |
---|---|---|
closed | 86 | 0.8686869 |
oa repository (via BASE) | 6 | 0.0606061 |
oa journal (via journal title in doaj) | 3 | 0.0303030 |
oa journal (via publisher name) | 2 | 0.0202020 |
oa repository (via pmcid lookup) | 2 | 0.0202020 |
How many of them are provided as green or gold open access?
my_df %>%
group_by(oa_color) %>%
summarise(Articles = n()) %>%
mutate(Proportion = Articles / sum(Articles)) %>%
arrange(desc(Articles)) %>%
knitr::kable()
oa_color | Articles | Proportion |
---|---|---|
NA | 86 | 0.8686869 |
green | 8 | 0.0808081 |
gold | 5 | 0.0505051 |
Let’s take a closer look and assess how green and gold are distributed over publication types:
my_df %>%
filter(!evidence == "closed") %>%
count(oa_color, type, sort = TRUE) %>%
knitr::kable()
oa_color | type | n |
---|---|---|
green | journal-article | 7 |
gold | journal-article | 3 |
gold | component | 2 |
green | book-chapter | 1 |