The boxoffice() function scrapes daily box office results for movies from either http://www.boxofficemojo.com or https://www.the-numbers.com/. It returns one row per movie per day with the following columns: movie, distributor, gross, percent_change, theaters, per_theater, total_gross, days, and date.
In essence, it shows how well each movie performed on a given day.
movies <- boxoffice::boxoffice(dates = as.Date("2015-10-31"))
dim(movies)
#> [1] 46 9
movies[1:5, ]
#>                   movie      distributor   gross percent_change theaters
#> 1           The Martian 20th Century Fox 4564809             31     3218
#> 2       Bridge of Spies      Walt Disney 3588796             45     2873
#> 3            Goosebumps    Sony Pictures 3326075              9     3618
#> 4 The Last Witch Hunter        Lionsgate 2023321             36     3082
#> 5  Hotel Transylvania 2    Sony Pictures 1905762              7     2962
#>   per_theater total_gross days       date
#> 1        1419   179446657   30 2015-10-31
#> 2        1249    43200132   16 2015-10-31
#> 3         919    53277832   16 2015-10-31
#> 4         656    17377961    9 2015-10-31
#> 5         643   153858782   37 2015-10-31
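The column meanings are not spelled out here, but the rows above suggest that per_theater is the day's gross divided by the number of theaters, rounded to the nearest dollar. The following is a minimal sketch that checks this reading against the five rows shown. The data frame is typed in by hand rather than scraped, and the names top5 and recomputed are illustrative only.

```r
# First five rows of the output above, copied by hand (not re-scraped).
top5 <- data.frame(
  movie       = c("The Martian", "Bridge of Spies", "Goosebumps",
                  "The Last Witch Hunter", "Hotel Transylvania 2"),
  gross       = c(4564809, 3588796, 3326075, 2023321, 1905762),
  theaters    = c(3218, 2873, 3618, 3082, 2962),
  per_theater = c(1419, 1249, 919, 656, 643),
  stringsAsFactors = FALSE
)

# Recompute the per-theater average from the day's gross and theater count.
top5$recomputed <- round(top5$gross / top5$theaters)

# TRUE if every recomputed value matches the per_theater column as printed.
all(top5$recomputed == top5$per_theater)
#> [1] TRUE
```

The check only covers these five rows, so treat it as a plausible interpretation of the column rather than a documented definition.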
There are three parameters for boxoffice(): dates, site, and top_n.
The dates parameter is the date or dates (in Date format) that you want information on. It accepts either a single date or a vector of dates. The site parameter indicates which site to scrape: the-numbers.com or boxofficemojo.com. The accepted inputs are "numbers", which is the default, or "mojo". Both sites are very similar and provide nearly identical results. The top_n parameter limits the results to the top-selling movies; for example, top_n = 10 returns the top 10. All results are sorted in descending order of how much each movie made on that day, so the top-selling movie of the day is the first row and the worst-selling movie is the last.
Note that the terms of use for boxofficemojo.com do not permit scraping without their written permission. If you do not have written permission, please ask them for it or scrape only from the-numbers.com.
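Since dates accepts a vector, several days can be requested in one call. The sketch below assumes the boxoffice package is installed and the-numbers.com is reachable; the names weekend_dates and weekend are illustrative. Whether top_n is applied per date or across the whole result is not stated here, so the example simply counts the rows returned for each date. The call is wrapped in tryCatch() because a scrape can fail for reasons outside your control (no network, site layout changes).

```r
# Three consecutive days as a Date vector (Friday to Sunday).
weekend_dates <- seq(as.Date("2015-10-30"), as.Date("2015-11-01"), by = "day")

# Scrape the-numbers.com (the default site); return NULL instead of
# stopping if the package is missing or the request fails.
weekend <- tryCatch(
  boxoffice::boxoffice(dates = weekend_dates, site = "numbers", top_n = 5),
  error = function(e) {
    message("Scrape failed: ", conditionMessage(e))
    NULL
  }
)

# Count how many rows came back for each requested date.
if (!is.null(weekend)) print(table(weekend$date))
```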
Here are the first 10 movie names and daily grosses from both sites. We will use the top_n parameter to return only the top 10 selling movies.
mojo <- boxoffice::boxoffice(dates = as.Date("2015-10-31"),
                             site = "mojo", top_n = 10)
#> The terms of use for boxofficemojo.com does not permit scraping without their written permission. If you do not have written permission, please ask them for it or change the site parameter to 'numbers' to use the-numbers.com which does not forbid scraping without permission.
numbers <- boxoffice::boxoffice(dates = as.Date("2015-10-31"),
                                site = "numbers", top_n = 10)
cbind(mojo[, c(1,3)], numbers[, c(1,3)])
#>                                       movie   gross
#> 1                               The Martian 4564809
#> 2                           Bridge of Spies 3588796
#> 3                                Goosebumps 3326075
#> 4                     The Last Witch Hunter 2023321
#> 5                      Hotel Transylvania 2 1905762
#> 6                                     Burnt 1733927
#> 7  Paranormal Activity: The Ghost Dimension 1452089
#> 8                              Crimson Peak 1393460
#> 9                       Our Brand Is Crisis 1260523
#> 10                               Steve Jobs 1021780
#>                           movie   gross
#> 1                   The Martian 4564809
#> 2               Bridge of Spies 3588796
#> 3                    Goosebumps 3326075
#> 4         The Last Witch Hunter 2023321
#> 5          Hotel Transylvania 2 1905762
#> 6                         Burnt 1733927
#> 7  Paranormal Activity: The Gh… 1452089
#> 8                  Crimson Peak 1393460
#> 9           Our Brand is Crisis 1260523
#> 10 The Met: Live in HD - Tannh… 1150000
The results are close, though some movie title spellings and numbers differ slightly. In this case, the 10th-ranked movie also differs between the sites. Situations like this are rare. For more recent releases (e.g. within the last two weeks) there will be more differences, but most of them disappear as time goes on.
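One way to spot such differences is to compare the two results rank by rank. The sketch below uses rows 8 to 10 of the output above, typed in by hand rather than scraped. The names mojo_tail, numbers_tail, and comparison are illustrative, and the 10th title from the-numbers.com is kept truncated exactly as printed. Titles are compared case-insensitively so that "Our Brand Is Crisis" and "Our Brand is Crisis" count as the same movie.

```r
# Rows 8-10 from each site as printed above (hand-copied, not re-scraped).
mojo_tail <- data.frame(
  movie = c("Crimson Peak", "Our Brand Is Crisis", "Steve Jobs"),
  gross = c(1393460, 1260523, 1021780),
  stringsAsFactors = FALSE
)
numbers_tail <- data.frame(
  # "\u2026" is the ellipsis from the truncated title in the printed output.
  movie = c("Crimson Peak", "Our Brand is Crisis",
            "The Met: Live in HD - Tannh\u2026"),
  gross = c(1393460, 1260523, 1150000),
  stringsAsFactors = FALSE
)

# Compare rank by rank: titles ignoring case, grosses exactly.
comparison <- data.frame(
  rank       = 8:10,
  same_title = tolower(mojo_tail$movie) == tolower(numbers_tail$movie),
  same_gross = mojo_tail$gross == numbers_tail$gross
)

# Ranks where the two sites disagree on title or gross.
comparison$rank[!(comparison$same_title & comparison$same_gross)]
#> [1] 10
```

A positional comparison like this assumes both results have the same number of rows in the same order; with full scraped results, titles that a site truncates or spells differently would need extra handling.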