The weatherjoin package attaches gridded weather data to event-based datasets in a reliable, efficient, and reproducible way.
Typical use cases include:
- adding air temperature or precipitation to experimental observations,
- linking weather data to monitoring events,
- enriching spatial point data with meteorological context.
The package is designed around four core principles:
Currently, weatherjoin supports the NASA
POWER data service via the {nasapower} package.
This package is not affiliated with or endorsed by NASA.
At minimum, you need:
library(weatherjoin)
out <- join_weather(
  x = events,
  params = c("T2M", "PRECTOTCORR"),
  time = "event_time",
  lat_col = "lat",
  lon_col = "lon"
)

The result is the original table with weather variables appended.
weatherjoin always forms requests to NASA POWER using UTC timestamps. Your input time is interpreted using the tz argument, then standardised internally to UTC for planning, caching, and joining.
tz is the timezone used to interpret your event time input.
If your event timestamps are recorded in local clock time (for example UK time), set tz = "Europe/London".
weatherjoin will interpret them as Europe/London and convert internally to UTC before matching with POWER data.
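The conversion described above can be illustrated with base R alone. This is a minimal sketch of the general idea, not weatherjoin's internal code; the variable names are made up for the example.

```r
# Clock times as recorded locally in the UK (one in winter, one in summer)
local_txt <- c("2024-01-15 09:30:00", "2024-07-15 09:30:00")

# Interpret the text in the declared timezone ...
local <- as.POSIXct(local_txt, tz = "Europe/London")

# ... then express the same instants in UTC
utc <- format(local, tz = "UTC", usetz = TRUE)
print(utc)
```

The winter timestamp is unchanged because UK clock time equals UTC then, while the summer timestamp moves back one hour because of British Summer Time.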
The time argument may refer to a single column containing any of:
Examples:
join_weather(x, params = "T2M", time = "event_time") # POSIXct or character
join_weather(x, params = "T2M", time = "event_date") # Date
join_weather(x, params = "T2M", time = "event_yyyymmdd") # numeric YYYYMMDD

If hourly weather is requested, hour-level information must be present: requesting hourly weather with only a date (no hour information) raises an error.
You can also provide multiple columns, which weatherjoin will assemble into a timestamp. Supported schemas include:
Example:
Column roles are inferred from names (e.g. YEAR,
MO, DY, HR, DOY) and
validated:
Invalid inputs always produce informative errors.
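As a rough illustration of assembling and validating a timestamp from separate columns, the base R sketch below builds UTC timestamps from YEAR, MO, DY, and HR. The helper name `assemble_time` is hypothetical and the logic is an assumption about the general approach, not weatherjoin's actual implementation.

```r
# Hypothetical helper: combine component columns into a UTC timestamp
assemble_time <- function(df) {
  stamp <- ISOdatetime(df$YEAR, df$MO, df$DY, df$HR, 0, 0, tz = "UTC")
  bad <- which(is.na(stamp))  # ISOdatetime() yields NA for impossible dates
  if (length(bad) > 0) {
    stop("Invalid date/time components in row(s): ", paste(bad, collapse = ", "))
  }
  stamp
}

parts <- data.frame(
  YEAR = c(2024, 2024),
  MO   = c(3, 11),
  DY   = c(5, 20),
  HR   = c(6, 18)
)
stamps <- assemble_time(parts)
print(stamps)
```

An impossible combination such as 30 February stops with an error naming the offending row instead of silently producing a missing timestamp.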
The time_api argument controls whether daily or hourly POWER data are used:
Rules are explicit:
This avoids silent misinterpretation of temporal resolution.
Daily POWER data have no time-of-day. When constructing timestamps for daily data, weatherjoin assigns a configurable “dummy hour” (default: 12:00) to ensure consistent internal handling.
Advanced users can change this via:
This does not change the meaning of daily weather values; it only affects the internally constructed timestamp used for planning and joining.
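A small base R sketch shows what attaching a dummy hour to a daily date looks like. The variable `dummy_hour` is illustrative only and is not the name of the package option.

```r
dummy_hour <- 12L  # illustrative value matching the documented default of 12:00

event_date <- as.Date(c("2024-05-01", "2024-05-02"))

# Build a timestamp at the dummy hour; the calendar date is unchanged
stamp <- as.POSIXct(
  paste(format(event_date), sprintf("%02d:00:00", dummy_hour)),
  tz = "UTC"
)
print(stamp)
```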
Weather data are provided on a coarse spatial grid. When many nearby points are present, requesting data separately for each location is redundant and inefficient, given the coarse spatial resolution of NASA POWER data.
weatherjoin therefore uses spatial reduction by
default before calling the provider. Each group is reduced to a
representative location (centroid; can be changed to
median via options), and weather data are fetched once per group.
This behaviour is controlled by the spatial_mode argument:
- cluster (default): Nearby points are clustered within a user-defined radius (controlled by cluster_radius_m), and one representative location is used per cluster. Larger radii generally yield fewer representative locations, although the result depends on the shape of the groups. The default radius is 250 m, which suits selecting a single representative point for, e.g., a field experimental site. Sanity checks ensure that clustering is intentional and safe.
- by_group: Points are grouped by a user-supplied variable (e.g. site or field), and one representative location per group is used.
- exact: Each unique coordinate is queried separately. This can result in a very large number of API calls.
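To make the idea of radius-based reduction concrete, here is a minimal base R sketch. It uses complete-linkage hierarchical clustering on haversine distances, which is an assumption chosen for illustration and not necessarily the algorithm weatherjoin uses; `haversine_m` is a hypothetical helper.

```r
# Hypothetical helper: great-circle distance in metres
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

pts <- data.frame(
  lat = c(51.5000, 51.5010, 51.5005, 52.2000),
  lon = c(-0.1000, -0.1010, -0.1005, 0.1200)
)

# Pairwise distance matrix between all points
i <- seq_len(nrow(pts))
d <- outer(i, i, function(a, b) {
  haversine_m(pts$lat[a], pts$lon[a], pts$lat[b], pts$lon[b])
})

# Complete linkage cut at 250 m: no two points in a cluster are further apart
pts$cluster <- cutree(hclust(as.dist(d), method = "complete"), h = 250)

# One representative location (centroid) per cluster
reps <- aggregate(cbind(lat, lon) ~ cluster, data = pts, FUN = mean)
print(reps)
```

The first three points lie within roughly 130 m of each other and collapse to one centroid, while the distant fourth point keeps its own location, so only two provider requests would be needed instead of four.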
Example using grouping:
Event data can contain large time gaps (e.g. a few observations in 2010 and a few in 2024). Downloading continuous weather data for the entire span would be wasteful.
weatherjoin detects such gaps and splits requests into multiple time windows: gaps longer than a threshold (split_penalty_hours) trigger a split.

This dramatically reduces:
Advanced users can tune this behaviour via options:
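To illustrate the splitting idea itself (not the package's internals), the base R sketch below groups sorted event times into windows and starts a new window whenever the gap exceeds a threshold. The function `split_windows` and its `max_gap_hours` argument are hypothetical names for this example.

```r
# Hypothetical helper: derive request windows from event timestamps
split_windows <- function(times, max_gap_hours) {
  times <- sort(times)
  gap_h <- as.numeric(diff(times), units = "hours")
  # A new window starts at the first event and after every large gap
  window_id <- cumsum(c(TRUE, gap_h > max_gap_hours))
  data.frame(
    window = unique(window_id),
    start  = times[!duplicated(window_id)],
    end    = times[!duplicated(window_id, fromLast = TRUE)]
  )
}

events <- as.POSIXct(
  c("2010-06-01 10:00", "2010-06-03 10:00",
    "2024-05-01 08:00", "2024-05-02 08:00"),
  tz = "UTC"
)

windows <- split_windows(events, max_gap_hours = 24 * 30)
print(windows)
```

With a 30-day threshold, the 2010 and 2024 observations end up in two short windows, so nothing is requested for the years in between.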
weatherjoin caches downloaded data automatically and transparently to avoid repeated API calls. Downloaded data segments are indexed by:
Segments are reused whenever they cover a new request.
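The notion of a cached segment "covering" a request can be sketched as a simple interval check. The index layout below (a `loc_id` plus a date range) and the function `find_covering` are hypothetical and only illustrate the principle; they are not the package's actual cache index.

```r
# Hypothetical cache index: one row per downloaded segment
cache_index <- data.frame(
  loc_id = c("rep_1", "rep_1", "rep_2"),
  start  = as.Date(c("2010-01-01", "2024-01-01", "2024-01-01")),
  end    = as.Date(c("2010-12-31", "2024-06-30", "2024-03-31")),
  stringsAsFactors = FALSE
)

# A segment covers a request if it matches the location and
# spans the whole requested date range
find_covering <- function(index, loc_id, start, end) {
  which(index$loc_id == loc_id & index$start <= start & index$end >= end)
}

hit  <- find_covering(cache_index, "rep_1",
                      as.Date("2024-02-01"), as.Date("2024-02-28"))
miss <- find_covering(cache_index, "rep_2",
                      as.Date("2024-03-15"), as.Date("2024-04-15"))
print(hit)
print(miss)
```

The first request falls entirely inside an existing segment and could be served from the cache; the second extends past the cached range, so a new download would be needed.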
Two scopes are supported:
Project-level cache: stored in a .weatherjoin/ directory
inside the project. This is useful for reproducible analyses and shared
projects.
You can control this via:
or provide an explicit directory via cache_dir.
Most users can ignore cache policy settings. For advanced control, weatherjoin reads:
Elevation is resolved per representative location, not per event row, and becomes part of the cache identity.
Supported modes:
- site_elevation = "constant": A fixed elevation (elev_constant) is used for all locations.
- site_elevation = "auto": If elev_fun is supplied, it is called to obtain the elevation and must return it in meters. If elev_fun is not supplied, weatherjoin falls back to elev_constant and issues a warning.
Example:
Weather values are joined to events using:
Rolling joins are controlled by:
This ensures that weather values are not attached from implausibly distant timestamps.
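The effect of limiting how far a match may reach in time can be shown with a small base R sketch of a nearest-timestamp join. The function `nearest_join` and its `max_dist_hours` argument are hypothetical; the package's own join method and argument names may differ.

```r
# Hypothetical helper: attach the nearest weather value in time,
# but only if it lies within max_dist_hours of the event
nearest_join <- function(event_time, wx_time, wx_value, max_dist_hours) {
  idx <- vapply(seq_along(event_time), function(i) {
    if (is.na(event_time[i])) return(NA_integer_)
    which.min(abs(as.numeric(difftime(wx_time, event_time[i], units = "hours"))))
  }, integer(1))
  dist_h <- abs(as.numeric(difftime(wx_time[idx], event_time, units = "hours")))
  out <- wx_value[idx]
  out[which(dist_h > max_dist_hours)] <- NA  # too far away: leave unmatched
  out
}

wx_time  <- as.POSIXct("2024-05-01 00:00", tz = "UTC") + 3600 * 0:3
wx_value <- c(10, 11, 12, 13)

event_time <- as.POSIXct(
  c("2024-05-01 01:20", "2024-05-02 12:00", NA),
  tz = "UTC"
)

joined <- nearest_join(event_time, wx_time, wx_value, max_dist_hours = 1)
print(joined)
```

The first event matches the 01:00 record, the second is more than a day away from any record and stays NA, and the event with a missing time is kept with an NA value.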
Rows with missing latitude, longitude, or time are retained in the output, with their weather variables set to NA.

This design avoids accidental row loss and keeps joins explicit.
weatherjoin aims to make weather data attachment:
Most users need only a single function call, while advanced configuration remains available via options. Use withr::local_options() for temporary changes inside scripts or reports.