| Title: | Accessing and Validating Marine Environmental Data from 'SHARK' and Related Databases |
| Version: | 1.0.1 |
| Description: | Provides functions to retrieve, process, analyze, and quality-control marine physical, chemical, and biological data. The main focus is on Swedish monitoring data available through the 'SHARK' database https://shark.smhi.se/en/, with additional API support for 'Nordic Microalgae' https://nordicmicroalgae.org/, 'Dyntaxa' https://artfakta.se/, World Register of Marine Species ('WoRMS') https://www.marinespecies.org, 'AlgaeBase' https://www.algaebase.org, OBIS 'xylookup' web service https://iobis.github.io/xylookup/ and Intergovernmental Oceanographic Commission (IOC) - UNESCO databases on harmful algae https://www.marinespecies.org/hab/ and toxins https://toxins.hais.ioc-unesco.org/. |
| License: | MIT + file LICENSE |
| URL: | https://sharksmhi.github.io/SHARK4R/, https://github.com/sharksmhi/SHARK4R |
| BugReports: | https://github.com/sharksmhi/SHARK4R/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | dplyr, DT, ggplot2, httr, jsonlite, leaflet, lifecycle, purrr, readr, readxl, rlang, sf, sp, stringi, terra, tidyr, vroom, worrms |
| Suggests: | htmltools, iRfcb, knitr, plotly, RColorBrewer, rmarkdown, skimr, spelling, shiny, shinythemes, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| Language: | en-US |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2025-12-02 22:56:45 UTC; anders |
| Author: | Markus Lindh |
| Maintainer: | Anders Torstensson <anders.torstensson@smhi.se> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-09 08:50:07 UTC |
SHARK4R: Accessing and Validating Marine Environmental Data from 'SHARK' and Related Databases
Description
Provides functions to retrieve, process, analyze, and quality-control marine physical, chemical, and biological data. The main focus is on Swedish monitoring data available through the 'SHARK' database https://shark.smhi.se/en/, with additional API support for 'Nordic Microalgae' https://nordicmicroalgae.org/, 'Dyntaxa' https://artfakta.se/, World Register of Marine Species ('WoRMS') https://www.marinespecies.org, 'AlgaeBase' https://www.algaebase.org, OBIS 'xylookup' web service https://iobis.github.io/xylookup/ and Intergovernmental Oceanographic Commission (IOC) - UNESCO databases on harmful algae https://www.marinespecies.org/hab/ and toxins https://toxins.hais.ioc-unesco.org/.
Author(s)
Maintainer: Anders Torstensson anders.torstensson@smhi.se (ORCID) (Swedish Meteorological and Hydrological Institute)
Authors:
Markus Lindh (ORCID) (Swedish Meteorological and Hydrological Institute)
Other contributors:
Mikael Hedblom (ORCID) (Swedish Meteorological and Hydrological Institute) [contributor]
Bengt Karlson (ORCID) (Swedish Meteorological and Hydrological Institute) [contributor]
SHARK shark@smhi.se [copyright holder]
SBDI (Swedish Research Council, 2019-00242) [funder]
See Also
Useful links:
Report bugs at https://github.com/sharksmhi/SHARK4R/issues
Add WoRMS taxonomy hierarchy to AphiaIDs or scientific names
Description
This function enhances a dataset of AphiaIDs (and optionally scientific names) with their complete hierarchical taxonomy from the World Register of Marine Species (WoRMS). Missing AphiaIDs can be resolved from scientific names automatically.
Usage
add_worms_taxonomy(
aphia_ids,
scientific_names = NULL,
add_rank_to_hierarchy = FALSE,
verbose = TRUE,
aphia_id = deprecated(),
scientific_name = deprecated()
)
Arguments
aphia_ids |
Numeric vector of AphiaIDs. |
scientific_names |
Optional character vector of scientific names (same length as |
add_rank_to_hierarchy |
Logical (default FALSE). If TRUE, includes rank labels in the concatenated hierarchy string. |
verbose |
Logical (default TRUE). If TRUE, prints progress updates. |
aphia_id |
|
scientific_name |
Value
A tibble with taxonomy columns added, including:
- aphia_id, scientific_name
- worms_kingdom, worms_phylum, worms_class, worms_order, worms_family, worms_genus, worms_species
- worms_scientific_name, worms_hierarchy
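As a rough illustration of what add_rank_to_hierarchy changes, the base R sketch below builds a concatenated hierarchy string with and without rank labels. The helper name build_hierarchy(), the " - " separator, and the "[Rank] name" label format are assumptions made for this example; the strings returned by add_worms_taxonomy() may be formatted differently.

```r
# Hypothetical helper: collapse a named vector of ranks into one string.
# The separator and the label format are assumptions, not package behavior.
build_hierarchy <- function(taxa, add_rank = FALSE) {
  taxa <- taxa[!is.na(taxa)]  # drop ranks that have no value
  if (add_rank) {
    parts <- paste0("[", names(taxa), "] ", taxa)
  } else {
    parts <- unname(taxa)
  }
  paste(parts, collapse = " - ")
}

taxa <- c(Kingdom = "Animalia", Phylum = "Arthropoda", Genus = "Calanus")
build_hierarchy(taxa)
build_hierarchy(taxa, add_rank = TRUE)
```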
Examples
# Using AphiaID only
add_worms_taxonomy(c(1080, 109604), verbose = FALSE)
# Using a combination of AphiaID and scientific name
add_worms_taxonomy(
aphia_ids = c(NA, 109604),
scientific_names = c("Calanus finmarchicus", "Oithona similis"),
verbose = FALSE
)
Assign phytoplankton group to scientific names
Description
This function assigns default phytoplankton groups (Diatoms, Dinoflagellates, Cyanobacteria, or Other)
to a list of scientific names or Aphia IDs by retrieving species information from the
World Register of Marine Species (WoRMS). The function checks both Aphia IDs and scientific names,
handles missing records, and assigns the appropriate plankton group based on taxonomic classification in WoRMS.
Additionally, custom plankton groups can be specified using the custom_groups parameter,
allowing users to define additional classifications based on specific taxonomic criteria.
Usage
assign_phytoplankton_group(
scientific_names,
aphia_ids = NULL,
diatom_class = c("Bacillariophyceae", "Coscinodiscophyceae", "Mediophyceae",
"Diatomophyceae"),
dinoflagellate_class = "Dinophyceae",
cyanobacteria_class = "Cyanophyceae",
cyanobacteria_phylum = "Cyanobacteria",
match_first_word = TRUE,
marine_only = FALSE,
return_class = FALSE,
custom_groups = list(),
verbose = TRUE
)
Arguments
scientific_names |
A character vector of scientific names of marine species. |
aphia_ids |
A numeric vector of Aphia IDs corresponding to the scientific names. If provided, it improves the accuracy and speed of the matching process. The length of |
diatom_class |
A character string or vector representing the diatom class. Default is "Bacillariophyceae", "Coscinodiscophyceae", "Mediophyceae" and "Diatomophyceae". |
dinoflagellate_class |
A character string or vector representing the dinoflagellate class. Default is "Dinophyceae". |
cyanobacteria_class |
A character string or vector representing the cyanobacteria class. Default is "Cyanophyceae". |
cyanobacteria_phylum |
A character string or vector representing the cyanobacteria phylum. Default is "Cyanobacteria". |
match_first_word |
A logical value indicating whether to match the first word of the scientific name if the Aphia ID is missing. Default is TRUE. |
marine_only |
A logical value indicating whether to restrict the results to marine taxa only. Default is |
return_class |
A logical value indicating whether to include class information in the result. Default is |
custom_groups |
A named list of additional custom plankton groups (optional). The names of the list correspond to the custom group names (e.g., "Cryptophytes"), and the values should be character vectors specifying one or more of the following taxonomic levels: |
verbose |
A logical value indicating whether to print progress messages. Default is TRUE. |
Details
The aphia_ids parameter is not necessary but, if provided, will improve the certainty of the
matching process. If aphia_ids are available, they will be used directly to retrieve more accurate
WoRMS records. If missing, the function will attempt to match the scientific names to Aphia IDs by
querying WoRMS using the scientific name(s), with an additional fallback mechanism to match based on the
first word of the scientific name.
To skip one of the default plankton groups, you can set the class or phylum of the respective group to an empty string ("").
For example, to skip the "Cyanobacteria" group, you can set cyanobacteria_class = "" or cyanobacteria_phylum = "". These
taxa will then be placed in Others.
Custom groups are processed in the order they appear in the custom_groups list. If a taxon matches
multiple custom groups, it will be assigned to the group that appears last in the list, as later matches
overwrite earlier ones. For example, if Teleaulax amphioxeia matches both Cryptophytes (class-based)
and a specific group Teleaulax (name-based), it will be assigned to Teleaulax if Teleaulax is listed after
Cryptophytes in the custom_groups list.
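The ordering rule for custom groups can be sketched in base R without calling WoRMS. Everything below is illustrative: the taxa data frame, the default_group column, the helper apply_custom_groups(), and the use of predicate functions in place of taxonomic criteria are assumptions, not the package's internal implementation.

```r
# Hypothetical taxon records, as they might look after a WoRMS lookup.
taxa <- data.frame(
  scientific_name = c("Teleaulax amphioxeia", "Dinophysis acuta"),
  class = c("Cryptophyceae", "Dinophyceae"),
  default_group = c("Other", "Dinoflagellates"),
  stringsAsFactors = FALSE
)

# Each custom group is written as a predicate for brevity; the real
# custom_groups argument takes taxonomic criteria instead.
custom_groups <- list(
  Cryptophytes = function(x) x$class == "Cryptophyceae",
  Teleaulax = function(x) grepl("^Teleaulax", x$scientific_name)
)

apply_custom_groups <- function(taxa, custom_groups) {
  group <- taxa$default_group
  for (name in names(custom_groups)) {
    hit <- custom_groups[[name]](taxa)
    group[hit] <- name  # a later match overwrites an earlier one
  }
  group
}

apply_custom_groups(taxa, custom_groups)       # Teleaulax listed last
apply_custom_groups(taxa, rev(custom_groups))  # Cryptophytes listed last
```

Reversing the list changes the group assigned to Teleaulax amphioxeia, which is why more specific groups are best listed after broader ones.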
Value
A tibble with two columns: scientific_name and plankton_group, where the plankton group is assigned based on taxonomic classification.
See Also
https://marinespecies.org/ for WoRMS website.
https://CRAN.R-project.org/package=worrms
Examples
# Assign plankton groups to a list of species
result <- assign_phytoplankton_group(
scientific_names = c("Tripos fusus", "Diatoma", "Nodularia spumigena", "Octactis speculum"),
aphia_ids = c(840626, 149013, 160566, NA), verbose = FALSE)
print(result)
# Assign plankton groups using additional custom grouping
custom_groups <- list(
Cryptophytes = list(class = "Cryptophyceae"),
Ciliates = list(phylum = "Ciliophora")
)
# Assign with custom groups
result_custom <- assign_phytoplankton_group(
scientific_names = c("Teleaulax amphioxeia", "Mesodinium rubrum", "Dinophysis acuta"),
aphia_ids = c(106306, 232069, 109604),
custom_groups = custom_groups, # Adding custom groups
verbose = FALSE
)
print(result_custom)
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Bacterioplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Chlorophyll(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Epibenthos(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_EpibenthosDropvideo(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_GreySeal(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_HarbourPorpoise(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_HarbourSeal(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_PhysicalChemical(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Phytoplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Picoplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_PrimaryProduction(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_RingedSeal(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_SealPathology(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Sedimentation(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Zoobenthos(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_Zooplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_bacterial_carbon(data)
Arguments
data |
The tibble to be checked. |
Value
tibble of data with outliers
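For readers unfamiliar with IQR-based screening, the sketch below shows one common form of it in base R. It assumes Tukey-style fences with a 1.5 multiplier and uses made-up reference values in place of the five years of monitoring data; the thresholds actually used by the package are not stated here and may differ.

```r
# Reference values standing in for historical monitoring data (made up).
reference <- c(1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.6, 3.0)

# Tukey-style fences from the reference distribution; the 1.5 multiplier
# is an assumption, not a value taken from the package.
q <- quantile(reference, c(0.25, 0.75), names = FALSE)
fence <- c(lower = q[1] - 1.5 * (q[2] - q[1]),
           upper = q[2] + 1.5 * (q[2] - q[1]))

# New observations to screen against the fences.
observed <- data.frame(
  station = c("A", "B", "C"),
  value = c(2.2, 9.5, 0.1),
  stringsAsFactors = FALSE
)
outliers <- observed[observed$value < fence[["lower"]] |
                       observed$value > fence[["upper"]], ]
outliers
```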
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_bacterial_concentration(data)
Arguments
data |
The tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_bacterial_production(data)
Arguments
data |
The tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_chlorophyll_conc(data)
Arguments
data |
The tibble to be checked. |
Value
tibble of data with outliers
Check matches of reported codes in SMHI's SHARK codelist
Description
This function is deprecated and has been replaced by
check_codes().
Usage
check_code_proj(data, field = "sample_project_name_sv", clean_cache_days = 30)
Arguments
data |
The tibble to be checked. |
field |
Character; name of the column in |
clean_cache_days |
Numeric; if not NULL, cached SHARK code Excel files older than this number of days will be automatically deleted and be replaced by a new download. Defaults to 30. Set to NULL to disable automatic cleanup. |
Value
Reported codes with TRUE or FALSE match results.
See Also
get_shark_codes() to get the current code list.
Check matches of reported codes in SMHI's SHARK codelist
Description
This function checks whether the codes reported in a specified column of a
dataset (e.g., project codes, ship codes, etc.) are present in the
official SHARK codelist provided by SMHI. If a cell contains multiple codes
separated by commas, each code is checked individually. The function downloads
and caches the codelist if necessary, compares the reported values against
the valid codes, and returns a tibble showing which codes matched.
Informative messages are printed if unmatched codes are found.
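The matching logic described above can be sketched in base R as follows. The code list and reported values are made up, and the column name reported_code is a placeholder; the real function downloads the official codelist and may structure its output differently.

```r
# Made-up stand-ins for the valid codelist and a reported column.
valid_codes <- c("NAT", "REG", "HELCOM")
reported <- c("NAT, REG", "NAT", "XYZ", "REG,ABC")

# Split each cell on commas, trim whitespace, and keep unique codes.
codes <- unique(trimws(unlist(strsplit(reported, ","))))

# One row per reported code with a logical match flag.
result <- data.frame(
  reported_code = codes,
  match_type = codes %in% valid_codes,
  stringsAsFactors = FALSE
)
result

# Report unmatched codes, mirroring the informative message.
unmatched <- result$reported_code[!result$match_type]
if (length(unmatched) > 0) {
  message("Unmatched codes: ", paste(unmatched, collapse = ", "))
}
```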
Usage
check_codes(
data,
field = "sample_project_name_en",
code_type = "PROJ",
match_column = "Description/English translate",
clean_cache_days = 30,
verbose = TRUE
)
Arguments
data |
A tibble (or data.frame) containing the codes to check. |
field |
Character; name of the column in |
code_type |
Character; the type of code to check (e.g., |
match_column |
Character; the column in the SHARK codelist to match
against. Must be one of |
clean_cache_days |
Numeric; if not |
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Value
A tibble with unique reported codes (after splitting comma-separated
entries) and a logical column match_type indicating if they exist in the
SHARK codelist.
See Also
get_shark_codes() to get the current code list.
clean_shark4r_cache() to manually clear cached files.
Validate SHARK system fields in a data frame
Description
This function checks whether the required and recommended global and datatype-specific SHARK system fields are present in a data frame.
Usage
check_datatype(data, level = "error")
Arguments
data |
A |
level |
Character. The level of validation:
|
Details
- Required fields: Missing or empty required fields are reported as errors.
- Recommended fields: Missing or empty recommended fields are reported as warnings, but only if level = "warning" is specified.
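The two rules above can be sketched in base R. The field lists and the helper check_fields_sketch() are hypothetical and only illustrate the error and warning logic; the real required and recommended fields are defined by the package.

```r
# Hypothetical field lists, for illustration only.
required <- c("visit_year", "station_name")
recommended <- c("sample_depth_m")

check_fields_sketch <- function(data, level = "error") {
  find_issues <- function(fields, lvl) {
    rows <- lapply(fields, function(f) {
      if (!f %in% names(data)) {
        # Whole column missing: the row index is NA
        data.frame(level = lvl, field = f, row = NA_integer_,
                   message = paste("Field", f, "is missing"))
      } else {
        empty <- which(is.na(data[[f]]))
        data.frame(level = rep(lvl, length(empty)),
                   field = rep(f, length(empty)),
                   row = empty,
                   message = rep(paste("Empty value for", f), length(empty)))
      }
    })
    do.call(rbind, rows)
  }
  out <- find_issues(required, "error")
  # Recommended fields are only checked when level = "warning"
  if (level == "warning") {
    out <- rbind(out, find_issues(recommended, "warning"))
  }
  out
}

df <- data.frame(visit_year = 2024, station_name = NA)
check_fields_sketch(df, level = "error")
check_fields_sketch(df, level = "warning")
```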
Value
A tibble summarizing missing or empty fields, with columns:
- level: "error" or "warning".
- field: Name of the missing or empty field.
- row: Row number where the value is missing (NA), or NA if the whole column is missing.
- message: Description of the issue.
Examples
# Example with required fields missing
df <- data.frame(
visit_year = 2024,
station_name = NA
)
check_datatype(df, level = "error")
# Example checking recommended fields as warnings
check_datatype(df, level = "warning")
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Bacterioplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Chlorophyll(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Epibenthos(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_EpibenthosDropvideo(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_GreySeal(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_HarbourPorpoise(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_HarbourSeal(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_PhysicalChemical(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Phytoplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Picoplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_PrimaryProduction(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_RingedSeal(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_SealPathology(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Sedimentation(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Zoobenthos(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Check if the required and recommended datatype-specific SHARK system fields are present
Description
This function is deprecated and has been replaced by
check_fields().
Usage
check_deliv_Zooplankton(data, level = "error")
Arguments
data |
The data frame. |
level |
The level of error reporting, i.e. "error" or "warning". Recommended fields are only checked in case of "warning". |
Value
Any warnings or errors.
Validate depth values against bathymetry and logical constraints
Description
check_depth() inspects one or two depth columns in a dataset and reports
potential problems such as missing values, non-numeric entries, or values
that conflict with bathymetry and shoreline information. It can also
validate depths against bathymetry data retrieved from a terra::SpatRaster
object or, if bathymetry = NULL, via the lookup_xy() function, which calls
the OBIS XY lookup API to obtain bathymetry (using EMODnet Bathymetry) and shore distance.
Usage
check_depth(
data,
depth_cols = c("sample_min_depth_m", "sample_max_depth_m"),
lat_col = "sample_latitude_dd",
lon_col = "sample_longitude_dd",
report = TRUE,
depthmargin = 0,
shoremargin = NA,
bathymetry = NULL
)
Arguments
data |
A data frame containing sample metadata, including longitude, latitude, and one or two depth columns. |
depth_cols |
Character vector naming the depth column(s). Can be one
column (e.g., |
lat_col |
Name of the column containing latitude values. Default:
|
lon_col |
Name of the column containing longitude values. Default:
|
report |
Logical. If |
depthmargin |
Numeric. Allowed deviation (in meters) above bathymetry
before a depth is flagged as an error. Default = |
shoremargin |
Numeric. Minimum offshore distance (in meters) required
for negative depths to be considered valid. If |
bathymetry |
Optional terra::SpatRaster object with one layer giving
bathymetry values. If |
Details
The following checks are performed:
- Missing depth column → warning
- Empty depth column (all values missing) → warning
- Non-numeric depth values → warning
- Depth exceeds bathymetry + margin (depthmargin) → warning
- Negative depth at offshore locations (beyond shoremargin) → warning
- Minimum depth greater than maximum depth (if two columns supplied) → error
- Longitude/latitude outside raster bounds → warning
- Missing bathymetry value at coordinate → warning
The function has been modified from the obistools package (Provoost and Bosch, 2024).
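Two of the checks listed above are sketched below in base R. The bathymetry values are supplied directly instead of being looked up, and the sketch assumes that depths and bathymetry are both positive downward and in meters; the message texts are placeholders, not the package's wording.

```r
# Made-up samples with bathymetry already attached.
samples <- data.frame(
  sample_min_depth_m = c(5, 15, 0),
  sample_max_depth_m = c(3, 20, 60),
  bathymetry = c(30, 40, 50)
)
depthmargin <- 0

# Rows where minimum depth is greater than maximum depth (error).
min_gt_max <- which(samples$sample_min_depth_m > samples$sample_max_depth_m)

# Rows where depth exceeds bathymetry plus the margin (warning).
too_deep <- which(samples$sample_max_depth_m > samples$bathymetry + depthmargin)

# One row per detected problem.
report <- rbind(
  data.frame(
    level = rep("error", length(min_gt_max)),
    row = min_gt_max,
    field = rep("sample_min_depth_m, sample_max_depth_m", length(min_gt_max)),
    message = rep("Minimum depth is greater than maximum depth",
                  length(min_gt_max))
  ),
  data.frame(
    level = rep("warning", length(too_deep)),
    row = too_deep,
    field = rep("sample_max_depth_m", length(too_deep)),
    message = rep("Depth exceeds bathymetry plus margin", length(too_deep))
  )
)
report

# Analogue of report = FALSE: the input rows that failed any check.
failing <- samples[sort(unique(report$row)), ]
```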
Value
A tibble with one row per detected problem, containing:
- level
Severity of the issue ("warning" or "error").
- row
Row index in the input data where the issue occurred.
- field
Name of the column(s) involved.
- message
Human-readable description of the problem.
If report = FALSE, returns the subset of input rows that failed any check.
References
Provoost P, Bosch S (2024). “obistools: Tools for data enhancement and quality control” Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. R package version 0.1.0, https://iobis.github.io/obistools/.
Examples
# Example dataset with one depth column
example_data <- data.frame(
sample_latitude_dd = c(59.3, 58.1, 57.5),
sample_longitude_dd = c(18.0, 17.5, 16.2),
sample_depth_m = c(10, -5, NA)
)
# Validate depths using OBIS XY lookup (bathymetry = NULL)
check_depth(example_data, depth_cols = "sample_depth_m")
# Example dataset with min/max depth columns
example_data2 <- data.frame(
sample_latitude_dd = c(59.0, 58.5),
sample_longitude_dd = c(18.0, 17.5),
sample_min_depth_m = c(5, 15),
sample_max_depth_m = c(3, 20)
)
check_depth(example_data2, depth_cols = c("sample_min_depth_m", "sample_max_depth_m"))
# Return only failing rows
check_depth(example_data, depth_cols = "sample_depth_m", report = FALSE)
Check if Abundance class exceeds 10
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_epibenthos_abundclass_logical(
data,
return_df = FALSE,
return_logical = FALSE
)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows have an abundance class exceeding 10. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_epibenthos_counted(data)
Arguments
data |
The tibble to be checked. |
Value
tibble of data with outliers
Check if Epibenthos cover exceeds 100%
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_epibenthos_cover_logical(data, return_df = FALSE, return_logical = FALSE)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows exceed 100% for Total cover. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
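The kind of logical check this function performs can be sketched in base R. The column names parameter and value are hypothetical, since the required columns are not reproduced here, and the sketch only shows the logical-vector and data-frame outputs.

```r
# Hypothetical column names; made-up values.
epibenthos <- data.frame(
  parameter = c("Total cover", "Total cover", "Cover (%)"),
  value = c(85, 120, 140),
  stringsAsFactors = FALSE
)

# Logical vector of length nrow(data): TRUE where Total cover exceeds 100.
exceeds <- epibenthos$parameter == "Total cover" & epibenthos$value > 100
exceeds

# Plain data frame of the problematic rows.
epibenthos[exceeds, ]
```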
Check if Epibenthos cover class exceeds 10
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_epibenthos_coverclass_logical(
data,
return_df = FALSE,
return_logical = FALSE
)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows are flagged as problematic. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
Check if Epibenthos cover (%) exceeds 100%
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_epibenthos_coverpercent_logical(
data,
return_df = FALSE,
return_logical = FALSE
)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows are flagged as problematic. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_epibenthos_dryweight(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Check if Sediment deposition cover (%) exceeds 100%
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_epibenthos_sedimentdepos_logical(
data,
return_df = FALSE,
return_logical = FALSE
)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows are flagged as problematic. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_epibenthos_specdistr_maxdepth(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_epibenthos_specdistr_mindepth(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Check if Epibenthos total cover exceeds 100%
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_epibenthos_totcover_logical(
data,
return_df = FALSE,
return_logical = FALSE
)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows exceed 100% for Total cover. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
Validate SHARK data fields for a given datatype
Description
This function checks a SHARK data frame against the required and recommended
fields defined for a specific datatype. It verifies that all required fields
are present and contain non-empty values. If level = "warning", it
also checks for recommended fields and empty values within them.
Usage
check_fields(
data,
datatype,
level = "error",
stars = 1,
bacterioplankton_subtype = "abundance",
field_definitions = .field_definitions
)
Arguments
data |
A data frame containing SHARK data to be validated. |
datatype |
A string giving the SHARK datatype to validate against.
Must exist as a name in the provided |
level |
Character string, either |
stars |
Integer. Maximum number of "*" levels to include.
Default = 1 (only single "*").
For example, |
bacterioplankton_subtype |
Character. For "Bacterioplankton" only: either "abundance" (default) or "production". Ignored for other datatypes. |
field_definitions |
A named list of field definitions. Each element
should contain two character vectors: |
Details
Note: A single "*" marks required fields in the standard SHARK template. A double "**" is often used to specify columns required for national monitoring only. For more information, see: https://www.smhi.se/data/hav-och-havsmiljo/datavardskap-oceanografi-och-marinbiologi/leverera-data
Field definitions for SHARK data can be loaded in two ways:
- From the SHARK4R package bundle (default): The package contains a built-in object, .field_definitions, which stores required and recommended fields for each datatype.
- From GitHub (latest official version): To use the most up-to-date field definitions, you can load them directly from the SHARK4R-statistics repository:
defs <- load_shark4r_fields()
check_fields(my_data, "Phytoplankton", field_definitions = defs)
Delivery-format (all-caps) data:
If the column names in data are all uppercase (e.g. SDATE), check_fields() assumes
the dataset follows the official SHARK delivery template. In this case:
- Required fields are determined from the delivery template using get_delivery_template() and find_required_fields().
- Recommended fields are ignored because the delivery templates do not define them.
- The function validates that all required columns exist and contain non-empty values.
This ensures that both internal SHARK4R datasets (with camelCase or snake_case columns)
and official delivery files (ALL_CAPS columns) are validated correctly using the appropriate rules.
Stars in the template
Leading asterisks in the delivery template indicate required levels:
- * = standard required column
- ** = required for national monitoring
- Other symbols = additional requirement level
The stars parameter in check_fields() controls how many levels of required
columns to include.
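As a rough illustration of the star levels described above, the following base-R sketch counts the leading asterisks in template headers and keeps the columns at or below a chosen level. The helper required_by_stars() and all header names except SDATE are hypothetical and not part of SHARK4R; the package's own logic may differ.

```r
# Hypothetical helper (not part of SHARK4R): select required columns
# from template headers based on the number of leading asterisks.
required_by_stars <- function(template_cols, stars = 1) {
  # Count leading "*" characters in each header
  n_stars <- attr(regexpr("^\\*+", template_cols), "match.length")
  n_stars[n_stars < 0] <- 0  # regexpr() reports -1 when nothing matches
  # Keep headers with at least one star and no more than `stars`
  keep <- n_stars >= 1 & n_stars <= stars
  # Strip the asterisks to recover the plain column names
  sub("^\\*+", "", template_cols[keep])
}

# Made-up template headers for illustration
cols <- c("*SDATE", "*STATN", "**MPROG", "COMNT")
required_by_stars(cols, stars = 1)
required_by_stars(cols, stars = 2)
```

With stars = 1 only the single-star headers are kept; stars = 2 also includes the double-star header.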
Value
A tibble with the following columns:
- level: Either "error" or "warning".
- field: The name of the field that triggered the check.
- row: Row number(s) in data where the issue occurred, or NA if the whole field is missing.
- message: A descriptive message explaining the problem.
The tibble will be empty if no problems are found.
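To make the result layout concrete, here is a minimal base-R sketch of a required-field check that reports problems in the level/field/row/message form described above. The function check_required_sketch() and its messages are hypothetical; this is not the SHARK4R implementation, and it returns a plain data.frame rather than a tibble.

```r
# Hypothetical sketch (not the SHARK4R implementation): report missing
# or empty required fields in the level/field/row/message layout.
check_required_sketch <- function(data, required) {
  issues <- list()
  for (field in required) {
    if (!field %in% names(data)) {
      # Whole field missing: row is NA
      issues[[length(issues) + 1]] <- data.frame(
        level = "error", field = field, row = NA_integer_,
        message = paste0("Required field ", field, " is missing"),
        stringsAsFactors = FALSE
      )
    } else {
      values <- data[[field]]
      # Treat NA and empty strings as empty values
      empty <- which(is.na(values) | as.character(values) == "")
      if (length(empty) > 0) {
        issues[[length(issues) + 1]] <- data.frame(
          level = "error", field = field, row = empty,
          message = paste0("Empty value for required field ", field),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(issues) == 0) {
    # Empty result with the same columns when nothing is wrong
    return(data.frame(level = character(), field = character(),
                      row = integer(), message = character()))
  }
  do.call(rbind, issues)
}

df <- data.frame(id = c(1, 2, 3), value = c("x", "", NA))
res <- check_required_sketch(df, required = c("id", "value", "comment"))
print(res)
```

Here the value column yields one row per empty entry, while the absent comment column yields a single row with row = NA.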
See Also
load_shark4r_fields for fetching the latest field definitions from GitHub,
get_delivery_template for downloading delivery templates from SMHI's website.
Examples
# Example 1: Using built-in field definitions for "Phytoplankton"
df_phyto <- data.frame(
visit_date = "2023-06-01",
sample_id = "S1",
scientific_name = "Skeletonema marinoi",
value = 123
)
# Check fields
check_fields(df_phyto, "Phytoplankton", level = "warning")
# Example 2: Load latest definitions from GitHub and use them
defs <- load_shark4r_fields(verbose = FALSE)
# Check fields using loaded field definitions
check_fields(df_phyto, "Phytoplankton", field_definitions = defs)
# Example 3: Custom datatype with required + recommended fields
defs <- list(
ExampleType = list(
required = c("id", "value"),
recommended = "comment"
)
)
# Example data
df_ok <- data.frame(id = 1, value = "x", comment = "ok")
# Check fields using custom field definitions
check_fields(df_ok, "ExampleType", level = "warning", field_definitions = defs)
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_greyseal_counted(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_harbourseal_counted(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_harbporp_positivemin(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
General checker for parameter-specific logical rules
Description
This function checks for logical rule violations in benthos/epibenthos data
by applying a user-defined condition to values for a given parameter.
It is intended to replace the old family of check_*_*_logical() functions.
Usage
check_logical_parameter(
data,
param_name,
condition,
return_df = FALSE,
return_logical = FALSE
)
Arguments
data |
A data frame. Must contain columns |
param_name |
Character; the name of the parameter to check. |
condition |
A function that takes a numeric vector of values and returns a logical vector (TRUE for rows considered problematic). |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data). Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
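The core idea, applying a user-supplied condition only to the values of one parameter, can be sketched in base R as follows. The helper flag_parameter() is hypothetical and only mirrors the logical-vector output; it is not the SHARK4R implementation.

```r
# Hypothetical sketch (not the SHARK4R implementation): apply a
# user-supplied condition to the values of one parameter.
flag_parameter <- function(data, param_name, condition) {
  is_param <- data$parameter == param_name
  flagged <- rep(FALSE, nrow(data))
  # Evaluate the condition only on rows for the chosen parameter
  flagged[is_param] <- condition(data$value[is_param])
  flagged
}

df <- data.frame(
  parameter = c("Biomass", "Biomass", "Abundance", "Biomass"),
  value = c(5, -2, 10, 0)
)
flags <- flag_parameter(df, "Biomass", function(x) x < 0)
df[flags, ]
```

Rows belonging to other parameters are never passed to the condition, so they always come back as FALSE.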
Examples
# Example dataset
df <- dplyr::tibble(
station_name = c("A1", "A2", "A3", "A4"),
sample_date = as.Date("2023-05-01") + 0:3,
sample_id = 101:104,
parameter = c("Biomass", "Biomass", "Abundance", "Biomass"),
value = c(5, -2, 10, 0)
)
# 1. Check that Biomass is never negative (default output is a DT datatable)
check_logical_parameter(df, "Biomass", function(x) x < 0)
# 2. Same check, but return problematic rows as a data frame
check_logical_parameter(df, "Biomass", function(x) x < 0, return_df = TRUE)
# 3. Return logical vector marking problematic rows
check_logical_parameter(df, "Biomass", function(x) x < 0, return_logical = TRUE)
# 4. Check that Abundance is not zero (no problems found -> returns NULL)
abundance_check <- check_logical_parameter(df, "Abundance", function(x) x == 0)
print(abundance_check)
Check if stations are reported as nominal positions
Description
This function attempts to determine whether stations in a dataset are reported using nominal positions (i.e., generic or repeated coordinates across events), rather than actual measured coordinates.
Usage
check_nominal_station(data, verbose = TRUE)
Arguments
data |
A data frame containing at least the columns:
|
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Details
The function compares the number of unique sampling dates with the number of unique station coordinates.
If the number of unique sampling dates is larger than the number of unique station coordinates, the function suspects nominal station positions and issues a warning.
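The heuristic described above can be sketched in base R. The function suspect_nominal() is a hypothetical name used for illustration; the package function additionally issues a warning and returns the station positions.

```r
# Hypothetical sketch (not the SHARK4R implementation) of the heuristic:
# more unique sampling dates than unique coordinate pairs suggests
# nominal positions.
suspect_nominal <- function(data) {
  n_dates <- length(unique(data$sample_date))
  coords <- unique(data[, c("sample_longitude_dd", "sample_latitude_dd")])
  n_dates > nrow(coords)
}

df <- data.frame(
  sample_date = rep(as.Date("2024-01-01") + 0:2, each = 2),
  station_name = rep(c("ST1", "ST2"), 3),
  sample_longitude_dd = rep(c(15.0, 16.0), 3),
  sample_latitude_dd = rep(c(58.5, 58.6), 3)
)
suspect_nominal(df)  # 3 unique dates vs 2 unique coordinate pairs
```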
Value
A data frame with distinct station names and their corresponding
latitude/longitude positions, if nominal positions are suspected.
Otherwise, returns NULL.
Examples
df <- data.frame(
sample_date = rep(seq.Date(Sys.Date(), by = "day", length.out = 3), each = 2),
station_name = rep(c("ST1", "ST2"), 3),
sample_longitude_dd = rep(c(15.0, 16.0), 3),
sample_latitude_dd = rep(c(58.5, 58.6), 3)
)
check_nominal_station(df)
Check whether points are located on land
Description
Identifies records whose coordinates fall on land, optionally applying a buffer to allow points near the coast.
Usage
check_onland(
data,
land = NULL,
report = FALSE,
buffer = 0,
offline = FALSE,
plot_leaflet = FALSE,
only_bad = FALSE
)
Arguments
data |
A data frame containing at least |
land |
Optional |
report |
Logical; if |
buffer |
Numeric; distance in meters inland for which points are still considered valid. Only used in online mode. Default is 0. |
offline |
Logical; if |
plot_leaflet |
Logical; if |
only_bad |
Logical; if |
Details
The function supports both offline and online modes:
- Offline mode (offline = TRUE): uses a local simplified shoreline from a cached geopackage (land.gpkg). If the file does not exist, it is downloaded automatically and cached across R sessions.
- Online mode (offline = FALSE): uses the OBIS web service to determine distance to the shore.
The function assumes all coordinates are in WGS84 (EPSG:4326). Supplying coordinates in a different CRS will result in incorrect intersection tests.
Optionally, a leaflet map can be plotted. Points on land are displayed as red markers,
while points in water are green. If only_bad = TRUE, only the red points (on land) are plotted.
Value
If report = TRUE, a tibble with columns:
- field: always NA (placeholder for future extension)
- level: "warning" for all flagged rows
- row: row numbers in data flagged as located on land
- message: description of the issue
If report = FALSE and plot_leaflet = FALSE, returns a subset of data with only the flagged rows.
If plot_leaflet = TRUE, returns a leaflet map showing points on land (red) and in water (green),
unless only_bad = TRUE, in which case only red points are plotted.
Examples
# Example data frame with coordinates
example_data <- data.frame(
sample_latitude_dd = c(59.3, 58.1, 57.5),
sample_longitude_dd = c(18.6, 17.5, 16.7)
)
# Report points on land with a 100 m buffer
report <- check_onland(example_data, report = TRUE, buffer = 100)
print(report)
# Plot all points colored by land/water
map <- check_onland(example_data, plot_leaflet = TRUE)
# Plot only bad points on land
map_bad <- check_onland(example_data, plot_leaflet = TRUE, only_bad = TRUE)
# Remove points on land by adding a buffer of 2000 m
ok <- check_onland(example_data, report = FALSE, buffer = 2000)
print(nrow(ok))
General outlier check function for SHARK data
Description
This function checks whether values for a specified parameter exceed a predefined
threshold. Thresholds are provided in a dataframe (default .threshold_values),
which should contain columns for parameter, datatype, and at least one numeric
threshold column (e.g., extreme_upper). Only rows in data matching both the
parameter and delivery_datatype (datatype) are considered. Optionally, data
can be grouped by a custom column (e.g., location_sea_basin) when thresholds vary by group.
Usage
check_outliers(
data,
parameter,
datatype,
threshold_col = "extreme_upper",
thresholds = .threshold_values,
custom_group = NULL,
direction = c("above", "below"),
return_df = FALSE,
verbose = TRUE
)
Arguments
data |
A tibble containing data in SHARK format. Must include columns:
|
parameter |
Character. Name of the parameter to check. Must exist in both
|
datatype |
Character. Data type to match against |
threshold_col |
Character. Name of the threshold column in |
thresholds |
A tibble/data frame of thresholds. Must include columns |
custom_group |
Character or NULL. Optional column name in |
direction |
Character. Either |
return_df |
Logical. If TRUE, returns a plain data.frame of flagged rows instead of a DT datatable. Default = FALSE. |
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Details
- Only rows in data matching both parameter and delivery_datatype are checked.
- If custom_group is specified, thresholds are applied per group.
- If threshold_col does not exist in thresholds, the function stops with a warning.
- Values above the threshold (or below it, when direction = "below") are flagged as outliers.
- Intended for interactive use in Shiny apps where threshold_col can be selected dynamically.
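The threshold lookup and comparison can be sketched in base R as shown here. The function flag_outliers() is hypothetical, ignores custom_group, and assumes strict comparisons (> and <); the actual behaviour of check_outliers() at the threshold itself is not stated in this documentation.

```r
# Hypothetical sketch (not the SHARK4R implementation): flag values
# above or below a threshold looked up by parameter and datatype.
flag_outliers <- function(data, parameter, datatype, thresholds,
                          threshold_col = "extreme_upper",
                          direction = "above") {
  if (!threshold_col %in% names(thresholds)) {
    stop("Column '", threshold_col, "' not found in thresholds")
  }
  # Keep only rows matching both parameter and datatype
  rows <- data[data$parameter == parameter &
                 data$delivery_datatype == datatype, , drop = FALSE]
  # Look up the single threshold for this parameter/datatype pair
  limit <- thresholds[[threshold_col]][
    thresholds$parameter == parameter & thresholds$datatype == datatype
  ]
  rows$threshold <- rep(limit[1], nrow(rows))
  # Assumption: strict comparison in both directions
  is_outlier <- if (direction == "above") {
    rows$value > rows$threshold
  } else {
    rows$value < rows$threshold
  }
  rows[is_outlier, , drop = FALSE]
}

example_data <- data.frame(
  parameter = c("Param1", "Param1"),
  value = c(5, 12),
  delivery_datatype = c("TypeA", "TypeA")
)
example_thresholds <- data.frame(
  parameter = "Param1", datatype = "TypeA",
  extreme_upper = 10, mild_upper = 8
)
out <- flag_outliers(example_data, "Param1", "TypeA", example_thresholds)
print(out)
```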
Value
If outliers are found, returns a DT::datatable or a data.frame (if return_df = TRUE)
containing:
datatype, station_name, sample_date, sample_id, parameter, value, threshold,
and custom_group if specified. Otherwise, prints a message indicating that values
are within the threshold range (if verbose = TRUE) and returns invisible(NULL).
See Also
get_shark_statistics() for preparing updated threshold data.
Examples
# Minimal example dataset
example_data <- dplyr::tibble(
station_name = c("S1", "S2"),
sample_date = as.Date(c("2025-01-01", "2025-01-02")),
sample_id = 1:2,
shark_sample_id_md5 = letters[1:2],
sample_min_depth_m = c(0, 5),
sample_max_depth_m = c(1, 6),
parameter = c("Param1", "Param1"),
value = c(5, 12),
delivery_datatype = c("TypeA", "TypeA")
)
example_thresholds <- dplyr::tibble(
parameter = "Param1",
datatype = "TypeA",
extreme_upper = 10,
mild_upper = 8
)
# Check for values above "extreme_upper"
check_outliers(
data = example_data,
parameter = "Param1",
datatype = "TypeA",
threshold_col = "extreme_upper",
thresholds = example_thresholds,
return_df = TRUE
)
# Check for values above "mild_upper"
check_outliers(
data = example_data,
parameter = "Param1",
datatype = "TypeA",
threshold_col = "mild_upper",
thresholds = example_thresholds,
return_df = TRUE
)
Check parameter values against logical rules
Description
Applies parameter-specific and row-wise logical rules to benthos/epibenthos data,
flagging measurements that violate defined conditions. This function replaces
multiple deprecated check_*_logical() functions with a general, flexible implementation.
Usage
check_parameter_rules(
data,
param_conditions = get(".param_conditions", envir = asNamespace("SHARK4R")),
rowwise_conditions = get(".rowwise_conditions", envir = asNamespace("SHARK4R")),
return_df = FALSE,
return_logical = FALSE,
verbose = TRUE
)
Arguments
data |
A data frame containing at least the columns |
param_conditions |
A named list of parameter-specific rules. Each element should be a list with:
Defaults to |
rowwise_conditions |
A named list of row-wise rules applied across multiple parameters.
Each element should be a function taking the full data frame and returning a logical vector.
Defaults to |
return_df |
Logical. If TRUE, problematic rows are returned as plain |
return_logical |
Logical. If TRUE, problematic rows are returned as logical vectors.
Overrides |
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Details
This function evaluates each parameter in param_conditions and rowwise_conditions.
Only parameters present in the dataset are checked. Messages are printed
indicating whether values are within expected ranges or which rows violate rules.
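As a sketch of how a list of row-wise rules might be evaluated, the base-R example here applies each rule function to the full data frame and collects one logical vector per rule. The rule name zero_wet_weight and its logic are made up for illustration and are not the package defaults stored in .rowwise_conditions.

```r
# Hypothetical rule list; the rule name and logic are made up for
# illustration and are not the package defaults.
rowwise_rules <- list(
  zero_wet_weight = function(df) df$parameter == "Wet weight" & df$value == 0
)

df <- data.frame(
  parameter = c("Wet weight", "Wet weight", "Abundance", "BQIm"),
  value = c(0, 5, 0, 3)
)

# Apply every rule to the full data frame; one logical vector per rule
results <- lapply(rowwise_rules, function(rule) rule(df))
results$zero_wet_weight
```

Because lapply() keeps the list names, each result can be looked up by the name of the rule that produced it.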
Value
A named list of results for each parameter:
- Logical vector: If return_logical = TRUE.
- Data frame: If return_df = TRUE and violations exist.
- DT datatable: If violations exist and return_df = FALSE.
- NULL: If no violations exist for the parameter.
Invisible return.
Examples
df <- data.frame(
station_name = c("A1", "A2", "A3", "A4"),
sample_date = as.Date("2023-05-01") + 0:3,
sample_id = 101:104,
parameter = c("Wet weight", "Wet weight", "Abundance", "BQIm"),
value = c(0, 5, 0, 3)
)
# Check against default package rules
check_parameter_rules(df)
# Return problematic rows as data.frame
check_parameter_rules(df, return_df = TRUE)
# Return logical vectors for each parameter
rule_check <- check_parameter_rules(df, return_logical = TRUE)
print(rule_check)
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_phytoplankton_abund(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_phytoplankton_biovol(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_phytoplankton_carbon(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_phytoplankton_counted(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_picoplankton_abundance(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_picoplankton_biovol(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_picoplankton_carbon(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_picoplankton_counted(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_primaryproduction_carbonprod(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_primaryproduction_carbonprod_hour(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_primaryproduction_carbonprodlight(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_ringedseal_calccounted(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
Download and set up SHARK4R support files
Description
This function downloads the products folder from
the SHARK4R GitHub repository and places it in a user-specified directory.
The folder contains Shiny applications and R Markdown documents used for
quality control (QC) of SHARK data.
Usage
check_setup(path, run_app = FALSE, force = FALSE, verbose = TRUE)
Arguments
path |
Character string giving the directory where the products folder should be created. Must be provided by the user. |
run_app |
Logical, if |
force |
Logical, if |
verbose |
Logical, if |
Details
If the products folder already exists under path, the download is skipped unless
force = TRUE is specified. Optionally, the function can launch the
QC Shiny app directly after setup.
Value
An (invisible) list with the path to the local products folder:
Examples
# Download support files into a temporary directory
check_setup(path = tempdir())
# Force re-download if already present
check_setup(path = tempdir(), force = TRUE)
# Download and run the QC Shiny app
if(interactive()){
check_setup(path = tempdir(), run_app = TRUE)
}
Check station distances against SMHI station list
Description
Matches reported station names against the SMHI curated station list
("station.txt") and checks whether matched stations fall within
pre-defined distance limits. This helps ensure that station assignments
are spatially consistent.
Usage
check_station_distance(
data,
station_file = NULL,
plot_leaflet = FALSE,
try_synonyms = TRUE,
fallback_crs = 4326,
only_bad = FALSE,
verbose = TRUE
)
Arguments
data |
A data frame containing at least the columns:
|
station_file |
Optional path to a custom station file (tab-delimited).
If |
plot_leaflet |
Logical; if |
try_synonyms |
Logical; if |
fallback_crs |
Integer; CRS (EPSG code) to use when creating spatial
points if no CRS is available. Defaults to |
only_bad |
Logical; if |
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Details
Optionally, a leaflet map of stations can be plotted. SMHI stations that match the reported data are shown as blue circles, with their allowed radius visualized and displayed in the popup (e.g., "ST1 (Radius: 1000 m)"). Reported stations are shown as markers colored by whether they fall within the radius (green), outside the radius (red), or unmatched (gray).
If try_synonyms = TRUE, the function will attempt a second match
using the SYNONYM_NAMES column in the station database, splitting
multiple synonyms separated by <or>.
The function first checks if a station file path is provided via the
station_file argument. If not, it looks for the
NODC_CONFIG environment variable. This variable can point to a folder
where the NODC (Swedish National Oceanographic Data Center) configuration and station file
are stored, typically including:
- <NODC_CONFIG>/config/station.txt
If NODC_CONFIG is set and the folder exists, the function will use
station.txt from that location. Otherwise, it falls back to the
bundled station.zip included in the SHARK4R package.
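The lookup order described above can be sketched in base R. The function resolve_station_file() is a hypothetical name, and the final return value is a placeholder label rather than a real path, since the location of the bundled file is not given here.

```r
# Hypothetical sketch (not the SHARK4R implementation) of the lookup order.
resolve_station_file <- function(station_file = NULL) {
  # 1. An explicit path takes priority
  if (!is.null(station_file)) {
    return(station_file)
  }
  # 2. Otherwise look under the NODC_CONFIG folder, if set and present
  nodc <- Sys.getenv("NODC_CONFIG", unset = "")
  if (nzchar(nodc) && dir.exists(nodc)) {
    return(file.path(nodc, "config", "station.txt"))
  }
  # 3. Fall back to the bundled file (placeholder label, not a real path)
  "<bundled station.zip>"
}

resolve_station_file("my_stations.txt")
```

Setting the environment variable with Sys.setenv(NODC_CONFIG = "...") before the call would exercise the second branch, provided the folder exists.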
Value
If plot_leaflet = FALSE, returns a data frame with columns:
- station_name: Reported station name.
- match_type: TRUE if station matched in SMHI list, FALSE otherwise.
- distance_m: Distance in meters from reported station to matched SMHI station.
- within_limit: TRUE if distance <= allowed radius, FALSE if outside, NA if unmatched.
If plot_leaflet = TRUE, the function produces a leaflet map showing:
- Blue circles for SMHI stations with radius in the popup.
- Reported stations colored by status: green (within radius), red (outside radius), gray (unmatched).
- If only_bad = TRUE, only the red stations (outside radius) are displayed.
Examples
# Example data
df <- data.frame(
station_name = c("ANHOLT E", "BY5 BORNHOLMSDJ", "NEW STATION"),
sample_longitude_dd = c(12.1, 15.97, 17.5),
sample_latitude_dd = c(56.7, 55.25, 58.7)
)
# Check station distance
check_station_distance(df, try_synonyms = TRUE, verbose = FALSE)
# Plot bad points in leaflet map
map <- check_station_distance(df,
plot_leaflet = TRUE,
only_bad = TRUE,
verbose = FALSE)
Identify non-numeric or non-logical values in measurement data
Description
This function checks whether entries in the value column of a dataset are valid
numeric or logical values. It is particularly useful for identifying common data
entry errors such as inequality symbols (<, >) or unintended text strings
(e.g., "NA", "below detection"). The function reports any invalid entries
in an interactive DT::datatable for easy inspection.
Usage
check_value_logical(data, return_df = FALSE)
Arguments
data |
A data frame. Must contain a column named |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows instead of a DT datatable. Default = FALSE. |
Value
A DT::datatable or data frame listing unique invalid entries, or NULL (invisibly)
if all values are correctly formatted as numeric or logical.
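One way to detect such entries is sketched here in base R: a value counts as valid if it parses as a number or is the text TRUE or FALSE. The helper invalid_values() is hypothetical and not the SHARK4R implementation; in this sketch, genuinely missing values are flagged as well.

```r
# Hypothetical sketch (not the SHARK4R implementation): find entries
# that parse as neither numeric nor logical.
invalid_values <- function(x) {
  x <- as.character(x)
  # as.numeric() yields NA for text it cannot parse, including "NA"
  is_numeric <- !is.na(suppressWarnings(as.numeric(x)))
  is_logical <- x %in% c("TRUE", "FALSE")
  unique(x[!(is_numeric | is_logical)])
}

invalid_values(c("3.4", "<0.2", "TRUE", "NA", "5e-3"))
```

Scientific notation such as 5e-3 parses as numeric, so only the inequality entry and the literal text NA are reported for this input.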
Examples
# Example dataset with mixed valid and invalid values
df <- data.frame(
station_name = c("A", "B", "C", "D", "E"),
value = c("3.4", "<0.2", "TRUE", "NA", "5e-3")
)
# Check for invalid (non-numeric / non-logical) entries
check_value_logical(df, return_df = TRUE)
# Example with all valid numeric and logical values
df_valid <- data.frame(value = c(1.2, 0, TRUE, FALSE, 3.5))
check_value_logical(df_valid)
Identify samples with zero-valued station coordinates
Description
This function inspects a dataset containing sample coordinates to identify potential issues where longitude or latitude values are zero (0), which typically indicate missing or erroneous station positions. The function can return a summary table, a filtered data frame, or a logical vector highlighting problematic rows. It is useful as a data quality control step before spatial analyses or database imports.
Usage
check_zero_positions(
data,
coord = "longitude",
return_df = FALSE,
return_logical = FALSE,
verbose = TRUE
)
Arguments
data |
A data frame. Must contain |
coord |
Character. Which coordinate(s) to check: "longitude", "latitude", or "both". Default = "longitude". |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows instead of a DT datatable. Default = FALSE. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows have zero in the selected coordinate(s). Overrides return_df. Default = FALSE. |
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Value
A DT datatable, a data.frame, a logical vector, or NULL (if no problems found and return_logical = FALSE).
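The logical-vector output can be sketched in base R as follows. The helper zero_positions() is hypothetical, and it assumes that coord = "both" flags a row when either coordinate is zero; the documentation does not say whether the package requires one or both to be zero.

```r
# Hypothetical sketch (not the SHARK4R implementation): logical vector
# marking rows with a zero in the selected coordinate(s).
zero_positions <- function(data, coord = c("longitude", "latitude", "both")) {
  coord <- match.arg(coord)
  lon_zero <- data$sample_longitude_dd == 0
  lat_zero <- data$sample_latitude_dd == 0
  switch(coord,
    longitude = lon_zero,
    latitude = lat_zero,
    both = lon_zero | lat_zero  # assumption: either coordinate being zero
  )
}

df <- data.frame(
  sample_longitude_dd = c(15.2, 0, 18.7),
  sample_latitude_dd = c(56.3, 58.1, 0)
)
zero_positions(df, "both")
```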
Examples
# Example data
df <- data.frame(
station_name = c("A", "B", "C"),
sample_longitude_dd = c(15.2, 0, 18.7),
sample_latitude_dd = c(56.3, 58.1, 0)
)
# Check for zeroes in both coordinates and return as data.frame
check_zero_positions(df, coord = "both", return_df = TRUE)
# Return a logical vector instead of a table
check_zero_positions(df, coord = "both", return_logical = TRUE)
Identify records with zero-valued measurement data
Description
This function scans a dataset for cases where the measurement column (value)
contains zero (0) values, which may indicate missing, censored, or erroneous data.
It returns either a DT::datatable for easy inspection or a plain data.frame of
the affected rows. This function is useful for quality control and validation
prior to data aggregation, reporting, or database submission.
Usage
check_zero_value(data, return_df = FALSE)
Arguments
data |
A data frame. Must contain a column named |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows instead of a DT datatable. Default = FALSE. |
Value
A DT datatable or a data.frame of zero-value records, or NULL (invisibly)
if no zero values are found.
Examples
# Example dataset
df <- data.frame(
station_name = c("A", "B", "C", "D"),
sample_date = as.Date(c("2023-06-01", "2023-06-02", "2023-06-03", "2023-06-04")),
value = c(3.2, 0, 1.5, 0)
)
# Return a plain data.frame of zero-value records
check_zero_value(df, return_df = TRUE)
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQR (interquartile range) for specific parameters are adapted to each datatype.
Usage
check_zoobenthos_BQIm(data)
Arguments
data |
A tibble to be checked. |
Value
tibble of data with outliers
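For readers unfamiliar with IQR-based screening, the general idea can be sketched in base R as follows. The 1.5 x IQR fences used here are the common textbook convention and are an assumption; the thresholds actually applied by check_outliers() are not documented in this section and may differ.

```r
# Hypothetical helper: flag values outside Q1 - k * IQR and Q3 + k * IQR
flag_iqr_outliers_sketch <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < (q[1] - k * iqr) | x > (q[2] + k * iqr)
}

values <- c(1, 2, 3, 4, 5, 100)
flags <- flag_iqr_outliers_sketch(values)
values[flags]
```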
Check logical relationship between Abundance and BQIm
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_zoobenthos_BQIm_logical(data, return_df = FALSE, return_logical = FALSE)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows are flagged as problematic. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zoobenthos_abund(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zoobenthos_counted(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zoobenthos_wetweight(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Check if wet weight measurements are zero
Description
This function is deprecated and has been replaced by
check_logical_parameter(). Alternatively, you can use check_parameter_rules().
Usage
check_zoobenthos_wetweight_logical(
data,
return_df = FALSE,
return_logical = FALSE
)
Arguments
data |
A data frame. Must contain columns |
return_df |
Logical. If TRUE, return a plain data.frame of problematic rows. |
return_logical |
Logical. If TRUE, return a logical vector of length nrow(data) indicating which rows are flagged as problematic. Overrides return_df. |
Value
A DT datatable, a data.frame, a logical vector, or NULL if no problems found.
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_abund(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_carbon(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_counted(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_length_mean(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_length_median(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_wetweight(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_wetweight_area(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Uses data from national marine monitoring for the last 5 years to identify outliers
Description
This function is deprecated and has been replaced by
check_outliers().
Ranges and IQRs (interquartile ranges) for specific parameters are adapted to each datatype.
Usage
check_zooplankton_wetweight_volume(data)
Arguments
data |
Tibble to be checked. |
Value
tibble of data with outliers
Clean SHARK4R cache by file age and session
Description
Deletes cached files in the SHARK4R cache directory that are older than a specified number of days.
Usage
clean_shark4r_cache(
days = 1,
cache_dir = tools::R_user_dir("SHARK4R", "cache"),
clear_perm_cache = FALSE,
search_pattern = NULL,
verbose = TRUE
)
Arguments
days |
Numeric; remove files older than this number of days. Default is 1. |
cache_dir |
Character; path to the cache directory to clean.
Defaults to the SHARK4R cache directory in the user-specific R folder
(via tools::R_user_dir("SHARK4R", "cache")). |
clear_perm_cache |
Logical. If |
search_pattern |
Character; optional regex pattern to filter which files to consider for deletion. |
verbose |
Logical. If |
Details
The cache is automatically cleared after 24h.
Value
Invisible NULL. Messages are printed about what was deleted
and whether the in-memory session cache was cleared.
See Also
get_peg_list(), get_nomp_list(), get_shark_codes(), get_dyntaxa_dwca(), get_shark_statistics()
for functions that populate the cache.
Examples
# Remove files older than 60 days and clear session cache
clean_shark4r_cache(days = 60)
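The age-based cleanup described above can be illustrated with a small base R sketch. It operates on a throwaway directory under tempdir(), and the helper name remove_old_files_sketch() is hypothetical; it is not the package's implementation and does not touch the real SHARK4R cache or any in-memory session cache.

```r
# Hypothetical helper: delete files in `dir` older than `days` days
remove_old_files_sketch <- function(dir, days = 1, search_pattern = NULL) {
  files <- list.files(dir, pattern = search_pattern, full.names = TRUE)
  age_days <- as.numeric(difftime(Sys.time(), file.mtime(files), units = "days"))
  stale <- files[age_days > days]
  unlink(stale)
  invisible(basename(stale))
}

# Set up a throwaway cache directory with one old and one new file
cache_dir <- file.path(tempdir(), "cache_sketch")
dir.create(cache_dir, showWarnings = FALSE)
old_file <- file.path(cache_dir, "old.txt")
new_file <- file.path(cache_dir, "new.txt")
writeLines("old", old_file)
writeLines("new", new_file)

# Backdate the first file by three days so that it counts as stale
Sys.setFileTime(old_file, Sys.time() - 3 * 24 * 60 * 60)

removed <- remove_old_files_sketch(cache_dir, days = 1)
print(removed)
```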
Construct a hierarchical taxonomy table from Dyntaxa
Description
This function constructs a taxonomy table based on Dyntaxa taxon IDs. It queries the SLU Artdatabanken API (Dyntaxa) to fetch taxonomy information and organizes the data into a hierarchical table.
Usage
construct_dyntaxa_table(
taxon_ids,
subscription_key = Sys.getenv("DYNTAXA_KEY"),
shark_output = TRUE,
add_parents = TRUE,
add_descendants = FALSE,
add_descendants_rank = "genus",
add_synonyms = TRUE,
add_missing_taxa = FALSE,
add_hierarchy = FALSE,
verbose = TRUE,
add_genus_children = deprecated(),
recommended_only = deprecated(),
parent_ids = deprecated()
)
Arguments
Details
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Note: Please review the API conditions
and register for access before using the API. Data collected through the API
is stored at SLU Artdatabanken. Please also note that the authors of SHARK4R are not affiliated with SLU Artdatabanken.
Value
A data frame representing the constructed taxonomy table.
See Also
get_worms_taxonomy_tree for an equivalent WoRMS function
SLU Artdatabanken API Documentation
Examples
## Not run:
# Construct Dyntaxa taxonomy table for taxon IDs 238366 and 1010380
taxon_ids <- c(238366, 1010380)
taxonomy_table <- construct_dyntaxa_table(taxon_ids, "your_subscription_key")
print(taxonomy_table)
## End(Not run)
Convert coordinates from DDMM format to decimal degrees
Description
This function converts geographic coordinates provided in the DDMM format (degrees and minutes) to decimal degrees. It can handle:
DDMM (e.g., 5733 → 57°33' → 57.55°)
DDMMss or DDMMss… (extra digits after minutes are interpreted as fractional minutes, e.g., 573345 → 57°33.45' → 57.5575°)
Usage
convert_ddmm_to_dd(coord)
Arguments
coord |
A numeric or character vector of coordinates in DDMM format. |
Details
Non-numeric characters are removed before conversion. Coordinates
shorter than 4 digits are returned as NA.
Value
A numeric vector of decimal degrees corresponding to the input coordinates. Names from the input vector are removed.
Examples
# Basic DDMM input
convert_ddmm_to_dd(c(5733, 6045))
# Input with fractional minutes
convert_ddmm_to_dd(c("573345", "604523"))
# Input with non-numeric characters
convert_ddmm_to_dd(c("57°33'", "60°45'23\""))
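The conversion rule described above (two degree digits, two minute digits, any further digits read as fractional minutes) can be sketched in base R. This is a minimal illustration under the assumption of two-digit degrees; the helper name ddmm_to_dd_sketch() is hypothetical and the package function may handle edge cases differently.

```r
# Hypothetical helper illustrating the DDMM -> decimal degrees arithmetic
ddmm_to_dd_sketch <- function(coord) {
  # Strip everything that is not a digit
  digits <- gsub("[^0-9]", "", as.character(coord))
  out <- rep(NA_real_, length(digits))
  # Inputs with fewer than 4 digits stay NA
  ok <- !is.na(digits) & nchar(digits) >= 4
  d <- digits[ok]
  deg <- as.numeric(substr(d, 1, 2))
  # Digits after the degrees form the minutes; anything beyond the first
  # two minute digits is treated as the fractional part
  minutes <- as.numeric(substr(d, 3, nchar(d))) / 10^(nchar(d) - 4)
  out[ok] <- deg + minutes / 60
  out
}

ddmm_to_dd_sketch(c(5733, 6045))
ddmm_to_dd_sketch(c("573345", "57-33", "123"))
```

The arithmetic for the first value is 57 + 33 / 60 = 57.55, and for "573345" it is 57 + 33.45 / 60 = 57.5575.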
Defunct functions
Description
These functions were deprecated before being made defunct. If there's a known replacement, calling the function will tell you about it.
Usage
# Deprecated in 1.0.0 -------------------------------------
# Deprecated in 0.1.4 -------------------------------------
Find required fields in a SHARK delivery template
Description
Identifies which columns are mandatory in the SHARK delivery template based on rows starting with "*" (one or more). You can specify how many levels of asterisks to include.
Usage
find_required_fields(
datatype,
stars = 1,
bacterioplankton_subtype = "abundance"
)
Arguments
datatype |
Character. The datatype name. Available options include:
|
stars |
Integer. Maximum number of "*" levels to include.
Default = 1 (only single "*").
For example, |
bacterioplankton_subtype |
Character. For "Bacterioplankton" only: either "abundance" (default) or "production". Ignored for other datatypes. |
Details
Note: A single "*" marks required fields in the standard SHARK template. A double "**" is often used to specify columns required for national monitoring only. For more information, see: https://www.smhi.se/data/hav-och-havsmiljo/datavardskap-oceanografi-och-marinbiologi/leverera-data
Value
A character vector of column names that are required in the template.
Examples
# Only single "*" required columns
find_required_fields("Bacterioplankton")
# Include both "*" and "**" required columns (national monitoring too)
find_required_fields("Bacterioplankton", stars = 2)
# Include up to three levels of "*"
find_required_fields("Phytoplankton", stars = 3)
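The role of the stars argument can be illustrated with a base R sketch that selects entries by their number of leading asterisks. The input vector, its field names, and the helper required_fields_sketch() are all hypothetical; the real template layout and the way the package parses it are not shown in this section.

```r
# Hypothetical helper: keep entries with 1 to `stars` leading asterisks
required_fields_sketch <- function(fields, stars = 1) {
  # Count the leading asterisks of each entry
  n_stars <- nchar(sub("^(\\**).*$", "\\1", fields))
  keep <- n_stars >= 1 & n_stars <= stars
  # Drop the asterisk prefix from the entries that are kept
  sub("^\\*+", "", fields[keep])
}

# Made-up template rows, for illustration only
template_rows <- c("*station_name", "**sampler_type", "comment", "***extra_field")

required_fields_sketch(template_rows)
required_fields_sketch(template_rows, stars = 2)
```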
Search AlgaeBase for information about a genus of algae
Description
This function has been deprecated. Users are encouraged to use match_algaebase_species instead.
This function searches the AlgaeBase API for genus information and returns detailed taxonomic data, including higher taxonomy, taxonomic status, scientific names, and other related metadata.
Usage
get_algaebase_genus(
genus,
subscription_key = Sys.getenv("ALGAEBASE_KEY"),
higher = TRUE,
unparsed = FALSE,
newest_only = TRUE,
exact_matches_only = TRUE,
apikey = deprecated()
)
Arguments
genus |
The genus name to search for (character string). This parameter is required. |
subscription_key |
A character string containing the API key for accessing the AlgaeBase API. By default, the key
is read from the environment variable ALGAEBASE_KEY. You can provide the key in three ways:
|
higher |
A boolean flag indicating whether to include higher taxonomy in the output (default is TRUE). |
unparsed |
A boolean flag indicating whether to return the raw JSON output from the API (default is FALSE). |
newest_only |
A boolean flag to return only the most recent entry (default is TRUE). |
exact_matches_only |
A boolean flag to limit results to exact matches (default is TRUE). |
apikey |
Details
A valid API key is required and can be requested from the AlgaeBase team.
Value
A data frame with the following columns:
- id: AlgaeBase identifier.
- accepted_name: Accepted scientific name (if different from the input).
- input_name: The genus name supplied by the user.
- input_match: Indicator of exact match (1 = exact, 0 = not exact).
- currently_accepted: Indicator if the taxon is currently accepted (1 = TRUE, 0 = FALSE).
- genus_only: Indicator if the search was for a genus only (1 = genus, 0 = genus + species).
- kingdom, phylum, class, order, family: Higher taxonomy (returned if higher = TRUE).
- taxonomic_status: Status of the taxon (e.g., currently accepted, synonym, unverified).
- taxon_rank: Taxonomic rank of the accepted name (e.g., genus, species).
- mod_date: Date when the entry was last modified.
- long_name: Full scientific name including author and date (if available).
- authorship: Author information (if available).
See Also
https://www.algaebase.org/ for AlgaeBase website.
Examples
## Not run:
get_algaebase_genus("Anabaena", subscription_key = "your_api_key")
## End(Not run)
Search AlgaeBase for information about a species of algae
Description
This function has been deprecated. Users are encouraged to use match_algaebase_species instead.
This function searches the AlgaeBase API for species based on genus and species names. It allows for flexible search parameters such as filtering by exact matches, returning the most recent results, and including higher taxonomy details.
Usage
get_algaebase_species(
genus,
species,
subscription_key = Sys.getenv("ALGAEBASE_KEY"),
higher = TRUE,
unparsed = FALSE,
newest_only = TRUE,
exact_matches_only = TRUE,
apikey = deprecated()
)
Arguments
genus |
A character string specifying the genus name. |
species |
A character string specifying the species or specific epithet. |
subscription_key |
A character string containing the API key for accessing the AlgaeBase API. By default, the key
is read from the environment variable ALGAEBASE_KEY. You can provide the key in three ways:
|
higher |
A logical value indicating whether to include higher taxonomy details (default is TRUE). |
unparsed |
A logical value indicating whether to print the full JSON response from the API (default is FALSE). |
newest_only |
A logical value indicating whether to return only the most recent entries (default is TRUE). |
exact_matches_only |
A logical value indicating whether to return only exact matches (default is TRUE). |
apikey |
Details
A valid API key is required and can be requested from the AlgaeBase team.
This function queries the AlgaeBase API for species based on the genus and species names, and filters the results based on various parameters. The function handles different taxonomic ranks and formats the output for easy use. It can merge higher taxonomy data if requested.
Value
A data frame with details about the species, including:
- taxonomic_status: The current status of the taxon (e.g., accepted, synonym, unverified).
- taxon_rank: The rank of the taxon (e.g., species, genus).
- accepted_name: The currently accepted scientific name, if applicable.
- authorship: Author information for the scientific name (if available).
- mod_date: Date when the taxonomic record was last modified.
- ...: Other relevant information returned by the data source.
See Also
https://www.algaebase.org/ for AlgaeBase website.
Examples
## Not run:
# Search for a species with exact matches only, return the most recent results
result <- get_algaebase_species(
genus = "Skeletonema", species = "marinoi", subscription_key = "your_api_key"
)
# Print result
print(result)
## End(Not run)
Get a delivery template for a SHARK datatype
Description
Downloads and reads the SHARK Excel delivery template for a given datatype. The template contains the column definitions and headers used for submission.
Usage
get_delivery_template(
datatype,
sheet = "Kolumner",
header_row = 4,
skip = 1,
bacterioplankton_subtype = "abundance",
force = FALSE,
clean_cache_days = 1
)
Arguments
datatype |
Character. The datatype name. Available options include:
|
sheet |
Character or numeric. Name (e.g., "Kolumner") or index (e.g., 1) of the sheet in the Excel file to read. Default is "Kolumner". |
header_row |
Integer. Row number in the Excel file that contains the column headers. Default is 4. |
skip |
Integer. Number of rows to skip before reading data. Default is 1. |
bacterioplankton_subtype |
Character. For "Bacterioplankton" only: either "abundance" (default) or "production". Ignored for other datatypes. |
force |
Logical; if |
clean_cache_days |
Numeric; if not |
Value
A tibble containing the delivery template. Column names are set
from the header row.
Examples
# Bacterioplankton abundance
abun <- get_delivery_template("Bacterioplankton",
bacterioplankton_subtype = "abundance")
print(abun)
# Bacterioplankton production
prod <- get_delivery_template("Bacterioplankton",
bacterioplankton_subtype = "production")
# Phytoplankton template
phyto <- get_delivery_template("Phytoplankton")
# Phytoplankton column explanation (sheet number 3)
phyto_column_explanation <- get_delivery_template("Phytoplankton",
sheet = 3,
header_row = 4,
skip = 3)
print(phyto_column_explanation)
Download and read Darwin Core Archive files from Dyntaxa
Description
This function downloads a complete Darwin Core Archive (DwCA) of Dyntaxa from the SLU Artdatabanken API, extracts the archive, and reads the specified CSV file into R.
Usage
get_dyntaxa_dwca(
subscription_key = Sys.getenv("DYNTAXA_KEY"),
file_to_read = "Taxon.csv",
force = FALSE,
verbose = TRUE
)
Arguments
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. You can provide the key in three ways:
|
file_to_read |
A string specifying the name of the CSV file to read from the extracted archive.
Allowed options are: |
force |
A logical value indicating whether to force a fresh download of the archive,
even if a cached copy is available. Defaults to FALSE. |
verbose |
A logical value indicating whether to show download progress. Defaults to TRUE. |
Details
By default, the archive is downloaded and cached across R sessions. On subsequent calls,
the function reuses the cached copy of the extracted files to avoid repeated downloads.
Use the force parameter to re-download the archive if needed. The cache is cleared
automatically after 24 hours, but you can also manually clear it using
clean_shark4r_cache.
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Note: Please review the API conditions
and register for access before using the API. Data collected through the API
is stored at SLU Artdatabanken. Please also note that the authors of SHARK4R are not affiliated with SLU Artdatabanken.
Value
A tibble containing the data from the specified CSV file.
See Also
clean_shark4r_cache() to manually clear cached files.
Examples
## Not run:
# Provide your Dyntaxa API subscription key
subscription_key <- "your_subscription_key"
# Download and read the Taxon.csv file
taxon_data <- get_dyntaxa_dwca(subscription_key, file_to_read = "Taxon.csv")
## End(Not run)
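Because subscription_key defaults to Sys.getenv("DYNTAXA_KEY"), the key can be set once as an environment variable instead of being passed in every call. The following base R sketch shows that lookup in isolation; the key value is a placeholder and no API request is made.

```r
# Set the key for the current R session (placeholder value)
Sys.setenv(DYNTAXA_KEY = "your_subscription_key")

# This is the expression used as the default for subscription_key
key <- Sys.getenv("DYNTAXA_KEY")

# An empty string would mean that the variable is not set
nzchar(key)
```

To keep the key across sessions, a line such as DYNTAXA_KEY=your_subscription_key can be placed in the user's .Renviron file, which R reads at startup.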
Get parent taxon IDs for specified taxon IDs from Dyntaxa
Description
This function queries the SLU Artdatabanken API (Dyntaxa) to retrieve parent taxon IDs for the specified taxon IDs. It constructs a request with the provided taxon IDs, sends the request to the SLU Artdatabanken API, and processes the response to return a list of parent taxon IDs.
Usage
get_dyntaxa_parent_ids(
taxon_ids,
subscription_key = Sys.getenv("DYNTAXA_KEY"),
verbose = TRUE
)
Arguments
taxon_ids |
A vector of numeric taxon IDs for which parent taxon IDs are requested. |
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. You can provide the key in three ways:
|
verbose |
Logical. Default is TRUE. |
Details
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Note: Please review the API conditions
and register for access before using the API. Data collected through the API
is stored at SLU Artdatabanken. Please also note that the authors of SHARK4R are not affiliated with SLU Artdatabanken.
Value
A list containing parent taxon IDs corresponding to the specified taxon IDs.
See Also
SLU Artdatabanken API Documentation
Examples
## Not run:
# Get parent taxon IDs for taxon IDs 238366 and 1010380
parent_ids <- get_dyntaxa_parent_ids(c(238366, 1010380), "your_subscription_key")
print(parent_ids)
## End(Not run)
Get taxonomic information from Dyntaxa for specified taxon IDs
Description
This function queries the SLU Artdatabanken API (Dyntaxa) to retrieve taxonomic information for the specified taxon IDs. It constructs a request with the provided taxon IDs, sends the request to the SLU Artdatabanken API, and processes the response to return taxonomic information in a data frame.
Usage
get_dyntaxa_records(taxon_ids, subscription_key = Sys.getenv("DYNTAXA_KEY"))
Arguments
taxon_ids |
A vector of numeric taxon IDs (Dyntaxa ID) for which taxonomic information is requested. |
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. You can provide the key in three ways:
|
Details
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Note: Please review the API conditions
and register for access before using the API. Data collected through the API
is stored at SLU Artdatabanken. Please also note that the authors of SHARK4R are not affiliated with SLU Artdatabanken.
Value
A data frame containing taxonomic information for the specified taxon IDs.
Columns include taxonId, names, category, rank, isRecommended, and parentTaxonId.
See Also
SLU Artdatabanken API Documentation
Examples
## Not run:
# Get taxonomic information for taxon IDs 238366 and 1010380
taxon_info <- get_dyntaxa_records(c(238366, 1010380), "your_subscription_key")
print(taxon_info)
## End(Not run)
Download the IOC-UNESCO Taxonomic Reference List of Harmful Micro Algae
Description
This function retrieves the IOC-UNESCO Taxonomic Reference List of Harmful Micro Algae from the World Register of Marine Species (WoRMS). The data are returned as a data frame, with options to customize the fields included in the download.
Usage
get_hab_list(
aphia_id = TRUE,
scientific_name = TRUE,
authority = TRUE,
fossil = TRUE,
rank_name = TRUE,
status_name = TRUE,
qualitystatus_name = TRUE,
modified = TRUE,
lsid = TRUE,
parent_id = TRUE,
stored_path = TRUE,
citation = TRUE,
classification = TRUE,
environment = TRUE,
accepted_taxon = TRUE
)
Arguments
aphia_id |
Logical. Include the AphiaID field. Defaults to TRUE. |
scientific_name |
Logical. Include the scientific name field. Defaults to TRUE. |
authority |
Logical. Include the authority field. Defaults to TRUE. |
fossil |
Logical. Include information about fossil status. Defaults to TRUE. |
rank_name |
Logical. Include the taxonomic rank (e.g., species, variety, forma). Defaults to TRUE. |
status_name |
Logical. Include the taxonomic status field. Defaults to TRUE. |
qualitystatus_name |
Logical. Include the quality status field. Defaults to TRUE. |
modified |
Logical. Include the date of last modification field. Defaults to TRUE. |
lsid |
Logical. Include the Life Science Identifier (LSID) field. Defaults to TRUE. |
parent_id |
Logical. Include the parent AphiaID field. Defaults to TRUE. |
stored_path |
Logical. Include the stored path field. Defaults to TRUE. |
citation |
Logical. Include citation information. Defaults to TRUE. |
classification |
Logical. Include the full taxonomic classification (e.g., kingdom, phylum, class). Defaults to TRUE. |
environment |
Logical. Include environmental data (e.g., marine, brackish, freshwater, terrestrial). Defaults to TRUE. |
accepted_taxon |
Logical. Include information about the accepted taxon (e.g., scientific name and authority). Defaults to TRUE. |
Details
This function submits a POST request to the WoRMS database to retrieve the IOC-UNESCO Taxonomic Reference List of Harmful Micro Algae.
The downloaded data can include various fields, which are controlled by the input parameters.
If a field is not required, set the corresponding parameter to FALSE to exclude it from the output.
Value
A tibble containing the HABs taxonomic list, with columns based on the selected parameters.
See Also
https://www.marinespecies.org/hab/ for IOC-UNESCO Taxonomic Reference List of Harmful Micro Algae
Examples
# Download the default HABs taxonomic list
habs_taxlist_df <- get_hab_list()
head(habs_taxlist_df)
# Include only specific fields in the output
habs_taxlist_df <- get_hab_list(aphia_id = TRUE, scientific_name = TRUE, authority = FALSE)
head(habs_taxlist_df)
Get the latest NOMP biovolume Excel list
Description
This function downloads the latest available Nordic Marine Phytoplankton Group (NOMP) biovolume zip archive from SMHI, unzips it, and reads the first Excel file by default. You can also specify which file in the archive to read.
Usage
get_nomp_list(
year = as.numeric(format(Sys.Date(), "%Y")),
file = NULL,
sheet = NULL,
force = FALSE,
base_url = NULL,
clean_cache_days = 30,
verbose = TRUE
)
Arguments
year |
Numeric year to download. Default is current year; if not available, previous years are automatically tried. |
file |
Character string specifying which file in the zip archive to read. Defaults to the first Excel file in the archive. |
sheet |
Character or numeric; the name or index of the sheet to read from the Excel file. If neither argument specifies the sheet, defaults to the first sheet. |
force |
Logical; if |
base_url |
Base URL (without "/nomp_taxa_biovolumes_and_carbon_YYYY.zip") for the NOMP biovolume files. Defaults to the SMHI directory. |
clean_cache_days |
Numeric; if not |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
Value
A tibble with the contents of the requested Excel file.
See Also
clean_shark4r_cache() to manually clear cached files.
Examples
# Get the latest available list
nomp_list <- get_nomp_list()
head(nomp_list)
# Get the 2023 list and clean old cache files older than 60 days
nomp_list_2023 <- get_nomp_list(2023, clean_cache_days = 60)
head(nomp_list_2023)
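The year fallback described for the year argument (try the requested year, then earlier years) can be sketched as a simple loop. The helper latest_nomp_year_sketch(), its max_back limit, and the is_available callback are hypothetical stand-ins for whatever check the package performs; only the file name pattern is taken from the base_url description above.

```r
# Hypothetical helper: walk back from `year` until an archive is found.
# `is_available` stands in for a real check (for example an HTTP request).
latest_nomp_year_sketch <- function(year, is_available, max_back = 5) {
  for (y in seq(year, year - max_back)) {
    file_name <- sprintf("nomp_taxa_biovolumes_and_carbon_%d.zip", as.integer(y))
    if (is_available(file_name)) {
      return(list(year = y, file_name = file_name))
    }
  }
  NULL
}

# Pretend that only the 2023 archive has been published
published <- "nomp_taxa_biovolumes_and_carbon_2023.zip"
found <- latest_nomp_year_sketch(2025, function(f) f %in% published)
found$year
```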
Retrieve external links or facts for taxa from Nordic Microalgae
Description
This function retrieves external links related to algae taxa from the Nordic Microalgae API. It takes a vector of slugs (taxon identifiers) and returns a data frame containing the external links associated with each taxon. The data includes the provider, label, external ID, and the URL of the external link.
Usage
get_nua_external_links(slug, verbose = TRUE, unparsed = FALSE)
Arguments
slug |
A vector of taxon slugs (identifiers) for which to retrieve external links. |
verbose |
A logical flag indicating whether to display a progress bar. Default is TRUE. |
unparsed |
Logical. If |
Details
The slugs (taxon identifiers) used in this function can be retrieved using the get_nua_taxa() function,
which returns a data frame with a column for taxon slugs, along with other relevant metadata for each taxon.
Value
When unparsed = FALSE: a tibble containing the following columns:
slug |
The slug (identifier) of the taxon. |
provider |
The provider of the external link. |
label |
The label of the external link. |
external_id |
The external ID associated with the external link. |
external_url |
The URL of the external link. |
collection |
The collection category, which is "External Links" for all rows. |
See Also
https://nordicmicroalgae.org/ for Nordic Microalgae website.
https://nordicmicroalgae.org/api/ for Nordic Microalgae API documentation.
Examples
# Retrieve external links for a vector of slugs
external_links <- get_nua_external_links(slug = c("chaetoceros-debilis", "alexandrium-tamarense"),
verbose = FALSE)
head(external_links)
Retrieve harmfulness for taxa from Nordic Microalgae
Description
This function retrieves harmfulness information related to algae taxa from the Nordic Microalgae API. It takes a vector of slugs (taxon identifiers) and returns a data frame containing the harmfulness information associated with each taxon. The data includes the provider, label, external ID, and the URL of the external link.
Usage
get_nua_harmfulness(slug, verbose = TRUE)
Arguments
slug |
A vector of taxon slugs (identifiers) for which to retrieve harmfulness information. |
verbose |
A logical flag indicating whether to display a progress bar. Default is TRUE. |
Details
The slugs (taxon identifiers) used in this function can be retrieved using the get_nua_taxa() function,
which returns a data frame with a column for taxon slugs, along with other relevant metadata for each taxon.
Value
A tibble containing the following columns:
slug |
The slug (identifier) of the taxon. |
provider |
The provider of the external link. |
label |
The label of the external link. |
external_id |
The external ID associated with the external link. |
external_url |
The URL of the external link. |
collection |
The collection category, which is "Harmful algae blooms" for all rows. |
See Also
https://nordicmicroalgae.org/ for Nordic Microalgae website.
https://nordicmicroalgae.org/api/ for Nordic Microalgae API documentation.
Examples
# Retrieve harmfulness information for a vector of slugs
harmfulness <- get_nua_harmfulness(slug = c("dinophysis-acuta",
"alexandrium-ostenfeldii"),
verbose = FALSE)
print(harmfulness)
Retrieve and extract media URLs from Nordic Microalgae
Description
This function retrieves media information from the Nordic Microalgae API and extracts slugs and URLs for different renditions (large, original, small, medium) for each media item.
Usage
get_nua_media_links(unparsed = FALSE)
Arguments
unparsed |
Logical. If |
Value
When unparsed = FALSE: a tibble with the following columns:
- slug: The slug of the related taxon.
- l_url: The URL for the "large" rendition.
- o_url: The URL for the "original" rendition.
- s_url: The URL for the "small" rendition.
- m_url: The URL for the "medium" rendition.
See Also
https://nordicmicroalgae.org/ for Nordic Microalgae website.
https://nordicmicroalgae.org/api/ for Nordic Microalgae API documentation.
Examples
# Retrieve media information
media_info <- get_nua_media_links(unparsed = FALSE)
# Preview the extracted data
head(media_info)
Retrieve taxa information from Nordic Microalgae
Description
This function retrieves all taxonomic information for algae taxa from the Nordic Microalgae API. It fetches details including scientific names, authorities, ranks, and image URLs (in different sizes: large, medium, original, and small).
Usage
get_nua_taxa(unparsed = FALSE)
Arguments
unparsed |
Logical. If |
Value
When unparsed = FALSE: a tibble containing the following columns:
slug |
A unique identifier for the taxon. |
scientific_name |
The scientific name of the taxon. |
authority |
The authority associated with the scientific name. |
rank |
The taxonomic rank of the taxon. |
See Also
https://nordicmicroalgae.org/ for Nordic Microalgae website.
https://nordicmicroalgae.org/api/ for Nordic Microalgae API documentation.
Examples
# Retrieve and display taxa data
taxa_data <- get_nua_taxa(unparsed = FALSE)
head(taxa_data)
Get the latest EG-Phyto/PEG biovolume Excel list
Description
This function downloads the EG-Phyto (previously PEG) biovolume zip archive from ICES (using
cache_peg_zip()), unzips it, and reads the first Excel file by default.
You can also specify which file in the archive to read.
Usage
get_peg_list(
file = NULL,
sheet = NULL,
force = FALSE,
url = "https://www.ices.dk/data/Documents/ENV/PEG_BVOL.zip",
clean_cache_days = 30,
verbose = TRUE
)
Arguments
file |
Character string specifying which file in the zip archive to read. Defaults to the first Excel file in the archive. |
sheet |
Character or numeric; the name or index of the sheet to read from the Excel file. If neither argument specifies the sheet, defaults to the first sheet. |
force |
Logical; if |
url |
Character string with the URL of the PEG zip file. Defaults to the official ICES link. |
clean_cache_days |
Numeric; if not |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
Value
A tibble with the contents of the requested Excel file.
See Also
clean_shark4r_cache() to manually clear cached files.
Examples
# Read the first Excel file from the PEG zip
peg_list <- get_peg_list()
head(peg_list)
Get SHARK codelist from SMHI
Description
This function downloads the SHARK codes Excel file from SMHI (if not already cached) and reads it into R. The file is stored in a persistent cache directory so it does not need to be downloaded again in subsequent sessions.
Usage
get_shark_codes(
url =
"https://smhi.se/oceanografi/oce_info_data/shark_web/downloads/codelist_SMHI.xlsx",
sheet = 1,
skip = 1,
force = FALSE,
clean_cache_days = 30
)
Arguments
url |
Character string with the URL to the SHARK codes Excel file. Defaults to the official SMHI codelist. |
sheet |
Sheet to read. Can be either the sheet name or its index
(default is |
skip |
Number of rows to skip before reading data
(default is |
force |
Logical; if |
clean_cache_days |
Numeric; if not |
Value
A tibble containing the contents of the requested sheet.
See Also
clean_shark4r_cache() to manually clear cached files.
Examples
# Read the first sheet, skipping the first row
codes <- get_shark_codes()
head(codes)
# Force re-download of the Excel file
codes <- get_shark_codes(force = TRUE)
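The download-once behaviour described for get_shark_codes() can be pictured with a small base R sketch. This is not the package's implementation: cached_download() and its downloader argument are hypothetical names, and a temporary directory stands in for the persistent cache so the example runs without network access.

```r
# Hypothetical helper: fetch a file only when it is missing or force = TRUE.
cached_download <- function(path, downloader, force = FALSE) {
  if (force || !file.exists(path)) {
    downloader(path)
  }
  path
}

# Stand-in downloader that counts its calls (no network access involved).
n_downloads <- 0
fake_downloader <- function(path) {
  n_downloads <<- n_downloads + 1
  writeLines("code;description", path)
}

cache_file <- file.path(tempdir(), "codelist_example.csv")
unlink(cache_file)  # start from an empty cache

cached_download(cache_file, fake_downloader)                # downloads
cached_download(cache_file, fake_downloader)                # served from cache
cached_download(cache_file, fake_downloader, force = TRUE)  # downloads again
n_downloads
```

Only the first and the forced call reach the downloader, which mirrors the role of the force argument.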
Retrieve tabular data from SHARK
Description
The get_shark_data() function retrieves tabular data from the SHARK database hosted by SMHI. The function sends a POST request
to the SHARK API with customizable filters, including year, month, taxon name, water category, and more, and returns the
retrieved data as a structured tibble. To view available filter options, see get_shark_options.
Usage
get_shark_data(
tableView = "sharkweb_overview",
headerLang = "internal_key",
save_data = FALSE,
file_path = NULL,
delimiters = "point-tab",
lineEnd = "win",
encoding = "utf_8",
dataTypes = c(),
bounds = c(),
fromYear = NULL,
toYear = NULL,
months = c(),
parameters = c(),
checkStatus = "",
qualityFlags = c(),
deliverers = c(),
orderers = c(),
projects = c(),
datasets = c(),
minSamplingDepth = "",
maxSamplingDepth = "",
redListedCategory = c(),
taxonName = c(),
stationName = c(),
vattenDistrikt = c(),
seaBasins = c(),
counties = c(),
municipalities = c(),
waterCategories = c(),
typOmraden = c(),
helcomOspar = c(),
seaAreas = c(),
hideEmptyColumns = FALSE,
row_limit = 10^7,
prod = TRUE,
utv = FALSE,
verbose = TRUE
)
Arguments
tableView |
Character. Specifies the columns of the table to retrieve. Options include:
Default is |
headerLang |
Character. Language option for column headers. Possible values:
|
save_data |
Logical. If |
file_path |
Character. The file path where the data should be saved. Required if |
delimiters |
Character. Specifies the delimiter used to separate values in the file, if |
lineEnd |
Character. Defines the type of line endings in the file, if |
encoding |
Character. Sets the file's text encoding, if |
dataTypes |
Character vector. Specifies data types to filter. Possible values include:
|
bounds |
A numeric vector of length 4 specifying the geographical search boundaries in decimal degrees,
formatted as |
fromYear |
Integer (optional). The starting year for data retrieval.
If set to |
toYear |
Integer (optional). The ending year for data retrieval.
If set to |
months |
Integer vector. The months to retrieve data for, e.g., |
parameters |
Character vector. Optional parameters to filter the results by, such as |
checkStatus |
Character string. Optional status check to filter results. |
qualityFlags |
Character vector. Specifies the quality flags to filter the data. By default, all data are included, including those with the "B" flag (Bad). |
deliverers |
Character vector. Specifies the data deliverers to filter by. |
orderers |
Character vector. Orderers to filter by specific organizations or individuals. |
projects |
Character vector. Projects to filter data by specific research or monitoring projects. |
datasets |
Character vector. Datasets to filter data by specific datasets. |
minSamplingDepth |
Numeric. Minimum sampling depth (in meters) to filter the data. |
maxSamplingDepth |
Numeric. Maximum sampling depth (in meters) to filter the data. |
redListedCategory |
Character vector. Red-listed taxa for conservation filtering. |
taxonName |
Character vector. Optional vector of taxa names to filter by. |
stationName |
Character vector. Station names to filter data by specific stations. |
vattenDistrikt |
Character vector. Water district names to filter by Swedish water districts. |
seaBasins |
Character vector. Sea basins to filter by. |
counties |
Character vector. Counties to filter by specific administrative regions. |
municipalities |
Character vector. Municipalities to filter by. |
waterCategories |
Character vector. Water categories to filter by. |
typOmraden |
Character vector. Type areas to filter by. |
helcomOspar |
Character vector. HELCOM or OSPAR areas for regional filtering. |
seaAreas |
Character vector. Sea area codes to filter by specific sea areas. |
hideEmptyColumns |
Logical. Whether to hide empty columns. Default is FALSE. |
row_limit |
Numeric. Specifies the maximum number of rows that can be retrieved in a single request.
If the requested data exceeds this limit, the function automatically downloads the data in yearly chunks
(ignored when |
prod |
Logical, whether to download from the production
( |
utv |
Logical. Select UTV server when |
verbose |
Logical. Whether to display progress information. Default is TRUE. |
Details
This function sends a POST request to the SHARK API with the specified filters.
The API returns a delimited text file (e.g., tab- or semicolon-separated), which is
downloaded and read into R as a tibble. If the row_limit parameter is exceeded,
the data is retrieved in yearly chunks and combined into a single table. Adjusting the
row_limit parameter may be necessary when retrieving large datasets or detailed reports.
Note that making very large requests (e.g., retrieving the entire SHARK database)
can be extremely time- and memory-intensive.
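The yearly chunking described above can be sketched in base R as follows. This is an illustration only, not the package's code: fetch_in_yearly_chunks() and fake_fetch_year() are hypothetical, and the stand-in fetcher returns made-up rows instead of calling the SHARK API.

```r
# Hypothetical helper: request one year at a time and stack the results.
fetch_in_yearly_chunks <- function(from_year, to_year, fetch_year) {
  years <- seq(from_year, to_year)
  chunks <- lapply(years, fetch_year)
  do.call(rbind, chunks)
}

# Stand-in for a single-year request; returns two made-up rows per year.
fake_fetch_year <- function(year) {
  data.frame(visit_year = year, value = c(1.5, 2.5))
}

combined <- fetch_in_yearly_chunks(2019, 2021, fake_fetch_year)
nrow(combined)
```

Each chunk stays well below the row limit, and the final table has the same columns as a single large request would.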
Value
A tibble containing the retrieved SHARK data, parsed from
the API's delimited text response. Column types are inferred automatically.
Note
For large queries spanning multiple years or including several data types, retrieval can be time-consuming and memory-intensive. Consider filtering by year, data type, or region for improved performance.
See Also
- https://shark.smhi.se/en – SHARK database portal
- get_shark_options() – Retrieve available filters
- get_shark_table_counts() – Check table row counts before download
- get_shark_datasets() – To download datasets as zip-archives
Examples
# Retrieve chlorophyll data from 2019 to 2020 for April to June
shark_data <- get_shark_data(fromYear = 2019, toYear = 2020,
months = c(4, 5, 6), dataTypes = "Chlorophyll",
verbose = FALSE)
print(shark_data)
Download SHARK dataset zip archives
Description
Downloads one or more datasets (zip-archives) from the SHARK database (Swedish national marine environmental data archive) and optionally unzips them. The function matches provided dataset names against all available SHARK datasets.
Usage
get_shark_datasets(
dataset_name,
save_dir = NULL,
prod = TRUE,
utv = FALSE,
unzip_file = FALSE,
return_df = FALSE,
encoding = "latin_1",
guess_encoding = TRUE,
verbose = TRUE
)
Arguments
dataset_name |
Character vector with one or more dataset
names (or partial names). Each entry will be matched against
available SHARK dataset identifiers (e.g.,
|
save_dir |
Directory where zip files (and optionally their
extracted contents) should be stored. Defaults to |
prod |
Logical, whether to download from the production
( |
utv |
Logical. Select UTV server when |
unzip_file |
Logical, whether to extract downloaded zip
archives ( |
return_df |
Logical, whether to return a combined data frame
with the contents of all downloaded datasets ( |
encoding |
Character. File encoding of |
guess_encoding |
Logical. If |
verbose |
Logical, whether to show download and extraction
progress messages. Default is |
Value
If return_df = FALSE, a named list of character vectors.
Each element corresponds to one matched dataset and contains either
the path to the downloaded zip file (if unzip_file = FALSE) or
the path to the extraction directory (if unzip_file = TRUE).
If return_df = TRUE, a single combined data frame with all
dataset contents, including a source column indicating the dataset.
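Partial matching of dataset names can be pictured with the sketch below. It assumes case-sensitive substring matching, which may differ from what the function actually does; match_datasets() is a hypothetical helper, and the two "EXAMPLE" dataset names are invented for illustration.

```r
# Available dataset identifiers; the two EXAMPLE entries are made up.
available <- c(
  "SHARK_Phytoplankton_2023_SMHI_BVVF",
  "SHARK_Zooplankton_2022_EXAMPLE_A",
  "SHARK_Zooplankton_2022_EXAMPLE_B"
)

# Hypothetical helper: keep every dataset whose name contains a pattern.
match_datasets <- function(patterns, available) {
  hits <- lapply(patterns, function(p) {
    available[grepl(p, available, fixed = TRUE)]
  })
  unique(unlist(hits))
}

matched <- match_datasets("Zooplankton_2022", available)
matched
```

A partial name such as "Zooplankton_2022" therefore selects every dataset that contains it, while a full identifier selects a single dataset.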
See Also
https://shark.smhi.se/en for SHARK database.
get_shark_options() for listing available datasets.
get_shark_data() for downloading tabular data.
Examples
# Get a specific dataset
get_shark_datasets("SHARK_Phytoplankton_2023_SMHI_BVVF", verbose = FALSE)
# Get all Zooplankton datasets from 2022 and unzip them
get_shark_datasets(
dataset_name = "Zooplankton_2022",
unzip_file = TRUE,
verbose = FALSE
)
# Get all Chlorophyll datasets and return as a combined data frame
combined_df <- get_shark_datasets(
dataset_name = "Chlorophyll",
return_df = TRUE,
verbose = FALSE
)
head(combined_df)
Retrieve available search options from SHARK
Description
The get_shark_options() function retrieves available search options from the SHARK database.
It sends a GET request to the SHARK API and returns the results as a structured named list.
Usage
get_shark_options(prod = TRUE, utv = FALSE, unparsed = FALSE)
Arguments
prod |
Logical value that selects the production server when |
utv |
Logical value that selects the UTV server when |
unparsed |
Logical. If |
Details
This function sends a GET request to the /api/options endpoint of the SHARK API
to retrieve available search filters and options that can be used in SHARK data queries.
Value
A named list of available search options from the SHARK API.
If unparsed = TRUE, returns the raw JSON structure as a list.
See Also
get_shark_data() for retrieving actual data from the SHARK API.
https://shark.smhi.se/en for the SHARK database portal.
Examples
# Retrieve available search options (simplified)
shark_options <- get_shark_options()
names(shark_options)
# Retrieve full unparsed JSON response
raw_options <- get_shark_options(unparsed = TRUE)
# View available datatypes
print(shark_options$dataTypes)
Summarize numeric SHARK parameters with ranges and outlier thresholds
Description
Downloads SHARK data for a given time period, filters to numeric parameters, and calculates descriptive statistics and Tukey outlier thresholds.
Usage
get_shark_statistics(
fromYear = NULL,
toYear = NULL,
datatype = NULL,
group_col = NULL,
min_obs = 3,
max_non_numeric_frac = 0.05,
cache_result = FALSE,
prod = TRUE,
utv = FALSE,
verbose = TRUE
)
Arguments
fromYear |
Start year for download (numeric). Defaults to the first of the five most recent complete years (see Details). |
toYear |
End year for download (numeric). Defaults to the last complete year. |
datatype |
Optional, one or more datatypes to filter on
(e.g. |
group_col |
Optional column name in the SHARK data to group by
(e.g. |
min_obs |
Minimum number of numeric observations required for a parameter to be included (default: 3). |
max_non_numeric_frac |
Maximum allowed fraction of non-numeric values for a parameter to be kept (default: 0.05). |
cache_result |
Logical, whether to save the result in a persistent cache
( |
prod |
Logical, whether to download from the production
( |
utv |
Logical. Select UTV server when |
verbose |
Logical, whether to show download progress messages. Default is |
Details
By default, the function uses the previous five complete years. For example, if called in 2025 it will use data from 2020–2024.
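The default window can be reproduced with a few lines of base R. The helper name default_year_window() is hypothetical and only restates the rule given above; it is not part of the package.

```r
# Hypothetical helper: the five most recent complete years for a given date.
default_year_window <- function(today = Sys.Date()) {
  current_year <- as.integer(format(today, "%Y"))
  c(fromYear = current_year - 5L, toYear = current_year - 1L)
}

window_2025 <- default_year_window(as.Date("2025-06-01"))
window_2025
```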
Value
A tibble with one row per parameter (and optionally per group) and the following columns:
- parameter
Parameter name (character).
- datatype
SHARK datatype (character).
- min, Q1, median, Q3, max
Observed quantiles.
- P01, P05, P95, P99
1st, 5th, 95th and 99th percentiles.
- IQR
Interquartile range.
- mean
Arithmetic mean of numeric values.
- sd
Standard deviation of numeric values.
- var
Variance of numeric values.
- cv
Coefficient of variation (sd / mean).
- mad
Median absolute deviation.
- mild_lower, mild_upper
Lower/upper bounds for mild outliers (1.5 × IQR).
- extreme_lower, extreme_upper
Lower/upper bounds for extreme outliers (3 × IQR).
- n
Number of numeric observations used.
- fromYear
First year included in the SHARK data download (numeric).
- toYear
Last year included in the SHARK data download (numeric).
- <group_col>
Optional grouping column if provided.
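As a worked illustration of the outlier columns, the sketch below computes Tukey fences for a small numeric vector. It assumes the default quantile algorithm of stats::quantile(), which may differ from the one used internally; tukey_fences() is a hypothetical helper.

```r
# Hypothetical helper: Tukey fences based on Q1, Q3 and the IQR.
tukey_fences <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(
    mild_lower = q[1] - 1.5 * iqr,
    mild_upper = q[2] + 1.5 * iqr,
    extreme_lower = q[1] - 3 * iqr,
    extreme_upper = q[2] + 3 * iqr
  )
}

values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
fences <- tukey_fences(values)
fences
```

For these values Q1 = 3 and Q3 = 7, so the IQR is 4, the mild bounds are -3 and 13, and the extreme bounds are -9 and 19.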
Examples
# Uses previous 5 years automatically, Chlorophyll data only
res <- get_shark_statistics(datatype = "Chlorophyll", verbose = FALSE)
print(res)
# Group by station name and save result in persistent cache
res_station <- get_shark_statistics(datatype = "Chlorophyll",
group_col = "station_name",
cache_result = TRUE,
verbose = FALSE)
print(res_station)
Retrieve SHARK data table row counts
Description
The get_shark_table_counts() function retrieves the number of records (row counts)
from various SHARK data tables based on specified filters such as year, months,
data type, stations, and taxa. To view available filter options, see
get_shark_options.
Usage
get_shark_table_counts(
tableView = "sharkweb_overview",
fromYear = 2019,
toYear = 2020,
months = c(),
dataTypes = c(),
parameters = c(),
orderers = c(),
qualityFlags = c(),
deliverers = c(),
projects = c(),
datasets = c(),
minSamplingDepth = "",
maxSamplingDepth = "",
checkStatus = "",
redListedCategory = c(),
taxonName = c(),
stationName = c(),
vattenDistrikt = c(),
seaBasins = c(),
counties = c(),
municipalities = c(),
waterCategories = c(),
typOmraden = c(),
helcomOspar = c(),
seaAreas = c(),
prod = TRUE,
utv = FALSE
)
Arguments
tableView |
Character. Specifies the view of the table to retrieve. Options include:
Default is |
fromYear |
Integer. The starting year for the data to retrieve. Default is |
toYear |
Integer. The ending year for the data to retrieve. Default is |
months |
Integer vector. The months to retrieve data for (e.g., |
dataTypes |
Character vector. Specifies data types to filter, such as |
parameters |
Character vector. Optional. Parameters to filter results, such as |
orderers |
Character vector. Optional. Orderers to filter data by specific organizations. |
qualityFlags |
Character vector. Optional. Quality flags to filter data. |
deliverers |
Character vector. Optional. Deliverers to filter data by data providers. |
projects |
Character vector. Optional. Projects to filter data by specific research or monitoring projects. |
datasets |
Character vector. Optional. Datasets to filter data by specific dataset names. |
minSamplingDepth |
Numeric. Optional. Minimum depth (in meters) for sampling data. |
maxSamplingDepth |
Numeric. Optional. Maximum depth (in meters) for sampling data. |
checkStatus |
Character string. Optional. Status check to filter results. |
redListedCategory |
Character vector. Optional. Red-listed taxa for conservation filtering. |
taxonName |
Character vector. Optional. Taxa names for filtering specific species or taxa. |
stationName |
Character vector. Optional. Station names to retrieve data from specific stations. |
vattenDistrikt |
Character vector. Optional. Water district names to filter data by Swedish water districts. |
seaBasins |
Character vector. Optional. Sea basin names to filter data by different sea areas. |
counties |
Character vector. Optional. Counties to filter data within specific administrative regions in Sweden. |
municipalities |
Character vector. Optional. Municipalities to filter data within specific local regions. |
waterCategories |
Character vector. Optional. Water categories to filter data by. |
typOmraden |
Character vector. Optional. Type areas to filter data by specific areas. |
helcomOspar |
Character vector. Optional. HELCOM or OSPAR areas for regional filtering. |
seaAreas |
Character vector. Optional. Sea area codes for filtering by specific sea areas. |
prod |
Logical. Select production server when |
utv |
Logical. Select UTV server when |
Value
An integer representing the total number of rows in the requested SHARK table after applying the specified filters.
See Also
https://shark.smhi.se/en for SHARK database.
get_shark_options to see filter options
get_shark_data to download SHARK data
Examples
# Count chlorophyll records for April to June, 2019 to 2020
shark_data_counts <- get_shark_table_counts(fromYear = 2019, toYear = 2020,
months = c(4, 5, 6), dataTypes = c("Chlorophyll"))
print(shark_data_counts)
Retrieve marine biotoxin data from IOC-UNESCO Toxins Database
Description
This function collects data from the IOC-UNESCO Toxins Database and returns information about toxins.
Usage
get_toxin_list(return_count = FALSE)
Arguments
return_count |
Logical. If |
Value
If return_count = TRUE, the function returns a numeric value representing the number of toxins in the database. Otherwise, it returns a tibble of toxins with detailed information.
See Also
https://toxins.hais.ioc-unesco.org/ for IOC-UNESCO Toxins Database.
Examples
# Retrieve the full list of toxins
toxin_list <- get_toxin_list()
head(toxin_list)
# Retrieve only the count of toxins
toxin_count <- get_toxin_list(return_count = TRUE)
print(toxin_count)
Retrieve hierarchical classification from WoRMS
Description
Retrieves the hierarchical taxonomy for one or more AphiaIDs from the World Register of Marine Species (WoRMS) and returns it in a wide format. Optionally, a hierarchy string column can be added that concatenates ranks.
Usage
get_worms_classification(
aphia_ids,
add_rank_to_hierarchy = FALSE,
verbose = TRUE
)
Arguments
aphia_ids |
Numeric vector of AphiaIDs to retrieve classification for. Must not be NULL or empty. Duplicates are allowed and will be preserved in the output. |
add_rank_to_hierarchy |
Logical (default FALSE). If TRUE, the hierarchy
string prepends rank names (e.g., |
verbose |
Logical (default TRUE). If TRUE, prints progress messages and a progress bar during data retrieval. |
Details
The function performs the following steps:
1. Validates input AphiaIDs and removes NA values.
2. Retrieves the hierarchical classification for each AphiaID using worrms::wm_classification().
3. Converts the hierarchy to a wide format with one column per rank.
4. Adds a worms_hierarchy string concatenating all ranks.
5. Preserves input order and duplicates.
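The wide-format conversion and the hierarchy string described in the steps above can be sketched offline with a mock classification table. The table below is illustrative only. The " - " separator, the "[Rank] name" format, and the helper build_hierarchy() are assumptions for this sketch and may not match the package's actual output.

```r
# Mock classification in long format (one row per rank), for illustration.
classification <- data.frame(
  rank = c("Kingdom", "Phylum", "Class"),
  scientificname = c("Chromista", "Myzozoa", "Dinophyceae"),
  stringsAsFactors = FALSE
)

# Long to wide: one column per rank, one row per taxon.
wide <- as.data.frame(
  as.list(stats::setNames(classification$scientificname, classification$rank))
)

# Hypothetical helper: concatenate ranks, optionally prefixing rank names.
build_hierarchy <- function(cls, add_rank = FALSE, sep = " - ") {
  parts <- if (add_rank) {
    paste0("[", cls$rank, "] ", cls$scientificname)
  } else {
    cls$scientificname
  }
  paste(parts, collapse = sep)
}

wide$worms_hierarchy <- build_hierarchy(classification)
with_rank <- build_hierarchy(classification, add_rank = TRUE)
```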
Value
A tibble where each row corresponds to an input AphiaID. Typical
columns include:
- aphia_id
The AphiaID of the taxon (matches input).
- scientific_name
The last scientific name in the hierarchy for this AphiaID.
- taxonomic ranks
Columns for each rank present in the WoRMS hierarchy (e.g., Kingdom, Phylum, Class, Order, Family, Genus, Species). Missing ranks are NA.
- worms_hierarchy
A concatenated string of all ranks for this AphiaID. Added for every row if wm_classification() returned hierarchy data. Format depends on add_rank_to_hierarchy.
See Also
wm_classification, https://marinespecies.org/
Examples
# Single AphiaID
single_taxa <- get_worms_classification(109604, verbose = FALSE)
print(single_taxa)
# Multiple AphiaIDs
multiple_taxa <- get_worms_classification(c(109604, 376667), verbose = FALSE)
print(multiple_taxa)
# Hierarchy with ranks in the string
with_rank <- get_worms_classification(c(109604, 376667),
add_rank_to_hierarchy = TRUE,
verbose = FALSE)
# Print hierarchy columns with ranks
print(with_rank$worms_hierarchy[1])
# Compare with result when add_rank_to_hierarchy = FALSE
print(multiple_taxa$worms_hierarchy[1])
Retrieve WoRMS records
Description
This function retrieves records from the WoRMS (World Register of Marine Species) database using the worrms R package for a given list of Aphia IDs.
If the retrieval fails, it retries a specified number of times before stopping.
Usage
get_worms_records(
aphia_ids,
max_retries = 3,
sleep_time = 10,
verbose = TRUE,
aphia_id = deprecated()
)
Arguments
aphia_ids |
A vector of Aphia IDs for which records should be retrieved. |
max_retries |
An integer specifying the maximum number of retry attempts for each Aphia ID in case of failure. Default is 3. |
sleep_time |
A numeric value specifying the time (in seconds) to wait between retry attempts. Default is 10 seconds. |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
aphia_id |
Deprecated. Use aphia_ids instead. |
Details
The function attempts to fetch records for each Aphia ID in the provided vector. If a retrieval fails, it retries up to
the specified max_retries, with a pause of sleep_time seconds between attempts. If all retries fail for an Aphia ID, the function
stops with an error message.
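The retry behaviour described above can be sketched with a generic wrapper. This is not the package's code: with_retries() and flaky_request() are hypothetical, and the flaky function simulates a request that fails twice before succeeding.

```r
# Hypothetical helper: call fun() until it succeeds or max_retries is reached.
with_retries <- function(fun, max_retries = 3, sleep_time = 10) {
  for (attempt in seq_len(max_retries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # Pause before the next attempt, but not after the last one.
    if (attempt < max_retries) {
      Sys.sleep(sleep_time)
    }
  }
  stop("All ", max_retries, " attempts failed: ", conditionMessage(result))
}

# Simulated request that fails on the first two calls.
n_calls <- 0
flaky_request <- function() {
  n_calls <<- n_calls + 1
  if (n_calls < 3) stop("temporary failure")
  "record retrieved"
}

outcome <- with_retries(flaky_request, max_retries = 3, sleep_time = 0)
outcome
```

With max_retries = 2 the same simulated request would exhaust its attempts and stop with an error.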
Value
A tibble containing the retrieved WoRMS records for the provided Aphia IDs. Each row corresponds to one Aphia ID.
See Also
https://marinespecies.org/ for WoRMS website.
https://CRAN.R-project.org/package=worrms
Examples
# Example usage with a vector of Aphia IDs
aphia_ids <- c(12345, 67890, 112233)
worms_records <- get_worms_records(aphia_ids, verbose = FALSE)
print(worms_records)
Retrieve WoRMS records by taxonomic names with retry logic
Description
This function has been deprecated. Users are encouraged to use match_worms_taxa instead.
This function retrieves records from the WoRMS database using the worrms R package for a vector of taxonomic names.
It includes retry logic to handle temporary failures and ensures all names are processed.
Usage
get_worms_records_name(
taxa_names,
fuzzy = TRUE,
best_match_only = TRUE,
max_retries = 3,
sleep_time = 10,
marine_only = TRUE,
verbose = TRUE
)
Arguments
taxa_names |
A vector of taxonomic names for which to retrieve records. |
fuzzy |
A logical value indicating whether to search using a fuzzy search pattern. Default is TRUE. |
best_match_only |
A logical value indicating whether to automatically select the first match and return a single match. Default is TRUE. |
max_retries |
An integer specifying the maximum number of retries for the request in case of failure. Default is 3. |
sleep_time |
A numeric value specifying the number of seconds to wait before retrying a failed request. Default is 10. |
marine_only |
A logical value indicating whether to restrict the results to marine taxa only. Default is |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
Details
The function attempts to retrieve records for the input taxonomic names using the wm_records_names function from the worrms package, which queries the WoRMS API.
If a request fails, it retries up to max_retries times, pausing for sleep_time seconds between attempts.
If all attempts fail, the function stops and throws an error.
Value
A tibble containing the retrieved WoRMS records. Each row corresponds to a record for a taxonomic name.
See Also
https://marinespecies.org/ for WoRMS website.
https://CRAN.R-project.org/package=worrms
Examples
# Retrieve WoRMS records for the taxonomic names "Amphidinium" and "Karenia"
records <- get_worms_records_name(c("Amphidinium", "Karenia"),
max_retries = 3, sleep_time = 5, marine_only = TRUE)
Retrieve hierarchical taxonomy data from WoRMS
Description
Retrieves the hierarchical taxonomy for one or more AphiaIDs from the World Register of Marine Species (WoRMS). Optionally, the function can include all descendants of taxa at a specified rank and/or synonyms for all retrieved taxa.
Usage
get_worms_taxonomy_tree(
aphia_ids,
add_descendants = FALSE,
add_descendants_rank = "Species",
add_synonyms = FALSE,
add_hierarchy = FALSE,
add_rank_to_hierarchy = FALSE,
verbose = TRUE
)
Arguments
aphia_ids |
Numeric vector of AphiaIDs to retrieve taxonomy for. Must not be missing or all NA. |
add_descendants |
Logical (default FALSE). If TRUE, retrieves all
child taxa for each taxon at the rank specified by |
add_descendants_rank |
Character (default |
add_synonyms |
Logical (default FALSE). If TRUE, retrieves synonym records for all retrieved taxa and appends them to the dataset. |
add_hierarchy |
Logical (default FALSE). If TRUE, adds a |
add_rank_to_hierarchy |
Logical (default FALSE). If TRUE, the hierarchy
string prepends rank names (e.g., |
verbose |
Logical (default TRUE). If TRUE, prints progress messages and progress bars during data retrieval. |
Details
The function performs the following steps:
1. Validates input AphiaIDs and removes NA values.
2. Retrieves the hierarchical classification for each AphiaID using worrms::wm_classification().
3. Optionally retrieves all descendants at the rank specified by add_descendants_rank if add_descendants = TRUE.
4. Optionally retrieves synonyms for all retrieved taxa if add_synonyms = TRUE.
5. Optionally adds a hierarchy column if add_hierarchy = TRUE.
6. Returns a combined, distinct dataset of all records.
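One way to picture a hierarchy column is to walk from each taxon up through its parents and join the names. The sketch below does this on a made-up table. The IDs and names are invented, the " - " separator is an assumption, and build_path() is a hypothetical helper rather than the method the package uses.

```r
# Made-up taxonomy table using the AphiaID / parentNameUsageID layout.
taxa <- data.frame(
  AphiaID = c(1, 2, 3),
  parentNameUsageID = c(NA, 1, 2),
  scientificname = c("Kingdom A", "Genus b", "Genus b species c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow parent links upward and join the names.
build_path <- function(id, taxa, sep = " - ") {
  path <- character(0)
  while (!is.na(id)) {
    row <- taxa[taxa$AphiaID == id, ]
    if (nrow(row) == 0) break  # parent not present in the table
    path <- c(row$scientificname[1], path)
    id <- row$parentNameUsageID[1]
  }
  paste(path, collapse = sep)
}

taxa$hierarchy <- vapply(taxa$AphiaID, build_path, character(1), taxa = taxa)
taxa$hierarchy
```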
Value
A tibble containing detailed WoRMS records for all requested
AphiaIDs, including optional descendants and synonyms. Typical columns
include:
- AphiaID
The AphiaID of the taxon.
- parentNameUsageID
The AphiaID of the parent taxon.
- scientificname
Scientific name of the taxon.
- rank
Taxonomic rank (e.g., Kingdom, Phylum, Genus, Species).
- status
Taxonomic status (e.g., accepted, unaccepted).
- valid_AphiaID
AphiaID of the accepted taxon, if the record is a synonym.
- species
Added only if a Species rank exists in the retrieved data and if add_hierarchy = TRUE; otherwise not present.
- parentName
Added only if a parentName rank exists in the retrieved data and if add_hierarchy = TRUE; otherwise not present.
- hierarchy
Added only if add_hierarchy = TRUE and hierarchy data are available. Contains a concatenated string of the taxonomic path.
- ...
Additional columns returned by WoRMS, including authorship and source information.
See Also
add_worms_taxonomy, construct_dyntaxa_table
wm_classification, wm_children, wm_synonyms
https://marinespecies.org/ for the WoRMS website.
Examples
# Retrieve hierarchy for a single AphiaID
get_worms_taxonomy_tree(aphia_ids = 109604, verbose = FALSE)
# Retrieve hierarchy including species-level descendants
get_worms_taxonomy_tree(
aphia_ids = c(109604, 376667),
add_descendants = TRUE,
verbose = FALSE
)
# Retrieve hierarchy including hierarchy column
get_worms_taxonomy_tree(
aphia_ids = c(109604, 376667),
add_hierarchy = TRUE,
verbose = FALSE
)
Determine if positions are near land
Description
This function has been deprecated. Users are encouraged to use positions_are_near_land instead.
Determines whether given positions are near land based on a coastline shapefile.
The Natural Earth 1:50m land vectors are included as the default shapefile in SHARK4R.
Usage
ifcb_is_near_land(
latitudes,
longitudes,
distance = 500,
shape = NULL,
crs = 4326,
utm_zone = 33,
remove_small_islands = TRUE,
small_island_threshold = 2e+06
)
Arguments
latitudes |
Numeric vector of latitudes for positions. |
longitudes |
Numeric vector of longitudes for positions. |
distance |
Buffer distance in meters around the coastline. Default is 500 m. |
shape |
Optional path to a shapefile containing coastline data. If provided, the function will use this shapefile instead of the default Natural Earth 1:50m land vectors. Using a more detailed shapefile allows for a smaller buffer distance. For detailed European coastlines, download polygons from the EEA at https://www.eea.europa.eu/data-and-maps/data/eea-coastline-for-analysis-2/gis-data/eea-coastline-polygon. For more detailed world maps, download from Natural Earth at https://www.naturalearthdata.com/downloads/10m-physical-vectors/. |
crs |
Coordinate reference system (CRS) to use for positions and output. Default is EPSG code 4326 (WGS84). |
utm_zone |
UTM zone for buffering the coastline. Default is 33 (between 12°E and 18°E, northern hemisphere). |
remove_small_islands |
Logical indicating whether to remove small islands from the coastline if a custom shapefile is provided. Default is TRUE. |
small_island_threshold |
Area threshold in square meters below which islands will be considered small and removed, if remove_small_islands is set to TRUE. Default is 2e+06 m² (2 km²). |
Details
This function calculates a buffered area around the coastline and checks if given positions (specified by longitudes and latitudes) are within this buffer or intersect with land.
This function is re-exported from the iRfcb package available at https://github.com/EuropeanIFCBGroup/iRfcb
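For positions outside the default zone, a suitable utm_zone value follows from the standard 6-degree UTM zoning. The sketch below is a convenience illustration with hypothetical helper names. It ignores the special zones around Norway and Svalbard, and whether the function internally uses these EPSG codes is an assumption.

```r
# Hypothetical helper: standard UTM zone number from longitude in degrees.
utm_zone_from_longitude <- function(longitude) {
  floor((longitude + 180) / 6) + 1
}

# Hypothetical helper: EPSG code for WGS84 / UTM, northern hemisphere.
utm_epsg_north <- function(zone) {
  32600 + zone
}

longitudes <- c(17.845993, 20.394418, 18.284523, 16.227174)
zones <- utm_zone_from_longitude(longitudes)
zones
utm_epsg_north(33)
```

Longitudes from 12°E up to (but not including) 18°E fall in zone 33, so positions further east may be better served by utm_zone = 34.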
Value
Logical vector indicating whether each position is near land.
Examples
# Define coordinates
latitudes <- c(62.500353, 58.964498, 57.638725, 56.575338)
longitudes <- c(17.845993, 20.394418, 18.284523, 16.227174)
# Call the function
near_land <- ifcb_is_near_land(latitudes, longitudes, distance = 300, crs = 4326)
# Print the result
print(near_land)
Determine if points are in a specified sea basin
Description
This function has been deprecated. Users are encouraged to use which_basin instead.
This function identifies which sub-basin a set of latitude and longitude points belong to, using a user-specified or default shapefile.
The default shapefile includes the Baltic Sea, Kattegat, and Skagerrak basins and is included in the SHARK4R package.
Usage
ifcb_which_basin(latitudes, longitudes, plot = FALSE, shape_file = NULL)
Arguments
latitudes |
A numeric vector of latitude points. |
longitudes |
A numeric vector of longitude points. |
plot |
A boolean indicating whether to plot the points along with the sea basins. Default is FALSE. |
shape_file |
The absolute path to a custom polygon shapefile in WGS84 (EPSG:4326) that represents the sea basin.
Defaults to the Baltic Sea, Kattegat, and Skagerrak basins included in the |
Details
This function reads a pre-packaged shapefile of the Baltic Sea, Kattegat, and Skagerrak basins from the SHARK4R package by default, or a user-supplied
shapefile if provided. The shapefiles originate from SHARK (https://shark.smhi.se/en/). It sets the CRS, transforms the CRS to WGS84 (EPSG:4326) if necessary, and checks if the given points
fall within the specified sea basin. Optionally, it plots the points and the sea basin polygons together.
This function is re-exported from the iRfcb package available at https://github.com/EuropeanIFCBGroup/iRfcb
Value
A vector indicating the basin each point belongs to, or a ggplot object if plot = TRUE.
Examples
# Define example latitude and longitude vectors
latitudes <- c(55.337, 54.729, 56.311, 57.975)
longitudes <- c(12.674, 14.643, 12.237, 10.637)
# Check which Baltic Sea basin each point is in
points_in_the_baltic <- ifcb_which_basin(latitudes, longitudes)
print(points_in_the_baltic)
# Plot the points and the basins
map <- ifcb_which_basin(latitudes, longitudes, plot = TRUE)
Check if taxon names exist in Dyntaxa
Description
Checks whether the supplied scientific names exist in the Swedish taxonomic database Dyntaxa. Optionally, returns a data frame with taxon names, taxon IDs, and match status.
Usage
is_in_dyntaxa(
taxon_names,
subscription_key = Sys.getenv("DYNTAXA_KEY"),
use_dwca = FALSE,
return_df = FALSE,
verbose = FALSE
)
Arguments
taxon_names |
Character vector of taxon names to check. |
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. You can provide the key in three ways:
|
use_dwca |
Logical; if TRUE, uses the DwCA version of Dyntaxa instead of querying the API. |
return_df |
Logical; if TRUE, returns a data frame with columns |
verbose |
Logical; if TRUE, prints messages about unmatched taxa. |
Details
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Value
If return_df = FALSE (default), a logical vector indicating whether each input
name was found in Dyntaxa. Returned invisibly if verbose = TRUE.
If return_df = TRUE, a data frame with columns:
- taxon_name: original input names
- taxon_id: corresponding Dyntaxa taxon IDs (NA if not found)
- match: logical indicating presence in Dyntaxa
Examples
## Not run:
# Using an environment variable (recommended for convenience)
Sys.setenv(DYNTAXA_KEY = "your_key_here")
is_in_dyntaxa(c("Skeletonema marinoi", "Nonexistent species"))
# Return a data frame instead of logical vector
is_in_dyntaxa(c("Skeletonema marinoi", "Nonexistent species"), return_df = TRUE)
# Or pass the key directly
is_in_dyntaxa("Skeletonema marinoi", subscription_key = "your_key_here")
# Suppress messages
is_in_dyntaxa("Skeletonema marinoi", verbose = FALSE)
## End(Not run)
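When return_df = TRUE, the documented columns make it easy to pull out the names that were not found. The following is a minimal sketch that works on a hand-built data frame standing in for the function's output, so it runs without an API key. The rows, including the taxon ID, are invented for illustration.

```r
# Stand-in for the result of is_in_dyntaxa(..., return_df = TRUE).
# Column names follow the Value section; the rows are invented.
res <- data.frame(
  taxon_name = c("Skeletonema marinoi", "Nonexistent species"),
  taxon_id   = c(1234L, NA),   # hypothetical ID, not a real Dyntaxa ID
  match      = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Names that were not found in Dyntaxa
unmatched <- res$taxon_name[!res$match]
print(unmatched)
```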
Load SHARK4R fields from GitHub
Description
This function downloads and sources the SHARK4R required and recommended field definitions directly from the SHARK4R-statistics GitHub repository.
Usage
load_shark4r_fields(verbose = TRUE)
Arguments
verbose |
Logical; if TRUE (the default), prints progress messages. |
Details
The definitions are stored in an R script (fields.R) located in the fields/ folder of the repository.
The function sources this file directly from GitHub into the current R session.
The sourced script defines two main objects:
- required_fields: vector or data frame of required SHARK fields.
- recommended_fields: vector or data frame of recommended SHARK fields.
The output of this function can be directly supplied to the
check_fields function through its field_definitions argument
for validating SHARK4R data consistency.
If sourcing fails (e.g., due to a network issue or repository changes), the function throws an error with a descriptive message.
Value
Invisibly returns a list with two elements:
- required_fields
Object containing required SHARK fields.
- recommended_fields
Object containing recommended SHARK fields.
See Also
check_fields for validating datasets using the loaded field definitions (as field_definitions).
load_shark4r_stats for loading precomputed SHARK4R statistics.
Examples
# Load SHARK4R field definitions from GitHub
fields <- load_shark4r_fields(verbose = FALSE)
# Access required or recommended fields for the first entry
fields[[1]]$required
fields[[1]]$recommended
## Not run:
# Use the loaded definitions in check_fields()
check_fields(my_data, field_definitions = fields)
## End(Not run)
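The sourcing step described in Details can be sketched with base R alone. This is an illustration under stated assumptions, not the package's implementation: the helper name source_field_definitions() is hypothetical, and a local temporary script with invented field names stands in for the fields.R file on GitHub.

```r
# Hypothetical helper: source a script that defines `required_fields` and
# `recommended_fields`, and return both objects invisibly in a list.
source_field_definitions <- function(path) {
  env <- new.env()
  tryCatch(
    source(path, local = env),
    error = function(e) {
      stop("Could not source field definitions: ", conditionMessage(e),
           call. = FALSE)
    }
  )
  invisible(list(
    required_fields    = get("required_fields", envir = env),
    recommended_fields = get("recommended_fields", envir = env)
  ))
}

# Demo with a local stand-in for fields.R (field names are invented)
script <- tempfile(fileext = ".R")
writeLines(c(
  'required_fields <- c("station_name", "sample_date")',
  'recommended_fields <- c("water_depth_m")'
), script)

fields_demo <- source_field_definitions(script)
names(fields_demo)
```

Sourcing into a fresh environment keeps the definitions out of the global workspace until the caller decides what to do with them.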
Load SHARK4R statistics from GitHub
Description
This function downloads and loads precomputed SHARK4R statistical data
(e.g., threshold or summary statistics) directly from the
SHARK4R-statistics GitHub repository.
The data are stored as .rds files and read into R as objects.
Usage
load_shark4r_stats(file_name = "sea_basin.rds", verbose = TRUE)
Arguments
file_name |
Character string specifying the name of the .rds file to load. Defaults to "sea_basin.rds". |
verbose |
Logical; if TRUE (the default), prints progress messages. |
Details
The function retrieves the file from the GitHub repository’s data/ folder.
It temporarily downloads the file to the local system and then reads it into R using readRDS().
If the download fails (e.g., due to a network issue or invalid filename), the function throws an error with a descriptive message.
Value
An R object (typically a tibble or data.frame) read from the specified .rds file.
See Also
check_outliers for detecting threshold exceedances using the loaded statistics,
get_shark_statistics for generating and caching statistical summaries used in SHARK4R.
scatterplot for generating interactive plots with threshold values.
Examples
# Load the default SHARK4R statistics file
stats <- load_shark4r_stats(verbose = FALSE)
print(stats)
# Load a specific file
thresholds <- load_shark4r_stats("scientific_name.rds", verbose = FALSE)
print(thresholds)
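The download-then-read pattern described in Details can be sketched as follows. The helper name read_rds_from_url() is hypothetical, the demo data are invented, and a local file:// URL replaces the GitHub URL so that the example runs offline (the path handling assumes a Unix-like system).

```r
# Hypothetical helper: download an .rds file to a temporary location,
# read it, and clean up the temporary file afterwards.
read_rds_from_url <- function(url) {
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp), add = TRUE)
  tryCatch(
    utils::download.file(url, tmp, mode = "wb", quiet = TRUE),
    error = function(e) {
      stop("Download failed for ", url, ": ", conditionMessage(e),
           call. = FALSE)
    }
  )
  readRDS(tmp)
}

# Demo: write an invented statistics table and read it back via file://
src <- tempfile(fileext = ".rds")
saveRDS(data.frame(parameter = "Temperature", p99 = 25.1), src)
stats_demo <- read_rds_from_url(paste0("file://", normalizePath(src)))
print(stats_demo)
```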
Load station database (station.txt) from path, NODC_CONFIG, or package bundle
Description
Load station database (station.txt) from path, NODC_CONFIG, or package bundle
Usage
load_station_bundle(station_file = NULL, verbose = TRUE)
Arguments
station_file |
Optional path to a station.txt file. |
verbose |
Logical; if TRUE, prints messages about which source is used. |
Value
A data frame containing the station database.
Lookup spatial information for geographic points
Description
Retrieves shore distance, environmental grids, and area values for given coordinates. Coordinates may be supplied either through a data frame or as separate numeric vectors.
Usage
lookup_xy(
data = NULL,
lon = NULL,
lat = NULL,
shoredistance = TRUE,
grids = TRUE,
areas = FALSE,
as_data_frame = TRUE
)
Arguments
data |
Optional data frame containing coordinate columns. The expected names are sample_longitude_dd and sample_latitude_dd (as in the examples).
|
lon |
Optional numeric vector of longitudes. Must be supplied together with lat. |
lat |
Optional numeric vector of latitudes. Must be supplied together with lon. |
shoredistance |
Logical; if TRUE (the default), shore distance is retrieved. |
grids |
Logical; if TRUE (the default), environmental grid values are retrieved. |
areas |
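Logical or numeric. When logical, it toggles area retrieval; when numeric, it is interpreted as a search radius (see Details). Defaults to FALSE. |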
as_data_frame |
Logical; if TRUE (the default), returns a data frame; otherwise returns a list. |
Details
When both vector inputs and a data frame are provided, the vector inputs take precedence.
Coordinates are validated and cleaned before lookup, and only unique values are queried.
Queries are processed in batches to avoid overloading the remote service.
Area retrieval accepts either a logical flag or a radius. A radius of zero corresponds to requesting a single area value.
Final results are reordered to match the original input positions.
The function has been modified from the obistools package (Provoost and Bosch, 2024).
Value
A data frame or list, depending on as_data_frame. Invalid coordinates produce
NA entries (data frame) or NULL elements (list). Duplicate input coordinates
return repeated results.
References
Provoost P, Bosch S (2024). “obistools: Tools for data enhancement and quality control” Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. R package version 0.1.0, https://iobis.github.io/obistools/.
See Also
check_onland, check_depth, https://iobis.github.io/xylookup/ – OBIS xylookup web service
Examples
# Using a data frame
df <- data.frame(sample_longitude_dd = c(10.9, 18.3),
sample_latitude_dd = c(58.1, 58.3))
lookup_xy(df)
# Area search within a radius
lookup_xy(df, areas = 500)
# Using separate coordinate vectors
lookup_xy(lon = c(10.9, 18.3), lat = c(58.1, 58.3))
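The Details state that only unique, valid coordinates are queried and that results are reordered to match the input. A minimal base R sketch of that bookkeeping is shown below. The helper names and the fake_lookup() stand-in for the remote service are hypothetical, and the returned values are invented.

```r
# Hypothetical stand-in for the remote service: one row per coordinate
fake_lookup <- function(lon, lat) {
  data.frame(shoredistance = round(lon * 100 + lat))  # invented values
}

lookup_unique <- function(lon, lat) {
  # Treat missing or out-of-range coordinates as invalid
  valid <- !is.na(lon) & !is.na(lat) & abs(lon) <= 180 & abs(lat) <= 90
  key <- paste(lon, lat)
  uniq <- !duplicated(key) & valid

  # Query each unique, valid coordinate only once
  res <- fake_lookup(lon[uniq], lat[uniq])

  # Map results back to the original positions; invalid rows become NA
  idx <- match(key, key[uniq])
  idx[!valid] <- NA
  out <- res[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

result_demo <- lookup_unique(
  lon = c(10.9, 18.3, 10.9, 200),
  lat = c(58.1, 58.3, 58.1, 58.1)
)
print(result_demo)
```

In the demo, the third point repeats the first and therefore reuses its result, while the fourth point has an invalid longitude and yields NA.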
Search AlgaeBase for taxonomic information
Description
This function has been deprecated. Users are encouraged to use match_algaebase_taxa instead.
This function queries the AlgaeBase API to retrieve taxonomic information for a list of algae names based on genus and (optionally) species. It supports exact matching, genus-only searches, and retrieval of higher taxonomic ranks.
Usage
match_algaebase(
genus,
species,
subscription_key = Sys.getenv("ALGAEBASE_KEY"),
genus_only = FALSE,
higher = TRUE,
unparsed = FALSE,
exact_matches_only = TRUE,
sleep_time = 1,
newest_only = TRUE,
verbose = TRUE,
apikey = deprecated()
)
Arguments
genus |
A character vector of genus names. |
species |
A character vector of species names corresponding to the names in genus. |
subscription_key |
A character string containing the API key for accessing the AlgaeBase API. By default, the key
is read from the environment variable ALGAEBASE_KEY. You can also pass the key directly as a string.
|
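genus_only |
Logical. If TRUE, only genus-level searches are performed. Defaults to FALSE. |
higher |
Logical. If TRUE, higher taxonomy is included in the output. Defaults to TRUE. |
unparsed |
Logical. If TRUE, the raw JSON output from the API is returned. Defaults to FALSE. |
exact_matches_only |
Logical. If TRUE, results are limited to exact matches. Defaults to TRUE. |
sleep_time |
Numeric. The delay (in seconds) between consecutive AlgaeBase API queries. Defaults to 1. |
newest_only |
A logical value indicating whether to return only the most recent entries (default is TRUE). |
verbose |
Logical. If TRUE, progress messages are printed. Defaults to TRUE. |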
apikey |
Details
A valid API key is required and can be requested from the AlgaeBase team.
Scientific names can be parsed using the parse_scientific_names() function before being processed by match_algaebase().
Duplicate genus-species combinations are handled efficiently by querying each unique combination only once. Genus-only searches are performed when genus_only = TRUE
or when the species name is missing or invalid. Errors during API queries are gracefully handled by returning rows with NA values for missing or unavailable data.
The function allows for integration with data analysis workflows that require resolving or verifying taxonomic names against AlgaeBase.
Value
A data frame containing taxonomic information for each input genus–species combination. The following columns may be included:
- id: AlgaeBase ID (if available).
- kingdom, phylum, class, order, family: Higher taxonomy (returned if higher = TRUE).
- genus, species, infrasp: Genus, species, and infraspecies names (if applicable).
- taxonomic_status: Status of the name (e.g., accepted, synonym, unverified).
- currently_accepted: Logical indicator whether the name is currently accepted (TRUE/FALSE).
- accepted_name: Currently accepted name if different from the input name.
- input_name: The name supplied by the user.
- input_match: Indicator of exact match (1 = exact, 0 = not exact).
- taxon_rank: Taxonomic rank of the accepted name (e.g., genus, species).
- mod_date: Date when the entry was last modified in AlgaeBase.
- long_name: Full species name with authorship and date.
- authorship: Author(s) associated with the species name.
See Also
https://www.algaebase.org/ for AlgaeBase website.
Examples
## Not run:
# Example with genus and species vectors
genus_vec <- c("Thalassiosira", "Skeletonema", "Tripos")
species_vec <- c("pseudonana", "costatum", "furca")
algaebase_results <- match_algaebase(
genus = genus_vec,
species = species_vec,
subscription_key = "your_api_key",
exact_matches_only = TRUE,
verbose = TRUE
)
head(algaebase_results)
## End(Not run)
Search AlgaeBase for information about a genus of algae
Description
This function searches the AlgaeBase API for genus information and returns detailed taxonomic data, including higher taxonomy, taxonomic status, scientific names, and other related metadata.
Usage
match_algaebase_genus(
genus,
subscription_key = Sys.getenv("ALGAEBASE_KEY"),
higher = TRUE,
unparsed = FALSE,
newest_only = TRUE,
exact_matches_only = TRUE,
apikey = deprecated()
)
Arguments
genus |
The genus name to search for (character string). This parameter is required. |
subscription_key |
A character string containing the API key for accessing the AlgaeBase API. By default, the key
is read from the environment variable ALGAEBASE_KEY. You can also pass the key directly as a string.
|
higher |
A boolean flag indicating whether to include higher taxonomy in the output (default is TRUE). |
unparsed |
A boolean flag indicating whether to return the raw JSON output from the API (default is FALSE). |
newest_only |
A boolean flag to return only the most recent entry (default is TRUE). |
exact_matches_only |
A boolean flag to limit results to exact matches (default is TRUE). |
apikey |
Details
A valid API key is required and can be requested from the AlgaeBase team.
Value
A data frame with the following columns:
- id: AlgaeBase identifier.
- accepted_name: Accepted scientific name (if different from the input).
- input_name: The genus name supplied by the user.
- input_match: Indicator of exact match (1 = exact, 0 = not exact).
- currently_accepted: Indicator if the taxon is currently accepted (1 = TRUE, 0 = FALSE).
- genus_only: Indicator if the search was for a genus only (1 = genus, 0 = genus + species).
- kingdom, phylum, class, order, family: Higher taxonomy (returned if higher = TRUE).
- taxonomic_status: Status of the taxon (e.g., currently accepted, synonym, unverified).
- taxon_rank: Taxonomic rank of the accepted name (e.g., genus, species).
- mod_date: Date when the entry was last modified.
- long_name: Full scientific name including author and date (if available).
- authorship: Author information (if available).
See Also
https://www.algaebase.org/ for AlgaeBase website.
Examples
## Not run:
match_algaebase_genus("Anabaena", subscription_key = "your_api_key")
## End(Not run)
Search AlgaeBase for information about a species of algae
Description
This function searches the AlgaeBase API for species based on genus and species names. It allows for flexible search parameters such as filtering by exact matches, returning the most recent results, and including higher taxonomy details.
Usage
match_algaebase_species(
genus,
species,
subscription_key = Sys.getenv("ALGAEBASE_KEY"),
higher = TRUE,
unparsed = FALSE,
newest_only = TRUE,
exact_matches_only = TRUE,
apikey = deprecated()
)
Arguments
genus |
A character string specifying the genus name. |
species |
A character string specifying the species or specific epithet. |
subscription_key |
A character string containing the API key for accessing the AlgaeBase API. By default, the key
is read from the environment variable ALGAEBASE_KEY. You can also pass the key directly as a string.
|
higher |
A logical value indicating whether to include higher taxonomy details (default is TRUE). |
unparsed |
A logical value indicating whether to print the full JSON response from the API (default is FALSE). |
newest_only |
A logical value indicating whether to return only the most recent entries (default is TRUE). |
exact_matches_only |
A logical value indicating whether to return only exact matches (default is TRUE). |
apikey |
Details
A valid API key is required and can be requested from the AlgaeBase team.
This function queries the AlgaeBase API for species based on the genus and species names, and filters the results based on various parameters. The function handles different taxonomic ranks and formats the output for easy use. It can merge higher taxonomy data if requested.
Value
A data frame with details about the species, including:
- taxonomic_status: The current status of the taxon (e.g., accepted, synonym, unverified).
- taxon_rank: The rank of the taxon (e.g., species, genus).
- accepted_name: The currently accepted scientific name, if applicable.
- authorship: Author information for the scientific name (if available).
- mod_date: Date when the taxonomic record was last modified.
- ...: Other relevant information returned by the data source.
See Also
https://www.algaebase.org/ for AlgaeBase website.
Examples
## Not run:
# Search for a species with exact matches only, return the most recent results
result <- match_algaebase_species(
genus = "Skeletonema", species = "marinoi", subscription_key = "your_api_key"
)
# Print result
print(result)
## End(Not run)
Search AlgaeBase for taxonomic information
Description
This function queries the AlgaeBase API to retrieve taxonomic information for a list of algae names based on genus and (optionally) species. It supports exact matching, genus-only searches, and retrieval of higher taxonomic ranks.
Usage
match_algaebase_taxa(
genera,
species,
subscription_key = Sys.getenv("ALGAEBASE_KEY"),
genus_only = FALSE,
higher = TRUE,
unparsed = FALSE,
exact_matches_only = TRUE,
sleep_time = 1,
newest_only = TRUE,
verbose = TRUE,
apikey = deprecated(),
genus = deprecated()
)
Arguments
genera |
A character vector of genus names. |
species |
A character vector of species names corresponding to the names in genera. |
subscription_key |
A character string containing the API key for accessing the AlgaeBase API. By default, the key
is read from the environment variable ALGAEBASE_KEY.
|
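genus_only |
Logical. If TRUE, only genus-level searches are performed. Defaults to FALSE. |
higher |
Logical. If TRUE, higher taxonomy is included in the output. Defaults to TRUE. |
unparsed |
Logical. If TRUE, the raw JSON output from the API is returned. Defaults to FALSE. |
exact_matches_only |
Logical. If TRUE, results are limited to exact matches. Defaults to TRUE. |
sleep_time |
Numeric. The delay (in seconds) between consecutive AlgaeBase API queries. Defaults to 1. |
newest_only |
A logical value indicating whether to return only the most recent entries (default is TRUE). |
verbose |
Logical. If TRUE, progress messages are printed. Defaults to TRUE. |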
apikey |
|
genus |
Details
A valid API key is required and can be requested from the AlgaeBase team.
Scientific names can be parsed using the parse_scientific_names() function before being processed by match_algaebase_taxa().
Duplicate genus-species combinations are handled efficiently by querying each unique combination only once. Genus-only searches are performed when genus_only = TRUE
or when the species name is missing or invalid. Errors during API queries are gracefully handled by returning rows with NA values for missing or unavailable data.
The function allows for integration with data analysis workflows that require resolving or verifying taxonomic names against AlgaeBase.
Value
A data frame containing taxonomic information for each input genus–species combination. The following columns may be included:
- id: AlgaeBase ID (if available).
- kingdom, phylum, class, order, family: Higher taxonomy (returned if higher = TRUE).
- genus, species, infrasp: Genus, species, and infraspecies names (if applicable).
- taxonomic_status: Status of the name (e.g., accepted, synonym, unverified).
- currently_accepted: Logical indicator whether the name is currently accepted (TRUE/FALSE).
- accepted_name: Currently accepted name if different from the input name.
- input_name: The name supplied by the user.
- input_match: Indicator of exact match (1 = exact, 0 = not exact).
- taxon_rank: Taxonomic rank of the accepted name (e.g., genus, species).
- mod_date: Date when the entry was last modified in AlgaeBase.
- long_name: Full species name with authorship and date.
- authorship: Author(s) associated with the species name.
See Also
https://www.algaebase.org/ for AlgaeBase website.
parse_scientific_names for parsing taxonomic names before passing them to the function.
Examples
## Not run:
# Example with genus and species vectors
genus_vec <- c("Thalassiosira", "Skeletonema", "Tripos")
species_vec <- c("pseudonana", "costatum", "furca")
algaebase_results <- match_algaebase_taxa(
genera = genus_vec,
species = species_vec,
subscription_key = "your_api_key",
exact_matches_only = TRUE,
verbose = TRUE
)
head(algaebase_results)
## End(Not run)
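The Details mention that each unique genus-species combination is queried only once and that failed queries produce NA values rather than errors. The sketch below illustrates that pattern in base R. It is not the package's code: match_unique() and fake_query() are hypothetical names, fake_query() stands in for the AlgaeBase API, and the returned IDs are invented.

```r
# Hypothetical stand-in for one API query; fails for one genus on purpose
fake_query <- function(genus, species) {
  if (genus == "Badgenus") stop("simulated API error")
  data.frame(id = nchar(paste(genus, species)))  # invented ID
}

match_unique <- function(genera, species, sleep_time = 0) {
  input <- data.frame(genus = genera, species = species,
                      stringsAsFactors = FALSE)
  combos <- unique(input)  # query each combination only once

  combos$id <- vapply(seq_len(nrow(combos)), function(i) {
    Sys.sleep(sleep_time)  # pause between consecutive queries
    tryCatch(
      fake_query(combos$genus[i], combos$species[i])$id,
      error = function(e) NA_integer_  # failed queries yield NA
    )
  }, integer(1))

  # Map the results back so every input row is represented, in input order
  key_in <- paste(input$genus, input$species)
  key_un <- paste(combos$genus, combos$species)
  cbind(input, id = combos$id[match(key_in, key_un)])
}

matched_demo <- match_unique(
  genera  = c("Skeletonema", "Badgenus", "Skeletonema"),
  species = c("marinoi", "sp", "marinoi")
)
print(matched_demo)
```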
Taxon matching using Dyntaxa (https://www.dyntaxa.se/)
Description
This function is deprecated and has been replaced by is_in_dyntaxa().
Usage
match_dyntaxa(names, subscription_key = Sys.getenv("DYNTAXA_KEY"))
Arguments
names |
Character vector of scientific names to check in Dyntaxa. |
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. |
Details
This function is retained for backward compatibility but may be removed in future versions.
Use the newer function is_in_dyntaxa() instead.
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Value
A logical vector indicating whether each input name was found in Dyntaxa,
same as is_in_dyntaxa(). Messages about unmatched taxa are printed.
Examples
## Not run:
# Deprecated function usage
match_dyntaxa(c("Skeletonema marinoi", "Nonexistent species"),
subscription_key = "your_key_here")
## End(Not run)
Match Dyntaxa taxon names
Description
This function matches a list of taxon names against the SLU Artdatabanken API (Dyntaxa) and retrieves the best matches along with their taxon IDs.
Usage
match_dyntaxa_taxa(
taxon_names,
subscription_key = Sys.getenv("DYNTAXA_KEY"),
multiple_options = FALSE,
searchFields = "Both",
isRecommended = "NotSet",
isOkForObservationSystems = "NotSet",
culture = "sv_SE",
page = 1,
pageSize = 100,
verbose = TRUE
)
Arguments
taxon_names |
A vector of taxon names to match. |
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. You can also pass the key directly as a string.
|
multiple_options |
Logical. If TRUE, the function will return multiple matching names. Default is FALSE, selecting the first match. |
searchFields |
A character string indicating the search fields. Defaults to 'Both'. |
isRecommended |
A character string indicating whether the taxon is recommended. Defaults to 'NotSet'. |
isOkForObservationSystems |
A character string indicating whether the taxon is suitable for observation systems. Defaults to 'NotSet'. |
culture |
A character string indicating the culture. Defaults to 'sv_SE'. |
page |
An integer specifying the page number for pagination. Defaults to 1. |
pageSize |
An integer specifying the page size for pagination. Defaults to 100. |
verbose |
Logical. Print progress bar. Default is TRUE. |
Details
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Note: Please review the API conditions
and register for access before using the API. Data collected through the API
is stored at SLU Artdatabanken. Please also note that the authors of SHARK4R are not affiliated with SLU Artdatabanken.
Value
A data frame containing the search pattern, taxon ID, and best match for each taxon name.
See Also
SLU Artdatabanken API Documentation
Examples
## Not run:
# Match taxon names against SLU Artdatabanken API
matched_taxa <- match_dyntaxa_taxa(c("Homo sapiens", "Canis lupus"), "your_subscription_key")
print(matched_taxa)
## End(Not run)
Match station names against SMHI station list
Description
Matches reported station names in your dataset against a curated station list
("station.txt"), which is synced with "Stationsregistret":
https://stationsregister.miljodatasamverkan.se/.
Usage
match_station(names, station_file = NULL, try_synonyms = TRUE, verbose = TRUE)
Arguments
names |
Character vector of station names to match. |
station_file |
Optional path to a custom station file (tab-delimited).
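If NULL (the default), the function looks for the NODC_CONFIG environment variable and otherwise falls back to the bundled station file (see Details). |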
try_synonyms |
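Logical; if TRUE (the default), unmatched names are also compared against the synonyms in the station database. |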
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Details
This function is useful for validating station names and identifying any unmatched or misspelled entries.
If try_synonyms = TRUE, unmatched station names are also compared
against the SYNONYM_NAMES column in the station database, splitting
multiple synonyms separated by <or>.
The function first checks if a station file path is provided via the
station_file argument. If not, it looks for the
NODC_CONFIG environment variable. This variable can point to a folder
where the NODC (Swedish National Oceanographic Data Center) configuration and station file
are stored, typically including:
-
<NODC_CONFIG>/config/station.txt
If NODC_CONFIG is set and the folder exists, the function will use
station.txt from that location. Otherwise, it falls back to the
bundled station.zip included in the SHARK4R package.
Value
A data frame with two columns:
- reported_station_name
The input station names.
- match_type
Logical; TRUE if the station was found in the SMHI station list (including synonyms if enabled), otherwise FALSE.
Examples
# Example stations
stations <- c("ANHOLT E", "BY5 BORNHOLMSDJ", "STX999")
# Check if station names are in station.txt (including synonyms)
match_station(stations, try_synonyms = TRUE, verbose = FALSE)
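The synonym fallback described in Details can be illustrated with a small base R sketch. The station table below is invented, STATION_NAME is an assumed column name (only SYNONYM_NAMES is named in the text above), and match_with_synonyms() is a hypothetical helper that uses exact string matching for simplicity.

```r
# Invented station table; STATION_NAME is an assumed column name
stations_db <- data.frame(
  STATION_NAME  = c("ANHOLT E", "BY5 BORNHOLMSDJ"),
  SYNONYM_NAMES = c("ANHOLT EAST<or>ANHOLT-E", NA),
  stringsAsFactors = FALSE
)

match_with_synonyms <- function(names, db, try_synonyms = TRUE) {
  found <- names %in% db$STATION_NAME
  if (try_synonyms) {
    # Split multiple synonyms on the literal "<or>" separator
    syn <- unlist(strsplit(db$SYNONYM_NAMES[!is.na(db$SYNONYM_NAMES)],
                           "<or>", fixed = TRUE))
    found <- found | names %in% trimws(syn)
  }
  data.frame(reported_station_name = names, match_type = found,
             stringsAsFactors = FALSE)
}

with_syn    <- match_with_synonyms(c("ANHOLT E", "ANHOLT-E", "STX999"), stations_db)
without_syn <- match_with_synonyms(c("ANHOLT E", "ANHOLT-E", "STX999"), stations_db,
                                   try_synonyms = FALSE)
print(with_syn)
```

In this demo, "ANHOLT-E" is matched only when synonyms are enabled.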
Match Dyntaxa taxon names
Description
This function has been deprecated. Users are encouraged to use match_dyntaxa_taxa instead.
This function matches a list of taxon names against the SLU Artdatabanken API (Dyntaxa) and retrieves the best matches along with their taxon IDs.
Usage
match_taxon_name(
taxon_names,
subscription_key = Sys.getenv("DYNTAXA_KEY"),
multiple_options = FALSE,
searchFields = "Both",
isRecommended = "NotSet",
isOkForObservationSystems = "NotSet",
culture = "sv_SE",
page = 1,
pageSize = 100,
verbose = TRUE
)
Arguments
taxon_names |
A vector of taxon names to match. |
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. You can also pass the key directly as a string.
|
multiple_options |
Logical. If TRUE, the function will return multiple matching names. Default is FALSE, selecting the first match. |
searchFields |
A character string indicating the search fields. Defaults to 'Both'. |
isRecommended |
A character string indicating whether the taxon is recommended. Defaults to 'NotSet'. |
isOkForObservationSystems |
A character string indicating whether the taxon is suitable for observation systems. Defaults to 'NotSet'. |
culture |
A character string indicating the culture. Defaults to 'sv_SE'. |
page |
An integer specifying the page number for pagination. Defaults to 1. |
pageSize |
An integer specifying the page size for pagination. Defaults to 100. |
verbose |
Logical. Print progress bar. Default is TRUE. |
Details
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Note: Please review the API conditions
and register for access before using the API. Data collected through the API
is stored at SLU Artdatabanken. Please also note that the authors of SHARK4R are not affiliated with SLU Artdatabanken.
Value
A data frame containing the search pattern, taxon ID, and best match for each taxon name.
See Also
SLU Artdatabanken API Documentation
Examples
## Not run:
# Match taxon names against SLU Artdatabanken API
matched_taxa <- match_taxon_name(c("Homo sapiens", "Canis lupus"), "your_subscription_key")
print(matched_taxa)
## End(Not run)
Retrieve WoRMS records by taxonomic names with retry logic
Description
This function retrieves records from the WoRMS database using the worrms R package for a vector of taxonomic names.
It includes retry logic to handle temporary failures and ensures all names are processed. The function can query
all names at once using a bulk API call or iterate over names individually.
Usage
match_worms_taxa(
taxa_names,
fuzzy = TRUE,
best_match_only = TRUE,
max_retries = 3,
sleep_time = 10,
marine_only = TRUE,
bulk = FALSE,
chunk_size = 500,
verbose = TRUE
)
Arguments
taxa_names |
A character vector of taxonomic names for which to retrieve records. |
fuzzy |
A logical value indicating whether to perform a fuzzy search. Default is TRUE.
Note: Fuzzy search is only applied in iterative mode (bulk = FALSE). |
best_match_only |
A logical value indicating whether to automatically select the first match and return a single match. Default is TRUE. |
max_retries |
Integer specifying the maximum number of retries for the request in case of failure. Default is 3. |
sleep_time |
Numeric specifying the number of seconds to wait before retrying a failed request. Default is 10. |
marine_only |
Logical indicating whether to restrict results to marine taxa only. Default is TRUE. |
bulk |
Logical indicating whether to perform a bulk API call for all unique names at once. Default is FALSE. |
chunk_size |
Integer specifying the maximum number of taxa per bulk API request. Default is 500.
Only used when bulk = TRUE. |
verbose |
Logical indicating whether to print progress messages. Default is TRUE. |
Details
- If bulk = TRUE, all unique names are sent to the API in bulk requests of at most chunk_size names each. Fuzzy matching is ignored.
- If bulk = FALSE, the function iterates over names individually, optionally using fuzzy matching.
- The function retries failed requests up to max_retries times, pausing for sleep_time seconds between attempts.
- Names for which no records are found will have status = "no content" and AphiaID = NA.
- Names are cleaned before being passed to the API call by converting them to UTF-8, replacing problematic symbols with spaces, removing trailing periods, collapsing extra spaces, and trimming whitespace.
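The name-cleaning steps can be approximated in base R as shown below. This is a sketch, not the package's code: clean_taxon_names() is a hypothetical helper, and the set of "problematic symbols" it replaces (slash, underscore, semicolon, pipe) is an assumption, since the text above does not list them.

```r
# Hypothetical cleaner; the set of "problematic symbols" is an assumption
clean_taxon_names <- function(x) {
  x <- enc2utf8(x)                    # convert to UTF-8
  x <- gsub("[/_;|]", " ", x)         # replace selected symbols with spaces
  x <- gsub("\\.+$", "", trimws(x))   # remove trailing periods
  x <- gsub("\\s+", " ", x)           # collapse repeated whitespace
  trimws(x)                           # trim leading and trailing whitespace
}

cleaned <- clean_taxon_names(c("  Karenia   mikimotoi. ", "Amphidinium/Karenia"))
print(cleaned)
```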
Value
A tibble containing the retrieved WoRMS records. Each row corresponds to a record for a taxonomic name.
Repeated taxa in the input are preserved in the output.
See Also
https://marinespecies.org/ for WoRMS website.
https://CRAN.R-project.org/package=worrms
Examples
# Retrieve WoRMS records iteratively for two taxonomic names
records <- match_worms_taxa(c("Amphidinium", "Karenia"),
max_retries = 3,
sleep_time = 5,
marine_only = TRUE,
verbose = FALSE)
print(records)
# Retrieve WoRMS records in bulk mode (faster for many names)
records_bulk <- match_worms_taxa(c("Amphidinium", "Karenia", "Navicula"),
bulk = TRUE,
marine_only = TRUE,
verbose = FALSE)
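The retry behaviour controlled by max_retries and sleep_time can be sketched generically as follows. The helper with_retry() is hypothetical and does not call WoRMS. In this sketch max_retries counts the total number of attempts, which is an assumption about how the limit is applied.

```r
# Hypothetical helper: call `fun` until it succeeds or the attempts run out.
# Here max_retries is the total number of attempts (an assumption).
with_retry <- function(fun, max_retries = 3, sleep_time = 10) {
  for (attempt in seq_len(max_retries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_retries) Sys.sleep(sleep_time)  # wait before retrying
  }
  stop("Request failed after ", max_retries, " attempts: ",
       conditionMessage(result), call. = FALSE)
}

# Demo: a function that fails twice before succeeding
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "ok"
}

retry_result <- with_retry(flaky, max_retries = 3, sleep_time = 0)
print(retry_result)
```

The demo sets sleep_time = 0 only to keep the example fast; a real request loop would keep a pause between attempts to avoid overloading the service.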
Taxon matching using WoRMS (http://www.marinespecies.org/)
Description
This function has been deprecated. Users are encouraged to use match_worms_taxa instead.
Matches Latin names in the data against the WoRMS taxon list.
Usage
match_wormstaxa(names, ask = TRUE)
Arguments
names |
Vector of scientific names. |
ask |
Ask user in case of multiple matches. |
Value
Data frame with scientific name, scientific name ID and match type.
References
Provoost P, Bosch S (2025). obistools: Tools for data enhancement and quality control. Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. R package version 0.1.0, https://iobis.github.io/obistools/.
Check if stations are reported as nominal positions
Description
This function is deprecated and has been replaced by
check_nominal_station().
Usage
nominal_station(data)
Arguments
data |
A data frame containing at least the columns sample_date, station_name, sample_longitude_dd, and sample_latitude_dd (as in the example).
|
Details
This function attempts to determine whether stations in a dataset are reported using nominal positions (i.e., generic or repeated coordinates across events), rather than actual measured coordinates. It compares the number of unique sampling dates with the number of unique station coordinates.
If the number of unique sampling dates is larger than the number of unique station coordinates, the function suspects nominal station positions and issues a warning.
Value
A data frame with distinct station names and their corresponding
latitude/longitude positions, if nominal positions are suspected.
Otherwise, returns NULL.
Examples
df <- data.frame(
sample_date = rep(seq.Date(Sys.Date(), by = "day", length.out = 3), each = 2),
station_name = rep(c("ST1", "ST2"), 3),
sample_longitude_dd = rep(c(15.0, 16.0), 3),
sample_latitude_dd = rep(c(58.5, 58.6), 3)
)
nominal_station(df)
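The heuristic described in Details, comparing the number of unique sampling dates with the number of unique coordinates, can be written out in a few lines of base R. This is a simplified sketch with a hypothetical helper name, not the package's implementation.

```r
# Hypothetical helper implementing the comparison described in Details
suspect_nominal_positions <- function(data) {
  coord_cols <- c("sample_longitude_dd", "sample_latitude_dd")
  n_dates  <- length(unique(data$sample_date))
  n_coords <- nrow(unique(data[, coord_cols]))

  if (n_dates > n_coords) {
    warning("More unique dates than unique coordinates: ",
            "positions may be nominal")
    out <- unique(data[, c("station_name", coord_cols)])
    rownames(out) <- NULL
    return(out)
  }
  NULL
}

demo <- data.frame(
  sample_date = rep(as.Date("2024-06-01") + 0:2, each = 2),
  station_name = rep(c("ST1", "ST2"), 3),
  sample_longitude_dd = rep(c(15.0, 16.0), 3),
  sample_latitude_dd = rep(c(58.5, 58.6), 3)
)

# Three dates but only two coordinate pairs, so positions are flagged
nominal_demo <- suppressWarnings(suspect_nominal_positions(demo))
print(nominal_demo)
```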
Parse scientific names into genus and species components.
Description
This function processes a character vector of scientific names, splitting them into genus and species components. It handles binomial names (e.g., "Homo sapiens"), removes undesired descriptors (e.g., 'Cfr.', 'cf.', 'sp.', 'spp.'), and manages cases involving varieties, subspecies, or invalid species names. Special characters and whitespace are handled appropriately.
Usage
parse_scientific_names(
scientific_names,
remove_undesired_descriptors = TRUE,
remove_subspecies = TRUE,
remove_invalid_species = TRUE,
encoding = "UTF-8",
scientific_name = deprecated()
)
Arguments
scientific_names |
A character vector containing scientific names, which may include binomials, additional descriptors, or varieties. |
remove_undesired_descriptors |
Logical, if TRUE, undesired descriptors (e.g., 'Cfr.', 'cf.', 'colony', 'cells', etc.) are removed. Default is TRUE. |
remove_subspecies |
Logical, if TRUE, subspecies/variety descriptors (e.g., 'var.', 'subsp.', 'f.', etc.) are removed. Default is TRUE. |
remove_invalid_species |
Logical, if TRUE, invalid species names (e.g., 'sp.', 'spp.') are removed. Default is TRUE. |
encoding |
A string specifying the encoding to be used for the input names (e.g., 'UTF-8'). Default is 'UTF-8'. |
scientific_name |
Value
A data frame with two columns:
- genus: Genus names.
- species: Species names (empty if unavailable or invalid). Invalid descriptors such as "sp.", "spp.", and numeric entries are excluded from this column.
See Also
https://www.algaebase.org/ for AlgaeBase website.
Examples
# Example with a vector of scientific names
scientific_names <- c("Skeletonema marinoi", "Cf. Azadinium perforatum", "Gymnodinium sp.",
"Melosira varians", "Aulacoseira islandica var. subarctica")
# Parse names
result <- parse_scientific_names(scientific_names)
# Check the resulting data frame
print(result)
Create an interactive Leaflet map of sampling stations
Description
Generates an interactive map using the leaflet package, plotting sampling
stations from a data frame. The function automatically detects column names
for station, longitude, and latitude, supporting both standard and
delivery-style datasets.
Usage
plot_map_leaflet(data, provider = "CartoDB.Positron")
Arguments
data |
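A data frame containing station coordinates and names. The function accepts either standard column names (station_name, sample_longitude_dd, sample_latitude_dd) or SHARK delivery-style column names (STATN, LONGI, LATIT), as in the examples.
|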
provider |
Character. The tile provider to use for the map background.
See available providers at
https://leaflet-extras.github.io/leaflet-providers/preview/.
Defaults to "CartoDB.Positron". |
Value
An HTML widget object (leaflet map) that can be printed or displayed
in R Markdown or Shiny applications.
Examples
# Example data
df <- data.frame(
station_name = c("Station A", "Station B"),
sample_longitude_dd = c(10.0, 10.5),
sample_latitude_dd = c(59.0, 59.5)
)
# Plot points on map
map <- plot_map_leaflet(df)
# Example data in SHARK delivery format
df_deliv <- data.frame(
STATN = c("Station A", "Station B"),
LONGI = c(10.0, 10.5),
LATIT = c(59.0, 59.5)
)
# Plot points on map
map_deliv <- plot_map_leaflet(df_deliv)
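The automatic column detection mentioned in the Description could work along the following lines. This is a sketch with a hypothetical helper, detect_station_columns(), based only on the two column sets used in the examples; the package may recognise other names as well.

```r
# Hypothetical helper: decide which of the two naming schemes a data frame uses
detect_station_columns <- function(data) {
  standard <- c(station = "station_name",
                lon = "sample_longitude_dd",
                lat = "sample_latitude_dd")
  delivery <- c(station = "STATN", lon = "LONGI", lat = "LATIT")

  if (all(standard %in% names(data))) return(standard)
  if (all(delivery %in% names(data))) return(delivery)
  stop("Could not find station, longitude and latitude columns", call. = FALSE)
}

df_deliv_demo <- data.frame(
  STATN = c("Station A", "Station B"),
  LONGI = c(10.0, 10.5),
  LATIT = c(59.0, 59.5)
)

cols <- detect_station_columns(df_deliv_demo)
print(cols)
```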
Create a Leaflet map.
Description
This function is deprecated and has been replaced by
plot_map_leaflet().
Usage
plot_map_leaflet_deliv(data, provider = "Esri.OceanBasemap")
Arguments
data |
The data frame. |
provider |
Tile provider, see
https://leaflet-extras.github.io/leaflet-providers/preview/. Default is "Esri.OceanBasemap". |
Value
HTML widget object.
Determine if positions are near land
Description
This function is a wrapper/re-export of
iRfcb::ifcb_is_near_land(). The iRfcb package is only required
if you want to actually call this function.
Usage
positions_are_near_land(
latitudes,
longitudes,
distance = 500,
shape = NULL,
source = "obis",
crs = 4326,
remove_small_islands = TRUE,
small_island_threshold = 2e+06,
plot = FALSE
)
Arguments
latitudes |
Numeric vector of latitudes for positions. |
longitudes |
Numeric vector of longitudes for positions. Must be the same length as latitudes. |
distance |
Buffer distance (in meters) from the coastline to consider "near land." Default is 500 meters. |
shape |
Optional path to a shapefile ( |
source |
Character string indicating which default coastline source to use when |
crs |
Coordinate reference system (CRS) to use for input and output. Default is EPSG code 4326 (WGS84). |
remove_small_islands |
Logical indicating whether to remove small islands from
the coastline. Useful in archipelagos. Default is TRUE. |
small_island_threshold |
Area threshold in square meters below which islands
will be considered small and removed, if remove_small_islands is set to TRUE. Default is 2e+06. |
plot |
A boolean indicating whether to plot the points, land polygon and buffer. Default is FALSE. |
Details
Determines whether given positions are near land based on a land polygon shape file.
This function calculates a buffered area around the coastline using a polygon shapefile and determines if each input position intersects with this buffer or the landmass itself. By default, it uses the OBIS land vector dataset.
The EEA shapefile is downloaded from https://www.eea.europa.eu/data-and-maps/data/eea-coastline-for-analysis-2/gis-data/eea-coastline-polygon
when source = "eea".
Value
If plot = FALSE (default), a logical vector is returned indicating whether each position
is near land or not, with NA for positions where coordinates are missing.
If plot = TRUE, a ggplot object is returned showing the land polygon, buffer area,
and position points colored by their proximity to land.
See Also
clean_shark4r_cache() to manually clear cached shape files.
iRfcb::ifcb_is_near_land for the original function.
Examples
# Define coordinates
latitudes <- c(62.500353, 58.964498, 57.638725, 56.575338)
longitudes <- c(17.845993, 20.394418, 18.284523, 16.227174)
# Call the function
near_land <- positions_are_near_land(latitudes, longitudes, distance = 300, crs = 4326)
# Print the result
print(near_land)
Read a Plankton Toolbox export file
Description
This function reads a sample file exported as an Excel (.xlsx) file from Plankton Toolbox and extracts data from a specified sheet. The default sheet is "sample_data.txt", which contains count data.
Usage
read_ptbx(
file_path,
sheet = c("sample_data.txt", "sample_info.txt", "counting_method.txt",
"Sample summary", "README")
)
Arguments
file_path |
Character. Path to the Excel file. |
sheet |
Character. The name of the sheet to read. Must be one of: "sample_data.txt", "Sample summary", "sample_info.txt", "counting_method.txt", or "README". Default is "sample_data.txt". |
Value
A tibble containing the contents of the selected sheet.
See Also
https://nordicmicroalgae.org/plankton-toolbox/ for downloading Plankton Toolbox.
https://github.com/planktontoolbox/plankton-toolbox/ for Plankton Toolbox source code.
Examples
# Read the default data sheet
sample_data <- read_ptbx(system.file("extdata/Anholt_E_2024-09-15_0-10m.xlsx",
package = "SHARK4R"))
# Print output
sample_data
# Read a specific sheet
sample_info <- read_ptbx(system.file("extdata/Anholt_E_2024-09-15_0-10m.xlsx",
package = "SHARK4R"),
sheet = "sample_info.txt")
# Print output
sample_info
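The Usage section lists every allowed sheet name as the default value of sheet. In R this is the usual signature for the match.arg() idiom, where the first element acts as the default and other values are rejected. Whether read_ptbx() uses match.arg() internally is an assumption; pick_sheet() below is a hypothetical stand-in that only demonstrates the idiom.

```r
# Hypothetical function mimicking the signature style; not part of SHARK4R.
pick_sheet <- function(sheet = c("sample_data.txt", "sample_info.txt",
                                 "counting_method.txt", "Sample summary", "README")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the allowed set
  match.arg(sheet)
}

default_sheet <- pick_sheet()
chosen_sheet <- pick_sheet("README")
bad_sheet <- tryCatch(pick_sheet("unknown"), error = function(e) NA_character_)
print(c(default_sheet, chosen_sheet, bad_sheet))
```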
Read SHARK export files (tab- or semicolon-delimited, plain text or zipped)
Description
Reads tab- or semicolon-delimited SHARK export files with standardized format.
The function can handle plain text files (.txt) or zip archives (.zip) containing
a file named shark_data.txt. It automatically detects and converts column types
and can optionally coerce the "value" column to numeric. The "sample_date" column
is converted to Date if it exists.
Usage
read_shark(
filename,
delimiters = "point-tab",
encoding = "utf_8",
guess_encoding = TRUE,
value_numeric = TRUE
)
Arguments
filename |
Path to the SHARK export file. Can be a .txt or .zip file. |
delimiters |
Character. Specifies the delimiter used in the file. Options:
|
encoding |
Character. File encoding. Options: |
guess_encoding |
Logical. If TRUE (default), the file encoding is detected automatically and used if it differs from encoding. |
value_numeric |
Logical. If TRUE (default), the "value" column is coerced to numeric. |
Details
This function is robust to file encoding issues. By default (guess_encoding = TRUE),
it attempts to automatically detect the file encoding and will use it if it differs
from the user-specified encoding. Automatic detection can be disabled.
Value
A data frame containing the parsed contents of the SHARK export file,
or NULL if the file is empty or could not be read.
See Also
read_shark_deliv() for reading SHARK Excel delivery files (.xls/.xlsx).
Examples
## Not run:
# Read a plain text SHARK export
df_txt <- read_shark("sharkweb_data.txt")
# Read a SHARK export from a zip archive
df_zip <- read_shark("shark_data.zip")
# Read with explicit encoding and do not convert value
df_custom <- read_shark("shark_data.txt",
encoding = "latin_1",
guess_encoding = FALSE,
value_numeric = FALSE)
## End(Not run)
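The type conversions described above (sample_date to Date, value to numeric) can be reproduced on a small file with base R alone. The sketch below is not how read_shark() is implemented; it assumes that "point-tab" denotes a decimal point with tab separators, and the file content is made up.

```r
# Write a tiny tab-delimited file that imitates a SHARK export (made-up values)
tmp <- tempfile(fileext = ".txt")
writeLines(c("station_name\tsample_date\tvalue",
             "Station A\t2024-09-15\t1.5",
             "Station B\t2024-09-16\t2.25"), tmp)

# Read everything as character first, then convert selected columns
raw <- read.delim(tmp, sep = "\t", dec = ".", colClasses = "character",
                  fileEncoding = "UTF-8", stringsAsFactors = FALSE)
if ("sample_date" %in% names(raw)) raw$sample_date <- as.Date(raw$sample_date)
if ("value" %in% names(raw)) raw$value <- suppressWarnings(as.numeric(raw$value))

str(raw)
unlink(tmp)
```

Reading as character first keeps non-numeric entries visible until the conversion step, where they become NA instead of causing a read failure.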
Read SHARK Excel delivery files (.xls or .xlsx)
Description
Reads Excel files delivered to SHARK in a standardized format.
The function automatically detects whether the file is .xls or .xlsx
and reads the specified sheet, skipping a configurable number of rows.
Column types are automatically converted, and if a column "SDATE" exists,
it is converted to Date.
Usage
read_shark_deliv(filename, skip = 2, sheet = 2)
Arguments
filename |
Path to the Excel file to be read. |
skip |
Minimum number of rows to skip before reading anything (column names or data).
Leading empty rows are automatically skipped, so this is a lower bound.
Ignored if |
sheet |
Sheet to read. Either a string (sheet name) or integer (sheet index). If neither is specified, defaults to the second sheet. |
Value
A data frame containing the parsed contents of the Excel file, or NULL if the file
does not exist, is empty, or cannot be read.
See Also
read_shark() for reading SHARK tab- or semicolon-delimited export files or zip-archives.
Examples
## Not run:
# Read the second sheet of a .xlsx file (default)
df_xlsx <- read_shark_deliv("shark_delivery.xlsx")
# Read the first sheet of a .xls file, skipping 3 rows
df_xls <- read_shark_deliv("shark_delivery.xls", skip = 3, sheet = 1)
## End(Not run)
Launch the SHARK4R Bio-QC Tool
Description
This function launches the interactive Shiny application for performing quality control (QC) on SHARK data. The application provides a graphical interface for exploring and validating data before or after submission to SHARK.
Usage
run_qc_app(interactive = TRUE)
Arguments
interactive |
Logical. Whether the session is interactive. Default is TRUE. |
Details
The function checks that all required packages for the app are installed before launching. If any are missing, the user is notified. In interactive sessions, the function will prompt whether the missing packages should be installed automatically. In non-interactive sessions (e.g. scripts or CI), the function instead raises an error and lists the missing packages so they can be installed manually.
Value
This function is called for its side effect of launching a Shiny application. It does not return a value.
Examples
# Launch the SHARK4R Bio-QC Tool
if(interactive()){
run_qc_app()
}
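The dependency check described in Details can be approximated with requireNamespace(). missing_packages() is a hypothetical helper and not part of SHARK4R; the package names are placeholders, and the sketch prints a message where the real function is documented to raise an error.

```r
# Hypothetical helper, for illustration only; not part of SHARK4R.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE, without error, for packages that are not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" ships with R; the second name is deliberately nonexistent
missing <- missing_packages(c("stats", "notARealPackage12345"))

if (length(missing) > 0 && !interactive()) {
  message("Missing packages: ", paste(missing, collapse = ", "))
}
```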
Scatterplot with optional horizontal threshold lines
Description
This function creates a scatterplot from a data frame, optionally coloring points
by a grouping column and adding horizontal threshold lines. Supports both static
ggplot2 plots and interactive plotly plots with a linear/log toggle.
Usage
scatterplot(
data,
x = c("station_name", "sample_date"),
parameter = NULL,
hline = NULL,
hline_group_col = NULL,
hline_value_col = NULL,
hline_style = list(linetype = "dashed", size = 0.8),
max_hlines = 5,
interactive = TRUE,
verbose = TRUE
)
Arguments
data |
A data.frame or tibble containing at least the following columns:
|
x |
Character. The column to use for the x-axis. Either "station_name" or "sample_date". |
parameter |
Optional character. If provided, only data for this parameter will be plotted.
If |
hline |
Numeric or data.frame. Horizontal line(s) to add. If numeric, a single line
is drawn at that y-value. If a data.frame, must contain |
hline_group_col |
Character. Column used for grouping when |
hline_value_col |
Character. Column in |
hline_style |
List. Appearance settings for horizontal lines. Should contain |
max_hlines |
Integer. Maximum number of horizontal line groups to display per parameter when |
interactive |
Logical. If TRUE, returns an interactive plotly plot; if FALSE, a static ggplot2 plot. Defaults to TRUE. |
verbose |
Logical. If TRUE, messages will be displayed during execution. Defaults to TRUE. |
Details
- If hline is numeric, a single horizontal line is drawn across the plot.
- If hline is a data.frame, only the first max_hlines groups (sorted alphabetically) are displayed.
- Points can be colored by hline_group_col if provided.
- Interactive plots include buttons to switch between linear and log y-axis scales.
Value
A ggplot object (if interactive = FALSE) or a plotly object (if interactive = TRUE).
See Also
load_shark4r_stats for loading threshold or summary statistics that
can be used to define horizontal lines in the plot.
Examples
## Not run:
scatterplot(
data = my_data,
x = "station_name",
parameter = "Chlorophyll-a",
hline = c(10, 20)
)
scatterplot(
data = my_data,
x = "sample_date",
parameter = "Bacterial abundance",
hline = thresholds_df,
hline_group_col = "location_sea_basin",
hline_value_col = "P99"
)
## End(Not run)
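The max_hlines rule described in Details (only the first groups in alphabetical order are shown) amounts to a simple filter on the threshold table. The sketch below uses a made-up thresholds_df with the column names from the second example; it illustrates the selection rule only and is not taken from the scatterplot() source.

```r
# Made-up threshold table; column names follow the example above
thresholds_df <- data.frame(
  location_sea_basin = c("Skagerrak", "Kattegat", "Bothnian Bay", "Arkona Basin"),
  P99 = c(12.0, 9.5, 4.2, 7.8)
)
max_hlines <- 2

# Keep only the first max_hlines groups in alphabetical order
groups <- sort(unique(thresholds_df$location_sea_basin))
keep <- head(groups, max_hlines)
shown <- thresholds_df[thresholds_df$location_sea_basin %in% keep, ]
print(shown)
```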
Read tab delimited files downloaded from SHARK
Description
This function is deprecated and has been replaced by
read_shark().
Usage
shark_read(filename, delimiters = "point-tab", encoding = "latin_1")
Arguments
filename |
Path to file to be read. |
delimiters |
Character. Specifies the delimiter used to separate values in |
encoding |
Character. Specifies the text encoding of |
Details
Uses read_delim() to read tab-delimited or semicolon-delimited files
with standardized export format from SHARK.
This function is robust to encoding issues:
it accepts a user-specified encoding (cp1252, utf_8, utf_16, or latin_1)
but also attempts to automatically detect the file encoding.
If the detected encoding differs from the specified one,
the detected encoding will be used instead.
This helps in cases where the file encoding has been wrongly specified,
mislabeled, or varies between SHARK exports.
Value
A data frame containing the parsed contents of the SHARK export file.
Read .xlsx files delivered to SHARK
Description
This function is deprecated and has been replaced by
read_shark_deliv().
Uses readxl to read Excel files in the standardized delivery format.
Usage
shark_read_deliv(filename, skip = 2, sheet = 2)
Arguments
filename |
path to file to be read |
skip |
Minimum number of rows to skip before reading anything, be it column names or data. Leading empty rows are automatically skipped, so this is a lower bound. Ignored if range is given. Default is 2. |
sheet |
Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the second sheet. |
Value
Data frame of file
Read .xls files delivered to SHARK
Description
This function is deprecated and has been replaced by read_shark_deliv().
Uses readxl to read Excel files in the standardized delivery format.
Usage
shark_read_deliv_xls(filename, skip = 2, sheet = 2)
Arguments
filename |
path to file to be read |
skip |
Minimum number of rows to skip before reading anything, be it column names or data. Leading empty rows are automatically skipped, so this is a lower bound. Ignored if range is given. Default is 2. |
sheet |
Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the second sheet. |
Value
Data frame of file
Read zip archive and unzip tab delimited files downloaded from SHARK
Description
This function is deprecated and has been replaced by
read_shark().
Usage
shark_read_zip(zipname, delimiters = "point-tab", encoding = "latin_1")
Arguments
zipname |
Path to the zip archive containing SHARK data (expects a file named shark_data.txt). |
delimiters |
Character. Specifies the delimiter used to separate values in the file.
Options are |
encoding |
Character. Specifies the text encoding of the file.
Options are |
Details
Uses unz() and read_delim() to extract and read tab-delimited or
semicolon-delimited files with standardized export format from SHARK.
Like shark_read(), this function is tolerant to encoding issues.
It allows a user-specified encoding (cp1252, utf_8, utf_16, or latin_1),
but also automatically detects the encoding from the file content.
If the detected encoding does not match the specified one,
the detected encoding is preferred.
This ensures files with wrongly labeled or inconsistent encodings are still read correctly.
Value
A data frame containing the parsed contents of the SHARK export file.
Translate SHARK4R datatype names
Description
Converts user-facing datatype names (e.g., "Grey seal") to internal SHARK4R names
(e.g., "GreySeal") based on SHARK4R:::.type_lookup. See available user-facing
datatypes in get_shark_options()$dataTypes.
Usage
translate_shark_datatype(x)
Arguments
x |
Character vector of datatype names to translate |
Value
Character vector of translated datatype names
Examples
# Example strings
datatypes <- c("Grey seal", "Primary production", "Physical and Chemical")
# Basic translation
translate_shark_datatype(datatypes)
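A name translation of this kind is commonly done with a named character vector. The sketch below is hypothetical: type_lookup contains only the single pair mentioned in the Description, not the contents of SHARK4R:::.type_lookup, and passing unknown names through unchanged is an assumption.

```r
# Hypothetical lookup with a single entry taken from the description above
type_lookup <- c("Grey seal" = "GreySeal")

translate_sketch <- function(x) {
  out <- unname(type_lookup[x])
  # Assumption: names without a lookup entry are passed through unchanged
  out[is.na(out)] <- x[is.na(out)]
  out
}

translated <- translate_sketch(c("Grey seal", "Chlorophyll"))
print(translated)
```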
Update SHARK taxonomy records using Dyntaxa
Description
This function updates Dyntaxa taxonomy records based on a list of Dyntaxa taxon IDs. It collects parent IDs from the SLU Artdatabanken API (Dyntaxa), retrieves full taxonomy records, and organizes the data into a full taxonomic table that can be joined with data downloaded from SHARK.
Usage
update_dyntaxa_taxonomy(
dyntaxa_ids,
subscription_key = Sys.getenv("DYNTAXA_KEY"),
add_missing_taxa = FALSE,
verbose = TRUE
)
Arguments
dyntaxa_ids |
A vector of Dyntaxa taxon IDs to update. |
subscription_key |
A Dyntaxa API subscription key. By default, the key
is read from the environment variable DYNTAXA_KEY. You can provide the key in three ways:
|
add_missing_taxa |
Logical. If TRUE, the function will attempt to fetch missing taxa (i.e., taxon_ids not found in the initial Dyntaxa DwC-A query). Default is FALSE. |
verbose |
Logical. Print progress messages. Default is TRUE. |
Details
A valid Dyntaxa API subscription key is required. You can request a free key for the "Taxonomy" service from the ArtDatabanken API portal: https://api-portal.artdatabanken.se/
Note: Please review the API conditions
and register for access before using the API. Data collected through the API
is stored at SLU Artdatabanken. Please also note that the authors of SHARK4R are not affiliated with SLU Artdatabanken.
Value
A data frame representing the updated Dyntaxa taxonomy table.
See Also
get_shark_data, update_worms_taxonomy, SLU Artdatabanken API Documentation
Examples
## Not run:
# Update Dyntaxa taxonomy for taxon IDs 238366 and 1010380
updated_taxonomy <- update_dyntaxa_taxonomy(c(238366, 1010380), "your_subscription_key")
print(updated_taxonomy)
## End(Not run)
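Because the default for subscription_key is Sys.getenv("DYNTAXA_KEY"), setting that environment variable once per session avoids passing the key in every call. The value below is a placeholder, and the final guard is an optional check added for illustration.

```r
# Placeholder value; replace with the key issued by the API portal
Sys.setenv(DYNTAXA_KEY = "your_subscription_key")

# This is the expression used as the default for subscription_key
key <- Sys.getenv("DYNTAXA_KEY")

# Guard against an unset variable, which Sys.getenv() reports as ""
if (!nzchar(key)) stop("DYNTAXA_KEY is not set")
```

With the variable set, update_dyntaxa_taxonomy(c(238366, 1010380)) would pick up the key without a second argument.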
Retrieve and organize WoRMS taxonomy for SHARK Aphia IDs
Description
This function is deprecated and has been replaced by a function with a more accurate name. Use add_worms_taxonomy() instead.
This function collects WoRMS (World Register of Marine Species) taxonomy information for a given set of Aphia IDs. The data is organized into a full taxonomic table that can be joined with data downloaded from SHARK.
Usage
update_worms_taxonomy(aphia_id, aphiaid = deprecated())
Arguments
aphia_id |
A numeric vector containing Aphia IDs for which WoRMS taxonomy needs to be updated. |
aphiaid |
Deprecated. Use aphia_id instead. |
Value
A tibble containing updated WoRMS taxonomy information.
See Also
https://marinespecies.org/ for WoRMS website.
get_shark_data, update_dyntaxa_taxonomy, WoRMS API Documentation, https://CRAN.R-project.org/package=worrms
Examples
# Update WoRMS taxonomy for a set of Aphia IDs
updated_taxonomy <- update_worms_taxonomy(c(149619, 149122, 11))
print(updated_taxonomy)
Determine if points are in a specified sea basin
Description
This function is a wrapper/re-export of
iRfcb::ifcb_which_basin(). The iRfcb package is only required
if you want to actually call this function.
Usage
which_basin(latitudes, longitudes, plot = FALSE, shape_file = NULL)
Arguments
latitudes |
A numeric vector of latitude points. |
longitudes |
A numeric vector of longitude points. |
plot |
A boolean indicating whether to plot the points along with the sea basins. Default is FALSE. |
shape_file |
The absolute path to a custom polygon shapefile in WGS84 (EPSG:4326) that represents the sea basin.
Defaults to the Baltic Sea, Kattegat, and Skagerrak basins included in the iRfcb package. |
Details
This function identifies which sub-basin each latitude and longitude point belongs to, using a user-specified or default shapefile.
By default, it reads a pre-packaged shapefile of the Baltic Sea, Kattegat, and Skagerrak basins from the iRfcb package; a user-supplied shapefile is used instead if provided. The shapefiles originate from SHARK (https://shark.smhi.se/en/).
The function sets the CRS, transforms it to WGS84 (EPSG:4326) if necessary, and checks whether the given points fall within the specified sea basin. Optionally, it plots the points and the sea basin polygons together.
Value
A vector indicating the basin each point belongs to, or a ggplot object if plot = TRUE.
See Also
iRfcb::ifcb_which_basin for the original function.
Examples
# Define example latitude and longitude vectors
latitudes <- c(55.337, 54.729, 56.311, 57.975)
longitudes <- c(12.674, 14.643, 12.237, 10.637)
# Check which Baltic Sea basin each point is in
points_in_the_baltic <- which_basin(latitudes, longitudes)
print(points_in_the_baltic)
# Plot the points and the basins
map <- which_basin(latitudes, longitudes, plot = TRUE)
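The underlying question, whether a point falls inside a basin polygon, can be illustrated in base R with a ray-casting test. This is a simplified sketch and not how iRfcb::ifcb_which_basin() works (that function operates on shapefiles); point_in_polygon() is a hypothetical helper, and the rectangle is a made-up area, not a real sea basin.

```r
# Hypothetical helper: ray-casting point-in-polygon test (not part of SHARK4R or iRfcb)
point_in_polygon <- function(x, y, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the horizontal line through y?
    crosses <- (poly_y[i] > y) != (poly_y[j] > y)
    if (crosses) {
      x_cross <- poly_x[i] + (y - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      # Each crossing to the right of the point flips the inside/outside state
      if (x < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Made-up rectangular area (longitude 10 to 13, latitude 55 to 58)
box_lon <- c(10, 13, 13, 10)
box_lat <- c(55, 55, 58, 58)

first_inside <- point_in_polygon(12.674, 55.337, box_lon, box_lat)
second_inside <- point_in_polygon(14.643, 54.729, box_lon, box_lat)
print(c(first_inside, second_inside))
```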