The SHARK4R package provides a set of functions for performing quality control (QC) on SHARK data. These functions help identify missing or invalid values, spatial errors, and statistical outliers, and are mainly intended for internal data validation. The tutorial covers data retrieval, validation of required fields and codes, geospatial and depth checks, outlier detection, parameter rules, and station matching.
This workflow helps ensure that SHARK data are consistent, valid, and ready for analysis. Several quality control components, originally developed by Provoost and Bosch (2018), have been adapted for compatibility with the SHARK format.
You can install the latest version of SHARK4R from CRAN using:
Load the package along with dplyr:
You can fetch SHARK data using the same filtering options as the SHARK web interface. Explore available options with:
Filter datasets containing “Chlorophyll”:
# Keep only dataset names that contain "Chlorophyll" (grepl returns a logical vector)
chlorophyll_datasets <- shark_options$datasets[grepl("Chlorophyll",
shark_options$datasets)]
# Select the first dataset for demonstration
selected_dataset <- chlorophyll_datasets[1]
# Print the name of the selected dataset
print(selected_dataset)
## [1] "SHARK_Chlorophyll_1985_1989_SMHI_version_2023-04-27.zip"
Download the selected dataset as a data frame:
chlorophyll_data <- get_shark_datasets(selected_dataset,
save_dir = tempdir(),
return_df = TRUE,
verbose = FALSE)
tibble(chlorophyll_data)
## # A tibble: 80 × 73
## source delivery_datatype check_status_sv data_checked_by_sv visit_year
## <dbl> <chr> <chr> <chr> <dbl>
## 1 1 Chlorophyll Klar Leverantör 1989
## 2 1 Chlorophyll Klar Leverantör 1989
## 3 1 Chlorophyll Klar Leverantör 1985
## 4 1 Chlorophyll Klar Leverantör 1989
## 5 1 Chlorophyll Klar Leverantör 1989
## 6 1 Chlorophyll Klar Leverantör 1989
## 7 1 Chlorophyll Klar Leverantör 1989
## 8 1 Chlorophyll Klar Leverantör 1989
## 9 1 Chlorophyll Klar Leverantör 1989
## 10 1 Chlorophyll Klar Leverantör 1989
## # ℹ 70 more rows
## # ℹ 68 more variables: visit_month <dbl>, station_name <chr>,
## # reported_station_name <chr>, sample_location_id <dbl>, station_id <dbl>,
## # sample_project_name_en <chr>, sample_orderer_name_en <chr>,
## # platform_code <chr>, visit_id <dbl>, expedition_id <lgl>,
## # shark_sample_id_md5 <chr>, sample_date <date>, sample_time <time>,
## # sample_enddate <lgl>, sample_endtime <lgl>, sample_latitude_dm <chr>, …
SHARK data can be downloaded and saved locally using the save_dir argument, then imported into R with read_shark(), which handles both ZIP archives and text files.
Validate mandatory fields with check_datatype() and check_fields(); check_fields() also accepts optional field_definitions:
## # A tibble: 1,195 × 4
## level field row message
## <chr> <chr> <int> <chr>
## 1 error sample_project_name_sv NA Required field sample_project_name…
## 2 error sample_orderer_name_sv NA Required field sample_orderer_name…
## 3 error sampling_laboratory_name_sv NA Required field sampling_laboratory…
## 4 error analytical_laboratory_name_sv NA Required field analytical_laborato…
## 5 error reporting_institute_name_sv NA Required field reporting_institute…
## 6 error sample_enddate 1 Empty value for required field sam…
## 7 error sample_enddate 2 Empty value for required field sam…
## 8 error sample_enddate 3 Empty value for required field sam…
## 9 error sample_enddate 4 Empty value for required field sam…
## 10 error sample_enddate 5 Empty value for required field sam…
## # ℹ 1,185 more rows
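The table above mixes two kinds of problems: required columns that are missing entirely (row is NA) and required columns that are present but contain empty values (one row per empty cell). The base-R sketch below reproduces that logic to make the output easier to interpret. The helper check_required_sketch() and the toy data are hypothetical; this is not SHARK4R's implementation of check_fields().

```r
# Hypothetical sketch of a mandatory-field check (not SHARK4R's check_fields()).
check_required_sketch <- function(data, required) {
  out <- list()
  # Required columns that are absent: one error per column, row = NA
  for (f in setdiff(required, names(data))) {
    out[[length(out) + 1]] <- data.frame(
      level = "error", field = f, row = NA_integer_,
      message = paste("Required field", f, "is missing"))
  }
  # Required columns that are present but hold NA or blank strings
  for (f in intersect(required, names(data))) {
    v <- data[[f]]
    empty <- which(is.na(v) | trimws(as.character(v)) == "")
    if (length(empty) > 0) {
      out[[length(out) + 1]] <- data.frame(
        level = "error", field = f, row = empty,
        message = paste("Empty value for required field", f))
    }
  }
  if (length(out) == 0) {
    return(data.frame(level = character(), field = character(),
                      row = integer(), message = character()))
  }
  do.call(rbind, out)
}

# Toy data: sample_enddate is empty in every row, and a project column is absent
toy <- data.frame(visit_year = c(1989, 1985),
                  sample_enddate = c(NA, ""))
issues <- check_required_sketch(toy, c("visit_year", "sample_enddate",
                                       "sample_project_name_sv"))
print(issues)
```

As in the SHARK4R output, a single empty column produces one row per record, which is why the real table grows to over a thousand rows for an 80-row dataset.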
Ensure metadata codes follow SHARK conventions:
## All PROJ codes found
## # A tibble: 1 × 2
## reported_code match_type
## <chr> <lgl>
## 1 National marine monitoring TRUE
# Validate ship/platform codes
check_codes(data = chlorophyll_data,
field = "platform_code",
code_type = "SHIPC",
match_column = "Code")
## All SHIPC codes found
## # A tibble: 2 × 2
## reported_code match_type
## <chr> <lgl>
## 1 77AR TRUE
## 2 77SN TRUE
## [1] 0
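Conceptually, code validation is a set-membership test: each unique reported code is looked up in a reference code list, giving the reported_code/match_type table shown above. The sketch below assumes a plain character vector as the reference list; the helper name and the code "77XX" are illustrative only and are not part of SHARK4R or the SHIPC list.

```r
# Hypothetical sketch of code validation: compare reported codes with a reference list.
match_codes_sketch <- function(reported, reference) {
  codes <- unique(reported[!is.na(reported)])
  data.frame(reported_code = codes,
             match_type = codes %in% reference)
}

# Toy reference list standing in for the SHIPC code list (illustrative values only)
ship_codes <- c("77AR", "77SN")
res <- match_codes_sketch(c("77AR", "77SN", "77AR", "77XX"), ship_codes)
print(res)
```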
Optional geospatial QC functions:
positions_are_near_land()which_basin() from the iRfcb packageVerify plausibility and consistency of depth values:
## # A tibble: 2 × 4
## level row field message
## <chr> <int> <chr> <chr>
## 1 warning 17 sample_max_depth_m Depth value (20) is greater than the value f…
## 2 warning 80 sample_max_depth_m Depth value (20) is greater than the value f…
## # A tibble: 57 × 4
## level row field message
## <chr> <int> <chr> <chr>
## 1 warning 2 water_depth_m Depth value (78) is greater than the value found…
## 2 warning 3 water_depth_m Depth value (78) is greater than the value found…
## 3 warning 4 water_depth_m Depth value (637) is greater than the value foun…
## 4 warning 6 water_depth_m Depth value (86) is greater than the value found…
## 5 warning 8 water_depth_m Depth value (25) is greater than the value found…
## 6 warning 9 water_depth_m Depth value (41) is greater than the value found…
## 7 warning 10 water_depth_m Depth value (90) is greater than the value found…
## 8 warning 11 water_depth_m Depth value (74) is greater than the value found…
## 9 warning 13 water_depth_m Depth value (465) is greater than the value foun…
## 10 warning 14 water_depth_m Depth value (465) is greater than the value foun…
## # ℹ 47 more rows
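The warnings above compare reported depths with a reference depth for each position. A minimal sketch of that idea follows, assuming the reference (bathymetry) depth has already been looked up for every row and is supplied as a numeric vector; it also adds a simple internal-consistency check between minimum and maximum sampling depth. The helper name and the toy values are hypothetical, and SHARK4R's check_depth() may apply different rules or tolerances.

```r
# Hypothetical sketch of depth plausibility checks (not SHARK4R's check_depth()).
# `bathy_depth` is assumed to hold one reference depth (m) per row.
depth_checks_sketch <- function(data, bathy_depth,
                                fields = c("sample_max_depth_m", "water_depth_m")) {
  out <- list()
  for (f in intersect(fields, names(data))) {
    bad <- which(data[[f]] > bathy_depth)  # NA comparisons are ignored by which()
    if (length(bad) > 0) {
      out[[length(out) + 1]] <- data.frame(
        level = "warning", row = bad, field = f,
        message = sprintf("Depth value (%g) is greater than the reference depth (%g)",
                          data[[f]][bad], bathy_depth[bad]))
    }
  }
  # Internal consistency: minimum sampling depth should not exceed maximum depth
  if (all(c("sample_min_depth_m", "sample_max_depth_m") %in% names(data))) {
    bad <- which(data$sample_min_depth_m > data$sample_max_depth_m)
    if (length(bad) > 0) {
      out[[length(out) + 1]] <- data.frame(
        level = "error", row = bad, field = "sample_min_depth_m",
        message = "Minimum depth is greater than maximum depth")
    }
  }
  if (length(out) == 0) return(NULL)  # nothing flagged
  do.call(rbind, out)
}

toy <- data.frame(sample_min_depth_m = c(0, 10, 0),
                  sample_max_depth_m = c(20, 5, 10),
                  water_depth_m = c(78, 30, 12))
res <- depth_checks_sketch(toy, bathy_depth = c(15, 40, 50))
print(res)
```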
Checks performed:
Retrieve reference statistics for your datatype:
shark_statistics <- get_shark_statistics(datatype = "Chlorophyll",
fromYear = 2020,
toYear = 2024,
verbose = FALSE)
tibble(shark_statistics)
## # A tibble: 1 × 24
## parameter datatype fromYear toYear n min Q1 median Q3 max P01
## <chr> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Chlorophy… Chlorop… 2020 2024 1374 0.19 1.39 2.2 3.3 22.2 0.307
## # ℹ 13 more variables: P05 <dbl>, P95 <dbl>, P99 <dbl>, IQR <dbl>, mean <dbl>,
## # sd <dbl>, var <dbl>, cv <dbl>, mad <dbl>, mild_lower <dbl>,
## # mild_upper <dbl>, extreme_lower <dbl>, extreme_upper <dbl>
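To see what columns such as Q1, P99, IQR, and the mild/extreme bounds represent, the sketch below computes a subset of them from a numeric vector. It assumes R's default quantile definition (type 7) and conventional Tukey fences (1.5 × IQR for "mild", 3 × IQR for "extreme"); these are assumptions for illustration, and get_shark_statistics() may compute its values differently.

```r
# Hypothetical sketch of reference statistics; quantile type and fence
# multipliers are assumptions, not taken from get_shark_statistics().
reference_stats_sketch <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99), names = FALSE)
  iqr <- q[5] - q[3]
  data.frame(
    n = length(x), min = min(x), Q1 = q[3], median = q[4], Q3 = q[5],
    max = max(x), P01 = q[1], P05 = q[2], P95 = q[6], P99 = q[7],
    IQR = iqr, mean = mean(x), sd = sd(x), mad = mad(x),
    # Tukey fences: 1.5 x IQR ("mild") and 3 x IQR ("extreme")
    mild_lower = q[3] - 1.5 * iqr, mild_upper = q[5] + 1.5 * iqr,
    extreme_lower = q[3] - 3 * iqr, extreme_upper = q[5] + 3 * iqr)
}

ref_stats <- reference_stats_sketch(c(1:9, 100))
print(ref_stats)
```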
Detect extreme values using thresholds (e.g., the 99th percentile):
check_outliers(data = chlorophyll_data,
parameter = "Chlorophyll-a",
datatype = "Chlorophyll",
threshold_col = "P99",
thresholds = shark_statistics)
## Chlorophyll-a is within the P99 range.
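Threshold-based outlier detection reduces to comparing each value with a single reference value taken from the statistics table (here, a percentile column). The sketch below shows that comparison in base R; the helper name, the toy values, and the threshold of 10 are illustrative assumptions, not output from SHARK4R.

```r
# Hypothetical sketch of threshold-based outlier flagging (not SHARK4R's check_outliers()).
flag_above_threshold <- function(values, threshold) {
  idx <- which(values > threshold)
  # rep() keeps column lengths equal when nothing is flagged
  data.frame(row = idx, value = values[idx],
             threshold = rep(threshold, length(idx)))
}

values <- c(1.2, 3.4, 25.0, 0.8, 2.1)  # illustrative measurements
threshold <- 10                        # illustrative threshold, e.g. a P99 value
flags <- flag_above_threshold(values, threshold)
if (nrow(flags) == 0) message("All values within threshold") else print(flags)
```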
Visualize anomalies:
# Scatterplot with horizontal line at 99th percentile
scatterplot(chlorophyll_data,
hline = shark_statistics$P99)

Use check_parameter_rules() to flag measurements that violate parameter-specific or row-wise logical rules.
## No parameters from the logical rules are present in the dataset. Available parameters are: Total cover of all species, Cover, Cover class, Sediment deposition cover, Abundance class, Wet weight
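The idea behind parameter-specific rules can be sketched as a named list of predicate functions, one per parameter, applied only to the rows that report that parameter. The list structure, the helper apply_param_rules(), and the example limits below are hypothetical illustrations; they are not the format expected by check_parameter_rules().

```r
# Hypothetical sketch of rule-based checks; the list format is an assumption.
param_rules <- list(
  "Cover" = function(v) v >= 0 & v <= 100,  # illustrative: percent cover in 0-100
  "Wet weight" = function(v) v >= 0         # illustrative: weights cannot be negative
)

apply_param_rules <- function(data, rules) {
  out <- list()
  for (p in intersect(names(rules), unique(data$parameter))) {
    rows <- which(data$parameter == p)
    ok <- rules[[p]](data$value[rows])
    bad <- rows[!is.na(ok) & !ok]  # rows that violate the rule
    if (length(bad) > 0) {
      out[[length(out) + 1]] <- data.frame(row = bad, parameter = p,
                                           value = data$value[bad])
    }
  }
  if (length(out) == 0) {
    return(data.frame(row = integer(), parameter = character(), value = numeric()))
  }
  do.call(rbind, out)
}

toy <- data.frame(parameter = c("Cover", "Cover", "Wet weight", "Chlorophyll-a"),
                  value = c(50, 120, -1, 3))
violations <- apply_param_rules(toy, param_rules)
print(violations)
```

Parameters without a rule (here Chlorophyll-a) are skipped, which mirrors the message above when none of the rule parameters occur in the dataset.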
return_df = TRUE gives a data frame of violations.
return_logical = TRUE gives logical vectors for each parameter.
Custom rules can be supplied through param_conditions or rowwise_conditions lists.

Verify station names against the official SHARK registry:
## Using station.txt from NODC_CONFIG: /home/anders/nodc_config//nodc_station/station.txt
## All stations found
## reported_station_name match_type
## 1 425 GNIBEN TRUE
## 2 FLADEN TRUE
## 3 FLADEN TRUE
## 4 M6 TRUE
## 5 ANHOLT E TRUE
## 6 W SKAGEN TRUE
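Station matching is essentially a name lookup against the register. The sketch below assumes a light normalisation (trimming whitespace and converting to upper case) before the lookup; the helper name, the normalisation, and the register subset are assumptions for illustration, and match_station() may also consider synonyms or other rules.

```r
# Hypothetical sketch of station-name matching against a register.
match_station_sketch <- function(reported, register) {
  norm <- function(x) toupper(trimws(x))  # assumed normalisation
  data.frame(reported_station_name = reported,
             match_type = norm(reported) %in% norm(register))
}

register <- c("FLADEN", "ANHOLT E", "W SKAGEN", "M6")  # illustrative subset
res <- match_station_sketch(c("FLADEN", " anholt e", "FLADEN X"), register)
print(res)
```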
To plot stations and their distances from the station register in an interactive map:
## Using station.txt from NODC_CONFIG: /home/anders/nodc_config//nodc_station/station.txt
## WARNING: Some stations are outside the allowed distance limit
## # A tibble: 3 × 3
## station_name distance_m OUT_OF_BOUNDS_RADIUS
## <chr> <dbl> <dbl>
## 1 LÄSÖ RÄNNA 10360. 1200
## 2 OH7 2233. 1200
## 3 HS2 2233. 1200
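The distance check compares each reported position with the register position of the same station and flags stations farther away than an allowed radius (1200 m in the output above). The sketch below uses the haversine great-circle formula on a spherical Earth (mean radius 6,371,008.8 m); the choice of formula, the column names, and the coordinates are assumptions for illustration, not SHARK4R internals.

```r
# Hypothetical sketch: haversine distance (m) between reported and register positions.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Illustrative coordinates (decimal degrees), not taken from the station register
reported <- data.frame(station_name = c("A", "B"),
                       lat = c(57.0, 57.1), lon = c(11.0, 11.0))
register <- data.frame(lat = c(57.0, 57.0), lon = c(11.0, 11.0),
                       radius_m = c(1200, 1200))

reported$distance_m <- haversine_m(reported$lat, reported$lon,
                                   register$lat, register$lon)
reported$out_of_bounds <- reported$distance_m > register$radius_m
print(reported)
```

A 0.1° shift in latitude corresponds to roughly 11.1 km, well outside a 1200 m radius.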
To check whether station positions are nominal (by comparing the unique coordinates reported for each station):
## Positions are not suspected to be nominal
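The reasoning behind the nominal-position check is that a station visited many times should show some scatter in its measured coordinates; if every visit reports exactly the same position, the coordinates were probably copied from the register. The sketch below applies that heuristic per station. The helper name, the lat/lon column names, and the minimum of three visits are assumptions for illustration.

```r
# Hypothetical sketch of a nominal-position heuristic (per station).
nominal_check_sketch <- function(data, min_visits = 3) {
  per_station <- split(data, data$station_name)
  res <- lapply(names(per_station), function(st) {
    d <- per_station[[st]]
    n_unique <- nrow(unique(d[, c("lat", "lon")]))
    data.frame(station_name = st, n_visits = nrow(d),
               n_unique_positions = n_unique,
               # many visits but only one distinct position -> suspected nominal
               suspected_nominal = nrow(d) >= min_visits && n_unique == 1)
  })
  do.call(rbind, res)
}

toy <- data.frame(station_name = c(rep("S1", 4), rep("S2", 3)),
                  lat = c(rep(57.5, 4), 58.00, 58.01, 58.02),
                  lon = c(rep(11.2, 4), 11.50, 11.51, 11.49))
res <- nominal_check_sketch(toy)
print(res)
```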
For a more user-friendly interface, use the Shiny QC app:
# Run the app
run_qc_app()
# Alternatively, download the support files and knit the documents locally
check_setup(path = tempdir()) # using a temporary folder in this example

The app provides point-and-click access to the same QC checks described above.
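Recommended order for the QC checks:
1. Required fields (check_datatype(), check_fields())
2. Codes (check_codes())
3. Geography (plot_map_leaflet(), check_onland(), and optionally positions_are_near_land() and which_basin())
4. Depth (check_depth())
5. Outliers (check_outliers(), scatterplot())
6. Parameter rules (check_parameter_rules())
7. Stations (match_station())

Following this order helps ensure comprehensive QC and prepares your SHARK data for analysis.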
## To cite package 'SHARK4R' in publications use:
##
## Lindh, M. and Torstensson, A. (2025). SHARK4R: Accessing and
## Validating Marine Environmental Data from 'SHARK' and Related
## Databases. R package version 1.0.1.
## https://CRAN.R-project.org/package=SHARK4R
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {SHARK4R: Accessing and Validating Marine Environmental Data from 'SHARK' and Related Databases},
## author = {Markus Lindh and Anders Torstensson},
## year = {2025},
## note = {R package version 1.0.1},
## url = {https://CRAN.R-project.org/package=SHARK4R},
## }
Provoost, P. and Bosch, S. (2018). obistools: Tools for data enhancement and quality control. Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. R package version 0.1.0, https://iobis.github.io/obistools/.