Downloads one or more datasets (zip archives) from the SHARK database (the Swedish national marine environmental data archive) and optionally unzips them. The function matches the provided dataset names against all available SHARK datasets.
Usage
get_shark_datasets(
  dataset_name,
  save_dir = NULL,
  prod = TRUE,
  utv = FALSE,
  unzip_file = FALSE,
  return_df = FALSE,
  encoding = "latin_1",
  guess_encoding = TRUE,
  verbose = TRUE
)
Arguments
- dataset_name
Character vector with one or more dataset names (or partial names). Each entry is matched against the available SHARK dataset identifiers (e.g., "SHARK_Phytoplankton_2023_SMHI_BVVF" for a specific dataset, or "SHARK_Phytoplankton" for all Phytoplankton datasets).
- save_dir
Directory where zip files (and optionally their extracted contents) should be stored. Defaults to NULL. If NULL or "", a temporary directory is used.
- prod
Logical, whether to download from the production (TRUE, default) or test (FALSE) SHARK server. Ignored if utv is TRUE.
- utv
Logical. Select the UTV server when TRUE.
- unzip_file
Logical, whether to extract downloaded zip archives (TRUE) or only save them (FALSE, default).
- return_df
Logical, whether to return a combined data frame with the contents of all downloaded datasets (TRUE) instead of a list of file paths (FALSE, default).
- encoding
Character. File encoding of shark_data.txt. Options: "cp1252", "utf_8", "utf_16", "latin_1". Default is "latin_1". If guess_encoding = TRUE, the detected encoding overrides this value. Ignored if return_df is FALSE.
- guess_encoding
Logical. If TRUE (default), the file encoding is detected automatically. If FALSE, the function uses only the user-specified encoding. Ignored if return_df is FALSE.
- verbose
Logical, whether to show download and extraction progress messages. Default is TRUE.
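The partial-name matching described for dataset_name can be pictured with a minimal sketch. It assumes plain substring matching, which may not be what the function does internally. The helper match_datasets() and the vector available are hypothetical; the file names are copied from the example output on this page.

```r
# Hypothetical list of available dataset identifiers (names taken from the
# example output on this page; the real list comes from the SHARK server).
available <- c(
  "SHARK_Zooplankton_2022_DEEP_version_2024-12-17.zip",
  "SHARK_Zooplankton_2022_SMHI_version_2024-12-17.zip",
  "SHARK_Zooplankton_2022_UMSC_version_2024-03-27.zip",
  "SHARK_Phytoplankton_2023_SMHI_BVVF_version_2025-03-09.zip"
)

# Illustrative helper, not part of the package: return every identifier that
# contains at least one of the requested names as a literal substring.
match_datasets <- function(dataset_name, available) {
  hits <- lapply(dataset_name, function(nm) {
    available[grepl(nm, available, fixed = TRUE)]
  })
  unique(unlist(hits))
}

# A partial name selects every dataset that contains it.
matched <- match_datasets("Zooplankton_2022", available)
print(matched)
```

Under this assumption a longer name narrows the selection, so "SHARK_Phytoplankton_2023_SMHI_BVVF" selects a single archive while "Zooplankton_2022" selects three.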
Value
If return_df = FALSE, a named list of character vectors.
Each element corresponds to one matched dataset and contains either
the path to the downloaded zip file (if unzip_file = FALSE) or
the path to the extraction directory (if unzip_file = TRUE).
If return_df = TRUE, a single combined data frame with all
dataset contents, including a source column indicating the dataset.
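As a rough illustration of working with the list returned when return_df = FALSE, the sketch below lists the files behind each entry. describe_download() is a hypothetical helper, and the return value is simulated with a temporary directory so the code runs without network access; only the file name shark_data.txt comes from this page.

```r
# Illustrative helper, not part of the package: for each returned path, list
# the extracted files (directory) or the archive contents (zip file).
describe_download <- function(paths) {
  lapply(paths, function(p) {
    if (dir.exists(p)) {
      list.files(p, recursive = TRUE)
    } else if (file.exists(p)) {
      utils::unzip(p, list = TRUE)$Name
    } else {
      character(0)
    }
  })
}

# Simulated return value for unzip_file = TRUE: a named list whose single
# element is the path to an extraction directory.
extract_dir <- file.path(tempdir(), "SHARK_Example_dataset")
dir.create(extract_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(extract_dir, "shark_data.txt"))
result <- list("SHARK_Example_dataset.zip" = extract_dir)

contents <- describe_download(result)
print(contents)
```

Because lapply() keeps the list names, the result stays keyed by dataset, which makes it easy to look up the files of one particular archive.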
See also
https://shark.smhi.se/en for SHARK database.
get_shark_options() for listing available datasets.
get_shark_data() for downloading tabular data.
Examples
# \donttest{
# Get a specific dataset
get_shark_datasets("SHARK_Phytoplankton_2023_SMHI_BVVF", verbose = FALSE)
#> $`SHARK_Phytoplankton_2023_SMHI_BVVF_version_2025-03-09.zip`
#> [1] "/tmp/RtmpirDMYv/SHARK_Phytoplankton_2023_SMHI_BVVF_version_2025-03-09.zip"
#>
# Get all Zooplankton datasets from 2022 and unzip them
get_shark_datasets(
  dataset_name = "Zooplankton_2022",
  unzip_file = TRUE,
  verbose = FALSE
)
#> $`SHARK_Zooplankton_2022_DEEP_version_2024-12-17.zip`
#> [1] "/tmp/RtmpirDMYv/SHARK_Zooplankton_2022_DEEP_version_2024-12-17"
#>
#> $`SHARK_Zooplankton_2022_SMHI_version_2024-12-17.zip`
#> [1] "/tmp/RtmpirDMYv/SHARK_Zooplankton_2022_SMHI_version_2024-12-17"
#>
#> $`SHARK_Zooplankton_2022_UMSC_version_2024-03-27.zip`
#> [1] "/tmp/RtmpirDMYv/SHARK_Zooplankton_2022_UMSC_version_2024-03-27"
#>
# Get all Chlorophyll datasets and return as a combined data frame
combined_df <- get_shark_datasets(
  dataset_name = "Chlorophyll",
  return_df = TRUE,
  verbose = FALSE
)
#> Detected encoding 'windows-1252' differs from specified 'latin_1'. Using detected encoding.
head(combined_df)
#> # A tibble: 6 × 73
#> source delivery_datatype check_status_sv data_checked_by_sv visit_year
#> <dbl> <chr> <chr> <chr> <dbl>
#> 1 1 Chlorophyll Klar Leverantör 1989
#> 2 1 Chlorophyll Klar Leverantör 1989
#> 3 1 Chlorophyll Klar Leverantör 1985
#> 4 1 Chlorophyll Klar Leverantör 1989
#> 5 1 Chlorophyll Klar Leverantör 1989
#> 6 1 Chlorophyll Klar Leverantör 1989
#> # ℹ 68 more variables: visit_month <dbl>, station_name <chr>,
#> # reported_station_name <chr>, sample_location_id <dbl>, station_id <dbl>,
#> # sample_project_name_en <chr>, sample_orderer_name_en <chr>,
#> # platform_code <chr>, visit_id <chr>, expedition_id <chr>,
#> # shark_sample_id_md5 <chr>, sample_date <date>, sample_time <time>,
#> # sample_enddate <date>, sample_endtime <lgl>, sample_latitude_dm <chr>,
#> # sample_longitude_dm <chr>, sample_latitude_dd <dbl>, …
# }
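The "Detected encoding" message in the last example shows why the encoding and guess_encoding arguments exist. The following base R sketch is independent of the package and only illustrates the underlying issue: a word such as "Leverantör" stored in a single-byte encoding is not valid UTF-8 until it is converted. The byte values are standard Latin-1; how the function itself detects and converts encodings is not shown here.

```r
# Bytes of "Leverantör" as stored in a Latin-1 file: the letter "ö" is the
# single byte 0xF6 (the same byte is used in Windows-1252).
bytes <- c(charToRaw("Leverant"), as.raw(0xf6), charToRaw("r"))
raw_text <- rawToChar(bytes)

# Interpreted as UTF-8, the byte sequence is invalid.
is_valid_utf8 <- validUTF8(raw_text)
print(is_valid_utf8)

# Declaring the correct source encoding recovers the intended text.
decoded <- iconv(raw_text, from = "latin1", to = "UTF-8")
print(decoded)
```

Latin-1 and Windows-1252 agree on most characters, including "ö", which is one reason the detected encoding can differ from the specified one without changing text like this.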
