Skip to contents

The get_shark_data function retrieves data from the SHARK database hosted by SMHI. The function sends a POST request to the SHARK API with customizable filters, including year, month, taxon name, water category, and more, and returns the retrieved data as a structured data.frame. To view available filter options, see get_shark_options.

Usage

get_shark_data(
  tableView = "sharkweb_overview",
  headerLang = "internal_key",
  save_data = FALSE,
  file_path = NULL,
  delimiters = "point-tab",
  lineEnd = "win",
  encoding = "utf_8",
  dataTypes = c(),
  bounds = c(),
  fromYear = NULL,
  toYear = NULL,
  months = c(),
  parameters = c(),
  checkStatus = "",
  qualityFlags = c(),
  deliverers = c(),
  orderers = c(),
  projects = c(),
  datasets = c(),
  minSamplingDepth = "",
  maxSamplingDepth = "",
  redListedCategory = c(),
  taxonName = c(),
  stationName = c(),
  vattenDistrikt = c(),
  seaBasins = c(),
  counties = c(),
  municipalities = c(),
  waterCategories = c(),
  typOmraden = c(),
  helcomOspar = c(),
  seaAreas = c(),
  hideEmptyColumns = FALSE,
  row_limit = 10^7,
  prod = TRUE,
  verbose = TRUE
)

Arguments

tableView

Character. Specifies the columns of the table to retrieve. Options include:

  • "sharkweb_overview": Overview table

  • "sharkweb_all": All available columns

  • "sharkdata_bacterioplankton": Bacterioplankton table

  • "sharkdata_chlorophyll": Chlorophyll table

  • "sharkdata_epibenthos": Epibenthos table

  • "sharkdata_greyseal": Greyseal table

  • "sharkdata_harbourporpoise": Harbour porpoise table

  • "sharkdata_harbourseal: Harbour seal table

  • "sharkdata_jellyfish": Jellyfish table

  • "sharkdata_physicalchemical": Physical chemical table

  • "sharkdata_physicalchemical_columns": Physical chemical table: column view

  • "sharkdata_phytoplankton": Phytoplankton table

  • "sharkdata_picoplankton": Picoplankton table

  • "sharkdata_planktonbarcoding": Plankton barcoding table

  • "sharkdata_primaryproduction": Primary production table

  • "sharkdata_ringedseal": Ringed seal table

  • "sharkdata_sealpathology": Seal pathology table

  • "sharkdata_sedimentation": Sedimentation table

  • "sharkdata_zoobenthos": Zoobenthos table

  • "sharkdata_zooplankton": Zooplankton table

  • "report_sum_year_param": Report sum per year and parameter

  • "report_sum_year_param_taxon": Report sum per year, parameter and taxon

  • "report_sampling_per_station": Report sampling per station

  • "report_obs_taxon": Report observed taxa

  • "report_stations": Report stations

  • "report_taxon": Report taxa

Default is "sharkweb_overview".

headerLang

Character. Language option for column headers. Possible values:

  • "sv": Swedish.

  • "en": English.

  • "short": Shortened version.

  • "internal_key": Internal key (default).

save_data

Logical. If TRUE, the data will be saved to a specified file (see file_path). If FALSE, a temporary file will be created instead. The temporary file will be automatically deleted after it is loaded into memory.

file_path

Character. The file path where the data should be saved. Required if save_data is TRUE. Ignored if save_data is FALSE.

delimiters

Character. Specifies the delimiter used to separate values in the file, if save_data is TRUE. Options are "point-tab" (tab-separated) or "point-semi" (semicolon-separated). Default is "point-tab".

lineEnd

Character. Defines the type of line endings in the file, if save_data is TRUE. Options are "win" (Windows-style, \r\n) or "unix" (Unix-style, \n). Default is "win".

encoding

Character. Sets the file's text encoding, if save_data is TRUE. Options are "cp1252", "utf_8", "utf_16", or "latin_1". Default is "utf_8".

dataTypes

Character vector. Specifies data types to filter. Possible values include:

  • "Bacterioplankton"

  • "Chlorophyll"

  • "Epibenthos"

  • "Grey seal"

  • "Harbour Porpoise"

  • "Harbour seal"

  • "Jellyfish"

  • "Physical and Chemical"

  • "Phytoplankton"

  • "Picoplankton"

  • "PlanktonBarcoding"

  • "Primary production"

  • "Profile"

  • "Ringed seal"

  • "Seal pathology"

  • "Sedimentation"

  • "Zoobenthos"

  • "Zooplankton"

bounds

A numeric vector of length 4 specifying the geographical search boundaries in decimal degrees, formatted as c(lon_min, lat_min, lon_max, lat_max), e.g., c(11, 58, 12, 59). Default is c() to include all data.

fromYear

Integer (optional). The starting year for data retrieval. If set to NULL (default), the function will use the earliest available year in SHARK.

toYear

Integer (optional). The ending year for data retrieval. If set to NULL (default), the function will use the latest available year in SHARK.

months

Integer vector. The months to retrieve data for, e.g., c(4, 5, 6) for April to June.

parameters

Character vector. Optional parameters to filter the results by, such as "Chlorophyll-a".

checkStatus

Character string. Optional status check to filter results.

qualityFlags

Character vector. Specifies the quality flags to filter the data. By default, all data are included, including those with the "B" flag (Bad).

deliverers

Character vector. Specifies the data deliverers to filter by.

orderers

Character vector. Orderers to filter by specific organizations or individuals.

projects

Character vector. Projects to filter data by specific research or monitoring projects.

datasets

Character vector. Datasets to filter data by specific datasets.

minSamplingDepth

Numeric. Minimum sampling depth (in meters) to filter the data.

maxSamplingDepth

Numeric. Maximum sampling depth (in meters) to filter the data.

redListedCategory

Character vector. Red-listed taxa for conservation filtering.

taxonName

Character vector. Optional vector of taxa names to filter by.

stationName

Character vector. Station names to filter data by specific stations.

vattenDistrikt

Character vector. Water district names to filter by Swedish water districts.

seaBasins

Character vector. Sea basins to filter by.

counties

Character vector. Counties to filter by specific administrative regions.

municipalities

Character vector. Municipalities to filter by.

waterCategories

Character vector. Water categories to filter by.

typOmraden

Character vector. Type areas to filter by.

helcomOspar

Character vector. HELCOM or OSPAR areas for regional filtering.

seaAreas

Character vector. Sea area codes to filter by specific sea areas.

hideEmptyColumns

Logical. Whether to hide empty columns. Default is FALSE.

row_limit

Numeric. Specifies the maximum number of rows that can be retrieved in a single request. If the requested data exceeds this limit, the function automatically downloads the data in yearly chunks. The default value is 10 million rows.

prod

Logical. Whether to query the PROD (production) server or the SMHI internal TEST (testing) server. Default is TRUE (PROD).

verbose

Logical. Whether to display progress information. Default is TRUE.

Value

A data.frame containing the retrieved SHARK data, with column names based on the API's response.

Details

This function sends a POST request to the SHARK API with the specified filters. The response is parsed as JSON and then converted into a data.frame. The function handles the dynamic construction of the query body to filter the data based on the provided parameters. If the row_limit parameter is reached, the data retrieval process is split into manageable chunks to avoid overwhelming the API or running into memory issues. Please note that making very large requests, such as retrieving the entire database, can be extremely memory-intensive.

Examples

if (FALSE) { # \dontrun{
  # Retrieve chlorophyll data from 2019 to 2020 for April to June
  shark_data <- get_shark_data(fromYear = 2019, toYear = 2020,
                               months = c(4, 5, 6), dataTypes = c("Chlorophyll"))
  View(shark_data)
} # }