m.crawl.thredds

List dataset urls from a Thredds Data Server (TDS) catalog.

m.crawl.thredds input=string [print=string [,string,...]] services=string [,string,...] [filter=string] [skip=string [,string,...]] [output=name] [separator=character] [modified_before=string] [modified_after=string] [authentication=name] [nprocs=Number of cores] [--overwrite] [--verbose] [--quiet] [--qq] [--ui]

Example:

m.crawl.thredds input=string services=httpserver

grass.script.run_command("m.crawl.thredds", input, print=None, services="httpserver", filter=".*", skip=None, output="-", separator="pipe", modified_before=None, modified_after=None, authentication=None, nprocs=1, overwrite=False, verbose=False, quiet=False, superquiet=False)

Example:

gs.run_command("m.crawl.thredds", input="string", services="httpserver")

Parameters

input=string [required]
    URL of a catalog on a thredds server
print=string [,string,...]
    Additional information to print
    Allowed values: service, dataset_size
services=string [,string,...] [required]
    Services of thredds server to crawl
    Comma-separated list of service names (lower case) of the thredds server to crawl; typical services are: httpserver, netcdfsubset, opendap, wms
    Default: httpserver
filter=string
    Regular expression for filtering dataset and catalog URLs
    Default: .*
skip=string [,string,...]
    Regular expression(s) for skipping sub-catalogs / URLs (e.g. ".*jpeg.*,.*metadata.*")
output=name
    Name of the output file (stdout if omitted)
    Default: -
separator=character
    Field separator
    Special characters: pipe, comma, space, tab, newline
    Default: pipe
modified_before=string
    Latest modification timestamp of datasets to include in the output
    ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")
modified_after=string
    Earliest modification timestamp of datasets to include in the output
    ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")
authentication=name
    Authentication for thredds server
    File with authentication information (username and password) for thredds server
nprocs=Number of cores
    Number of cores to use for crawling thredds server
    Default: 1
--overwrite
    Allow output files to overwrite existing files
--help
    Print usage summary
--verbose
    Verbose module output
--quiet
    Quiet module output
--qq
    Very quiet module output
--ui
    Force launching GUI dialog

input : str, required
    URL of a catalog on a thredds server
print : str | list[str], optional
    Additional information to print
    Allowed values: service, dataset_size
services : str | list[str], required
    Services of thredds server to crawl
    Comma-separated list of service names (lower case) of the thredds server to crawl; typical services are: httpserver, netcdfsubset, opendap, wms
    Default: httpserver
filter : str, optional
    Regular expression for filtering dataset and catalog URLs
    Default: .*
skip : str | list[str], optional
    Regular expression(s) for skipping sub-catalogs / URLs (e.g. ".*jpeg.*,.*metadata.*")
output : str, optional
    Name of the output file (stdout if omitted)
    Used as: output, file, name
    Default: -
separator : str, optional
    Field separator
    Special characters: pipe, comma, space, tab, newline
    Used as: input, separator, character
    Default: pipe
modified_before : str, optional
    Latest modification timestamp of datasets to include in the output
    ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")
modified_after : str, optional
    Earliest modification timestamp of datasets to include in the output
    ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")
authentication : str, optional
    Authentication for thredds server
    File with authentication information (username and password) for thredds server
    Used as: input, file, name
nprocs : int, optional
    Number of cores to use for crawling thredds server
    Used as: Number of cores
    Default: 1
overwrite: bool, optional
    Allow output files to overwrite existing files
    Default: False
verbose: bool, optional
    Verbose module output
    Default: False
quiet: bool, optional
    Quiet module output
    Default: False
superquiet: bool, optional
    Very quiet module output
    Default: False

DESCRIPTION

An increasing amount of spatio-temporal data, like climate observations, forecast data, or satellite imagery, is provided through Thredds Data Servers (TDS).

m.crawl.thredds crawls the catalog of a Thredds Data Server (TDS), starting from the catalog URL provided in the input option. It is a wrapper module around the Python library thredds_crawler. m.crawl.thredds returns a list of dataset URLs, optionally with additional information on the service type and dataset size. Depending on the format of the crawled datasets, the output of m.crawl.thredds may be used as input to t.rast.import.netcdf.

The returned list of datasets can be filtered:

  • based on the modification time of the dataset, using a time range defined by the modified_before and modified_after option(s)
  • based on the file name, using a regular expression in the filter option.
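Both selection criteria can be illustrated with a small, self-contained sketch in plain Python. The dataset list, the keep() helper, and the pattern below are purely illustrative assumptions, not part of the module's implementation:

```python
import re
from datetime import datetime, timezone

# Hypothetical crawl results: (URL, ISO modification timestamp) pairs.
datasets = [
    ("https://example.org/thredds/fileServer/archive/data_2020.nc", "2020-03-01T00:00:00Z"),
    ("https://example.org/thredds/fileServer/archive/data_2021.nc", "2021-03-01T00:00:00Z"),
]

def keep(url, modified, pattern=".*", modified_after=None):
    """Mimic the filter / modified_after logic: a regular expression
    applied to the URL plus a lower bound on the modification time."""
    if not re.fullmatch(pattern, url):
        return False
    ts = datetime.fromisoformat(modified.replace("Z", "+00:00"))
    if modified_after is not None and ts <= modified_after:
        return False
    return True

cutoff = datetime(2021, 2, 1, tzinfo=timezone.utc)
selected = [u for u, m in datasets if keep(u, m, pattern=".*2021.*", modified_after=cutoff)]
```

Here only the 2021 dataset passes both the regular expression and the timestamp bound.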

When crawling larger Thredds installations, skipping irrelevant branches of the server's tree of datasets can greatly speed up the process. With the skip option, branches (and also leaf datasets) can be excluded from the search by a comma-separated list of regular expression strings; e.g. ".*metadata.*" directs the module not to look for datasets inside a "metadata" directory.
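Why pruning helps can be sketched with a toy recursive walk. The nested dict standing in for a catalog tree and the crawl() helper below are illustrative assumptions; the module itself delegates this to thredds_crawler:

```python
import re

# Hypothetical catalog tree: sub-catalogs map to dicts, datasets to None.
catalog = {
    "Archive": {"data_2021.nc": None},
    "metadata": {"info.xml": None},
}

def crawl(tree, skip_patterns, prefix=""):
    """Collect dataset paths, pruning any branch whose path matches
    one of the skip patterns (the branch is never descended into)."""
    urls = []
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        if any(re.fullmatch(p, path) for p in skip_patterns):
            continue  # skip the whole sub-tree
        if child is None:
            urls.append(path)
        else:
            urls.extend(crawl(child, skip_patterns, path))
    return urls

found = crawl(catalog, [".*metadata.*"])
```

Because matching branches are never entered, no requests would be spent on their contents in a real crawl.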

Authentication to the Thredds server (if required) can be provided either through a text file, where the first line contains the username and the second the password, or by interactive user input (if authentication=-). Alternatively, the username and password can be passed through the environment variables THREDDS_USER and THREDDS_PASSWORD.
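The credential lookup described above (file first, environment variables as fallback) can be sketched as follows; the read_credentials() helper is an illustrative assumption, not a function exposed by the module:

```python
import os
from pathlib import Path

def read_credentials(auth_file=None):
    """Return (username, password): from a two-line text file if given
    (first line username, second line password), otherwise from the
    THREDDS_USER / THREDDS_PASSWORD environment variables."""
    if auth_file is not None:
        lines = Path(auth_file).read_text().splitlines()
        return lines[0].strip(), lines[1].strip()
    return os.environ["THREDDS_USER"], os.environ["THREDDS_PASSWORD"]
```

Keeping credentials out of the command line (file or environment) avoids leaking them into the shell history or process list.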

NOTES

The Thredds data catalog is crawled recursively. Providing the URL to the root of a catalog on a Thredds server with many hierarchies and datasets can therefore be quite time-consuming, even when executed in parallel (nprocs > 1).
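Conceptually, nprocs > 1 maps to a pool of workers processing independent branches of the catalog tree concurrently. A minimal sketch, assuming hypothetical sub-catalog URLs and a stand-in crawl_branch() function (a real crawler would issue HTTP requests here):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-catalog URLs discovered at the catalog root.
subcatalogs = [
    "https://example.org/thredds/catalog/a",
    "https://example.org/thredds/catalog/b",
]

def crawl_branch(url):
    """Stand-in for fetching and parsing one sub-catalog."""
    return [f"{url}/dataset.nc"]

# Branches are crawled concurrently; map() preserves the input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    urls = [u for branch in pool.map(crawl_branch, subcatalogs) for u in branch]
```

Since crawling is I/O-bound (waiting on HTTP responses), threads overlap the network latency of the individual catalog requests.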

EXAMPLES

List modelled climate observation datasets from the Norwegian Meteorological Institute (met.no)

# Get a list of all data for "seNorge"
m.crawl.thredds input="https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/Archive/catalog.xml"
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2021.nc
(...)
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_1957.nc

# Get a list of the most recent data for "seNorge"
m.crawl.thredds input="https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/Archive/catalog.xml" modified_after="2021-02-01"
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2021.nc
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2020.nc

# Get a list of the most recent data for "seNorge" that match a regular expression
# Note the "." before the "*"
m.crawl.thredds input="https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/Archive/catalog.xml" \
modified_after="2021-02-01" filter=".*2018_202.*"
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2021.nc
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2020.nc

List Sentinel-2A data from the Norwegian Ground Segment (NBS) for 28 Feb 2021

# Get a list of all Sentinel-2A data for 28 Feb 2021 with dataset size
m.crawl.thredds input="https://nbstds.met.no/thredds/catalog/NBS/S2A/2021/02/28/catalog.xml" print="dataset_size"
https://nbstds.met.no/thredds/fileServer/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T35WPU_20210228T201033_DTERRENGDATA.nc|107.6
(...)
https://nbstds.met.no/thredds/fileServer/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T32VNL_20210228T201033_DTERRENGDATA.nc|166.1

# Get a list of WMS endpoints to all Sentinel-2A data for 28 Feb 2021
m.crawl.thredds input="https://nbstds.met.no/thredds/catalog/NBS/S2A/2021/02/28/catalog.xml" services="wms"
https://nbstds.met.no/thredds/wms/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T35WPU_20210228T201033_DTERRENGDATA.nc
(...)
https://nbstds.met.no/thredds/wms/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T32VNL_20210228T201033_DTERRENGDATA.nc

REQUIREMENTS

m.crawl.thredds is a wrapper around the thredds_crawler Python library.

SEE ALSO

i.sentinel.download, t.rast.import.netcdf

AUTHORS

Stefan Blumentrath, Norwegian Institute for Nature Research (NINA), Oslo

SOURCE CODE

Available at: m.crawl.thredds source code (history)
Latest change: Friday Feb 21 10:10:05 2025 in commit 7d78fe3