API reference

earthcarekit.download

A download tool for EarthCARE data based on ESA MAAP.

Notes

This module depends on other internal modules:


ecdownload

ecdownload(
    file_type: str | list[str],
    baseline: str | None = None,
    orbit_number: int | list[int] | None = None,
    start_orbit_number: int | None = None,
    end_orbit_number: int | None = None,
    frame_id: str | list[str] | None = None,
    orbit_and_frame: str | list[str] | None = None,
    start_orbit_and_frame: str | None = None,
    end_orbit_and_frame: str | None = None,
    timestamps: str | list[str] | None = None,
    start_time: str | None = None,
    end_time: str | None = None,
    radius_search: tuple[RadiusMetersFloat, LatFloat, LonFloat] | list | None = None,
    bounding_box: tuple[LatSFloat, LonWFloat, LatNFloat, LonEFloat] | list | None = None,
    path_to_config: str | None = None,
    path_to_data: str | None = None,
    is_log: bool = False,
    is_debug: bool = False,
    is_download: bool = True,
    is_overwrite: bool = False,
    is_unzip: bool = True,
    is_delete: bool = True,
    is_create_subdirs: bool = True,
    is_export_results: bool = False,
    idx_selected_input: int | None = None,
    is_organize_data: bool = False,
    is_include_header: bool | None = None,
    is_only_header: bool | None = None,
    is_reversed_order: bool = False,
    return_results: bool = False,
    verbose: bool = True,
    check_product_availability: bool = False,
) -> ProductDataFrame | None

EarthCARE Download Tool: Search for and download EarthCARE products from an ESA data distribution platform (OADS or MAAP).

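The execution of this tool is divided into two parts:

- First, based on the provided arguments, search requests are sent via the OpenSearch API of the ESA MAAP catalogue (https://catalog.maap.eo.esa.int/catalogue/).
- Second, the resulting list of products is downloaded from the configured download backend (OADS or MAAP). See:
    - MAAP: https://portal.maap.eo.esa.int/earthcare/
    - OADS L1: https://ec-pdgs-dissemination1.eo.esa.int/
    - OADS L2: https://ec-pdgs-dissemination2.eo.esa.int/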
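
The tool first searches and then downloads. The sketch below prepares arguments for a search-only run, where `is_download=False` skips the download part while the search requests are still sent. It is an illustration under assumptions, not the package's recommended usage: the import path `earthcarekit.download.ecdownload` is inferred from this page's headings, the time strings reuse the format shown for `timestamps` (an assumption for `start_time` and `end_time`), and `search_only_kwargs` and `run_search` are hypothetical helper names.

```python
from typing import Any


def search_only_kwargs(file_type: str, start_time: str, end_time: str) -> dict[str, Any]:
    """Build keyword arguments for a run that searches but does not download."""
    return {
        "file_type": file_type,
        "start_time": start_time,
        "end_time": end_time,
        "is_download": False,    # skip the download part; search requests are still sent
        "return_results": True,  # return the search results instead of None
    }


def run_search(kwargs: dict[str, Any]) -> Any:
    """Call ecdownload with the prepared arguments (hypothetical wrapper)."""
    # Imported lazily so the argument builder works without the package installed.
    # Assumption: the function is importable from earthcarekit.download.
    from earthcarekit.download import ecdownload

    return ecdownload(**kwargs)


kwargs = search_only_kwargs("ATL_NOM_1B", "2024-07-31 13:00", "2024-07-31 14:00")
# results = run_search(kwargs)  # needs earthcarekit, a config file, and network access
```

Keeping the arguments in a dictionary makes it easy to rerun the same search later with `is_download=True`.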

Parameters:

Name Type Description Default
file_type str | list[str]

Name(s) of EarthCARE product(s) to search for (e.g., "ATL_NOM_1B", "ANOM", or "A-NOM"). Note: Input strings are evaluated case-insensitively. A product version may also be selected by appending a colon and the two-letter processor baseline to the name (e.g., "ANOM:BA").

required
baseline str | None

Two-letter processor baseline used as default for all given file_type values (e.g., "BA"). Note: A baseline specified in file_type with colon notation (e.g., "ANOM:BA") overrides the default baseline. Defaults to None.

None
orbit_number int | list[int] | None

Specific orbit number(s) to search for (e.g., 981 or [1000, 5000, ...]). Defaults to None.

None
start_orbit_number int | None

The lower limit of orbit numbers to search for (e.g., 5000). Defaults to None.

None
end_orbit_number int | None

The upper limit of orbit numbers to search for (e.g., 5003). Defaults to None.

None
frame_id str | list[str] | None

Frame ID letter(s) to search for (i.e., letters A to H). Defaults to None.

None
orbit_and_frame str | list[str] | None

Orbit and frame string(s) to search for (e.g., "01234F" or ["1000A", "5000C", ...]). Defaults to None.

None
start_orbit_and_frame str | None

The lower limit of orbit and frames to search for (e.g., "05000D"). Defaults to None.

None
end_orbit_and_frame str | None

The upper limit of orbit and frames to search for (e.g., "05003C"). Defaults to None.

None
timestamps str | list[str] | None

Search for data containing specific timestamp(s) (e.g. "2024-07-31 13:45" or "20240731T134500Z"). Defaults to None.

None
start_time str | None

The lower time limit for the search. Defaults to None.

None
end_time str | None

The upper time limit for the search. Defaults to None.

None
radius_search tuple[RadiusMetersFloat, LatFloat, LonFloat] | list | None

A tuple containing a radius (meters) and a lat/lon point to perform a geo radius search (e.g., 25000 51.35 12.43, i.e., <radius[m]> <lat> <lon>). Latitudes must be provided as degrees north and longitudes as degrees east. Defaults to None.

None
bounding_box tuple[LatSFloat, LonWFloat, LatNFloat, LonEFloat] | list | None

A tuple containing the extent for a bounding box geo search (e.g., [14.9, 37.7, 14.99, 37.78], i.e., <latS> <lonW> <latN> <lonE>). Latitudes must be provided as degrees north and longitudes as degrees east. Defaults to None.

None
path_to_config str | None

If provided, uses given config file instead of the default config. Defaults to None.

None
path_to_data str | None

If provided, downloads data to the given folder instead of the one defined in the config file. Defaults to None.

None
is_log bool

If True, creates a log file in a /log folder inside the current working directory. Defaults to False.

False
is_debug bool

If True, shows debug logs in the console. Defaults to False.

False
is_download bool

If False, skips the download part but still performs search requests via the data dissemination platform API. Defaults to True.

True
is_overwrite bool

If True, downloads and overwrites files that already exist in the data directory instead of skipping them. Defaults to False.

False
is_unzip bool

If False, skips file extraction for downloaded archives. Defaults to True.

True
is_delete bool

If True, deletes downloaded archives after extraction (i.e., does not delete non-extracted archives). Defaults to True.

True
is_create_subdirs bool

If True, places downloaded files in a sub-directory structure according to the template defined in the config file. Defaults to True.

True
is_export_results bool

If True, creates a text file in the current working directory listing all search results. Defaults to False.

False
idx_selected_input int | None

A number matching an index in the list of found files. If provided, only this single file will be downloaded. Defaults to None.

None
is_organize_data bool

If True, does not search or download any data and only organizes the local data, moving files that are already on disk. Defaults to False.

False
is_include_header bool | None

If True, the full archive is downloaded, containing both the HDF5 data file (.h5) and the header data file (.HDR). If False, only the data file is downloaded, which speeds up the download. If None, the value of maap_include_header_file from the config is used. Defaults to None.

None
is_only_header bool | None

If True, downloads only header files (.HDR). This option overrides is_include_header. Defaults to None, which is treated as False.

None
is_reversed_order bool

If True, downloads data products in reversed order (from the latest to the earliest). Defaults to False.

False
return_results bool

If True, returns the search results as a ProductDataFrame. Defaults to False.

False
verbose bool

If False, does not print logs to the console and does not create a log file. Defaults to True.

True
check_product_availability bool

If True, sends an extra request to the download backend to check the list of available products per data collection. If False, uses internally stored lists of available products, which significantly reduces execution time (but might fail if the backend changes). Defaults to False.

False
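
The two geo search parameters above expect their values in a fixed order, which is easy to mix up. The following sketch shows one way to build them with basic sanity checks before passing them on. `make_bounding_box` and `make_radius_search` are hypothetical helpers, not part of earthcarekit, and the checks (latitude within -90 to 90, south not above north, positive radius) are assumptions about sensible input rather than rules taken from the package.

```python
def make_bounding_box(
    lat_s: float, lon_w: float, lat_n: float, lon_e: float
) -> tuple[float, float, float, float]:
    """Return (latS, lonW, latN, lonE) after basic sanity checks."""
    for lat in (lat_s, lat_n):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
    if lat_s > lat_n:
        raise ValueError("southern latitude must not exceed northern latitude")
    return (lat_s, lon_w, lat_n, lon_e)


def make_radius_search(radius_m: float, lat: float, lon: float) -> tuple[float, float, float]:
    """Return (radius in meters, lat, lon) after basic sanity checks."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return (float(radius_m), lat, lon)


# Values taken from the parameter descriptions above.
bounding_box = make_bounding_box(14.9, 37.7, 14.99, 37.78)
radius_search = make_radius_search(25000, 51.35, 12.43)
```

Both helpers return tuples, which matches the documented parameter types; the resulting values could then be passed as `bounding_box=...` or `radius_search=...`.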

Returns:

Name Type Description
results ProductDataFrame | None

If return_results=False, the function has no return (i.e., None). If return_results=True, the function returns the search results.
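
Because the return type is `ProductDataFrame | None`, calling code should handle both cases. Judging by the source listing on this page, `None` is also returned when `is_organize_data=True`, regardless of `return_results`. The sketch below is a minimal way to guard against that; `describe_results` is a hypothetical helper, and it assumes only that a `ProductDataFrame` supports `len()`, as a table-like object normally would.

```python
from typing import Optional, Sized


def describe_results(results: Optional[Sized]) -> str:
    """Summarize a return value that may be None or a sized collection."""
    if results is None:
        return "no results returned"
    return f"{len(results)} product(s) found"


# Stand-ins for demonstration; a real call would return a ProductDataFrame or None.
print(describe_results(None))
print(describe_results(["product_a", "product_b"]))
```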

Source code in earthcarekit/download/main.py
def ecdownload(
    file_type: str | list[str],
    baseline: str | None = None,
    orbit_number: int | list[int] | None = None,
    start_orbit_number: int | None = None,
    end_orbit_number: int | None = None,
    frame_id: str | list[str] | None = None,
    orbit_and_frame: str | list[str] | None = None,
    start_orbit_and_frame: str | None = None,
    end_orbit_and_frame: str | None = None,
    timestamps: str | list[str] | None = None,
    start_time: str | None = None,
    end_time: str | None = None,
    radius_search: tuple[RadiusMetersFloat, LatFloat, LonFloat] | list | None = None,
    bounding_box: (tuple[LatSFloat, LonWFloat, LatNFloat, LonEFloat] | list | None) = None,
    path_to_config: str | None = None,
    path_to_data: str | None = None,
    is_log: bool = False,
    is_debug: bool = False,
    is_download: bool = True,
    is_overwrite: bool = False,
    is_unzip: bool = True,
    is_delete: bool = True,
    is_create_subdirs: bool = True,
    is_export_results: bool = False,
    idx_selected_input: int | None = None,
    is_organize_data: bool = False,
    is_include_header: bool | None = None,
    is_only_header: bool | None = None,
    is_reversed_order: bool = False,
    return_results: bool = False,
    verbose: bool = True,
    check_product_availability: bool = False,
) -> ProductDataFrame | None:
    """
    EarthCARE Download Tool: Search for and download EarthCARE products from an ESA data distribution platform (OADS or MAAP).

    The execution of this tool is divided into two parts:

    - First, based on the provided arguments, search requests are sent via the OpenSearch API of the [ESA MAAP catalogue](https://catalog.maap.eo.esa.int/catalogue/).
    - Second, the resulting list of products is downloaded from the configured download backend (OADS or MAAP). See:
        - MAAP: [portal.maap.eo.esa.int/earthcare](https://portal.maap.eo.esa.int/earthcare/)
        - OADS L1: [ec-pdgs-dissemination1.eo.esa.int](https://ec-pdgs-dissemination1.eo.esa.int/)
        - OADS L2: [ec-pdgs-dissemination2.eo.esa.int](https://ec-pdgs-dissemination2.eo.esa.int/)

    Args:
        file_type (str | list[str]): Name(s) of EarthCARE product(s) to search for (e.g., "ATL_NOM_1B", "ANOM", or "A-NOM").
            Note: Input strings are evaluated case-insensitively. A product version may also be selected
            by appending a colon and the two-letter processor baseline to the name (e.g., "ANOM:BA").
        baseline (str | None, optional): Two-letter processor baseline used as default for all given `file_type`s (e.g., "BA").
            Note: A baseline specified in `file_type` with colon notation (e.g., "ANOM:BA") overrides the default `baseline`.
            Defaults to None.
        orbit_number (int | list[int] | None, optional):
            Specific orbit number(s) to search for (e.g., 981 or [1000, 5000, ...]). Defaults to None.
        start_orbit_number (int | None, optional):
            The lower limit of orbit numbers to search for (e.g., 5000). Defaults to None.
        end_orbit_number (int | None, optional):
            The upper limit of orbit numbers to search for (e.g., 5003). Defaults to None.
        frame_id (str | list[str] | None, optional):
            Frame ID letter(s) to search for (i.e., letters A to H). Defaults to None.
        orbit_and_frame (str | list[str] | None, optional):
            Orbit and frame string(s) to search for (e.g., "01234F" or ["1000A", "5000C", ...]). Defaults to None.
        start_orbit_and_frame (str | None, optional):
            The lower limit of orbit and frames to search for (e.g., "05000D"). Defaults to None.
        end_orbit_and_frame (str | None, optional):
            The upper limit of orbit and frames to search for (e.g., "05003C"). Defaults to None.
        timestamps (str | list[str] | None, optional):
            Search for data containing specific timestamp(s) (e.g. "2024-07-31 13:45" or "20240731T134500Z"). Defaults to None.
        start_time (str | None, optional):
            The lower time limit for the search. Defaults to None.
        end_time (str | None, optional):
            The upper time limit for the search. Defaults to None.
        radius_search (tuple[RadiusMetersFloat, LatFloat, LonFloat] | list | None, optional):
            A tuple containing a radius (meters) and a lat/lon point to perform a geo radius search (e.g., 25000 51.35 12.43, i.e.,
            <radius[m]> <lat> <lon>). Latitudes must be provided as degrees north and longitudes as degrees east. Defaults to None.
        bounding_box (tuple[LatSFloat, LonWFloat, LatNFloat, LonEFloat]  |  list  |  None, optional):
            A tuple containing the extent for a bounding box geo search (e.g., [14.9, 37.7, 14.99, 37.78],
            i.e., <latS> <lonW> <latN> <lonE>). Latitudes must be provided as degrees north and longitudes as degrees east.
            Defaults to None.
        path_to_config (str | None, optional):
            If provided, uses given config file instead of the default config. Defaults to None.
        path_to_data (str | None, optional):
            If provided, downloads data to the given folder instead of the one defined in the config file. Defaults to None.
        is_log (bool, optional):
            If True, creates a log file in a `/log` folder inside the current working directory. Defaults to False.
        is_debug (bool, optional):
            If True, shows debug logs in the console. Defaults to False.
        is_download (bool, optional):
            If False, skips the download part but still performs search requests via the data dissemination platform API. Defaults to True.
        is_overwrite (bool, optional):
            If True, downloads and overwrites files that already exist in the data directory instead of skipping them. Defaults to False.
        is_unzip (bool, optional): If False, skips file extraction for downloaded archives. Defaults to True.
        is_delete (bool, optional):
            If True, deletes downloaded archives after extraction (i.e., does not delete non-extracted archives). Defaults to True.
        is_create_subdirs (bool, optional):
            If True, places downloaded files in a sub-directory structure according to the template defined in the config file.
            Defaults to True.
        is_export_results (bool, optional):
            If True, creates a text file in the current working directory listing all search results. Defaults to False.
        idx_selected_input (int | None, optional):
            A number matching an index in the list of found files. If provided, only this single file will be downloaded.
            Defaults to None.
        is_organize_data (bool, optional):
            If True, does not search or download any data and only organizes the local data, moving files that are already on disk.
            Defaults to False.
        is_include_header (bool | None, optional):
            If True, the full archive is downloaded, containing both the HDF5 data file (`.h5`) and the header data file (`.HDR`).
            If False, only the data file is downloaded, which speeds up the download.
            If None, the value of `maap_include_header_file` from the config is used. Defaults to None.
        is_only_header (bool | None, optional):
            If True, downloads only header files (`.HDR`). This option overrides `is_include_header`.
            Defaults to None, which is treated as False.
        is_reversed_order (bool, optional):
            If True, downloads data products in reversed order (from the latest to the earliest). Defaults to False.
        return_results (bool, optional):
            If True, returns the search results as a `ProductDataFrame`. Defaults to False.
        verbose (bool, optional):
            If False, does not print logs to the console and does not create a log file. Defaults to True.
        check_product_availability (bool, optional):
            If True, sends an extra request to the download backend to check the list of available products per data collection.
            If False, uses internally stored lists of available products, significantly reducing execution time (but might fail in case of backend changes).
            Defaults to False.

    Returns:
        results (ProductDataFrame | None):
            If `return_results=False`, the function has no return (i.e., None).
            If `return_results=True`, the function returns the search results.
    """
    time_start_script: pd.Timestamp = pd.Timestamp(
        datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    )
    time_end_script: pd.Timestamp
    execution_time: pd.Timedelta

    def _to_list(input: Any, _type: Type) -> list | None:
        if isinstance(input, _type):
            return [input]
        elif isinstance(input, list):
            return input
        else:
            return None

    _file_type: list[str] | None = _to_list(file_type, str)
    assert isinstance(_file_type, list)
    file_type = _file_type

    orbit_number = _to_list(orbit_number, int)
    frame_id = _to_list(frame_id, str)
    orbit_and_frame = _to_list(orbit_and_frame, str)
    timestamps = _to_list(timestamps, str)

    if isinstance(radius_search, tuple):
        radius_search = list(radius_search)

    if isinstance(bounding_box, tuple):
        bounding_box = list(bounding_box)

    idx_selected: int | None = parse_selected_index(idx_selected_input)

    logger: Logger | None = None
    if verbose:
        logger = create_logger(
            logger_name=PROGRAM_NAME,
            log_to_file=is_log,
            debug=is_debug,
        )
    if is_log:
        remove_old_logs(100, pd.Timedelta(days=30))

    log_textbox(
        f"EarthCARE Download Tool\n{__title__} {__version__}",
        logger=logger,
        is_mayor=True,
    )

    if logger and not is_organize_data:
        logger.info("# Settings")
        logger.info(f"# - {is_download=}")
        logger.info(f"# - {is_overwrite=}")
        logger.info(f"# - {is_unzip=}")
        logger.info(f"# - {is_delete=}")
        logger.info(f"# - {is_create_subdirs=}")
        logger.info(f"# - {is_log=}")
        logger.info(f"# - {is_debug=}")
        logger.info(f"# - {is_export_results=}")
        logger.info(f"# - {idx_selected_input=}")

    config = parse_path_to_config(path_to_config, logger=logger)
    path_to_data = parse_path_to_data(path_to_data, logger=logger)
    if isinstance(path_to_data, str):
        config.path_to_data = path_to_data

    if logger and not is_organize_data:
        logger.info(f"# - config_filepath=<{config.filepath}>")
        logger.info(f"# - data_dirpath=<{config.path_to_data}>")

    if is_organize_data:
        if logger:
            logger.info("# Organizing local data ...")
        performed_moves = organize_data(
            config=config,
            logger=logger,
        )
        time_end_script = pd.Timestamp(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        execution_time = time_end_script - time_start_script
        execution_time_str = str(execution_time).split()[-1]
        if logger:
            console_exclusive_info()
        _moved = len([pm for pm in performed_moves if pm.get("status") == "success"])
        _failed = len([pm for pm in performed_moves if pm.get("status") == "error"])
        _msg = [
            "EXECUTION SUMMARY",
            "---",
            f"Time taken          {execution_time_str}",
            f"Moved files         {_moved}",
            f"Failed moves        {_failed}",
        ]
        log_textbox("\n".join(_msg), logger=logger, show_time=True)
        return None

    if not isinstance(is_include_header, bool):
        is_include_header = config.maap_include_header_file

    search_inputs: _SearchInputs = parse_search_inputs(
        product_type=file_type,
        baseline=baseline,
        orbit_number=orbit_number,
        start_orbit_number=start_orbit_number,
        end_orbit_number=end_orbit_number,
        frame_id=frame_id,
        orbit_and_frame=orbit_and_frame,
        start_orbit_and_frame=start_orbit_and_frame,
        end_orbit_and_frame=end_orbit_and_frame,
        timestamps=timestamps,
        start_time=start_time,
        end_time=end_time,
        radius_search=radius_search,
        bounding_box=bounding_box,
        logger=logger,
    )
    if config.download_backend.lower() == "maap":
        entrypoint = Entrypoint.MAAP
    else:
        entrypoint = Entrypoint.OADS

    planned_requests: list[EOSearchRequest] = create_search_request_list(
        entrypoint=entrypoint,
        search_inputs=search_inputs,
        input_user_type=None,
        candidate_coll_names_user=[c.value for c in config.collections],
        perform_requests=check_product_availability,
        logger=logger,
    )

    found_products: list[EOProduct] = run_search_requets(
        log_heading_msg="STEP 1/2 - Search products",
        search_requests=planned_requests,
        is_debug=is_debug,
        is_found_files_list_to_txt=is_export_results,
        selected_index=idx_selected,
        selected_index_input=idx_selected_input,
        logger=logger,
        download_only_h5=not is_include_header,
        download_only_hdr=is_only_header or False,
        fetch_geometry=return_results,
    )

    donwload_results: list[_DownloadResult] = run_downloads(
        log_heading_msg="STEP 2/2 - Download products",
        products=found_products,
        config=config,
        entrypoint=entrypoint,
        is_download=is_download,
        is_overwrite=is_overwrite,
        is_unzip=is_unzip,
        is_delete=is_delete,
        is_create_subdirs=is_create_subdirs,
        logger=logger,
        is_reversed_order=is_reversed_order,
    )

    if logger:
        num_downloads: int = 0
        num_unzips: int = 0
        num_errors: int = 0
        size_msg: str = "<missing size_msg>"
        avg_speed_mbs: float = 0.0
        if len(donwload_results) > 0:
            num_errors = sum([not r.success for r in donwload_results])
            num_downloads = sum([r.downloaded for r in donwload_results])
            num_unzips = sum([r.unzipped for r in donwload_results])
            total_size_mb = sum([r.size_mb for r in donwload_results])
            size_msg = f"{total_size_mb:.2f} MB"
            if total_size_mb >= 1024:
                size_msg = f"{total_size_mb / 1024:.2f} GB"
            avg_speed_mbs = float(np.mean([r.speed_mbs for r in donwload_results]))

        time_end_script = pd.Timestamp(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        execution_time = time_end_script - time_start_script
        execution_time_str = str(execution_time).split()[-1]

        console_exclusive_info()
        _msg = [
            "EXECUTION SUMMARY",
            "---",
            f"Time taken          {execution_time_str}",
            f"API search requests {len(planned_requests)}",
            f"Remote files found  {len(found_products)}",
            f"Files downloaded    {num_downloads} ({size_msg} at ~{avg_speed_mbs:.2f} MB/s)",
            f"Files unzipped      {num_unzips}",
            f"Errors occurred     {num_errors}",
        ]
        log_textbox("\n".join(_msg), logger=logger, show_time=True)

    if return_results:
        pdf = get_product_infos([p.name for p in found_products], must_exist=False)
        pdf["start_latitude"] = np.array([p.start_latitude for p in found_products])
        pdf["start_longitude"] = np.array([p.start_longitude for p in found_products])
        pdf["end_latitude"] = np.array([p.end_latitude for p in found_products])
        pdf["end_longitude"] = np.array([p.end_longitude for p in found_products])
        pdf["url_download_h5"] = np.array([p.url_download_h5 for p in found_products], dtype=str)
        pdf["url_download_hdr"] = np.array([p.url_download_hdr for p in found_products], dtype=str)
        pdf["url_quicklook"] = np.array([p.url_quicklook for p in found_products], dtype=str)
        return pdf
    return None
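
The listing starts by normalizing several inputs with a nested `_to_list` helper. The standalone copy below, renamed `to_list` for illustration, shows what that normalization does as written: a single value of the expected type is wrapped in a list, a list passes through unchanged, and anything else becomes `None`. The examples reuse values from the parameter descriptions.

```python
from typing import Any, Optional, Type


def to_list(value: Any, scalar_type: Type) -> Optional[list]:
    """Standalone copy of the nested `_to_list` helper from the listing."""
    if isinstance(value, scalar_type):
        return [value]   # a single value is wrapped in a list
    elif isinstance(value, list):
        return value     # lists pass through unchanged
    else:
        return None      # anything else, including tuples, becomes None


print(to_list(981, int))           # [981]
print(to_list([1000, 5000], int))  # [1000, 5000]
print(to_list((1000, 5000), int))  # None
```

Read this way, `orbit_number`, `frame_id`, `orbit_and_frame`, and `timestamps` should be given as a single value or a list; a tuple would be replaced by `None` before it reaches `parse_search_inputs`.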