Working with converted files

Open a converted netCDF or Zarr dataset

Converted netCDF files can be opened with the open_converted function that returns a lazy-loaded EchoData object (only metadata are read during opening):

import echopype as ep
file_path = "./converted_files/file.nc"      # path to a converted nc file
ed = ep.open_converted(file_path)            # create an EchoData object

Likewise, specify the path to open a Zarr dataset. To open such a dataset from cloud storage, use the same storage_options parameter as with open_raw. For example:

s3_path = "s3://s3bucketname/directory_path/dataset.zarr"     # S3 dataset path
ed = ep.open_converted(s3_path, storage_options={"anon": True})

Combine EchoData objects

Data collected by the same instrument deployment across multiple files can be combined into a single EchoData object using combine_echodata. Since echopype version 0.6.3, a large number of files can be combined in parallel (using Dask) while maintaining stable memory usage. Under the hood, this is done by concatenating data directly into a Zarr store, which backs the final combined EchoData object.
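The memory-stable strategy can be illustrated without Dask or Zarr: rather than loading every file and concatenating in memory, each input is streamed into a single on-disk store one file at a time. This is a simplified, hypothetical sketch (using plain JSON files as stand-ins), not echopype's actual implementation, which concatenates into a Zarr store in parallel:

```python
import json
import os
import tempfile

def combine_to_store(input_files, store_path):
    """Append each file's records to a single on-disk store, so peak
    memory stays at roughly one input file's worth of data."""
    with open(store_path, "w") as store:
        for path in input_files:
            with open(path) as f:
                records = json.load(f)  # only one file in memory at a time
            for rec in records:
                store.write(json.dumps(rec) + "\n")

# toy usage: two small "converted files", three records each
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp, f"file{i}.json")
    with open(p, "w") as f:
        json.dump([{"file": i, "ping": j} for j in range(3)], f)
    paths.append(p)

combined_path = os.path.join(tmp, "combined.jsonl")
combine_to_store(paths, combined_path)
with open(combined_path) as f:
    n_records = sum(1 for _ in f)
print(n_records)  # 6
```

The same principle applies at scale: because each worker writes its piece directly into the shared store, total memory use does not grow with the number of input files.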

To use combine_echodata, the following criteria must be met:

In previous versions, combine_echodata corrected reversed timestamps and stored the uncorrected timestamps in the Provenance group. Starting from version 0.6.3, combine_echodata preserves time coordinates that contain reversed timestamps, and no correction is performed.
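Whether a time coordinate contains reversed timestamps can be checked by testing for monotonicity. A minimal sketch using NumPy (the variable name and values are illustrative):

```python
import numpy as np

# a toy time coordinate in which the last timestamp goes backwards
ping_time = np.array(
    ["2023-01-01T00:00:00", "2023-01-01T00:00:02", "2023-01-01T00:00:01"],
    dtype="datetime64[s]",
)

# the coordinate has reversed timestamps if any step is negative
has_reversal = bool(np.any(np.diff(ping_time) < np.timedelta64(0, "s")))
print(has_reversal)  # True: the last timestamp precedes the one before it
```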

The first step in combining data is to establish a Dask client with a scheduler. On a local machine, this can be done as follows:

from dask.distributed import Client

client = Client()  # create a client with a local scheduler

When working with distributed resources, we highly recommend reviewing the Dask documentation on deploying Dask clusters.

Next, we assemble a list of EchoData objects. This list can be from converted files (netCDF or Zarr) as in the example below, or from in-memory EchoData objects:

ed_list = []
for converted_file in ["convertedfile1.zarr", "convertedfile2.zarr"]:
    ed_list.append(ep.open_converted(converted_file))  # already converted files are lazy-loaded

Finally, we apply combine_echodata to this list to combine all the data into a single EchoData object. Here, we store the combined data at the Zarr path path_to/combined_echodata.zarr and use the client we established above:

combined_ed = ep.combine_echodata(
    ed_list, 
    zarr_path='path_to/combined_echodata.zarr', 
    client=client
)

Once executed, combine_echodata returns a lazy-loaded EchoData object (obtained from zarr_path) with all data from the input EchoData objects combined.

As shown in the above example, the path of the combined Zarr store is given by the keyword argument zarr_path, and the Dask client that parallel tasks will be submitted to is given by the keyword argument client. When either (or both) of these are not provided, default values listed in the Notes section in combine_echodata will be used.
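The fallback behavior can be pictured as a simple default-filling pattern. This is purely illustrative (the function and placeholder values below are hypothetical, not echopype internals); consult the Notes section of combine_echodata for the actual defaults:

```python
import os
import tempfile

def resolve_combine_args(zarr_path=None, client=None):
    """Hypothetical sketch: fill in defaults for omitted arguments."""
    if zarr_path is None:
        # fall back to a store inside a fresh temporary directory
        zarr_path = os.path.join(tempfile.mkdtemp(), "combined_echodata.zarr")
    if client is None:
        # placeholder standing in for a locally created Dask client
        client = "local-client"
    return zarr_path, client

path, cli = resolve_combine_args()
print(path.endswith("combined_echodata.zarr"), cli)  # True local-client
```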