Converted netCDF files can be opened with the open_converted function that returns a
lazy-loaded EchoData object
(only metadata are read during opening):
import echopype as ep
file_path = "./converted_files/file.nc" # path to a converted nc file
ed = ep.open_converted(file_path) # create an EchoData object
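Because only metadata are read, the returned object can be inspected without loading the underlying data. For instance (a minimal sketch, assuming the standard Sonar/Beam_group1 group is present; group names may vary by sonar model):
print(ed)  # print the group tree of the EchoData object
beam_ds = ed["Sonar/Beam_group1"]  # lazily access one group as an xarray Dataset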
Likewise, specify the path to open a Zarr dataset. To open such a dataset from cloud storage, use the same
storage_options parameter as with open_raw.
For example:
s3_path = "s3://s3bucketname/directory_path/dataset.zarr" # S3 dataset path
ed = ep.open_converted(s3_path, storage_options={"anon": True})
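For buckets that require credentials, storage_options accepts the same keyword arguments as the underlying fsspec/s3fs filesystem. For example (a sketch with placeholder credential values):
ed = ep.open_converted(
    s3_path,
    storage_options={"key": "YOUR_ACCESS_KEY", "secret": "YOUR_SECRET_KEY"},  # placeholder credentials
)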
Data collected by the same instrument deployment across multiple files can be combined into a single EchoData object using combine_echodata. Since echopype version 0.6.3, a large number of files can be combined in parallel (using Dask) while maintaining stable memory usage. Under the hood, this is done by concatenating data directly into a Zarr store that corresponds to the final combined EchoData object.
To use combine_echodata, the following criteria must be met (a sketch for pre-checking some of them is shown after the note below):

- All EchoData objects must have the same sonar_model.
- The EchoData objects to be combined must correspond to different raw data files (i.e., no duplicated files).
- The EchoData objects in the list must be in sequential order in time. Specifically, the first timestamp of each EchoData object must be smaller (earlier) than the first timestamp of the subsequent EchoData object.
- The EchoData objects must contain the same frequency channels and the same number of channels.
- Attribute names and values must be consistent across all groups of the EchoData objects to be combined (except for date_created or conversion_time, which may differ but should have the same data type).

In previous versions, combine_echodata corrected reversed timestamps and stored the uncorrected timestamps in the Provenance group. Starting from 0.6.3, combine_echodata preserves time coordinates that have reversed timestamps, and no correction is performed.
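The following is a minimal sketch (not part of the echopype API) for pre-checking the sonar_model and time-ordering criteria; it assumes the ping_time coordinate lives in the Sonar/Beam_group1 group, which may vary by sonar model:
def check_combine_criteria(ed_list):
    # all objects must share the same sonar_model
    models = {ed.sonar_model for ed in ed_list}
    if len(models) > 1:
        raise ValueError(f"Mismatched sonar_model values: {models}")
    # first timestamps must be strictly increasing across objects;
    # assumes ping_time is stored under Sonar/Beam_group1
    first_times = [ed["Sonar/Beam_group1"]["ping_time"].values[0] for ed in ed_list]
    for earlier, later in zip(first_times, first_times[1:]):
        if later <= earlier:
            raise ValueError("EchoData objects are not in sequential time order")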
The first step in combining data is to establish a Dask client with a scheduler. On a local machine, this can be done as follows:
from dask.distributed import Client
client = Client()  # create a client with a local scheduler
With distributed resources, we highly recommend reviewing the Dask documentation for deploying Dask clusters.
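For example, to connect to an already-deployed scheduler (a minimal sketch with a placeholder address):
client = Client("tcp://scheduler-address:8786")  # placeholder scheduler address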
Next, we assemble a list of EchoData objects. This list can be from converted files (netCDF or Zarr)
as in the example below, or from in-memory EchoData objects:
ed_list = []
for converted_file in ["convertedfile1.zarr", "convertedfile2.zarr"]:
ed_list.append(ep.open_converted(converted_file)) # already converted files are lazy-loaded
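When assembling the list from files on disk, sorting the paths can help satisfy the sequential-time criterion, provided the file names sort chronologically (a sketch assuming a hypothetical directory of converted Zarr files):
import glob
ed_list = [
    ep.open_converted(f)  # already converted files are lazy-loaded
    for f in sorted(glob.glob("./converted_files/*.zarr"))  # assumed file location
]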
Finally, we apply combine_echodata on this list to combine all the data into a single
EchoData object. Here, we will store the final combined form in the Zarr path
path_to/combined_echodata.zarr and use the client we established above:
combined_ed = ep.combine_echodata(
ed_list,
zarr_path='path_to/combined_echodata.zarr',
client=client
)
Once executed, combine_echodata returns a lazy-loaded EchoData object (obtained from zarr_path) with all data from the input EchoData objects combined.
As shown in the above example, the path of the combined Zarr store is given by the keyword argument
zarr_path,
and the Dask client that parallel tasks will be submitted to is given by the keyword argument
client.
When either (or both) of these are not provided, the default values listed in the Notes section of combine_echodata will be used.
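For example, a minimal call relying on those defaults:
combined_ed = ep.combine_echodata(ed_list)  # zarr_path and client fall back to the documented defaults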