Converted netCDF files can be opened with the open_converted function, which returns a lazy-loaded EchoData object (only metadata are read during opening):
import echopype as ep
file_path = "./converted_files/file.nc" # path to a converted nc file
ed = ep.open_converted(file_path) # create an EchoData object
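Because only metadata are read, the returned object can be inspected cheaply before any data are loaded. A minimal sketch (the group path below is illustrative; the groups available depend on the instrument):
print(ed)  # summary of the EchoData group structure
beam = ed["Sonar/Beam_group1"]  # access one group as an xarray Dataset, still lazy-loaded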
Likewise, specify the path to open a Zarr dataset. To open such a dataset from cloud storage, use the same storage_options parameter as with open_raw. For example:
s3_path = "s3://s3bucketname/directory_path/dataset.zarr" # S3 dataset path
ed = ep.open_converted(s3_path, storage_options={"anon": True})
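For a private bucket, credentials can be supplied through storage_options, which is passed to the underlying fsspec/s3fs filesystem. A hedged sketch (the key names follow s3fs conventions; the path and credentials are placeholders):
ed = ep.open_converted(
    "s3://private-bucket/dataset.zarr",  # hypothetical private dataset path
    storage_options={"key": "YOUR_ACCESS_KEY", "secret": "YOUR_SECRET_KEY"},
)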
Data collected by the same instrument deployment across multiple files can be combined into a single EchoData object using combine_echodata. Since echopype version 0.6.3, a large number of files can be combined in parallel (using Dask) while maintaining stable memory usage. Under the hood, this is done by concatenating data directly into a Zarr store that corresponds to the final combined EchoData object.
To use combine_echodata, the following criteria must be met (a minimal pre-check sketch is given after the note below):

- all EchoData objects must have the same sonar_model
- the EchoData objects to be combined must correspond to different raw data files (i.e., no duplicated files)
- the EchoData objects in the list must be in sequential order in time; specifically, the first timestamp of each EchoData object must be smaller (earlier) than the first timestamp of the subsequent EchoData object
- the EchoData objects must contain the same frequency channels and the same number of channels
- attributes must be consistent across all EchoData objects to be combined (except for date_created and conversion_time, which may differ in value but should have the same data type)

In previous versions, combine_echodata corrected reversed timestamps and stored the uncorrected timestamps in the Provenance group. Starting from 0.6.3, combine_echodata preserves time coordinates that contain reversed timestamps and no correction is performed.
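As a rough sanity check against the first and third criteria, here is a hedged pre-check sketch, assuming ed_list is a list of already-opened EchoData objects (assembled as in the example further below) and that the Beam_group1 path applies to your instrument:
# all EchoData objects should share the same sonar_model
assert len({ed.sonar_model for ed in ed_list}) == 1, "sonar_model mismatch"

# first timestamps should be strictly increasing across objects
first_times = [ed["Sonar/Beam_group1"]["ping_time"].values[0] for ed in ed_list]
assert all(t1 < t2 for t1, t2 in zip(first_times, first_times[1:])), "files are not in time order"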
The first step in combining data is to establish a Dask client with a scheduler. On a local machine, this can be done as follows:
from dask.distributed import Client

client = Client()  # create a client with a local scheduler
When working with distributed resources, we highly recommend reviewing the Dask documentation on deploying Dask clusters.
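For example, the client can be pointed at an existing scheduler by address. A minimal sketch (the address below is a placeholder for your own deployment):
client = Client("tcp://scheduler-address:8786")  # connect to a hypothetical remote scheduler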
Next, we assemble a list of EchoData objects. This list can be built from converted files (netCDF or Zarr) as in the example below, or from in-memory EchoData objects:
ed_list = []
for converted_file in ["convertedfile1.zarr", "convertedfile2.zarr"]:
ed_list.append(ep.open_converted(converted_file)) # already converted files are lazy-loaded
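When many files are involved, the list can also be assembled programmatically. A sketch assuming the file names sort chronologically (which is what keeps the time-ordering criterion above satisfied; the glob pattern is illustrative):
from glob import glob

converted_files = sorted(glob("./converted_files/*.zarr"))  # sorted so files remain in time order
ed_list = [ep.open_converted(f) for f in converted_files]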
Finally, we apply combine_echodata on this list to combine all the data into a single EchoData object. Here, we will store the final combined form in the Zarr path path_to/combined_echodata.zarr and use the client we established above:
combined_ed = ep.combine_echodata(
ed_list,
zarr_path='path_to/combined_echodata.zarr',
client=client
)
Once executed, combine_echodata returns a lazy-loaded EchoData object (obtained from zarr_path) with all data from the input EchoData objects combined.
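The combined object can be inspected like any other EchoData object. For instance, assuming the standard group layout, the Provenance group carries information about the source data:
print(combined_ed)  # summary of the combined group structure
prov = combined_ed["Provenance"]  # access the Provenance group as an xarray Dataset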
As shown in the above example, the path of the combined Zarr store is given by the keyword argument zarr_path, and the Dask client to which parallel tasks will be submitted is given by the keyword argument client. When either (or both) of these are not provided, the default values listed in the Notes section of combine_echodata will be used.
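For example, the minimal call below relies entirely on those defaults (see the combine_echodata Notes for what the defaults resolve to in your version):
combined_ed = ep.combine_echodata(ed_list)  # zarr_path and client fall back to their defaults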