Excessive memory usage when printing multi-file Dataset #1481
Hi @hadfieldnz -- I believe this issue could be related to #1396, which was fixed in dask/dask#2364. Could you let us know what versions of xarray and dask you are using?

import xarray
import dask
print(xarray.__version__)
print(dask.__version__)
xarray 0.9.6
dask 0.14.3
dask 0.14.3 pre-dates the fix dask/dask#2364 mentioned above: can you try updating dask?
I ran "conda update dask", which upgraded me from 0.14.3 to 0.15.0. Short report: No this has not eliminated the problem. Long report: Today (Friday) I am on my home machine, which has only 6 GiB RAM. I confirmed earlier today with dask 0.14.3 that I can open and print the dataset with 25 files. And with 10 files IPython halts with a memory error reporting that 85% of the memory is being used. After the upgrade to 0.15.0, running the test script with 10 files, it exhausted all the RAM on my machine and locked it up within a few seconds. I will not be able to investigate this further until I get back on my work machine on Monday. |
Can you try calling open_mfdataset with decode_cf=False?
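A minimal sketch of that suggestion, assuming the xarray 0.9-era open_mfdataset API; the file pattern and concatenation dimension are taken from the original report and are assumptions here:

```python
import xarray

# Open the multi-file dataset without CF decoding, so scale_factor/add_offset
# and time units are left unapplied; concat_dim is an assumption.
ds = xarray.open_mfdataset('roms_avg_*.nc', concat_dim='ocean_time',
                           decode_cf=False)
print(ds)
```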
Our formatting logic pulls out the first few values of arrays to print them in the repr. It appears that this is failing spectacularly in this case, though I'm not sure why. Can you share a quick preview of what a single one of your constituent netCDF files looks like? More broadly: maybe we should disable automatically printing a preview of the contents of dask-backed arrays.
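One low-cost way to produce such a preview, assuming the single-file name given later in the thread:

```python
import xarray

# Open one constituent file on its own; only metadata and a few values
# are read when the summary is printed.
single = xarray.open_dataset('roms_avg_0001.nc')
print(single)   # dimensions, coordinates, variables (with short value previews) and attributes
single.close()
```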
Back at work and able to check things out more thoroughly on a machine with more RAM...

A good number of files to trigger the problem is 10. As reported before, upgrading dask from 0.14.3 to 0.15.0 did not fix the problem. It seemed to speed up the handling of multi-file datasets generally, therefore causing my PC to crash faster when it crashes.

Ryan, calling open_mfdataset with decode_cf=False does allow me to open and print the 10-file dataset, though this still seems to use an uncomfortably large amount of RAM: about 7 GiB in the Python kernel process, vs only a few hundred MiB for the 25-file dataset.

Stephan, although I discovered this problem when dealing with a 25-file sequence, I boiled it down to a test case involving one file opened multiple times before reporting it here. There is a copy of the file (2.27 GiB) in a publicly accessible location here:

ftp://ftp.niwa.co.nz/incoming/hadfield/roms_avg_0001.nc

and here is the output of ncdump -h:

netcdf roms_avg_0001 {
// global attributes:
In response to your comment, Stephan: speaking rather selfishly, as someone who is quite good at finding bugs in scientific software but not much use in fixing them, my worry is that the bugs no longer uncovered by printing the dataset preview would come back to bite me some other way.
@hadfieldnz - I think this was just fixed in #1532. Keep an eye out for the 0.10 release. Feel free to reopen if you feel there's more to do here.
I have a dataset comprising 25 output files from the ROMS ocean model. They are netCDF files ("averages" files in ROMS jargon) containing a number of variables, but most of the storage is devoted to a few time-varying oceanographic variables, either 2D or 3D in space. I have post-processed the files by packing the oceanographic variables to int32 form using the netCDF add_offset and scale_factor attributes. Each file has 100 records in the unlimited dimension (ocean_time) so the complete dataset has 2500 records. The 25 files total 56.8 GiB so would expand to roughly 230 GiB in float64 form.
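For context, the CF packing convention referred to here stores each value as a small integer and reconstructs the physical value as stored * scale_factor + add_offset when the file is decoded; a minimal illustration with made-up numbers:

```python
import numpy as np

# Hypothetical packing parameters; the real files define their own per variable.
scale_factor = 0.001
add_offset = 10.0

packed = np.array([12345, 12350, 12360], dtype=np.int16)   # as stored on disk
unpacked = packed * scale_factor + add_offset               # float64 after decoding
print(unpacked)   # approximately [22.345, 22.35, 22.36]
```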
I open the 25 files with xarray.open_mfdataset, concatenating along the unlimited dimension. This takes a few seconds. I then print() the resulting xarray.Dataset. This takes a few seconds more. All good so far.
But when I vary the number of these files, n, that I include in my xarray.Dataset, I get surprising and inconvenient results. All works as expected in reasonable time with n <= 8 and with n >= 19. But with 9 <= n <= 18, the interpreter that's processing the code (pythonw.exe via IPython) consumes steadily more memory until the 12-14 GiB that's available on my machine is exhausted.
The attached script exposes the problem. In this case the file sequence consists of one file name repeated n times. The value of n currently hard-coded into the script is 10. With this value, the final statement in the script--printing the dataset--will exhaust the memory on my PC in about 10 seconds, if I fail to kill the process first.
I have put a copy of the ROMS output file here:
ftp://ftp.niwa.co.nz/incoming/hadfield/roms_avg_0001.nc
mgh_example_test_mfdataset.py.txt:
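A minimal sketch of a script along these lines, based on the description above; the local filename and the concatenation dimension used here are assumptions:

```python
# Sketch of a test script along the lines of mgh_example_test_mfdataset.py
import xarray

n = 10   # 9 <= n <= 18 triggers the problem on the reporter's machine
file_list = ['roms_avg_0001.nc'] * n   # one file name repeated n times

# Concatenate along the unlimited time dimension (takes a few seconds).
ds = xarray.open_mfdataset(file_list, concat_dim='ocean_time')

# Printing the dataset is the step that exhausts memory.
print(ds)
```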