Deleting a recently opened netCDF4 #9410
Thanks for opening your first issue here at xarray! Be sure to follow the issue template! |
Could you make an MCVE to copy & paste, using the context manager? |
Do you mean copying the contents of the file or the file itself? |
The file should be created inline. Thanks! |
I am a bit lost here. What I am trying to do doesn't seem to be related to the creation of the file. There are two dimensions in the dataset, and I am trying to slice a portion from ds as in the code below, after which I have no use for the original file. I need to delete it because it is big. An MCVE of something I did would look like:

import xarray
import os

with xarray.open_dataset(filePath) as ds:
    cropped_ds = ds.sel(x=slice(x1, x2), y=slice(y1, y2))  # x and y are the dimensions in the dataset
os.remove(filePath)

Assuming the problem was caused by the processing that happened in between, I replaced it with just a print statement:

import xarray
import os

with xarray.open_dataset(filePath) as ds:
    print(ds)
os.remove(filePath)

However, the problem persisted. I hope I was able to give what you asked for in this comment. Please tell me if you need any other info. |
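A quick way to check whether the Python process still holds a handle on the netCDF file at the point where os.remove fails might look like the sketch below. This is only an illustration and assumes psutil is installed; the ".nc" filter is an assumption, not something from the thread:

import os
import psutil

# List every regular file the current process still has open; if the
# netCDF file appears here after the `with` block has exited, deleting
# it will typically fail on Windows with a PermissionError.
proc = psutil.Process(os.getpid())
print([f.path for f in proc.open_files() if f.path.endswith(".nc")])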
Sorry if I'm being unclear. Have a look at the docs for an MCVE in the issue template. The example should be able to be copy-pasted into a new python prompt. |
The issue is that we don't have access to your file (nor should we). Instead, what we're looking for is a dummy dataset that you can create and save to disk, so that we can reproduce your issue that way. For example:

import numpy as np
import xarray as xr

filepath = ...
ds = xr.Dataset(
    {"a": (["x", "y"], np.ones(shape=(10, 12), dtype="float64"))},
    coords={"x": range(10), "y": range(12)},
)
ds.to_netcdf(filepath)
... # code to reproduce your issue (you might have to adapt the dummy dataset to actually reproduce your issue, this is just an example) |
So, I was trying to write the MCVE for the issue I was facing. The code looks something like this:

import xarray as xr
import numpy as np
import os

# Create latitude and longitude arrays
lat = np.arange(-90, 90, 0.01)
lon = np.arange(-180, 180, 0.01)

# Create a 2D array for temperature, here using a simple sine function for variation
temperature = np.sin(np.sqrt(lat[:, np.newaxis]**2 + lon[np.newaxis, :]**2))

# Create an xarray Dataset
ds = xr.Dataset(
    {
        "TEMPERATURE": (["lat", "lon"], temperature)
    },
    coords={
        "lat": lat,
        "lon": lon
    }
)

# Save the created dataset to disk
ds.to_netcdf("sample.nc")

with xr.open_dataset("sample.nc") as ds:
    cropped_ds = ds.sel(lon=slice(-95, -94), lat=slice(30, 28))
os.remove("sample.nc")

I can delete the file. But when I try to do the same for the data I am working on, it throws an error. Thus, I am adding the link to the file, which is open-source data downloaded from Copernicus Land Services. The following is the code that gave the error:

filePath = r"c_gls_LAI300-RT0_201712310000_GLOBE_PROBAV_V1.0.1.nc"
with xr.open_dataset(filePath) as ds:
    print(ds)
os.remove("c_gls_LAI300-RT0_201712310000_GLOBE_PROBAV_V1.0.1.nc")
Some things that I have found:
filePath=r"c_gls_LAI300-RT0_201712310000_GLOBE_PROBAV_V1.0.1.nc"
with xr.open_dataset(filePath, engine="netcdf4") as ds:
print(ds)
ds.to_netcdf("sample2.nc")
The dataset I created at the very beginning also requires about 5GB of space and the code executed without any issues. If I don't specify the engine, it issues an error message saying 22 GB of space was required.
filePath=r"c_gls_LAI300-RT0_201712310000_GLOBE_PROBAV_V1.0.1.nc"
with xr.open_dataset(filePath) as ds:
cropped_ds=ds.sel(lon=slice(-95,-94), lat=slice(30,28))
cropped_ds.to_netcdf("sample3.nc")
with xr.open_dataset("sample3.nc") as ds:
print(ds)
os.remove("sample3.nc") Do tell me if you require further info. |
That is quite surprising! Without some repro that doesn't involve downloading 1.5G of data, it's unlikely to get much traction. Does making a smaller-but-not-tiny file — say 150MB — trigger the error? |
The size is not causing the problem. I tried creating both small and large files (the large one about 5 GB), and I could read and delete them without any issues. I even cropped the data for a particular region from the above file and saved it to a separate file; I could read and delete that as well. I can't think of a way to reproduce the same error here. |
OK so to confirm: this code fails for this specific file, but we can't find any other file where the problem occurs?
V surprising if so! Again my guess is that it's too specific a problem to get traction, but we can reopen if there's a more reproducible case... |
As a part of my task, I had to download, read, process, and finally delete the netCDF files after a certain number of them had been read, due to storage limitations. But even after manually closing the files or using a context manager, I still get an error when I try to delete the file.
I did refer to the previously reported issues #1629 and #2887, but using a context manager or changing the engine through which the netCDF file is read was of no help. Is there any way to work around this?
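A common pattern for this class of problem, assuming the error is a PermissionError raised because lazily loaded data keeps the underlying netCDF handle alive, is to pull the slice you need fully into memory before the file is closed and only then delete it. This is a sketch, not something confirmed in the thread; the path and slice bounds are placeholders:

import os
import xarray as xr

filePath = "downloaded_file.nc"  # placeholder path
with xr.open_dataset(filePath) as ds:
    # .load() reads the selected data into memory, so nothing keeps a
    # reference to the on-disk file once the context manager closes it.
    cropped_ds = ds.sel(lon=slice(-95, -94), lat=slice(30, 28)).load()
os.remove(filePath)
cropped_ds.to_netcdf("cropped.nc")  # the cropped data remains usable afterwards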