I am using xarray to read very large NetCDF files (~47 GB).
Then I need to write the data to a Postgres DB. I have tried parsing the array and issuing an INSERT for every row, but this is taking a very long time (weeks).
I have read that a bulk insert would be a lot faster, so I am looking for a solution along those lines.
I also saw that pandas has a DataFrame.to_sql() function and xarray has a Dataset.to_dataframe() function, so I was trying out this approach. However, when trying to convert my xarray Dataset to a pandas DataFrame, I ran out of memory quickly.
Is this expected behavior? If so, can you suggest another solution to this problem?
Python version: 3
xarray version: 0.9.6
I don't have that much experience here, so I'll let others chime in. You could try chunking to pandas and then to Postgres (but you'll always be limited by memory with pandas). If there's a NetCDF -> tabular connector, that would allow you to operate beyond memory.
> Then I need to write the data to a Postgres DB. I have tried parsing the array and issuing an INSERT for every row, but this is taking a very long time (weeks).
I'm no particular expert on Postgres, but I suspect it does indeed have some sort of bulk-insert facility.
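For reference, Postgres's COPY command is its bulk-load path, and psycopg2 exposes it through copy_expert() / copy_from(). A minimal sketch, assuming psycopg2 is installed; the connection string, table name, and columns below are placeholders, not anything from this issue:

```python
import io

import psycopg2

# Hypothetical connection string; replace with your own database credentials.
conn = psycopg2.connect("dbname=mydb user=me")

with conn, conn.cursor() as cur:
    # An in-memory CSV chunk; in practice, stream rows generated from the Dataset.
    buf = io.StringIO("2017-01-01,10.5\n2017-01-02,11.2\n")
    # COPY loads all rows in one round trip, which is far faster than per-row INSERTs.
    cur.copy_expert(
        "COPY my_table (time, value) FROM STDIN WITH (FORMAT csv)",  # hypothetical table/columns
        buf,
    )
```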
> However, when trying to convert my xarray Dataset to a pandas DataFrame, I ran out of memory quickly.
If you're working with a 47 GB netCDF file, you probably don't have a lot of memory to spare. pandas.DataFrame objects can often use significantly more memory than xarray.Dataset, especially keeping in mind that an xarray Dataset can lazily reference data on disk while a DataFrame is always in memory. The best strategy is probably to slice the Dataset into small pieces and convert those individually.
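Something along these lines might work; a rough sketch, assuming there is a "time" dimension to slice along and that SQLAlchemy is available for the connection (the file path, engine URL, table name, and chunk size are all placeholders to adapt):

```python
import sqlalchemy
import xarray as xr

# Placeholder connection URL and file path.
engine = sqlalchemy.create_engine("postgresql://user:password@localhost/mydb")
ds = xr.open_dataset("huge_file.nc")  # opens lazily; data stays on disk until sliced

step = 1000  # steps along "time" per chunk; tune so each chunk fits in memory
for start in range(0, ds.dims["time"], step):
    # Load one small slice, flatten it to a tidy DataFrame, and append it to the table.
    chunk = ds.isel(time=slice(start, start + step))
    df = chunk.to_dataframe().reset_index()
    df.to_sql("my_table", engine, if_exists="append", index=False)
```

Only one chunk's worth of rows is ever in memory at a time, and to_sql() creates the table on the first append; if the INSERTs it issues turn out to be the bottleneck, the COPY-based approach sketched above could be used per chunk instead.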