BUG: Memory leak when setting Series value via __setitem__ #47172
Comments
Before 1.4, setitem probably wrote into the existing array; since 1.4 it always makes a copy. Could you try this without chained indexing?
I think so too (maybe related: #43406?)
Could you elaborate on this? The following variants also have the same problem (but I think it is to be expected): Variant 1:
Variant 2:
If the series are placed in a list (instead of using a dataframe), the memory usage also increases, but the increment is much smaller. I guess the former copied the whole dataframe in each iteration, while this one copies only the selected series. (This example shows increasing memory usage in both
If a single series
I meant using loc, e.g.
The internal cache is not updated correctly when using chained indexing (since 1.4). I think we already have similar issues about that; could you check?
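A minimal sketch of the non-chained form suggested above. The DataFrame `df` and its column names are hypothetical and are not taken from the original reproducer. The idea is that `DataFrame.loc` sets the value in a single indexing operation, so no intermediate Series is selected first:

```python
import pandas as pd

# Hypothetical data; the original reproducer is not shown in this thread.
df = pd.DataFrame({"a": [0.0, 0.0, 0.0], "b": [1.0, 2.0, 3.0]})

# Chained indexing would look like df["a"][0] = 42.0: it first selects
# the column as an intermediate Series, then sets a value on that Series.

# Single-step assignment instead: row label 0, column "a".
df.loc[0, "a"] = 42.0

print(df)
```

Only the `.loc` form is executed here, since the behavior of chained assignment differs between pandas versions.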
Yes, both:
and
work fine. Actually, I used the solution you suggested to solve my issue before opening this bug report. I reported this as a bug only because I think that changing the values of the dataframe this way, even if it makes a copy, should not keep increasing the memory usage (it seems like references are stuck somewhere).
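One way to check whether repeated assignments keep accumulating memory is to sample traced allocations on each iteration. The following is only a sketch and not the reproducer from this report: the DataFrame, its size, and the loop count are assumptions, and `tracemalloc` sees only allocations reported to Python's tracing hooks, so process-level memory may behave differently.

```python
import tracemalloc

import pandas as pd

# Hypothetical data; sizes and names are illustrative only.
df = pd.DataFrame({"a": [0.0] * 1000, "b": [0.0] * 1000})

tracemalloc.start()
samples = []
for i in range(50):
    # Single-step assignment via .loc (no chained indexing).
    df.loc[i, "a"] = float(i)
    current, _peak = tracemalloc.get_traced_memory()
    samples.append(current)
tracemalloc.stop()

# Difference between the last and first sample of traced memory.
growth = samples[-1] - samples[0]
print(f"traced growth over {len(samples)} iterations: {growth} bytes")
```

If the traced size climbs steadily with the iteration count rather than levelling off, that would be consistent with references being retained somewhere.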
I'm sorry, I'm afraid I don't know enough about pandas to recognize which issues are really related. Another strange thing, if
Several issues point back to #43406. The cache issue you may have been referring to is #45684, which should be fixed.
Moving to 1.4.4.
Closed in #48215.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
In pandas version 1.4.2 the memory usage increases at each iteration of the last for loop. Output:
Expected Behavior
In pandas version 1.3.5 the memory usage remains constant. Output:
Installed Versions