Make RepositoryItemWriter use CrudRepository#saveAll by default #3720
Comments
Hi @benas, would you want me to look into it?
@parikshitdutta I have already started working on this (see fc9de63), that's why I assigned it to myself. To avoid duplicate efforts, you can take something else from the 4.3 milestone. No need to ask, you can take anything that is unassigned and has no PR associated with it. Thank you upfront!
Hi @benas, thanks for letting me know, it does help to avoid duplicate effort. Since I am working on this outside office hours and on weekends, I want to be as efficient as possible.
Hi @benas, I am using RepositoryItemWriter and was trying to improve performance. I tried to upgrade to 4.3.0-M1, which contains this change, but I did not notice any performance improvement. I then started digging a bit and, although I may not have understood the whole thing, I am not sure this change brings any actual improvement compared to the previous implementation.
@cfbo Thank you for your feedback. There are two things:
Moreover, such performance improvements are not very significant on small data sets. You did not mention your input size but you should try with a large number of items (1M+) to see some speed-up.
@benas Thanks for your reply. That makes sense.
As of v4.2.2, the javadocs of `RepositoryItemWriter` state that the performance of the writer is determined by the performance of `CrudRepository#saveAll`. However, the implementation does not call `CrudRepository#saveAll`, but uses a `for` loop in which the selected method is performed for each item.

This means that even if I want to use the `saveAll` method by setting `setMethodName("saveAll")`, the `saveAll` method will be called for each item and not only once for all items. To use the `saveAll` method, one needs to extend the writer and override `doWrite`, which is not convenient. Moreover, there is currently no validation that a method name is provided.

I understand that the motivation behind this "methodName" parameter is to allow users to select any method that might take a single item as a parameter and not a list (like `update(item)` or `save(item)`, etc), but I think it would be better to default to using `saveAll`. In fact, the whole intent of the `ItemWriter` concept in the first place is bulk updates, by making it operate on a list of items by design.

Using `saveAll` instead of `save` is 2x faster according to my first benchmark. I believe the cost of creating a `MethodInvoker` and calling the method via reflection, combined with not using a bulk operation, is the root of this performance penalty.

My suggestion here is to use `saveAll` by default (this will be consistent with the Javadoc, which is not the case at the moment) and use the current behaviour if a method name is provided. Note that the `RepositoryItemWriterBuilder` enforces that a method name is provided (which is a good point but not consistent with the writer, which does not perform any validation), but this constraint can be relaxed to make things consistent and to benefit from the performance boost by default.
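To make the suggestion concrete, here is a minimal sketch of the two behaviours, written without Spring so it runs on its own. Everything in it is an assumption for illustration: `ItemRepository` is a hypothetical stand-in for `CrudRepository`, `CountingRepository` is a toy in-memory implementation, and plain `java.lang.reflect` replaces Spring's `MethodInvoker`. It is not the actual `RepositoryItemWriter` code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class SaveAllSketch {

    /** Hypothetical stand-in for Spring Data's CrudRepository (assumption, not the real interface). */
    interface ItemRepository<T> {
        T save(T item);
        Iterable<T> saveAll(Iterable<T> items);
    }

    /** Toy in-memory repository that counts how many repository calls were made. */
    static class CountingRepository implements ItemRepository<String> {
        public final List<String> stored = new ArrayList<>();
        public int calls = 0;

        @Override
        public String save(String item) {
            calls++;
            stored.add(item);
            return item;
        }

        @Override
        public Iterable<String> saveAll(Iterable<String> items) {
            calls++;
            for (String item : items) {
                stored.add(item);
            }
            return items;
        }
    }

    /**
     * Suggested behaviour: with no method name, make one bulk saveAll call;
     * with a method name, keep the per-item reflective invocation.
     */
    static <T> void write(ItemRepository<T> repository, String methodName, List<T> items)
            throws Exception {
        if (methodName == null) {
            repository.saveAll(items);
            return;
        }
        // Plain reflection stands in for Spring's MethodInvoker (assumption).
        Method method = findSingleArgMethod(repository, methodName);
        for (T item : items) {
            method.invoke(repository, item); // one reflective call per item
        }
    }

    /** Finds a public one-argument method by name; a deliberately naive lookup. */
    private static Method findSingleArgMethod(Object target, String name)
            throws NoSuchMethodException {
        for (Method candidate : target.getClass().getMethods()) {
            if (candidate.getName().equals(name) && candidate.getParameterCount() == 1) {
                return candidate;
            }
        }
        throw new NoSuchMethodException(name);
    }

    public static void main(String[] args) throws Exception {
        List<String> chunk = List.of("a", "b", "c");

        CountingRepository perItem = new CountingRepository();
        write(perItem, "save", chunk);
        System.out.println("methodName=save -> calls: " + perItem.calls);

        CountingRepository bulk = new CountingRepository();
        write(bulk, null, chunk);
        System.out.println("no methodName   -> calls: " + bulk.calls);
    }
}
```

With a chunk of three items, the `"save"` path makes three repository calls and the default path makes one, while both store the same items. The sketch only counts calls; it does not reproduce the 2x figure above, which would need a real repository and a large input.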