Skip to content

Make RepositoryItemWriter use CrudRepository#saveAll by default #3720

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
fmbenhassine opened this issue May 28, 2020 · 6 comments
Closed

Make RepositoryItemWriter use CrudRepository#saveAll by default #3720

fmbenhassine opened this issue May 28, 2020 · 6 comments

Comments

@fmbenhassine
Copy link
Contributor

fmbenhassine commented May 28, 2020

As of v4.2.2, the javadocs of RepositoryItemWriter state that the performance of the writer is determined by the performance of CrudRepository#saveAll. However, the implementation does not call CrudRepository#saveAll, but uses a for loop in which the selected method is performed for each item.

This means that even if I want to use the saveAll method by setting setMethodName("saveAll"), the saveAll method will be called for each item and not only once for all items. To use the saveAll method, one needs to extend the writer and override doWrite which is not convenient. Moreover, there is no validation that a method name is provided currently.

I understand that the motivation behind this "methodName" parameter is to allows users to select any method that might take a single item as a parameter and not a list (like update(item) or save(item), etc), but I think it would be better to default to using saveAll. In fact, the whole intent of the ItemWriter concept in the first place is bulk updates, by making it operate on a list of items by design.

Using saveAll instead of save is 2x faster according to my first benchmark. I believe the cost of creating a MethodInvoker and calling the method via reflection + not using a bulk operation is the root of this performance penalty.

My suggestion here is to use saveAll by default (this will be consistent with Javadoc, which is not the case at the moment) and use the current behaviour if a method name is provided. Note that the RepositoryItemWriterBuilder enforces that a method name is provided (which is a good point but not consistent with the writer that does not perform any validation), but this constraint can be relaxed to make things consistent and to benefit from the performance boost by default.

@fmbenhassine fmbenhassine added this to the 4.3.0 milestone May 28, 2020
@fmbenhassine fmbenhassine self-assigned this May 28, 2020
@fmbenhassine fmbenhassine removed the status: waiting-for-triage Issues that we did not analyse yet label May 28, 2020
@parikshitdutta
Copy link
Contributor

Hi @benas, would you want me to look into it?

fmbenhassine added a commit to fmbenhassine/spring-batch that referenced this issue May 28, 2020
@fmbenhassine
Copy link
Contributor Author

fmbenhassine commented May 28, 2020

@parikshitdutta I have already started working on this (see fc9de63), that's why I assigned it to myself. To avoid duplicate efforts, you can take something else from the 4.3 milestone. No need to ask, you can take anything that is unassigned and has no PR associated with it. Thank you upfront!

@parikshitdutta
Copy link
Contributor

parikshitdutta commented May 29, 2020

Hi @benas , thank for informing, indeed it would help to avoid duplicate effort.
However I generally reconfirm by asking in case priorities has shifted, and separately I don't see a way to assign issues to myself to mark something I am working on.

Since I am putting my out of office hours or weekends, I want to be efficient with things as much possible.

@fmbenhassine fmbenhassine changed the title Make RepositoryItemWriter use saveAll by default Make RepositoryItemWriter use CrudRepository#saveAll by default May 29, 2020
@cfbo
Copy link

cfbo commented Jul 9, 2020

Hi @benas, I am using RepositoryItemWriter and was trying to improve performance. I tried to upgrade to 4.3.0-M1 which contains this change but I did not notice any performance improvement. I then started digging down a bit and - maybe I didn't get the whole thing - but I have to say I am not sure this change is going to make any actual improvement compare to the previous implementation.
You have added the usage of CrudRepository#saveAll in place of the existing for loop. However the default implementation of CrudRepository#saveAll is SimpleJpaRepository and has very similar for loop and that's why I don't think this change does make a difference. I guess it also all depends from the transactional behaviour. The for loop in doWrite executes within a transaction, so everything is committed at the end anyway, just like in CrudRepository#saveAll.
Again, maybe I am missing something. If instead I am right, then it might be better to revert this change I reckon.
Thanks

@fmbenhassine
Copy link
Contributor Author

@cfbo Thank you for your feedback. There are two things:

  • The item writer was creating a MethodInvoker and calling the supplied method via reflection on each item even if the supplied method is saveAll (meaning that saveAll was called multiple times on each item and not once for all items.. and that's for every chunk). The change applied here calls saveAll directly on the repository (not through reflection) only once per chunk with all items as a single parameter, that's a big difference.
  • As mentioned in the Javadoc, the performance of the writer is determined by the performance of the repository itself. Note how the writer works against the interface and not a particular implementation. So if the default SimpleJpaRepository loops over items and calls save, you can use another repository with a smarter implementation that uses bulk updates and you would get better performance.

Moreover, such performance improvements are not very significant on small data sets. You did not mention your input size but you should try with a large number of items (1M+) to see some speed-up.

@cfbo
Copy link

cfbo commented Jul 10, 2020

@benas Thanks for your reply. That makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants