
Conversation


@lantiga lantiga commented Feb 15, 2020

This PR reopens #241, which was merged and then reverted.

Issues not covered by tests emerged and need to be resolved before we merge again.
Details will follow here.

Add support for inspection and eviction to queue

Mock run info batching

Mock run info batching

Make TF tests work

Add batching for ONNX and ONNX-ML

Fix torch API, still WIP

Fix torch backend

Fixes after rebasing

Add auto-batching to TFLite backend

Fix from rebase

Add batching args to command and change API accordingly

Add batching heuristics [WIP]

Fix TFLite test by accessing first tensor in first batch safely

Temporarily comment out wrong_bg test check

Implement batching heuristics

Introduce autobatch tests, tflite still fails

Fix segfault when error was generated from the backend

Fix tflite autobatch test

Updated documentation with auto batching

Remove stale comments

Avoid making extra copies of inputs and outputs when batch count is 1

Address review comments re const-correctness

Add tests to detect failures

Fix slicing and concatenation

Fix tensor slicing and concatenating

Temporarily disable tflite autobatch test due to tflite limitation

Disable support for autobatching for TFLITE
@codecov

codecov bot commented Mar 8, 2020

Codecov Report

Merging #270 into master will increase coverage by 2.51%.
The diff coverage is 88.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #270      +/-   ##
==========================================
+ Coverage   53.28%   55.79%   +2.51%     
==========================================
  Files          25       25              
  Lines        4634     5022     +388     
==========================================
+ Hits         2469     2802     +333     
- Misses       2165     2220      +55
Impacted Files Coverage Δ
src/redisai.h 0% <0%> (ø) ⬆️
src/err.c 91.66% <100%> (+0.36%) ⬆️
src/backends.c 62.8% <100%> (ø) ⬆️
src/backends/tflite.c 72.07% <56.52%> (-13.83%) ⬇️
src/backends/onnxruntime.c 69.1% <88.76%> (+4.02%) ⬆️
src/model.c 66.66% <90.16%> (+3.44%) ⬆️
src/redisai.c 76.57% <92%> (+2.05%) ⬆️
src/backends/tensorflow.c 70.68% <95%> (+2.93%) ⬆️
src/backends/torch.c 84.66% <95.12%> (+2.02%) ⬆️
src/tensor.c 80.84% <95.91%> (+2.05%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62b2fa2...04073b5. Read the comment docs.


lantiga commented Mar 8, 2020

Todo:


lantiga commented Mar 14, 2020

Valgrind looks legit; I can't reproduce the crash anymore.
@filipecosta90 ready to review and merge


lantiga commented Mar 20, 2020

@filipecosta90 @hhsecond ready for review/approval

Sherin Thomas and others added 2 commits March 28, 2020 02:45
* test cases for crash test

* Fix issue with evict. Port test to multiprocessing to allow killing pending command.

* Use terminate instead of kill

Co-authored-by: Luca Antiga <[email protected]>

lantiga commented Mar 28, 2020

I fixed one last (embarrassing) bug in queue eviction (#310).
@hhsecond @filipecosta90 I'll be waiting for a last check before merging, let's try to make it happen as soon as we can.


@hhsecond hhsecond left a comment


looks good to me

@lantiga lantiga merged commit 8752589 into master Mar 30, 2020
lantiga added a commit that referenced this pull request May 6, 2020
 
Add support for batching (take two) (#270)

* Add support for automated batching

[commit list identical to the one at the top of this PR description]

* Fix TFLite and tests after rebase

* Temporarily disable macos CI build

* Add synchronization to autobatch tests

* Add synchronization to autobatch thread

* Add synchronization to autobatch thread

* Batching crashtest (#310)

* test cases for crash test

* Fix issue with evict. Port test to multiprocessing to allow killing pending command.

* Use terminate instead of kill

Co-authored-by: Luca Antiga <[email protected]>

Co-authored-by: Sherin Thomas <[email protected]>
status = ort->GetAllocatorWithDefaultOptions(&allocator);
if (status != NULL) {
goto error;
return NULL;


Is there a reason the returned status is ignored here, but on line 437 it isn't?

@gkorland gkorland deleted the batching branch October 6, 2020 08:57