Replicate results of run commands instead of verbatim #157

lantiga · 2019-07-06T23:08:42Z

Addresses #84.
Replicating MODELRUN and SCRIPTRUN commands no longer re-executes the computation on the replicas; it only sets the outputs on the replicas. Tests now also run with replicas.
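To make the difference concrete, here is a minimal toy sketch of the idea, not the RedisAI implementation. Plain Python objects stand in for the master and replica, and every name in it (Node, run_model, modelrun_replicating_outputs) is hypothetical.

```python
# Toy model only: Node objects stand in for the master and replica keyspaces.
# None of these names come from RedisAI.

def run_model(a, b):
    """Stand-in for an expensive model run (element-wise multiply)."""
    return [x * y for x, y in zip(a, b)]


class Node:
    def __init__(self):
        self.keys = {}
        self.model_runs = 0  # counts how often the expensive computation ran here

    def tensorset(self, key, values):
        self.keys[key] = list(values)

    def modelrun(self, in_a, in_b, out):
        self.model_runs += 1
        self.keys[out] = run_model(self.keys[in_a], self.keys[in_b])


def modelrun_replicating_outputs(master, replica, in_a, in_b, out):
    """Run on the master, then ship only the resulting tensor to the replica."""
    master.modelrun(in_a, in_b, out)
    replica.tensorset(out, master.keys[out])  # the replica never runs the model


master, replica = Node(), Node()
for node in (master, replica):
    node.tensorset("a", [2, 3, 2, 3])
    node.tensorset("b", [2, 3, 2, 3])

modelrun_replicating_outputs(master, replica, "a", "b", "c")
print(replica.keys["c"], replica.model_runs)
```

In this sketch the replica ends up with the output under c while its run counter stays at zero, which is the behaviour this PR aims for.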

lantiga · 2019-07-06T23:25:56Z

Tests pass locally (macOS), but CI fails with the message

redis.exceptions.ResponseError: cannot get tensor from empty key

It appears that the replica doesn't contain the replicated output as it should.

Does this ring a bell, at the RLTest level or elsewhere? @MeirShpilraien @mnunberg @rafie

lantiga · 2019-07-09T16:15:06Z

I could replicate the issue locally. It doesn't always fail, but it fails more often than not.

lantiga · 2019-07-10T12:40:03Z

Update on this. Forgetting about RLTest for a second, here's a breakdown of the issue.

First, check out the branch and build. Then start a master and a replica (make sure there's no stale RDB file lingering around):

rm -f dump.rdb
redis-server --loadmodule build/redisai.so

and in another terminal

redis-server --port 7777 --replicaof 127.0.0.1 6379 --loadmodule build/redisai.so

Now run

import redis

# Connect to the master
r = redis.Redis(host='localhost', port=6379)

model_filename = 'test/test_data/graph.pb'

with open(model_filename, 'rb') as f:
    model_pb = f.read()

# Store the TensorFlow graph under key 'm'
ret = r.execute_command('AI.MODELSET', 'm', 'TF', 'CPU', 'INPUTS', 'a', 'b', 'OUTPUTS', 'mul', model_pb)

# Set the two input tensors
r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

# Run the model on the master, writing the output to key 'c'
r.execute_command('AI.MODELRUN', 'm', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')

# The output is readable on the master
ret = r.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)

# Connect to the replica and read the same key
r2 = redis.Redis(host='localhost', port=7777)

ret = r2.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)

The script will fail at the last TENSORGET with

redis.exceptions.ResponseError: cannot get tensor from empty key

because there's no c on the replica.

The replica complains with

== CRITICAL == This replica is sending an error to its master: 'MULTI calls can not be nested' after processing the command 'multi'

In fact, logging from Redis internals, it looks like the master initiates replication with the correct command arguments, but it ends up sending only MULTI and not the rest of the command.
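One plausible reading of that log line can be sketched with a toy replica. This is an illustration, not Redis code: the ToyReplica class and its methods are hypothetical, and it assumes the replica receives MULTI with no matching EXEC and later receives MULTI again. In that situation, anything sent after the first MULTI is queued rather than applied.

```python
# Toy model of a replica applying a command stream. Illustrative only:
# this is not Redis code, and the class and method names are hypothetical.

class ToyReplica:
    def __init__(self):
        self.keys = {}
        self.in_multi = False
        self.queued = []
        self.errors = []

    def apply(self, *cmd):
        name = cmd[0].upper()
        if name == "MULTI":
            if self.in_multi:
                self.errors.append("MULTI calls can not be nested")
            self.in_multi = True
        elif name == "EXEC":
            for queued_cmd in self.queued:
                self._execute(queued_cmd)
            self.queued.clear()
            self.in_multi = False
        elif self.in_multi:
            self.queued.append(cmd)  # queued, not applied, until EXEC arrives
        else:
            self._execute(cmd)

    def _execute(self, cmd):
        if cmd[0].upper() == "SET":
            self.keys[cmd[1]] = cmd[2]


replica = ToyReplica()
replica.apply("SET", "a", "1")  # applied immediately
replica.apply("MULTI")          # opened, but the EXEC never arrives
replica.apply("SET", "d", "2")  # queued behind the open MULTI
replica.apply("MULTI")          # a second MULTI triggers the nesting error
print(replica.keys, replica.errors)
```

Under these assumptions the toy replica reports the nesting error and never applies the write to d, because it is still waiting for an EXEC.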

If we repeat the above with SCRIPTRUN,

import redis

r = redis.Redis(host='localhost', port=6379)

script_filename = 'test/test_data/script.txt'

with open(script_filename, 'rb') as f:
    script = f.read()

r.execute_command('AI.SCRIPTSET', 'ket', 'CPU', script)

r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

r.execute_command('AI.SCRIPTRUN', 'ket', 'bar', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')

ret = r.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)

# Read the same key from the replica
r2 = redis.Redis(host='localhost', port=7777)

ret = r2.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)

There's no error and the replica contains the right value.

The difference is that MODELRUN is blocking, and we call Replicate(...) upon unblocking, from the callback we pass to the BlockClient function.

Even worse than this, it looks like replication is somewhat stuck. If we set another tensor at a new key d after MODELRUN and try to get it from the replica, we still can't find it.

import redis

r = redis.Redis(host='localhost', port=6379)

model_filename = 'test/test_data/graph.pb'

with open(model_filename, 'rb') as f:
    model_pb = f.read()

ret = r.execute_command('AI.MODELSET', 'm', 'TF', 'CPU', 'INPUTS', 'a', 'b', 'OUTPUTS', 'mul', model_pb)

r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

r.execute_command('AI.MODELRUN', 'm', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')
r.execute_command('AI.TENSORSET', 'd', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

ret = r.execute_command('AI.TENSORGET', 'd', 'VALUES')
print(ret)

r2 = redis.Redis(host='localhost', port=7777)

ret = r2.execute_command('AI.TENSORGET', 'd', 'VALUES')
print(ret)

fails with

redis.exceptions.ResponseError: cannot get tensor from empty key
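Since Redis replication is asynchronous, a read on the replica issued right after a write on the master can in principle race with it. To rule that out and confirm the key really never arrives, the replica read could be wrapped in a small polling helper. This is a sketch only: wait_for_key is a hypothetical name, and the timeout values are arbitrary.

```python
import time


def wait_for_key(fetch, timeout=2.0, interval=0.05):
    """Call fetch() until it returns without raising, or until timeout seconds pass.

    fetch is any zero-argument callable, for example
    lambda: r2.execute_command('AI.TENSORGET', 'd', 'VALUES').
    Returns the fetched value, or re-raises the last error once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch()
        except Exception:  # redis.exceptions.ResponseError in the scripts above
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

If the helper still raises after a generous timeout, plain replication lag is an unlikely explanation and the replica is probably stuck as described above.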

gkorland · 2019-07-10T13:31:27Z

https://github.com/antirez/redis/issues/6228

lantiga · 2019-10-13T23:04:14Z

@gkorland Now that the fix has been included in Redis unstable, should we wait for the next Redis release before merging this one? Or should we try to make things work in both cases by looking at the version of the Redis server? (if possible, I'm actually not sure how to do it)
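On the question of checking the server version: from the test suite side, one option is to read the redis_version field that INFO reports and compare it against a minimum. The following is a sketch under assumptions: the function names are hypothetical, the default minimum of 5.0.7 is the release named later in this thread, and the parser only handles purely numeric dotted versions.

```python
def parse_version(text):
    """Turn a dotted version such as '5.0.7' into a comparable tuple (5, 0, 7)."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_result_replication(server_version, minimum="5.0.7"):
    """Return True if server_version is at least the given minimum version."""
    return parse_version(server_version) >= parse_version(minimum)


# With redis-py, the version string could come from INFO, for example:
#   version = r.info('server')['redis_version']
#   if not supports_result_replication(version):
#       ...skip or adapt the replica tests...
```

Comparing tuples instead of raw strings avoids lexicographic surprises such as '5.0.10' sorting before '5.0.7'.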

K-Jo · 2019-11-14T12:19:03Z

@gkorland on which branches are the Redis changes present? Will they be part of an upcoming 5.x release, or only be present in 6?

gkorland · 2019-11-21T10:50:44Z

@K-Jo should be on the new 5.0.7

lantiga · 2019-11-28T22:13:36Z

@rafie I wanted to test the fixes that ended up in Redis 5.0.7, so I rebased on master and changed the Docker images to be based off 5.0.7, but apparently CircleCI is still on 5.0.5. I entered the CircleCI container and ran redis-server --version to find out:

Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=442b43d467cd2b03

Can you give me a hint on how to have the container updated? Thanks a lot in advance.
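A CI step could make this kind of mismatch fail fast by parsing the version line shown above. This is a small sketch; parse_version_line is a hypothetical helper, and it assumes the output keeps the space-separated key=value layout seen here.

```python
def parse_version_line(line):
    """Split 'Redis server v=5.0.5 sha=...' into a dict of its key=value fields."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields


line = ("Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 "
        "bits=64 build=442b43d467cd2b03")
info = parse_version_line(line)
print(info["v"])
```

The v field can then be compared against the version the Docker image is expected to ship.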

lantiga · 2019-12-04T16:04:23Z

Tests are passing with 5.0.7 :-) @gkorland @rafie @K-Jo

gkorland · 2019-12-05T01:43:43Z

Great! Just please make sure you set the Redis version in the RAMP file to 5.0.7.

lantiga · 2019-12-05T10:33:03Z

@gkorland @rafie it should be enough to change the version number in this line
https://github.com/RedisAI/RedisAI/blob/master/ramp.yml#L8
to 5.0.7, right?

K-Jo · 2019-12-09T10:31:49Z

ramp.yml

@@ -5,7 +5,7 @@ description: Serving tensors and executing deep learning graphs
 homepage: https://oss.redislabs.com/redisai/
 license: GNU Affero General Public License v3.0
 command_line_args: ""
-min_redis_version: "5.0"
+min_redis_version: "5.0.7"
 min_redis_pack_version: "5.4"


@rafie @gkorland should this (the pack version) not be 5.4.11?

lantiga · 2019-12-09

You tell me :-)

By the way, I see the license is also outdated. I can update both. Shall we go with 5.4.11, then?

Replicate results of run commands instead of verbatim (#157)

* Replicate results of run commands instead of verbatim
* Remove leftover ReplicateVerbatim
* Add --use-slaves to test invocation
* Fix rebase leftover
* Bump Redis version in ramp file

lantiga requested review from mnunberg, hhsecond and MeirShpilraien July 6, 2019 23:10

lantiga added the blocked label Sep 21, 2019

lantiga removed the blocked label Oct 13, 2019

lantiga mentioned this pull request Nov 12, 2019

Append-Only Support for RedisAI #245

Closed

lantiga force-pushed the replicate_outputs branch from 21a2d2b to 8dd518d Compare November 28, 2019 21:42

lantiga added 4 commits December 4, 2019 16:27

Replicate results of run commands instead of verbatim

4a11948

Remove leftover ReplicateVerbatim

80ccc33

Add --use-slaves to test invocation

d21d58b

Fix rebase leftover

7878f35

lantiga force-pushed the replicate_outputs branch from 3c788b0 to 7878f35 Compare December 4, 2019 15:29

Bump Redis version in ramp file

8d592b1

lantiga requested a review from rafie December 5, 2019 22:05

rafie approved these changes Dec 6, 2019

lantiga merged commit 44d1c68 into master Dec 6, 2019

This was referenced Dec 6, 2019

Replicate results instead of commands in MODELRUN #84

Closed

Enable AOF #255

Merged

K-Jo reviewed Dec 9, 2019

gkorland deleted the replicate_outputs branch October 6, 2020 08:57
