Skip to content

Replicate results of run commands instead of verbatim #157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 6, 2019

Conversation

lantiga
Copy link
Contributor

@lantiga lantiga commented Jul 6, 2019

Adresses #84.
Replication of MODELRUN and SCRIPTRUN commands will not lead to re-executing the computation on the replicas, but just to setting the outputs on the replicas. Tests now include using replicas.

@lantiga
Copy link
Contributor Author

lantiga commented Jul 6, 2019

Tests are passing locally (macOS), while CI is failing with a message

redis.exceptions.ResponseError: cannot get tensor from empty key

it appears that the replica doesn't contain the replicated output as it should.

Does this ring any bell at the RLTest level or else? @MeirShpilraien @mnunberg @rafie

@lantiga
Copy link
Contributor Author

lantiga commented Jul 9, 2019

I could replicate the issue locally. It doesn't always fail, but it fails more often than not.

@lantiga
Copy link
Contributor Author

lantiga commented Jul 10, 2019

Update on this. Forgetting about RLTest for a second, here's a breakdown of the issue.

First checkout the branch and build. Then start a master and a replica (make sure there's no rdb lingering around)

rm dump.rdb
redis-server --loadmodule build/redisai.so

and in another terminal

redis-server --port 7777 --replicaof 127.0.0.1 6379 --loadmodule build/redisai.so 

Now run

r = redis.Redis(host='localhost', port=6379)

model_filename = 'test/test_data/graph.pb'

with open(model_filename, 'rb') as f:
    model_pb = f.read()

ret = r.execute_command('AI.MODELSET', 'm', 'TF', 'CPU', 'INPUTS', 'a', 'b', 'OUTPUTS', 'mul', model_pb)

r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

r.execute_command('AI.MODELRUN', 'm', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')

ret = r.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)

r2 = redis.Redis(host='localhost', port=7777)

ret = r2.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)

The script will fail at the last TENSORGET with

redis.exceptions.ResponseError: cannot get tensor from empty key

because there's no c on the replica.

The slave complains with

== CRITICAL == This replica is sending an error to its master: 'MULTI calls can not be nested' after processing the command 'multi'

In fact, logging from Redis internals, it looks like the master initiates the replica with the correct command args, but it only ends up sending MULTI and not the rest of the command.

If we repeat the above with SCRIPTRUN,

r = redis.Redis(host='localhost', port=6379)

script_filename = 'test/test_data/script.txt'

with open(script_filename, 'rb') as f:
    script = f.read()

r.execute_command('AI.SCRIPTSET', 'ket', 'CPU', script)

r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

r.execute_command('AI.SCRIPTRUN', 'ket', 'bar', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')

ret = r.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)
    
r2 = redis.Redis(host='localhost', port=7777)

ret = r2.execute_command('AI.TENSORGET', 'c', 'VALUES')
print(ret)

There's no error and the replica contains the right value.

The difference is that MODELRUN is blocking, and we call Replicate(...) upon unblocking, from the callback we pass to the BlockClient function.

Even worse than this, it looks like replication is somewhat stuck. If we set another tensor at a new key d after MODELRUN and try to get it from the replica, we still can't find it.

r = redis.Redis(host='localhost', port=6379)

model_filename = 'test/test_data/graph.pb'

with open(model_filename, 'rb') as f:
    model_pb = f.read()

ret = r.execute_command('AI.MODELSET', 'm', 'TF', 'CPU', 'INPUTS', 'a', 'b', 'OUTPUTS', 'mul', model_pb)

r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

r.execute_command('AI.MODELRUN', 'm', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')
r.execute_command('AI.TENSORSET', 'd', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

ret = r.execute_command('AI.TENSORGET', 'd', 'VALUES')
print(ret)

r2 = redis.Redis(host='localhost', port=7777)

ret = r2.execute_command('AI.TENSORGET', 'd', 'VALUES')
print(ret)

fails with

redis.exceptions.ResponseError: cannot get tensor from empty key

@gkorland
Copy link
Contributor

@lantiga
Copy link
Contributor Author

lantiga commented Oct 13, 2019

@gkorland Now that the fix has been included in Redis unstable, should we wait for the next Redis release before merging this one? Or should we try to make things work in both cases by looking at the version of the Redis server? (if possible, I'm actually not sure how to do it)

@K-Jo
Copy link
Collaborator

K-Jo commented Nov 14, 2019

@gkorland on which branches are the redis changes present? Will it be part of an upcoming 5 release or only be present in 6?

@gkorland
Copy link
Contributor

@K-Jo should be on the new 5.0.7

@lantiga
Copy link
Contributor Author

lantiga commented Nov 28, 2019

@rafie I wanted to test the fixes that ended up in Redis 5.0.7, and I rebased on master and changed Docker images to be based off 5.0.7, but apparently CircleCI is still on 5.0.5. I entered the CircleCII container and ran redis-server --version to find out:

Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=442b43d467cd2b03

Can you give me a hint on how to have the container updated? Thanks a lot in advance.

@lantiga
Copy link
Contributor Author

lantiga commented Dec 4, 2019

Tests are passing with 5.0.7 :-) @gkorland @rafie @K-Jo

@gkorland
Copy link
Contributor

gkorland commented Dec 5, 2019

Great! just please make sure you set the redis version in the RAMP file to 5.0.7

@lantiga
Copy link
Contributor Author

lantiga commented Dec 5, 2019

@gkorland @rafie it should be enough to change the version number in this line
https://github.com/RedisAI/RedisAI/blob/master/ramp.yml#L8
to 5.0.7 right?

@lantiga lantiga requested a review from rafie December 5, 2019 22:05
@lantiga lantiga merged commit 44d1c68 into master Dec 6, 2019
@@ -5,7 +5,7 @@ description: Serving tensors and executing deep learning graphs
homepage: https://oss.redislabs.com/redisai/
license: GNU Affero General Public License v3.0
command_line_args: ""
min_redis_version: "5.0"
min_redis_version: "5.0.7"
min_redis_pack_version: "5.4"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafie @gkorland should this not be 5.4.11 ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pack version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You tell me :-)

By the way, I see the license is also outdated. I can update both, we go with 5.4.11 then?

lantiga added a commit that referenced this pull request May 6, 2020
 
Replicate results of run commands instead of verbatim (#157)

* Replicate results of run commands instead of verbatim

* Remove leftover ReplicateVerbatim

* Add --use-slaves to test invocation

* Fix rebase leftover

* Bump Redis version in ramp file
@gkorland gkorland deleted the replicate_outputs branch October 6, 2020 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants