Description
So I just stumbled upon the following. If you run this code snippet (which plays the tournament using parallel processing) twice:
import axelrod as axl
import numpy as np
axl.seed(1)
players = [s() for s in axl.demo_strategies]
tournament = axl.Tournament(
    players=players,
    turns=200,
    repetitions=5
)
results = tournament.play(processes=0)
list(map(np.mean, results.normalised_scores))
You will not get the same results. axl.demo_strategies includes axl.Random, so this is due to the seed not being effective.
If you run the same code but in series multiple times:
import axelrod as axl
import numpy as np
axl.seed(1)
players = [s() for s in axl.demo_strategies]
tournament = axl.Tournament(
    players=players,
    turns=200,
    repetitions=5
)
results = tournament.play()
list(map(np.mean, results.normalised_scores))
then you do get the same results.
The parallel case differs because axl.seed(1) becomes redundant there: jobs don't necessarily finish in the same order, so they are not necessarily run in the same order, which offsets the random sequence.
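The offsetting effect can be illustrated without axelrod. The following is only a single-process analogy using the standard library: job is a hypothetical stand-in for a match that draws from the shared global generator, and the two orderings stand in for workers picking up jobs in a different order.

```python
import random


def job(n_draws):
    """Hypothetical stand-in for a match: draws from the shared global generator."""
    return [random.random() for _ in range(n_draws)]


def run(order):
    """Seed once, then run jobs "a" and "b" in the given order."""
    random.seed(1)
    draws = {"a": 3, "b": 5}
    return {name: job(draws[name]) for name in order}


first = run(["a", "b"])
second = run(["b", "a"])

# Same seed, but job "a" sees different numbers when it runs after "b"
print(first["a"] == second["a"])  # False
# Repeating the same order reproduces the results exactly
print(run(["a", "b"]) == first)  # True
```

A single seed only fixes the sequence of numbers, not which job consumes which part of it.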
Possible fix? (Not sure)
I can think of one idea towards a fix which involves sampling random integers in the "parent" tournament process and passing those to matches which would each set their own seed.
Modifying the match generator to do something like:
import random  # needed for random.randint below

def build_match_chunks(self):
    """
    A generator that returns player index pairs and match parameters for a
    round robin tournament.

    Yields
    -------
    tuples
        ((player1 index, player2 index), match parameters, repetitions, seed)
    """
    if self.edges is None:
        edges = complete_graph(self.players)
    else:
        edges = self.edges

    for index_pair in edges:
        match_params = self.build_single_match_params()
        # Adding the random integer here: a seed for this pair's matches
        seed = random.randint(0, 2**32 - 1)
        yield (index_pair, match_params, self.repetitions, seed)
and modifying the Match class to take a seed (and set the seed):
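class Match(object):
    """The Match class conducts matches between two players."""

    def __init__(
        self,
        players,
        turns=None,
        prob_end=None,
        game=None,
        deterministic_cache=None,
        noise=0,
        match_attributes=None,
        reset=True,
        seed=None,  # New: integer seed passed in by the tournament
    ):
        ...  # existing initialisation, plus setting the seed when one is given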
Good points:
- [If this works] This would ensure that there are no differences in results from using multi processes or series.
- If a seed is set (like in the code above) then the random integers generated by the "parent" process would be the same on every run;
- If a seed is not set then the "parent" process would pass "random" seeds to the matches, so the results would not be reproducible (as expected).
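A minimal sketch of why per-match seeds would make the order of execution irrelevant. This is not axelrod code: play_match and sample_seeds are hypothetical names, the "match" just draws five random moves, and the seeds are sampled eagerly so the sketch needs the number of matches up front.

```python
import random


def play_match(seed):
    """Hypothetical match: sets its own seed, then draws from the global generator."""
    random.seed(seed)
    return [random.choice("CD") for _ in range(5)]


def sample_seeds(n_matches, parent_seed):
    """Sample one seed per match in the "parent", before any match runs."""
    random.seed(parent_seed)
    return [random.randint(0, 2**32 - 1) for _ in range(n_matches)]


edges = [(0, 1), (0, 2), (1, 2)]
jobs = list(zip(edges, sample_seeds(len(edges), parent_seed=1)))

in_order = {pair: play_match(seed) for pair, seed in jobs}
# Simulate workers picking up the matches in a different order
out_of_order = {pair: play_match(seed) for pair, seed in reversed(jobs)}

print(in_order == out_of_order)  # True
```

Because every match reseeds before drawing, its outcome depends only on the seed it was handed, not on what ran before it.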
Problem/question:
I think the only way to do this efficiently is to sample the "parent" random numbers in the generator, but if I'm not mistaken that will still be affected by the offset seedings (when a match sets a seed, that offsets the sequence for the parent):
- Can we "copy" the random module so we can have two random sequences on the go? (One for the parent process and the other for the matches?) (We could potentially implement our own Mersenne twister but that sounds idiotic.)
- Perhaps numpy.random.randint() could be used to sample all the random seeds needed "efficiently". This has the downside of needing to know how many matches we want. In practice this isn't a problem, but in theory we've tried to keep things so that the generators did not need this information...
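On the first question: the standard library's random.Random class creates generator instances that each carry their own state, independent of the module-level functions, so two sequences can coexist. The sketch below (hypothetical names, not axelrod code) samples seeds lazily in a generator, first from the global generator and then from a dedicated parent instance, while each "match" reseeds the global generator.

```python
import random


def play_match(seed):
    """Hypothetical match: reseeds the global generator, as in the proposal."""
    random.seed(seed)
    return random.random()


def chunks_from_global(edges):
    """Lazily sample seeds from the global generator (affected by matches)."""
    for index_pair in edges:
        yield index_pair, random.randint(0, 2**32 - 1)


def chunks_from_parent(edges, parent_rng):
    """Lazily sample seeds from a dedicated generator (unaffected by matches)."""
    for index_pair in edges:
        yield index_pair, parent_rng.randint(0, 2**32 - 1)


def collect(chunks, play):
    """Return the seeds drawn, optionally playing a match between draws."""
    seeds = []
    for _, seed in chunks:
        seeds.append(seed)
        if play:
            play_match(seed)
    return seeds


edges = [(0, 1), (0, 2), (1, 2)]

random.seed(1)
plain_global = collect(chunks_from_global(edges), play=False)
random.seed(1)
offset = collect(chunks_from_global(edges), play=True)
print(plain_global == offset)  # False: reseeding in matches shifts the parent

plain_parent = collect(chunks_from_parent(edges, random.Random(1)), play=False)
safe = collect(chunks_from_parent(edges, random.Random(1)), play=True)
print(plain_parent == safe)  # True: the parent generator has its own state
```

With a dedicated parent generator the seeds can be drawn one at a time inside the match generator, so the number of matches does not need to be known in advance.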
Any other/better ideas?
Whatever the fix, we should include something like the original code snippet as a test. If we can't fix it, we should at least document this as a downside of parallelisation...