Setting a seed is ineffective when parallelising

So I just stumbled upon the following. If you run this code snippet twice:

```python
import axelrod as axl
import numpy as np
axl.seed(1)
players = [s() for s in axl.demo_strategies]
tournament = axl.Tournament(
     players=players,
     turns=200,
     repetitions=5
)
results = tournament.play(processes=0)
list(map(np.mean, results.normalised_scores))
```

You will not get the same results. 

`axl.demo_strategies` includes `axl.Random`, so this is due to the seed not being effective.

If you run the same code multiple times but in series:

```python
import axelrod as axl
import numpy as np
axl.seed(1)
players = [s() for s in axl.demo_strategies]
tournament = axl.Tournament(
     players=players,
     turns=200,
     repetitions=5
)
results = tournament.play()
list(map(np.mean, results.normalised_scores))
```

then you do get the same results.

This is because `axl.seed(1)` becomes redundant: jobs don't necessarily finish in the same order, so they are not necessarily run in the same order, which offsets the random sequence.
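
To illustrate the mechanism, here is a toy sketch (not Axelrod code). It assumes every job draws from one shared random sequence, and `run_jobs` is a hypothetical helper made up for this example. Changing the order in which jobs run changes which random number each job receives, even though the seed is the same:

```python
import random


def run_jobs(order, seed=1):
    """Run toy 'matches' in the given order, all sharing the global RNG."""
    random.seed(seed)
    results = {}
    for job in order:
        # Each job consumes the next number from the shared sequence
        results[job] = random.random()
    return results


in_order = run_jobs([0, 1, 2])
reordered = run_jobs([2, 0, 1])

# The same numbers are drawn overall...
print(sorted(in_order.values()) == sorted(reordered.values()))  # True
# ...but they land on different jobs, so per-job results differ
print(in_order == reordered)  # False
```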

**Possible fix?** (Not sure)

I can think of one idea towards a fix which involves sampling random integers in the "parent" tournament process and passing those to matches which would each set their own seed.


Modifying the match generator to do something like:

```python
    def build_match_chunks(self):
        """
        A generator that returns player index pairs and match parameters for a
        round robin tournament.

        Yields
        -------
        tuples
            ((player1 index, player2 index), match object)
        """
        if self.edges is None:
            edges = complete_graph(self.players)
        else:
            edges = self.edges

        for index_pair in edges:
            match_params = self.build_single_match_params()
            # New: a seed drawn in the parent process (needs `import random` at module level)
            yield (index_pair, match_params, self.repetitions, random.randint(0, 2**32 - 1))
```

and modifying the `Match` to take a seed (and set the seed):

```python
class Match(object):
    """The Match class conducts matches between two players."""

    def __init__(
        self,
        players,
        turns=None,
        prob_end=None,
        game=None,
        deterministic_cache=None,
        noise=0,
        match_attributes=None,
        reset=True,
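        seed=None,  # New: optional seed, to be set before the match is played
    ):
        ...  # Existing initialisation, plus storing/setting the seed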
```
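
As a rough, standalone sketch of how the two changes could fit together (a toy model, not the real `Tournament` or `Match` code): the parent draws one seed per match, and each match uses only its own seeded generator. The `build_match_chunks`, `play_match` and `run` functions below are simplified stand-ins written for this example, and using a `random.Random` instance per match is an assumption about how a match might consume its seed.

```python
import random


def build_match_chunks(n_matches):
    """Parent side: yield (match index, seed), drawing seeds from the global RNG."""
    for index in range(n_matches):
        yield index, random.randint(0, 2**32 - 1)


def play_match(seed, turns=5):
    """Worker side: a toy 'match' that uses only its own seeded generator."""
    rng = random.Random(seed)
    return [rng.choice("CD") for _ in range(turns)]


def run(order_key=None, seed=1):
    random.seed(seed)
    # Materialised as a list only so this toy can reorder the jobs
    chunks = list(build_match_chunks(4))
    chunks.sort(key=order_key)  # mimic jobs being run in a different order
    return {index: play_match(match_seed) for index, match_seed in chunks}


serial = run()
reordered = run(order_key=lambda chunk: -chunk[0])
print(serial == reordered)  # True: each match depends only on its own seed
```

In this toy the outcome of each match depends only on the seed it was handed, so the execution order no longer matters.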

Good points:

- [If this works] This would ensure that results are the same whether running with multiple processes or in series.
- If a seed is set (like in the code above) then the random integers generated by the "parent" process would be the same on every run;
- If a seed is not set then the "parent" process would pass "random" seeds to the matches, so the results would not be reproducible.

Problem/question:

I think that the only way to do this efficiently is to sample the "parent" random numbers in the generator. If I'm not mistaken, though, that will still be affected by the offset: when a match sets its seed, it also resets the sequence the parent is drawing from.

- Can we "copy" the random module so we can have two random sequences on the go? (One for the parent process and the other for the matches?). (We could potentially implement our own Mersenne Twister but that sounds idiotic)
- Perhaps `numpy.random.randint()` could be used to sample all the random seeds needed "efficiently". This has the downside of needing to know how many matches we want. In practice this isn't a problem, but in theory we've tried to keep things so that the generators do not need this information...
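
On the first question, one option that may be worth exploring: the standard library's `random.Random` class gives independent generator instances, each with its own state. The sketch below assumes the parent keeps a dedicated instance (`parent_rng` is a hypothetical name) and shows that reseeding the module-level generator does not disturb it:

```python
import random

# Hypothetical generator reserved for the parent process
parent_rng = random.Random(1)

first_seed = parent_rng.randint(0, 2**32 - 1)

# A match reseeding and using the module-level generator...
random.seed(12345)
random.random()

# ...does not touch the state of parent_rng
second_seed = parent_rng.randint(0, 2**32 - 1)

# Replaying the parent sequence without any interference gives the same seeds
replay = random.Random(1)
expected = [replay.randint(0, 2**32 - 1) for _ in range(2)]
print([first_seed, second_seed] == expected)  # True
```

With a separate instance the seeds can still be drawn lazily inside the generator, so the number of matches would not need to be known in advance.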

**Any other/better ideas?**

Whatever the fix, we should include something like the original code snippet as a test. If we can't fix it, we should at least document this as a downside of parallelisation...

Setting a seed is ineffective when parallelising #1277
