Reproducibility Bugs
Kjell Wooding edited this page Nov 16, 2021 · 7 revisions
Every time we encounter a reproducibility-related bug or issue while attempting to reproduce others' work, we document it here, to the best of our abilities.
- (WHERE-DO-I-START) README doesn't tell me where to start.
- (NOTEBOOK-ORDER) Ran notebooks out of order, because there was no indication of where to start or where to go next.
- (VARIABLE-SCARCITY) A variable name was reused (possibly as a result of copy/pasting code from elsewhere), causing cognitive dissonance or confusing code.
- (COPIED-NOTEBOOK) Notebooks were copied for code reuse, instead of generalizing the code into functions or a module.
- (STALE-COMMENT) Markdown cell(s) were copied and are now wrong (e.g. comments in copied notebooks that weren't updated to reflect how they were being used).
- (EYEBALL-TEST) The only way to check whether I got the same results was to compare against the outputs and images in the original notebook (but the images didn't match because of randomness).
- (TL;DR) Instructions for reproducing were confusing, hard to follow, or incomplete.
- (NO-DOCSTRING) Code is missing key usability documentation (a docstring, or its equivalent for the language in question).
- (NO-DATA-LICENSE) No data license.
- (NO-CODE-LICENSE) No license on source code.
- (NO-ENVIRONMENT-INSTRUCTIONS) Chicken-and-egg issue with environments: there is no `environment.yml` file or the like (even if there are some setup instructions inside a notebook, which itself needs an environment to run).
- (NO-VERSION-PIN) Versions not pinned, e.g. the code uses a dev branch without a clear indication of when it was released.
- (HARDCODED-PATH) A file contains a hardcoded path, so the project will not run elsewhere without manual editing.
- (IMPOSSIBLE-ENVIRONMENT) Dependencies are not resolvable due to version clashes (e.g. the requirements need both `<=0.48` and `>=0.49` of the same package).
- (ARCH-DIFFERENCE) The same code runs differently on different architectures.
- (MISSING-STATE) Can't reproduce because of some missing state, e.g. cells were run out of sequence, or a variable was changed but the notebook wasn't rerun.
- (NONDETERMINISTIC) Behavior cannot be modeled or reproduced (e.g. true randomness).
- (PRNG-FAIL) Code has a fixed pseudo-random seed, but results are still not reproducible.
- (NO-PRNG-SEED) No fixed random seed was used, or there was no option to use one.
- (UNLUCKY-SEED) A certain choice of random seed results in unexpected or pathological behavior, e.g. an infinite loop.
- (NO-DATA-HASH) No data versioning/hash. No way to tell if the data changed since it was originally accessed.
- (DISORDERED-DATA) No fixed ordering in dataset generation: same contents, different order (so a different hash or behavior).
- (VERSION-DEPENDENT-HASH) Different versions of a container format (or of the library that writes it) produce different hashes over identical raw data.
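For NO-PRNG-SEED, and for one possible cause of PRNG-FAIL (a seed set on global random state that other code also draws from), a minimal sketch of the usual mitigation is below: expose the seed as a parameter and give the computation its own generator instance. `run_experiment` is a hypothetical stand-in for real analysis code; this does not address sources of nondeterminism outside the PRNG (threads, hardware, unordered data).

```python
import random


def run_experiment(seed: int, n: int = 5) -> list[float]:
    """Hypothetical experiment: draw n samples from a seeded generator.

    A dedicated random.Random instance keeps the seed local, so other code
    calling the module-level random functions cannot shift this sequence.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    first = run_experiment(seed=42)
    second = run_experiment(seed=42)
    # Same seed, same sequence: results can be compared exactly,
    # rather than eyeballed against saved outputs.
    print(first == second)
```

Making the seed an explicit argument also makes UNLUCKY-SEED easier to investigate, since a problematic seed can be recorded and replayed.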
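For NO-DATA-HASH and DISORDERED-DATA, one option is to record an order-independent fingerprint of the data alongside the results. The sketch below assumes the dataset can be read as a list of text records (the `dataset_fingerprint` name is hypothetical); it sorts the records before hashing with SHA-256, which is only appropriate when row order carries no meaning. Because it hashes decoded records rather than the bytes of a container file, it may also sidestep VERSION-DEPENDENT-HASH, provided different library versions decode the records identically.

```python
import hashlib
from collections.abc import Iterable


def dataset_fingerprint(records: Iterable[str]) -> str:
    """Return a SHA-256 hex digest of the records, independent of their order.

    Records are sorted first, so the same contents in a different order
    produce the same fingerprint. Each record is UTF-8 encoded and
    length-prefixed so that, e.g., ["ab", "c"] and ["a", "bc"] differ.
    """
    digest = hashlib.sha256()
    for record in sorted(records):
        data = record.encode("utf-8")
        # 8-byte big-endian length prefix marks where each record ends.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


if __name__ == "__main__":
    a = dataset_fingerprint(["x,1", "y,2"])
    b = dataset_fingerprint(["y,2", "x,1"])
    print(a == b)
```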