
Reproducibility Bugs

Kjell Wooding edited this page Nov 16, 2021 · 7 revisions

Every time we encounter a reproducibility-related bug or issue while attempting to reproduce others' work, we document it here, to the best of our abilities.

Documentation/Process Bugs

  • (WHERE-DO-I-START) README doesn't tell me where to start.
  • (NOTEBOOK-ORDER) Notebooks were run out of order because there was no indication of where to start, or where to go next.
  • (VARIABLE-SCARCITY) A variable name was reused (possibly as a result of copy/pasting code from elsewhere), causing cognitive dissonance and making the code confusing.
  • (COPIED-NOTEBOOK) Notebooks were copied for code reuse, instead of generalizing the shared code into functions or a module.
  • (STALE-COMMENT) Markdown cell(s) were copied and are now wrong (e.g. comments in copied notebooks that weren't updated to reflect how they were being used).
  • (EYEBALL-TEST) The only way to check whether I got the same results was to compare against the outputs and images in the original notebook (but the images didn't match because of randomness).
  • (TL;DR) Instructions for reproducing were confusing, hard to follow, or incomplete
  • (NO-DOCSTRING) Code is missing key usability documentation (a docstring, or its equivalent for the language in question)
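
One way to avoid the EYEBALL-TEST problem is to save key numeric results next to the notebook and compare against them programmatically. The following is a minimal sketch, not a prescribed method; the function names `save_expected` and `check_against_expected` and the JSON file layout are hypothetical choices made for illustration.

```python
import json
import math


def save_expected(results, path):
    """Write a dict of named numeric results to a JSON file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(results, f, indent=2, sort_keys=True)


def check_against_expected(results, path, rel_tol=1e-9):
    """Return the sorted names whose values are missing or differ from the saved ones."""
    with open(path) as f:
        expected = json.load(f)
    mismatches = []
    for name in sorted(set(expected) | set(results)):
        if name not in expected or name not in results:
            # A result present on only one side counts as a mismatch.
            mismatches.append(name)
        elif not math.isclose(results[name], expected[name], rel_tol=rel_tol):
            mismatches.append(name)
    return mismatches
```

An empty list from `check_against_expected` means every saved value was reproduced within the tolerance, so nobody has to compare plots by eye.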

Licenses

  • (NO-DATA-LICENSE) No data license.
  • (NO-CODE-LICENSE) No license on source code.

Environment Reproducibility

  • (NO-ENVIRONMENT-INSTRUCTIONS) Chicken-and-egg issue with environments: no environment.yml file or the like (even if there are some instructions in a notebook).
  • (NO-VERSION-PIN) Versions not pinned, e.g. the project uses a dev branch with no clear indication of when it was released.
  • (HARDCODED-PATH) A file contains a hardcoded path, so the project will not run elsewhere without manual editing.
  • (IMPOSSIBLE-ENVIRONMENT) Dependencies are not resolvable due to version clashes (e.g. the same package is required at both <=0.48 and >=0.49).
  • (ARCH-DIFFERENCE) The same code runs differently on different architectures
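
As an illustration of one way around HARDCODED-PATH, the sketch below resolves a data directory from an environment variable, falling back to a path relative to a base directory. The function name `data_dir`, the variable name `PROJECT_DATA_DIR`, and the default folder `data` are all assumptions made for this example, not conventions of any particular project.

```python
import os
from pathlib import Path


def data_dir(env_var="PROJECT_DATA_DIR", default="data", base=None):
    """Resolve the data directory without baking in a machine-specific path.

    `env_var` and `default` are hypothetical names chosen for this sketch.
    """
    override = os.environ.get(env_var)
    if override:
        # An explicit override wins, so users can relocate data without editing code.
        return Path(override).expanduser().resolve()
    # Otherwise fall back to `default` under `base` (the current directory if unset).
    base = Path(base) if base is not None else Path.cwd()
    return (base / default).resolve()
```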

Hidden State/Notebook non-linearity

  • (MISSING-STATE) Can't reproduce because of some missing state, e.g. cells were run out of sequence, or a variable was changed but the notebook wasn't rerun.

Randomness

  • (NONDETERMINISTIC) Behavior cannot be modeled or reproduced (e.g. true randomness).
  • (PRNG-FAIL) Code has a fixed pseudo-random seed, but results are still not reproducible
  • (NO-PRNG-SEED) No fixed random seed was used, or there was no option to use one
  • (UNLUCKY-SEED) A certain choice of random seed results in unexpected or pathological behavior; e.g. infinite loop
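
A common contributor to NO-PRNG-SEED and PRNG-FAIL is reliance on a shared global generator that other code can advance. The sketch below shows one possible mitigation using Python's standard `random` module: each function takes an optional seed and builds its own generator. The function name `sample_points` is hypothetical, and this does not address other sources of nondeterminism such as threading or architecture differences.

```python
import random


def sample_points(n, seed=None):
    """Return n pseudo-random floats in [0, 1) from a private generator.

    Passing the same `seed` yields the same list, regardless of any calls
    made to the module-level `random` functions elsewhere.
    """
    rng = random.Random(seed)  # local generator, isolated from global state
    return [rng.random() for _ in range(n)]
```

Exposing `seed` as a parameter also gives users a way out of an UNLUCKY-SEED: they can pick a different value without editing the code.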

Data

  • (NO-DATA-HASH) No data versioning/hash. No way to tell if the data changed since it was originally accessed.
  • (DISORDERED-DATA) No fixed ordering in dataset generation: same contents, different order (so different hash/behavior)
  • (VERSION-DEPENDENT-HASH) Different versions of a container format (library) produce different hashes over identical raw data
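
To illustrate NO-DATA-HASH and DISORDERED-DATA, here is a minimal sketch using Python's standard `hashlib`. `file_sha256` hashes a file's raw bytes, while `records_sha256` sorts the records before hashing so that the same contents in a different order produce the same digest. Both function names are hypothetical, and treating the dataset as a collection of string records is an assumption made for the example.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def records_sha256(records):
    """Return an order-independent SHA-256 hex digest over string records."""
    h = hashlib.sha256()
    for rec in sorted(records):  # sorting removes the dependence on input order
        data = rec.encode("utf-8")
        # A length prefix keeps ["ab"] distinct from ["a", "b"].
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

Hashing the raw records rather than a serialized container file is also one way to sidestep VERSION-DEPENDENT-HASH, since the digest then no longer depends on how a particular library version writes the container.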