
AD testing #869

@penelopeysm

Description

I'm not aware of an existing issue for this, so I wanted to open one to capture thoughts.

The code for this is not yet in DynamicPPL; it's hosted at https://github.com/penelopeysm/ModelTests.jl, as it's easier for me to iterate on it there.

Desiderata:

  1. Each AD backend runs in its own CI job.
  2. For each AD backend, each model tested runs in its own process. This is pretty awkward; I think it basically means we need a shell script calling Julia.
  3. The results of the job should be aggregated: if any model fails, the job should show a red cross.
  4. Output should specify the benchmark time (if run successfully) and the error (if not). When the jobs finish running, this info must be collated into a single CSV and/or HTML page on gh-pages, i.e. it must be easily available to the end user.
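Point 1 might map onto a GitHub Actions matrix, along these lines (the workflow shape, backend list, and runner script name are illustrative, not the actual setup):

```yaml
# Sketch only: one CI job per AD backend via a build matrix.
jobs:
  ad-test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # let every backend finish even if one fails
      matrix:
        adtype: [ForwardDiff, ReverseDiff, Mooncake, Enzyme]
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
      - run: ./test_ad.sh ${{ matrix.adtype }}   # hypothetical runner script
```

With `fail-fast: false`, a failing backend doesn't cancel the others, so every backend's results are still collected.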

Note: some of these are difficult to do right. It may well be that we should sacrifice some of these points, or push them to later, just for the sake of getting something out.
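Points 2–4 could be sketched as a shell driver roughly like the following. Everything here is an assumption: a hypothetical `run_model.jl` entry point that takes a model name and exits non-zero on failure, illustrative model names, and a minimal CSV layout (benchmark timing is omitted for brevity):

```shell
#!/usr/bin/env bash
# Sketch of a per-model runner (all names hypothetical).

# run_suite CMD MODEL...
# Runs CMD once per model, each invocation in its own child process, so a
# hard crash (e.g. a segfault inside an AD backend) cannot take down the
# whole job. Writes one line per model to results.csv and returns non-zero
# if any model failed, so CI shows a red cross on any failure.
run_suite() {
    cmd=$1
    shift
    echo "model,status" > results.csv
    failed=0
    for model in "$@"; do
        if $cmd "$model"; then
            echo "$model,ok" >> results.csv
        else
            echo "$model,error" >> results.csv
            failed=1
        fi
    done
    return "$failed"
}

# Real usage might look like (model names are illustrative):
#   run_suite "julia --project=. run_model.jl" demo_assume demo_observe
```

The per-model CSVs from each backend's job could then be collated into the single CSV/HTML page on gh-pages.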

Bonus stretch goals:

  1. Avoid recalculating the 'ground truth' with ForwardDiff for the same model multiple times.
  2. Add links to existing GitHub issues when the reasons for failing models are known.
  3. Add the ability to test with different varinfos.

Additional details in #799 (comment)
