
AD testing #869

@penelopeysm

Description

I'm not aware of an existing issue for this, so I wanted to open one to capture thoughts.

The code for this is not yet in DynamicPPL; it's hosted at https://github.com/penelopeysm/ModelTests.jl, as it's easier for me to iterate on it there.

Desiderata:

  1. Each AD backend runs in its own CI job.
  2. For each AD backend, each model tested runs in its own process. This is pretty awkward; I think it basically means we need a shell script calling Julia.
  3. The results of the job should be aggregated: if any model fails, the job should show a red cross.
  4. Output should specify the benchmark time (if run successfully) and the error (if not). When the jobs finish running, this info must be collated into a single CSV and/or HTML page on gh-pages, i.e. it must be easily available to the end user.
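Point 1 might map onto a GitHub Actions matrix, along these lines (the workflow shape, backend list, and runner script name are illustrative, not the actual setup):

```yaml
# Sketch only: one CI job per AD backend via a build matrix.
jobs:
  ad-test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # let every backend finish even if one fails
      matrix:
        adtype: [ForwardDiff, ReverseDiff, Mooncake, Enzyme]
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
      - run: ./test_ad.sh ${{ matrix.adtype }}   # hypothetical runner script
```

With `fail-fast: false`, a failing backend doesn't cancel the others, so every backend's results are still collected.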

Note: some of these are difficult to do right. It may well be that we should sacrifice some of these points, or push them to later, just for the sake of getting something out.
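Points 2–4 could be sketched as a shell driver roughly like the following. Everything here is an assumption: a hypothetical `run_model.jl` entry point that takes a model name and exits non-zero on failure, illustrative model names, and a minimal CSV layout (benchmark timing is omitted for brevity):

```shell
#!/usr/bin/env bash
# Sketch of a per-model runner (all names hypothetical).

# run_suite CMD MODEL...
# Runs CMD once per model, each invocation in its own child process, so a
# hard crash (e.g. a segfault inside an AD backend) cannot take down the
# whole job. Writes one line per model to results.csv and returns non-zero
# if any model failed, so CI shows a red cross on any failure.
run_suite() {
    cmd=$1
    shift
    echo "model,status" > results.csv
    failed=0
    for model in "$@"; do
        if $cmd "$model"; then
            echo "$model,ok" >> results.csv
        else
            echo "$model,error" >> results.csv
            failed=1
        fi
    done
    return "$failed"
}

# Real usage might look like (model names are illustrative):
#   run_suite "julia --project=. run_model.jl" demo_assume demo_observe
```

The per-model CSVs from each backend's job could then be collated into the single CSV/HTML page on gh-pages.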

Bonus stretch goals:

  1. Avoid recalculating the 'ground truth' with ForwardDiff for the same model multiple times.
  2. Add links to existing GitHub issues when the reasons for failing models are known.
  3. Add the ability to test with different varinfos.

Additional details in #799 (comment)
