Process definition API #49

Closed
mancellin opened this issue Sep 9, 2018 · 13 comments

@mancellin

Hi,
I've discovered xarray-simlab recently and it has caught my attention.
The project looks very promising: I'm using xarray more and more in my models, and a framework that helps with assembling and running models could improve my workflow a lot.

These days, I mostly work with frequency-domain models, but I guess it would be easy to extend xarray-simlab to this kind of model. They are actually simpler than time-dependent ones.

After reading the doc and playing with some toy cases, I'd like to share some feedback about the API.
Hopefully, the issue tracker is not such a bad place for that.

I'm very impressed by the implementation, in particular the automatic assembling of the processes in a model. The integration with xarray is quite promising. The efforts on the documentation and the availability of the package are also great!
However, I am not totally convinced by the way the processes are defined.

Let's consider the example from the tutorial.
The ProfileU and u_vars abstractions were not very clear to me, so I rewrote them in the framework of conservation laws, with which I'm more familiar:

import numpy as np
import xsimlab as xs

@xs.process
class EulerForwardTimeIntegration:
    fluxes = xs.group('fluxes')
    u = xs.variable(dims='x', intent='inout', description='conserved quantity')

    def run_step(self, dt):
        self.sum_of_fluxes = dt * sum(self.fluxes)
                                    # ^ Note that you don't need a generator here

    def finalize_step(self):
        self.u += self.sum_of_fluxes

@xs.process
class UpwindAdvectionFlux:
    v = xs.variable(description='velocity', intent='in')
    dx = xs.foreign(UniformGrid1D, 'dx')  # UniformGrid1D is defined in the full code linked below

    u = xs.foreign(EulerForwardTimeIntegration, 'u', intent='in')
    flux = xs.variable(dims='x', intent='out', group='fluxes')

    def run_step(self, dt):
        if self.v > 0.0:
            self.flux = self.v * (np.roll(self.u, 1) - self.u)/self.dx
        else:
            self.flux = self.v * (self.u - np.roll(self.u, -1))/self.dx

The full code is here.

I love the modularity of the design, allowing me to easily replace the advection term by another one or to add more of them.

However, I was not able to easily modify the time integration process to have a second order scheme such as:

@xs.process
class RK2TimeIntegration:
    fluxes = xs.group('fluxes')
    u = xs.variable(dims='x', intent='inout', description='conserved quantity')

    def run_step(self, dt):
        predicted_u = self.u + dt * sum(self.fluxes)
        # Missing something like: predicted_fluxes = Flux(predicted_u)
        self.next_u = (self.u + predicted_u)/2 + dt/2 * sum(predicted_fluxes)

    def finalize_step(self):
        self.u = self.next_u

To do that, I would have to create new advection flux classes:

  • The explicit reference to EulerForwardTimeIntegration in the foreign variable of UpwindAdvectionFlux is not a big problem. It could be possible to define this kind of linking in another way, for instance at model creation.
  • What bothers me more is that I would have to add to all Flux classes an expression mapping predicted_u to predicted_flux, which would be the same as the one mapping u to flux. In other words, the process UpwindAdvectionFlux should be a function that can accept arguments other than u.

And this last point illustrates my general feeling about the current API: most of the process classes could be rewritten as functions, and maybe (functional programming aficionado speaking) they should.
The wrapping around attrs is quite nice, but the xs.process classes are not data holders, they are functions; and the xs.variable declarations are function inputs and outputs with annotations.

So, maybe the best way to define the processes is to define functions.
The metadata on the variables could be passed via the docstring or the type annotations, e.g.:

def upwind_advection_flux(
        u: Variable(dims=("x",), name='u'),
        v: float,
        dx: float,
) -> Variable(dims=("x",), name='flux'):
    if v > 0.0:
        return v * (np.roll(u, 1) - u)/dx
    else:
        return v * (u - np.roll(u, -1))/dx

I made a mock-up of the whole model here.

The code would be (in my humble opinion) both more flexible and more robust.
Indeed, the processes could be easily unit-tested, unlike the current process classes (see the sketch below).
I don't know exactly how the model builder works, but it should still have enough clues to link the processes together.
Note finally that this framework would easily extend to other kinds of models than time-dependent ones.
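
As an example of the unit-testing point, a test would be a plain function call (a sketch only, reusing the mock-up's upwind_advection_flux):

import numpy as np

def test_upwind_advection_flux():
    # a single bump advected rightward by a positive velocity
    u = np.array([0.0, 1.0, 0.0, 0.0])
    flux = upwind_advection_flux(u, v=1.0, dx=0.5)
    np.testing.assert_allclose(flux, (np.roll(u, 1) - u) / 0.5)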

These were my two cents. I'd be happy to discuss it further if you want.
Cheers,

@benbovy
Member

benbovy commented Sep 10, 2018

Hi @mancellin,

Many thanks for your detailed feedback! I try below to give some thoughts on each topic you've mentioned:

Doc enhancement / process naming

Let's consider the example from the tutorial. The ProfileU and u_vars abstractions were not very clear to me, so I rewrote them in the framework of conservation laws, with which I'm more familiar

I agree that this example from the tutorial could be improved. The code you have rewritten is clearer at several places, so I'd be very pleased if you could put this together in a PR!

On a related note, I've often found it difficult to find a good name for a process. Although it's natural to choose a name that describes what the process does, sometimes it doesn't work very well. In your example, EulerForwardTimeIntegration / time_scheme is explicit, but then when you use the xarray extension you end up with a variable named time_scheme__u in your input/output datasets, which is not very meaningful. A name like profile__u is slightly easier to understand IMO, especially when the dataset is saved and loaded later in another context.

Support advanced time stepping

I was not able to easily modify the time integration process to have a second order scheme

I admit that I haven't really considered time schemes like RK2 when working on the design of this modelling framework.

One limitation is that by design no cycle is allowed in the graph of process dependencies. As a workaround, you could duplicate process classes (both flux and time integration) to implement each step of the RK2 scheme. You could also use class inheritance so that you don't need to duplicate all the code but just re-define u and flux. I agree that this workaround is not ideal, though, and it's still quite laborious, e.g., if you want to include a lot of different "flux" process classes.
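
For illustration, the flux-duplication part could look like the sketch below, reusing UpwindAdvectionFlux from your first comment (the PredictorStage class, its u_pred variable and the 'predicted_fluxes' group are hypothetical names, and the corrector stage that actually updates u is left out):

import xsimlab as xs

@xs.process
class PredictorStage:
    # hypothetical first RK2 stage: holds u and computes an Euler-forward guess
    fluxes = xs.group('fluxes')
    u = xs.variable(dims='x', intent='inout', description='conserved quantity')
    u_pred = xs.variable(dims='x', intent='out', description='predicted u')

    def run_step(self, dt):
        self.u_pred = self.u + dt * sum(self.fluxes)

@xs.process
class PredictedUpwindAdvectionFlux(UpwindAdvectionFlux):
    # duplicated by inheritance: only u and flux are re-declared; run_step is
    # inherited and computes the same upwind expression on the predicted value
    u = xs.foreign(PredictorStage, 'u_pred', intent='in')
    flux = xs.variable(dims='x', intent='out', group='predicted_fluxes')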

it could be possible to define this kind of linking in another way, for instance at model creation

I'm very open to any suggestion to improve the situation! That said, I think that the linking between processes should remain as automatic as possible (that's a core feature of the framework)!

Define processes as functions

I haven't really thought about that but I like the idea!

Process classes are still useful, though. I have a few use cases where a process holds some data, e.g., some working arrays that are allocated at the beginning of a simulation and updated later, some state variables recorded for the last n time steps or complex objects from wrapped code dealing with more than one variable... Classes are also useful to implement some variables computed "on-demand".

In cases where a process doesn't hold any data, I agree that using functions would be simpler and much easier to test!

I think that both classes and functions might co-exist.

I don't see any major implementation issue. We could decorate those functions like we do for classes to extract/handle/stick all the information needed to build models. Function arguments consist of variables with intent=in and the returned value(s) are related to variable(s) with intent=out (intent=inout is not supported with functions).
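
For example, a very rough sketch of such a decorator (hypothetical API, not part of xarray-simlab):

import inspect

def process_function(func):
    # register each annotated argument as an intent='in' variable and the
    # return annotation as an intent='out' variable
    sig = inspect.signature(func)
    in_vars = {name: param.annotation for name, param in sig.parameters.items()}
    out_var = sig.return_annotation
    # ...here we would build and attach the same variable metadata that the
    # @xs.process class decorator extracts from attr.ib declarations...
    func.__xsimlab_vars__ = (in_vars, out_var)  # hypothetical attribute
    return func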

My main concern is that we'll probably have to maintain two different systems for declaring variables in processes, e.g., attr.ib wrappers for class variables and typing-derived classes for annotated functions.

Time-independent models

this framework would be easily extendable to other kind of models than the time-dependent ones.

Yes that's definitely the aim of this framework: to be as generic as possible.

If we support both classes and functions for processes, maybe a run method in process classes could improve clarity when used in time-independent models? The general order of simulation steps would be initialize -> n_steps*[run_step -> finalize_step] -> run -> finalize. Implementing the related methods is entirely optional in a process class, so it's easy to just ignore all methods but the .run() one. For process functions, an argument to the decorator would help define the simulation stage (run being the default?).
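
Schematically, the driver loop would then look like this (a sketch only, with every stage method optional):

def maybe_call(process, stage, *args):
    # every simulation stage method is optional in a process class
    method = getattr(process, stage, None)
    if method is not None:
        method(*args)

def run_simulation(processes, n_steps, dt):
    for p in processes:
        maybe_call(p, 'initialize')
    for _ in range(n_steps):
        for p in processes:
            maybe_call(p, 'run_step', dt)
        for p in processes:
            maybe_call(p, 'finalize_step')
    for p in processes:
        maybe_call(p, 'run')  # proposed stage for time-independent models
    for p in processes:
        maybe_call(p, 'finalize')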

@mancellin
Author

Hi @benbovy,

Thanks for your answer. Some comments below:

Doc enhancement / process naming

On a related note, I've often found it difficult to find a good name for a process. Although it's natural to choose a name that describes what the process does, sometimes it doesn't work very well. In your example, EulerForwardTimeIntegration / time_scheme is explicit, but then when you use the xarray extension you end up with a variable named time_scheme__u in your input/output datasets, which is not very meaningful. A name like profile__u is slightly easier to understand IMO, especially when the dataset is saved and loaded later in another context.

Ok, I understand the ProfileU process better now. It is meant to be a container for the variable u, and its role as time integrator is kind of a coincidence. Maybe the problem here is that u is shared by several processes and does not really need to belong to any process in particular.

Process linking

I think that the linking between processes should remain as automatic as possible (that's a core feature of the framework)!

I totally agree. The builder should be able to guess which output to plug on each input, based on the shape, physical unit or name of the variables.

My criticism of the current implementation is that writing explicitly that the input of a process should come from another specific process (in xs.foreign) hinders modularity. It could be necessary in some cases, for instance when there are too many variables and not enough metadata to distinguish them. But then, it would be better to make the link an option to the model builder rather than hard-code it in the process.

Define processes as functions

Process classes are still useful, though. I have a few use cases where a process holds some data, e.g., some working arrays that are allocated at the beginning of a simulation and updated later, some state variables recorded for the last n time steps or complex objects from wrapped code dealing with more than one variable...

Could you make those classes callable? For instance:

import numpy as np

@process
class InitialGaussianProfile:
    loc = Variable(kind='position', description='location of initial profile')
    width = Variable(kind='length', description='width of initial profile')

    def __call__(self, x: Variable(kind='position')) -> Variable(dims=('x',), kind='u'):
        return np.exp(-1 / self.width**2 * (x - self.loc)**2)

So you have basically a function with attributes.
Note that it solves the naming problem by separating the private parameters that define a process from the variables that just pass through it.
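
Usage would then look like this (still assuming the hypothetical Variable and @process from my mock-up):

import numpy as np

profile = InitialGaussianProfile(loc=0.5, width=0.1)
x = np.linspace(0.0, 1.0, 101)
u0 = profile(x)  # a plain function call, easy to unit-test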

Other time stepping or time-independent models

One limitation is that by design no cycle is allowed in the graph of process dependencies.

My suggestion is to have processes taking subprocesses as arguments. It complicates the implementation of the model builder and the graph representation, but I don't see any other way to assemble complex models in a generic framework without dependency cycles.

The general order of simulation steps would be initialize -> n_steps*[run_step -> finalize_step] -> run -> finalize. Implementing the related methods is entirely optional in a process class, so it’s easy to just ignore all methods but the .run() one.

The time variable would then be a normal variable used in a "metaprocess" taking processes as arguments. That's what I tried to do in my function-based mockup.
The "standard library" of xsimlab could then come with a generic process of the form:

@process
class TimeDependentSimulation:
    step_runners: List[Process]

    def __call__(self, clock: Variable(dims=('t',)), ...) -> ...:
        for t in clock:
            for run_step in self.step_runners:
                ... = run_step(...)
        return ...


model = Model(
    {
        some_initialization_process,
        TimeDependentSimulation(step_runners=[some_function, ...]),
        some_time_independent_process,
        ...
    }
)

And the model would then have just a single run step.
The problem is figuring out how to build the dependency graph when you have processes containing other processes. Maybe this is a bit too complicated, and your solution is better in the short term.

@benbovy
Member

benbovy commented Sep 12, 2018

Maybe the problem here is that u is shared by several processes and does not really need to belong to any process in particular.

Actually, nothing prevents the user from creating a "process" class that only declares one or more variables without doing any computation. Such classes could be abstract (to be subclassed) or could serve as glue (i.e., declare variable aliases) between two sets of process classes using different naming conventions for the same physical fields. So maybe the problem is the name "process", which doesn't accurately represent the concept here.

The framework encourages the creation of small components, so it might make sense to have a process class that only declares u. However, in practice this could result in a lot of classes, which is a bit tough to deal with in Model objects. That's why in the example in the docs I've created a single class to declare u and to implement the time integrator (the main driver).
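
For instance, a minimal variable-only class would be:

import xsimlab as xs

@xs.process
class ProfileU:
    # no computation here: this class only declares (and "owns") u
    u = xs.variable(dims='x', intent='inout', description='conserved quantity')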

The builder should be able to guess which output to plug on each input, based on the shape, physical unit or name of the variables.

This is another approach that I've considered and which is used by other (more domain-specific) modelling frameworks such as landlab. Both approaches have pros and cons. Here I've chosen not to rely on standards (names, units, etc.) because I wanted the framework to be as generic as possible and flexible enough that it's easy to connect model components, e.g., written by people working in different domains / using different standards. Standards are good but sometimes they are not very flexible. Unfortunately, I haven't found any better alternative than xs.foreign.

But then, it would be better to make the link an option to the model builder rather than hard-code it in the process.

I've also considered this, but in early mockups it quickly became ugly to (re)link at model creation all the variables declared in each process. I really want to keep the creation of a new Model very simple, i.e., ideally requiring no more than just a collection of processes. Fortunately, things like subclassing processes or xs.group already address some issues with hard-coding linked processes (only to a limited extent, I admit).

Could you make those classes callable?

There might indeed be some advantages to splitting the variable declarations as you show in your code, but I prefer a clearer separation between the declarative and computational parts, i.e., one declarative block at the top of the class and then methods for computation below is cleaner IMO.

Also, I'm not a big fan of encouraging the use of dunder methods like __call__ in the public API (process classes are end-user code).

My suggestion is to have processes taking subprocesses as arguments.

A nested collection of processes indeed looks too complicated to me. I can see how difficult it would be to deal with when implementing features like a callback system for model diagnostics/debugging/logging, or even basic things like the repr.

There exist other, possibly better approaches to solve the problem of advanced time stepping schemes, time-independent models and cycles. For example, we could design a more generic/flexible way to define the passes through the graph of processes during a simulation, with possible filtering. Currently, the framework defines 4 (optional) simulation stages, but we could imagine these stages somehow defined by the user (maybe with repeating/nested patterns, e.g., time steps coupled to one or more other iterative schemes). Regarding the API, decorating methods in process classes might help here (see, e.g., #40).

@mancellin
Author

Actually, nothing prevents the user from creating a "process" class that only declares one or more variables without doing any computation. Such classes could be abstract (to be subclassed) or could serve as glue (i.e., declare variable aliases) between two sets of process classes using different naming conventions for the same physical fields. So maybe the problem is the name "process", which doesn't accurately represent the concept here.

It's interesting to notice that we are thinking in different programming paradigms.
You've designed xsimlab with an imperative programming paradigm (a process is a sequence of instructions that affect preexisting mutable variables), while I'm imagining it with a functional programming mindset (a process is a mathematical function and the model is a fancy composition of functions).
I guess they both have pros and cons.

For instance, I see xs.foreign as a side effect of the imperative design of the code, and I think it is possible to get rid of it.
I'll make a small prototype to try to make my case for a more functional design.

@benbovy
Member

benbovy commented Sep 13, 2018

I think that it would be great if we could add to xarray-simlab the capability to create models using Python functions as processes. However, I don't think that a purely functional design is possible or wanted here.

A purely functional, stateless approach might not always be the best or the most natural. I have use cases where using exclusively immutable variables is simply not an option (e.g., allocating/destroying big arrays in memory at each time step). After all, a computational model is often viewed as a set of instructions that update one or more state variables.

That being said, some sort of functional style is already supported! By appropriately setting intent for variables in each process class, it is actually totally possible to produce a model with only immutable variables:

  • For intent=in variables, it is not possible to set their value within their process class:
    • the xs.process class decorator turns these variables into read-only properties
    • a value may still be set either in another process class (using xs.foreign) or by the user as model input
  • For intent=out variables, it is not possible to set their values outside of their process class:
    • they'll never be model inputs
    • an error is raised at model creation if intent=out is also set for any of its references (i.e., an xs.foreign) declared in another process class
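
To illustrate with a minimal sketch:

import xsimlab as xs

@xs.process
class SomeProcess:
    a = xs.variable(intent='in')   # read-only within this class
    b = xs.variable(intent='out')  # settable only within this class

    def run_step(self, dt):
        self.b = 2 * self.a
        # self.a = 0.0  # would fail: 'a' is turned into a read-only property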

So, even though xarray-simlab uses Python classes to define processes, your definition of a process (a mathematical function...) may still be perfectly valid here!

What you suggest is not much different: using Python functions instead of classes where intent is implicit (intent=in for function arguments and intent=out for returned values). As I said before, I'm not opposed to also eventually support Python functions as processes, even though it will certainly raise some design questions. I'm one of those who think that functions are generally easier to deal with than classes, and in some cases like yours it might be nicer.

@benbovy
Member

benbovy commented Sep 13, 2018

Rather than following an imperative paradigm, I've designed xarray-simlab as an object-oriented layer on top of a common data store for the simulation. Process classes are just a way to split the overall computation of a model into building blocks and to provide some "syntactic sugar" for accessing the simulation data store in a consistent manner (i.e., with safeguards).

Off the top of my head, I can imagine decorated (and annotated) Python functions that could be syntactic sugar for creating simple process classes that are also callable. It should be reasonably straightforward to make that work with everything else. The big remaining question is how to define variables with type annotations.
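
For instance (a sketch only, none of this is an existing API), the metadata could be carried by plain annotation objects and collected through __annotations__:

class Variable:
    # hypothetical annotation object holding the variable metadata
    def __init__(self, dims=(), name=None, description=''):
        self.dims = dims
        self.name = name
        self.description = description

def flux(u: Variable(dims=('x',), name='u'), v: float) -> Variable(dims=('x',), name='flux'):
    ...

# what the decorator could collect:
annotations = dict(flux.__annotations__)
out_meta = annotations.pop('return')  # metadata of the intent='out' variable
in_meta = annotations                 # metadata of the intent='in' variables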

@mancellin
Author

I guess you're right, you're more familiar with the design constraints than I am.
I'll take a closer look at the workings of the code and the implementation details.

@mancellin
Author

As you suggested above, I've started drafting a PR to improve the example in the documentation.

I've just noticed that we don't really need to pass the grid spacing from grid to advect: if we use a DataArray, all the information should already be in the coordinates of the array.

But, assuming we keep it as it is, shouldn't there be an arrow linking directly grid to advect in the dependency graph, since the latter requires a variable from the former?

@benbovy
Member

benbovy commented Oct 11, 2018

I've just noticed that we don't really need to pass the grid spacing from grid to advect: if we use a DataArray, all the information should already be in the coordinates of the array.

Actually, DataArray objects are not intended to be passed or shared between the model processes during a simulation. xarray Datasets are just used for simulation I/O, whereas the simulation runtime data shared between the model processes consists of simple structures such as primitives or numpy arrays. Simulation runtime data is only data. Metadata like dimension labels or attributes are already defined in the variables (xs.variable()) declared in each process.

But, assuming we keep it as it is, shouldn't there be an arrow linking directly grid to advect in the dependency graph, since the latter requires a variable from the former?

Process dependencies are defined as follows: process 1 depends on process 2 if the latter declares a variable (resp. a foreign variable) with intent='out' that itself (resp. its target variable) is needed in process 1.

In the doc examples (e.g., model2), grid.spacing has intent='in'. In fact, it is a model input, that is, a value for grid spacing must be set by the user before even starting the simulation. So, in this specific case, the advect process does not need the grid process to be run before getting the value of grid.spacing, hence no direct dependency grid -> advect.
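
In code, the situation is (a minimal sketch with simplified class names):

import xsimlab as xs

@xs.process
class Grid:
    spacing = xs.variable(description='grid spacing', intent='in')  # model input

@xs.process
class Advect:
    dx = xs.foreign(Grid, 'spacing')  # intent='in' by default
    # no arrow Grid -> Advect in the graph: spacing is provided by the user,
    # not computed by Grid during the simulation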

@mancellin
Author

Simulation runtime data is only data.

Ok, it makes sense.

Process dependencies are defined as follows: process 1 depends on process 2 if the latter declares a variable (resp. a foreign variable) with intent='out' that itself (resp. its target variable) is needed in process 1.

Ok, I understand. In the graph, the arrows represent dependency in the sense of the order of execution. I was thinking of dependency in the sense of requiring the other process to also be part of the model.

@benbovy
Member

benbovy commented Oct 12, 2018

In the graph, the arrows represent dependency in the sense of the order of execution. I was thinking of dependency in the sense of requiring the other process to also be part of the model.

You're right. "dependency" is confusing here. Maybe "workflow dependency" or "runtime dependency" would be better to clearly differentiate from "variable dependency".

@benbovy
Member

benbovy commented Jan 16, 2020

I'm closing this issue, as one of the main concerns here (process unit-testing) has been addressed in #63.

@mancellin feel free to open new issues or comment on the relevant open issues if you want to further discuss other points raised here.

@benbovy benbovy closed this as completed Jan 16, 2020
@mancellin
Author

Sure, you can close the issue.

Sorry for never finishing the documentation PR I mentioned above.

Since then, I've been scratching my itch by doing some experimentation on my side: https://github.com/mancellin/labelled_functions
The idea is to add labels on the inputs and outputs of Python functions, similarly to xarray adding labels to the dimensions of numpy arrays.
It makes it easier to interact with labelled data and to compose functions.
The scope and implementation are different from xarray-simlab, but your work was a major source of inspiration.
