Hi,
I've discovered xarray-simlab recently and it has caught my attention.
The project looks very promising: I'm using xarray more and more in my models, and a framework that helps with assembling and running models could improve my workflow a lot.
These days I mostly work with frequency-domain models, but I guess it would be easy to extend xarray-simlab to this kind of model; they are actually simpler than time-dependent ones.
After reading the doc and playing with some toy cases, I'd like to share some feedback about the API.
Hopefully, the issue tracker is not such a bad place for that.
I'm very impressed by the implementation, in particular the automatic assembling of the processes into a model. The integration with xarray is quite promising. The effort put into the documentation and the packaging of the project is also great!
However, I am not totally convinced by the way the processes are defined.
Let's consider the example from the tutorial. The `ProfileU` and `u_vars` abstractions were not very clear to me, so I rewrote them in the framework of conservation laws, with which I'm more familiar:
```python
import numpy as np
import xsimlab as xs


@xs.process
class EulerForwardTimeIntegration:
    fluxes = xs.group('fluxes')
    u = xs.variable(dims='x', intent='inout', description='conserved quantity')

    def run_step(self, dt):
        self.sum_of_fluxes = dt * sum(self.fluxes)
        # ^ note that you don't need a generator here

    def finalize_step(self):
        self.u += self.sum_of_fluxes


@xs.process
class UpwindAdvectionFlux:
    v = xs.variable(description='velocity', intent='in')
    dx = xs.foreign(UniformGrid1D, 'dx')  # UniformGrid1D is the grid process from the tutorial
    u = xs.foreign(EulerForwardTimeIntegration, 'u', intent='in')
    flux = xs.variable(dims='x', intent='out', group='fluxes')

    def run_step(self, dt):
        # first-order upwind discretization of the advective flux
        if self.v > 0.0:
            self.flux = self.v * (np.roll(self.u, 1) - self.u) / self.dx
        else:
            self.flux = self.v * (self.u - np.roll(self.u, -1)) / self.dx
```
The full code is here.
I love the modularity of the design, allowing me to easily replace the advection term by another one or to add more of them.
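For reference, assembling and swapping processes looks something like this (a sketch, assuming the tutorial's `UniformGrid1D` process and a hypothetical `LaxWendroffAdvectionFlux` alternative):

```python
import xsimlab as xs

model = xs.Model({
    'grid': UniformGrid1D,
    'advection': UpwindAdvectionFlux,
    'time': EulerForwardTimeIntegration,
})

# replace the advection term by another scheme without touching the rest
other_model = model.update_processes({'advection': LaxWendroffAdvectionFlux})
```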
However, I was not able to easily modify the time integration process to get a second-order scheme such as:
```python
@xs.process
class RK2TimeIntegration:
    fluxes = xs.group('fluxes')
    u = xs.variable(dims='x', intent='inout', description='conserved quantity')

    def run_step(self, dt):
        predicted_u = self.u + dt * sum(self.fluxes)
        # Missing something like: predicted_fluxes = Flux(predicted_u)
        self.next_u = (self.u + predicted_u) / 2 + dt / 2 * sum(predicted_fluxes)

    def finalize_step(self):
        self.u = self.next_u
```
To do that, I would have to create new advection flux classes:

- The explicit reference to `EulerForwardTimeIntegration` in the foreign variable of `UpwindAdvectionFlux` is not a big problem. It could be possible to define this kind of linking in another way, for instance at model creation.
- What bothers me more is that I would have to add, in all `Flux` classes, an expression mapping `predicted_u` to `predicted_flux` which would be the same as the one mapping `u` to `flux` (see the sketch after this list). In other words, the process `UpwindAdvectionFlux` should be a function accepting arguments other than `u`.
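To make the duplication concrete, here is roughly what one of those classes would look like (a sketch; it assumes `RK2TimeIntegration` exposes `predicted_u` as an output variable, which is yet more boilerplate, and even then the predictor/corrector dependency would be circular within a single step):

```python
@xs.process
class PredictedUpwindAdvectionFlux:
    v = xs.variable(description='velocity', intent='in')
    dx = xs.foreign(UniformGrid1D, 'dx')
    predicted_u = xs.foreign(RK2TimeIntegration, 'predicted_u', intent='in')
    predicted_flux = xs.variable(dims='x', intent='out', group='predicted_fluxes')

    def run_step(self, dt):
        # the exact same stencil as UpwindAdvectionFlux, only the names differ
        if self.v > 0.0:
            self.predicted_flux = self.v * (np.roll(self.predicted_u, 1) - self.predicted_u) / self.dx
        else:
            self.predicted_flux = self.v * (self.predicted_u - np.roll(self.predicted_u, -1)) / self.dx
```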
And this last point illustrates my general feeling about the current API: most of the process classes could be rewritten as functions, and maybe (functional-programming aficionado speaking) they should be.
The wrapping around `attrs` is quite nice, but the `xs.process` classes are not data holders, they are functions, and the `xs.variable` objects are function inputs and outputs with annotations.
So maybe the best way to define the processes is to define functions. The metadata on the variables could be passed via the docstring or the type annotations, e.g.:
```python
def upwind_advection_flux(
    u: Variable(dims=("x",), name='u'),
    v: float,
    dx: float,
) -> Variable(dims=("x",), name='flux'):
    if v > 0.0:
        return v * (np.roll(u, 1) - u) / dx
    else:
        return v * (u - np.roll(u, -1)) / dx
```
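With processes as plain functions, the RK2 scheme above becomes trivial to write, because a flux is just something you can call again on the predicted state. A sketch with hypothetical names, binding the parameters with `functools.partial`:

```python
from functools import partial

def rk2_time_integration(u, fluxes, dt):
    # Heun's method: re-evaluate the fluxes at the predicted state
    predicted_u = u + dt * sum(f(u) for f in fluxes)
    return (u + predicted_u) / 2 + dt / 2 * sum(f(predicted_u) for f in fluxes)

fluxes = [partial(upwind_advection_flux, v=1.0, dx=0.01)]
```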
I made a mock-up of the whole model here.
The code would be (in my humble opinion) both more flexible and more robust.
Indeed, the processes could easily be unit-tested, unlike the current process classes.
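For example, a test for the flux function above could be as simple as (the expected values follow directly from the upwind stencil):

```python
import numpy as np

def test_upwind_advection_flux():
    u = np.array([0.0, 1.0, 0.0, 0.0])
    # v > 0: backward difference, flux = v * (np.roll(u, 1) - u) / dx
    flux = upwind_advection_flux(u, v=1.0, dx=1.0)
    np.testing.assert_allclose(flux, [0.0, -1.0, 1.0, 0.0])
```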
I don't know exactly how the model builder works, but it should still have enough clues to link the processes together.
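For instance, the annotations already carry the linking information; a hypothetical helper (all names assumed) could recover it:

```python
def process_variables(func):
    """Hypothetical helper: read Variable metadata from a function's annotations."""
    hints = dict(func.__annotations__)
    output = hints.pop('return', None)
    return hints, output

inputs, output = process_variables(upwind_advection_flux)
# inputs -> {'u': Variable(...), 'v': float, 'dx': float}, output -> Variable(..., name='flux')
```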
Note finally that this framework would be easily extendable to other kinds of models than time-dependent ones.
These were my two cents. I'd be happy to discuss it further if you want.
Cheers,