Description
There is a desire to have a generic stream abstraction in nmigen.lib which can be used / presented by the nmigen-stdio cores (among others). This issue exists to capture discussion around the design of such an abstraction.
Obvious places to look for inspiration here include AXI4-Stream and Avalon-ST. Wishbone, although common in OSHW designs, does not define a specific streaming mode and thus is quite complex for the streaming use case. It's also underspecified, so "Wishbone compatibility" is less meaningful than we might hope.
AXI4-Stream uses a simple ready-valid handshake for flow control, which is non-optional. It requires all transfers to be integer multiples of 8 bits, and supports both "packet" and "frame" abstractions for higher-level structure over the octet stream. It also supports sparse data streams and multiplexing multiple logical streams onto a single physical stream (see TID). All of these elements except for the "valid" half of the handshake are, technically speaking, optional. Anecdotally, I found the spec quite readable, although of course implementation may be another matter.
Avalon-ST is quite similar to AXI4-Stream with the following notable differences:
- No formal abstraction above "packet" (no "frame")
- Support for reporting errors
- Symbols need not be 8 bits
- Support for "ready latency" - ready is asserted zero or more cycles before the sink is actually ready to accept data - and "ready allowance" - each assertion of ready allows the source to send zero or more beats after ready is deasserted
Anecdotally, I found the Avalon-ST spec less readable than the AXI4-Stream spec, but only slightly.
My first-pass suggestion is a design much like Avalon-ST, but fixing the value of the beatsPerCycle attribute to 1 and readyLatency and readyAllowance to 0.
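To make the suggested semantics concrete, here is a minimal pure-Python sketch of a ready/valid link with ready latency 0 and one beat per cycle: a beat is transferred only in a cycle where valid and ready are both high. It is a behavioral model, not nMigen code, and the names Beat, simulate and transfers are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    """One clock cycle of a ready/valid link (hypothetical model, not an nMigen type)."""
    valid: bool
    ready: bool
    payload: int


def simulate(data, sink_ready):
    """Drive `data` through the link; `sink_ready` gives the sink's ready for each cycle."""
    pending = list(data)
    trace = []
    for ready in sink_ready:
        valid = bool(pending)
        # The source holds the same payload until the sink accepts it.
        payload = pending[0] if valid else 0
        trace.append(Beat(valid, ready, payload))
        if valid and ready:
            pending.pop(0)  # transfer happens in this very cycle (ready latency 0)
    return trace


def transfers(trace):
    """Payloads that actually moved: cycles where valid and ready are both high."""
    return [beat.payload for beat in trace if beat.valid and beat.ready]


trace = simulate([1, 2, 3], [False, True, True, False, True])
print(transfers(trace))
```

With a non-zero ready latency the transfer condition would refer to ready from an earlier cycle, which is the complication the suggestion above avoids.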
Activity
whitequark commented on Feb 7, 2020
I think it would be very useful to have cheap adapters to/from both AXI4-Stream and Avalon-ST, so it makes sense to restrict the features provided by nMigen streams to a subset supported by both. For example, I think we should not provide error reporting. One exception is that non integer multiple of octet wide streams are extremely useful, and the library primitive should not be restricted only to 8*n data widths.
cc @enjoy-digital @mithro
awygle commented on Feb 7, 2020
I agree with this.
I don't _dis_agree with this, but I did want to point out that it seems very possible to use AXI4-Stream's TUSER signals for error reporting, if we mandated that cores do so. I do think we shouldn't worry about error reporting initially (it can always be added later) but don't see the feature-subset argument as persuasive.
I think I'm slightly less enthused about non-octet data streams than you are, but it seems fairly easy to build this capability on top of AXI4-Stream for an adapter if need be (it will just be a slightly more complex adapter). So, no objections.
A motivating example (or more than one) seems like it would be useful for this discussion - it might help center our thinking.
whitequark commented on Feb 7, 2020
In my view, non-octet data streams are not particularly inherently valuable, but their value comes from not imposing restrictions on user code. Consider an emitter of raw 8b10b symbols from a SERDES. Such an emitter would very naturally use a 10 bit wide stream, could be connected to a 10 bit wide FIFO, or a 10→20 bit upconverter, etc.
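As a rough illustration of the 10→20 bit upconverter mentioned above, the following pure-Python sketch packs pairs of 10-bit symbols into 20-bit words. The function name upconvert and the choice to place the first symbol in the least significant bits are assumptions made for this example; a real core would make the ordering configurable or document it.

```python
def upconvert(symbols, in_width=10, ratio=2):
    """Pack `ratio` symbols of `in_width` bits each into one wider word.

    Hypothetical helper: the first symbol lands in the least significant bits.
    Returns (words, pending), where `pending` holds symbols still waiting
    for a full output word.
    """
    mask = (1 << in_width) - 1
    words = []
    pending = []
    for symbol in symbols:
        if symbol & ~mask:
            raise ValueError(f"symbol {symbol:#x} does not fit in {in_width} bits")
        pending.append(symbol)
        if len(pending) == ratio:
            word = 0
            for index, part in enumerate(pending):
                word |= part << (index * in_width)
            words.append(word)
            pending = []
    return words, pending


words, pending = upconvert([0x3FF, 0x001, 0x155, 0x2AA, 0x0AB])
print([hex(w) for w in words], pending)
```

Nothing in the packing logic depends on the width being a multiple of 8, which is the point of not restricting the primitive to octet-aligned payloads.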
mithro commented on Feb 7, 2020
Couple of random thoughts - not really all that coherent;
start of packet / frame / etc type synchronisation signal has been super useful for Ethernet, video stuff, etc.
enjoy-digital commented on Feb 7, 2020
If that can help, in LiteX we have something with LiteX's stream (https://github.com/enjoy-digital/litex/blob/master/litex/soc/interconnect/stream.py) that seems very similar in the idea to what you want to achieve. This is basically a mix between AXI-4/Avalon-ST: with only
LiteX's stream can be directly mapped to AXI-4 and adapted easily to AvalonST: https://github.com/enjoy-digital/litex/blob/master/litex/soc/interconnect/avalon.py. I really don't find AvalonST convenient to use in systems due to the latency between valid/ready, which makes creating and debugging cores complicated. So for nMigen's stream, you should probably reuse more from AXI-4 than AvalonST and just provide adapters to AXI-4 (which should be almost direct mapping) and AvalonST (which should only be adaptation of the valid/ready latency).
enjoy-digital commented on Feb 7, 2020
In addition to my previous comment: I just wanted to share some feedback on what is already there in LiteX and has already been very useful: streams can simplify things a lot! With a good library of base elements: Converter/FIFOs/Mux/Demux/Gearbox/etc... + #213 on top, it could become very powerful! I'm not able to be heavily involved in nMigen development (due to other obligations), so just wanted to share some feedback. If you end up having something close to what is already in LiteX and better implemented, I'll probably be a user of that in the future.
whitequark commented on Feb 7, 2020
LiteX has certainly been an inspiration for this. I suspect the final design in nMigen would be close to it conceptually, with many of the changes addressing the experience of using LiteX streams for all these years.
awygle commented on Feb 20, 2020
Taking a pass at @mithro's comments (inline below):
It's not clear to me that streams have anything to do with resets, or indeed that they necessarily have state to be reset.
I propose that we mandate a stream's endpoints be in the same clock domain. We can use a wrapper around an AsyncFIFO to cross streams between clock domains as needed (terminate stream A in clock domain A, initiate stream B in clock domain B).
As far as rates go, that's an interesting question, as is how we'd go about multiplexing multiple streams onto one faster/wider stream. This requires more thought.
I'm not sure about pipelines, but I see value in Stream-interface wrappers around FIFOs of both types. I don't think streams should inherently contain FIFOs or anything like that.
Agreed, we should support this (optionally, I would argue).
This is basically the TUSER stuff from AXI4-Stream. I think we should start without this, and if we see it as a big weakness, add it later. In general I think we should be very conservative about adding features to streams - we should start with the simplest thing that can possibly work.
This is interesting but I'm not sure it actually has much to do with streams? Unless you're saying you want typechecking on the payload types of streams which are connected together, which does sound like a good thing to have.
anuejn commented on Feb 3, 2021
I began to prototype Stream abstractions for nMigen to get a better feeling for how this could work, both from a conceptual point of view and in terms of what the concrete ergonomics could be like. Here I will link to some code (that you can read if you are into that or just skip) and try to sum up some of the things I learned from this.
Based on the discussions in here I went with a design that has AXI-Stream like ready/valid signals & a payload.
Then there are some specializations of that basic stream that have metadata signals. E.g. there is a PacketizedStream that has an AXI-Stream like last signal that indicates the end of a packet. For Image data (I am trying to write gateware for a camera, so this is one of my main use cases) I have an ImageStream with frame_last and line_last signals.
Based on this design I wrote a few cores / wrappers for nmigen standard library cores: FIFOs, a Gearbox, cores for hdmi output and hispi input, an AXI DRAM reader & writer, kernel-like image processing (usage example), a Huffman encoder & the corresponding bit packing.
So here are some things I liked, disliked & learned from my prototype:
- valid must not depend combinatorially on ready is quite handy. I found myself building cores that are not compatible with each other and deadlock because I didn't follow this rule.
- id (like a "write address" and a "write data" stream).
- ImageStream and PacketizedStream) feels somewhat inelegant but worked okay for me and allowed me to reuse some blocks nicely.
- first signaling rather than last signaling. However, I haven't found a way that convinced me yet (and preserves the ability to have generic metadata signals with specific names).
alanvgreen commented on Jul 12, 2021
@enjoy-digital You wrote that parameters can evolve "at each start of packet" Did you mean frame instead of packet?
Also, how useful have parameters been in practice? Do you think they form part of a stream specification, or would it be better to decouple them?
alanvgreen commented on Aug 2, 2021
Here's a list of Streams in nmigen I've found:
alanvgreen commented on Aug 16, 2021
After reading through this issue I made a streams implementation as part of one of the accelerators on CFU-Playground. I'm not completely happy with it, but after using it, and looking over other implementations, I have many opinions.
Minimum Viable Stream API
A minimum stream API would include:
- payload: transferring this data is the whole point.
- valid and ready: for flow control.
Most of the useful functionality built with streams uses just these three signals.
It is useful to allow payloads to be either single Signals or records of Signals as the application requires. In the LiteX codebase, an interesting example is the pair of WishboneDMAReader and WishboneDMAWriter. The reader uses two streams of Signal()s: one for address, the other for data, while the writer uses a single stream with (addr, data) both encoded in the stream.
On the Meaning of valid and ready
There are plenty of valid reasons why an endpoint might either ignore valid or ready or assert them constantly. For example, a video PHY may need to consume data every single clock cycle in order to generate a correct video signal. It might therefore have a sink that always asserts ready, and always reads the payload regardless of the state of valid.
Endpoints like this should be explicitly allowed, so long as the behavior is well documented.
On the Use of Frames
LiteX includes first and last signals in its streams, which define frames. I have many opinions:
Source vs Sink is confusing
It can be difficult to write or understand code using streams. When using components with a stream interface, data goes into a sink and comes out of a source, but when implementing those components the programmer sees the other side of the endpoint, and data comes out of the sink and goes into the source.
To reduce this confusion, I have had some success:
I see the Luna codebase addresses some of these issues by paying careful attention to naming with streameq() and attach() as aliases for connect().
Other observations
- valid, ready, payload and last - is still obviously useful.
Finally, here's that diagram:
rroohhh commented on Aug 16, 2021
@alanvgreen Some things that came to mind:
I agree with your assessment that there are plenty of cases where something might want to use a stream-like interface without obeying all of the rules. However I think one has to be very careful when doing that, as it can deadlock the whole pipeline of streams.
A good idea here might be to use separate types for streams with relaxed rules, so one has to explicitly insert some kind of "converter" to connect it to a "real" stream.
Furthermore, in naps it was very useful to formally check that the modules using streams actually obey the rules of the stream interface:
- valid not depending on ready
- valid is asserted and until ready is asserted
We were able to catch a number of bugs using these two simple checks (see here for the implementation)
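For illustration only, here is a small pure-Python checker that applies a related rule to a recorded simulation trace rather than formally. It assumes the second rule means that a source keeps valid asserted and its payload unchanged until ready is seen; that reading, and the name check_stream_rules, are assumptions for this sketch and are not taken from the naps implementation. A purely combinational dependency of valid on ready cannot be proven absent from a trace, so this only catches its observable symptoms.

```python
def check_stream_rules(trace):
    """Check a per-cycle trace of (valid, ready, payload) tuples.

    Hypothetical helper: reports cycles where a stalled source (valid high,
    ready low in the previous cycle) dropped valid or changed its payload.
    """
    violations = []
    for cycle in range(1, len(trace)):
        prev_valid, prev_ready, prev_payload = trace[cycle - 1]
        valid, _ready, payload = trace[cycle]
        stalled = prev_valid and not prev_ready
        if stalled and not valid:
            violations.append(f"cycle {cycle}: valid dropped before ready")
        elif stalled and payload != prev_payload:
            violations.append(f"cycle {cycle}: payload changed while stalled")
    return violations


good = [(True, False, 5), (True, True, 5), (False, True, 0)]
bad = [(True, False, 5), (True, True, 6)]
print(check_stream_rules(good))
print(check_stream_rules(bad))
```

A formal check has the advantage of covering all reachable states, whereas a trace check like this one only covers the stimulus that was actually simulated.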
At least last can be useful in cases where one buffers a number of data beats to then write them all out in one go.
An example of this is the DramPacketRingbufferStreamWriter in naps. It writes a stream of data to dram using the AXI interface. The AXI interface works in bursts, meaning you can transmit a single base address which then gets used for a number of data words. To minimize the number of idle cycles on the AXI bus the DramPacketRingbufferStreamWriter tries to create as big bursts as possible (16 words in AXI3). If however the total number is not divisible by 16 this would normally cause some words to get stuck. With a last signal however the core can flush the rest of the buffered data accordingly.
(The implementation is quite convoluted, but most of this can be found here)
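The flushing behavior can be sketched in a few lines of pure Python. This is not the naps implementation; group_into_bursts is a hypothetical helper that only models the grouping decision: emit a burst when the buffer reaches the maximum burst length, or early when a beat carries last.

```python
def group_into_bursts(beats, max_burst=16):
    """Group (payload, last) beats into bursts of at most `max_burst` words.

    Hypothetical model: returns (bursts, stuck), where `stuck` holds words
    that are still buffered because neither condition fired.
    """
    bursts = []
    buffer = []
    for payload, last in beats:
        buffer.append(payload)
        if len(buffer) == max_burst or last:
            bursts.append(buffer)  # full burst, or early flush triggered by last
            buffer = []
    return bursts, buffer


# 20 words with last on the final word: one full burst plus a flushed remainder.
with_last = [(i, i == 19) for i in range(20)]
bursts, stuck = group_into_bursts(with_last)
print([len(b) for b in bursts], stuck)

# The same 20 words without last: four words stay stuck in the buffer.
without_last = [(i, False) for i in range(20)]
bursts, stuck = group_into_bursts(without_last)
print([len(b) for b in bursts], stuck)
```

Without last, a design would need some other flush trigger such as a timeout, which adds latency and state that the in-band signal avoids.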
But I think in general I agree, last is usually related to the payload, and there is not even a clear unique definition of last for many types of streams. For example naps defines an ImageStream, which has both a line_last (which is asserted on the last pixel of a single line in the image) and frame_last (which is asserted on the last pixel of a whole frame). These are both very useful, for example line_last can be useful in a variety of video specific cores (like resizing, downsampling, etc), and frame_last can be very useful where whole images are needed (for example to switch to the next image buffer in memory after one image was written).
This is something we tried in naps as well, but actually found it more confusing down the line.
Take for example this:
Inside a module one always wants to connect data coming from a Sink to a Source (connect_upstream) and external to a module one wants to do the opposite, connecting data coming from a Source to a Sink. This constant switching between the directions was more confusing for us than simply having one type.
I think all of the discussions in this issue were about AXI4-Stream and Avalon-ST, not the "full" AXI4 and Avalon. The AXI4-Stream and Avalon-ST variants are not a lot more heavy-weight than the proposal here.