
Stream Abstraction for amaranth.lib #317

Closed
@awygle

Description

@awygle

There is a desire to have a generic stream abstraction in nmigen.lib which can be used / presented by the nmigen-stdio cores (among others). This issue exists to capture discussion around the design of such an abstraction.

Obvious places to look for inspiration here include AXI4-Stream and Avalon-ST. Wishbone, although common in OSHW designs, does not define a specific streaming mode and thus is quite complex for the streaming use case. It's also underspecified, so "Wishbone compatibility" is less meaningful than we might hope.

AXI4-Stream uses a simple ready-valid handshake for flow control, which is non-optional. It requires all transfers to be integer multiples of 8 bits, and supports both "packet" and "frame" abstractions for higher-level structure over the octet stream. It also supports sparse data streams and multiplexing multiple logical streams onto a single physical stream (see TID). All of these elements except for the "valid" half of the handshake are, technically speaking, optional. Anecdotally, I found the spec quite readable, although of course implementation may be another matter.

Avalon-ST is quite similar to AXI4-Stream with the following notable differences:

  • No formal abstraction above "packet" (no "frame")
  • Support for reporting errors
  • Symbols need not be 8 bits
  • Support for "ready latency" - ready is asserted zero or more cycles before the sink is actually ready to accept data - and "ready allowance" - each assertion of ready allows the source to send zero or more beats after ready is deasserted

Anecdotally, I found the Avalon-ST spec less readable than the AXI4-Stream spec, but only slightly.

My first-pass suggestion is a design much like Avalon-ST, but fixing the value of the beatsPerCycle attribute to 1 and readyLatency and readyAllowance to 0.
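
To make the handshake concrete: with readyLatency and readyAllowance both fixed to 0, a beat is transferred on exactly those cycles where valid and ready are both asserted. The following is a minimal behavioural sketch in plain Python, not HDL and not a proposed API. The function name `simulate_handshake` and the per-cycle list representation are assumptions made purely for illustration.

```python
def simulate_handshake(items, source_offers, sink_ready):
    """Model a ready/valid link with zero ready latency.

    items         -- payloads the source wants to send, in order
    source_offers -- per-cycle booleans: does the source start offering data?
    sink_ready    -- per-cycle booleans: can the sink accept data this cycle?
    Returns a list of (cycle, payload) pairs, one per completed transfer.
    """
    transfers = []
    pending = list(items)
    holding = False  # AXI4-Stream convention: once raised, valid stays up until accepted
    for cycle, (offer, ready) in enumerate(zip(source_offers, sink_ready)):
        valid = bool(pending) and (holding or offer)
        if valid and ready:
            # Both sides agree in the same cycle: the beat is transferred.
            transfers.append((cycle, pending.pop(0)))
            holding = False
        else:
            holding = valid
    return transfers


result = simulate_handshake(
    ["a", "b", "c"],
    source_offers=[True, True, False, True, True],
    sink_ready=[False, True, True, False, True],
)
```

Here "a" is offered in cycle 0 but only accepted in cycle 1, and "b" is offered in cycle 3 and accepted in cycle 4; "c" is never transferred because the trace ends first.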

Activity

whitequark

whitequark commented on Feb 7, 2020

@whitequark
Member

I think it would be very useful to have cheap adapters to/from both AXI4-Stream and Avalon-ST, so it makes sense to restrict the features provided by nMigen streams to a subset supported by both. For example, I think we should not provide error reporting. One exception is that streams whose width is not an integer multiple of an octet are extremely useful, and the library primitive should not be restricted only to 8*n data widths.

cc @enjoy-digital @mithro

awygle

awygle commented on Feb 7, 2020

@awygle
ContributorAuthor

I think it would be very useful to have cheap adapters to/from both AXI4-Stream and Avalon-ST, so it makes sense to restrict the features provided by nMigen streams to a subset supported by both.

I agree with this.

For example, I think we should not provide error reporting.

I don't _dis_agree with this, but I did want to point out that it seems very possible to use AXI4-Stream's TUSER signals for error reporting, if we mandated that cores do so. I do think we shouldn't worry about error reporting initially (it can always be added later) but don't see the feature-subset argument as persuasive.

One exception is that streams whose width is not an integer multiple of an octet are extremely useful, and the library primitive should not be restricted only to 8*n data widths.

I think I'm slightly less enthused about non-octet data streams than you are, but it seems fairly easy to build this capability on top of AXI4-Stream for an adapter if need be (it will just be a slightly more complex adapter). So, no objections.

A motivating example (or more than one) seems like it would be useful for this discussion - it might help center our thinking.

whitequark

whitequark commented on Feb 7, 2020

@whitequark
Member

I think I'm slightly less enthused about non-octet data streams than you are

In my view, non-octet data streams are not particularly inherently valuable, but their value comes from not imposing restrictions on user code. Consider an emitter of raw 8b10b symbols from a SERDES. Such an emitter would very naturally use a 10 bit wide stream, could be connected to a 10 bit wide FIFO, or a 10→20 bit upconverter, etc.
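
As a rough illustration of the 10→20 bit upconverter mentioned above, here is a behavioural sketch in plain Python. The function name `upconvert_10_to_20` and the choice to place the first symbol of each pair in the low 10 bits are assumptions; a real gateware implementation would pick its own ordering and would sit behind a stream interface.

```python
def upconvert_10_to_20(symbols):
    """Pack pairs of 10-bit symbols into 20-bit words.

    Assumption for this sketch: the first symbol of each pair goes into the
    low 10 bits. A trailing unpaired symbol is held back and not emitted.
    """
    words = []
    held = None
    for symbol in symbols:
        if not 0 <= symbol < (1 << 10):
            raise ValueError("symbol does not fit in 10 bits")
        if held is None:
            held = symbol  # wait for the second half of the pair
        else:
            words.append(held | (symbol << 10))
            held = None
    return words


words = upconvert_10_to_20([0x17C, 0x283, 0x3FF])
```

With an octet-only stream, each 10-bit symbol would first have to be padded or repacked, which is the kind of restriction on user code the comment argues against.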

mithro

mithro commented on Feb 7, 2020

@mithro

Couple of random thoughts - not really all that coherent;

  • Think a lot about how streams and resets work.
  • Think about how things would work with source + sink in different clock domains or same clock domain but different rates (IE half rate, quarter rate, etc).
  • Think about how FIFOs and pipelines fit together with streams.
  • A start of packet / frame / etc type synchronisation signal has been super useful for Ethernet, video stuff, etc.
  • Other metadata about the data stream (IE pixel format) might want to be synced with either SOP or reset.
  • It would be good if stream endpoints had a Python (elaboration time?) negotiation of the stream. IE Two video cores connected by the stream might want to negotiate the pixel format so they don't need to insert data converters. This could potentially also apply to things like ready / valid type signals.
enjoy-digital

enjoy-digital commented on Feb 7, 2020

@enjoy-digital

If it helps, LiteX has its own stream (https://github.com/enjoy-digital/litex/blob/master/litex/soc/interconnect/stream.py) that seems very similar in idea to what you want to achieve. It is basically a mix between AXI-4 and Avalon-ST, with only:

  • valid: similar to AXI-4's valid.
  • ready: similar to AXI-4's ready.
  • first (used for packets only when needed): similar to Avalon-ST's sop, because it's sometimes useful to just easily know the start of a packet instead of deducing it from last as done in AXI-4.
  • last (used for packets): similar to AXI-4's last.
  • payload: a Record with its own layout, similar to AXI-4's DATA/STRB, that can evolve at each valid/ready transaction.
  • param (optional): a Record with its own layout, similar to AXI-4's USER, that can evolve at each start of packet.

LiteX's stream can be directly mapped to AXI-4 and adapted easily to AvalonST: https://github.com/enjoy-digital/litex/blob/master/litex/soc/interconnect/avalon.py. I really don't find AvalonST convenient to use in systems due to the latency between valid/ready, which makes creating and debugging cores complicated. So for nMigen's stream, you should probably reuse more from AXI-4 than AvalonST and just provide adapters to AXI-4 (which should be almost direct mapping) and AvalonST (which should only be adaptation of the valid/ready latency).

enjoy-digital

enjoy-digital commented on Feb 7, 2020

@enjoy-digital

In addition to my previous comment: I just wanted to share some feedback on what is already there in LiteX and has already been very useful: streams can simplify things a lot! With a good library of base elements: Converter/FIFOs/Mux/Demux/Gearbox/etc... + #213 on top, it could become very powerful! I'm not able to be heavily involved in nMigen development (due to other obligations), so I just wanted to share some feedback. If you end up having something close to what is already in LiteX and better implemented, I'll probably be a user of that in the future.

whitequark

whitequark commented on Feb 7, 2020

@whitequark
Member

With a good library of base elements: Converter/FIFOs/Mux/Demux/Gearbox/etc... + #213 on top, it could become very powerful!

LiteX has certainly been an inspiration for this. I suspect the final design in nMigen would be close to it conceptually, with many of the changes addressing the experience of using LiteX streams for all these years.

awygle

awygle commented on Feb 20, 2020

@awygle
ContributorAuthor

Taking a pass at @mithro's comments (inline below):

Couple of random thoughts - not really all that coherent;

* Think a lot about how streams and resets work.

It's not clear to me that streams have anything to do with resets, or indeed that they necessarily have state to be reset.

* Think about how things would work with source + sink in different clock domains or same clock domain but different rates (IE half rate, quarter rate, etc).

I propose that we mandate a stream's endpoints be in the same clock domain. We can use a wrapper around an AsyncFIFO to cross streams between clock domains as needed (terminate stream A in clock domain A, initiate stream B in clock domain B).

As far as rates go, that's an interesting question, as is how we'd go about multiplexing multiple streams onto one faster/wider stream. This requires more thought.

* Think about how FIFOs and pipelines fit together with streams.

I'm not sure about pipelines, but I see value in Stream-interface wrappers around FIFOs of both types. I don't think streams should inherently contain FIFOs or anything like that.

* A start of packet / frame / etc type synchronisation signal has been super useful for Ethernet, video stuff, etc.

Agreed, we should support this (optionally, I would argue).

* Other metadata about the data stream (IE pixel format) might want to be synced with either SOP or reset.

This is basically the TUSER stuff from AXI4-Stream. I think we should start without this, and if we see it as a big weakness, add it later. In general I think we should be very conservative about adding features to streams - we should start with the simplest thing that can possibly work.

* It would be good if stream endpoints had a Python (elaboration time?) negotiation of the stream. IE Two video cores connected by the stream might want to negotiate the pixel format so they don't need to insert data converters. This could potentially also apply to things like ready / valid type signals.

This is interesting but I'm not sure it actually has much to do with streams? Unless you're saying you want typechecking on the payload types of streams which are connected together, which does sound like a good thing to have.

anuejn

anuejn commented on Feb 3, 2021

@anuejn
Contributor

I began to prototype Stream abstractions for nMigen to get a better feeling of how this could work, both conceptually and in terms of what the concrete ergonomics could look like. Here I will link to some code (that you can read if you are into that or just skip) and try to sum up some of the things I learned from this.

Based on the discussions in here I went with a design that has AXI-Stream like ready / valid signals & a payload.
Then there are some specializations of that basic stream that have metadata signals. E.g. there is a PacketizedStream that has an AXI-Stream like last signal that indicates the end of a packet. For Image data (I am trying to write gateware for a camera, so this is one of my main use cases) I have an ImageStream with frame_last and line_last signals.
Based on this design I wrote a few cores / wrappers for nmigen standard library cores: FIFOs, a Gearbox, cores for hdmi output and hispi input, an AXI DRAM reader & writer, kernel-like image processing (usage example), a Huffman encoder & the corresponding bit packing.

So here are some things I liked, disliked & learned from my prototype:

  • Streams are a great abstraction and simplify thinking about breaking a problem up into smaller boxes for me a lot :)
  • AXI's rule that valid must not depend combinatorially on ready is quite handy. I found myself building cores that are not compatible with each other and deadlock because I didn't follow this rule.
  • For the AXI reader and writer, I was quite happy with the idea that a stream is basically anything that has AXI-like ready / valid signals, so that an AXI (not stream) interface can be represented as a bunch of (rather special) streams with many payload signals like id (like a "write address" and a "write data" stream).
  • One needs some way to create a stream that is shaped like another stream (e.g. in a FIFO core, to create the output stream that has the same signals as the input stream in a generic way).
  • Having adapters between different kinds of streams (in my case ImageStream and PacketizedStream) feels somewhat inelegant but worked okay for me and allowed me to reuse some blocks nicely.
  • Sometimes I was finding myself wanting a stream that uses first signaling rather than last signaling. However, I haven't found a way that convinced me yet (and preserves the ability to have generic metadata signals with specific names).
  • I think I need a way to transport more than one word per clock cycle in the near future. Somehow that information (including maybe word enables?) should be a thing streams support. I do not know yet how that should work but doing space / time trade-offs is a thing in which I think streams can help?
  • Sometimes there are cores that have to consume / produce data at a fixed rate (e.g. hispi input or hdmi output). This can not really be expressed with these streams.
alanvgreen

alanvgreen commented on Jul 12, 2021

@alanvgreen
Contributor

@enjoy-digital You wrote that parameters can evolve "at each start of packet". Did you mean frame instead of packet?

Also, how useful have parameters been in practice? Do you think they form part of a stream specification, or would it be better to decouple them?

alanvgreen

alanvgreen commented on Aug 2, 2021

@alanvgreen
Contributor

Here's a list of Streams in nmigen I've found:

alanvgreen

alanvgreen commented on Aug 16, 2021

@alanvgreen
Contributor

After reading through this issue I made a streams implementation as part of one of the accelerators on CFU-Playground. I'm not completely happy with it, but after using it, and looking over other implementations, I have many opinions.

Minimum Viable Stream API

A minimum stream API would include:

  • A payload: transferring this data is the whole point.
  • valid and ready: for flow control.

Most of the useful functionality built with streams uses just these three signals.

It is useful to allow payloads to be either single Signals or records of Signals as the application requires. In the LiteX codebase, an interesting example is the pair of WishboneDMAReader and WishboneDMAWriter. The reader uses two streams of Signal()s: one for address, the other for data, while the writer uses a single stream with (addr, data) both encoded in the stream.

On the Meaning of valid and ready

There are plenty of valid reasons why an endpoint might either ignore valid or ready, or assert them constantly. For example, a video PHY may need to consume data every single clock cycle in order to generate a correct video signal. It might therefore have a sink that always asserts ready, and always reads the payload regardless of the state of valid.

Endpoints like this should be explicitly allowed, so long as the behavior is well documented.

On the Use of Frames

LiteX includes first and last signals in its streams, which define frames. I have many opinions:

  1. Plumbing through first and last along with valid places a small but real burden on implementers of library classes to pass through (and test) these additional signals, which is error-prone. For example WishboneDMAReader passes along last, but not first.
  2. Not all endpoints use frames - the meaning of the first and last signals is tied to the meaning of the data in the payload.
  3. I do not recall seeing any example of library classes that are able to use the first or last signals in a meaningful way, other than to pass them on. This is probably related to the previous point.
  4. If it were decided to include frame signals in a standard nmigen stream library, I would argue for keeping last and removing first. first, if needed, can be regenerated from the last and valid signals with a single FF, but last cannot be generated from first and valid. (This might also address @anuejn's desire to have both first and last).
  5. My strong preference would be to not make first and last part of the interface, and let applications define it as part of the interface, when needed.
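
Point 4 above can be illustrated with a small behavioural model: a single bit of state, set at reset and reloaded with last on every accepted beat, is enough to regenerate first. This is a plain Python sketch rather than gateware. The function name `regenerate_first` is hypothetical, and gating the register update on ready as well as valid is an assumption for links that use back-pressure.

```python
def regenerate_first(beats):
    """Recover `first` flags from valid/ready/last using one bit of state.

    beats -- iterable of (valid, ready, last) tuples, one per clock cycle
    Returns the regenerated `first` value for each cycle.
    """
    first_reg = True  # the single flip-flop; it resets to 1
    firsts = []
    for valid, ready, last in beats:
        # `first` is only meaningful while a beat is being offered.
        firsts.append(bool(first_reg and valid))
        if valid and ready:
            # After an accepted beat, the next beat is first iff this one was last.
            first_reg = last
    return firsts


firsts = regenerate_first([
    (True, True, False),   # beat 0 of packet 1
    (True, True, False),
    (True, True, True),    # end of packet 1
    (False, True, False),  # idle cycle
    (True, False, False),  # packet 2 offered, sink stalls
    (True, True, True),    # single-beat packet 2 accepted
    (True, True, False),   # beat 0 of packet 3
])
```

The reverse direction does not work with one bit of state, because whether a beat is the last one depends on a beat that has not arrived yet.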

Source vs Sink is confusing

It can be difficult to write or understand code using streams. When using components with a stream interface, data goes into a sink and comes out of a source, but when implementing those components the programmer sees the other side of the endpoint, and data comes out of the sink and goes into the source.

To reduce this confusion, I have had some success:

  1. Using separate types for Source and Sink, and using the type system to ensure that sources only connect to sinks and not any one of the three wrong ways that it can be done (source to source, sink to sink, and sink to source).
  2. Not exposing Record directly as a superclass of Source and Sink. LiteX's Endpoint inherits from Record, but this is just an implementation convenience - thinking of Endpoint as a subclass of Record is not helpful.
  3. Making a diagram to refer to while programming. It surprised me how much this helped, but it helped.

I see the Luna codebase addresses some of these issues by paying careful attention to naming with streameq() and attach() as aliases for connect().

Other observations

  1. In practice, LiteX streams are more interface than implementation. Vocabulary and usage conventions are as important as the code. I think this is one reason why it is hard to write an RFC - the code doesn't do much at all.
  2. AXI4 and Avalon streams are relatively heavy-weight and difficult to use. They appear to be designed for connecting large chunks of IP (hundreds or thousands of lines of RTL) together, whereas LiteX streams are more focussed on connecting smaller pieces of IP (tens of lines of RTL).
  3. LiteX streams are a useful convention when designing and coding. I can compare code I wrote before and after adopting streams, and the after code is easier to understand; e.g. this early attempt is much more convoluted than this attempt with streams.
  4. I agree with @awygle regarding clock domain crossing: it's better to specify that both sides of the stream be in the same clock domain.
  5. @anuejn's apertus-open-source-cinema/naps is the most original implementation of the LiteX style streams I've come across. There are some interesting differences in design choices, but the basic protocol - valid, ready, payload and last - is still obviously useful.
  6. I found this whole issue quite useful in building an implementation. Particularly useful were @anuejn's observations, @enjoy-digital's summary of LiteX behavior and @awygle's answers to @mithro. Thanks to everyone for sharing.
  7. I would likely adopt almost any upstream implementation of streams in preference to having my own.

Finally, here's that diagram: Streams Overview

rroohhh

rroohhh commented on Aug 16, 2021

@rroohhh
Contributor

@alanvgreen Some things that came to mind:

On the Meaning of valid and ready

There are plenty of valid reasons why an endpoint might either ignore valid or ready, or assert them constantly. For example, a video PHY may need to consume data every single clock cycle in order to generate a correct video signal. It might therefore have a sink that always asserts ready, and always reads the payload regardless of the state of valid.

Endpoints like this should be explicitly allowed, so long as the behavior is well documented.

I agree with your assessment that there are plenty of cases where something might want to use a stream-like interface without obeying all of the rules. However, I think one has to be very careful when doing that, as it can deadlock the whole pipeline of streams.
A good idea here might be to use separate types for streams with relaxed rules, so one has to explicitly insert some kind of "converter" to connect it to a "real" stream.

Furthermore, in naps it was very useful to formally check that the modules using streams actually obey the rules of the stream interface:

  1. valid not depending on ready
  2. the payload staying the same while valid is asserted and until ready is asserted

We were able to catch a number of bugs using these two simple checks (see here for the implementation).

3. I do not recall seeing any example of library classes that are able to use the _first_ or _last_ signals in a meaningful way, other than to pass them on. This is probably related to the previous point.

At least last can be useful in cases where one buffers a number of data beats to then write them all out in one go.
An example of this is the DramPacketRingbufferStreamWriter in naps. It writes a stream of data to DRAM using the AXI interface. The AXI interface works in bursts, meaning you can transmit a single base address which then gets used for a number of data words. To minimize the number of idle cycles on the AXI bus, the DramPacketRingbufferStreamWriter tries to create bursts that are as big as possible (16 words in AXI3). If, however, the total number of words is not divisible by 16, this would normally cause some words to get stuck. With a last signal, however, the core can flush the rest of the buffered data accordingly.
(The implementation is quite convoluted, but most of this can be found here.)
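
The flushing behaviour described above can be sketched in a few lines of plain Python. This is only a behavioural model of the idea and not the naps implementation; the function name `group_into_bursts` and the (word, last) tuple representation are assumptions made for illustration.

```python
def group_into_bursts(beats, max_burst=16):
    """Group (word, last) beats into bursts of at most max_burst words.

    A burst is emitted when it is full, or early when a beat carries last,
    so that a short tail is flushed instead of waiting in the buffer.
    Returns (bursts, leftover), where leftover holds words still buffered.
    """
    bursts = []
    buffer = []
    for word, last in beats:
        buffer.append(word)
        if last or len(buffer) == max_burst:
            bursts.append(buffer)
            buffer = []
    return bursts, buffer


# 20 words where the final beat carries last: a full burst plus a flushed tail.
with_last, stuck_with_last = group_into_bursts([(i, i == 19) for i in range(20)])
# The same 20 words without any last marker: the 4-word tail stays buffered.
without_last, stuck_without_last = group_into_bursts([(i, False) for i in range(20)])
```

In the second call the four trailing words never leave the buffer, which is the "stuck" case the comment describes.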

But I think in general I agree: last is usually related to the payload, and there is not even a clear unique definition of last for many types of streams. For example naps defines an ImageStream, which has both a line_last (which is asserted on the last pixel of a single line in the image) and frame_last (which is asserted on the last pixel of a whole frame). These are both very useful, for example line_last can be useful in a variety of video specific cores (like resizing, downsampling, etc), and frame_last can be very useful where whole images are needed (for example to switch to the next image buffer in memory after one image was written).
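
For reference, the two markers can be modelled as follows for an image sent in raster order. This is a plain Python sketch of the signalling convention as described above, not the naps ImageStream code; the generator name `image_beats` and the tuple layout are hypothetical.

```python
def image_beats(width, height):
    """Yield (x, y, line_last, frame_last) for each pixel in raster order."""
    for y in range(height):
        for x in range(width):
            line_last = x == width - 1                   # last pixel of this line
            frame_last = line_last and y == height - 1   # last pixel of the frame
            yield x, y, line_last, frame_last


beats = list(image_beats(3, 2))
```

For a 3x2 image this asserts line_last twice and frame_last once, on the final pixel, where both flags are set at the same time.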

1. Using separate types for _Source_ and _Sink_, and using the type system to ensure that sources only connect to sinks and not any one of the three wrong ways that it can be done (source to source, sink to sink, and sink to source).

This is something we tried in naps as well, but actually found it more confusing down the line.
Take for example this:

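from nmigen import Elaboratable, Module, unsigned

# Sink and Source are the separate endpoint types from the experiment described above.
class StreamTruncate(Elaboratable):
  def __init__(self, input: Sink, output_width: int):
    self.input = input
    self.output_width = output_width
    self.output = Source(unsigned(output_width))

  def elaborate(self, plat):
    m = Module()
    # Forward every signal except the payload from the input to the output.
    m.d.comb += self.output.connect_upstream(self.input, exclude=["payload"])
    # Keep only the top output_width bits of the input payload.
    m.d.comb += self.output.payload.eq(self.input.payload >> (len(self.input.payload) - self.output_width))
    return m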

Inside a module one always wants to connect data coming from a Sink to a Source (connect_upstream) and external to a module one wants to do the opposite, connecting data coming from a Source to a Sink. This constant switching between the directions was more confusing for us than simply having one type.

2. AXI4 and Avalon streams are relatively heavy-weight and difficult to use. They appear to be designed for connecting large chunks of IP (hundreds or thousands of lines of RTL) together, whereas LiteX streams are more focussed on connecting smaller pieces of IP (tens of lines of RTL).

I think all of the discussions in this issue were about AXI4-Stream and Avalon-ST, not the "full" AXI4 and Avalon. The AXI4-Stream and Avalon-ST variants are not a lot more heavy-weight than the proposal here.


added a commit that references this issue on Aug 22, 2021
alanvgreen

alanvgreen commented on Aug 22, 2021

@alanvgreen
Contributor

@rroohhh You have convinced me that Source and Sink are not good first class concepts, and it would be better to have a single class name, with better names for variables to distinguish their role. Looking at examples such as LiteX's WishboneDMAWriter, it helps to rename self.sink and self._sink to self.input and self._input. Similarly, using output instead of source helps too.

I rewrote the stream implementation that I have been working on in CFU-Playground, removing the distinction between Sink and Source and having just a single connect(upstream, downstream) function. It is a shorter implementation, and together with renaming variables it seems to work much better.

It's currently using the name, Stream. I'm not particularly attached to the name, and open to suggestions. If nothing better comes up, I might try Endpoint again, just because LiteX developers will already be familiar with it.

@adamgreig @rroohhh I also thought about guarantees that endpoints should make regarding handshaking. I see the value in both loose guarantees (e.g. the ADC input case or LiteX's BinaryActor) and firm guarantees, which apertus-open-source-cinema/naps has found useful for finding bugs through formal verification.

There are four qualities which seem important, any of which could be true or false:
A. The upstream holds payload constant while it asserts valid.
B. The downstream relies on the upstream holding payload constant while it asserts valid.
C. The downstream will use the payload value even if valid is not asserted.
D. The upstream relies on the downstream not using payload while valid is not asserted.

A given upstream and downstream should only be allowed to connect if:

(A or not B) AND not (C and D)

In other words, if the upstream does not hold payload constant while valid is asserted, but the downstream relies on it, the stream should not be connected, and similarly if the downstream ignores valid when reading the payload and the upstream relies on the downstream respecting valid, then the streams should also not be connected.
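
The two conflict cases spelled out in the paragraph above can be written as a small elaboration-time check. This is a sketch in plain Python following that prose description. The class and field names (`UpstreamProps`, `DownstreamProps`, `can_connect`) are hypothetical and are not taken from CFU-Playground, LiteX or naps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamProps:
    holds_payload_while_valid: bool = False    # property A
    needs_valid_respected: bool = False        # property D


@dataclass(frozen=True)
class DownstreamProps:
    needs_stable_payload: bool = False         # property B
    reads_payload_without_valid: bool = False  # property C


def can_connect(up, down):
    """Return True unless one side relies on a guarantee the other does not give."""
    # Downstream relies on a stable payload (B) but upstream does not provide it (not A).
    stability_conflict = down.needs_stable_payload and not up.holds_payload_while_valid
    # Downstream reads payload without valid (C) but upstream relies on it not doing so (D).
    valid_conflict = down.reads_payload_without_valid and up.needs_valid_respected
    return not (stability_conflict or valid_conflict)


loose = can_connect(UpstreamProps(), DownstreamProps())
```

With every property left at its default of False, any two endpoints may be connected, which matches the least restrictive case.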

naps's formal_util.py - thanks for the pointer @rroohhh - checks A (The upstream holds payload constant while it asserts valid), and I wonder how hard it might be to check B, C and D also.
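
For readers without a formal toolchain, property A can at least be checked on a recorded simulation trace. The sketch below is plain Python and is not the naps formal_util.py code: it checks one trace and proves nothing about other traces. The function name `first_payload_violation` and the trace format are assumptions.

```python
def first_payload_violation(trace):
    """Check property A on a recorded trace.

    trace -- list of (valid, ready, payload) tuples, one per clock cycle
    Returns the index of the first cycle in which the payload changed while
    a beat was still being offered but not yet accepted, or None if there
    is no such cycle.
    """
    stalled = False         # previous cycle had valid asserted without ready
    stalled_payload = None  # payload that was on offer in that cycle
    for cycle, (valid, ready, payload) in enumerate(trace):
        if stalled and valid and payload != stalled_payload:
            return cycle
        stalled = valid and not ready
        stalled_payload = payload
    return None


good_trace = [(True, False, 5), (True, False, 5), (True, True, 5), (True, True, 6)]
bad_trace = [(True, False, 5), (True, True, 7)]
good = first_payload_violation(good_trace)
bad = first_payload_violation(bad_trace)
```

In the good trace the payload only changes after the beat carrying 5 has been accepted; in the bad trace it changes from 5 to 7 while the sink is still stalling.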

I think it would be worthwhile allowing each stream endpoint to express which of the four properties it has, especially for streams used in common libraries. If, by default, all four properties are false then this represents the least restrictive case, and is the same as LiteX's current implementation. Codebases that rely on formal verification can set the properties as needed.

I had an attempt at implementing A & C in the StreamDescription class, though I'm not happy with this, yet.

desowin

desowin commented on Mar 14, 2023

@desowin

renaming self.sink and self._sink to self.input and self._input. Similarly, using output instead of source helps too.

Treating sink as input and source as output generally allowed me to understand what is going on in OpenVizsla gateware. I came to this conclusion before reading through this issue (i.e. independently).

added this to the 0.5 milestone on Sep 3, 2023
changed the title from "Stream Abstraction for nmigen.lib" to "Stream Abstraction for amaranth.lib" on Mar 11, 2024
whitequark

whitequark commented on Mar 25, 2024

@whitequark
Member

RFC amaranth-lang/rfcs#61.

Superseded by #1244.


Participants: @mithro, @adamgreig, @whitequark, @desowin, @alanvgreen
