A few weeks ago I read a blog post by the illustrious MS researcher Prof. Satnam Singh. He writes about his Kiwi project, which he describes as “[trying] to civilise hardware design”, as compared to the explicit writing of state machines. His example is an Ethernet processor which simply swaps the source and destination MAC addresses over and retransmits the frame. He has code in C#, and it looks a lot like the inferred state machine style of VHDL I’ve been toying with for a while. So (finally) I’ve toyed…
Inferred State machines in VHDL
Have a look at the C# source on the page linked to above, and then come back to see how easily it translates to VHDL. Hardly in need of “civilisation” IMHO :)
architecture inferred_sm_simple of ethernet_echo is
begin  -- architecture inferred_sm
  echoer : process is
    type t_buffer is array (natural range <>) of std_logic_vector(7 downto 0);
    variable buff : t_buffer(0 to 1023);  -- buffer is a reserved word in VHDL
    variable start : boolean;
    variable i, j : integer;
    variable doneReading : boolean;
  begin  -- process echoer
    tx_sof_n <= '1';  -- We are not at the start of a frame
    tx_src_rdy_n <= '1';
    tx_eof_n <= '1';  -- We are not at the end of a frame
    start := rx_sof_n = '0' and rx_src_rdy_n = '0';  -- The start condition
    main : loop  -- Process packets indefinitely
      -- Wait for SOF and SRC_RDY
      while not start loop
        wait until rising_edge(clk);
        exit main when resetn_clk = '0';
        start := rx_sof_n = '0' and rx_src_rdy_n = '0';  -- Check for start of frame
      end loop;
      -- Read in the entire frame
      i := 0;
      doneReading := false;
      -- Read the remaining bytes
      while not doneReading loop
        if rx_src_rdy_n = '0' then
          buff(i) := rx_data;
          i := i+1;
        end if;
        doneReading := rx_eof_n = '0';
        wait until rising_edge(clk);
        exit main when resetn_clk = '0';
      end loop;
      tx_src_rdy_n <= '0';  -- We are not at the start of a frame
      -- Now send an Ethernet packet back to where it came from
      -- Swap source and destination MAC addresses
      tx_sof_n <= '1';
      for j in 6 to 11 loop  -- Process a 6 byte MAC address
        tx_data <= buff(j);
        tx_sof_n <= '0';
        if j /= 6 then
          tx_sof_n <= '1';
        end if;
        wait until rising_edge(clk);
        exit main when resetn_clk = '0';
      end loop;
      for j in 0 to 5 loop  -- Process a 6 byte MAC address
        tx_data <= buff(j);
        wait until rising_edge(clk);
        exit main when resetn_clk = '0';
      end loop;
      -- Transmit the remaining bytes
      j := 12;
      while j < i loop
        tx_data <= buff(j);
        if j = i - 1 then
          tx_eof_n <= '0';
        end if;
        j := j + 1;
        wait until rising_edge(clk);
        exit main when resetn_clk = '0';
      end loop;
      tx_src_rdy_n <= '1';
      tx_eof_n <= '1';
      start := false;  -- No longer at start of frame
      wait until rising_edge(clk);
      exit main when resetn_clk = '0';
      -- End of frame, ready for next frame
    end loop;
  end process echoer;
end architecture inferred_sm_simple;
It’s a very easy translation from one to the other, as you’ll see if you put the two pieces of code side by side. And it comes in at 67 lines. Prof. Singh’s C# version comes in at around 70 lines (if you neglect the equivalent of the VHDL entity, as I did for that version). So much for the verboseness of VHDL compared to C# :)
There is a horrendous (IMHO!) VHDL version also on the MSDN page, which Prof. Singh describes as “yuk!” and I quite agree. Personally, the two-process style does nothing for me and obscures the design intent of the code. The comparison is not direct, as it uses a shift register to store the data (in a more traditional way), but it’s 89 lines long. If refactored to a single process, you’d save about 15 lines, bringing it to much the same length as the other two versions!
But does it synthesise? Yes… if you have the right tools.
Synplify Pro worked fine. XST doesn’t like the loops within an inferred state machine. And XST has gotten worse recently, as the new parser used for the -6 series of FPGAs doesn’t support inferred state machines at all: you can only use wait until rising_edge(clk) once, at the start of a process. I’d be interested to know if Quartus can handle it. [Update: Enrik informs us in the comments that Quartus doesn’t like it either.]
It comes out at about 6000 LUTs and 8200 registers, almost entirely for the buff storage array (8192 registers on its own!), which is read asynchronously, so Synplify is not able to infer a Block RAM. I have inquired of Prof. Singh how large his C# implementation compiles to; that’ll be very interesting. If the large buffer array can be inferred to a Block RAM, that’ll be a huge win for the C# approach!
It has been pointed out that you wouldn’t really want to design this circuit this way (as you end up with a load of flip-flops, not a RAM block). I was aware of this when I started, but it’s an exercise in comparing coding styles across languages, not in creating optimal hardware.
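For reference, here is a minimal sketch of the kind of buffer that synthesis tools will usually map to a Block RAM: the read is registered (synchronous) rather than asynchronous. The entity name frame_buffer and all of its port names are my own placeholders, not part of the echo design, and I have not run this particular file through any of the tools mentioned above. Note also that the echoer process would need restructuring to use it, since read data arrives one clock after the address is presented.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 1024 x 8 buffer with a synchronous read port.
entity frame_buffer is
  port (
    clk     : in  std_logic;
    we      : in  std_logic;                      -- write enable
    wr_addr : in  unsigned(9 downto 0);
    wr_data : in  std_logic_vector(7 downto 0);
    rd_addr : in  unsigned(9 downto 0);
    rd_data : out std_logic_vector(7 downto 0));
end entity frame_buffer;

architecture rtl of frame_buffer is
  type t_ram is array (0 to 1023) of std_logic_vector(7 downto 0);
  signal ram : t_ram;
begin
  ram_proc : process (clk) is
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(wr_addr)) <= wr_data;
      end if;
      -- Registered read: data appears one clock after rd_addr is applied.
      rd_data <= ram(to_integer(rd_addr));
    end if;
  end process ram_proc;
end architecture rtl;
```

The trade-off is that the neat "tx_data <= buff(j)" lines in the inferred state machine would have to become an address output plus a one-cycle delay, which takes away some of the style's directness.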
Why would you bother?
Well, it saves you having to think of names for your states. For some state machines this is a boon. The downside of this is that carefully chosen names for states can be self-documenting, which is good. In this example, comments like -- transmit the remaining bytes could probably be removed by having a state called transmit_remaining_bytes (although the lazy typist might abbreviate that to trb and then comment it anyway!).
And it’s ~10% terser, and (I think) more readable in this case. Less code is usually good :) (Yes, I know using one-character variable names and squashing it all up might count as “unreadable less code”, but you have to apply a bit of common sense as well!)
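To make the comparison concrete, here is a rough skeleton of what the explicit, named-state style looks like. It is only a sketch: the entity named_states_demo, the busy output and the three state names are invented for illustration, and the transmit state is a placeholder that does no real work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical skeleton: only the state sequencing is shown.
entity named_states_demo is
  port (
    clk          : in  std_logic;
    resetn_clk   : in  std_logic;
    rx_sof_n     : in  std_logic;
    rx_eof_n     : in  std_logic;
    rx_src_rdy_n : in  std_logic;
    busy         : out std_logic);
end entity named_states_demo;

architecture explicit_sm of named_states_demo is
  -- The state names carry the documentation that comments carry
  -- in the inferred style.
  type t_state is (wait_for_sof, read_frame, transmit_frame);
  signal state : t_state;
begin
  sm : process (clk) is
  begin
    if rising_edge(clk) then
      if resetn_clk = '0' then
        state <= wait_for_sof;
      else
        case state is
          when wait_for_sof =>
            if rx_sof_n = '0' and rx_src_rdy_n = '0' then
              state <= read_frame;
            end if;
          when read_frame =>
            if rx_eof_n = '0' then
              state <= transmit_frame;
            end if;
          when transmit_frame =>
            -- Placeholder: a real design would count out the bytes here.
            state <= wait_for_sof;
        end case;
      end if;
    end if;
  end process sm;

  busy <= '0' when state = wait_for_sof else '1';
end architecture explicit_sm;
```

Every wait in the inferred version becomes a named state plus an explicit transition here, which is where the extra lines (and the extra naming effort) come from.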
The downsides:
- It looks weird (but that’s mainly because there’s no code that looks like it in mainstream circulation, as far as I’m aware).
- And you have to type a load of text for each clock tick to infer: wait until rising_edge(clk); exit main when resetn_clk = '0'; This may be one argument for a VHDL preprocessor, as it’s not “encapsulatable” in any other way I’ve figured out yet. Suggestions appreciated :) This is a pretty strong downside IMHO; I loathe repeated code that ought to be encapsulated.
- And we’ve yet to see how much worse it is resource-usage wise.
- For very “branchy” state machines it may not work out much different in terms of readability. I haven’t got an example to play with.
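One partial workaround for the repeated wait/exit pair, offered as an untested idea rather than a recommendation: VHDL allows a wait statement inside a procedure called from a process without a sensitivity list, so the wait and the reset check can be wrapped up, although the exit itself still has to stay in the loop. The procedure tick, the entity tick_demo and the pulse output below are all hypothetical names, and I would expect synthesis support for waits inside procedures to vary from tool to tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: toggles 'pulse' on alternate clocks.
entity tick_demo is
  port (
    clk        : in  std_logic;
    resetn_clk : in  std_logic;
    pulse      : out std_logic);
end entity tick_demo;

architecture inferred_sm of tick_demo is
begin
  demo : process is
    variable in_reset : boolean;

    -- Hypothetical helper: wait for one clock edge and report
    -- whether reset was active at that edge.
    procedure tick (signal   clock   : in  std_logic;
                    signal   rst_n   : in  std_logic;
                    variable aborted : out boolean) is
    begin
      wait until rising_edge(clock);
      aborted := rst_n = '0';
    end procedure tick;
  begin
    pulse <= '0';                       -- reset / start-up value
    main : loop
      tick(clk, resetn_clk, in_reset);
      exit main when in_reset;          -- the exit cannot move into the procedure
      pulse <= '1';
      tick(clk, resetn_clk, in_reset);
      exit main when in_reset;
      pulse <= '0';
    end loop main;
  end process demo;
end architecture inferred_sm;
```

It is still two statements per clock tick, so this only trims the typing rather than removing the repetition.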
The code (should anyone want to have a play) is available from GitHub.