Category Archives: VHDL

libv has a home

Some of my “useful bits” of library code have lived in libv.vhd for a while – I’ve split it off and licensed it with a CC0 license (which means the author disclaims copyright and offers no warranty). It’s on github and I’ll add contributions from anyone who has any!

Either individual functions to add to libv.vhd or great big wodges of useful code (like Jim Lewis’ randomized testing libraries maybe….)

Should VHDL be extended to allow the use of Unicode?

I’m contributing to the VASG group which is working on coming up with what the next revision of VHDL should be able to do.

On today’s conference call, the idea was mooted that VHDL could allow the use of Unicode identifiers (i.e. entity, signal, variable names etc.).

All of today’s participants were (as far as I recall) native English speakers without much call for accented characters, much less characters from entirely different writing systems. So I’m putting a call out to see if there’s any interest from the wider community in pushing forwards a requirement for VHDL compilers to support Unicode.

Feel free to mail me, comment below or @mention me in a tweet with your thoughts – I’ll summarise the results here in a few weeks

Tool switches

@boldport asked:

What are your #FPGA design space exploration techniques?

which he expands upon:


“Design space exploration” is the process of trying out different settings and design methods for achieving better performance. Sometimes the goals are met without any of it — the default settings of the tools are sufficient. When they’re not, what are your techniques to meet your performance goals?

Yet again, the 140 character constraint leaves me with things unspoken….

Working where I do in the automotive market means that it’s not good enough to miss timing by a few picoseconds and say “it’ll be fine, ship it”. If you miss timing, you /have/ to make it pass.

My experience with tool tweakery is that it gains you a 2-5% timing improvement – which can be enough to meet timing when you just missed.

The downside is that usually, when you go and change the design (due to the requirements changing yet again), you find yourself with a slightly different 10ps timing violation which maybe this time the tools can’t get around. Or maybe, with a change to one of the seed parameters, they will, after some trial runs.

So, I’ve given up on that approach as being too variable. It’s much harder to give estimates of when something will be ready when timing closure is a “tweak the knobs a number of times and see”.

What I do now is rework things until it meets timing easily. That way, it’s likely to stay that way.

Techniques include:

  • Pipelining – adding registers
  • Constraining unconstrained integers – occasionally, the synthesiser doesn’t figure out the range an integer variable or signal can take on, so needs telling. This is happening less and less as synthesis tools get cleverer.
  • Simplifying algorithms
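
As a rough illustration of the first two techniques, here is a minimal sketch. The entity, its ports and the 0 to 99 counter range are invented for the example and are not taken from any real design. The multiply-add is split across two register stages, and the counter is given an explicit range so the synthesiser does not have to guess its width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: names and widths are assumptions.
entity pipelined_mac is
    port (
        clk     : in  std_logic;
        a, b, c : in  unsigned(15 downto 0);
        result  : out unsigned(32 downto 0);
        tick    : out std_logic);
end entity pipelined_mac;

architecture rtl of pipelined_mac is
    signal product : unsigned(31 downto 0);
    signal c_d     : unsigned(15 downto 0);
    -- Constrained integer: the range tells the tools that 7 bits are enough
    signal count   : integer range 0 to 99 := 0;
begin
    process (clk) is
    begin
        if rising_edge(clk) then
            -- Stage 1: register the product and delay c to keep it aligned
            product <= a * b;
            c_d     <= c;
            -- Stage 2: the addition happens in its own clock cycle
            result  <= resize(product, 33) + resize(c_d, 33);

            -- Range-limited counter that wraps explicitly at 99
            if count = 99 then
                count <= 0;
                tick  <= '1';
            else
                count <= count + 1;
                tick  <= '0';
            end if;
        end if;
    end process;
end architecture rtl;
```

The cost of the extra register stage is one more clock of latency, which is usually a fair trade for a design that meets timing with margin.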

This gives me a much more predictable build process. It’s seen me fine, even for a nearly full Spartan3 device with some logic running at 160+MHz DDR.

Of course, if you are right up against the limits of the device speed and you’ve pipelined and constrained and everything else, then tweaking tool parameters is all you have left – anyone in that position has my sympathies!

Inferred state machines in VHDL (vs 2-process-machines of all things!)

A few weeks ago I read a blog post by the illustrious MS researcher Prof. Satnam Singh. He writes about his Kiwi project which he describes as “[trying] to civilise hardware design” – as compared to the explicit writing of state machines. His example is an Ethernet processor which simply swaps the source and destination MAC addresses over and retransmits them. He has code in C#, and it looks a lot like the inferred state machine style of VHDL I’ve been toying with for a while. So (finally) I’ve toyed…

Inferred State machines in VHDL

Have a look at the C# source on the page linked to above, and then come back to see how easily it translates to VHDL. Hardly in need of “civilisation” IMHO :)

architecture inferred_sm_simple of ethernet_echo is
begin  -- architecture inferred_sm 
    echoer : process is
        type t_buffer is array (natural range <>) of std_logic_vector(7 downto 0);
        variable buff        : t_buffer(0 to 1023);  -- buffer is a reserved word in VHDL
        variable start       : boolean;
        variable i, j        : integer;
        variable doneReading : boolean;
    begin  -- process echoer 
        tx_sof_n     <= '1'; -- We are not at the start of a frame 
        tx_src_rdy_n <= '1';
        tx_eof_n     <= '1'; -- We are not at the end of a frame
        start        := rx_sof_n = '0' and rx_src_rdy_n = '0';   -- The start condition 
        main : loop          -- Process packets indefinitely 
            -- Wait for SOF and SRC_RDY 
            while not start loop
                wait until rising_edge(clk);
                exit main when resetn_clk = '0';
                start := rx_sof_n = '0' and rx_src_rdy_n = '0';  -- Check for start of frame 
            end loop;
            -- Read in the entire frame 
            i           := 0;
            doneReading := false;

            -- Read the remaining bytes
            while not doneReading loop
                if rx_src_rdy_n = '0' then
                    buff(i) := rx_data;
                    i       := i+1;
                end if;
                doneReading := rx_eof_n = '0';
                wait until rising_edge(clk); exit main when resetn_clk = '0';
            end loop;

            tx_src_rdy_n <= '0';    -- Source is ready: start transmitting
            -- Now send an Ethernet packet back to where it came from
            -- Swap source and destination MAC addresses
            tx_sof_n     <= '1';
            for j in 6 to 11 loop   -- Process a 6 byte MAC address
                tx_data  <= buff(j);
                tx_sof_n <= '0';
                if j /= 6 then
                    tx_sof_n <= '1';
                end if;
                wait until rising_edge(clk); exit main when resetn_clk = '0';
            end loop;
            for j in 0 to 5 loop    -- Process a 6 byte MAC address
                tx_data <= buff(j);
                wait until rising_edge(clk); exit main when resetn_clk = '0';
            end loop;
            -- Transmit the remaining bytes
            j := 12;
            while j < i loop
                tx_data <= buff(j);
                if j = i - 1 then
                    tx_eof_n <= '0';
                end if;
                j := j + 1;
                wait until rising_edge(clk); exit main when resetn_clk = '0';
            end loop;
            tx_src_rdy_n <= '1';
            tx_eof_n     <= '1';
            start        := false;  -- No longer at start of frame
            wait until rising_edge(clk); exit main when resetn_clk = '0';
            -- End of frame, ready for next frame
        end loop;
    end process echoer;
end architecture inferred_sm_simple;

It’s a very easy translation from one to the other as you’ll see if you put the two pieces of code side by side. And it comes in at 67 lines. Prof. Singh’s C# version comes in at (if you neglect the equivalent of the VHDL entity as I did for that version) around 70 lines. So much for the verboseness of VHDL compared to C# :) There is a horrendous (IMHO!) VHDL version also on the MSDN page, which Prof. Singh describes as “yuk!” and I quite agree. Personally, the two process style does nothing for me and obscures the design intent of the code. The comparison is not direct as it uses a shift register to store the data in (in a more traditional way), but it’s 89 lines long. If refactored to a single process, you’d save about 15 lines, bringing it to much the same length as the other two versions!

But does it synthesise? Yes… if you have the right tools.

Synplify Pro worked fine. XST doesn’t like the loops within an inferred state machine. And XST has gotten worse recently, as the new parser used for the -6 series of FPGAs doesn’t support inferred state machines at all. You can only use wait until rising_edge(clk) once at the start of a process. I’d be interested to know if Quartus can handle it. [Update – Enrik informs us in the comments that Quartus doesn’t like it either.]

It comes out at about 6000 LUTs and 8200 registers – almost entirely for the buff storage array (8192 registers on its own!), which is read asynchronously, so Synplify is not able to infer a Block RAM. I have inquired of Prof. Singh how large his C# implementation compiles to – that’ll be very interesting. If the large buffer array can be inferred to a blockram that’ll be a huge win for the C# approach! It has been pointed out that you wouldn’t really want to design this circuit this way (as you end up with a load of flipflops not a RAM block) – I was aware of this when I started, but it’s an exercise in comparing coding styles across languages, not in creating optimal hardware.
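
For contrast, here is a minimal sketch of the kind of memory description that synthesis tools commonly map onto a block RAM: the read is registered, so the array is never read asynchronously. The entity and port names are invented for the example, and whether a given tool actually infers a block RAM from it depends on the tool and the target device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: names are assumptions for illustration.
entity sync_ram is
    port (
        clk   : in  std_logic;
        we    : in  std_logic;
        waddr : in  integer range 0 to 1023;
        raddr : in  integer range 0 to 1023;
        din   : in  std_logic_vector(7 downto 0);
        dout  : out std_logic_vector(7 downto 0));
end entity sync_ram;

architecture rtl of sync_ram is
    type t_mem is array (0 to 1023) of std_logic_vector(7 downto 0);
    signal mem : t_mem;
begin
    process (clk) is
    begin
        if rising_edge(clk) then
            if we = '1' then
                mem(waddr) <= din;   -- synchronous write
            end if;
            dout <= mem(raddr);      -- registered (synchronous) read
        end if;
    end process;
end architecture rtl;
```

Using something like this in the echo design would mean restructuring the transmit loops to allow for the extra clock of read latency.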

Why would you bother?

Well, it saves you having to think of names for your states. For some state machines this is a boon. The downside of this is that carefully chosen names for states can be self-documenting, which is good. In this example, comments like -- transmit the remaining bytes could probably be removed by having a state called transmit_remaining_bytes (although the lazy typist might abbreviate that to trb and then comment it anyway!) And it’s ~10% terser, and (I think) more readable in this case. Less code is usually good :) (yes, I know using one character variable names and squashing it all up might count as “unreadable less code”, but you have to apply a bit of common-sense as well!)

Downsides

  • It looks weird (but that’s mainly because there’s no code that looks like it in mainstream circulation as far as I’m aware).
  • And you have to type a load of text for each clock tick to infer: wait until rising_edge(clk); exit main when resetn_clk = '0'; (This may be one argument for a VHDL preprocessor, as it’s not “encapsulatable” in any other way I’ve figured out yet. Suggestions appreciated :) ) This is a pretty strong down-side IMHO, I loathe repeated code that ought to be encapsulated.
  • And we’ve yet to see how much worse it is resource-usage wise.
  • For very “branchy” state machines it may not work out much different in terms of readability – I haven’t got an example to play with.

Code

The code (should anyone want to have a play) is available from Github

Reading image files with VHDL part 3

Having written a function to read a PGM image, how do we use it?

Well, the testbench at the bottom of the pgm vhdl file has an example:

variable i : pixel_array_ptr;
-- Without the transpose function, we would have to present the initialisation data in a non-intuitive way.
constant testdata : pixel_array(0 to 7, 0 to 3) := transpose(pixel_array'(
(000, 027, 062, 095, 130, 163, 198, 232),
(000, 000, 000, 000, 000, 000, 000, 000),
(255, 255, 255, 255, 255, 255, 255, 255),
(100, 100, 100, 100, 100, 255, 255, 255))
);

Begin by setting up a variable to store the image in. We also create a constant which contains the data we expect to read from the file. As explained previously, we have to transpose the array in order for it to “look as expected” in the code (which is more important than an extra function call to my mind).

Then we can test and check:

-- test on a proper image  
i := pgm_read("testimage_ascii.pgm");  
assert i /= null report "pixels are null" severity error;  
assert_equal("ASCII PGM Width", i.all'length(1), 8);  
assert_equal("ASCII PGM Height", i.all'length(2), 4);  
assert_equal("ASCII PGM data", i.all, testdata);  

That simple!

Code for this series can be found on github

Reading image files with VHDL part 2

Having set up a library for reading images, let’s now go on to read an image in!

pgm_read

Recall, we have a function:

impure function pgm_read (filename : string)
return pixel_array_ptr;

So, some declarations - fairly self-explanatory:

file pgmfile : text;
variable width, height : coordinate; -- storage for image dimensions
variable l : line; -- buffer for a line of text
variable s : string(1 to 2); -- to check the P2 header
variable ints : integer_vector(1 to 3); -- store the first three integers (width, height and depth)
variable int : integer; -- temporary storage
variable ch : character; -- temporary storage
variable good : boolean; -- to record whether a read is successful or not
variable count : positive; -- keep track of how many numbers we've read
variable empty_image : pixel_array_ptr := null; -- return this on error
variable ret : pixel_array_ptr; -- actual return value
variable x, y : coordinate; -- coordinate tracking

We begin by opening the file in read mode and reading the first line:

-- setup some defaults
width := 0;
height := 0;
file_open(pgmfile, filename, read_mode);
readline(pgmfile, l);

Now, reading the header. The P2 format PGM header is very simple:

P2
# optional comments - next two ints are
# xsize and ysize
128 90
# maybe some more comments
# next int is the max value for pixel
255

All the integers are separated by whitespace (may be space characters or linefeeds or carriage returns, we know not). This suits VHDL, as the textio library can read whitespace separated integers from a text file! It’s slightly tricky as we have to manage the cases where the values appear on separate lines. First, check the “P2-ness”:

read(l, s(1));
read(l, s(2), good);
if not good or s /= "P2" then
    report "PGM file '" & filename & "' not P2 type" severity warning;
    file_close(pgmfile);
    return empty_image;
end if;

VHDL’s textio is a bit different from most other languages – we read a line from the file then read values from the line buffer. So, we read the first two characters and check them against “P2”. Now to read the next three integers:

allints : loop  -- read until we have 3 integers (width, height and colour depth).
    line_reading : loop
        readline(pgmfile, l);
        -- check for an empty line before indexing into it
        if l'length = 0 then
            report "EOF reached in pgmfile before opening integers found"
                severity warning;
            file_close(pgmfile);
            return empty_image;
        end if;
        exit when l.all(1) = '#';  -- skip comments
        number_reading : loop
            read(l, ints(count), good);
            exit number_reading when not good;  -- need to read some more from file
            count := count + 1;
            exit allints when count > ints'high;
        end loop;
    end loop;
    exit when count > ints'high;
end loop;
-- Now we have our header sorted. store it
width  := ints(1);
height := ints(2);

We have three loops – the outermost allints loop runs until count increments beyond the end of the ints array. The line_reading loop reads a line, checks for comment lines to skip them, and passes non-comment lines on to the next loop: number_reading – this reads numbers from the line buffer until it fails or has filled the ints array. In the former case, the line_reading loop provides more data; otherwise, we drop out and store our data. Once we have the width and height we can allocate an array and read the pixels:

-- now read the image pixels
x := 0;
y := 0;
-- allocate storage
ret := new pixel_array(0 to width-1, 0 to height-1);
allpixels : loop
    readline(pgmfile, l);
    exit when l = null;      -- oh dear, something went wrong!
    exit when l'length = 0;  -- more wrongness!
    numbers : loop
        read(l, int, good);
        exit numbers when not good;
        ret(x, y) := int;
        exit allpixels when x = width-1 and y = height-1;
        x := x + 1;
        if x >= width then
            x := 0;
            y := y + 1;
        end if;
    end loop numbers;
end loop allpixels;

Again a nested loop structure, one for reading the lines from the file and another for extracting the integers from the line buffer. The x and y coordinates are updated on each integer read to “scan” across and down the image. Once x and y reach their terminal values, the loop exits.

assert (x = width-1 and y = height-1)
    report "Don't seem to have read all the pixels I should have"
    severity warning;
return ret;

Finally, a little error checking and warning, and return the data.

Onwards, to making use of it… Code for this series can be found on github

Reading image files with VHDL part 1 (again)

Data storage

One of the things VHDL can do (contrary to popular belief :)) is 2-D arrays. So reading images into a 2-D array is a very natural way to store the data. We’ll create a package called pgm to keep all our image reading and writing code together:

package pgm is
  subtype coordinate is natural;
  subtype pixel is integer range 0 to 255;
  type pixel_array is array (coordinate range <>, coordinate range <>) of pixel;
  type pixel_array_ptr is access pixel_array;
  -- Function: transpose
  -- Useful for initialising arrays such that the first coordinate is the x-coord, but the initialisation can "look" like the
  -- image in question within the code
  function transpose (i : pixel_array) return pixel_array;

Some points:

  • We’ve got a subtype for the raw pixel type – this enables us to change a single place if we decide to extend to (say) 16-bit pixels.
  • Images are stored in a 2d array, with the x-coordinate as the first dimension, as is traditional (unless you use Matlab!)
  • We’ll use an access type for the pixel storage being returned from the image reading function pgm_read to allow us to dynamically allocate the storage based on how big the actual image is, rather than having to know at compile-time.
  • The transpose function is the best way I can come up with so far that allows us to create constants which represent images where the code looks like the image it represents. Without this an initialiser for a 4×2 image would look like this:

    constant c : pixel_array(0 to 3, 0 to 1) := ( (11, 12), (21, 22), (31, 32), (41, 42));
    

    A bit counter-intuitive, no? With the transpose function we can do:

    constant c : pixel_array(0 to 3, 0 to 1) := transpose(pixel_array'(
    (11, 21, 31, 41),(12, 22, 32, 42)));
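
One possible body for such a transpose function is sketched below. It is wrapped in its own package (transpose_sketch, a name made up for this example) with copies of the type declarations so that it compiles on its own; the real pgm package may implement it differently.

```vhdl
-- Hypothetical stand-alone package: mirrors the pgm types for illustration.
package transpose_sketch is
    subtype coordinate is natural;
    subtype pixel is integer range 0 to 255;
    type pixel_array is array (coordinate range <>, coordinate range <>) of pixel;
    function transpose (i : pixel_array) return pixel_array;
end package transpose_sketch;

package body transpose_sketch is
    function transpose (i : pixel_array) return pixel_array is
        -- The result swaps the two index ranges of the input
        variable ret : pixel_array(i'range(2), i'range(1));
    begin
        for x in i'range(1) loop
            for y in i'range(2) loop
                ret(y, x) := i(x, y);
            end loop;
        end loop;
        return ret;
    end function transpose;
end package body transpose_sketch;
```

With the two-row aggregate above, the input has index ranges (0 to 1, 0 to 3), so the result has (0 to 3, 0 to 1), which matches the declared range of the constant.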
    

API

The public API for playing with the images will be simple:

  • a function to read an image from a file into an image array
  • a procedure to write an image array out to a file

    impure function pgm_read (filename : string) return pixel_array_ptr;
    procedure pgm_write (filename : in string; i : in pixel_array);
    end package pgm;
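
The writing side is not shown in this series, so here is a hedged sketch of how a pgm_write with that signature might be written using textio. The package name pgm_write_sketch is invented, the types are copied so the example stands alone, and a maximum value of 255 is assumed to match the pixel subtype. It writes one pixel per line, which keeps every line short regardless of image width.

```vhdl
use std.textio.all;

-- Hypothetical stand-alone package: not the actual pgm package.
package pgm_write_sketch is
    subtype coordinate is natural;
    subtype pixel is integer range 0 to 255;
    type pixel_array is array (coordinate range <>, coordinate range <>) of pixel;
    procedure pgm_write (filename : in string; i : in pixel_array);
end package pgm_write_sketch;

package body pgm_write_sketch is
    procedure pgm_write (filename : in string; i : in pixel_array) is
        file pgmfile : text;
        variable l   : line;
    begin
        file_open(pgmfile, filename, write_mode);
        -- Header: magic number, then width and height, then maximum value
        write(l, string'("P2"));
        writeline(pgmfile, l);
        write(l, integer'(i'length(1)));
        write(l, character'(' '));
        write(l, integer'(i'length(2)));
        writeline(pgmfile, l);
        write(l, integer'(255));
        writeline(pgmfile, l);
        -- Pixels in raster order: x varies fastest, one value per line
        for y in i'range(2) loop
            for x in i'range(1) loop
                write(l, integer'(i(x, y)));
                writeline(pgmfile, l);
            end loop;
        end loop;
        file_close(pgmfile);
    end procedure pgm_write;
end package body pgm_write_sketch;
```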

Testing

For the input side, a simple testbench is all that is required, along with a carefully crafted PGM file for which we “know the right answers”. We can also create an image from scratch and write the new image out. Checking that final image will have to be done by hand. We could “round-trip” the image back in through our image reading function, but this may mask any errors we’ve made in the reading and writing functions which happen to cancel each other out!

entity tb_pgm is  
end entity tb_pgm;

use work.libv.all;  
use work.pgm.all;  
architecture test of tb_pgm is  
begin -- architecture test  
  test1 : process is  
    variable i : pixel_array_ptr;  
    constant testdata : pixel_array(0 to 7, 0 to 3) := transpose(pixel_array'(  
(000, 027, 062, 095, 130, 163, 198, 232),  
(000, 000, 000, 000, 000, 000, 000, 000),  
(255, 255, 255, 255, 255, 255, 255, 255),  
(100, 100, 100, 100, 100, 255, 255, 255))  
);  
    variable blacksquare : pixel_array(0 to 7, 0 to 7) := (others => (others => 0));  
  begin -- process test1  
    -- test on a proper image  
    i := pgm_read("testimage_ascii.pgm");  
    assert i /= null report "pixels are null" severity error;  
    assert_equal("ASCII PGM Width", i.all'length(1), 8);  
    assert_equal("ASCII PGM Height", i.all'length(2), 4);  
    assert_equal("ASCII PGM data", i.all, testdata);

    -- make sure we return a non-image for the binary-style PGM file
    i := pgm_read("testimage.pgm");
    assert i = null report "Binary pixels should be null" severity error;

    -- Now create an image from scratch - a letter M
    blacksquare(1,1) := 255; blacksquare(5,1) := 255;
    blacksquare(1,2) := 255; blacksquare(2,2) := 255; blacksquare(4,2) := 255; blacksquare(5,2) := 255;
    blacksquare(1,3) := 255; blacksquare(3,3) := 255; blacksquare(5,3) := 255;
    blacksquare(1,4) := 255; blacksquare(5,4) := 255;
    blacksquare(1,5) := 255; blacksquare(5,5) := 255;
    blacksquare(1,6) := 255; blacksquare(5,6) := 255;
    pgm_write("test_write.pgm", blacksquare);
    report "End of tests" severity note;
    wait;
  end process test1;
end architecture test;

Aside (libv)

You can read about the non-standard looking assert_equal functions here.
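
For readers who don’t want to follow the link, the integer flavour of such a procedure could look something like the sketch below. This is an assumption about its shape based only on how it is called in the testbench above; the package name, parameter names and message format are made up, and the real libv code may differ.

```vhdl
-- Hypothetical package: a guess at the shape of an integer assert_equal.
package assert_sketch is
    procedure assert_equal (
        prefix        : in string;
        got, expected : in integer;
        level         : in severity_level := error);
end package assert_sketch;

package body assert_sketch is
    procedure assert_equal (
        prefix        : in string;
        got, expected : in integer;
        level         : in severity_level := error) is
    begin
        -- Report both values so a failure is self-explanatory in the log
        assert got = expected
            report prefix & ": got " & integer'image(got) &
                   ", expected " & integer'image(expected)
            severity level;
    end procedure assert_equal;
end package body assert_sketch;
```

The version used for comparing whole images would be an overload taking two pixel_array arguments.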

Back to the tests

In the spirit of Test-Driven Development, all the API elements are left empty, except for the pgm_read function which has to return something. So it returns NULL – an empty image! With the results:

pgm.vhd:204:9:@0ms:(assertion error): pixels are null  

Hurrah! It fails in the way we’d hope!

Onwards to actually reading and writing…

Executable comments

Comments in code are very useful. But not as good as executable comments…

I write image processing code at work. One of my FPGAs has a piece of code which generates a signal which has a hard-coded number of clock cycles that it is low for. This is fine in the application – it never needs to change, it just has to match what the camera does, and the software programs that up the same every time.

So, in the (detailed) comments for this module, I made a note that this was the case. However, recently, I needed to change the value that the software sends to the camera to give a bit more time for the processing. So I changed my VHDL tests so that they represented the new value the camera would be using, and ran my regression suite. No problem, all works fine.

We pushed the code into the FPGA and tried it on the bench. All works fine except that this particular signal doesn’t match the camera any more. And my testbenches don’t model the chip at the other end of the link in that level of detail. What I should have done, as well as writing the comment, was add some code to check that it was being obeyed.

If I assert something must be true in the comments (i.e. this signal should match this other signal’s timing) then I should add some code to tell me off if I make it untrue! The word assert is key – use a VHDL assertion to make sure that the two signals match:

process (clk, camera_sig, mysig) is
  variable camera_low, mysig_low : natural := 0;
begin -- process
  -- At the start of each mysig pulse, compare the lengths of the previous pulses
  if falling_edge(mysig) then
    assert camera_low = mysig_low
      report "Camera sig timing doesn't match mysig timing"
      severity error;
  end if;
  -- Restart each count at the start of its low pulse (after the check above)
  if falling_edge(camera_sig) then
    camera_low := 0;
  end if;
  if falling_edge(mysig) then
    mysig_low := 0;
  end if;
  -- Count clock ticks while each signal is low
  if rising_edge(clk) then
    if camera_sig = '0' then
      camera_low := camera_low + 1;
    end if;
    if mysig = '0' then
      mysig_low := mysig_low + 1;
    end if;
  end if;
end process;

The key assert is at the top of that process. The rest simply counts clock ticks while the relevant signal is low. You could also do it without the clocks and capture the time at the start and the end of the pulses to compare…
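
A clock-free variant along those lines might look like the sketch below. It is simulation-only, the entity name is invented, and it assumes the camera pulse ends no later than the mysig pulse, so that the camera width is already known when the comparison is made.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical checker entity for simulation only.
entity pulse_width_check is
    port (camera_sig, mysig : in std_logic);
end entity pulse_width_check;

architecture sim of pulse_width_check is
begin
    check : process (camera_sig, mysig) is
        variable camera_fall, mysig_fall   : time := 0 ns;
        variable camera_width, mysig_width : time := 0 ns;
    begin
        -- Capture the time at which each low pulse starts
        if falling_edge(camera_sig) then
            camera_fall := now;
        end if;
        if falling_edge(mysig) then
            mysig_fall := now;
        end if;
        -- Measure each pulse when it ends
        if rising_edge(camera_sig) then
            camera_width := now - camera_fall;
        end if;
        if rising_edge(mysig) then
            mysig_width := now - mysig_fall;
            assert mysig_width = camera_width
                report "mysig low for " & time'image(mysig_width) &
                       " but camera_sig low for " & time'image(camera_width)
                severity error;
        end if;
    end process check;
end architecture sim;
```

Comparing times exactly is strict; if the two signals come from different clock domains in the testbench, a tolerance on the comparison may be more appropriate.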

And that’s an executable comment!

Numbers in VHDL

A couple of times recently, I’ve found myself staring at VHDL code that starts thus:

library ieee;
use ieee.std_logic_arith.all;

and had to explain to the author that this is wrong. Yes, using an IEEE-library is wrong… how can this be?

When VHDL started out, its main logic type when used for creating chips was the “std_logic_vector” – a bag of bits with no intrinsic meaning numerically. There’s the integer types if you want numbers, but that was a bit iffy and inefficient with early synthesis tools. So Synopsys (big vendor of ASIC design tools) in their infinite wisdom came up with an extension library which they called std_logic_arith (and its compatriots std_logic_unsigned and std_logic_signed) which created some variations on the “bag-of-bits” theme which actually had numeric meaning.

This was popular – you could define how big your numbers were going to be by the size of the vector holding them, and you knew they would “wrap-around” like real hardware. VHDL’s integer types flagged errors in the simulation if they went out of range. In addition (sorry :), the synthesisers of the time tended to produce inefficient integers (always 32 bits), unless you constrained the range on them — that seemed unpopular with some. With Synopsys’ libraries, you could write code like this:

some_logic_vector <= some_logic_vector + 1; 

and have it increment your bag-of-bits by 1. Now, VHDL is a strongly typed language – this adding of an integer to a vector is just contrary to the whole design of the language. It makes it like Verilog (a popular alternative to VHDL). Briefly, Verilog is to C as VHDL is to Pascal or Ada.

  • Verilog: Give the programmer/hardware designer some sharp tools and let them get on with it
  • VHDL: give the programmer/designer some sharp tools and safety gear

Now, some think the safety gear gets in the way, but I quite like the fact that once my code has compiled, any errors are likely my own design errors, rather than wacky unexpected things the tool does…

Anyway, this would all have been fine if Synopsys had compiled their libraries by default into a synopsys library. But they didn’t – they put it into the IEEE library! So it gained false credibility, as well as being useful.

So when the IEEE came along with a proper standard library – numeric_std – it was in direct “competition” (for want of a better word). By then we had a generation of designers brought up on std_logic_arith. And they wrote some textbooks and example code. And the universities taught it. And other tool vendors (esp. the FPGA vendors, who have a large student audience) used it in their examples.

So now we have a third or fourth generation set of designers who are still using outdated, and non-standard libraries, in the mistaken belief that they are a standard. The company I work for was once bitten by this (fortunately at the FPGA prototyping stage, not in an ASIC). The behaviour of the simulator’s std_logic_arith library was different to the synthesiser’s when comparing vectors of different lengths! That was ten+ years ago, and I haven’t touched the Synopsys libraries since (and I wasn’t even working on that chip!)

The alternative is to use the standardised ieee.numeric_std package. It works the same in every tool! The only downside is that you can’t “add 1 to your bag-of-bits”. But if they were going to represent a number, why not use the correct type (either signed or unsigned)? Or use an integer type – they simulate faster and tell you if they overflow (overflows are often a bad thing)! Once you’ve made the commitment to only using std_logic_vectors when the bits truly have no numerical meaning and using unsigned or signed explicitly, there’s no penalty. Because you can add 1 to an (un)signed vector, and you can do all the other expected numerical operations. And they wrap around like “real hardware” if that’s the behaviour you want. So there it is – there’s no reason to use std_logic_arith – use numeric_std!
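
As a small sketch of what that looks like in practice (the entity and signal names are invented for the example), the counter below is an unsigned internally and is only converted to a std_logic_vector at the port:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: names are assumptions for illustration.
entity counter is
    port (
        clk       : in  std_logic;
        count_slv : out std_logic_vector(7 downto 0));
end entity counter;

architecture rtl of counter is
    -- A number, so it gets a numeric type rather than a bag of bits
    signal count : unsigned(7 downto 0) := (others => '0');
begin
    process (clk) is
    begin
        if rising_edge(clk) then
            count <= count + 1;  -- wraps from 255 back to 0, like real hardware
        end if;
    end process;

    -- Explicit conversion where a plain vector is required
    count_slv <= std_logic_vector(count);
end architecture rtl;
```

The type conversion at the boundary is the only extra typing, and it documents exactly where the bits stop being a number.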