Category Archives: FPGA

Why use an FPGA?

“Please help me do this on an FPGA”

The question you shouldn’t ask!

A common refrain on many of the internet’s finest help forums and newsgroups is “I’m trying to do x using an FPGA, help!” And very often “x” is a task which would be more optimally performed (by many different measures!) in another way. But there is a common assumption that if a task is “intensive” then FPGAs are the answer. One recent example asked how to implement face-detection on an FPGA. It quickly became apparent that the poster didn’t actually know how to perform face-detection at all, so adding FPGAs to the equation was not a great help!

For a quick answer to the question “why use an FPGA?”, I’ll reproduce this list that I used in a lecture to a class of undergraduates:

Use an FPGA if one or more of these apply:

  • you have hard real-time deadlines (measured in μs or ns)
  • you need more than one DSP (lots of arithmetic and parallelisable)
  • a suitable processor (or DSP) costs too much (money, weight, size, power)

And for students, there’s one more:

  • Because your assignment tells you to :) Although ideally it’ll be something that is at least representative of a reasonable FPGA task (not a traffic light controller or vending machine!)

What to do instead?

Software’s easy

Writing software, I’d hazard a guess that many embedded software engineers (even those who work at the really low level, not writing code for PCs) don’t really know what their target processor looks like under the hood. They just click compile, wait maybe 10 seconds, and test. And that’s great, it makes for a very productive development environment. When you are sufficiently abstracted from the architecture, there’s enough performance from the tools and chips that you don’t (often) have to think hard about how to implement things; you can just get on with the interesting bit, creating your application.

FPGAs hurt

In comparison, FPGAs are painful to use – don’t get me wrong, the software tools and the silicon architectures have improved massively over the last few years – but compared to writing software, it’s a completely different realm. You have to be much more aware of the architecture of your device, know much more about how the tools operate, wait ages for them to run, and think fundamentally differently about algorithms and implementations. FPGA code takes tens of minutes to compile, it’s much easier to push up against the performance limits, and then you have to mess around with your code to make it more recognisable to the tools you are using.

Choosing an implementation

My advice is always “Avoid using an FPGA unless you have to”. And I say this as a great advocate of FPGAs!

If you can do it in Octave or Matlab on a PC, do so. In fact, even if you end up somewhere else, start from there so you can understand the problem properly.

If you don’t have enough processing power, make use of a GPU.

If that solution costs too much (in money, power, size, weight terms) then you’ll have to get cleverer. Start thinking about microcontrollers. They’re well-tooled up and very powerful. You can have an 80MHz 32-bit ARM for a few pounds (or Euros. Or Dollars) these days, you can do an awful lot with that.

If you’re still struggling for processing power, think about a DSP. But be careful – analyse what you are trying to do very carefully. Figure out which bits will suit a DSP (lots of multiplying and adding in parallel with memory moving) – suddenly you have to know your architecture, just to decide if it’s feasible. Be careful about memory bandwidth, caches are not magic and if your code requires data reads or writes that are randomly scattered about, expect to lose some performance.

The next stage on might be multiple DSPs… and once you start considering multiple DSPs, it might finally be time to think about an FPGA. The downside is that you are responsible for so much more of the architecture. Floating-point maths is becoming a sensible option, but you’ll still want to look at the trade-off between development time using floating point and the device size, cost and power savings that come from a fixed-point implementation. You can take advantage of your knowledge of data access patterns to tune the memory controller – in fact you’ll probably have to – yet more grovelling around in the details. Add to this the fact that it’s a lot harder to hire good FPGA people than DSP people (and they are harder to find than microcontroller people), and help on the internet can be harder to come by. Your development time will lengthen as you build simulation models of the hardware you are talking to and have to debug them. And the hour-long build times will try your patience.

But if you have good reason to, go for it!

FPGAs are really well suited to

  • many image and radar processing tasks especially when cost, power and space constrained. (Disclosure: I wrote the first article)
  • financial analysis (when time constrained)
  • seismic analysis (lots of money at stake, the faster you process, the more processing you can do, and the less risk to your drilling)
  • hard-real-time, low-latency deadlines – single-digit microsecond response times to stimulus. See the second page of this flyer – I’ve worked on this project too.


A while ago I compared Altera and Xilinx’s ARM-based FPGA combos. More information is now available publicly, so let’s see what we know now…

One thing that’s hard to miss is that Altera are making a big thing of their features to support applications with more taxing reliability and safety requirements.

Altera’s external DRAM interface supports error-checking and correction (ECC) on 32-bit wide memory, whereas Zynq can only do this on 16-bit wide memory, allowing Altera to keep a higher-performance system with ECC. The Altera SoCs also claim ECC on the large blocks of RAM within the processor subsystem (ie the L2 cache and peripheral memory buffers). It appears that Zynq only has parity (ie error checking, but not correction) on the cache and on-chip memory. In Xilinx’s favour, they have performed lots of failure testing (they always have – to a heroic degree!) and the entire processor subsystem has a silent data corruption rate of about 15 FIT. I’ve not seen any FIT data for Altera yet.

Both vendors have memory protection within the microprocessor section to stop errant software processes stomping on each other’s data, but Altera appear to have additional protection within the DDR controller too, which presumably protects against accesses from the FPGA fabric going where they shouldn’t. Again, Zynq does not (as far as I can see) provide this feature.

Looking “mechanically”, Altera have devices which are pinout compatible with and without their many-gigabit-transceiver blocks, which would provide one of my applications with a useful development interface which could be dropped in production without a board respin.

Finally, Altera also have a single-core option. Of course, that only makes a difference if it saves enough money to make the silicon cheaper for applications which can get away with a single core. Xilinx have clearly decided not… we’ll have to see!

FPGAs and ARMs – a summary

Today, I compared the new combined ARM and FPGA devices from Xilinx and Altera.

This post summarises that rather long post!


Well, there are two interesting new series of devices. Both chip families look awesome (that’s not a word I habitually use, unlike in some parts of the internet… consider it high praise :). I foresee all sorts of unforeseen applications (if you’ll forgive the Sir Humphrey-ism) enabled by the tight coupling of processor and logic. Can you choose between them? Well, Xilinx’s Zynq has more memory tightly coupled with the processors, maybe a little less on the FPGA side. Zynq also has the XADC, which shouldn’t be overlooked. A single-chip radar processor is feasible with the combination of XADC and large scratchpad. Altera have a more flexible FPGA to processor-memory interface, but Xilinx’s looks eminently good enough. Xilinx have far fewer details published as yet, so there’s no doubt more good stuff to learn from them, and Altera clearly still have things up their sleeves. I’ll update here as more information becomes available.

FPGAs considered ARM-full

Xilinx and Altera have both announced FPGAs with hard ARM processors on them. Xilinx have even got a new product family name (Zynq) for them. The products are potential game-changers in some applications. The combination of a high-performance application processor (or two!) tightly coupled to a large array of customisable logic, memory and DSP elements hasn’t really been done like this before. Consider: a PCIe FPGA board is 100s of microseconds or even milliseconds away from a host processor currently. With these architectures, the FPGA logic can be as little as a few dozen clock ticks (maybe a single microsecond) away. Altera and Xilinx are claiming 800MHz for the processors’ clock. For intensive applications (image-processing for instance) the algorithms you can contemplate are different to those which make sense on an Intel processor and memory system. The logic can be tightly coupled to its own memory subsystem as well as the processor shared memory, and data can to-and-fro between them with very small latency making “interactive” software/hardware acceleration a reality. So, is there any difference between them? I’ve trawled the publicly available information on both platforms (which is not overly detailed as yet) to see what I can glean.

Processor system

Both are using a dual-core Cortex-A9 with NEON extensions – a monster of an embedded processor. In raw clock terms it’ll be 5-10x faster (both vendors claim 800MHz) than a soft-core. There’s double-precision floating-point in the main core and a vectorised SIMD engine for DSP assistance. So, let’s go a bit deeper and compare some more gory details:


  • Altera: L1 2x32K per core, L2 512K shared, 64K RAM
  • Xilinx: L1 2x32K per core, L2 512K shared, 256K RAM

Very similar – but more tightly-coupled RAM for Xilinx. This feels very significant, particularly in data-intensive applications. And I can’t help feeling these chips are not right unless you have a data-intensive application in mind!

Hard peripherals

  • USB OTG: Both have 2 ports
  • Gigabit Ethernet: Both have 2 ports with support for IEEE1588 timestamping (Altera also claim a Low-Power Idle mode)
  • Controller Area Network (CAN): both sport 2 ports, useful for automotive and industrial networking.
  • IIC: Xilinx have 2 on Zynq, Altera have 4
  • SPI: 2 on each
  • UART: 2 on each
  • SD/SDIO is supported by both vendors – Altera state they can boot from SD.
  • NAND flash – 8 bit for Altera, Xilinx not specified as yet. Xilinx’s static memory interface also supports NOR flash, which Altera are not going to. They say you can build your own in the fabric, which is fair comment – I’m not sure how much use parallel NOR flash will get when NAND and QSPI are there.
  • Quad-SPI is available on both devices.
  • both have an array of timers and GPIOs also.

Another significant difference: Xilinx also have an ADC on chip (XADC) which has been in their high-end families for system management for quite a long time. No evidence of Altera providing anything similar. This is quite a useful addition on Xilinx’s part as most reasonably critical systems will want to monitor temperature and power rails at the very least.

To shuffle all the data to and from these peripherals, both have 8-channel DMA engines; the high-performance (USB, Ethernet etc) peripherals also have their own bus-mastering capabilities. Xilinx have dedicated 4 DMA channels to the FPGA fabric. Altera say their DMA and fabric are connected, but give nothing more specific. The peripherals on both vendors’ devices are wired to pins through a big multiplexer. The implication is that you can’t use all of the peripherals at the same time, although it looks like some of them can also be routed through the FPGA fabric to other IOs. The high-speed ports are more restricted in their pin options. On Altera’s version they have documented the various options – one obvious niggle is that one of the USB ports shares pins with an Ethernet port. I guess for most use cases it’s either two Ethernets or two USBs that are needed, rarely two of both. But I imagine that’s one of the ports that can’t be sent through the fabric, as ULPI is a bit special.

SDRAM controller

Both vendors have hard DDR controllers with built-in bandwidth management of the array of ports – something quite costly to build into a soft-core memory controller. Both vendors support DDR2, DDR3 and LPDDR2. Altera’s controller also supports LPDDR1. Altera claim support for ECC on both 16- and 32-bit widths. Xilinx aren’t saying at the moment. The SDRAM controllers have many connections to the FPGA fabric, as you’d hope: Altera have sufficient wiring to the FPGA logic to make 3 or 4 bidirectional ports (depending on bus interface) or 1 very wide port (256-bit!) in each direction, or various combinations in between. Xilinx simply have 4 64-bit ports to the FPGA fabric.


The phrase used in Xilinx’s white paper is “processor-centric”. The Zynq devices are definitely being positioned by Xilinx as completely different beasts to normal FPGAs – hence the family gets its own name. This is quite clearly a processor with an FPGA on the side. Zynq’s processor boots before the FPGA and then you use the processor to configure the FPGA. Altera are selling their family as more of a middle-ground “FPGA+processor on the same chip”, with boot-flexibility being part of their message. Either the processor or the FPGA part can boot first, with the first up configuring the other part. The processor can boot from QSPI, SD or NAND flash. The FPGA boots (well, OK, configures) with the usual traditional modes (parallel, serial) as well as PCIe – or presumably from the processor system. Personally, I like the Zynq approach – I want to forget the FPGA until I need it.

FPGA fabric

To get data to and from the fabric, Altera have 2 ports mastering to the FPGA (one fast, one slow) and 1 master from the FPGA. In addition there are the memory ports mentioned above. Also worth noting is that the larger devices have 1-3 more hard memory controllers connected directly to the FPGA fabric. Xilinx have 2 ports in each direction between the processor and FPGA and 4×64 bit ports from FPGA to memory. Any further memory interfaces will have to be built in the fabric, although even the low-end devices get the benefit of the Series 7 IOs, which means the PHY interface requires less heroic use of LUTs as delay lines to match DDR timings.

On the logic side, it looks like Altera are using the same adaptive logic module (ALM) – an 8-input fracturable LUT+4FFs+sundry carrychains and muxes – in both Cyclone V and Arria V: from 25K to 460K LEs across 5 family members. (Those must be marketing LEs, not actual ALMs!) Xilinx have a configurable logic block (CLB) – consisting of 8 6-input (somewhat-split-able) LUTs and 16FFs – again the same in both Artix-based and Kintex-based chips. They are claiming 30K to 235K LEs – again those must be marketing numbers. Those numbers are not directly comparable, but it looks like Altera’s biggest device may be significantly larger than Xilinx’s largest. Both vendors are offering ~200KB to ~2MB of FPGA-based memory (up to nearly 3MB at the top end of Altera’s offering). Yes, those are mega-bytes, not the usual megabits that FPGAs used to get built with!


I’ve summarised all this in another post!

Edit: and a follow-up here

libv has a home

Some of my “useful bits” of library code have lived in libv.vhd for a while – I’ve split it off and licensed it with a CC0 license (which means the author disclaims copyright and offers no warranty). It’s on github and I’ll add contributions from anyone who has any!

Either individual functions to add to libv.vhd or great big wodges of useful code (like Jim Lewis’ randomized testing libraries maybe….)
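
To give a flavour, helpers of this sort tend to be small, self-contained functions. This one is illustrative only (check the repository for what’s actually in libv.vhd):

```vhdl
-- Illustrative libv-style helper (not necessarily in the real library):
-- how many bits are needed to represent the values 0 to n?
function number_of_bits (n : natural) return positive is
    variable bits : positive := 1;
begin
    -- Keep widening until 2**bits exceeds n
    while n >= 2**bits loop
        bits := bits + 1;
    end loop;
    return bits;
end function number_of_bits;
```

So number_of_bits(255) returns 8 and number_of_bits(256) returns 9 – handy for sizing counters and address buses from a single constant.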

Tool switches

@boldport asked:

What are your #FPGA design space exploration techniques?

which he expands upon:

“Design space exploration” is the process of trying out different settings and design methods for achieving better performance. Sometimes the goals are met without any of it — the default settings of the tools are sufficient. When they’re not, what are your techniques to meet your performance goals?

Yet again, the 140 character constraint leaves me with things unspoken….

Working where I do in the automotive market means that it’s not good enough to miss timing by a few picoseconds and say “it’ll be fine, ship it”. If you miss timing, you /have/ to make it pass.

My experience with tool tweakery is that it gains you a 2-5% timing improvement – which can be enough to meet timing when you just missed.

The downside is that usually, when you go and change the design (because the requirements have changed yet again), you find yourself with a slightly different 10ps timing violation which maybe this time the tools can’t get around. Or maybe, with a change to one of the seed parameters, they will, after some trial runs.

So, I’ve given up on that approach as being too variable. It’s much harder to give estimates of when something will be ready when timing closure is a “tweak the knobs a number of times and see”.

What I do now is rework things until it meets timing easily. That way, it’s likely to stay that way.

Techniques include:

  • Pipelining – adding registers
  • Constraining unconstrained integers – occasionally, the synthesiser doesn’t figure out the range an integer variable or signal can take on, so needs telling. This is happening less and less as synthesis tools get cleverer.
  • Simplifying algorithms
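
As a minimal sketch of the first two techniques (entity name, widths and ranges are all illustrative, not from any real design):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_mac is
  port (
    clk    : in  std_logic;
    a, b   : in  unsigned(17 downto 0);
    result : out unsigned(47 downto 0));
end entity pipelined_mac;

architecture rtl of pipelined_mac is
  -- Pipelining: register the inputs, the product and the accumulator
  -- separately, so no single path has to do everything in one clock.
  signal a_r, b_r : unsigned(17 downto 0);
  signal product  : unsigned(35 downto 0);
  signal acc      : unsigned(47 downto 0) := (others => '0');
  -- Constraining: tell the synthesiser the real range, rather than
  -- leaving it to assume a full 32-bit integer.
  signal sample_count : integer range 0 to 1023 := 0;
begin
  process (clk) is
  begin
    if rising_edge(clk) then
      a_r     <= a;                 -- stage 1: input registers
      b_r     <= b;
      product <= a_r * b_r;         -- stage 2: multiply
      acc     <= acc + product;     -- stage 3: accumulate
      if sample_count = 1023 then
        sample_count <= 0;
      else
        sample_count <= sample_count + 1;
      end if;
    end if;
  end process;
  result <= acc;
end architecture rtl;
```

The extra registers cost latency but break the long multiply-accumulate path into short ones, and the constrained integer synthesises to a 10-bit counter instead of a 32-bit one.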

This gives me a much more predictable build process. It’s served me fine, even for a nearly full Spartan3 device with some logic running at 160+MHz DDR.

Of course, if you are right up against the limits of the device speed and you’ve pipelined and constrained and everything else, then tweaking tool parameters is all you have left – anyone in that position has my sympathies!

Version control for FPGAs

@boldport recently asked on Twitter what version control software people used on their FPGA designs. I replied that I use git at home and Subversion at work. The reasons why take a bit more than 140 characters, so I’ve written them here!


Work first – we were using Microsoft’s Visual Sourcesafe quite happily. Until it started to lose data on us. Not great for a version control tool!

I reviewed a load of version control systems then, and I selected Subversion as our tool of choice for version control.

One of the reasons for this was the price (not surprisingly) – I wanted to encourage everyone to start using version control for all sorts of things, not just the “softies”. But no-one was going to pay for project managers to have licenses for a paid-for tool.

The TortoiseSVN client integrated nicely with Explorer, so those who like GUIs are well catered for.

It’s a great tool, and has got really wide usage (yes, even amongst project managers).

It works well for FPGA designs too (but then they’re pretty much just text source code anyway!) – I have a flow which can set me up a new FPGA design by pulling starting points from a library space within our repository very quickly. And I have scripted the release process so that I ensure that a TAG is created with the unique buildid of my FPGAs at the same point as the zipfile I release to other developers is created.

One downside to Subversion is that when library code is pulled in through the svn:externals property, the revision of that library code is not locked, so if that tag is pulled at a later date, you can find it pulling a later version of the external reference. There are things you can do about this, but you have to be proactive in doing them.
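
For instance (paths and revision number here are hypothetical), pinning the external to an explicit revision in the svn:externals property keeps a tag reproducible:

```
# svn:externals definition: pin the library to revision 1234
# instead of always tracking HEAD
-r1234 ^/libraries/libv libv
```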

Merging has also been a pain – one of my FPGAs branched a lot at one stage, and Subversion back then had no record of previous merges (merge tracking only arrived in version 1.5). Since Subversion gained extra merging abilities, I haven’t had much opportunity to use them :(

If I were choosing again now, I would go with a distributed system – either Mercurial or Bazaar – both of which felt a bit Unixy (I’m in a minority in liking Unix-like systems :) and didn’t have Tortoise-like clients at the time we were making the decision.


At home, I started using Subversion, but when git came around, I jumped on it. I was quite entertained by Linus Torvalds’ comparison of git and svn – he has a certain way with words :)

Git is certainly not for everyone – it works slightly weirdly compared to Subversion (and indeed Bazaar and Mercurial as far as I can tell).

Starting off is simple, just git init. The speed is brilliant. I love being able to switch between branches instantaneously. And the merging ability is superb.

Again, nothing FPGA specific, it’s just source code.

FPGA Q&A area on stack exchange

For those who don’t know Stack Overflow, I recommend having a look round. Web Forums (Fora?) done right. A sensible and easy way of rating questions and answers and questioners and answerers. For the right subjects, a goodly group of knowledgeable people answering them… But mainly on a software theme. Sadly (for me :) FPGAs and HDLs only come up occasionally (but I try and answer when I can). Enter Stack Exchange:

The best place (IMHO) for FPGA advice currently is Usenet. comp.arch.fpga and comp.lang.vhdl have a group of experts who are happy to help with well-asked questions. But the signal:noise ratio is dropping of late. My guess is that this is because new users tend to go for web forums which tend to be single vendor and don’t have the variety of experts that the newsgroups do. And, for some reason, they seem to attract poorly phrased questions.

Stackoverflow and its sister Stack Exchange are designed for the web forum generation, and do it well.

There’s a group up for creation on Stack Exchange related to FPGAs, but it needs a bunch more people to express interest before it can come into existence. I’d like to see it flourish, and if it comes to reality, I’ll be checking the questions regularly, and maybe posting a few of my own.

(Mind, I’ll always hang out on comp.arch.fpga as well though)