Monthly Archives: October 2011

FPGAs and ARMs – a summary

Today, I compared the new combined ARM and FPGA devices from Xilinx and Altera.

This post summarises that rather long post!

Summary

Well, there are two interesting new series of devices. Both chip families look awesome (that’s not a word I habitually use, unlike in some parts of the internet… consider it high praise :). I foresee all sorts of unforeseen applications (if you’ll forgive the Sir Humphrey-ism) enabled by the tight coupling of processor and logic. Can you choose between them? Well, Xilinx’s Zynq has more memory tightly coupled with the processors, maybe a little less on the FPGA side. Zynq also has the XADC, which shouldn’t be overlooked. A single-chip radar processor is feasible with the combination of XADC and large scratchpad. Altera have a more flexible FPGA to processor-memory interface, but Xilinx’s looks eminently good enough. Xilinx have far fewer details published as yet, so there’s no doubt more good stuff to learn from them, and Altera clearly still have things up their sleeves. I’ll update here as more information becomes available.

FPGAs considered ARM-full

Xilinx and Altera have both announced FPGAs with hard ARM processors on them. Xilinx have even got a new product family name (Zynq) for them. The products are potential game-changers in some applications. The combination of a high-performance application processor (or two!) tightly coupled to a large array of customisable logic, memory and DSP elements hasn’t really been done like this before. Consider: a PCIe FPGA board is currently 100s of microseconds or even milliseconds away from a host processor. With these architectures, the FPGA logic can be as little as a few dozen clock ticks (maybe a single microsecond) away. Altera and Xilinx are claiming 800MHz for the processors’ clock. For intensive applications (image-processing for instance) the algorithms you can contemplate are different to those which make sense on an Intel processor and memory system. The logic can be tightly coupled to its own memory subsystem as well as the processor shared memory, and data can go to and fro between them with very small latency, making “interactive” software/hardware acceleration a reality. So, is there any difference between them? I’ve trawled the publicly available information on both platforms (which is not overly detailed as yet) to see what I can glean.
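To put those latency figures on one scale, here is a minimal sketch that converts between clock ticks and time. It assumes the ticks are counted at the 800MHz processor clock, which the vendors' material does not actually say (the fabric will usually run slower), and the function names are just for illustration.

```python
CPU_CLOCK_MHZ = 800  # processor clock claimed by both vendors


def ticks_to_ns(ticks, clock_mhz=CPU_CLOCK_MHZ):
    """Time in nanoseconds taken by `ticks` cycles of a clock at `clock_mhz`."""
    return ticks * 1000 / clock_mhz


def ns_to_ticks(ns, clock_mhz=CPU_CLOCK_MHZ):
    """Number of clock cycles at `clock_mhz` that fit in `ns` nanoseconds."""
    return ns * clock_mhz / 1000


if __name__ == "__main__":
    print("3 dozen ticks :", ticks_to_ns(36), "ns")
    print("1 microsecond :", ns_to_ticks(1_000), "ticks")
    print("100 us (PCIe) :", ns_to_ticks(100_000), "ticks")
```

Under that assumption three dozen ticks take 45 ns, a microsecond is 800 processor ticks, and a 100 microsecond PCIe round trip costs 80,000 ticks. So even a full microsecond leaves plenty of room for bus crossings and a slower fabric clock.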

Processor system

Both are using a dual-core Cortex-A9 with NEON extensions – a monster of an embedded processor. In raw clock terms it’ll be 5-10x faster (both vendors claim 800MHz) than a soft-core. There’s double-precision floating-point in the main core and a vectorised SIMD engine for DSP assistance. So, let’s go a bit deeper and compare some more gory details:
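As a quick sanity check on that 5-10x figure, the sketch below works backwards from the claimed 800MHz to the soft-core clock rates it implies. It compares clock rates only; the helper name is mine and nothing here says anything about work done per clock.

```python
HARD_CORE_MHZ = 800  # clock claimed by both vendors for the Cortex-A9


def implied_soft_core_mhz(speedup, hard_core_mhz=HARD_CORE_MHZ):
    """Soft-core clock (MHz) implied by a given raw clock-rate advantage."""
    return hard_core_mhz / speedup


if __name__ == "__main__":
    for speedup in (5, 10):
        print(f"{speedup}x faster -> soft-core at {implied_soft_core_mhz(speedup)} MHz")
```

In other words, the comparison assumes a soft-core running somewhere between 80 and 160MHz.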

Memory

  • Altera: L1 2x32K per core, L2 512K shared, 64K RAM
  • Xilinx: L1 2x32K per core, L2 512K shared, 256K RAM

Very similar – but more tightly-coupled RAM for Xilinx. This feels very significant, particularly in data-intensive applications. And I can’t help feeling these chips are not right unless you have a data-intensive application in mind!

Hard peripherals

  • USB OTG: Both have 2 ports
  • Gigabit Ethernet: Both have 2 ports with support for IEEE1588 timestamping (Altera also claim a Low-Power Idle mode)
  • Controller Area Network (CAN): both sport 2 ports, useful for automotive and industrial networking.
  • IIC: Xilinx have 2 on Zynq, Altera have 4
  • SPI: 2 on each
  • UART: 2 on each
  • SD/SDIO is supported by both vendors – Altera state they can boot from SD.
  • NAND flash – 8 bit for Altera, Xilinx not specified as yet. Xilinx’s static memory interface also supports NOR flash, which Altera are not going to support. They say you can build your own in the fabric, which is fair comment – I’m not sure how much use parallel NOR flash will get when NAND and QSPI are there.
  • Quad-SPI is available on both devices.
  • both have an array of timers and GPIOs also.

Another significant difference: Xilinx also have an ADC on chip (XADC) which has been in their high-end families for system management for quite a long time. No evidence of Altera providing anything similar. This is quite a useful addition on Xilinx’s part as most reasonably critical systems will want to monitor temperature and power rails at the very least.

To shuffle all the data to and from these peripherals, both have 8 channel DMA engines; the high performance (USB, Ethernet etc) peripherals also have their own bus mastering capabilities. Xilinx have dedicated 4 DMA channels to the FPGA fabric. Altera says their DMA and fabric are connected, but nothing more specific. The peripherals on both vendors’ devices are wired to pins through a big multiplexer. The implication is that you can’t use all of the peripherals at the same time, although it looks like some of them can also be routed through the FPGA fabric to other IOs. The high-speed ports are more restricted on their pin options. On Altera’s version they have documented the various options – one obvious niggle is that one of the USB ports shares pins with an Ethernet port. I guess for most use cases it’s either two Ethernets or two USBs that are needed, rarely 2 of both. But I imagine that’s one of the ports that can’t be sent through the fabric, as ULPI is a bit special.
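The pin-sharing problem can be sketched as a simple set-overlap check. The pin names and groupings below are entirely made up for illustration; the only thing taken from Altera's documentation is the fact that one USB port shares pins with one Ethernet port.

```python
from itertools import combinations

# Hypothetical pin groups, NOT taken from any datasheet.
# usb1 and emac1 deliberately overlap to mimic the shared-pin case.
PIN_OPTIONS = {
    "usb0": {"P0", "P1", "P2"},
    "usb1": {"P3", "P4", "P5"},
    "emac0": {"P6", "P7", "P8"},
    "emac1": {"P3", "P4", "P9"},
}


def find_conflicts(enabled):
    """Return (peripheral_a, peripheral_b, shared_pins) for every clashing pair."""
    conflicts = []
    for a, b in combinations(sorted(enabled), 2):
        shared = PIN_OPTIONS[a] & PIN_OPTIONS[b]
        if shared:
            conflicts.append((a, b, sorted(shared)))
    return conflicts


if __name__ == "__main__":
    print(find_conflicts(["usb0", "usb1", "emac0"]))           # two USBs, one Ethernet
    print(find_conflicts(["usb0", "usb1", "emac0", "emac1"]))  # two of both
```

With these made-up groups, two USBs plus one Ethernet is fine, while asking for two of both reports the clash between usb1 and emac1. A real check would need the vendor's pin tables and the list of peripherals that can be rerouted through the fabric.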

SDRAM controller

Both vendors have hard DDR controllers with built-in bandwidth management of the array of ports – something quite costly to build into a soft-core memory controller. Both vendors support DDR2, DDR3 and LPDDR2. Altera’s controller also supports LPDDR1. Altera claim support for ECC on both 16- and 32-bit widths. Xilinx aren’t saying at the moment. The SDRAM controllers have many connections to the FPGA fabric, as you’d hope: Altera have sufficient wiring to the FPGA logic to make 3 or 4 bidirectional ports (depending on bus interface) or 1 very wide port (256-bit!) in each direction, or various combinations in between. Xilinx simply have 4 64-bit ports to the FPGA fabric.

Booting

The phrase used in Xilinx’s white paper is “processor-centric”. The Zynq devices are definitely being positioned by Xilinx as completely different beasts to normal FPGAs – hence the family gets its own name. This is quite clearly a processor with an FPGA on the side. Zynq’s processor boots before the FPGA and then you use the processor to configure the FPGA. Altera are selling their family as more of a middle-ground “FPGA+processor on the same chip”, with boot-flexibility being part of their message. Either the processor or the FPGA part can boot first, with the first up configuring the other part. The processor can boot from QSPI, SD or NAND flash. The FPGA boots (well, OK, configures) with the usual traditional modes (parallel, serial) as well as PCIe – or presumably from the processor system. Personally, I like the Zynq approach – I want to forget the FPGA until I need it.

FPGA fabric

To get data to and from the fabric, Altera have 2 ports mastering to the FPGA (one fast, one slow) and 1 master from the FPGA. In addition there are the memory ports mentioned above. Also worth noting is that the larger devices have 1-3 more hard memory controllers connected directly to the FPGA fabric. Xilinx have 2 ports in each direction between the processor and FPGA and 4×64 bit ports from FPGA to memory. Any further memory interfaces will have to be built in the fabric, although even the low-end devices get the benefit of the Series 7 IOs, which means the PHY interface requires less heroic use of LUTs as delay lines to match DDR timings.

On the logic side, it looks like Altera are using the same adaptive logic module (ALM) – an 8-input fracturable LUT+4FFs+sundry carrychains and muxes – in both Cyclone V and Arria V: from 25K to 460K LEs across 5 family members. (Those must be marketing LEs, not actual ALMs!) Xilinx have a configurable logic block (CLB) – consisting of 8 6-input (somewhat-split-able) LUTs and 16FFs – again the same in both Artix-based and Kintex-based chips. They are claiming 30K to 235K LEs – again those must be marketing numbers. Those numbers are not directly comparable, but it looks like Altera’s biggest device may be significantly larger than Xilinx’s largest. Both vendors are offering ~200KB to ~2MB of FPGA-based memory (up to nearly 3MB at the top-end of Altera’s offering). Yes, those are mega-bytes, not the usual megabits that FPGAs used to get built with!
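For anyone used to thinking of block RAM in megabits, here is a small conversion sketch. It assumes binary units (1MB = 1024KB) and treats the rounded "~" figures above as exact, so the results are only approximate; the function name is mine.

```python
def kbytes_to_mbits(kbytes):
    """Convert kilobytes to megabits, assuming 1024-based units."""
    return kbytes * 8 / 1024


if __name__ == "__main__":
    for label, kbytes in (("~200KB", 200), ("~2MB", 2048), ("~3MB", 3072)):
        print(f"{label:>7} is roughly {kbytes_to_mbits(kbytes)} Mbit")
```

So the range quoted is roughly 1.6 to 16 Mbit, with the top of Altera's range approaching 24 Mbit.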

Summary

I’ve summarised all this in another post!

Edit: and a follow-up here

Me: One, Tiny pieces of plastic: Nil

For want of a better place to log how to get at the dishwasher pump next time it gets stuck with a tiny piece of plastic blocking the impeller, I’m sticking it here.

In case it helps anyone else it’s a Bosch Classixx (no idea what model no, bought about 2004 IIRC). Here are the steps:

* Turn it on its side (right hand side looking at the door – the one nearest the programme dial)
* Remove the bottom panel (undo two screws, then use a flat screwdriver to prise it off a bit) – it has an earth wire attached, so don’t yank it completely away
* Don’t lose the bit of polystyrene from the anti-flooding switch!
* Follow the drain hose (that goes to the u-bend under the sink) back to the pump
* Squeeze the clips and rotate the pump, it’ll come off fairly easily
* Remove the tiny piece of debris, whatever it turns out to be.
* Put it all back together again.

Is this the modern equivalent of going out hunting sabre-tooth tigers? Probably not :)