Asymmetric Multiprocessing on ZYNQ ZC702 board with EdkDSP Accelerators for Xilinx Vivado 2013.4 Design Flow
This application note describes the asymmetric multiprocessing (AMP) design based on the Xilinx application note XAPP1093. The AMP design is ported from the ISE 14.5 design flow to the Xilinx Vivado 2013.4 and SDK 2013.4 design flow. The ARM Cortex A9 processor works together with the MicroBlaze processor, sharing the terminal and block RAM. Both processors execute their programs from the same external DDR3 memory. The MicroBlaze processor controls four EdkDSP floating point accelerators. Each accelerator is organised as an 8xSIMD reconfigurable data path controlled by a PicoBlaze6 controller.
This evaluation package is provided by UTIA for the Xilinx ZC702 designs with AXI bus. This application note explains how to install and use the demonstrator on Windows 7 (32 or 64 bit) and the Xilinx ZC702 board. These key features are demonstrated:
- Adaptive acoustic noise cancellation is implemented on one of the four accelerators. The accelerator computes the recursive adaptive LMS algorithm for identification of a regression filter with 2000 coefficients in single-precision floating point arithmetic. The sustained performance for this algorithm is:
  - 632 MFLOP/s on the 100 MHz EdkDSP
  - 146 MFLOP/s on the 666 MHz ARM Cortex A9 (with the vector floating point unit)
  - 8 MFLOP/s on the 100 MHz MicroBlaze processor with the floating point HW unit
- The EdkDSP accelerators can be reprogrammed by firmware. The firmware can be written in C and compiled with the UTIA EdkDSP C compiler. Each accelerator can hold two firmware programs, and a design can swap between them at run time in only a few clock cycles.
- The alternative firmware can be downloaded to the EdkDSP accelerators in parallel with the execution of the current firmware. This is demonstrated in the acoustic noise cancellation demo by swapping the firmware for the FIR filter (room response) with the firmware for adaptive LMS identification of the filter coefficients.
- The EdkDSP accelerator provides single-precision floating point results that are bit-exact with the reference software implementation running on MicroBlaze with the Xilinx HW single-precision floating point unit.
- In the presented case of the 2000-tap adaptive LMS filter, the 100 MHz 8xSIMD EdkDSP accelerator is 4.3x faster than the 666 MHz ARM Cortex A9 (with the vector processing unit) and 79x faster than the performance-optimized 100 MHz MicroBlaze with the HW floating point unit.
- The 2000-tap floating point FIR filter (acoustic room model) is computed by a single 100 MHz (8xSIMD) EdkDSP accelerator with a floating point performance of 1007 MFLOP/s. The theoretical peak performance of a single 100 MHz (8xSIMD) EdkDSP accelerator is 1.6 GFLOP/s.
- The peak performance of the four 100 MHz (8xSIMD) EdkDSP accelerators implemented in this demo design is 6.4 GFLOP/s (a theoretical peak figure only).
- This evaluation package presents two (8xSIMD) EdkDSP accelerator families: one family without a pipelined floating point divider data path and one family with a single pipelined floating point divider data path. The members of both families differ in size and in the vector floating point operations they support.
- The floating point applications can be scheduled inside the EdkDSP accelerator by the Xilinx PicoBlaze6 processor. Each firmware program has a maximum size of 4096 words, each 18 bits wide.
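The adaptive LMS identification named in the feature list can be illustrated with a minimal software sketch. This is the textbook LMS update in plain C, given here only to show what one iteration computes. It is not the UTIA firmware or the libwal.a API. The function name lms_step, its parameters, and the convention that x[0] holds the newest sample are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook LMS step (illustrative sketch, not the UTIA firmware).
 *   w  : filter coefficients, length n, updated in place
 *   x  : the n most recent input samples (assumed: x[0] is the newest)
 *   d  : desired (reference) sample for this step
 *   mu : step size, chosen by the caller
 * Returns the error e = d - y computed before the coefficient update. */
float lms_step(float *w, const float *x, size_t n, float d, float mu)
{
    assert(w != NULL && x != NULL && n > 0);

    /* FIR part: y = sum over k of w[k] * x[k]. */
    float y = 0.0f;
    for (size_t k = 0; k < n; ++k) {
        y += w[k] * x[k];
    }

    /* Error between the desired sample and the filter output. */
    float e = d - y;

    /* Coefficient update: w[k] += mu * e * x[k]. */
    float g = mu * e;
    for (size_t k = 0; k < n; ++k) {
        w[k] += g * x[k];
    }

    return e;
}
```

In a demo like the one described, n would be 2000 and the function would be called once per input sample. The first loop alone is the FIR filter, and the second loop is the adaptation. Because floating point addition is not associative, a parallel implementation generally matches a sequential reference bit for bit only if both accumulate the products in the same order.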
What is included
The asymmetric multiprocessing on ZYNQ (AMP) with the EdkDSP platform evaluation package contains these deliverables for Windows 7 (32 or 64 bit):
- 8 evaluation versions of AMP designs. Each design uses one ARM Cortex A9 processor core, one MicroBlaze and four instances of the EdkDSP accelerators with 8xSIMD floating point data paths on the AXI-lite bus (ARM 666 MHz, MicroBlaze 100 MHz, accelerators 100 MHz). The designs are compiled in Xilinx Vivado 2013.4.
- UTIA provides source code for the demo applications and SW projects for the Xilinx SDK 2013.4. These source code projects are compiled with the UTIA library libwal.a, which handles communication with the EdkDSP accelerators.
- The included evaluation versions of the UTIA EdkDSP accelerators have a HW limit on the maximum number of vector operations they perform.
- The UTIA EdkDSP C compiler is provided as 3 executable applications for Ubuntu running in the VMware Player.
- The firmware is also provided as binary files, so that the accelerators can be tested without the C compiler.
- Partners of the Artemis EMC2 project can get the Vivado 2013.4 HW design projects with the evaluation versions of the EdkDSP accelerators (in the Vivado 2013.4 IP netlist format) from UTIA for free. See chapter 6 for the specification of deliverables for the EMC2 project partners and license details.
- Release versions of the AMP designs with the EdkDSP package for the Xilinx ZC702 board are offered by UTIA. Any customer can order and buy the release version of this AMP demo from UTIA. It includes the Vivado 2013.4 HW design projects with the EdkDSP accelerators (in the Vivado 2013.4 IP netlist format) with the main limitations removed. See section 7 of this application note for the specification of deliverables and license details.
|Evaluation version of the AMP demo on ZYNQ with UTIA EdkDSP package designed in Vivado 2013.4
|Utia_EdkDSP_Vivado_2013_4_EMC2_ZC702.pdf for licensing conditions.
|ZIP archive with precompiled Vivado 2013.4 projects demonstrating Utia_EdkDSP HW Floating-point accelerators and source code of SDK 2013.4 software projects with Utia_EdkDSP libraries.
|ZIP file: 24030318 Bytes; PDF file: 2333765 Bytes
|Xilinx Vivado 2013.4, Xilinx SDK 2013.4, Xilinx ZC702 Evaluation board
|See application note
|Functional sample (demo)