

**Application Note** 



# DTRiMC tool for TE0820-02-3CG-1E module on TE0701-06 carrier board

Jiři Kadlec, Zdeněk Pohl, Lukáš Kohout kadlec@utia.cas.cz xpohl@utia.cas.cz kohoutl@utia.cas.cz

# **Revision history**

| Rev.   | Date      | Author    | Description                      |
|--------|-----------|-----------|----------------------------------|
| 0      | 8.04.2020 | J. Kadlec | Initial draft                    |
| 1<br>2 | 8.07.2021 | J. Kadlec | DTRIMC tool for TE0820-02-3CG-1E |

# **Table of Contents**

| 1  | DTRIMC tool TE0820-02-3CG-1E                                            | 1  |
|----|-------------------------------------------------------------------------|----|
| 2  | 8xSIMD FP03x8 floating point accelerators for ZU03-CG-1E                |    |
| 3  | Programming of 8xSIMD FP03x8 floating point accelerators                |    |
| 4  |                                                                         |    |
| 5  | Power consumption                                                       | 11 |
| 6  | ILA – In-circuit Logic Analyzer                                         |    |
| 7  | License                                                                 |    |
| 8  |                                                                         |    |
| U  | Reconfiguration of accelerator by change of firmware                    |    |
|    | Reconfiguration of accelerator by temporary change of firmware          |    |
| 9  |                                                                         |    |
| •  | PPENDIX - Confidence test                                               |    |
|    | Compilation and debug of projects from source code                      |    |
|    | DEBUG of SW application from Xilinx SDK 2018.2                          |    |
|    | Guide for compilation and use of C MEX functions in Scilab              |    |
| 10 | APPENDIX – DTRiMC tool guidelines                                       |    |
|    | Guide for compilation of HW in the DTRiMC tool                          |    |
|    | Guide for configuration and compilation of PetaLinux in the DTRiMC tool |    |
|    | Guide for configuration and compilation of Debian OS in the DTRIMC tool |    |
|    | Guide for creation of SDSoC platform OS in the DTRiMC tool              |    |
|    | Guide for creation of shared library and HW kernel in the DTRiMC tool   |    |
|    | Guide for retargeting of the DTRiMC tool for another device/module      |    |
| Di | sclaimer                                                                |    |
|    |                                                                         |    |

# **Table of Figures**

| Figure 1: Design Time Resource integration of Model Composer DTRiMC tool              |    |
|---------------------------------------------------------------------------------------|----|
| Figure 2: Sobel edge detection and two FP03x8 accelerators in ZU03-CG-1E device       | 3  |
| Figure 3: FP03x8 accelerator for the ZU03-CG-1E device                                | 4  |
| Figure 4: Internal block rams of accelerators.                                        | 5  |
| Figure 5: Floating point functions present in all accelerators {10 or 20 or 30 or 40} | 7  |
| Figure 6: Specific functions present only in some versions accelerators               | 7  |
| Figure 7: Structure of the 128 bit wide VLIW program instruction                      | 8  |
| Figure 8: Performance results of sobel_all.elf application.                           | 10 |
| Figure 9: Systems running sobel_all application.                                      | 12 |
| Figure 10: Systems running sobel_all application, Full HD HDMI desktop, Scilab        | 12 |
| Figure 11: Instruction vz2a.                                                          | 13 |
| Figure 12: Instruction vz2a detail                                                    |    |
| Figure 13: HW accelerated Matrix computation in ILA and chip temperature dashboard    | 14 |
| Figure 14: Select debug workspace                                                     | 20 |
| Figure 15: Define the environment variable.                                           | 21 |
| Figure 16: Test connection to Linux TCF Agent                                         | 22 |
| Figure 17: C MEX compilation in Arm Scilab. Remote X11 Desktop                        |    |
|                                                                                       |    |

# Acknowledgement

This work has been partially supported from project FitOptiVis, project number ECSEL 783162 and the corresponding Czech NFA (MSMT) institutional support project 8A18013.



# 1 DTRiMC tool TE0820-02-3CG-1E

This application note describes evaluation package for the Design Time Resource integration of Model Composer DTRiMC tool. See Figure 1. It serves for integration of 8xSIMD, FP03x8, floating-point, run-time-reconfigurable accelerators for Zynq Ultrascale+ TE0820-02-3CG-1E module [1] on TEBF0820 carrier board [2].



Figure 1: Design Time Resource integration of Model Composer DTRiMC tool.

This application note describes UTIA support for the evaluation version of two serial connected 8xSIMD FP03x8 floating-point, run-time-reconfigurable accelerators for Zynq Ultrascale+ TE0820 module [1] on TE0701 carrier board [2]. The TE0820 module and TE0701 carrier board are designed and manufactured by the company Trenz Electronic [1].



1/34

http://sp.utia.cz

Akademie věd České republiky Ústav teorie informace a automatizace AV ČR, v.v.i. SW developer can program application without SDSoC 2018.2 compiler license. The standard g++ compiler and "make" can be also used in Win 10 PC or in Debian OS on Arm.

The two serial connected FP03x8 accelerators and the HW data movers supporting data communication are represented for the SW developer as shared C++ library with simple SW API. The API is identical for several alternatives of HW data movers.

The evaluation package provides several pre-compiled HW designs represented in form of SD-cards containing these designs and API interface for SW developer in form of shared Debian libraries for Arm host processor.

The SW developer can program the Arm host application in standard gcc or g++ compiler and "make" can be used for compilation of host applications directly on the embedded Zynq Ultrascale+ ZU03-CG-1E based system.

## 2 8xSIMD FP03x8 floating point accelerators for ZU03-CG-1E

The FP03x8 HW accelerators serve for run-time reprogrammable 8xSIMD single precision floating point computations. The internal structure of FP03x8 accelerators is described in Figure 3.

Input:

- Program firmware data received via AXI stream interface from Arm processor.
- Configuration Write registers for scalar control received via AXI-lite interface from Arm processor.
- Floating point single precision data received via AXI stream interface from Arm processor.

Output:

- Registers indicating end of program accessible to Arm processor via AXI-lite.
- Floating point single precision result data accessible via AXI stream interface for the Arm processor.

Connectivity:

- AXI stream data/program input from ARM to HW accelerator with input FIFO 1024x32. The side channel indicates the last transferred word sent to the component via the DMA transaction from ARM processor.
- AXI stream data/program output from HW accelerator to ARM. The output side channel indicates the last transferred word sent from the component to Arm processor.
- AXI-lite input/output configuration registers.

All designs present in this evaluation package contain two, serial connected FP03x8 accelerators in the programmable logic part of the device. See Figure 2.

The HW data movers supporting the data communication are represented for the SW developer as shared C++ library with simple SW API. The API is identical for several alternatives of HW data movers.





The evaluation package includes 8xSIMD FP32 accelerators with HW license enabling only restricted number of operations. If these licensed operations are all used, user has to reset complete system. This will enable to use the licensed count of operations again.

Please contact UTIA (<u>kadlec@utia.cas.cz</u>) if you are interested in licensing of 8xSIMD accelerators HW IPs without this restriction.



Figure 2: Sobel edge detection and two FP03x8 accelerators in ZU03-CG-1E device





Configure regs. Status regs. Program regs.

Figure 3: FP03x8 accelerator for the ZU03-CG-1E device

### **Accelerator Interfaces**

Type of interface:

- Data streaming I/O: AXI-S 32 bit
- Firmware program VLIW 128 bit
- Configuration I/O: AXI-lite 32 bit
- 4x ARM A53 system clock

ZU03-CG-1E ZU03-CG-1E ZU03-CG-1E ZU03-CG-1E

Device:

Clock: 240 MHz 240 MHz 100 MHz 1200 MHz



### Memory of the Accelerator in the programmable logic part of the device

- 12 dual-ported 1024x64 bit BRAMs Blocks (0 .. 11) are used as:
  - 24 Data RAMs organised as 1024x32 bit blocks: A1..A8, B1..B8 and Z1..Z8.
- 2 dual-ported 512x64 bit BRAMs Blocks (12, 13) are used as
  - 4 Program RAMs organised as 512x32 bit blocks: P1..P3

| SIMD<br>A<br>32 bit | Block<br>64<br>bit | SIMD<br>B<br>32 bit | Block<br>64<br>bit | SIMD<br>Z<br>32 bit | Block<br>64 bit | VLIW<br>Prog | Block<br>64 bit |
|---------------------|--------------------|---------------------|--------------------|---------------------|-----------------|--------------|-----------------|
| A1                  | 0                  | B1                  | 4                  | Z1                  | 8               | P1           | 12              |
| A2                  |                    | B2                  |                    | Z2                  |                 | P2           |                 |
| A3                  | 1                  | B3                  | 5                  | Z3                  | 9               | Р3           | 13              |
| A4                  |                    | B4                  |                    | Z4                  |                 | P4           |                 |
| A5                  | 2                  | B5                  | 6                  | Z5                  | 10              |              |                 |
| A6                  |                    | B6                  |                    | Z6                  |                 |              |                 |
| A7                  | 3                  | B7                  | 7                  | Z7                  | 11              |              |                 |
| A8                  |                    | B8                  |                    | Z8                  |                 |              |                 |

Figure 4: Internal block rams of accelerators.

### **AXI-lite Registers**

| Name: | Data: Description:                                               |
|-------|------------------------------------------------------------------|
| reset | 1 bit: "1" Reset AXI lite Registers; "0" NOP                     |
| we    | 16 bit: Write from stream to block(s) (bit 0 13)                 |
| baddr | 10 bit: Stream will rd/wr from addr=baddr                        |
| bram  | 5 bit: Read from Block 0 13 to Stream; 16 for: Move-data-through |
| paddr | 9 bit: Program start address                                     |
| pstep | 9 bit: Program stop address                                      |
| go    | 1 bit: "1" go from paddr to pstep; "0" NOP                       |
| hi    | 12 bit: SubBank prog. mod: 00zz00bb00aa (bits)                   |
| done  | 8 bit: Read only. "0" => Instruction runs                        |
| pdone | 1 bit: Read only. "0" => Program runs                            |



#### Parameters of stream data interfaces from/to ARM DDR memory

- Maximal supported stream data size is 2048 x 32 bit
- Data streaming can have variable size:
  - Min: 2 x 32 bit
  - Max 2048 x 32 bit
- Mode of operation (same for Data and for Program):
  - Write to a block: It is defined by we (from address defined in baddr)
  - Broadcast Write: It is defined by setting more bits in we (from address defined in baddr)
  - Read from block: It is defined by setting bram (from address defined in baddr)
  - Write or Broadcast Write and Read in parallel: It is defined by setting more bits in we and by setting bram (from address defined in baddr)
  - Send data through the Accelerator: It is defined by setting we = 0 and by setting bram =16;

#### Design-time support

These data streaming HW data movers are supported:

- Zero Copy HW data mover without DMA
- DMA HW data mover with DMA
- SG DMA HW data mover with SG DMA and interrupts

The design time support is based on the Xilinx SDSoC 2018.2 system level compiler.

#### Run-time support

- Data can be written to and/or read from the accelerator by user Arm app.
- Firmware can be written to and/or read from the accelerator user Arm app.
- Computation & data streaming can be performed in parallel.

#### Versions of accelerators:

• **FP03x8**\_capabilities capabilities = 10, 20, 30 or 40





| SIMD OP code (deo | c) | 8xSIMD Floating Point Operation Description                                            |
|-------------------|----|----------------------------------------------------------------------------------------|
| VVER              | 0  | Return capabilities of the accelerator and status of license                           |
| VZ2A              | 1  | 8xSIMD vector copy a <sub>m</sub> [i] <= z <sub>m</sub> [j]; m=18                      |
| VB2A              | 2  | 8xSIMD vector copy a <sub>m</sub> [i] <= b <sub>m</sub> [j]; m=18                      |
| VZ2B              | 3  | 8xSIMD vector copy b <sub>m</sub> [i] <= z <sub>m</sub> [j]; m=18                      |
| VA2B              | 4  | 8xSIMD vector copy b <sub>m</sub> [i] <= a <sub>m</sub> [j]; m=18                      |
| Auto-increments:  |    | Example: for (n=0;n<=CNT;n++){i=i+B_INC;                                               |
| VADD              | 5  | 8xSIMD vector add z <sub>m</sub> [i] <= a <sub>m</sub> [j] + b <sub>m</sub> [k]; m=18  |
| VADD_BZ2A         | 6  | 8xSIMD vector add $a_m[i] \le b_m[j] + z_m[k]; m=18$                                   |
| VADD_AZ2B         | 7  | 8xSIMD vector add b <sub>m</sub> [i] <= a <sub>m</sub> [j] + z <sub>m</sub> [k]; m=18  |
| Auto-increments:  |    | Example: for (n=0;n<=CNT;n++){i=i+B_INC;                                               |
| VSUB              | 8  | 8xSIMD vector sub z <sub>m</sub> [i] <= a <sub>m</sub> [j] - b <sub>m</sub> [k]; m=18  |
| VSUB_BZ2A         | 9  | 8xSIMD vector sub a <sub>m</sub> [i] <= b <sub>m</sub> [j] - z <sub>m</sub> [k]; m=18  |
| VSUB_AZ2B 1       | 10 | 8xSIMD vector sub b <sub>m</sub> [i] <= a <sub>m</sub> [j] - z <sub>m</sub> [k]; m=18  |
| Auto-increments:  |    | Example: for (n=0;n<=CNT;n++){i=i+B_INC;                                               |
| VMULT 1           | .1 | 8xSIMD vector mult z <sub>m</sub> [i] <= a <sub>m</sub> [j] * b <sub>m</sub> [k]; m=18 |
| VMULT_BZ2A 1      | .2 | 8xSIMD vector mult a <sub>m</sub> [i] <= b <sub>m</sub> [j] * z <sub>m</sub> [k]; m=18 |
| VMULT_AZ2B 1      | .3 | 8xSIMD vector mult b <sub>m</sub> [i] <= a <sub>m</sub> [j] * z <sub>m</sub> [k]; m=18 |
| Auto-increments:  |    | Example: for (n=0;n<=CNT;n++){i=i+B_INC; j=j+A_INC; k=k+Z_INC;}                        |

Figure 5: Floating point functions present in all accelerators {10 or 20 or 30 or 40}.

| SIMD OP code (dec)   | 8xSIMD Floating Point Operation Description                                                                             |  |
|----------------------|-------------------------------------------------------------------------------------------------------------------------|--|
| VPROD 14             | 8xSIMD vector products.                                                                                                 |  |
|                      | z <sub>m</sub> [i] <= a <sub>m</sub> '[jj+nn]*b <sub>m</sub> [kk+nn];                                                   |  |
| FP01, FP03: 30,40    | m=18; nn range 0255                                                                                                     |  |
| VMAC 15              | 8xSIMD vector MACs.                                                                                                     |  |
|                      | z <sub>m</sub> [ii+nn] <= z <sub>m</sub> [ii+nn] + a <sub>m</sub> [jj+nn] * b <sub>m</sub> [kk+nn];                     |  |
| FP01, FP03: 20,30,40 | m=18; nn range 010                                                                                                      |  |
| VMSUBAC 16           | 8xSIMD vector MSUBACs.                                                                                                  |  |
|                      | z <sub>m</sub> [ii+nn] <= z <sub>m</sub> [ii+nn] - a <sub>m</sub> [jj+nn] * b <sub>m</sub> [kk+nn];                     |  |
| FP01, FP03: 20,30,40 | m=18; nn range 010                                                                                                      |  |
| LONG_VPROD 17        | Single long vector product .                                                                                            |  |
|                      | z <sub>m</sub> [i] <= ( (a <sub>1</sub> '[jj+nn]*b <sub>1</sub> [kk+nn]+a <sub>2</sub> '[jj+nn]*b <sub>2</sub> [kk+nn]) |  |
|                      | + (a <sub>3</sub> '[jj+nn]*b <sub>3</sub> [kk+nn]+a <sub>4</sub> '[jj+nn]*b <sub>4</sub> [kk+nn]) )                     |  |
|                      | +                                                                                                                       |  |
|                      | ( (a <sub>5</sub> '[jj+nn]*b <sub>5</sub> [kk+nn]+a <sub>6</sub> '[jj+nn]*b <sub>6</sub> [kk+nn])                       |  |
|                      | + (a <sub>7</sub> '[jj+nn]*b <sub>7</sub> [kk+nn]+a <sub>8</sub> '[jj+nn]*b <sub>8</sub> [kk+nn]) );                    |  |
| FP01, FP03: 40       | m=18; nn range 0255                                                                                                     |  |
| VDIV 20              | 8xSIMD vector Division.                                                                                                 |  |
| FP03: 10,20,30,40    | 10,20,30,40 $z_m[i] \le a_m[j] / b_m[k];$                                                                               |  |
| FP01: not supported  | m=18                                                                                                                    |  |
| Auto-increments:     | Example: for(n=0;n<=CNT;n++){i=i+Z_INC; j=j+A_INC; k=k+B_INC;}                                                          |  |

Figure 6: Specific functions present only in some versions accelerators.



7/34

http://sp.utia.cz

Akademie věd České republiky Ústav teorie informace a automatizace AV ČR, v.v.i.

# 3 Programming of 8xSIMD FP03x8 floating point accelerators

Host arm application can form the VLIW program instructions in DDR4 memory as two 64bit words. Components of the low 64 bit word are marked by light green background. Components of the high 64 bit word components are marked by light blue. See Figure 7. Host arm application can form a sequence of such VLIW program instructions in DDR4 memory and write them to one of two accelerator program memories.

| FP01, FP03         | Size     | VLIW: hi <mark>lo</mark>   | Description                                 |
|--------------------|----------|----------------------------|---------------------------------------------|
| [not_used]         | [8bit]   | <mark>8 bit [6356</mark> ] | Not used by FP01 or FP03                    |
| [not_used]         | [8bit]   | 8 bit [5548]               | Not used by FP01 or FP03                    |
| [0,Z_MEM_SECTION]  | [0,2bit] | 8 bit [4740]               | Z_MEM SECTION (03)                          |
| [CNT]              | [8bit]   | 8 bit [3932]               | Number of 8xSIMD steps (0 255)              |
| [Z_INC]            | [8bit]   | <mark>8 bit [3124]</mark>  | Auto increment of Z address (0 255)         |
| [Z_MEM_SADDR]      | [8bit]   | 8 bit [2316]               | Set Z address after auto-increment overflow |
| [Z_MEM_ADDR]       | [8bit]   | <mark>8 bit [1508]</mark>  | Initial Z address                           |
| [B_INC]            | [8bit]   | 8 bit [0700]               | Auto increment of B address (0 255)         |
| [OP]               | [8bit]   | <mark>8 bit [6356]</mark>  | 8xSIMD vector operation                     |
| [0, B_MEM_SECTION] | [0,2bit] | <mark>8 bit [5548]</mark>  | B_MEM SECTION (03)                          |
| [0, A_MEM_SECTION] | [0,2bit] | <mark>8 bit [4740]</mark>  | A_MEM SECTION (03)                          |
| [B_MEM_SADDR]      | [8bit]   | <mark>8 bit [3932]</mark>  | Set B address after auto-increment overflow |
| [B_MEM_ADDR]       | [8bit]   | <mark>8 bit [3124]</mark>  | Initial B address                           |
| [A_INC]            | [8bit]   | <mark>8 bit [2316]</mark>  | Auto increment of A address (0 255)         |
| [A_MEM_SADDR]      | [8bit]   | <mark>8 bit [1508]</mark>  | Set A address after auto-increment overflow |
| [A_MEM_ADDR]       | [8bit]   | <mark>8 bit [0700]</mark>  | Initial A address                           |

Figure 7: Structure of the 128 bit wide VLIW program instruction.

Sequences of VLIW instructions present in the accelerator program memory can be autonomously executed by the accelerator (see Figure 3).

User defines start address in **paddr** AXI-lite register and end address in **pstep** AXI-lite register.

User requests execution of the sequence of VLIW operations by setting the single bit AXI-lite register go = 1. The accelerator executes the VLIV sequence from paddr to pstep.

State of the execution can be tested by the host application by reading of the AXI Read only register **pdone.** If **pdone**==0, the sequence of VLIW instructions is being executed. If **pdone**==1, the sequence of VLIW instructions is completed.

Finally the host application has to set the single bit AXI-lite register back to go = 0.

The host application can also copy data/program to/from the accelerator while the sequence of VLIW instructions is being executed on current internal data and internal program of the accelerator.



This parallel copy data/program to/from the accelerator (while accelerators executes its sequence of VLIW instructions) requires to avoid race-condition caused by parallel writing to the same memory address both by accelerator and by parallel copy of data defined by the user in the same time instance. This has to be avoided by the user application, by writing only to accelerator data which are not used for writing by the currently executed sequence of VLIW instructions.

The sequence of VLIW instructions can be also reduced to a single VLIW instruction. The **paddr** and **pstep** registers are set to an identical program address in such case.

This evaluation package contains example of the combination of HW accelerated edge detection video processing of FULL HD 1080p60 video input with FULL HD HDMI video output and acceleration of floating point matrix multiplications by two serial connected accelerators.

The HW Sobel filter for edge detection is precompiled into the programmable logic and present in the shared libraries together with the two serial connected accelerators.

The internal structure of the Zynq Ultrascale+ SoC with two serial connected accelerators and HW accelerator for Sobel filter FP03x8 accelerator can be seen in Figure 3.

## 4 C++ evaluation project

The PL part of the ZU03-CG-1E device contains two evaluation versions of the 8xSIMD runtime-reprogrammable single-precision-floating-point HW accelerator FP03x8 organized as 1x2 serial connected accelerators. Released evaluation HW platforms are exported for linking with the arm A53 host programs. Platforms have these properties:

 Sobel filter Edge detection with DMA data mover with 4x 8xSIMD HW acclerators Directory (C++): fp03x8\_v26x2s\te02\_dma\_v26x2\_sw\ SW C++ project: sobel\_all SW C++ project: hdmi\_start Shared C++ library: ./Debug/sd\_card/libsobel\_dma\_v26x2\_hw.so Shared C++ library: ./Release/sd\_card/libsobel\_dma\_v26x2\_hw.so

Debug and Release version of shared libraries Debian Stretch 9.8 OS for the ZU03-CG-1E device for the SDK 2018.2 C++ SW flow (g++ compiler). PL is working with 1x2 evaluation versions of FP03x8 HW accelerators with zero copy data movers.

Test of video processing demos requires to connect Full HD HDMI video source (PC with Full HD 60 FPS or video cammera) to the Full HD HDMI input of the FMC Imageon card and also to connect as output the Full HD HDMI display to the Full HD HDMI output of the FMC Imageon card [3]. See also detailed description of run-time support and description of board configuration in [4].

SW project **sobel\_all** demonstrates HW acceleration of two single precision floating point matrix by matrix multiplications and test of all firmware functions of accelerators. Projects test matrix multiplications and short sequences of elementary vector operations of 8xSIMD HW accelerators. The two instances of the FP03x8 accelerators on ZU03-EG-1E device are controlled by 2 SW threads. HW accelerators accelerate the SW optimized (-O3) code executed on four A53 cores running with 1.2 GHz clock.

Application hdmi\_start.elf serves for initial activation of the FullHD HDMI output (DeskTop).



Akademie věd České republiky Ústav teorie informace a automatizace AV ČR, v.v.i. http://sp.utia.cz



```
🗬 10.0.0.127 - PuTTY
                                                                            Х
Memory mapped at address 0x7fb3d77000.
Common Init OK
TX Init ... OK
HDMII cable plugged
ADV7611 Video Input Information
        Video Input = HDMI, Progressive
                        = 8 bits per channel
        Color Depth
                        = hav=1920, hfp=88, hsw=44(hsp=1), hbp=148
        HSYNC Timing
                      = vav=1080, vfp=04, vsw=05(vsp=1), vbp=036
        VSYNC Timing
        Video Dimensions = 1920 x 1080
                       = 148.500000 MHz
        Pixel Clock
        Frame rate
                         = 60.000000 FPS
Initialize VDMA ... RX Init ... OK
Starting VDMA parking mode ...
Parking started
FPS:
       89.55 mmultf1 sw OK
                                   807.279 MFLOPs [====
                                                                          1
FPS:
       78.01 mmultf1 8B OK
                                  4862.718 MFLOPs [======== ]
FPS: 78.12 mmultf1 4B OK
                                  5039.919 MFLOPs [==========
                                                                         =1
FPS: 94.59 va2bf1 OK
FPS: 90.41 vaddf1 OK
                                  2978.462 MFLOPs [=========
                                                                          1
                                  2750.606 MFLOPs [=========
                                                                          1
FPS: 86.92 vaddf1 az2b OK
                                  2747.395 MFLOPs [==========
                                                                         ]
     92.68 vaddf1 bz2a OK
FPS:
                                 2773.967 MFLOPs [============
                                                                          1

      FPS:
      100.48 vb2af1
      OK

      FPS:
      100.46 vdivf1
      OK

      FPS:
      100.46 vmacf1
      OK

                                 2989.223 MFLOPs [==============
                                                                          1
                                 2754.102 MFLOPs [==========
                                                                          1
                                  706.318 MFLOPs [===
                                                                          1
FPS: 100.50 vmsubacf1 OK
                                   713.006 MFLOPs [===
                                                                          1
FPS: 100.48 vmulf1
                                  2752.408 MFLOPs [=========
                         OK
                                                                          1
FPS: 100.45 vmulf1_az2b OK
                                  2765.875 MFLOPs [=========
                                                                          1
FPS: 100.17 vmulf1 bz2a OK
                                  2764.139 MFLOPs [=========
                                                                          1
FPS: 100.49 vprodf1 OK
                                  2920.792 MFLOPs [=========
                                                                          1
FPS:
     100.50 vprods8f1 OK
                                  2910.209 MFLOPs [=========
                                                                          1
FPS: 100.47 vsubf1
                        OK
                                  2748.722 MFLOPs [=========
                                                                          1
FPS:
       99.46 vsubf1 az2b OK
                                  2744.495 MFLOPs [==========
                                                                          1
FPS:
      95.28 vsubf1_bz2a OK
                                  2765.006 MFLOPs [=========
                                                                         1
FPS: 100.48 vz2af1 OK
                                  2992.434 MFLOPs [==========
                                                                         1
FPS: 100.47 vz2bf1
                        OK
                                  2994.402 MFLOPs [==========
                                                                          1
       89.08 mmultf1_sw OK
78.45 mmultf1_8B OK
76.76 ^Cmmultf1_4B OK
     89.08 mmultf1 sw OK
                                   805.423 MFLOPs [====
FPS:
                                                                          1
                                  4869.527 MFLOPs [======== ]
FPS:
FPS:
                                   ===1
license OK
Memory device closed
Exiting ...
root@zynqmp:/boot# 📘
```

Figure 8: Performance results of sobel\_all.elf application.

# Tested Function

| mmultf1_4xB | 8x mmult use B1B4, include parallel copy of B1B4 |
|-------------|--------------------------------------------------|
| mmultf1_8xB | 8x mmult use B1B8, include parallel copy of B1B8 |
|             | SW mmult Scilab MEX style, 4 threads             |
| va2bf1      | 8x va2b                                          |
| vaddf1      | 8x vadd                                          |
|             |                                                  |



10/34



ŪTIA

| vaddf1_az2b<br>vaddf1_bz2a<br>vb2af1 | 8x vadd_az2b<br>8x vadd_bz2a<br>8x vb2a |
|--------------------------------------|-----------------------------------------|
| vdivf1<br>vmulf1                     | 8x vdiv<br>8x mul                       |
| vmulf1_az2b                          | 8x mul_az2b                             |
| vmulf1_bz2a                          | 8x mul_bz2a                             |
| vprodf1                              | 8x vprod                                |
| vprods8f1                            | 8x vprods8                              |
| vsubf1                               | 8x vsub                                 |
| vsubf1_az2b                          | 8x vsub_az2b                            |
| vsubf1_bz2a                          | 8x vsub_bz2a                            |
| vz2af1                               | 8x vz2a                                 |
| vz2bf1                               | 8x vz2b                                 |
| vmacf1                               | 8x vmac                                 |
| vmsubacf1                            | 8x vmsubac                              |

Comparison of matrix multiplication performance:

| System        | Function                                                                                                                                                  | MFLOPs              |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
| ZU03-CG-1E    | 8x mmult use B1B4, include parallel copy of B1B4<br>8x mmult use B1B8, include parallel copy of B1B8<br>SW mmult Scilab MEX style, 4 threads (on 2 cores) | 5039<br>4862<br>807 |
| I7 PC 3.0 GHz | SW Ubuntu, SciLab C MEX style, 1 thread:                                                                                                                  | 1933                |

Performance results are listed in Figure 8. It is terminal output from the sobel\_all.elf application running on Arm. The FPS is value of maximal achievable FPS (sobel filter is computed in parallel to call to the eight SIMD accelerators in each frame.) The video input FPS is 60.0 FPS (see Figure 8). This is defined by the information received from the video input device. The video output is also fixed, 60 FPS and this is defined by the video output HW.

# 5 Power consumption

Power consumption is measured on input power line 12V. All power supply is derived from this single power source.

| Power consumption                                                                  | Power [W] |
|------------------------------------------------------------------------------------|-----------|
| Linux system is running with all HW interfaced by the library                      | 7,08      |
| libsobel_dma_v26x2_hw.so is present in the device. No user app.                    |           |
| As above, with user interface in Full HD HDMI desktop. See Figure 10.              | 8.16      |
| Linux system is running with all HW interfaced by the library                      | 9,24      |
| libsobel_dma_v26x2_hw.so is present in the device. SW app. <b>sobel_all.elf</b> is |           |
| running. It performs HW accelerated edge detection and scrolls through all         |           |
| tests of two 8xSIMD hw accelerators. See Figure 9.                                 |           |
| As above ( <b>sobel_all.elf</b> is running.), with user interface in Full HD HDMI  | 10.44     |
| desktop. See Figure 10.                                                            |           |



11/34





Figure 9: System is running sobel\_all application.



Figure 10: System is running sobel\_all application, Full HD HDMI desktop, Scilab.



http://sp.utia.cz

# 6 ILA – In-circuit Logic Analyzer

System created by the Design Time Resource integration of Model Composer DTRiMC tool includes HW IP of the Vivado-Lab tool 2018.2 ILA – In-circuit Logic Analyzer.

It is connected to HW 8xSIMD accelerators 0 and 1. ILA can be triggered by specific instruction and displays addresses and we signals with 250 MHz clock. It is configured to sample 1024 data. See Figure 11.



Figure 11: Instruction vz2a.

Figure 11 presents complete vz2a operation test. It is SW project vz2af0. It performs copy of 512 FP data from all Z memories of accelerators to all A memories. Copy is executed as program sequence of two VLIW instructions, each performing copy of 256 FP data. This is visible in Figure 11.

In ILA, we can zoom to see the details. See Figure 12. The we\_op\_1 == 1 and op\_1 == 1 is the trigger condition for ILA set by user. The we\_op\_1 can be seen in the first line of ILA. The address bus related to Z z\_addr\_1 starts to increment, followed by address buss related to A a\_addr\_1. The signal z\_we\_1 is set to 1 to write the data from A to Z in all 8xSIMD memories in parallel.



|                      |                     | _            |           | _      | _        | _    | _         | _         | _       | _     | _         | _          | _   | _          | _     | _   |             | _    | _            | _     | _            | _   | _            | _   | _     |
|----------------------|---------------------|--------------|-----------|--------|----------|------|-----------|-----------|---------|-------|-----------|------------|-----|------------|-------|-----|-------------|------|--------------|-------|--------------|-----|--------------|-----|-------|
| Waveform - hw_ila_   | ,1                  |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              | 3   | ? _ 🗆 |
| Q   <b>+</b>   -   t | 🕑   🕨   🗶   🔳   🕻   | € <b>@</b> , | Θ, 🔅      | I.     |          | 12   | 20 4      | F   F∈    | +       | []    |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| ILA Status: Idle     |                     | 0            |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| Name                 | Value               |              |           | lz     |          | 4    |           | 6         |         | ls    |           |            |     |            |       | 14  | . 1         |      |              |       |              |     |              |     |       |
| ₩ we_op_1            | 1                   |              |           |        |          | -    |           |           |         |       |           |            |     |            |       |     |             |      |              |       | 1            |     | 1            |     |       |
| ₩ op_1               | 01                  |              |           |        |          |      |           |           |         |       |           |            |     | d1         |       |     |             |      |              |       |              |     |              |     |       |
| t_addr_1             | 000                 |              |           | 001    | 002      | 003  | 004       | 005       | 006     | 007   | 008       | 009        | 00a | 003        | 00e   | 004 | X 00e X     | 00f  | 010          | 011   | 012          | 013 | 014          | 015 | 016   |
| 📲 a_addr_1           | 000                 |              |           | 000    |          | 001  |           | 003       | X 004   | 005   | 006       | 007        | 008 | 009        | 00a   | 005 |             | 00d  | ( 00e        | 00f   | 010          | 011 | 012          |     | 014   |
| ₩a_we_1              | 0                   |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| ₩ b_we_1             | 0                   |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| \lag{a} z_we_1       | 0                   |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| 📲 b_addr_1           | 000                 |              | 000       | 001    | 002      | 003  | 004       | 005       | 006     | 007   | 008       | 009        | 00a | 00Ъ        | 00e   | 004 | ( 00e       | 00£  | 010          | 011   | 012          | 013 | 014          | 015 | 016   |
| ₩ we_op              | 0                   |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| 📲 op                 | 01                  |              |           |        |          |      |           |           |         |       |           |            |     | <b>d</b> 1 |       |     |             |      |              |       |              |     |              |     |       |
| 📲 z_addr             | 000                 |              |           |        |          | 000  |           |           |         |       | 001       | 002        | 003 | 004        | 005   | 006 | X 007 X     | 008  | 009          | 00a   | ( ооь        | 00c | 800 X        | 00e | 00f   |
| 📲 a_addr             | 000                 |              |           |        |          |      |           |           |         |       |           |            | 001 | 002        | X 003 | 004 | X 005 X     | 006  | 007          | 008   | 009          | 00a | Х ооь        | 00c | X 00d |
| ∿a_we                | 0                   |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| 16 b_we              | 0                   |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| ∿az_we               | 0                   | _            |           |        |          |      |           |           |         |       |           |            |     |            |       | ļ   |             |      |              |       |              |     |              |     |       |
| > 髦 b_addr           | 000                 | _            |           |        |          | 000  |           |           |         |       | 001       | 002        | 003 | 004        | X 005 | 006 | X 007 X     | 008  | <u>X 009</u> | 00a   | <u>х ооь</u> | 00c | <u>X 00d</u> | 00e | 100   |
|                      |                     | Tesda        | ted at: 3 | 021-7- | -02 00.  | 2.00 |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
|                      | <                   | > < ==       |           | 021-04 | 1-03 09. | 2.03 |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
|                      |                     |              |           |        |          |      |           |           |         |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| iettings - hw_ila_1  | Status - hw_ila_1 × |              |           |        | ? _      |      | Trigger S | etup - hw | /_ila_1 | × Cap | ture Setu | p - hw_ila | L1  |            |       |     |             |      |              |       |              |     |              |     | ? _   |
| ৬ 🕨 🚿                | 96                  |              |           |        |          |      | Q +       |           | Ð       |       |           |            |     |            |       |     |             |      |              |       |              |     |              |     |       |
| Core status 🔵        |                     |              |           |        |          | ^    | Name      | Opera     | ator    |       | Radix     |            |     | Value      |       | F   | Port        | Cor  | mparator     | Usage |              |     |              |     |       |
| core status 😈 🛛      |                     |              |           |        |          |      | op_1      |           |         | ~     | [U]       |            | ~   |            |       | 1.1 | probe14[7:0 | 1 of | f1           |       |              |     |              |     |       |
|                      | Window 1 of 1       |              |           |        |          |      | we_op_1   |           |         | ~     | (B)       |            | ~   | 1          |       | ~   | probe2[0]   | 1 of | f1           |       |              |     |              |     |       |

Figure 12: Instruction vz2a detail.

Figure 12 also demonstrates the relation of both observed accelerators. Both accelerators compute the vz2a instructions with time shift of 7 clock cycles. This time shift is given by the shifted start due to the sequential execution of ARM instructions activating the computation in 8xSIMD HW accelerators. We see that the accelerator with \*\_1 variables was started by ARM program first.



Figure 13: HW accelerated Matrix computation in ILA and chip temperature dashboard.

The instantiated ILA helps mainly in analysis and debug of more complex 8xSIMD HW accelerator program sequences like the matrix multiplication. See Figure 13.



# 7 License

This evaluation package of the Design Time Resource integration of Model Composer DTRiMC tool includes precompiled system with two **evaluation versions** of the accelerator:

• **FP03x8** with **capabilities = 40** described in Figure 5 and Figure 6.

The license for the evaluation versions of accelerators enables execution of certain large number of floating point operations before it expires. If this happens, the board has to be switched off and switched on again to restart the evaluation license again.

The commercial version of accelerators is available in UTIA. UTIA offers this license on commercial base. Contract with UTIA is required. For information about details of the commercial license write to Jiri Kadlec <u>kadlec@utia.cas.cz</u>.

# 8 Conclusion

This application note describes evaluation system precompiled by the Design Time Resource integration of Model Composer DTRiMC tool developed in the frame of FitOptiVis project. See Figure 1.

DTRiMC tool serves for integration of two 8xSIMD, FP03x8, floating-point, run-timereconfigurable accelerators for Zynq Ultrascale+ TE0820-02-03CG-1E module [1] on TEBF0808 carrier board [2].

DTRiMC tool requires the 8xSIMD, FP03x8 accelerator as input. The the 8xSIMD, FP03x8 accelerator is not included in the evaluation package. UTIA offers this license on commercial base.

To test the SW application and integration of the eight 8xSIMD, FP03x8 accelerators, UTIA provides for free download the evaluation package with precompiled evaluation version of 8xSIMD, FP03x8 accelerators in a bitstream. The evaluation package presents these system properties:

The run-time reconfigurable floating point accelerators for the ZU03-CG-1E device have been designed and realized with respect to the following considerations and requirements:

- Software utilizing the accelerator can be developed also directly on the embedded system, using the C compiler (gcc) or C++ compiler (g++) present in the Debian Stretch 9.8 operating system running on the Arm A53 device.
- The entire HW platform with 1x2 FP32x8 SIMD HW accelerators is provided in form of a shared library. The provided shared library API is compatible with the standard gcc and g++ based compilation flows. Scripts are auto generated for the standard Debian OS "make" of the embedded system.
- The 1x2 FP32x8 SIMD HW hardware of floating point accelerators is fixed. Reconfiguration is performed by reprogramming the firmware code. The firmware defines what sequences of operations will do the programmable finite state machine (FSM) inside the accelerator.
- 4. Data communication is implemented as an AXI-stream and supports accelerator chaining. The 1x2 FP32x8 SIMD HW hardware configurations are provided.



15/34

- 5. In DTRiMC tool, the data communication support HW data movers are defined in design time and cannot be changed during the run time. The following variants are prepared:
  - a. Zero copy (ZC) HW data movers with C interface, (minimal HW resources)
  - b. Zero copy (ZC) HW data movers with C++ interface (minimal HW resources)
  - c. DMA data HW data movers with C++ interface
  - d. Combination of ZC HW (DDR to Accelerator) and SG DMA HW (Accelerator to DDR) with interrupts and C++ interface.

All communication alternatives work with identical SW API. It means that the user host SW code for ARM A53 remains identical and does not need modifications for all four versions of HW data movers.

- 6. Software must be able to query and identify which SIMD FP operations are supported by each HW accelerator. Based on this information, the software can be reconfigured to take the advantage of supported operations.
- 7. The accelerator must be able to query and identify information about the actual status of the HW license defined in with each HW accelerator.
- 8. The HW accelerator scheduler executing the sequence of VLIW operation is very simple. It can execute only a linear sequence of VLIW vector instructions. It does not support *for-loops*, *if-else*, and similar constructs. There is also no support for checking for the overflow/underflow or NaN in performed floating point operations. All these program control constructs have to be implemented in the host code running on the ARM A53 processor.
- 9. Computations performed in HW accelerators can overlap with stream-based data communications. This is controlled by the user host software running on the ARM A53 processor, usually in several parallel executed threads.
- 10. Data are stored as 64 bit words. This arrangement enables potential use the Ultra RAM blocks (4096x64b) present in some larger Zynq UltraScale+ devices without affecting the accelerator library API or user code.

### Reconfiguration of accelerator by change of firmware

The FP32x8 HW accelerator executes sequences of VLIW vector instructions (firmware) stored in accelerator program memory. This firmware can be first defined in the Arm host software and then downloaded via the streaming interface to the accelerator. The program memory will usually contain multiple different sequences of VLIW instructions.

Computation performed in the accelerator can overlap with stream-based data communication. This is controlled by the Arm host software and it can be used for run-time reconfiguration by loading a new VLIW instruction sequence to the accelerator program memory while computation is in progress.

For example, consider an application which needs to perform accelerated multiplication of 64x64 matrices ( $Z[64,64] = A[64,64] \times B[64,64]$ ). The application running on the host will split the matrix operation into shorter sequences of VLIW instructions and loaded instruction sequences into the accelerator program memory schedule scheduled by the application software running on the ARM host by adjusting pointers to instruction sequences to be loaded into the accelerator program memory while streaming parts of matrix B[64,64] from host DDR memory to the accelerator. Rows of the matrix are propagated as identical to all 8xSIMD memories in 8 stages.



http://sp.utia.cz



## Reconfiguration of accelerator by temporary change of firmware

Application software can temporarily reconfigure the accelerator in the following steps:

- 1. Save pars of data and firmware from accelerator to DDR4,
- 2. Change firmware and upload it to the accelerator,
- 3. Execute the firmware (for example the **SupOp** instruction)
- 4. Read the results from accelerator data memory into ARM host memory,
- 5. Restore saved data and firmware back from DDR to accelerator.

After performing the above steps, the accelerator data and firmware are back in its original state and can continue. The application software running on the ARM host has information about the supported SIMD operations as well as about the status of the HW license.

This temporary replacement of firmware and data can be re-used and work independently on the actual content of the accelerator.

Consider a scenario in which the host application software needs to find out if needed VLIW instructions and the corresponding SIMD operation is actually supported by the HW accelerator.

This information is required by the host software to decide, which firmware version can be used for programming of the HW accelerator:

- If the DotProd instruction is supported by the HW accelerator, the computation of 64x64 matrix multiplication (Z[64,64] = A[64,64] × B[64,64]) will use the instruction to improve efficiency.
- If the **DotProd** instruction is unsupported by the HW accelerator, the host software running on the ARM processor can implement the accelerated matrix multiplication using different sequences of **Mac** (multiply and accumulate) VLIW instructions.
- If the **Mac** instruction is also unsupported, the matrix multiplication can be implemented by again different sequence of **Add** and **Mult** VLIW instructions.

The performance of the matrix multiplication might be reduced in the last case, but such HW accelerator requires less HW resources in the programmable logic of the device.

Use of such compact HW accelerators with reduced set of VLIW instructions might be necessary if the programmable logic a+rea is limited. See Figure 6 for available HW accelerator versions.

# 9 References

[1] MPSoC Module with Xilinx Zynq UltraScale+ ZU3CG-1E, 1 GByte DDR4 SDRAM, 4x5cm

https://shop.trenz-electronic.de/en/TE0820-03-4DE21FA-MPSoC-Module-with-Xilinx-Zynq-UltraScale-ZU3CG-1E-2-GByte-DDR4-SDRAM-4-x-5-cm

[2] Trenz Electronic, "Carrier Board for Trenz Electronic 7 Series" [Online].

https://shop.trenz-electronic.de/en/TE0701-06-Carrier-Board-for-Trenz-Electronic-7-Series?c=261



17/34



[3] Lukas Kohout, Jiri Kadlec, Zdenek Pohl: "Video Input/Output IP Cores for TE0820 SoM with TE0701 Carrier and Avnet HDMI Input/Output FMC Module", Application note and Evaluation package [Online].

http://sp.utia.cz/index.php?ids=results&id=te0820-hio-ho

[4] Jiři Kadlec, Zdeněk Pohl, Lukáš Kohout: "Design Time and Run Time Resources for Zynq Ultrascale+ TE0820-02-3CG-1E with SDSoC 2018.2 Support", Application note and Evaluation package [Online].

http://sp.utia.cz/index.php?ids=results&id=FitOptiVis-te0820-SDSoC-2018\_2



http://sp.utia.cz

Akademie věd České republiky Ústav teorie informace a automatizace AV ČR, v.v.i. 18/34

# **APPENDIX - Confidence test**

This is basic confidence test of the evaluation package.

Unzip evaluation package to Win 10 directory of your choice. c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\

Precompiled HW and SW projects are located in directory: c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\_relea se\fp03x8\_v26x2s\te02\_dma\_v26x2\_sw\

Sd\_card image is located in directory:

c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\_relea se\_sdcard\

#### INSTALLATION OF TOOLS

- Install Xilinx SDK 2018.2 on Win 10 PC 64 bit.
- Install Xilinx Lab Tools 2018.2 on Win 10 PC 64 bit.
- Install Win32DiskImager for writing of image to 16 GB SD card, Class (10).
- Install Putty (for USB based serial console and Ethernet based serial console).
- Unzip Arm Debian disk image on PC and use Win32DiskImager to write the disk image from the PC to the SD card.

#### HW SETUP

- Insert the SD with disk image card to the Zynq Ultrascale+ board.
- Connect PC and Zyng Ultrascale+ to Ethernet.
- Connect USB serial terminal cable to Zynq Ultrascale+ and to PC.
- Connect Full HD HDMI 60 FPS video source to Imageon board.
- Connect Full HD HDMI 60 FPS display to Imageon board.

#### TEST

- Zynq Ultrascale+ will start to boot OS. If you have Full HD HDMI monitor, kbd and mouse connected to Zynq, you will get access to the Zynq Ultrascale+ Debian desktop.
- Optional: Open Putty terminal. Set it to:
   (145200 brac 8 data bits, stan bits)
  - (115200 bps, 8 data bits, stop bit 1, parity none, flow control off)
- Optional: Use Putty terminal to login as user: root password: root
- Change directory to /boot
- Export path to the shared library. Type in Debian Linux terminal or desktop terminal: export LD\_LIBRARY\_PATH=/boot
- Start application code by typing:
- ./sobel\_all.elf

RESULT

- The application will synchronize with the video input and start to perform HW accelerated edge detection (sobel algorithm) with 60 FPS.
- In parallel to the video processing, the application will compute eight single precision floating point matrix multiplications
  - o In SW on ARM A53
  - HW accelerated by two 8xSIMD FP03x8 accelerators.
- The application will tests of all elementary operations of 8xSIMD HW accelerators
- Results of ARM and HW accelerated computations are compared to be identical and MFLOPs performance is displayed.



ŪTĬA



• The running **sobel\_all.elf** application can be stopped by Ctrl-C key on the Arm terminal keyboard.

### Compilation and debug of projects from source code

The evaluation package includes SW projects for Xilinx SDK 2018.2 tool running on Win10.

These projects can be modified and recompiled for ARM and executed on Zynq Ultrascale+ with or without debugging support. Open SDK 2018.2 tool, in working directory:

c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\_relea se\fp03x8\_v26x2s\te02\_dma\_v26x2\_sw\

Projects in this directory link to the same shared library:

c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\_relea
se\fp03x8\_v26x2s\te02\_dma\_v26x2\_sw\Release\sd\_card\libsobel\_dma\_v26x
2\_hw.so

Each project has two configurations:

- **Debug** for debugging with –O0 flag with debug information symbols included.
- **Release** for maximal performance with -O3 flag and without debug symbols.

You can modify and re-compile the SW code in the Xilinx SDK 2018.2 tool on Win 10 PC..

## DEBUG of SW application from Xilinx SDK 2018.2

The application can be executed or debugged from the SDK 2018.2 tool. For example:

| Eclipse Launcher                                                                            |        |  |  |  |  |
|---------------------------------------------------------------------------------------------|--------|--|--|--|--|
| Select a directory as workspace                                                             |        |  |  |  |  |
| Xilinx SDK uses the workspace directory to store its preferences and development artifacts. |        |  |  |  |  |
|                                                                                             |        |  |  |  |  |
| Workspace:         X:\te02_4x2_async_mulf64_sgdma_sw                                        | Browse |  |  |  |  |
|                                                                                             |        |  |  |  |  |
| Use this as the default and do not ask again                                                |        |  |  |  |  |
| Recent Workspaces                                                                           |        |  |  |  |  |
|                                                                                             |        |  |  |  |  |
| ОК                                                                                          | Cancel |  |  |  |  |
|                                                                                             |        |  |  |  |  |

Figure 14: Select debug workspace.

SDK debugger needs environment information about the location of the actual shared library on the board. For example:



© 2020 ÚTIA AV ČR, v.v.i. All disclosure and/or reproduction rights reserved

|                                                                                       |                             |                                         | ~                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|---------------------------------------------------------------------------------------|-----------------------------|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Debug Configurations                                                                  |                             |                                         | ×                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| Create, manage, and run configurations                                                |                             |                                         | The second secon |
| Run or Debug a program using System Debugger.                                         |                             |                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                                                       | Name: System Debugger u     | sing Debug_sobel_all.elf on Linux Agent |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| type filter text                                                                      | 🧿 Target Setup 📄 Applic     | ation 🕪 Arguments 📧 Environment 🛛 🚋 Syr | nbol Files 🧤 Source                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| Performance Analysis                                                                  |                             |                                         | » <sub>2</sub>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| Target Communication Framework                                                        | Environment variables to se | t                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| Xilinx C/C++ Application (GDB)     Kilinx C/C++ application (System Debugger on QEMU) | Variable                    | Value                                   | New                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| <ul> <li></li></ul>                                                                   | LD_LIBRARY_PATH             | /boot                                   | Select                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| System Debugger using Debug_sobel_all.elf on Linux Agent                              |                             |                                         | E 12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|                                                                                       |                             |                                         | Edit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|                                                                                       |                             |                                         | Remove                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|                                                                                       | Append environment to       | o native environment                    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                                                       |                             | nent with specified environment         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                                                       |                             |                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                                                       |                             |                                         | Revert Apply                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| Filter matched 6 of 11 items                                                          |                             |                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                                                                       |                             |                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| ?                                                                                     |                             |                                         | Debug Close                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |

Figure 15: Define the environment variable.

Before start of Debug, copy content to the sd\_card directory

#### c:\home\work\TS82fp03x8 TE0701 DTRiMC zu3cg\zu3cg deb eval ila relea se\fp03x8 v26x2s\te02 dma v26x2 sw\sobel all\Debug\sd card\\*.\*

to DS card directory visible in Win 10 file explorer.

\* \*

Enter the SD card to the board and power ON the board.

Alternatively, you can use Ethernet to perform binary copy to the SD card. If you use Ethernet, you have to type

#### reboot

to reboot the board with correct bitstream loaded to the programmable logic part of the Zynq Ultrascale+.

To debug from the PC in the Xilinx SDK debugger GUI, the Zyng Ultrascale+ TCF server has to be accessible from the PC via Ethernet. This can be tested. See Figure 16.

21/34





| SDK Targ    | Target Connection Details X                |                    |        |  |  |  |  |
|-------------|--------------------------------------------|--------------------|--------|--|--|--|--|
| Edit Tar    | rget Connection                            |                    |        |  |  |  |  |
| Edit Tar    | get Connection                             |                    |        |  |  |  |  |
| _           | Target Name Linux Agent                    |                    |        |  |  |  |  |
|             | s default target                           |                    |        |  |  |  |  |
| Specif      | Specify the connection type and properties |                    |        |  |  |  |  |
| Туре        | Linux TCF Agent                            |                    | $\sim$ |  |  |  |  |
| Host        | 10.0.032                                   |                    |        |  |  |  |  |
| Port        | 1534                                       |                    |        |  |  |  |  |
| Advanced >> |                                            |                    |        |  |  |  |  |
| ?           |                                            | Test Connection OK | Cancel |  |  |  |  |

Figure 16: Test connection to Linux TCF Agent.

#### Compile SW application directly on the Zynq Ultrascale+ board

Xilinx SDK 2018.2 tool creates files for the **make** utility, which can be used for compilation of SW application directly on the board with use of the gcc C compiler or the g++ C++ compiler of the Arm Debian OS.

You can copy complete SDK 2018.2 project to the Debian file system and compile on board by copy complete content of the C or C++ directory with SDK projects. Example of C++:

```
c:\home\work\TS82fp03x8_TE0701_DTRiMC_zu3cg\zu3cg_deb_eval_ila_relea
se\fp03x8_v26x2s\te02_dma_v26x2_sw\sobel_all\
```

to the Debian file system into directory:

```
/home/fp03x8_v26x2s/te02_dma_v26x2_sw/sobel_all/
```

To compile in Arm Debian the **sobel all** project, change the directory to:

```
cd /home/fp03x8_v26x2s/te02_dma_v26x2_sw/sobel_all/Debug
```

and export the relative path to the Debug version of the shared library by typing in Debian console:

22/34

```
export LD_DATA_PATH=../../Debug/sd_card
```

In Debian terminal, clean and then recompile the project by typing:

make clean make





Finally, execute the re-compiled C++ debug version of the application compiled by Arm Debian g++ compiler by typing in the Debian console:

#### ./sobel\_all.elf

You are done. See the application running on the board.

To close the Debian OS, type in the Debian terminal:

#### halt

This will close all open files on the SD file system an halt the ARM system.

Press the S1 button on the TEBF0808 carrier board to stop the power supply on the board.

The ventilator on the carrier board will stop.

Now you can safely remove the SD card. The PC terminal remains connected.

You can modify the SD card in PC and continue with other tests.

To close all work, you can close the PC terminal and then completely power down the TE0701 carrier board.

### Guide for compilation and use of C MEX functions in Scilab

In Debian terminal, change directory to

#### /home/zu3cg deb eval ila release scilab/cc/mmultf

Start Scilab by typing

scilab

In Scilab, execute script

#### mmultf\_cc.sce

This script will compile C MEX function mmultf.c to shared library libmex\_mmultf.so in the same directory

Quit Scilab by typing

quit

Copy created shared library libmex\_mmultf.so

/home/zu3cg\_deb\_eval\_ila\_release\_scilab/cc/mmultf/libmex\_mmultf.so

to

signal processing

23/34





/home/zu3cg\_deb\_eval\_ila\_release\_scilab/test/test\_mmultf\_4xB/
libmex\_mmultf.so

In Debian terminal, change directory to

/home/zu3cg\_deb\_eval\_ila\_release\_scilab/test/test\_mmultf\_4xB

Start Scilab by typing

#### scilab

In Scilab execute script

mmultf\_4xB\_test.sce

This script will execute mmultf() C MEX function present in shared library libmex\_mmultf.so and generate reference header files in the current directory. Files contain single precision floating point reference data used for testing of 8xSIMD HW accelerators. Quit Scilab by typing.

quit

Use same process to compile and use all other reference MEX C functions.



Figure 17: C MEX compilation in Arm Scilab. Remote X11 Desktop.

See execution of the sobel\_all.elf application with Scilab in the Full HD HDMI desk top in Figure 10.



# **10 APPENDIX – DTRiMC tool guidelines**

DTRIMC tool requires the 8xSIMD HW IP core as input. Contact UTIA to get license for use of this IP.

| Contact UTIA to buy | the required 8xSIMD HW accelerator IP:                       |
|---------------------|--------------------------------------------------------------|
| Name of the IP:     | fp03x8_v26_v40                                               |
| ID:                 | 7                                                            |
| Device:             | xczu3cg-sfvc784-1-e                                          |
| Tool chain:         | Vivado/SDSoC 2018.2                                          |
| Contact:            | UTIA AV CR v.v.i.; Pod Vodarensnou vezi 4, 18208 Prague 8,   |
|                     | Czech Republic;                                              |
|                     | Jiri Kadlec; email: kadlec@utia.cas.cz tel: +420 2 6605 2216 |

The fp03x8\_v26\_v40 IP is not included in the evaluation package in required HDL source code. The compiled evaluation version of the fp03x8\_v26\_v40 IP is present in the BOOT.BIN files in sd card directories of the evaluation package.

The evaluation package can be downloaded from UTIA for free from www server <a href="http://sp.utia.cz/index.php?ids=projects/fitoptivis">http://sp.utia.cz/index.php?ids=projects/fitoptivis</a>

### Guide for compilation of HW in the DTRiMC tool

 Unpack the DTRiMC evaluation package to Win10 directory. DTRiMC tool is in: c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\ Change directory to:
 c:\home\work\TS82fp02w8\_TE0701\_DTRiMC\_zu3cg\deb\_cu2l\_ila\fp02w8\_w26

c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\fp03x8\_v26 x2s\zusys\

- 2. Add the UTIA 8xSIMD HW IP to the package as the directory \ip c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\fp03x8\_v26 x2s\zusys\ip lib\ip
- 3. On Win10, open dos terminal window, change directory to the folder c:\home\work\TS82fp03x8 TE0701 DTRiMC zu3cg\zu3cg deb eval ila\fp03x8 v26 x2s\zusys\
- 4. To overcome limitations of Win10 related to the need of short directory paths, use the script \_use\_virtual\_drive.cmd to create a virtual short path to your directory drive X:\susys Type command:

use virtual drive.cmd

Select x as name of the virtual drive and select 0 to create the virtual drive.

Go to the created virtual short-path directory by typing in the win 10 terminal: x:

cd zusys

5. Use text editor of your choice and open and modify script <code>design\_basic\_settings.sh</code> Select correct path to SDSoC 2018.2 tool installed on your Win7 or Win10. Line 38:

@set XILDIR=C:/Xilinx
Select proper Xilinx device:

Akademie věd České republiky Ústav teorie informace a automatizace AV ČR, v.v.i.

@set PARTNUMBER=24

The selected number corresponds to the number defined in file

X:\zusys\board\_files/TE0808\_board\_files.csv

Verify, if line 78 of script *design\_basic\_settings.sh* sets the SDSoC flow support by: *ENABLE\_SDSOC=1* 





@set ENABLE SDSOC=1

folder:

- 6. Start the Xilinx Vivado 2018.2 and create the design by executing of script: X:\zusys\vivado create project guimode.cmd
- Optional: You can use Vivado automation and to the created HW design the In circuit Logic Analysator (ILA) monitor to enable capturing of selected accelerator outputs of your choice.
- 8. In Vivado console, execute command: TE::hw\_build\_design -export\_prebuilt After the Vivado compilation, new hardware description file zusys.hdf is generated in

X:\zusys\prebuilt\hardware\3cg 1e 1gb\

## Guide for configuration and compilation of PetaLinux in the DTRiMC tool

The configuration and compilation of the *Petalinux 2018.2* kernel and *Debian 9.8 Stretch* image as the FitOptiVis run time resource for the Zynq Ultrascale+ module TE0820-03-03CG-1E with Zynq Ultrascale+ device xczu3cg-sfvc784-1-e device is described now. The configuration has to be performed in the Ubuntu 16.04 LTS OS.

The DTRiMC tool is configured for use of Ubuntu 16.04 LTS in the *VMware Workstation Player* in Win10. The Petalinux 2018.2 distribution can be downloaded to the Ubuntu 16.04 LTS from

https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/embedde d-design-tools/2018-2.html

#### and installed to the default Ubuntu directory:

/opt/petalinux/petalinux-v2018.2-final

The standard PetaLinux 2018.2 distribution requires few modifications.

- 1. Copy content of these Win 10 directories:
  - X:\zusys\prebuilt X:\zusys\os

to Ubuntu directories:

```
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
_v26x2s/zusys/os/
```

```
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
_v26x2s/zusys/prebuilt/
```

- 2. In Ubuntu, open terminal window and set path to the PetaLinux 2018.2: source /opt/petalinux/petalinux-v2018.2-final/settings.sh
- 3. Go to the directory copied from the evaluation package with pre-defined configuration for the Zynq Ultrascale+ module TE0820-03-03CG-1E:
  - cd

/home/devel/work/TS82fp03x8\_TE0701\_DTRiMC\_zu3cg/zu3cg\_deb\_eval\_ila/fp03x8
\_v26x2s/zusys/os/petalinux/

It contains a predefined configuration according to Zynq Ultrascale+ board requirements.

4. The zusys.hdf file created in Win 10 in Vivado 2018.2 tool is present in the Ubuntu folder:



ŪTĬA

26/34

/home/devel/work/TS82fp03x8\_TE0701\_DTRiMC\_zu3cg/zu3cg\_deb\_eval\_ila/fp03x8
\_v26x2s/zusys/prebuilt/hardware/3cg\_1e\_1gb/

5. Use the xusys.hdf file as input for the PetaLinux configuration by (on single line)

petalinux-config --get-hw-description= /home/devel/work/TS82fp03x8\_TE0701\_DTRiMC\_zu3cg/zu3cg\_deb\_eval\_ila/fp03x8 v26x2s/zusys/os/prebuilt/hardware/3cg le lgb/



6. Verify if the PetaLinux filesystem location is changed from the ramdisk to the extra partition on the SD card, select:

```
Image Packaging Configuration --->
Root filesystem type (SD card) --->
```

7. Verrify if option to generate boot args automatically is disabled and if user defined arguments are set to:

```
earlycon clk_ignore_unused root=/dev/mmcblk0p2 rootfstype=ext4 rw
rootwait quiet
Loove the configuration 2x Evitand Yes
```

Leave the configuration, 3x Exit and Yes.

- 8. Build PetaLinux, from the bash terminal execute petalinux-build
- 9. Files image.ub, u-boot.elf and bl31.elf are created in:

```
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
_v26x2s/zusys/os/petalinux/images/linux/image.ub
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
_v26x2s/zusys/os/petalinux/images/linux/u-boot.elf
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
v26x2s/zusys/os/petalinux/images/linux/bl31.elf
```



27/34

## Guide for configuration and compilation of Debian OS in the DTRiMC tool

The file system is based on the latest stable version of Debian 9.8 Stretch distribution (03. 25. 2019). Follow the steps below.

10. Go to the folder with PetaLinux:

```
cd
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
v26x2s/zusys/os/petalinux/
```

11. The 64bit Debian image will be created by execution of the *mkdebian.sh* script. The script checks all the tools that are needed to create the image, most of them are a standard part of the Ubuntu 16.04 LTS distribution.

When some of them are missing, install them by: sudo apt install *Package* 

Table 1: tools with a corresponding package name.

| Package          |
|------------------|
| coreutils        |
| mount            |
| parted           |
| util-linux       |
| dosfstools       |
| e2fsprogs        |
| debootstrap      |
| gzip             |
| сріо             |
| coreutils        |
| apt              |
| debconf          |
| sed              |
| locales          |
| locales          |
| qemu-user-static |
|                  |

12. Create the Debian image. It will consist of two partitions.

The file system of the first one will be FAT32. This partition is dedicated for image of the PetaLinux kernel. The second partition will contain the Debian using EXT4 file system. Create the Debian image from the external Ethernet repositories by this command:

chmod ugo+x mkdebian.sh

sudo ./mkdebian.sh

During the creation procedure, you will be asked to set language. Choose *English* (US). The resultant image file will be called *TE0808-debian.img*, its size will be 7 GB.



28/34

| Keyboard layout:<br>English (US)<br>English (US) - C<br>English (US) - E<br>English (US) - F<br>English (US) - F | English (Colemak)<br>English (Dvorak alternative international no dead keys)<br>English (Dvorak)<br>English (Dvorak)<br>English (Macintosh)<br>English (Macintosh)<br>English (Programmer Dvorak)<br>English (VS, alternative international)<br>English (US, international with dead keys)<br>English (US, with euro on 5)<br>English (Workman)<br>English (Workman, international with dead keys)<br>English (Workman, international with dead keys)<br>English (US, with euro on 5)<br>English (Workman, international with dead keys)<br>English (US, with euro as 5)<br>English (US, with euro as 5)<br>English (Vorkman, international with dead keys)<br>English (Uorkman, international with dead keys)<br>English (classic Dvorak)<br>English (international AltGr dead keys)<br>English (the divide/multiply keys toggle the layout)<br>Russian (US, phonetic) |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| English (US) - S                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | Serbo-Cròatian (US) ´                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| Other                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | <ok> <cancel></cancel></ok>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |

13. Compress the created image to file TE0808-debian.zip:

zip TE0808-debian TE0808-debian.img

#### 14. Copy compressed image file from Ubuntu

```
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
v26x2s/zusys/os/petalinux/te0820-debian.zip
```

#### to Win 10 file:

X:\zusys\prebuilt\os\petalinux\default\te0820-debian.zip

#### 15. Copy these files from Ubuntu

```
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
_v26x2s/zusys/os/petalinux/images/linux/image.ub
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
_v26x2s/zusys/os/petalinux/images/linux/u-boot.elf
/home/devel/work/TS02fp03u9_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03u9
```

/home/devel/work/TS82fp03x8\_TE0701\_DTRiMC\_zu3cg/zu3cg\_deb\_eval\_ila/fp03x8
\_v26x2s/zusys/os/petalinux/images/linux/bl31.elf

#### to Win 10 files:

```
X:\zusys\prebuilt\os\petalinux\default\image.ub
```

```
X:\zusys\prebuilt\os\petalinux\default\u-boot.elf
```

```
X:\zusys\prebuilt\os\petalinux\default\bl31.elf
```

#### 16. In Ubuntu, clean Petalinux project files

```
petalinux-build -x mrproper
```

#### 17. In Ubuntu, delete files

```
/home/devel/work/TS82fp03x8_TE0701_DTRiMC_zu3cg/zu3cg_deb_eval_ila/fp03x8
_v26x2s/zusys/os/petalinux/TE0808-debian.zip
```

/home/devel/work/TS82fp03x8\_TE0701\_DTRiMC\_zu3cg/zu3cg\_deb\_eval\_ila/fp03x8
\_v26x2s/zusys/os/petalinux/TE0808-debian.img

#### 18. In Ubuntu, close all applications and shut down Linux.



http://sp.utia.cz



Akademie věd České republiky Ústav teorie informace a automatizace AV ČR, v.v.i. 19. In Win 10, close the VMware Workstation Player.

You can continue with preparation of the Zynq Ultrascale+ board with created files:

- Petalinux kernel image image.ub
- Compressed Debian image TE0808-debian.zip
- U-boot program *u-boot.elf*
- Support firmware *bl31.elf*

This ends the DTRiMC tool configuration and compilation steps for the Petalinux and Debian.

## Guide for creation of SDSoC platform OS in the DTRiMC tool

 In the open Vivado 2018.2 console, create and compile the initial BOOT.bin file and the initial SW modules by execution of the command: TE::sw run hsi

The resulting BOOT.bin file will be located in the folder

X:\zusys\prebuilt\boot\_images\3cg\_1e\_1gb\u-boot\BOOT.bin

2. In Vivado 2018.2 console, create the SDSoC platform by execution of the command: TE::ADV::beta\_util\_sdsoc\_project The SDSoC 2018.2 platform is generated in to the directory X:\SDSoC PFM\TE0820-03\03CG-1E\

and it is also packed into the ZIP file.

### Guide for creation of shared library and HW kernel in the DTRiMC tool

1. On Win10, in the open dos terminal window, cancel the current virtual drive X: by executing from the command line

\_use\_virtual\_drive.cmd

and response (1)

- Change directory to c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\fp03x8\_v26 x2s\SDSoC\_PFM\TE0820-03\03CG-1E\
- 3. In Win10, open dos terminal window and use the copy of the script \_use\_virtual\_drive.cmd to create a new virtual short path to get short SDSoC directory X:\03CG-1E

```
use virtual drive.cmd
```

Select X as name of the virtual drive and select (0) to create the virtual drive. Go to the created virtual short-path directory by:

```
Χ:
```

cd 03CG-1E

- 4. Open SDSoC project in directory X:\03CG-1E
- 5. In SDSoC import HW kernel design project

te02\_4x2\_async\_mulf64\_sgdma\_hw

```
from the directory
```

```
c:\home\work\TS82fp03x8_TE0701_DTRiMC_zu3cg\zu3cg_deb_eval_ila\fp03x8_v26
x2s\SDSoC_PFM_src\TE0820-03\03CG-1E\
```

6. Define the custom SDSoC platform



X:\03CG-1E\zusys

- 7. Change both imported projects from Debug to the Release compilation target
- 8. Compile both projects by the SDSoC compiler
- 9. Result of compilation are the SD cards with BOOT.bin file shared object library definition files in directories:

X:\03CG-1E\sobel dma v26x2 hw\Release\sd card\

#### 10. Copy content of the directory

X:\03CG-1E\sobel dma v26x2 hw\Release\sd card\

to

```
c:\home\work\TS82fp03x8_TE0701_DTRiMC_zu3cg\zu3cg_deb_eval_ila_release\fp
03x8 v26x2s\te02 dma v26x2 sw\Release\sd card\
```

```
and also to
```

```
c:\home\work\TS82fp03x8 TE0701 DTRiMC zu3cg\zu3cg deb eval ila release\fp
03x8_v26x2s\te02_dma_v26x2_sw\Debug\sd_card\
```

#### 11. Optional:

Copy ILA nets definition files debug nets.ltx and zsys wrapper.ltx from the directory

X:\03CG-1E\sobel dma v26x2 hw\Release\ sds\p0\vivado\prj\prj.runs\impl 1\

to

c:\home\work\TS82fp03x8\_TE0701\_DTRiMC\_zu3cg\zu3cg\_deb\_eval\_ila\_release\fp 03x8 v26x2s\te02 dma v26x2 sw\Release\sd card\

#### and also to

```
c:\home\work\TS82fp03x8 TE0701 DTRiMC zu3cg\zu3cg deb eval ila release\fp
03x8 v26x2s\te02 dma v26x2 sw\Debug\sd card
```

31/34

- 12. Clean both projects
- 13. Close SDSoC tool



### Guide for retargeting of the DTRiMC tool for another device/module

The DTRiMC tool is configured for Trenz Electronic module with ID=7, module TE0820-03-03CG-1E, part xczu3cg-sfvc784-1-e, memory 1GB with short module name 3cg\_1e\_1gb.

Hovewer, the DTRiMC tool scripts can be modified to target different Trenz Electronic module and device. See the actual list of Trenz Electronic modules in:

https://shop.trenz-electronic.de/en/Products/Trenz-Electronic/TE08XX-Zynq-UltraScale/TE0820-Zynq-UltraScale/

Use text editor of your choice and open and modify script *design\_basic\_settings.sh* Modify ID of the module/device: @set PARTNUMBER=7

Trenz Electronic modules supported by this release of the DTRiMC tool.

| ID | Module             | Partname            | Memory | ShortName  |
|----|--------------------|---------------------|--------|------------|
| 2  | TE0820-02-02EG-1E  | xczu2eg-sfvc784-1-e | 1GB    | 2eg_1e_1gb |
| 3  | TE0820-02-02EG-1E3 | xczu2eg-sfvc784-1-e | 1GB    | 2eg_1e_1gb |
| 4  | TE0820-02-02CG-1E  | xczu2cg-sfvc784-1-e | 1GB    | 2cg_1e_1gb |
| 5  | TE0820-02-03EG-1E  | xczu3eg-sfvc784-1-e | 1GB    | 3eg_1e_1gb |
| 6  | TE0820-02-03EG-1E3 | xczu3eg-sfvc784-1-e | 1GB    | 3eg_1e_1gb |
| 7  | TE0820-02-03CG-1E  | xczu3cg-sfvc784-1-e | 1GB    | 3cg_1e_1gb |
| 8  | TE0820-02-02EG-1EA | xczu2eg-sfvc784-1-e | 1GB    | 2eg_1e_1gb |
| 9  | TE0820-02-02EG-1EL | xczu2eg-sfvc784-1-e | 1GB    | 2eg_1e_1gb |
| 10 | TE0820-02-02CG-1EA | xczu2cg-sfvc784-1-e | 1GB    | 2cg_1e_1gb |
| 11 | TE0820-02-03EG-1EA | xczu3eg-sfvc784-1-e | 1GB    | 3eg_1e_1gb |
| 12 | TE0820-02-03EG-1EL | xczu3eg-sfvc784-1-e | 1GB    | 3eg_1e_1gb |
| 13 | TE0820-02-03CG-1EA | xczu3cg-sfvc784-1-e | 1GB    | 3cg_1e_1gb |
| 14 | TE0820-02-04CG-1EA | xczu4cg-sfvc784-1-e | 1GB    | 4cg_1e_1gb |
| 15 | TE0820-03-04EV-1EA | xczu4ev-sfvc784-1-e | 2GB    | 4ev_1e_2gb |
| 16 | TE0820-03-02CG-1EA | xczu2cg-sfvc784-1-e | 2GB    | 2cg_1e_2gb |
| 17 | TE0820-03-02EG-1EA | xczu2eg-sfvc784-1-e | 2GB    | 2eg_1e_2gb |
| 18 | TE0820-03-02EG-1EL | xczu2eg-sfvc784-1-e | 2GB    | 2eg_1e_2gb |
| 19 | TE0820-03-03CG-1EA | xczu3cg-sfvc784-1-e | 2GB    | 3cg_1e_2gb |
| 20 | TE0820-03-04CG-1EA | xczu4cg-sfvc784-1-e | 2GB    | 4cg_1e_2gb |
| 21 | TE0820-03-03EG-1EA | xczu3eg-sfvc784-1-e | 2GB    | 3eg_1e_2gb |
| 22 | TE0820-03-03EG-1EL | xczu3eg-sfvc784-1-e | 2GB    | 3eg_1e_2gb |
| 23 | TE0820-03-2AI21FA  | xczu2cg-sfvc784-1-i | 2GB    | 2cg_1i_2gb |
| 24 | TE0820-03-2BE21FL  | xczu2eg-sfvc784-1-e | 2GB    | 2eg_1e_2gb |
| 25 | TE0820-03-3AI210A  | xczu3cg-sfvc784-1-i | 2GB    | 3cg_1i_2gb |
| 26 | TE0820-03-3BE21FA  | xczu3eg-sfvc784-1-e | 2GB    | 3eg_1e_2gb |
| 27 | TE0820-03-3BE21FL  | xczu3eg-sfvc784-1-e | 2GB    | 3eg_1e_2gb |
| 28 | TE0820-03-02CG-1ED | xczu2cg-sfvc784-1-e | 2GB    | 2cg_1e_2gb |
| 29 | TE0820-03-2AE21FA  | xczu2cg-sfvc784-1-e | 2GB    | 2cg_1e_2gb |
| 30 | TE0820-03-2BE21FA  | xczu2eg-sfvc784-1-e | 2GB    | 2eg_1e_2gb |
| 31 | TE0820-03-3AE21FA  | xczu3cg-sfvc784-1-e | 2GB    | 3cg_1e_2gb |
| 32 | TE0820-03-3AI21FA  | xczu3cg-sfvc784-1-i | 2GB    | 3cg_1e_2gb |
| 33 | TE0820-03-4AE21FA  | xczu4cg-sfvc784-1-e | 2GB    | 4cg_1e_2gb |
| 34 | TE0820-03-4DE21FA  | xczu3cg-sfvc784-1-e | 2GB    | 4ev_1e_2gb |
| 35 | TE0820-03-4DI21FA  | xczu3cg-sfvc784-1-i | 2GB    | 4ev_1i_2gb |
|    |                    |                     |        |            |

After the change of the target, the DTRiMC tool requires also change of the input 8xSIMD HW IP core. Contact UTIA to get license for the required HW IP version.



32/34

http://sp.utia.cz

| Contact UTIA to buy | the required 8xSIMD HW accelerator IP:                              |
|---------------------|---------------------------------------------------------------------|
| Name of the IP:     | fp03x8_v26_v40                                                      |
| ID:                 | Select ID                                                           |
| Device:             | Select partname                                                     |
| Tool chain:         | Vivado/SDSoC 2018.2                                                 |
| Contact:            | UTIA AV CR v.v.i.; Pod Vodarensnou vezi 4, 18208 Prague 8,          |
|                     | Czech Republic;                                                     |
|                     | Jiri Kadlec; email: <u>kadlec@utia.cas.cz</u> tel: +420 2 6605 2216 |

Implement all design steps with the DTRiMC tool for the retargeted module/device.



http://sp.utia.cz

Akademie věd České republiky Ústav teorie informace a automatizace AV ČR, v.v.i.

# Disclaimer

This disclaimer is not a license and does not grant any rights to the materials distributed herewith. Except as otherwise provided in a valid license issued to you by UTIA AV CR v.v.i., and to the maximum extent permitted by applicable law:

(1) THIS APPLICATION NOTE AND RELATED MATERIALS LISTED IN THIS PACKAGE CONTENT ARE MADE AVAILABLE "AS IS" AND WITH ALL FAULTS, AND UTIA AV CR V.V.I. HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and

(2) UTIA AV CR v.v.i. shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related to, arising under or in connection with these materials, including for any direct, or any indirect, special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or UTIA AV CR v.v.i. had been advised of the possibility of the same.

#### **Critical Applications:**

UTIA AV CR v.v.i. products are not designed or intended to be fail-safe, or for use in any application requiring fail-safe performance, such as life-support or safety devices or systems, Class III medical devices, nuclear facilities, applications related to the deployment of airbags, or any other applications that could lead to death, personal injury, or severe property or environmental damage (individually and collectively, "Critical Applications"). Customer assumes the sole risk and liability of any use of UTIA AV CR v.v.i. products in Critical Applications, subject only to applicable laws and regulations governing limitations on product liability.

34/34

