How to enhance DSP co-processing capabilities and what are the applications


Author: Navneet Rao

Currently, the demand for high-speed communication and ultra-fast computing is increasing day by day. Wired and wireless communication standards are in use everywhere, and data-processing architectures are expanding constantly. The most common method of wired communication is Ethernet (LAN, WAN, and MAN networks), while mobile communication, realized by DSP-based architectures, is the most common wireless method. As the primary vehicle for voice connectivity, telephony must now meet the ever-increasing demands of voice, video, and data. When creating an architecture, system designers must not only consider the high-end requirements of the triple-play model but also meet the following requirements:

• High performance

• Low latency

• Low system cost (including NRE)

• A scalable, extensible architecture

• Integration of off-the-shelf (OTS) components

• Distributed processing

• Support for multiple standards and protocols

These challenges involve two main aspects: the connectivity between computing platforms/boxes in wired or wireless architectures and the specific computing resources within those platforms/boxes.

Connectivity between computing platforms

Standards-based connections are now the norm. Parallel connection standards (PCI, PCI-X, EMIF) meet today's needs but fall slightly short in scalability and extensibility. With the advent of packet-based processing, there is a clear trend toward high-speed serial connections (see Figure 1).

Figure 1 Serial connection trend

The desktop computer and networking industries have adopted standards such as PCI Express (PCIe) and Gigabit Ethernet/XAUI. However, the interconnect requirements for data-processing systems in wireless architectures are slightly different, and are characterized by:

• Low pin count

• Backplane and chip-to-chip connectivity

• Scalable bandwidth and speed

• DMA and message transfers

• Support for complex, scalable topologies

• Multipoint transmission

• High reliability

• Absolute time synchronization

• Quality of service (QoS)

The Serial RapidIO (SRIO) protocol standard easily meets and exceeds most of the above requirements. As a result, SRIO has become the primary interconnect for data-plane connections in wireless infrastructure equipment.


Figure 2 SRIO network building blocks

SRIO networks are built from two basic building blocks: endpoints and switches (see Figure 2). Endpoints source and sink packets, while switches pass packets between ports without parsing them. SRIO is specified as a three-layer architectural hierarchy (see Figure 3):

• The physical layer specification describes the details of the device-level interface, such as packet transfer mechanisms, flow control, electrical parameters, and low-level error management.

• The transport layer specification provides the routing information needed to move packets between endpoints. Switches operate at the transport layer, routing packets by device ID.

• The logical layer specification defines the overall protocol and packet formats. All packets carry a payload of 256 bytes or less. Transactions use load/store/DMA operations into 34-, 50-, or 66-bit address spaces.

Transactions include:

• NREAD – read operation (data is returned in the response)

• NWRITE – write operation, no response

• NWRITE_R – robust write; a response comes from the target endpoint

• SWRITE – streaming write

• ATOMIC – atomic read-modify-write

• MAINTENANCE – system discovery, detection, initialization, configuration, and maintenance operations

Figure 3 Layered SRIO Architecture

SRIO – Advantage Outlook

Computing resources in the platform

Today’s applications place heavy demands on processing resources, and hardware-based implementations are developing rapidly. Applications such as firewalls, compression/decompression algorithms, antivirus and intrusion detection, and security functions requiring encryption engines such as AES, triple DES, and Skipjack were originally implemented in software but have now moved to hardware. This requires large parallel ecosystems in which bandwidth and processing power can be shared, using CPUs, NPUs, FPGAs, or ASICs for shared or distributed processing.

All of these application-specific requirements need to be considered when building a system that can adapt to future changes. The requirements for computing resources include:

• Multiple hosts – distributed processing

• Direct peer-to-peer communication

• Multiple heterogeneous operating systems

• Complex topologies: discovery mechanisms; redundant paths (failure recovery)

• High reliability: lossless protocol; automatic retraining and device synchronization; system-level error management

• Support for the communication data plane: multicast; (lossy) traffic-management operations; link-, level-, and flow-based flow control; protocol interworking; high transaction concurrency

• Modularity and expandability

• A broad ecosystem

A wide variety of requirements derived from computing devices in wireless architectures can be supported by the SRIO protocol.

The SRIO specification (see Figure 4) defines a layered, packet-based architecture that supports multiple domains and market segments, enabling system architects to design next-generation computing platforms. Using SRIO as a compute interconnect makes it easy to keep the architecture processor-independent, deploy scalable systems with carrier-grade reliability, enable advanced traffic management, and deliver high performance at high traffic volumes. In addition, a large supplier ecosystem makes the selection of OTS components and assemblies easy.

Figure 4 SRIO specification

SRIO is a packet-based protocol that supports:

• Moving data through packet-based operations (read, write, message)

• Non-coherent I/O as well as cache-coherence functions

• Efficient interworking and protocol encapsulation, via data streaming, data partitioning, and reassembly functions

• A traffic-management framework, enabling millions of flows, 256 traffic classes, and lossy operation

• Flow control, supporting multiple transaction request flows and providing QoS

• Prioritization, which eases bandwidth allocation and transaction ordering and avoids deadlocks

• Standard (tree and mesh) and arbitrary (daisy-chain) hardware topologies, including multiple hosts, through system discovery, configuration, and maintenance

• Error management and classification (recoverable, notification, and fatal)

Xilinx IP Solutions for SRIO

The Xilinx Endpoint IP solution for SRIO is designed for the RapidIO specification (v1.3). The complete Xilinx Endpoint IP solution for SRIO includes the following parts (see Figure 5):

• Xilinx Endpoint IP for SRIO is a soft LogiCORE solution. It supports fully compliant maximum-payload operations, sourcing and receiving user data through target and initiator interfaces at the logical (I/O) and transport layers.

• The buffer layer reference design, available as source code, automatically re-prioritizes packets and manages queues.

• SRIO physical layer IP enables link training and initialization, discovery and management, and error and retry recovery mechanisms. Additionally, high-speed transceivers are instantiated in the physical layer IP to support 1-lane and 4-lane SRIO bus links at line rates of 1.25Gbps, 2.5Gbps, and 3.125Gbps.

• The Register Manager reference design allows an SRIO host device to set and maintain endpoint device configuration, link status, control, and timeout mechanisms. In addition, ports are provided on the Register Manager for user designs to probe the state of the endpoint device.

Figure 5 Xilinx Endpoint IP Architecture for SRIO

The entire Xilinx Endpoint IP LogiCORE solution for SRIO has been fully tested and hardware validated, and is currently undergoing interoperability testing with major SRIO device vendors. The LogiCORE IP is delivered through the Xilinx CORE Generator software GUI, which allows users to customize the link rate and endpoint configuration, and supports extended features such as flow control, resend compression, doorbells, and messaging. This allows users to create flexible, extensible custom SRIO endpoint IP optimized for their own applications.

Virtex-5 FPGA computing resources

Xilinx Endpoint IP for SRIO provides high-speed connectivity between both ends of the link using the SRIO protocol. Even in the smallest Virtex-5 device, the IP occupies less than 20% of the available logic resources, leaving the majority of the logic, memory, and I/O for the user design and the system application.

Logic module

The Virtex-5 logic fabric, built on a 65nm process, features six-input look-up tables (LUTs) that provide the highest FPGA capacity. With improved carry logic, the device offers 30 percent better performance than its predecessor. Because fewer LUTs are required and the routing architecture is highly optimized and symmetrical, the device also consumes significantly less power.


Virtex-5 memory solutions include LUT RAM, block RAM, and memory controllers for interfacing with large external memories. The block RAM architecture includes built-in FIFO logic as well as embedded error detection and correction (ECC) logic. Additionally, Xilinx provides comprehensive design resources to instantiate memory controller blocks in system designs through the Memory Interface Generator (MIG) tool. This allows users to take advantage of hardware-proven solutions and focus on other critical areas of the design.

Parallel and Serial I/O

SelectIO technology can implement almost any parallel source-synchronous interface required by the customer in the design. Using the SelectIO interface, it is easy to create a variety of industry-standard interfaces for more than 40 different electrical standards, as well as special-purpose interfaces. The SelectIO interface provides a maximum rate of 700Mbps (single-ended) and 1.25Gbps (differential).

All Virtex-5 LXT FPGAs integrate GTP transceivers operating at line rates from 100 Mbps to 3.2 Gbps. In addition, GTP transceivers are among the lowest-power MGTs in the industry, consuming less than 100 mW per transceiver. With proven design techniques and methodologies that simplify design, the high-speed serial design flow becomes simple and fast.

In addition, new design tools (the RocketIO Transceiver Wizard and IBERT) and new silicon capabilities (TX and RX equalization with a built-in pseudo-random bit sequence (PRBS) generator and checker) make it easy to migrate designs from parallel I/O standards to more than 30 serial standards and emerging serial technologies.

DSP module

Each DSP48E slice offers 550 MHz performance, allowing users to build a wide variety of applications requiring single-precision floating-point performance, such as multimedia, video and imaging, and digital communications. This expands the device’s capabilities over previous generations while also reducing dynamic power consumption by more than 40%. Virtex-5 FPGAs also increase the number of DSP48E slices, which is optimized relative to the available logic and memory resources.

Integrated I/O modules

All Virtex-5 LXT FPGA devices include a hard endpoint block that implements PCIe functionality. This hard-IP endpoint block scales easily from x1 to x2, x4, or x8 with simple reconfiguration. The block (in its x1, x4, and x8 link configurations) has passed rigorous PCI-SIG compliance and interoperability testing, allowing users to deploy PCIe with confidence.

Additionally, all Virtex-5 LXT FPGA devices feature a Tri-mode Ethernet Media Access Controller (TEMAC) operating at 10/100/1000 Mbps. This block provides dedicated Ethernet functionality and, combined with the Virtex-5 LXT RocketIO transceivers and SelectIO technology, connects easily to many network devices.

Using these two modules for PCIe and Ethernet, a range of custom packet processing and networking products can be created that dramatically reduce resource utilization and power consumption. By using these various resources available in Xilinx FPGAs, intelligent solutions can be easily created and deployed.

SRIO embedded system application

Consider building an embedded system around an x86-based CPU. The CPU architecture is highly optimized for general-purpose computing and easily handles a variety of applications. Using CPU resources, users can implement algorithms in hardware and software to perform functions such as email, database management, and word processing, which do not require extensive multiplication. Performance is measured in millions or billions of instructions or operations per second, while efficiency is measured in the time or cycles required to complete a particular operation.

High-performance applications that require many fixed-point and floating-point operations take a long time to process data on such CPUs. Examples include signal filtering, fast Fourier transforms, vector multiplication and search, image/video analysis and format conversion, and other digital signal processing algorithms. High-end signal-processing architectures implemented in DSPs can easily perform and optimize these operations. The performance of these DSPs is measured in multiply-accumulate (MAC) operations per second.

Users can easily design embedded systems using CPUs and DSPs to take full advantage of both processing technologies. Figure 6 shows an example of a system using an FPGA, CPU, and DSP architecture.

The primary data interconnect in high-end DSPs is SRIO. The primary data interconnect in x86 CPUs is PCIe. As shown in Figure 6, users can easily deploy FPGAs to extend DSP applications or to bridge discrete data interconnect standards such as PCIe and SRIO.

Figure 6 CPU-based scalable, high-performance, embedded system

In the system shown in Figure 6, the PCIe system is hosted by the root-complex chipset, while the SRIO system is hosted by the DSP. The 32/64-bit PCIe address space (base addresses) can be intelligently mapped into the 34/66-bit SRIO address space (base addresses). PCIe applications communicate with the root complex through memory or I/O reads and writes, and these transactions map easily onto SRIO NREAD/NWRITE/SWRITE operations.

Designing such bridging functions in a Xilinx FPGA is straightforward because the back-end (user-side) interfaces of the Xilinx PCIe and SRIO endpoint blocks are similar. A “Packet Queue” module can therefore perform the crossover from PCIe to SRIO, or vice versa, creating a flow of packets that traverses both protocol domains.

SRIO DSP system application

In applications where DSP processing is the main architectural requirement, the system architecture can be designed as shown in Figure 7.

Figure 7 DSP intensive array

Virtex-5 FPGA-based DSP processing can be combined with other DSP devices in the system to form an intelligent co-processing solution. The entire DSP system solution can be easily expanded if SRIO is used as the data interconnect. Such solutions are future-proof, provide extensibility, and are supported in a variety of form factors. In DSP-intensive applications, fast numerical analysis or data processing can be achieved by offloading the corresponding processing tasks to the x86 architecture. The PCIe subsystem and SRIO fabric can be easily interfaced using Virtex-5 FPGAs for efficient function offload.

SRIO baseband system application

Existing 3G networks are maturing rapidly, and OEMs are deploying new form factors to alleviate specific capacity and coverage issues. To address these problems and adapt to market trends, an FPGA-based DSP architecture that uses SRIO as the data-plane standard is ideal. In addition, early DSP systems can be quickly upgraded to a fast, low-power FPGA DSP architecture for scalability benefits.

As shown in the system in Figure 8, a Virtex-5 FPGA can be designed to meet existing line-rate processing needs for antenna traffic, as well as provide connectivity to other system resources through SRIO. Existing early DSP applications, with their inherently slow parallel connections, are easy to port thanks to the SRIO endpoint functionality available on Virtex-5 FPGAs.

Figure 8 Scalable baseband uplink/downlink card


SRIO is emerging in a number of new applications, mainly centered on DSPs in wired and wireless systems. Key advantages of implementing an SRIO architecture in Xilinx devices include:

• Availability of a complete SRIO endpoint solution

• Flexibility and scalability, so products at different levels can share the same hardware and software architecture

• Low power consumption enabled by the process technology

• Easy configuration via the CORE Generator software GUI tool

• Proven hardware interoperability with industry-leading vendors that support SRIO connectivity on their devices

• Integrated I/O through the PCIe and TEMAC blocks, enabling system integration and thereby reducing overall system cost

In addition, Virtex-5 FPGAs provide DSP resources that meet the power, performance, and bandwidth requirements of existing earlier DSP systems. Further advantages come from system integration features such as Ethernet MAC blocks, PCIe endpoint blocks, processor IP blocks, and storage elements and controllers. Additionally, the extensive list of IP cores supporting multi-source integration in the FPGA can reduce overall system cost.
