LDA 40G Portfolio

40G with no compromises on latency

As 40G Ethernet gains traction in the trading industry, LDA Technologies introduces a set of 40 GbE products to help its customers maintain their leading positions in the market.

40G Mux

Powered by the Neo platform, the NeoMux 40G ultra-low-latency muxing solution features:

  • Latency:
    • Below 49 ns latency client to exchange
    • Below 2 ns exchange to client (L1 replication)
  • Nine client ports (40G)
  • One exchange port (40G)
  • One 10G uplink store-and-forward port to the exchange for housekeeping (operations) purposes (BGP, multicast, etc.)
  • Various statistics, including CRC errors, for all client and exchange ports.
  • Fiber Lane Alignment Assistant
    Helps identify incorrectly connected fiber lanes on exchange and client ports.
  • Client 40G port Skew Monitor.
    Monitors unnecessary latency introduced by 40G lane skew on client side.
  • 16KB buffer per client port.

Rate Converter

The 40-10-40 Gigabit Ethernet rate converter is another LDA solution running on the Neo platform.

Based on LDA's unique 644 MHz / 16-bit 10G Ethernet IP core, it provides conversion between 40 Gb and 10 Gb Ethernet protocols with unmatched latency of under 50 ns.

40GbE -> 10GbE conversion is cut-through with the latency measured from the first byte in to the first byte out.

10GbE -> 40GbE conversion is implemented using the “smart prebuffering” technique. The solution analyzes the Ethernet frame to identify latency-critical protocols such as IPv4 and to determine the expected packet length. Since 40G Ethernet is exactly four times faster than 10G, ¾ of the packet is prebuffered before it is released into the 40G medium. While the 40G side transmits the packet, the remaining quarter arrives and goes straight into the 40G medium, so both sides finish at the same time. This compensates for the serialization delay and provides under 50 ns last-to-last latency.
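The ¾ figure follows from simple rate arithmetic, sketched below in Python. This is an idealized fluid model, not LDA's implementation: it ignores preamble, inter-frame gap, line encoding, and pipeline delay, and the function names are hypothetical.

```python
from fractions import Fraction


def prebuffer_bytes(frame_len: int, in_gbps: int = 10, out_gbps: int = 40) -> int:
    """Bytes to hold back before starting the faster output (fluid model)."""
    # The last byte is the tightest constraint: the output may start once the
    # fraction (1 - in/out) of the frame has arrived. For 10G -> 40G that is 3/4.
    needed = frame_len * (1 - Fraction(in_gbps, out_gbps))
    return -(-needed.numerator // needed.denominator)  # ceiling division


def release_delay_ns(frame_len: int, in_gbps: int = 10, out_gbps: int = 40) -> float:
    """Time from first byte in to first byte out; 1 Gb/s equals 1 bit/ns."""
    return prebuffer_bytes(frame_len, in_gbps, out_gbps) * 8 / in_gbps


def finish_times_ns(frame_len: int, in_gbps: int = 10, out_gbps: int = 40) -> tuple[float, float]:
    """(input finish, output finish), both measured from the first input byte."""
    input_done = frame_len * 8 / in_gbps
    output_done = release_delay_ns(frame_len, in_gbps, out_gbps) + frame_len * 8 / out_gbps
    return input_done, output_done
```

For a 1500-byte frame, the model holds back 1125 bytes, and input and output both complete 1200 ns after the first byte arrives. In this idealized picture the serialization component of last-to-last latency is zero, leaving only the pipeline delay of the converter itself.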

Protocols that are not latency-critical (such as ARP and IGMP) are processed in store-and-forward mode, so the packet is wholly prebuffered on the 10G Ethernet side before being released to the 40G side.
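The choice between the two forwarding modes can be illustrated with a small header classifier. The Python sketch below is purely illustrative: the actual classification rules of the rate converter are not documented here, so the policy (untagged IPv4 other than IGMP is treated as latency-critical, everything else is store-and-forward), the mode names, and the function name are all assumptions. The expected frame length is derived from the IPv4 Total Length field.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_IGMP = 2
ETH_HEADER_LEN = 14   # destination MAC, source MAC, EtherType (no VLAN tag assumed)
IPV4_MIN_HEADER = 20
ETH_FCS_LEN = 4
MIN_FRAME_LEN = 64    # minimum Ethernet frame size, including FCS


def classify(header: bytes):
    """Return (mode, expected_frame_len) from the first bytes of a frame.

    expected_frame_len is None for frames handled in store-and-forward mode.
    """
    if len(header) < ETH_HEADER_LEN + IPV4_MIN_HEADER:
        return "store-and-forward", None
    (ethertype,) = struct.unpack_from("!H", header, 12)
    if ethertype != ETHERTYPE_IPV4:          # e.g. ARP (0x0806)
        return "store-and-forward", None
    protocol = header[ETH_HEADER_LEN + 9]    # IPv4 Protocol field
    if protocol == IPPROTO_IGMP:
        return "store-and-forward", None
    (total_length,) = struct.unpack_from("!H", header, ETH_HEADER_LEN + 2)
    expected = max(ETH_HEADER_LEN + total_length + ETH_FCS_LEN, MIN_FRAME_LEN)
    return "smart-prebuffer", expected
```

Knowing the expected length from the header is what allows the release point to be chosen before the whole frame has arrived.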

For each incoming 40G line, the rate converter supports eight outgoing 10G ports, thus providing rebroadcasting functionality in the 40G-to-10G direction and muxing functionality in the 10G-to-40G direction.

A large buffer in the 40G -> 10G direction absorbs bursts arriving on the 40G side, minimizing the packet loss that could otherwise result from the large difference in data rates.
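How large a burst a given buffer can absorb follows from the rate difference. The Python sketch below uses the same kind of idealized fluid model (one 40G input draining into one 10G output, no framing overhead). The buffer size of the rate converter is not stated here, so the value passed in is a hypothetical parameter, as are the function names.

```python
def peak_occupancy(burst_bytes: int, in_gbps: int = 40, out_gbps: int = 10) -> int:
    """Peak buffer fill, in bytes, caused by one back-to-back burst."""
    # While the burst arrives, the output drains out/in of it, so the
    # buffer must hold the remaining (1 - out/in) of the burst.
    return -(-burst_bytes * (in_gbps - out_gbps) // in_gbps)  # ceiling division


def max_absorbable_burst(buffer_bytes: int, in_gbps: int = 40, out_gbps: int = 10) -> int:
    """Largest line-rate burst, in bytes, that fits without dropping."""
    return buffer_bytes * in_gbps // (in_gbps - out_gbps)
```

In this model a buffer absorbs a line-rate burst up to 4/3 of its own size, since a quarter of the burst leaves on the 10G side while it is still arriving.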

The solution reports per-port statistics: received packets, CRC errors, dropped packets, forwarded packets, and data rate.

MAC/PCS

LDA's 644 MHz 40 GbE MAC/PCS is the lowest latency IP core on the market with a roundtrip latency of 45.6 ns.

Optimized for the HFT industry, it provides an interface for seamless integration into existing code, resulting in instant latency reduction.

The core fully integrates the custom Xilinx GTY wrapper and exports only the essential interfaces: clocks, GTY SerDes inputs and outputs, and AXI streaming interfaces.

The core provides IP-level Synchronous Ethernet (jitter attenuators) support.

IP Core Interfaces

The 644 MHz FPGA core provides multiple user interfaces:

  • 64-bit AXI Streaming interface running at 644 MHz
  • 128-bit AXI Streaming interface running at 322 MHz
  • 64-bit RX PCS bus for ultra-low latency parsing of received data

The 128-bit 322 MHz AXI Streaming interface is implemented using the multicycle path approach, which means that data delivered to the user interface is not delayed by an extra clock conversion FIFO.
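The two AXI Streaming widths carry the same raw bandwidth, which a quick calculation confirms. The Python sketch below assumes that "644 MHz" is shorthand for 644.53125 MHz, that is, the 10.3125 Gbaud lane rate divided by 16. This matches the 322.265 MHz figure quoted for the 128-bit mode but is not stated explicitly; the names are hypothetical.

```python
LANE_RATE_GBAUD = 10.3125                        # 10GBASE-R / 40GBASE-R per-lane rate
CORE_CLOCK_MHZ = LANE_RATE_GBAUD * 1000 / 16     # 644.53125 MHz (assumed exact value)
HALF_CLOCK_MHZ = CORE_CLOCK_MHZ / 2              # 322.265625 MHz


def bus_capacity_gbps(width_bits: int, clock_mhz: float) -> float:
    """Raw capacity of a parallel bus moving width_bits every clock cycle."""
    return width_bits * clock_mhz / 1000


narrow = bus_capacity_gbps(64, CORE_CLOCK_MHZ)    # 64-bit at 644 MHz
wide = bus_capacity_gbps(128, HALF_CLOCK_MHZ)     # 128-bit at 322 MHz
aggregate_line_rate = 4 * LANE_RATE_GBAUD         # four lanes of 40GBASE-R
```

Under this assumption both interfaces come out at 41.25 Gb/s, equal to the aggregate 4 x 10.3125 Gbaud line rate of 40GBASE-R.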

Latency Information

The roundtrip latency of the MAC/PCS core is 45.6 ns, including transceivers, on Xilinx UltraScale+ -3 speed grade FPGAs.

The core provides a FULLY REGISTERED AXI Streaming user interface.

FPGA Utilization

The resource utilization* per MAC/PCS core:

  • 3850 CLB LUTs
  • 3300 CLB Registers

*Measured on a VU9P FPGA.

322.265 MHz 128-bit Mode

128-bit mode allows seamless integration into existing systems designed for conventional 40 Gb Ethernet IP. 128-bit mode provides “single clock domain” architecture with a built-in RX clock domain crossing FIFO.