GRFPU High-Performance Floating-Point Unit

Overview

The GRFPU is an IEEE-754 compliant floating-point unit supporting both single- and double-precision operands. Its design combines high throughput with low latency. The host interface is clean and versatile, which simplifies integration with processor pipelines and DSPs. The accuracy and convergence of the FPU algorithms have been proven mathematically, and the implementation has been validated with more than 20 million test vectors.

  • IEEE-754 compliant, supporting all rounding modes and exceptions
  • Operations: add, subtract, multiply, divide, square-root, convert, compare, move, abs, negate
  • Data formats: single and double precision (32- and 64-bit floats)
  • Fully pipelined, with a latency of 3 clock cycles for all operations except divide and square-root
  • Non-blocking parallel execution of divide and square-root operations
  • Clean and versatile interface
  • LEON FPU Control Unit (GRFPC) available
  • Supports all SPARC V8 floating-point instructions
  • 250 MHz (250 MFLOPS) on a typical 0.13 µm standard-cell process, using fewer than 100 kgates
  • Fault-tolerant (FT) version available
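
Rounding mode matters even for simple conversions: a 32-bit integer can have more significant bits than the 24-bit significand of a single-precision float, so an integer-to-single conversion such as FITOS must round. The Python sketch below is an illustrative model only, not the GRFPU implementation. It rounds an integer to single precision under each of the four IEEE-754 rounding directions and returns the exact rounded value as an integer. The mode labels "nearest", "zero", "pos_inf" and "neg_inf" are hypothetical names chosen for this example; a SPARC program selects the direction through the RD field of the FSR.

```python
def round_int_to_f32(n: int, mode: str) -> int:
    """Round integer n to a value representable with a 24-bit significand.

    mode (hypothetical labels): "nearest" (ties to even), "zero",
    "pos_inf" or "neg_inf". Returns the rounded value as an exact integer.
    Exponent range is not checked; any 32-bit integer fits comfortably.
    """
    if n == 0:
        return 0
    sign = -1 if n < 0 else 1
    mag = abs(n)
    shift = max(mag.bit_length() - 24, 0)  # low bits that do not fit
    if shift == 0:
        return n  # exactly representable, no rounding needed
    q, r = divmod(mag, 1 << shift)  # kept significand, discarded bits
    half = 1 << (shift - 1)
    if mode == "nearest":
        round_up = r > half or (r == half and q & 1 == 1)
    elif mode == "zero":
        round_up = False
    elif mode == "pos_inf":
        round_up = r != 0 and sign > 0
    elif mode == "neg_inf":
        round_up = r != 0 and sign < 0
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    if round_up:
        q += 1  # a carry to 2**24 still yields a representable power of two
    return sign * (q << shift)
```

For example, 16777217 (2^24 + 1) lies exactly halfway between the representable values 16777216 and 16777218, so round-to-nearest-even gives 16777216, while rounding toward +infinity gives 16777218.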

Operation

Functional Description

The GRFPU performs operations on single- and double-precision floating-point operands. All operations are IEEE-754 compliant, with the exception of denormalized numbers, which are flushed to zero. The four rounding modes specified by IEEE-754 and the detection of exception conditions are fully supported.
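
Flushing denormals to zero can be illustrated on the single-precision bit pattern: a denormal has an all-zero exponent field and a non-zero fraction field. The Python sketch below is a hypothetical model, not the GRFPU logic. The source states only that denormalized numbers are flushed to zero; the sketch assumes the flushed value is a zero carrying the original sign, a common convention that is not confirmed here.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of x (rounded to f32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def flush_to_zero_f32(x: float) -> float:
    """Replace a single-precision denormal with a zero of the same sign.

    Assumption: the sign of a flushed denormal is preserved.
    """
    b = f32_bits(x)
    exponent = (b >> 23) & 0xFF  # 8-bit biased exponent field
    fraction = b & 0x7FFFFF      # 23-bit fraction field
    if exponent == 0 and fraction != 0:
        return bits_f32(b & 0x80000000)  # keep only the sign bit
    return bits_f32(b)
```

For example, 2^-140 is a single-precision denormal (the smallest normal value is 2^-126), so flush_to_zero_f32(2.0**-140) returns 0.0, while 2.0**-126 passes through unchanged.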

An FPU operation is started by presenting the operands, opcode and rounding mode on a rising clock edge. The result and the exception flags are available three clock cycles later. The FPU is fully pipelined, so a new operation can be started every clock cycle. The only exceptions are the FDIV and FSQRT instructions, which require between 15 and 24 clock cycles to complete and are not pipelined. They are, however, calculated in a separate non-blocking execution unit, so all other operations can be performed in parallel without stalling the FPU pipeline. The table below summarises the throughput and latency of the supported operations:

Operation                                         Throughput (cycles/op)  Latency (cycles)  Description
FADDS, FADDD, FSUBS, FSUBD, FMULS, FMULD, FSMULD  1                       3                 Add, subtract, multiply
FITOS, FITOD, FSTOI, FDTOI, FSTOD, FDTOS          1                       3                 Convert between integer and floating-point formats, and between single and double precision
FCMPS, FCMPD, FCMPES, FCMPED                      1                       3                 Compare
FDIVS/FDIVD                                       15/16                   15/16             Divide (single/double)
FSQRTS/FSQRTD                                     23/24                   23/24             Square-root (single/double)
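
To make the throughput and latency figures concrete, here is a small Python timing model of a stream of FPU operations. It is a hypothetical sketch built only from the figures in the table, not a cycle-accurate model of the GRFPU. It assumes in-order issue of independent operations (data dependencies are ignored), at most one issue per clock cycle, and that a divide or square-root which finds the divide/square-root unit busy waits and holds up the operations behind it.

```python
from dataclasses import dataclass

# Cycle counts from the throughput/latency table.
PIPELINED_LATENCY = 3
DIV_SQRT_CYCLES = {"FDIVS": 15, "FDIVD": 16, "FSQRTS": 23, "FSQRTD": 24}


@dataclass
class Issued:
    op: str
    issue: int  # cycle in which operands and opcode are presented
    ready: int  # cycle in which the result and flags are available


def schedule(ops: list[str]) -> list[Issued]:
    """Hypothetical in-order timing model of independent FPU operations.

    Pipelined operations finish 3 cycles after issue. FDIV/FSQRT use a
    separate non-pipelined unit: a new one waits until the previous one
    has finished, but other operations keep issuing while it is busy.
    """
    result = []
    cycle = 0     # earliest cycle for the next issue
    div_free = 0  # first cycle in which the divide/sqrt unit is idle
    for op in ops:
        if op in DIV_SQRT_CYCLES:
            issue = max(cycle, div_free)  # wait for the unit (in-order stall)
            latency = DIV_SQRT_CYCLES[op]
            div_free = issue + latency
        else:
            issue = cycle
            latency = PIPELINED_LATENCY
        result.append(Issued(op, issue, issue + latency))
        cycle = issue + 1
    return result
```

For the sequence FDIVD, FADDD, FMULD, FDIVS, the add and multiply issue in cycles 1 and 2 and finish in cycles 4 and 5, well before the FDIVD result appears in cycle 16. The FDIVS cannot start until the unit becomes free in cycle 16 and completes in cycle 31.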

Validation

The GRFPU core has been extensively validated with a large set of test vectors. Special test programs such as TestFloat, UCBTEST and IEEE CC754 have been used, as well as floating-point application software.

LEON FPU Control Unit

The GRFPU can be attached to LEON processors through the LEON FPU Control Unit (GRFPC). The control unit receives SPARC FPU instructions (FPOPs) from the LEON integer unit and schedules them for execution by the FPU. The FPOPs are executed in parallel with other integer instructions; the LEON pipeline is stalled only in case of operand or resource conflicts. The GRFPC also includes the FPU register file, the processor's floating-point status register (FSR) and a deferred trap queue. The GRFPC is available for all versions of the LEON processor.
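
The scheduling logic inside the GRFPC is not described here. As a rough illustration of what an operand conflict means, the Python sketch below is a hypothetical scoreboard, not the GRFPC design. It assumes every FPOP has the fixed 3-cycle latency of the pipelined operations, and it ignores resource conflicts, double-precision register pairs and the deferred trap queue. Register names such as "f4" are illustrative only.

```python
def issue_cycles(fpops: list[tuple[str, ...]], latency: int = 3) -> list[int]:
    """Return the issue cycle of each FPOP under a simple operand-conflict model.

    Each FPOP is (dest, src1, src2, ...) using hypothetical register names.
    FPOPs issue in order, at most one per cycle, and an FPOP also waits
    until every source register's pending result is available
    (a read-after-write operand conflict).
    """
    ready_at: dict[str, int] = {}  # register -> first cycle its result is available
    cycle = 0                      # earliest cycle for the next issue
    issues = []
    for dest, *srcs in fpops:
        issue = max([cycle] + [ready_at.get(r, 0) for r in srcs])
        issues.append(issue)
        ready_at[dest] = issue + latency
        cycle = issue + 1
    return issues
```

In the sequence f4 = f0 op f2, f6 = f0 op f2, f8 = f4 op f6, the first two FPOPs are independent and issue in cycles 0 and 1, while the third must wait until f6 becomes available in cycle 4.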

Fault-tolerance

The fault-tolerant version of GRFPU and GRFPC includes SEU protection by design.

Documentation

Document                            File             Date
GRFPU/GRFPC White Paper             grfpu_wp.pdf     6-Jul-2004
GRFPU presentation from DASIA 2004  grfpu_dasia.pdf  6-Jul-2004

For evaluation purposes, Xilinx and Altera netlists of GRFPC/GRFPU for LEON2/3 are available from the LEON download page.

Availability

GRFPU and GRFPC are available immediately and licensed together.

The GRFPU has been used in several critical applications, in particular in the Aeroflex UT699, UT699E and UT700 devices as well as in our GR740 device.