Synthesized CPU Core Issues


Subject: Re: FPGA vs CPLD? Any Experts out there?
Date: 12 Apr 1999 00:00:00 GMT
Newsgroups: comp.arch.fpga

Weri Kuolstad wrote in message <7etaf5$1ev$>...
>Hi Jan,
>       I have been following this thread very closely. I am designing a
>CPU based on the MIPS 2000 from Computer Organization and Design : The
>Hardware/Software Interface" by John Hennessy and David Patterson onto an
>ORCA2C40 FPGA. Obviously I have that book. I also have the new Michael
>book on Verilog that has the Xilinx Student Edition (I don't have the book
>right now with me to quote the exact version #.) I am doing this design in
>Verilog with two main design goals 1. Describe the entire design at a
>behavioral level in Verilog
>                                   2. Get the entire 32-bit design to fit
>the ORCA2C40.
>I would appreciate any book/link suggestions.
>Thank you.

Be careful.  An ill-prepared behavioral design may be much larger or slower
than necessary.

To achieve a feasible, fast, and/or small FPGA implementation of a RISC
processor, or anything else, you must first determine how your datapath maps
to FPGA device primitives, e.g. 4-LUTs, FFs, BUFTs, RAMs, CYs, etc.  I
think this is crucial.  You must study and internalize your FPGA data
sheets, and, if available, review exemplary implementations.

Only when you understand where (and how and how many of) your rams, adders,
registers, muxes, etc. should fall on the die, only then, should you write
your first line of Verilog or draw your first FDCE.

For the specific case of an instruction set compatible processor
implementation: only when you understand what should be implemented in
hardware, what in state machines, and what should trap to software, only
then should you "break out the Verilog".

For example, MIPS-I implies a 32-bit barrel shifter.  These are expensive to
implement in an FPGA, comparable in area to a modest I-cache.  If you
thought about how a barrel shifter would map to device primitives, you might
instead profitably design a small, multi-cycle shifter, perhaps one which
only does 1- and 4-bit shifts each cycle, saving LOTS of chip area for other
uses.
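
As a rough illustration of the multi-cycle shifter idea, here is a small
Python behavioral model (not hardware, and not the code of any actual
design).  It assumes a logical left shift built only from 4-bit and 1-bit
steps, one step per cycle; the name multicycle_sll is made up for this
sketch.

```python
MASK32 = 0xFFFFFFFF

def multicycle_sll(value, amount):
    """Shift left logical using only 4-bit and 1-bit steps.

    Models one step per clock cycle and returns (result, cycles).
    """
    value &= MASK32
    remaining = amount & 31      # MIPS uses a 5-bit shift amount
    cycles = 0
    while remaining:
        step = 4 if remaining >= 4 else 1   # prefer the big step
        value = (value << step) & MASK32
        remaining -= step
        cycles += 1
    return value, cycles

if __name__ == "__main__":
    # Worst case is a shift by 31: seven 4-bit steps plus three 1-bit steps.
    print(multicycle_sll(1, 31))
```

Under these assumptions a shift by n costs n//4 + n%4 cycles, so the worst
case is 10 cycles, traded against the area of a full barrel shifter.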

Another example.  MIPS-I implies a 1-cycle branch delay.  In a
straightforward implementation of the pipelined datapath sketched in
Hennessy and Patterson, this would require 2 PC adders, one for PC+4 and
one for PC+branch-displacement, and a MUX selecting between them.  Instead,
if you can accept a 2-cycle branch latency (e.g. one branch delay slot and
one annulled cycle on branch taken), you can build a circuit in about 1/3
the area (PC + cheap-mux(4,sign-ext(branch-disp))).
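
The single-adder arrangement can be sketched as a Python model of the
next-PC function: one adder whose second operand is muxed between 4 and the
sign-extended displacement.  The names next_pc and sign_extend16 are
illustrative only.  The shift left by 2 reflects that MIPS branch
displacements are word offsets; which cycle the displacement is applied in
(and hence the annulled cycle) is not modeled here.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, take_branch, imm16):
    """One adder, one 2:1 mux on its second operand."""
    addend = (sign_extend16(imm16) << 2) if take_branch else 4
    return (pc + addend) & MASK32

if __name__ == "__main__":
    print(hex(next_pc(0x100, False, 0)))       # sequential fetch
    print(hex(next_pc(0x100, True, 0xFFFF)))   # branch back one word
```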

Sooo, once you have decided what you expect the tools to output in the end,
then it's a simple matter of "pushing on a rope" to get your particular
elaboration tools to map your design specification into the right inputs to
your FPGA vendor's implementation tools.  Schematics give you more direct
control, HDLs more parameterization, and netlist generators the best of both
worlds (at the expense of incompatibility with anything else).

If you take this advice to heart, you should have no trouble fitting your
design into a 2C40.  IIRC that has 30x30 4-bit PFUs.  My first 32-bit
pipelined RISC, which did most of the MIPS-I integer instructions, had a
datapath that was 16x11 2-bit CLBs, e.g. only about 5% of a 2C40.  See my
datapath floorplan slide for an example.

If you can figure out how to make your behavioral Verilog source code
compile to the desired device primitives, I would use that.  Otherwise I
would try to specify the datapath (only) in structural Verilog.  I
experimented with this last year using Foundation / FPGA Express Verilog
with good results, although I had a little helper script to generate a UCF
file to constrain the resulting primitives' LOCs to my desired floor plan.
I look forward to using other Verilog compilers which reportedly can pass
FMAP and RLOC attributes through to the FPGA implementation tools.
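
For readers wondering what such a helper script might look like, here is a
hypothetical Python sketch (not the original script).  It assumes instances
are named with a common prefix plus a bit number, that each CLB holds 2 bits
of a datapath column, and that locations are written in the XC4000-style
CLB_R<row>C<col> form; all of these are assumptions to adapt to your own
netlist and device.

```python
def ucf_loc_lines(inst_prefix, n_bits, bits_per_clb, top_row, col):
    """Return UCF lines placing one datapath column down a column of CLBs.

    inst_prefix, top_row and col are illustrative parameters; bit 0 is
    assumed to sit in the topmost row.
    """
    lines = []
    for bit in range(0, n_bits, bits_per_clb):
        row = top_row + bit // bits_per_clb
        lines.append(f'INST "{inst_prefix}{bit}" LOC=CLB_R{row}C{col};')
    return lines

if __name__ == "__main__":
    # A 32-bit adder column, 2 bits per CLB, occupies 16 rows.
    print("\n".join(ucf_loc_lines("dp/add", 32, 2, top_row=1, col=5)))
```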

I had not heard of this Ciletti book w/ XSE, can you provide more
information?

Jan Gray

Subject: Re: FPGA vs CPLD? Any Experts out there?
Date: 12 Apr 1999 00:00:00 GMT
Newsgroups: comp.arch.fpga

Jan Gray wrote in message <7ettgk$ddr$>...
>Only when you understand where (and how and how many of) your rams, adders,
>registers, muxes, etc. should fall on the die, only then, should you write
>your first line of Verilog or draw your first FDCE.

I don't like my own advice here, so let me try again.

Implementing a processor or other substantial design is an iterative process
with subproblems which require analysis and experimentation.  The more
expert you are with your tools and with the device architecture, the less
experimentation you'll need.  If you're new to FPGA design, I think taking
some time to try out different solutions to the subproblems will help to
save time overall and achieve a better result.

Some of the subproblems to investigate include:
* how to implement a register file?  a 2 read / 1 write port register file?
* how to source an operand from a register or an immediate field?
* how to implement an ALU? a shifter?
* how to multiplex the many results (incl. ALU, shifts, loads, sign exts
(lbs), jal's)?
* how to implement zero/negative/carry/overflow detect?
* what is the external memory or on-chip bus interface like?
* how to implement load/store byte lane alignment logic?
* how to implement an instruction register? a program counter? incrementing
it? branch displacements?
* how to pipeline the design? how many stages are beneficial? how to stall
pipe? how to annul insns?
* how to deal with pipeline hazards? memory not ready? branch/jump shadows?
data hazards?
* where to implement the effective address adder?
* should memory be 1- or 2- ported? how to mux eff. addr. with PC?
* how to do interrupts and return from interrupt?
* what is the clock discipline? rising or both edges? 1 or multiple clocks
per insn?
* what are the critical paths? what is the feasible cycle time? what is the
required cycle time?
* is any retiming needed?
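
To make the first subproblem concrete, here is one possible approach to a
2 read / 1 write port register file, sketched as a Python behavioral model:
keep two copies of a single-read-port RAM and send every write to both.
This is only one option among several, and the class name RegFile2R1W is
made up for the sketch.

```python
MASK32 = 0xFFFFFFFF

class RegFile2R1W:
    """Two copies of a 1-read/1-write RAM; every write goes to both."""

    def __init__(self, nregs=32):
        self.bank_a = [0] * nregs   # serves read port A
        self.bank_b = [0] * nregs   # serves read port B

    def write(self, rd, value):
        if rd != 0:                 # MIPS r0 always reads as zero
            self.bank_a[rd] = value & MASK32
            self.bank_b[rd] = value & MASK32

    def read(self, rs, rt):
        """Read two registers in the same cycle, one from each copy."""
        return self.bank_a[rs], self.bank_b[rt]

if __name__ == "__main__":
    rf = RegFile2R1W()
    rf.write(3, 0xDEADBEEF)
    print([hex(v) for v in rf.read(3, 0)])
```

The duplicate copy doubles the RAM cost but needs no multi-ported memory
primitive, which is the kind of tradeoff worth measuring in your own device.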

Some of these analyses will benefit from actually designing the subunit and
observing what the tools produce, including layouts and delays (EPIC /
static timing analysis). And trying some alternatives.

Then you'll know approximately how much area and time it takes to do a
register file writeback and read vs. an add vs. a wide-mux vs. a 32-bit zero
detector and will be able to make intelligent tradeoffs.
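
A back-of-the-envelope estimate can be made before running any tools.  The
Python sketch below assumes a wide function such as a 32-bit zero detector
is built as a plain tree of k-input LUTs, ignoring carry chains, wide-AND
resources, and routing delay; lut_tree_cost is an illustrative name.

```python
def lut_tree_cost(n_inputs, k=4):
    """Estimate (LUT count, logic levels) for a k-input LUT reduction tree."""
    luts = 0
    levels = 0
    signals = n_inputs
    while signals > 1:
        signals = (signals + k - 1) // k   # LUTs needed at this level
        luts += signals
        levels += 1
    return luts, levels

if __name__ == "__main__":
    # 32 inputs -> 8 LUTs -> 2 LUTs -> 1 LUT
    print(lut_tree_cost(32))
```

Under these assumptions a 32-bit zero detect costs 11 4-LUTs and 3 levels of
logic, a first guess to check against what the tools actually produce.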

Have fun!

Copyright © 2000, Gray Research LLC. All rights reserved.
Last updated: Feb 03 2001