fpgacpu.org - Register File Implementation

Newsgroups: comp.cad.synthesis,comp.lang.verilog,comp.arch.fpga
Subject: Re: Read/Writes to memories/register files for PIC core
Date: Fri, 18 Jun 1999 15:58:26 -0700

Rickman wrote in message <376A87F0.577C41CD@yahoo.com>...
>... You could even use different
>addresses on the read and write by using a dual port ram. Or if you
>multiplex the address, you can still use different addresses on read and
>write by clocking the read data into an output register on the falling
>edge of the clock and changing the address with the clock as well. You
>will need to be very careful about timing of the multiplexer in this
>case. If it changes too quickly, you will not meet the address
>setup/hold times on the ram.

Yes!  For my 1+eps instruction/clock XC4000 RISC processor register files, I
chose the second technique above, because it saves significant area.

Each clock, write a result back into the register file ram on the rising
clock edge, then read an operand and capture it in the flip-flops of the
ram's CLB on the falling clock edge.  A multiplexer selected by clock
provides the write address (destination register no.) while clock is low
(i.e. ahead of the rising clock edge) and the read address (source register
no.) while clock is high.

(Another timing issue: assume you are writing a new result to register 1 and
in the next half clock are reading/latching register 1 as a source operand.
You must provide enough time for the new value to be written *and* read out
before that clock falling edge.  Presumably this is no longer than tWOS +
tICK, but intra-CLB should be better.)
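
The sequencing described above can be sketched as a small behavioral model.
This is only an illustration in Python, not the actual XC4000 design: the
class and method names are hypothetical, timing (tWOS, tICK, mux delay) is
not modeled, and one call to cycle() stands for one full clock, with the
write at the rising edge followed by the read captured at the falling edge.

```python
class TimeMuxedRegFile:
    """Behavioral sketch of one single-port RAM shared by a write and a read
    in each clock. Names are hypothetical; no timing is modeled."""

    def __init__(self, num_regs=16):
        self.ram = [0] * num_regs
        self.read_latch = 0  # stands in for the CLB flip-flops holding read data

    @staticmethod
    def address_mux(clock_high, dest_reg, src_reg):
        # Clock low: present the write (destination) address.
        # Clock high: present the read (source) address.
        return src_reg if clock_high else dest_reg

    def cycle(self, dest_reg, result, src_reg):
        # Clock is low ahead of the rising edge, so the mux selects dest_reg;
        # the write happens on the rising edge.
        write_addr = self.address_mux(False, dest_reg, src_reg)
        self.ram[write_addr] = result
        # Clock is now high, so the mux selects src_reg; the read data is
        # captured on the falling edge.
        read_addr = self.address_mux(True, dest_reg, src_reg)
        self.read_latch = self.ram[read_addr]
        return self.read_latch


rf = TimeMuxedRegFile()
# Write register 1 and read it back in the same clock: the new value must
# already be visible at the falling edge.
same_clock = rf.cycle(dest_reg=1, result=0xBEEF, src_reg=1)
later = rf.cycle(dest_reg=2, result=7, src_reg=1)
```

In the model the same-clock write-then-read case works trivially; in hardware
it is exactly the path that has to fit within the half clock discussed above.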

Keep two copies of the register file to enable two arbitrary read ports.
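
As a rough sketch of the two-copy idea (hypothetical names, behavior only):
every write goes to both copies so they always hold the same contents, and
each copy then serves one of the two read ports independently.

```python
class TwoCopyRegFile:
    """2R1W register file built from two identical 1R1W copies.
    A behavioral sketch; names are hypothetical."""

    def __init__(self, num_regs=16):
        self.copy_a = [0] * num_regs  # serves read port A
        self.copy_b = [0] * num_regs  # serves read port B

    def write(self, dest_reg, value):
        # The single write port updates both copies with the same data.
        self.copy_a[dest_reg] = value
        self.copy_b[dest_reg] = value

    def read(self, src_a, src_b):
        # Two arbitrary, independent read addresses, one per copy.
        return self.copy_a[src_a], self.copy_b[src_b]


regs = TwoCopyRegFile()
regs.write(3, 10)
regs.write(5, 20)
operands = regs.read(3, 5)
```

The cost is twice the ram for the same number of registers, which is what
the area figures for single port ram further down account for.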

Explicitly floorplan the register file and its register file address mux.
The latter can be placed directly above the column of rams to drive the
vertical longlines on that column.

The Xilinx dual port distributed rams are nice but they are half as dense as
the single port ones.  (Also note, one dual port ram bank provides one write
and two read ports IF your instruction set is entirely of the form dest =
dest op src and provided you don't pipeline your datapath.)

Example RISC CPU register file / datapath area costs using single and dual
port ram:

Word    #     Target    Datapath     2R1W regfile  Ram type     Percent
size    Regs            area (CLBs)  area (CLBs)                area
------  ----  --------  -----------  ------------  -----------  --------
16-bit  16    XC4005XL  8x8          8x2           single port  25%
16-bit  16    XC4005XL  8x10         8x4           dual port    40%

32-bit  16    XC4010XL  16x8         16x2          single port  25%
32-bit  16    XC4010XL  16x10        16x4          dual port    40%

32-bit  32    XC4010XL  16x10        16x4          single port  40% (*)
32-bit  32    XC4010XL  16x14        16x8          dual port    57% (**)

(*) see e.g. http://www3.sympatico.ca/jsgray/sld021.htm

(**) or worse, since we must fashion each 32x1 dual port ram from two 16x1
ones, requiring a 2-1 mux per bit, and since M1 tools do not allow two
FMAPs and an HMAP to be RLOC'd to the same CLB.
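
The percentages in the table can be reproduced with a few lines of Python,
assuming (this is an inference from the numbers, not something stated in the
post) that "percent area" means regfile CLBs divided by datapath CLBs, with
the datapath area including the regfile. The order of the two dimensions
does not matter for the product.

```python
# (configuration, datapath CLB dimensions, regfile CLB dimensions, quoted %)
ROWS = [
    ("16-bit, 16 regs, single port", (8, 8), (8, 2), 25),
    ("16-bit, 16 regs, dual port", (8, 10), (8, 4), 40),
    ("32-bit, 16 regs, single port", (16, 8), (16, 2), 25),
    ("32-bit, 16 regs, dual port", (16, 10), (16, 4), 40),
    ("32-bit, 32 regs, single port", (16, 10), (16, 4), 40),
    ("32-bit, 32 regs, dual port", (16, 14), (16, 8), 57),
]


def clbs(dims):
    """Total CLB count for an area given as a pair of dimensions."""
    return dims[0] * dims[1]


def regfile_percent(datapath, regfile):
    """Regfile share of the datapath area, rounded to a whole percent.
    Assumes the datapath area already includes the regfile."""
    return round(100 * clbs(regfile) / clbs(datapath))


for name, datapath, regfile, quoted in ROWS:
    print(f"{name}: {regfile_percent(datapath, regfile)}% (quoted {quoted}%)")
```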

Now then, register file design for an FPGA VLIW machine, *that* is a fun
topic.

Jan Gray

Subject: Re: Read/Writes to memories/register files for PIC core
Newsgroups: comp.arch.fpga
Date: Mon, 21 Jun 1999 00:40:21 -0700

[Newsgroup list trimmed to comp.arch.fpga only.]

ems-@riverside-machines.com.NOSPAM wrote in message
<376b6ef7.3817695@news.dial.pipex.com>...
>On Fri, 18 Jun 1999 22:58:49 GMT, "Jan Gray" 
>wrote:
>>(**) or worse, since we must fashion each 32x1 dual port ram from two 16x1
>>ones, requiring a 2-1 mux per bit, and since  M1 tools do not allow two
>>FMAPs and an HMAP to be RLOC'd to the same CLB.
>
>I'm not quite sure what you mean by this (a 16x1 dual-port RAM
>requires both the F and G function generators, so you couldn't pack 2
>in a 4K CLB), but M1 has no problem with 2 FMAPs and an HMAP in the
>same CLB - for a VHDL example, see:
>
>http://www.riverside-machines.com/pub2/xilinx/vhdl_rpm/top.htm
>
>You'll have to manually navigate to rloc4.vhd, since my ISP keeps
>messing around with 'upgrades' and screwing up FTP accesses.

Sorry, I was neither clear nor (strictly speaking) correct.  I should have
said that (in my experience) M1 does not allow two FMAPs and a *fully
independent 3-input* HMAP to be RLOC'd to the same CLB.

To achieve a 32x1 dpram, we need two 16x1 dprams (2 CLBs) and a 2-1 mux.
(To write, we assert the WE of one of the two dprams.  To read, the mux
selects one of the two rams' outputs.)  Depending upon the floorplanned logic
surrounding the regfile rams, we can sometimes implement the mux "for free"
in nearby, otherwise unused, H function generators.
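
The construction above can be sketched behaviorally as follows. The class
names are hypothetical, the 16x1 dpram is reduced to one write port and one
independent read port, and using the high address bit as the bank select is
an assumption made for the illustration.

```python
class DualPortRam16x1:
    """Simplified 16x1 dual port ram: one write port, one read port."""

    def __init__(self):
        self.bits = [0] * 16

    def write(self, addr, bit, we):
        # The stored bit only changes when this ram's write enable is asserted.
        if we:
            self.bits[addr] = bit

    def read(self, addr):
        return self.bits[addr]


class DualPortRam32x1:
    """32x1 dual port ram from two 16x1 dprams plus a 2-1 output mux.
    Assumes address bit 4 selects the bank (an illustrative choice)."""

    def __init__(self):
        self.banks = [DualPortRam16x1(), DualPortRam16x1()]

    def write(self, addr, bit):
        bank_select = addr >> 4
        for index, bank in enumerate(self.banks):
            # Both banks see the same low address and data; only one gets WE.
            bank.write(addr & 0xF, bit, we=(index == bank_select))

    def read(self, addr):
        # Both banks are read at the low address; the mux picks one output.
        outputs = [bank.read(addr & 0xF) for bank in self.banks]
        return outputs[addr >> 4]


ram = DualPortRam32x1()
ram.write(3, 1)    # lands in bank 0
ram.write(19, 0)   # same low address, bank 1
low, high = ram.read(3), ram.read(19)
```

A 32-entry, N-bit register file needs one such mux per bit, which is the
extra cost referred to in the (**) note.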

Unfortunately, M1 does not allow you to constrain two FMAPs and an HMAP to a
CLB, *if* none of the 3 H inputs is an F output, as is the case here.  So
"there ain't no such thing as a free mux."

See also http://deja.com/getdoc.xp?AN=415774321 and
http://www.xilinx.com/techdocs/2289.htm.

Your example, rloc4.vhd, has 2 FMAPs and an HMAP constrained to one CLB, but
one of the HMAP inputs is an F generator output F_OUT.

Thank you for your helpful web pages!

Jan Gray