CNets and Datapaths

Subject: Re: Has anyone ever used a C -> Xilinx netlister?
Date: 24 Oct 1996 00:00:00 GMT
newsgroups: comp.arch.fpga


Austin Franklin <darkroom-@ix.netcom.com> wrote in article
<01bbbefd$7414e3b0$a1cab7c7@drt1>...
> There are two C -> Xilinx netlisters I know of, and I am curious if anyone
> has any experience with them, or any others.
> 
> The first I know of is called NLC, developed by Christian Iseli at the
> System Logic Lab of the Federal Polytechnical School of Lausanne,
> Switzerland.  The second I know of was developed by the networking group of
> Digital in California.
> 
> Thanks,
> 
> Austin Franklin
> darkroom-@ix.netcom.com

Certainly Digital has one.

I have my own tool, called CNets, which emits XC4000 XNF.  It is a set of
classes which extend C++ with user defined types to describe nets, buses
(vector of nets), and gate expressions.  There are also functions to emit
primitives such as flip-flops, SRAMs, TBUFs, IBUFs, etc.  My processor
designs are specified in CNets.

I wrote CNets for three reasons.
* I was growing old drawing and labeling 32-bit datapath elements in
ViewLogic
* the DOS WorkView tools didn't seem to work very well under Win95 or NT,
and I loathe DOS
* I couldn't afford Verilog or VHDL tools, and neither could other people
(computer engineering hobbyists) with whom I wish(ed) to share my tools

In retrospect, and particularly in light of recent discussions about doing
Xilinx designs using VHDL, CNets, though primitive, has turned out to be
quite a pleasant design environment -- the best of both worlds.

Like schematic capture, all primitive instantiations, placements, and other
constraints, are explicit (if you wish).  No "pushing on a rope" to get the
design you have in mind through the HDL synthesis tool.  Floorplanning
datapaths is a breeze.  And since I tend to do my own "mental technology
mapping" (of gates into LUTs), by default CNets automatically emits FMAPs
for 2-4 input logic functions.

Like HDLs, CNets is based on a programming language, therefore it is easy
to have functions, loops, conditionals, variable-width structures,
lookup-tables, conditional emission of logic, etc.  Best of all, CNets
designs recompile, relink, run, and emit the new XNF, ready for place and
route, in less than ten seconds.  Wait another 30 minutes for PPR and
you're done. :-)  Also, for users familiar with C++, CNets should be more
familiar than Verilog or VHDL.
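
As a rough illustration of the idea (not the CNets source, whose classes are not shown in this post), here is a minimal sketch of a netlist generator embedded in C++.  Everything in it is an assumption made for the example: the names Netlist, Net, gate2, make_bus and parity, and the one-line-per-gate text format, which is not XNF.  It only shows how operator overloading and an ordinary loop can emit a variable-width structure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mini netlist generator.  Names and output format are
// illustrative only; this is neither the CNets API nor XNF syntax.
struct Netlist {
    std::vector<std::string> lines;  // one emitted gate per line
    int next_id = 0;
    std::string fresh() { return "n" + std::to_string(next_id++); }
};

struct Net {
    Netlist* nl;       // netlist this net belongs to
    std::string name;  // net name used in the emitted text
};

// Emit one two-input gate and return the net it drives.
Net gate2(const char* op, const Net& a, const Net& b) {
    Net out{a.nl, a.nl->fresh()};
    a.nl->lines.push_back(std::string(op) + " " + out.name + " " +
                          a.name + " " + b.name);
    return out;
}

// Gate expressions: each operator emits a gate as a side effect.
Net operator&(const Net& a, const Net& b) { return gate2("AND2", a, b); }
Net operator^(const Net& a, const Net& b) { return gate2("XOR2", a, b); }

// A bus is just a vector of nets named base0, base1, ...
std::vector<Net> make_bus(Netlist& nl, const std::string& base,
                          unsigned width) {
    std::vector<Net> bus;
    for (unsigned i = 0; i < width; i++)
        bus.push_back(Net{&nl, base + std::to_string(i)});
    return bus;
}

// Generator function: emits a chain of XOR gates over a bus of any width.
Net parity(const std::vector<Net>& bus) {
    Net acc = bus.at(0);
    for (std::size_t i = 1; i < bus.size(); i++)
        acc = acc ^ bus[i];
    return acc;
}
```

Calling parity() on an 8-bit bus made with make_bus(nl, "d", 8) appends seven XOR2 lines to nl.lines; changing the width argument is the only edit needed to regenerate a wider or narrower structure.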

To give you a feel for CNets, attached below are some examples of code from
one of my designs.  Keep in mind this was a first cut, a quick hack, to get
my darn designs going.

This past summer I started CNets, version 2, in Java -- "JHDL" if you will.
 Key new features included better support for design 'modules' and a
cycle-based simulator.  Unfortunately, I didn't finish this project.  I may
yet try again this fall in my copious spare time.

Cheers,

Jan Gray



lfsr_div emits LFSR counters to divide by an arbitrary n.  It is similar to an
XBLOX counter with style LFSR.  The first half of the function determines
how many bits are needed to hold 'n', and what the LFSR shift register bits
'w' look like after n-1 clockings.  The second half of the function actually
emits a 'bits'-bit shift register, whose input is the XNOR of certain taps
off the shift register.  The formal arguments are
	out -- the net which is true after n clockings of the LFSR counter
	ce -- the counter clock enable
	reset -- if true, reset the counter on next clock edge
	n -- the divisor

// emit an lfsr counter and decoder to divide by n
//
// See "Efficient Shift Registers, LFSR Counters, and
// Long Pseudo-Random Sequence Generators", Peter Alfke,
// Xilinx App Note, Aug. 1995
//
void lfsr_div(Net out, Net ce, Net reset, unsigned n) {
	Env e;
	e.setModule(out.getName());

	// choose appropriate width counter
	static unsigned taps[32][4] = {
		{ 0 }, { 0 }, { 0 }, { 3, 2 },
		{ 4, 3 }, { 5, 3 },	{ 6, 5 }, { 7, 6 },
		{ 8, 6, 5, 4 }, { 9, 5 }, { 10, 7 }, { 11, 9 },
		{ 12, 6, 4, 1 }, { 13, 4, 3, 1 }, { 14, 5, 3, 1 }, { 15, 14 },
		{ 16, 15, 13, 4 }, { 17, 14 }, { 18, 11 }, { 19, 6, 2, 1 },
		{ 20, 17 }, { 21, 19 }, { 22, 21 }, { 23, 18 },
		{ 24, 23, 22, 17 }, { 25, 22 }, { 26, 6, 2, 1 }, { 27, 5, 2, 1 },
		{ 28, 25 }, { 29, 27 }, { 30, 6, 4, 1, }, { 31, 28 }
	};
	check(n <= (1 << 30));
	unsigned bits = 1;	// declared outside the loop: used below
	for (; n >= (1U << bits); bits++)
		;
	check((1U << (bits-1)) <= n && n < (1U << bits));

	// determine bit pattern of terminal state (after n-1 clockings of lfsr)
	unsigned w = 0;
	for (unsigned i = 1; i < n; i++) {
		unsigned in = 0;
		for (unsigned j = 0; j < 4 && taps[bits][j]; j++)
			in ^= (w >> (taps[bits][j] - 1)) & 1;
		w = ((w << 1) & ((1U << bits) - 1)) ^ !in;
		check(w != 0);
	}

	// emit shift register and gates to compare to terminal state
	bus(lfsr, bits+1);
	out = lfsr(bits,1) == w;
	lfsr[0] = gnd;
	net(lfsr_in) = xnor(lfsr[taps[bits][0]], lfsr[taps[bits][1]],
                        lfsr[taps[bits][2]], lfsr[taps[bits][3]]);
	net(lfsr_reset) = out | reset;
	ff(lfsr[1], lfsr_in & ~lfsr_reset, ce);
	for (unsigned i = 2; i <= bits; i++)
		ff(lfsr[i], lfsr[i-1] & ~lfsr_reset, ce);
}
		
Some of the trickier constructions:

 out = lfsr(bits,1) == w;
  -- drive 'out' with an AND gate which is true when the bits lfsr[bits:1]
     match the bit pattern in the integer 'constant' w.
	
 net(lfsr_in) = xnor(lfsr[taps[bits][0]], lfsr[taps[bits][1]],
                     lfsr[taps[bits][2]], lfsr[taps[bits][3]]);
  -- drive /lfsr_in with the XNOR of up to four taps off the shift
     register.

 ff(lfsr[i], lfsr[i-1] & ~lfsr_reset, ce);
  -- emit a flip-flop whose D input is the AND of lfsr[i-1] and NOT
     lfsr_reset, whose clock enable is 'ce', whose Q output is lfsr[i].
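
To check the first half of lfsr_div in isolation, the terminal-state computation can be pulled out into plain C++ with no netlist emission.  The sketch below is an illustration, not part of CNets: the helper names kTaps, lfsr_width, lfsr_step and lfsr_terminal_state are made up here, and the tap table is copied from the listing above but truncated to widths of 8 bits or less, so it only handles n < 256.

```cpp
#include <cassert>

// Tap positions for XNOR LFSRs of width 3..8, copied from the taps[]
// table in lfsr_div (row index = register width; 0 ends a row).
// Truncated for this sketch, so only n < 256 is supported.
static const unsigned kTaps[9][4] = {
    {0}, {0}, {0}, {3, 2}, {4, 3}, {5, 3}, {6, 5}, {7, 6}, {8, 6, 5, 4}
};

// Smallest width such that n < 2^bits.
unsigned lfsr_width(unsigned n) {
    unsigned bits = 1;
    while (n >= (1U << bits))
        bits++;
    return bits;
}

// One clock of the shift register: shift left by one and feed the
// XNOR of the tapped bits into bit 0.
unsigned lfsr_step(unsigned w, unsigned bits) {
    unsigned in = 0;
    for (unsigned j = 0; j < 4 && kTaps[bits][j]; j++)
        in ^= (w >> (kTaps[bits][j] - 1)) & 1;
    return ((w << 1) & ((1U << bits) - 1)) | (in ^ 1U);
}

// State of the register after n-1 clocks starting from all zeros.
// This is the pattern the emitted decoder compares against.
unsigned lfsr_terminal_state(unsigned n) {
    assert(n >= 1 && n < 256);
    unsigned bits = lfsr_width(n);
    unsigned w = 0;
    for (unsigned i = 1; i < n; i++)
        w = lfsr_step(w, bits);
    return w;
}
```

For a 3-bit register the states starting from zero run 0, 1, 3, 6, 5, 2, 4 and then return to 0, so lfsr_terminal_state(4) is 6 and lfsr_terminal_state(7) is 4.  The all-ones state is never reached, which is the usual lock-up state of an XNOR-feedback LFSR.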



This next piece of code defines a 'cbit'-bit pipelined RISC datapath. 
Sorry for the sparse comments.  This is *not* what I would consider
production code!  You may wish to visit
http://www3.sympatico.ca/jsgray/sld020.htm to see what this does.
....
for (i = 0; i < cbit; i++) {
	unsigned r = rowForBit(i);
	unsigned t = 1 + even(i);

	// instruction register
	ff(ir[i], mem.d[i], c.irce, _, loc(r,colIR));

	// PC incrementer and MAR (memory address register)
	m2(marmux[i], pcincr[i], adder[i], c.addersel, loc(r,colMMAR));

	if (i >= 2)
		ff(pc[i], marmux[i], c.pcce, _, loc(r,colPC));
	ff(mem.ad[i], marmux[i], c.marce, _, loc(r,colMAR));
	tbuf(res[i], pc[i], c.pct, tloc(r,colPCT,t));
	if (i >= 2 && even(i))
		inc2(pcincr[i+1], pcincr[i], pccin[i+2], pccin[i+1], pc[i+1], pc[i],
             pccin[i], loc(r,colPCIncr));

	// result bus and write back
	ff(wb[i], res[i], c.wbce, _, loc(r,colWB));

	// register file, A and B operand buses
	ram(rfa[i], wb[i], c.rna, c.awe, loc(r, colRFA+odd(i)));
	m2(ma[i], rfa[i], res[i], c.forward, loc(r,colMA));
	ff(a[i], ma[i], c.ace, _, loc(r,colA));
	ram(rfb[i], wb[i], c.rnb, c.bwe, loc(r, colRFB+odd(i)));
	ff(dout[i], rfb[i], c.doutce, _, loc(r, colRFB+odd(i)));
	m2(mb[i], rfb[i], ir[i
	....
	tbuf(res[i], (i >= 4)     ? a[i-4] : rola[cbit-4+i], c.shl4t,
         tloc(r,colShiftT  ,t));
	tbuf(res[i], (i >= 2)     ? a[i-2] : rola[cbit-2+i], c.shl2t,
         tloc(r,colShiftT  ,t));
	tbuf(res[i], (i >= 1)     ? a[i-1] : rola[cbit-1+i], c.shl1t,
         tloc(r,colShiftT+1,t));
	tbuf(res[i], (i < cbit-1) ? a[i+1] : rora[1-cbit+i], c.shr1t,
         tloc(r,colShiftT+2,t));
	tbuf(res[i], (i < cbit-2) ? a[i+2] : rora[2-cbit+i], c.shr2t,
         tloc(r,colShiftT+3,t));
//	tbuf(res[i], (i < cbit-4) ? a[i+4] : rora[4-cbit+i], c.shr4t,
//         tloc(r,colShiftT+5,t));

	// data in
	tbuf(res[i], mem.d[i], (i < 8) ? c.dinbytet : (i < 16) ? c.dinhalft :
                                           c.dinwordt, tloc(r,colDinT,t));
	// zero/sign extension
	if (i < 8)
		;
	else if (i < 16)
		tbuf(res[i], c.halfsex, c.halfsext, tloc(r,colExtT,t));
	else
		tbuf(res[i], c.wordsex, c.wordsext, tloc(r,colExtT,t));

	// data out
	tbuf(res[i], dout[i], c.doutt, tloc(r,colDoutT,t));
	tbuf(mem.d[i], res[i], (i < 8) ? mem.doutbytet : (i < 16) ? mem.douthalft
                                   : mem.doutwordt, tloc(r,colDBusDoutT,t));
}

Things to note here.
* c.XXX are control signals from the "control unit" module
* mem.XXX are memory datapath signals from the "on-chip memory bus" module
* some elements are commented out -- try commenting out parts of your
schematic sometime.
* constraints -- the optional calls to 'loc()' and 'tloc()' apply placement
constraints, forcing elements to certain (row,col) positions; the various
colXXX are just enum constants, so it is easy to refloorplan things; note
no rloc()s yet!
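
The refloorplanning point can be sketched in a few lines.  This is a guess at the shape of such helpers, not the CNets code: the enum values, the two-bits-per-row rowForBit mapping, and the "CLB_R<row>C<col>" string format are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical column assignments.  Reordering or renumbering these
// enumerators moves whole datapath columns without touching the
// per-bit loop that uses them.
enum Col { colIR = 1, colPC, colMAR, colWB };

// Assumption for this sketch: two datapath bits share one row.
unsigned rowForBit(unsigned i) { return 1 + i / 2; }

// Format a (row, col) placement constraint as text.  The string
// format is an assumption, not taken from the post.
std::string loc(unsigned row, unsigned col) {
    return "CLB_R" + std::to_string(row) + "C" + std::to_string(col);
}
```

With these definitions, loc(rowForBit(5), colPC) yields "CLB_R3C2"; swapping colPC and colMAR in the enum would move every PC bit to the neighbouring column with no other edits.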

Copyright © 2000, Gray Research LLC. All rights reserved.
Last updated: Feb 03 2001