fpgacpu.org - Regexps and Parsing in FPGAs

Regexps and Parsing in FPGAs

Home

Life in an FPGA >>
<< FPGAs as coprocessors

Usenet Postings
  By Subject
  By Date

FPGA CPUs
  Why FPGA CPUs?
  Homebuilt processors
  Altera, Xilinx Announce
  Soft cores
  Porting lcc
  32-bit RISC CPU
  Superscalar FPGA CPUs
  Java processors
  Forth processors
  Reimplementing Alto
  Transputers
  FPGA CPU Speeds
  Synthesized CPUs
  Register files
  Register files (2)
  Floating point
  Using block RAM
  Flex10K CPUs
  Flex10KE CPUs

Multiprocessors
  Multis and fast unis
  Inner loop datapaths
  Supercomputers

Systems-on-a-Chip
  SoC On-Chip Buses
  On-chip Memory
  VGA controller
  Small footprints

CNets
  CNets and Datapaths
  Generators vs. synthesis

FPGAs vs. Processors
  CPUs vs. FPGAs
  Emulating FPGAs
  FPGAs as coprocessors
  Regexps in FPGAs
  Life in an FPGA
  Maximum element

Miscellaneous
  Floorplanning
  Pushing on a rope
  Virtex speculation
  Rambus for FPGAs
  3-D rendering
  LFSR Design

Newsgroups: comp.arch.fpga
Subject: regular expression matching and parsing in FPGAs (was chess...)
Date: Fri, 24 Dec 1999 12:30:30 -0800

I'm not sure how regular expression matching and parsing relates to chess,
but these can of course be done at tremendous speeds in an FPGA.  Here are
some ideas.

REGULAR EXPRESSIONS

A regular expression (including compositions of simpler REs) can be
implemented in hardware through these transformations:

1. convert the RE to an NFA (non-deterministic finite-state automaton)
2. convert the NFA to a DFA (deterministic FA) using the subset construction
3. implement the DFA as a one-hot FSM (finite state machine).

Here each state is a flip-flop.  Transitions between states are
combinatorial expressions of the states and the input.  If the input is
encoded, e.g. as 8-bit characters, there would probably be additional logic
to decode characters ('.') or character classes ([0123456789]).

This could consume one character per clock at 100-200 MHz.

LR PARSING

A parser for a context-free grammar can be implemented in hardware through
these transformations.

1. derive LR-parser shift-reduce tables (actions, goto, etc.) from the CFG
productions.
2. implement the shift-reduce parsing algorithm approximately as:

  // types
  enum Token { term1, ..., nTerminals, nonterm1, ..., nTNT};
  enum State { s0, s1, ..., nStates };
  enum Prods { ..., nProds };
  enum Action { accept, error, shift, reduce1, reduce2, ... };

  // tables
  Action actions[nStates][nTNT];
  State goto[nStates][nTNT];
  Token lhs[nProds]; // token on lhs of production, e.g. X for X ::= Y1 ... Yn
  int rhs_len[nProds]; // length of rhs of production, e.g. n for above

  // shift-reduce parser
  State stack[]; // a stack of states
  int depth= 0; // index of top-of-stack
  State s = s0;
  stack[depth] = s;
  Token input = first token;
  repeat {
      Action a = actions[s][input];
      if (a == accept)
          stop, accept;
      else if (a == error)
          stop, error;
      else if (a == shift) {
          s = stack[++depth] = goto[s][input];
          input = next token;
      }
      else if (a == reduce p) {
          s = stack[depth -= rhs_len[p]];
          s = stack[++depth] = goto[s][lhs[p]];
      }
  }

3. implement *that* in an FPGA:

For example, for parsing C, from off the top of my head,
  nStates < 256,
  nTNT < 128,
  nProds < 100,
  Stack depth <128.

So implement 'input' in a 7-bit-wide FIFO; 'stack' in a 128x8 SRAM; 'lhs' in
a 100x7 SRAM; 'rhs_len' in a 100x4 SRAM, and 'actions' and 'goto' in
external SSRAM (for C, they won't fit in a few block rams).  'Stack' is
indexed by depth; 'actions' by s and input; 'goto' by s and mux(input,
lhs[p]).  Implement depth in an accumulator (a 7-bit register + an adder of
+1 or - prod_size[p]).

Perhaps you could clock this parser at 100 MHz.  A pity that it is not easy
to pipeline the action and goto lookups.

So here's a sketch of an FCCM to syntax check a preprocessed C program.
Characters come in at 200 MHz to the lexical analyzer, which uses the RE
recognizer described above to deposit tokens into a FIFO.  The C parser
drains the FIFO, consuming a token approximately every few cycles at 100
MHz.  Overall the machine gobbles C code at 200 MB/s, or about 5 million
lines per second, and then either the green Accept LED or the red Error LED
lights up. :-)

(Interesting gedanken experiment but I'd rather do it in software, thanks.)

(I would be surprised if this (shift-reduce parsing in hardware) has not
been studied already.  Also, I wonder if there are alternative
implementations of the LR parser algorithm that can better use 1-hot
encodings.  The problem is the current state gets pushed and popped, and a
stack of 1-hot states is not very storage efficient; thus you'd have to
encode them into a binary representation and decode them back to a 1-hot
encoding.)

(I suppose there might be real-world applications of work like this in areas
like text classification and DNA sequencing.)

Ref: see e.g. Compilers: Principles, Techniques, and Tools, Aho Sethi and
Ullman, or Crafting a Compiler, Fischer and LeBlanc.

Jan Gray