[M3devel] code submission

Jay jayk123 at hotmail.com
Mon Feb 18 20:50:20 CET 2008


aka "prefix tree"
if I want to recognize the strings, foo, food, fudge, and bar
look at the first character
  if it is 'b', go ahead and compare to bar (or the tail to "ar")
  if it is 'f'
   look at the second character
     if it is 'u', go ahead and compare to "fudge" (or the tail to "dge")
     if it 'o'
        compare the third character
        and so on
 
Whether or not the data "stops" at unique prefixes is an implementation detail.
If it does, the leaves can contain the entire string or just the tail.
I've fiddled with generating large nested switch statements from a list of strings.
I noticed there are many optimizations you can make.
Of course, first you check the length.
Then, really, instead of progressing from left to right, you can pick out characters with a higher switching factor.
For example if I have the strings foof and food, just look at the last character.
 
As well a hash based algorithm is probably best.
Precompute the hashes of all the strings.
Sor them.
Hash your input string.
Binary search for the hash.
And the compare for an exact match.
 
If you are going to do the final compare anyway, then you are going to touch the whole string anyway, you might as well hash it and do a binary search I think. Make sure the hash function is fast of course, and not "perfect".
 
 - Jay
 



> Date: Mon, 18 Feb 2008 20:35:36 +0100> From: wagner at elegosoft.com> To: mika at async.caltech.edu> CC: m3devel at elegosoft.com> Subject: Re: [M3devel] code submission> > Quoting Mika Nystrom <mika at async.caltech.edu>:> > > Hello everyone,> >> > I was looking at some code I've written and realized that I have a> > very small piece of tested code that may be of interest to other> > Modula-3 users. It's a generic "trie" that I coded a few years> > ago. Does anyone have an opinion on adding it to cm3? Where, if so?> >> > http://www.async.caltech.edu/~mika/trie/> >> > It's useful for parsing; if you're parsing a language with longish> > keywords it is many times faster to use this structure and ARRAY> > OF CHAR than using the standard hash tables with TEXTs. (I developed> > it to parse an old "punch-card", i.e., 80-character fixed-width> > records, format for financial data.)> > Of course we could add this as a small package to m3-libs, or> even to libm3 (it looks rather small).> > I hate to sound so uneducated, but what exactly does `trie' stand for?> It's not contained in my favourite online dictionary either.> > Olaf> > > > -- > Olaf Wagner -- elego Software Solutions GmbH> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194> 
_________________________________________________________________
Connect and share in new ways with Windows Live.
http://www.windowslive.com/share.html?ocid=TXT_TAGHM_Wave2_sharelife_012008
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20080218/0e0e8ce0/attachment-0002.html>


More information about the M3devel mailing list