Tuesday, September 25, 2007

Apparently Fast Erlang File Read and Regexp

Claes Wikstrom posted a link to his faster regexp library on the erlang-questions list. No report yet on speed...

...the only fast way today to process a large file line/by/line is to
  1. file:open(Filename, [read, raw])
  2. In a loop {ok, Bin} = file:read(Fd, BufSize)
  3. Use a binary regex matcher such as...

(I don't know the state of the regex lib in OTP today, last time I looked it sucked bigtime though)
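The three steps in the quote can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses the re module that ships with current OTP releases as the binary matcher (not posregex), it adds the binary option to file:open/2 so that file:read/2 returns binaries, and the module name chunk_grep and its function names are made up for the example.

```erlang
-module(chunk_grep).
-export([count_matches/2, count_matches/3]).

%% Hypothetical helper: count the lines of Filename that match Pattern,
%% reading the file in fixed-size chunks instead of line by line.
count_matches(Filename, Pattern) ->
    count_matches(Filename, Pattern, 65536).

count_matches(Filename, Pattern, BufSize) ->
    %% Assumption: OTP's re module stands in for the binary regex matcher.
    {ok, RE} = re:compile(Pattern),
    %% `binary` makes file:read/2 return binaries rather than lists.
    {ok, Fd} = file:open(Filename, [read, raw, binary]),
    try
        loop(Fd, BufSize, RE, <<>>, 0)
    after
        file:close(Fd)
    end.

%% Rest holds the partial line left over from the previous chunk.
%% A read error is not handled here and crashes with a case_clause.
loop(Fd, BufSize, RE, Rest, Count) ->
    case file:read(Fd, BufSize) of
        {ok, Bin} ->
            Data = <<Rest/binary, Bin/binary>>,
            Lines = binary:split(Data, <<"\n">>, [global]),
            {Complete, NewRest} = split_last(Lines),
            loop(Fd, BufSize, RE, NewRest, Count + count(Complete, RE));
        eof ->
            %% Whatever is left is a final line without a trailing newline.
            case Rest of
                <<>> -> Count;
                _ -> Count + count([Rest], RE)
            end
    end.

%% Separate the last (possibly incomplete) element from the complete lines.
split_last(Lines) ->
    [Last | RevComplete] = lists:reverse(Lines),
    {lists:reverse(RevComplete), Last}.

%% Number of lines in the list that match the compiled pattern.
count(Lines, RE) ->
    length([L || L <- Lines, re:run(L, RE, [{capture, none}]) =:= match]).
```

The only subtle part is carrying the trailing partial line from one chunk into the next, since a chunk boundary rarely falls on a newline.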


Here's an example of its use. Note it uses Erlang's binary representation (a sequential hunk of memory) of strings instead of its list-of-characters representation...
Erl Interface to posix regular expressions by klacke@emailaddress.xyz
LICENSE: BSD style, free, use,  molest and rewrite

To build, make and sudo make install

To use:

Compile your regexp.

4>  {ok, RE} = posregex:compile(<<"abc.*foo">>, [extended]).

Try to match something 

7> posregex:match(RE, <<"abc mre text here foo">>, []).

If it doesn't match 

9> posregex:match(RE, <<"abdc mre text here foo">>, []).

Try to match and find out where the match occurred

10> posregex:exec(RE, <<"abc mre text here foo">>, []).  

Free the memory occupied by the compiled regexp (or just let the owning
process exit, since RE is an Erlang port)

11> posregex:free(RE).
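For comparison, here is a rough sketch of the same compile, match, and locate steps using the re module bundled with current OTP releases. It is not posregex, and posregex's return values are not shown in the README above, so nothing here should be read as describing them. The module name re_demo is made up for the example. The first two lines also show the difference between the list and binary representations of a string.

```erlang
-module(re_demo).
-export([run/0]).

run() ->
    %% A double-quoted string is a list of character codes;
    %% a binary is one contiguous block of bytes.
    "abc" = [97, 98, 99],
    <<"abc">> = <<97, 98, 99>>,

    %% Compile once, then reuse the compiled pattern.
    {ok, RE} = re:compile(<<"abc.*foo">>),

    %% Match or no match, without capturing anything.
    match = re:run(<<"abc mre text here foo">>, RE, [{capture, none}]),
    nomatch = re:run(<<"abdc mre text here foo">>, RE, [{capture, none}]),

    %% With default options re:run/2 reports where the match occurred
    %% as {Offset, Length}: here the whole 21-byte subject.
    {match, [{0, 21}]} = re:run(<<"abc mre text here foo">>, RE),
    ok.
```

A pattern compiled by re is an ordinary Erlang term that is garbage collected, so there is no counterpart to the explicit free step.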

