"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.

Tuesday, September 25, 2007

Apparently Fast Erlang File Read and Regexp

Claes Wikstrom posted a link to his faster regexp library on the erlang-questions list. No report yet on speed...

...the only fast way today to process a large file line/by/line is to
  1. file:open(Filename, [read, raw])
  2. In a loop {ok, Bin} = file:read(Fd, BufSize)
  3. Use a binary regex matcher such as...
http://yaws.hyber.org/download/posregex-1.0.tgz

(I don't know the state of the regex lib in OTP today, last time I looked it sucked bigtime though)

/klacke
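
For what it's worth, here is a rough sketch of the loop those three steps describe. It is my guess at the shape, not klacke's code. The module and function names (chunk_lines, count_matching) are made up for illustration, and I use the re module that ships with later OTP releases as a stand-in for the matcher, since posregex has to be built separately. I also add the binary option to file:open, on the assumption that the reads should come back as binaries rather than lists.

```erlang
-module(chunk_lines).
-export([count_matching/3]).

%% Hypothetical helper: count the lines of Filename that match Pattern,
%% reading the file BufSize bytes at a time.
count_matching(Filename, Pattern, BufSize) ->
    {ok, RE} = re:compile(Pattern),
    %% 'binary' makes file:read/2 return binaries instead of lists.
    {ok, Fd} = file:open(Filename, [read, raw, binary]),
    try
        loop(Fd, RE, BufSize, <<>>, 0)
    after
        file:close(Fd)
    end.

loop(Fd, RE, BufSize, Rest, Count) ->
    case file:read(Fd, BufSize) of
        {ok, Bin} ->
            %% Prepend the partial line left over from the previous chunk,
            %% then split on newlines. The last piece may be incomplete,
            %% so it is carried over to the next read.
            Data = <<Rest/binary, Bin/binary>>,
            Lines = binary:split(Data, <<"\n">>, [global]),
            {Complete, [Partial]} = lists:split(length(Lines) - 1, Lines),
            loop(Fd, RE, BufSize, Partial, Count + count(Complete, RE));
        eof ->
            %% Whatever remains is a final line with no trailing newline.
            Count + count([Rest || Rest =/= <<>>], RE)
    end.

%% Number of lines in Lines that match the compiled expression RE.
count(Lines, RE) ->
    length([L || L <- Lines, re:run(L, RE, [{capture, none}]) =:= match]).
```

The only subtle part is the chunk boundary: a read of BufSize bytes will usually stop in the middle of a line, so the tail of each chunk is held back and glued onto the front of the next one. A read error simply crashes the caller here, which is fine for a sketch.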

Here's an example of posregex in use, taken from its README. Note that it works on Erlang's binary representation of strings (a contiguous chunk of memory) rather than the list-of-characters representation...
Erl Interface to posix regular expressions by klacke@emailaddress.xyz
LICENSE: BSD style, free, use,  molest and rewrite

To build, make and sudo make install

To use:

1. Compile your regexp.

4>  {ok, RE} = posregex:compile(<<"abc.*foo">>, [extended]).
{ok,#Port<0.101>}

Try to match something 

7> posregex:match(RE, <<"abc mre text here foo">>, []).
ok

If it doesn't match 

9> posregex:match(RE, <<"abdc mre text here foo">>, []).
{error,nomatch}

Try to match and find out where the match occurred

10> posregex:exec(RE, <<"abc mre text here foo">>, []).  
{ok,[{0,21}]}

Free memory occupied by the compilation (or exit process since
RE is an erlang port)

11> posregex:free(RE).
ok 

About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.