Not only is Steve Vinoski blogging again, he's blogging about Erlang. Not only is he blogging about Erlang, he's written some code like Tim Bray's but parallelized it on two-core and eight-core machines, with ease, as a relative newbie to Erlang...
Reading between the lines, it seems that Tim was hoping to take advantage of Erlang’s concurrency to put his multicore machines to work analyzing his logs...Very cool. There are still benefits to be gained from improving Erlang's I/O and regexp libraries for the sequential aspects of Tim's work. But this shows the real value of Erlang (and of Erlang-like capabilities, if they show up in other language systems) for the increasingly multi-core, multi-node world. I decided to take a crack at it myself...
The way this solution works is that it uses multiple Erlang processes to convert chunks of the input file to lists of strings and process them for matches...
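To make that description concrete, here is a minimal sketch of the general pattern. It is not Steve's actual code. The module name `chunk_count` and all function names are made up for illustration. It assumes the input is already in memory as a binary and uses the `binary` module from current Erlang/OTP releases. The parent cuts the input into roughly N chunks at newline boundaries, spawns one process per chunk, and each process splits its chunk into lines, counts the lines containing a pattern, and sends the count back.

```erlang
-module(chunk_count).
-export([count/3]).

%% Count the lines of Bin that contain Pattern (a non-empty binary),
%% using roughly N worker processes. Hypothetical API, for illustration only.
count(Bin, Pattern, N) when is_binary(Bin), is_binary(Pattern), N > 0 ->
    Size = max(1, byte_size(Bin) div N),
    Parent = self(),
    %% Spawn one worker per chunk; each reply is tagged with a unique reference.
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, count_lines(Chunk, Pattern)} end),
                Ref
            end || Chunk <- split_chunks(Bin, Size)],
    %% Collect one reply per worker and add up the counts.
    lists:sum([receive {Ref, Count} -> Count end || Ref <- Refs]).

%% Runs inside a worker: split the chunk into lines and count the matches.
count_lines(Chunk, Pattern) ->
    Lines = binary:split(Chunk, <<"\n">>, [global]),
    length([L || L <- Lines, binary:match(L, Pattern) =/= nomatch]).

%% Cut Bin into pieces of at least Size bytes, each ending just after a
%% newline, so that no line is split across two workers.
split_chunks(Bin, Size) when byte_size(Bin) =< Size ->
    [Bin];
split_chunks(Bin, Size) ->
    case binary:match(Bin, <<"\n">>, [{scope, {Size, byte_size(Bin) - Size}}]) of
        nomatch ->
            [Bin];
        {Pos, 1} ->
            {Chunk, Rest} = split_binary(Bin, Pos + 1),
            [Chunk | split_chunks(Rest, Size)]
    end.
```

For example, `chunk_count:count(<<"GET /a\nPOST /b\nGET /c\n">>, <<"GET">>, 2)` returns `2`. A real log analyzer would read the file (for instance with `file:read_file/1`) and match Tim's actual pattern rather than a plain substring; the sketch only shows the spawn, work, and message-back structure.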
The best I got on my MacBook Pro after numerous runs was 0.301 seconds with 2400 processes, but the average best seems to be about 0.318 seconds. The performance of this approach comes pretty close to other solutions that rely on external non-Erlang assistance, at least for Tim’s sample dataset on this machine.
I also tried it on an 8-core (2 Intel Xeon E5345 CPUs) 64-bit Dell box running Linux, and it clocked in at 0.126 seconds with 2400 processes, and I saw a 0.124 seconds with 1200 processes. I believe this utilization of multiple cores was exactly what Tim was looking for.
If you’re a Java or C++ programmer, note the ease with which we can spawn Erlang processes and have them communicate, and note how quickly we can launch thousands of processes. This is what Tim was after, I believe, so hopefully my example provides food for thought in that area. BTW, I’m no Erlang expert, so if anyone wants to suggest improvements to what I’ve written, please feel free to comment here.