"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.

Search This Blog

Saturday, September 22, 2007

Erlinguistics: Counting with a Functional Accumulator

Tim Bray though Erlang weird, which it is, but he's pushing on with it, which is fun and educational, hopefully, for everyone following along.

Tail recursion is really a glorified goto statement, and after a while it starts to feel like an elaborate fake-recursive lie.
Once this style is adopted though, solutions tend use less code and become more tractable. At least code comes out more factored and there's less of it in one place. i.e. you know those long stretches of looping code that can last for a couple of screenfuls? You tend not to find those in code written in a functional style. Assignments don't change the state of a variable 17 lines into a 49-line iteration. So in a sense there is no "lying" in a recursive function, but sometimes some unwinding is needed to get a sense of what's happening.

On counting...

I thought of two ways to count things. Every Erlang process has a process dictionary, i.e. a hashtable or content-addressable store or whatever, with get() and put() calls. This is the kind of thing that the Ruby code above depends on; the square brackets are a dictionary lookup. The problem is, Joe Armstrong severely disses the use of the dictionary... [Tim's analogy elided]

There’s also a hashtable package called ets, of which Joe expresses less disapproval.

Another way to count things, which I suspect would make Erlinguists happier, is to spawn a process for each thing you need to count

Yes, using a process to maintain changeable state is generally a good thing. Think of each such process as a "little database" and a "little messaging protocol" for accessing that database. Counting and inspecting the count would be the degenerate example. This approach usually would be reserved for more complicated kinds of "little databases".

More preferable in this simple counting situation might be to include one or more "accumulators" in the recursive function. Instead of passing around a Counter process id, the actual "count" could be passed around. For example, the original "morse codes in erlang" uses an accumulator that is not a count, but it is an accumulator of a list of results. As the results are accumulated, those results are passed around until the recursion is terminated, and then those results are returned as, well, the results.

A simpler counter kind of accumulator could look like the code below, which is close enough to Tim's original example. I'm not sure what his process_match is doing, but in this case, handle_match is playing that part, and here simply printing information about the match. The count is maintained in the Count "accumulator" which is passed around with the recursion. The top-level function "seeds" the accumulator with 0.


-import(file, [open/2]).
-import(io, [format/2, get_line/2]).
-import(regexp, [match/2]).

count_matches(Filename, Pattern) ->
  {ok, In} = open(Filename, read),
  count_matches(In, get_line(In, ""), Pattern, 0).

count_matches(_In, eof, _Pattern, Count) ->
count_matches(In, Line, Pattern, Count) ->
  case match(Line, Pattern) of
    {match, Start, Length} ->
      handle_match(Line, Start, Length),
      count_matches(In, get_line(In, ""), Pattern, Count + 1);
    nomatch ->
      count_matches(In, get_line(In, ""), Pattern, Count)

handle_match(Line, Start, Length) ->
  format("Found a match in ~s from ~w for ~w~n", [Line, Start, Length]).
Taking this approach to its limits results in functions like lists:foldl and lists:foldr which "fold" some value over a list of elements, starting from the left or the right of the list, respectively. It's all a variation on "map/reduce", essentially. The above code could be made more general to "fold" over the lines in a file from top to bottom for some given functional argument.

1 comment:

Steve Jenson said...

I didn't think Tim was claiming that tail recursion in general felt like a lie just that his particular use of it didn't make things feel nicer. I'm not sure if he's used to programming in that style yet, it does take a while.

Blog Archive

About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.