Making it stick.: 9/16/07

Saturday, September 22, 2007

PDXFunc Wiki

PDXFunc, the Portland functional programmers group, has a wiki. Nothing there yet, but there it is, and it's a wiki.

Actually it seems to be password protected, so I could not edit the sidebar to include a link to the PDXFunc google group. Coming soon, I hope.

Biztalk Services In The Cloud, Or: Why Not Jingle All The Way

Jon Udell mentioned Biztalk's Internet Service Bus a while back. I'm just getting caught up with that.

Curiously, in this video they mention using WCF and Biztalk Services to implement a chat that can traverse firewalls, etc.

Have they considered not hurting the web and using IETF standards like XMPP? XMPP defines an HTTP binding for clients. And the Jingle (XEP-0166) proposal, which actually has some implementations, extends all *that* to accommodate higher-bandwidth, out-of-band protocols for VOIP, and includes firewall negotiation, etc. to figure out whether the peers need to go through a server.

Ah, but, yeah, we could just use this Microsoft code and go non-standard all the way. Maybe I'm missing something. Or maybe you're happy in your little Microsoft corner of the internets.

Early Diagnostics

Tim Bray and Cédric Beust each make good points about Erlang's error reporting. My first reaction was: apparently they've not used Haskell. But coming back around, I do agree that the messages can be confusing, especially to beginners, especially to beginners whose mindset is in the typical assignment-oriented, imperative C/C++/Java model of computing.

Pattern matching errors can result in a "bad match" message. When you thought you were doing assignment, and you forgot that you cannot "re-assign" a different value to a variable that is already bound, then having the system tell you "bad match" is going to cause some grief.
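
A minimal sketch of how that plays out (the module and function names are made up for illustration): "=" is a pattern match, so once X is bound to 1, matching it against any other value raises a badmatch error rather than rebinding X.

```erlang
-module(badmatch_demo).
-export([rebind/1]).

%% Hypothetical example: bind X, then try to "re-assign" it to New.
rebind(New) ->
  X = 1,
  try
    X = New,        %% a match against the bound X, not an assignment
    {ok, X}
  catch
    %% The error term carries the value that failed to match.
    error:{badmatch, Value} ->
      {bad_match, Value}
  end.
```

Calling rebind(1) succeeds because 1 matches the value X already has; any other argument ends up in the catch clause.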

Learning anything new, and especially "re-learning" something you thought you already knew, will take time. Granted, that effort may not always pay off. Maybe it's just not your cup of tea. On the other hand, what I learned about programming in Erlang is that a few error messages pop up with some regularity, and I am now fairly quick to identify the cause.

Good practices to aid diagnosing errors are:

Write a little code at a time.
Test-drive that code using an xUnit framework or just using the erl command line.
Use pattern matching and guards to narrow the acceptable parameters to a function.

That last one is key: catch the error as early as possible, and rely on the language itself to catch it for you. In Tim's example, the error message is correct: it's a bad argument. But the top-level function did not catch it. Rather, the recursion was fairly deep before some other function caught the problem.

That top-level function looks as though it should expect a list, and not an atom. So this would have helped greatly...


scan(Arg) when is_list(Arg) ->
  ...

Or maybe better...


scan([]) ->
  %% Empty list, do the default thing...
  ;
scan([Head | Tail]) ->
  %% Do something with Head then recursion on the Tail...
  ...
  scan(Tail).
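
To see the effect, here is a small runnable sketch. The module name and the summing body are stand-ins of my own, not Tim's code. The point is only that the guard makes the top-level call fail right away with function_clause, instead of failing somewhere deep in the recursion.

```erlang
-module(scan_demo).
-export([scan/1, try_scan/1]).

%% Top-level entry point: only lists get past the guard.
scan(Arg) when is_list(Arg) ->
  scan(Arg, 0).

%% Sum the integers in the list, carrying the total along.
scan([], Sum) ->
  Sum;
scan([Head | Tail], Sum) when is_integer(Head) ->
  scan(Tail, Sum + Head).

%% Helper that turns the early failure into a return value.
try_scan(Arg) ->
  try
    {ok, scan(Arg)}
  catch
    error:function_clause ->
      {error, not_a_list_of_integers}
  end.
```

Passing an atom such as foo never reaches the recursion at all; no clause of scan/1 accepts it.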

Cédric also says Erlang "feels old"... so do I sometimes, but I still have something left every now and then. :-D

Erlinguistics: Counting with a Functional Accumulator

Tim Bray thought Erlang weird, which it is, but he's pushing on with it, which is fun and educational, hopefully, for everyone following along.

Tail recursion is really a glorified goto statement, and after a while it starts to feel like an elaborate fake-recursive lie.

Once this style is adopted though, solutions tend to use less code and become more tractable. At least code comes out more factored and there's less of it in one place. You know those long stretches of looping code that can last for a couple of screenfuls? You tend not to find those in code written in a functional style. Assignments don't change the state of a variable 17 lines into a 49-line iteration. So in a sense there is no "lying" in a recursive function, but sometimes some unwinding is needed to get a sense of what's happening.

On counting...

I thought of two ways to count things. Every Erlang process has a process dictionary, i.e. a hashtable or content-addressable store or whatever, with get() and put() calls. This is the kind of thing that the Ruby code above depends on; the square brackets are a dictionary lookup. The problem is, Joe Armstrong severely disses the use of the dictionary... [Tim's analogy elided]
There’s also a hashtable package called ets, of which Joe expresses less disapproval.
Another way to count things, which I suspect would make Erlinguists happier, is to spawn a process for each thing you need to count

Yes, using a process to maintain changeable state is generally a good thing. Think of each such process as a "little database" and a "little messaging protocol" for accessing that database. Counting and inspecting the count would be the degenerate example. This approach usually would be reserved for more complicated kinds of "little databases".
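
As a rough sketch of that degenerate case (the module name and message shapes are my own invention, not anything from Tim's post), the "little database" is the Count argument of the loop and the "little messaging protocol" is three messages:

```erlang
-module(counter_demo).
-export([start/0, increment/1, value/1, stop/1]).

%% Spawn the counter process with an initial count of 0.
start() ->
  spawn(fun() -> loop(0) end).

%% Fire-and-forget: ask the counter to add one.
increment(Pid) ->
  Pid ! increment,
  ok.

%% Ask for the current count and wait for the reply.
value(Pid) ->
  Pid ! {value, self()},
  receive
    {Pid, Count} -> Count
  end.

stop(Pid) ->
  Pid ! stop,
  ok.

%% The state lives only in the argument of this loop.
loop(Count) ->
  receive
    increment ->
      loop(Count + 1);
    {value, From} ->
      From ! {self(), Count},
      loop(Count);
    stop ->
      ok
  end.
```

Messages from one process to another arrive in the order they were sent, so a value/1 call sees every increment the same caller sent before it.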

Preferable in this simple counting situation might be to include one or more "accumulators" in the recursive function. Instead of passing around a Counter process id, the actual "count" could be passed around. For example, the original "morse codes in erlang" uses an accumulator that is not a count but a list of results. As the results are accumulated, they are passed around until the recursion terminates, and then they are returned as, well, the results.

A simpler counter kind of accumulator could look like the code below, which is close enough to Tim's original example. I'm not sure what his process_match is doing, but in this case, handle_match is playing that part, and here simply printing information about the match. The count is maintained in the Count "accumulator" which is passed around with the recursion. The top-level function "seeds" the accumulator with 0.


-module(foo).
-export([count_matches/2]).

-import(file, [open/2, close/1]).
-import(io, [format/2, get_line/2]).
-import(regexp, [match/2]).

%% Open the file and seed the Count accumulator with 0.
count_matches(Filename, Pattern) ->
  {ok, In} = open(Filename, [read]),
  count_matches(In, get_line(In, ""), Pattern, 0).

%% At end of file, close it and return the accumulated count.
count_matches(In, eof, _Pattern, Count) ->
  close(In),
  Count;
%% Otherwise check the line and recurse with the (possibly bumped) count.
count_matches(In, Line, Pattern, Count) ->
  case match(Line, Pattern) of
    {match, Start, Length} ->
      handle_match(Line, Start, Length),
      count_matches(In, get_line(In, ""), Pattern, Count + 1);
    nomatch ->
      count_matches(In, get_line(In, ""), Pattern, Count)
  end.

%% Stand-in for Tim's process_match: just print where the match was found.
handle_match(Line, Start, Length) ->
  format("Found a match in ~s from ~w for ~w~n", [Line, Start, Length]).

Taking this approach to its limits results in functions like lists:foldl and lists:foldr which "fold" some value over a list of elements, starting from the left or the right of the list, respectively. It's all a variation on "map/reduce", essentially. The above code could be made more general to "fold" over the lines in a file from top to bottom for some given functional argument.
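
Here is one way that generalization might look, as a sketch rather than a finished library. The names line_fold and fold_lines are hypothetical, the argument order simply mirrors lists:foldl, and the counting function uses the re module instead of regexp.

```erlang
-module(line_fold).
-export([fold_lines/3, count_matches/2]).

%% Fold Fun over every line of the file, top to bottom,
%% starting from Acc0. The file is closed even if Fun fails.
fold_lines(Fun, Acc0, Filename) ->
  {ok, In} = file:open(Filename, [read]),
  try
    fold_lines(Fun, Acc0, In, io:get_line(In, ""))
  after
    file:close(In)
  end.

fold_lines(_Fun, Acc, _In, eof) ->
  Acc;
fold_lines(_Fun, _Acc, _In, {error, Reason}) ->
  error(Reason);
fold_lines(Fun, Acc, In, Line) ->
  fold_lines(Fun, Fun(Line, Acc), In, io:get_line(In, "")).

%% Counting matching lines is now just one particular fold.
count_matches(Filename, Pattern) ->
  fold_lines(
    fun(Line, Count) ->
      case re:run(Line, Pattern) of
        {match, _} -> Count + 1;
        nomatch -> Count
      end
    end,
    0,
    Filename).
```

The accumulator does not have to be a count. Passing a fun that conses each line onto a list, for instance, would collect the lines instead.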

Wednesday, September 19, 2007

Atom and RelaxNG

(Update: Links now work, thanks to Sjoerd Visscher.)

I'm fairly new to the details of Atom format and brand-spanking new to the details of RelaxNG. So here's what I did, and if you have any comments, then add them here or send an email to patrickdlogan at gmail dot com. I also did not find any explicit examples like this on the internets, so this may help someone else -- it's not complex, and that makes me feel good about atom.rnc and relaxng in general.

My objective was to use relaxng to validate an atom entry where the content is inline'd xml. The validation should be for the atom stuff per se, as well as for the specific xml content. A second objective was to apply the relaxng schema for that specific xml to that content separate from the atom stuff. So the xml content should stand alone or be inline'd in an entry and one schema definition should be usable for both cases.

I grabbed atom.rnc, the compact relaxng definition for atom. I grabbed the jing validator. Applying just this schema worked. Good. Good.

Then I spent some time looking at that schema and reading through various relaxng combination mechanisms. The combination takes place at the definition of atomInlineOtherContent in atom.rnc. This is where the inline'd xml content goes.

An interleaved combination did not work, as I found out, because that definition and the definition of my own xml both include "text", which causes relaxng trouble, although I don't fully understand why this should cause trouble if my schema is more specific.

The combination mechanism that did work is to redefine atomInlineOtherContent in my own schema...


namespace atom = "http://www.w3.org/2005/Atom"

# Usage:
# java -jar jing.jar -c atom_policy.rnc example_insurance.atom

# Note: this includes an unmodified atom.rnc grammar from the ietf
# standard. This grammar here redefines the inline content pattern to
# specify that such content should include appropriate attributes from
# the atom spec, e.g. type="application/xml", but the content itself
# should be a  tree of elements.

include "policy.rnc"

include "atom.rnc"
{
  atomInlineOtherContent =
     element atom:content {
        atomCommonAttributes,
        attribute type { atomMediaType }?,
        amitPolicy
     }
}

This looks just like the original definition except the actual content has to match the definition of amitPolicy as defined in policy.rnc. (I should probably redefine the attributes to require a specific type of "application/xml" or something, but this is close enough for now.)

The example entry with policy content validates using this more specific grammar. More goodness.

I met the second objective with just a little bit of cruft. I'd like to use policy.rnc to validate just some xml that starts with amitPolicy and has no atom elements at all.

I could not apply policy.rnc directly because I'd have to add a "start" to that grammar, and then the validator gets bogged down when that grammar is used with atom: there are two starting points, and they are not combined correctly.

Instead I wrote another small schema to define the starting point for a standalone policy...


include "policy.rnc"

# This currently exists to support validating policy xml content
# without being part of an atom entry. There may be another way to
# avoid having this separate little grammar.

start = amitPolicy

There may be another way to do this without defining another schema like this.

Tuesday, September 18, 2007

Erlang Interview

André Pang was interviewed at builder.au re: Erlang...

Our very first server was written in C++, but we realised that we had a problem if the server crashes. There might be 20 VFX houses connected to this server, and when it goes down, they all go down with it. So we looked at using an Erlang server, by writing custom plug-ins to ejabberd, which is a popular XMPP (Extensible Messaging and Presence Protocol) platform -- so we layered all of our code above that.
In the end we found that to be a good solution, in fact the only time I can remember it crashing is when we forgot to allocate swap space for the server and it ran out of memory. Due to the way you can set up ejabberd and Erlang in general is that it really suits a distributed system without a single point of failure. It makes it easy to set up different nodes which all replicate as each other, for instance, there were plenty of times when the primary server went down, but users were automatically connected to the secondary server instead. With a bit of smarts in the client it was possible to reconnect to a secondary server in the middle of the session and only lose a small amount of information in the process.
We had three or four servers around the world, and it was incredibly cheap to deploy, because with the Erlang runtime you don't need any special hardware or anything. It was very reliable and turned out to be a very good decision.

Sunday, September 16, 2007

AMQP, XMPP

Stefan Tilkov suggests...

Personally, I see AMQP on the one side and AtomPub/XMPP on the other side of a fence — AMQP internally, addressing the same problem domain as current, proprietary queueing solutions, and AtomPub and XMPP over the Internet.

I'd like to understand more about his reasoning.

What would be the drivers for this dichotomy? Why are two different messaging systems necessary? What would limit AMQP from being used on the internet scale? What would limit XMPP from being used on the intranet scale?

Without some reasoning behind this dichotomy, it just feels too much like the old "WS-* on the intranet, HTTP on the internet". I don't recall seeing a sufficient rationale for that.

Now it seems fairly clear that HTTP fits well on both scales, no?

So is HTTP sufficient for all kinds of "message transfer"? HTTP and XMPP? HTTP, XMPP, and AMQP? Or... how many kinds of messaging are enough?

As an aside, our viking friend suggests Apache can go beyond the traditional as well. I think we have a lot to learn still about how simple the enterprise can eventually be.

Emacs Tag Magic

Tim Bray on programming in C using Emacs...

I’ve been using Emacs, and I seem to recall that it has all sorts of navigation magic.

That would be the "tags" mechanism. Tags work with non-emacs tools as well, but emacs is especially good with them. See the manual.

Tag databases can be created for all kinds of languages.

Those IDEs still have *nothing* on emacs. :-D

Nuke-you-lair

On nuke-you-lair power...

How effective is long term storage of nuclear waste? Stewart's answer was typically provocative. As I recall it, he said something like this: "We don't know, but our framing of the question shows a failure of long term thinking. We've all been imagining that we have to solve the nuclear waste problem for all time to come. In fact, we only have to solve it for a few hundred years. Either by then technology will have advanced sufficiently that it will no longer be a problem, or we will have regressed so far that a few nuclear waste dumps in out of the way places will be the least of our worries."

In other words, we're screwed. Welcome to your post-apocalyptic future.
