...our main J2EE app at work with 9-second pauses. These would happen on average every 50 seconds. Needless to say this was a... problem.
That's quite a canary. Let's name it Lazarus for the time being.
In the end we consulted some engineers at Sun who... gave us the following piece of black magic:
java ... -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewSize=1200m -XX:SurvivorRatio=16
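The flags arrived without explanation. One part can at least be worked out: how `-XX:SurvivorRatio=16` divides the 1200 MB young generation. In HotSpot, SurvivorRatio is the ratio of eden to *one* survivor space, and there are two survivor spaces. So young = eden + 2 × survivor = (R + 2) × survivor, and each survivor gets young / (R + 2). The sketch below assumes the young generation actually stays at `NewSize` (the command line above does not show `MaxNewSize`), and it ignores HotSpot's alignment rounding. The class and method names are hypothetical, made up for illustration.

```java
// Hypothetical helper: estimates how HotSpot splits a young generation of a
// given size under -XX:SurvivorRatio=R. Assumes the young generation stays at
// its NewSize and ignores HotSpot's alignment rounding.
public final class YoungGenSplit {
    private YoungGenSplit() {}

    /** Size of ONE survivor space: young = eden + 2 survivors, eden = R * survivor. */
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden is SurvivorRatio times the size of a single survivor space. */
    static double edenMb(double youngMb, int survivorRatio) {
        return survivorRatio * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        double young = 1200.0; // from -XX:NewSize=1200m
        int ratio = 16;        // from -XX:SurvivorRatio=16
        System.out.printf("eden     ~ %.1f MB%n", edenMb(young, ratio));
        System.out.printf("survivor ~ %.1f MB each (two of them)%n", survivorMb(young, ratio));
    }
}
```

Running it prints roughly 1066.7 MB for eden and 66.7 MB for each survivor space. Under these assumptions, nearly all of the young generation goes to eden.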
I'd bet the object story is even more interesting than the GC story.
As coincidence would have it, Greg is in town for OSCON, and we ended up having coffee and then dinner with a group staying in town an extra day. There *is* a good object story behind the GC story: they are doing a *lot* in that VM with actual data and several kinds of caching. There are several ways they could slice up the objects into more VMs; in particular, partitioning along data domains is natural for their application. Moving things into more VMs is something they've considered, but it could take some effort to work around their current use of EJB.