May 30, 2004

Java, open source, standards, and conformance

[Revised 5/30/04]
My colleague Simon Phipps has just posted a lengthy piece entitled On Java and Openness. I agree with him pretty much 100%.

A lot depends on whether you approach compatibility and interoperability from a "standards" position or not. The "standards" viewpoint is that there are going to be multiple implementations, that this is a good thing, and that you need to make sure that these implementations can interoperate. This was always the IETF approach: a proposed standard with only one implementation was viewed as problematic, because you couldn't distinguish between the intention of the standard and the accidents of implementation.

I started at Sun in 1985, and my first job was to do an implementation of NFS for MS-DOS from spec. All of the previous implementations (for various types of Unix as well as VMS) had been based on the source code, so I guess that I made Rusty Sandberg and the other spec-writers pretty nervous. But it worked. Later I worked with folks from FTP, Microsoft, and JSB to develop the Windows Sockets specification, and we recognized from the start that the conformance verification model was going to be key to acceptance.

I've always taken an object-oriented approach to standards: I believe that a standard is an object with one method, bool Conform(implementation i). If an implementation, i, conforms to the standard, I can use it. Of course this presupposes that the standard is adequately specified and the Conform() method is trustworthy, but that's why doing standards is hard work. It also explains why the Reference Implementation (RI) is such an important concept: if (or rather when) the language of the standard proves inadequate or ambiguous, there is an authoritative answer, namely that the standard is what the RI does. Critically, the RI is not supposed to be the only implementation; it should be optimized for clarity and conformance rather than performance, size, efficiency, etc.
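To make the "one method" idea concrete, here is a minimal sketch in Java. Everything in it is hypothetical and invented for illustration: the Implementation and Standard interfaces, the toy normalize() behavior, and the probe strings. It assumes conformance can be judged by comparing a candidate's observable behavior against the RI on a fixed set of inputs, which is far cruder than a real conformance suite.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: every type and method name here is invented for illustration.
final class ConformanceSketch {

    /** Something that claims to implement the standard (toy behavior: normalize a token). */
    interface Implementation {
        String normalize(String input);
    }

    /** The standard as an object with one method. */
    interface Standard {
        boolean conform(Implementation i);
    }

    /** Where the prose of the spec is ambiguous, the RI's behavior is taken as the answer. */
    static final class RiBackedStandard implements Standard {
        private final Implementation referenceImpl;
        private final List<String> probes;

        RiBackedStandard(Implementation referenceImpl, List<String> probes) {
            this.referenceImpl = referenceImpl;
            this.probes = probes;
        }

        @Override
        public boolean conform(Implementation i) {
            // A candidate conforms only if it matches the RI on every probe input.
            for (String probe : probes) {
                String expected = referenceImpl.normalize(probe);
                if (!expected.equals(i.normalize(probe))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** The RI: written for clarity rather than speed. */
    static final Implementation REFERENCE =
            input -> input.trim().toLowerCase(Locale.ROOT);

    /** An independent implementation with the same observable behavior on the probes. */
    static final Implementation INDEPENDENT = input -> {
        StringBuilder out = new StringBuilder();
        for (char c : input.trim().toCharArray()) {
            out.append(Character.toLowerCase(c));
        }
        return out.toString();
    };

    /** A non-conforming implementation: it forgets to trim whitespace. */
    static final Implementation SLOPPY = input -> input.toLowerCase(Locale.ROOT);

    static Standard defaultStandard() {
        return new RiBackedStandard(REFERENCE, Arrays.asList(" GET ", "Host", "x"));
    }

    public static void main(String[] args) {
        Standard standard = defaultStandard();
        System.out.println("independent conforms: " + standard.conform(INDEPENDENT));
        System.out.println("sloppy conforms: " + standard.conform(SLOPPY));
    }
}
```

The point of the sketch is that the caller only ever asks conform(); which implementation gets used afterwards can then be chosen on speed, size, or any other property the standard does not constrain.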

Everything that I've seen about the open source movement suggests that it is designed to encourage group participation on a single code base, rather than the creation of multiple independent implementations. Where's the 100% compatible clone of Perl, or Apache or Sendmail? In that respect (paradoxically?), open source leads to a monoculture, just as much as Windows does. When you actually have multiple implementations, you have to face the question of whether compatibility is important or not. For platform technologies - systems that other people rely on to make their software work - the answer seems to be yes. And despite the absolutists that Simon cites, there is no evidence whatsoever that a free market approach can sustain compatibility, except by preferring one choice and allowing the others to wither. To me, monoculture - even a free monoculture - seems dangerous. Diversity is good.

Posted by geoff2 at May 30, 2004 01:02 AM
Comments

> Everything that I've seen about the open source movement suggests that it is designed to encourage group participation on a single code base

Actually, it's the exact opposite - everyone wants to invent a 'better' wheel :)

o Desktop environments - GNOME, KDE and XFCE.
o Browsers / layout engines - KHTML (Konqueror, Safari), Gecko (Mozilla, Firefox, Epiphany ...)
o Widget toolkits - Qt, GTK+.
o Operating systems / kernels - *BSD, Linux.
o Source control - CVS, Subversion, Arch.

...and lots more.

However, the implementations tend to follow the specs. Layout engines support CSS / HTML. KDE and GNOME are working towards standards for 'look and feel' (menus, MIME types, etc.) so that, e.g., you can see the same desktop shortcuts in both DEs. The operating environments are POSIX(ish), and therefore you can get huge programs like the X server, GNOME and KDE up and running on all of them.

> Where's the 100% compatible clone of Perl, or Apache or Sendmail?

We need an MTA that supports SMTP and the associated RFCs, or a web server that follows HTTP / HTTPS and the associated RFCs. Apache and Sendmail are just implementations. Postfix, Exim and qmail are some alternatives to Sendmail. I would claim that Perl is not a general-purpose programming language; cloning it would be a wasted effort, and besides, no one is crazy enough to do it :)

The Apache web server has no "open source" equal. It does compete with IIS, though.

Would you still say that "open source" leads to monoculture?

This raises the question: is the expression (monoculture == bad) always true? A monoculture backed by predatory forces is always bad, but what of a monoculture that is the outcome of success based solely on *technical* merit?

I wholeheartedly agree that diversity is good, as long as it doesn't become a cacophony.

Posted by: Sahil at May 30, 2004 08:06 PM

You seem to be confirming my point. For platform technologies, like Qt, GTK+, GNOME, and KDE, multiple equals incompatible. The developer has to write to GNOME or KDE, or else stay with X - which gets back to one.

But with Java, the objective is to allow multiple compatible implementations to exist. You say that no one would clone Perl, "it would be a wasted effort". But why? Suppose someone came up with a brand new Perl implementation that was 50% faster but took 50% more memory? Wouldn't it be great to be able to choose between two implementations and pick the one best suited for the job at hand? The real reason that we can't is that Perl is whatever Larry says it is.

So yes, I stand by what I said.

Posted by: Geoff Arnold at May 30, 2004 10:52 PM

Python has "CPython" (the 'normal' Python that everyone thinks of); Jython, which is a Python interpreter running on Java; and IronPython, which is Python running on the .NET IL. All are different implementations of the same thing. For the "Pie-thon" competition, Python is being reimplemented on top of Parrot as well.

Posted by: Perry Lorier at May 31, 2004 03:04 AM

Python is an interesting test case. I read the Python FAQs, and I couldn't see any discussion of how the various flavours were being kept compatible. Are they truly independent implementations, or is the original Python the "master" and all the others "clones"?

Posted by: Geoff at May 31, 2004 08:10 AM

"the standard is what the RI does"

Even if the RI is buggy? Or has a fundamentally broken memory model? Doesn't implement floating point correctly? Cements bugs in fdlibm as standard? ...

Everyone has bugs, even RIs. If the standard is supposed to be what the RI does instead of the letter of the standard, then you end up with code that breaks with the slightest fix of the standard's ambiguities or the RI. And both usually evolve. ;)

I've seen enough Java code that assumes that JDK 1.3 had the class loading worked out, only to fall flat on its face on Kaffe by assuming that there is a null bootstrap ClassLoader or that the system ClassLoader is an instance of URLClassLoader. Just because it doesn't crash on the RI doesn't mean the code is portable Java.
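As a rough illustration of the two assumptions mentioned above, the following Java sketch inspects a class loader without relying on either of them. The class and method names are hypothetical; only the java.lang and java.net calls are real. It is a sketch of defensive style, not a statement about how any particular VM behaves.

```java
import java.net.URLClassLoader;

// Hypothetical helper for illustration; the class and method names are invented here.
final class ClassLoaderPortability {

    /** Describes a class loader without assuming how a particular VM represents it. */
    static String describe(ClassLoader loader) {
        if (loader == null) {
            // Class.getClassLoader() may return null for the bootstrap loader,
            // but an implementation is not required to represent it that way.
            return "bootstrap (reported as null)";
        }
        if (loader instanceof URLClassLoader) {
            // Safe: the cast happens only after the type check succeeds.
            int count = ((URLClassLoader) loader).getURLs().length;
            return "URLClassLoader with " + count + " URL(s)";
        }
        // Any other loader type is legitimate; fall back to its class name.
        return loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class.getClassLoader()));
        System.out.println(describe(ClassLoader.getSystemClassLoader()));

        // The non-portable variant, left as a comment because the cast can fail
        // with ClassCastException on VMs whose system loader is another type:
        // URLClassLoader ucl = (URLClassLoader) ClassLoader.getSystemClassLoader();
    }
}
```

The output of main() is deliberately not shown, since it differs from one VM to another, which is the point being made.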

For example, I've seen J2EE implementations that happily used reflection to access private fields of ClassLoader, so they wouldn't work anywhere else but on Sun's JDK. Is the RI correct in letting such code run at all? Does that mean the Java API is incorrect for not specifying such fields?

If the standard sucks in part, don't use the part. Plain and simple. Write your own, or use something that's more precisely defined. Programmers who blindly code to the RI just make the maintenance programmer's job unnecessarily hard.

Posted by: Dalibor Topic at June 1, 2004 02:49 AM

It's not clear that there is a real RI for the base JVM and JDK - perhaps my colleagues can comment on this. And the RI doesn't "allow" anything; it's designed to complement the specification.

There's nothing wrong with writing a J2EE implementation that only works on Sun's JDK as long as it still supports J2EE application components. If I remember correctly, EJBs are not permitted to mess with class loaders or threads. If the dependency to which you allude were to leak through and create a dependency at the application level, this would be a bug.

Posted by: Geoff Arnold at June 1, 2004 11:32 AM

Your answer to Sahil was confusing, and seemed to ignore the bulk of his points to concentrate on an unrelated one you really wanted to make. You said "The developer has to write to Gnome or KDE, or else stay with X - which gets back to one", implying that one problem they share is requiring a developer to write applications targeted to a particular environment. You seem to be saying this is one definition of a monoculture. But isn't it true, by this definition of a monoculture, that Java developers would be locking themselves into a Java monoculture? It doesn't matter from the developer's standpoint whether there are one or a dozen Java implementations. Viewed from the outside, all Java implementations would be expected to look and behave the same.

In your original article you say that diversity is good, but you seem to have a curious idea about what real diversity is. Having multiple implementations of the same thing is not diversity, just as a room full of clones would not represent a diverse population. The benefits of diversity come when users can choose from multiple products which offer real and meaningful differences, giving them opportunities to find one which meets their particular needs. The open source community provides diversity by opening up a wide range of projects for developers to contribute to and users to select from. Calling any particular project a monoculture is simply a red herring.

Platform technologies, both proprietary ones such as Microsoft Windows and open source ones such as Gnome and KDE, will always require developers to write to their respective interfaces. Writing applications for Java requires developers to adopt a particular language and the use of specific class libraries, which makes it seem like even more of a monoculture than other alternatives (according to your implied definition).

Finally, if compatibility is truly the number one priority, then having multiple implementations for the same product can only create problems. Development that is concentrated in a single code base which is portable across many platforms has none of the compatibility issues which are inevitable with creating and maintaining multiple independent implementations. In this respect, most open source projects easily avoid the compatibility problems in Java that Sun seems so quick to point out.

Posted by: Gary Shao at June 4, 2004 03:25 AM