Wednesday, February 09, 2011

Version or aversion from version

For the past 5 years I have been sniffing around Maven and its repositories a lot (Proximity was initially released at the end of 2005, yay!). And almost always I found myself facing the following problem, or at least one closely related to it: version sorting.

It is a question that looks so simple and logical, yet it is still not resolved today, after 6 years (at least that's what user requests show). So, what is it about?

I'll use the term "artifact" in its broadest sense: it's just a blob of bytes in a Maven repository, which might be a library in a JAR, a self-executable app packed as a JAR, a full-blown application packed as a ZIP... you know, an artifact. Naturally, the complete Maven coordinates are GAV (G = groupId, A = artifactId, V = version), but let's stick with the version only for now.

People tend to think that the "version" of an artifact is just like any other sequence of numbers: you know where it starts, and which one comes next or came before. Well, that's not always true. Actually, it's almost never true.

Latest as what?

And here comes the notion of "LATEST" too. It was a concept in Maven2 that proved wrong. What was wrong with it? Well, exactly its nature: "latest" as the latest what? Latest released? Biggest version number?

And finally, the worst problem: people tend to confuse the version (let's call it by its FQN, "Maven artifact version") with the marketing version. Or worse: they tend to do all sorts of "magic" with eloquent versions like "1.0-alpha-1" or "1.0-RC". This is the point where people introduce those strange, and usually very funny, versions that obey no "standard" convention, dooming any software that tries its best to sort versions to fail miserably.

One important thing to know, before you choose your version of a release artifact is that Maven repositories are eternal. Once released, your artifact will stay there forever. Later, when you realize your mistake, or that your "alpha" is actually "pre-alpha", you can't do anything, except violate the eternity contract. And that's wrong to do.

Or, change your way of thinking about Maven version.

Artifact version should not be used for "marketing versioning". Full stop.

No to marketing versions

I'll go further, and state that the version could be just a number N that increases on every release. So, a sequence of 1, 2, 3, ... etc., similar to Mercurial's "local revision" or Subversion's "revision" numbers. You cannot "engineer" and tag your release with some "cool" revision number in Subversion (not that you would want that, this is just for example's sake); it forces you to simply "pick the next one".

Yes, the above will not work if your project uses branches (and projects usually do). So, let's say then that the version is N.M... and so on, and finally we arrive at the general contract of X.Y.Z, used across many Java and Maven projects but also non-Java ones (think Linux kernel versioning).

In general, you should NOT go beyond X.Y.Z form. Personally, I'd change Maven to enforce this version form, and just fail the build if any other form of version is found in POM. Done.
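To make the "fail the build on any other form" idea concrete, here is a minimal sketch in Python. It is not something Maven does today; the pattern and the names `STRICT_VERSION`, `is_strict_version` and `parse_strict` are assumptions made up for this illustration. It accepts only purely numeric X, X.Y or X.Y.Z versions and rejects anything carrying a qualifier.

```python
import re

# Hypothetical strict pattern: one to three dot-separated non-negative
# integers (X, X.Y or X.Y.Z) and nothing else. This is an assumption for
# illustration only, not a rule that Maven enforces.
STRICT_VERSION = re.compile(r"(0|[1-9]\d*)(\.(0|[1-9]\d*)){0,2}")


def is_strict_version(version: str) -> bool:
    """Return True only for purely numeric X, X.Y or X.Y.Z versions."""
    return STRICT_VERSION.fullmatch(version) is not None


def parse_strict(version: str) -> tuple:
    """Parse a strict version into a tuple of three ints (missing parts = 0)."""
    if not is_strict_version(version):
        raise ValueError(f"not an X.Y.Z version: {version!r}")
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))
```

With such a check in place, "1.1.1" passes while "1.0-alpha-1" or "1.0-RC" would be rejected, and sorting by `parse_strict` compares numbers rather than strings (so "1.9.0" comes before "1.10.0").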

But still, with branches, your artifact might produce the following (simple!) "timeline" of releases on the time axis:

  • 1.0 (initial cut, from trunk)
  • 1.1 (from trunk, and then branched as 1.x, since you intend a major rewrite)
  • 2.0 (from trunk)
  • 1.1.1 (from branch 1.x, a bugfix release)
  • etc

So, which one of these is "latest"? Before you say "it's 2.0", think again. Last released is "1.1.1". Greatest version is "2.0". Okay. But I believe you noticed the (at least) two different semantics for "latest". Here you go, and you can easily add new meaning to LATEST too.
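The two meanings can be put side by side in a few lines of Python. This is only a sketch over the timeline above; the helper `version_key` is a hypothetical name and assumes purely numeric, dot-separated versions.

```python
# Releases in the order they were published (the timeline from the list above).
releases_in_time_order = ["1.0", "1.1", "2.0", "1.1.1"]


def version_key(version):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# "Latest" as in: the release that was published last.
last_released = releases_in_time_order[-1]

# "Latest" as in: the release with the greatest version number.
greatest = max(releases_in_time_order, key=version_key)

print(last_released, greatest)
```

Both answers are defensible, and they differ: the first is "1.1.1", the second is "2.0".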

Not to forget the different scenarios: the sorting has to be "stable", since either you have a long-running repository that you deploy to from time to time, or you are doing a restore from a backup after a system failure and, let's say, you have the artifacts only and intend to restore the repository metadata using some tools... you'd expect the same ordering in the metadata, right?
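One way to read that requirement: the ordering must depend only on the version strings themselves, never on the order in which they were deployed or discovered. A minimal sketch, assuming purely numeric versions (the function name `rebuild_version_list` is hypothetical and does not refer to any existing tool):

```python
def version_key(version):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def rebuild_version_list(found_versions):
    """Order versions deterministically, whatever order they were found in.

    The raw string is used as a tie-breaker so that two spellings with the
    same numeric value (e.g. '1.0' and '01.0') still get a fixed order.
    """
    return sorted(set(found_versions), key=lambda v: (version_key(v), v))


# Same artifacts, discovered in two different orders (e.g. after a restore).
deployed_over_time = ["1.0", "1.1", "2.0", "1.1.1"]
found_on_disk = ["2.0", "1.1.1", "1.0", "1.1"]

print(rebuild_version_list(deployed_over_time))
print(rebuild_version_list(found_on_disk))
```

Both calls give the same list. Note what cannot be rebuilt this way: the "last released" meaning of latest needs the release order, which the version strings alone do not carry.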

Do you "compare" groupIds (let's put aside the usual processing of them, like sorting those alphabetically)? Or artifactIds? Do they have metrics? Are they comparable? Not in this sense. So why would version have these properties? Just take a peek at examples below:

A good example of broken versions is the one below. Your sorting algorithm would have to know what "pre" means! Until some "final" 4.0 release, the "pre-alpha" will always be taken as the latest in the "sorted versions"!

http://repo2.maven.org/maven2/org/sonatype/flexmojos/flexmojos-parent/
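To see why a qualifier like "pre-alpha" trips up sorting, consider this sketch. The version strings are made up for illustration and are not the actual listing behind the link above; `naive_key`, `aware_key` and `KNOWN_RANK` are hypothetical names, and neither key is Maven's real comparison algorithm.

```python
def naive_key(version):
    """Numeric part compared as ints, qualifier compared as a plain string."""
    numbers, _, qualifier = version.partition("-")
    return tuple(int(p) for p in numbers.split(".")), qualifier


# Hypothetical ranking table: the comparator must be told what "pre" means.
KNOWN_RANK = {"pre-alpha": 0, "alpha": 1, "beta": 2}


def aware_key(version):
    """Like naive_key, but ranks qualifiers by the table (KeyError if unknown)."""
    numbers, _, qualifier = version.partition("-")
    return tuple(int(p) for p in numbers.split(".")), KNOWN_RANK[qualifier]


# Made-up versions, for illustration only.
versions = ["4.0-alpha", "4.0-pre-alpha", "4.0-beta"]

naive_sorted = sorted(versions, key=naive_key)
aware_sorted = sorted(versions, key=aware_key)

print(naive_sorted[-1])  # what the naive sort considers "latest"
print(aware_sorted[-1])  # what a human would consider "latest"
```

The naive sort ends with "4.0-pre-alpha", simply because "p" comes after "a" and "b". The second key gets it right only because somebody taught it the author's private vocabulary, and it has no answer at all for the next invented qualifier.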

As a great "source" of existing versions in the Maven Central Repository, I always go to the file below, which I gathered once:

https://github.com/cstamas/nexus-ruby-support/blob/master/versions-20100319.txt

Message in the bottle

Obviously, the most frequent reason for eloquent versions is that the developers want to add some meaning to them, usually for marketing reasons. And that's wrong. Introduce a "marketing version" somewhere else, use it on your site and explain the reasons behind it, but the Maven version should not be used for marketing versioning. Otherwise, stop expecting "proper" sorting behavior from those versions. Use your site for "explaining"; the artifact version is just a "pointer" to the proper place on your site.

In this light, I'd say you could release always! You could simply consider every successful CI build (or every 5th, or whatever) a release (and THAT would be agile)!

I mean, no project out there (let's consider only the "better" Maven3 world for a moment, as compared to the Maven2 world) will pick up a new release "by mistake". All the versions are locked down in your POM, right? Plugins, dependencies, everything.

A bit of a digression here: yes, the version ranges. Well, I look at them as a bit of a mistake, a mistake in how they are implemented in Maven. Having ranges at runtime (a la OSGi) -- sourced from deployed POMs -- is fine, but at build time they are totally wrong in my opinion. To be more precise, ranges skew the "factual truth about the build". Their purpose is more meaningful at runtime (when someone consumes your artifact and wants more freedom to calculate runtime dependencies), but when you built your artifact, you built it against one single fixed version! With a range in the version tag, the information "against which exact version was this artifact built" is lost.
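A small sketch of how that information gets lost. It models a half-open range such as [1.0,2.0) and assumes, as a simplification, that the greatest matching version is picked; `resolve_range` is a hypothetical helper, not Maven's actual resolver.

```python
def version_key(version):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_range(lower, upper, available):
    """Pick the greatest available version v with lower <= v < upper.

    Simplified model of a range like [lower,upper); an assumption for
    illustration, not Maven's real resolution logic.
    """
    low, high = version_key(lower), version_key(upper)
    candidates = [v for v in available if low <= version_key(v) < high]
    return max(candidates, key=version_key)


# The same declared range, resolved against the repository at two moments.
repo_at_build_time = ["1.0", "1.1"]
repo_some_time_later = ["1.0", "1.1", "2.0", "1.1.1"]

built_against = resolve_range("1.0", "2.0", repo_at_build_time)
resolved_later = resolve_range("1.0", "2.0", repo_some_time_later)

print(built_against, resolved_later)
```

The declaration is identical in both cases, yet it resolves to "1.1" at build time and to "1.1.1" later. Reading the range alone, nobody can tell which one the artifact was actually built against.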

I'd rather have some solution (like introducing new "runtimeVersion" in dependencies tag, used by consumers of deployed POMs) to state my "compatibility" against a range of versions, but have my POM properly describing my build.

Version is just a pointer: 0xdeadbeef

Yes, Maven versions can have "holes"; they need not strictly follow each other. A Maven version is primarily just an element in the artifact coordinate triplet, and only after that is it meant to carry some "light" (pointing?) information that people can comprehend just by "looking at it". Not the other way around!

Change the way of thinking about version, and you will save yourselves from a lot of grief.

2 comments:

vi said...

Good post about maven versioning. I agree and propose an extension to the theory. My main problem is that Sun Microsystems (rest in peace) specified versioning in packaging long ago (at least in java se 1.3 in 1998), but it has not been applied widely. Actually they realized that software has at least TWO version numbers. One is the specification version, which has a strict and easily understandable format: major.minor.micro. It parallels the maven GAV (or inversely :-) in that

"A specification is identified by the:

Owner of the specification
Name of the Specification
Version number - major.minor.micro
"
But it also has an implementation version with another GAV:

"Implementation-Vendor:
Implementation-Title:
Implementation-Version:
"
I believe the main reason for the version handling problem in maven is that maven has only ONE version per artifact. This single version is used both for the specification you build against and for the implementation you need at runtime. Actually this separation could also be accomplished by packaging the api and the impl separately, as you mentioned. In this case the api package has no problem with its version number, as it has de facto only ONE version number. The impl package can be used as a runtime dependency, but its version number has no connection to the api package, as it lacks the specification version information from the maven versioning point of view.
What do you think about it?

vi said...

Hi Tamas,
Good post about maven versioning. I agree and will try to extend the theory. I think the main reason for the problem is that maven uses only ONE version number! Why? Long, long ago Sun Microsystems (rest in peace) realized that a software package has at least two version numbers (not counting the marketing version number :-). They specified the packaging version numbers as follows:
"
Specification-Vendor:
Specification-Title:
Specification-Version:
Implementation-Vendor:
Implementation-Title:
Implementation-Version:
"
What a coincidence with the maven GAV? But it separates the specification (meaning the API) from the implementation. In maven these two version numbers are compressed into THE artifact version. As a consequence it is used both for specifying the contract against which you build your artifact and for the dependency required at runtime. This separation can be accomplished in another way, as you mentioned in "SLF4J Logging + Maven". You create two packages (two maven projects), one for the API and one for the implementation. In the case of the api project there is no problem, as it has de facto only one version. The impl project has the implementation version but has NO connection at the version level with the api package it implements.
The specification is very strict for the specification version number to ensure it is sortable.
So there is no need to reinvent the wheel, but adapt the J2SE infrastructure.
What do you think about it?