Friday, January 06, 2012

The flawed Snapshots

Recently I keep hearing people loudly complain that Maven snapshots are bad (whatever "bad" means): "We don't want snapshot dependencies in our build!", "Snapshots introduce instabilities", yada yada yada. True, if you don't understand how they work and, as a likely consequence, misuse them.

Before I start this discussion, I'd like to state the assumptions and terms I use, to avoid any misunderstanding. I assume Maven (the latest stable version, not some ancient one) is used for development. I also assume CI is in place, and that the CI jobs also perform the deploys to the MRM (Maven Repository Manager); no manual deploys happen. Finally, as you guessed, I assume an MRM is in place too, hosting your releases and CI-built snapshots (and doing some other handy things like proxying, but that's irrelevant for now).

Snapshots in general

In general, snapshots are strange beasts indeed. If you imagine your company and the projects running within it, they could be visualized more or less as concentric circles. Some of the projects might intersect (remember Venn diagrams from grade school?). Rarely, a project "circle" might extend outside the company too.

Usually, you want to avoid using snapshots when the "edge" (connecting your project in one circle with the snapshot artifact's project in another circle) crosses a circle boundary. That is when they might introduce instability.

Rule of thumb: if you are not governing the life of a snapshot, you should not touch it (with some exceptions; as we know, it's not all black or white). "Governing" means "it's your project, you are changing it, hence your activity triggers CI to rebuild it". But "governing" might also mean "you periodically sync the sources of a foreign project into your SCM, possibly apply patches, and again, by doing this you implicitly trigger the CI builds".

The latter is one of the best practices when you do need a snapshot (cutting edge) of a library but have no direct influence over it. It is the only possible way if you need a patched cutting edge, or if your patches are "pending", not yet applied to the project sources (by the foreign entity, since we are talking about a foreign project). This works only if you are allowed (and able) to access the sources: the project in question permits it, is OSS, or allows it for some other reason or by agreement. In this case, you would usually NOT use the SCM change trigger on the job (it depends on the SCM; you could use the trigger if you maintain a forked Git repository, for example). Rather, you'd trigger the job manually after syncing the latest changes, directly controlling when the snapshot is built and what changes it contains. You govern it.

In every other case, all you should do is avoid the snapshot and use a release.

Naturally, this is not always possible, so you are left with the "freeze the snapshot" solution: download the snapshot binary, rename it (making its Maven coordinates non-snapshot, but adding some distinguishing mark for future reference, such as the SVN revision number, part of the Git commit hash, or at least a date) and upload it to your MRM. You can "freeze" another (newer) snapshot later, whenever needed.
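The renaming step can be sketched in Java as below. This is a minimal, hypothetical helper (the class name `SnapshotFreezer` and the "base version, dash, mark" naming scheme are assumptions, not a Maven convention); it only computes the frozen version string. It rejects marks that would leave the version ending in "SNAPSHOT", since Maven treats such versions as snapshots.

```java
import java.util.Locale;

// Hypothetical helper for the "freeze the snapshot" renaming step.
// It only computes the new, non-snapshot version string; downloading the
// snapshot binary and uploading the renamed artifact are not shown.
final class SnapshotFreezer {
    private static final String SNAPSHOT_SUFFIX = "-SNAPSHOT";

    private SnapshotFreezer() {
    }

    /**
     * Turns e.g. "2.1-SNAPSHOT" plus the mark "r4711" into "2.1-r4711".
     * The mark identifies the frozen state: an SVN revision, part of a Git hash, or a date.
     */
    static String freeze(String snapshotVersion, String mark) {
        if (!snapshotVersion.endsWith(SNAPSHOT_SUFFIX)) {
            throw new IllegalArgumentException("not a snapshot version: " + snapshotVersion);
        }
        // Keep marks to a conservative character set, and make sure the result does
        // not end in "SNAPSHOT" again (Maven would then still treat it as a snapshot).
        if (!mark.matches("[A-Za-z0-9.]+") || mark.toUpperCase(Locale.ROOT).endsWith("SNAPSHOT")) {
            throw new IllegalArgumentException("unsuitable mark: " + mark);
        }
        String base = snapshotVersion.substring(0, snapshotVersion.length() - SNAPSHOT_SUFFIX.length());
        return base + "-" + mark;
    }

    public static void main(String[] args) {
        System.out.println(freeze("2.1-SNAPSHOT", "r4711"));    // 2.1-r4711 (SVN revision)
        System.out.println(freeze("2.1-SNAPSHOT", "g3f2a9c1")); // 2.1-g3f2a9c1 (short Git hash)
        System.out.println(freeze("2.1-SNAPSHOT", "20120106")); // 2.1-20120106 (date)
    }
}
```

The upload itself could then be done with whatever your MRM offers, for example the maven-deploy-plugin's `deploy:deploy-file` goal, using the frozen version computed above.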

Snapshots within an organization

Snapshots within an organization that are consumed by (intersecting or not) projects of the same organization can very easily fall under the "in general" case. Usually they should, but it depends on many factors (you can't easily compare a company spanning the globe with a four-person company doing two projects). Two independent projects should consume each other's artifacts only after they are released.

But the situation here is slightly better than in the "in general" case, since the snapshot is governed not by some foreign entity but by our colleagues, and colleagues tend to collaborate. So, if the snapshot contains a breaking change, you can easily ping your mate and ask how to adapt the consuming project's code to it. Or, in reverse, you can nag them to fix something in there (and have the CI build and deploy it for you). Ultimately, the CI you visit every day builds those artifacts too, so you still have insight into them.

Naturally, this works on a "small scale", but not so well on a "large scale" (think of companies spanning multiple continents). "Common sense" is the best practice here: weigh the options and decide which approach works best for you.

Just a remark: "organization", in the way I use it here, does not map to real-world organizations like the ASF. Again, the modeling depends on the actual context. In the case of the ASF, "organization" would map better to a "top level project". Also, "organization" might map to a single branch office only (in the case of a geographically distributed company), etc. In general, every "participant" in the message passing from project A to project B is at least one circle. Put another way: every circle boundary crossed means "you can influence the remote end less and less".

Snapshots within a project

A project might contain multiple reactors, or a project (call it the "main project") might have some smaller subordinate projects (offering utilities and such) managed by the same team. In this case, since both are governed by you, take an approach similar to SCM "feature branches" (or branches in general, both short-lived and long-lived ones): if you pick up a story that requires a modification in a subordinate project, then in your branch you modify the POM of the consuming project to depend on the subordinate project's snapshot, apply the needed changes (bug fix, new feature) to both the consuming and the subordinate project, and finally release the subordinate project. In other words, the merge into the main project comes with a new release of the subordinate project (whether the release is done just before or shortly after the merge is irrelevant).

This means the subordinate project "lives" as a snapshot dependency in the consuming project about as long as your branch lives (in the case of short-lived branches). In the case of long-lived branches, you perform the release when you finish the story and its result is accepted by the main project. Ultimately, you can relax this to "a modified subordinate project lives as a snapshot dependency until the next release cycle", since the release plugin will force you to release it anyway. But the former fits the "release early, release often" mantra better, and is even better if there are several consuming projects, not only one.
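The release plugin will not let you release while snapshot dependencies remain unresolved; the same idea can be applied earlier, at merge time. Below is a minimal sketch of such a check, assuming the dependencies are already available as a map from "groupId:artifactId" to version (the class name and the `com.example:main-utils` coordinates are hypothetical). The snapshot test is simplified: it only looks at the "SNAPSHOT" suffix and ignores timestamped snapshot versions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical pre-merge check: lists dependencies that are still snapshots,
// i.e. subordinate projects that should be released before the branch is merged.
final class SnapshotDependencyCheck {
    private SnapshotDependencyCheck() {
    }

    // Simplified snapshot test based on the version suffix only.
    static boolean isSnapshot(String version) {
        return version.toUpperCase(Locale.ROOT).endsWith("SNAPSHOT");
    }

    // Returns "groupId:artifactId:version" entries for every snapshot dependency, sorted.
    static List<String> snapshotDependencies(Map<String, String> dependencies) {
        return dependencies.entrySet().stream()
                .filter(e -> isSnapshot(e.getValue()))
                .map(e -> e.getKey() + ":" + e.getValue())
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> deps = Map.of(
                "com.example:main-utils", "1.4-SNAPSHOT",
                "junit:junit", "4.10");
        List<String> blocking = snapshotDependencies(deps);
        if (blocking.isEmpty()) {
            System.out.println("OK to merge");
        } else {
            // Prints: Release these first: [com.example:main-utils:1.4-SNAPSHOT]
            System.out.println("Release these first: " + blocking);
        }
    }
}
```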

A small digression here: versions and releases are cheap. With subordinate projects, you can easily end up in a situation where main project "1.0" uses "1.0" of the subordinate project, but main project "2.0" uses "3.4" of it (for example, because between "1.0" and "2.0" you had multiple stories affecting the subordinate project). It does not make any waves. Nobody cares, believe me.

Not all snapshots are equal

One final word of warning: you, as a human, are able to deduce a lot from a snapshot version (and its "possible" or "expected" behavior), a lot more than Maven can. For Maven, it really is just black (it's a snapshot, handle it as such) or white (it's a release). For a human, it is not. Some examples:

You see a snapshot versioned "1.9.2.4-SNAPSHOT". Verifying that the preceding release exists ("1.9.2.3" in this case) might easily "suggest" to you that this snapshot will not contain groundbreaking or API-breaking changes, but is about a bug fix. It should not break your code unexpectedly (unless your functionality relies on the very bug being fixed!).

You see a snapshot versioned "2.0.0-SNAPSHOT" (or any "x.0.0" one). As you guessed, you usually do expect API changes and breaking changes in there. So these are to be avoided as dependencies, and all the more so the more "circle" boundaries the dependency crosses. As an example, the Lucene project takes an interesting approach to versioning: as the project progresses, they release "1.1", "1.2", "1.2.1", etc., but as part of the v2 preparation, they start publishing "1.9" and similar versions, which act as "messengers" for the API and other changes coming in the not-yet-released "2.0". This eases adoption of the new API, but it requires more knowledge about the developers' intentions, so it takes more reading on the Lucene site to understand where "1.5" and the other "missing" versions are.

You see a snapshot versioned "1.10.0-SNAPSHOT". Checking for a preceding release (you find "1.9.2.3"), you assume it's okay. But it's not: this one might never be released. This is a well-known problem in the Maven world: the "latest" snapshot you find in a snapshot repository might never be released. Again, if you have to consume it, it is up to you to get involved with the project or entity producing the snapshot (participate in meetings, subscribe to their mailing lists, forums, etc.).
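Maven performs none of this reasoning. The reasoning in the three examples above could be approximated by a heuristic like the following hypothetical Java sketch, which compares a snapshot version with the latest release, assuming purely numeric, dot-separated versions (the class and enum names are illustrative only). It is only a hint: it cannot tell whether "1.10.0" will ever be released, and a scheme like Lucene's "1.9" messengers shows why version numbers alone can mislead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic mirroring the human reading of snapshot versions.
final class SnapshotVersionHint {
    private SnapshotVersionHint() {
    }

    enum Hint { LIKELY_BUGFIX, MAJOR_BUMP_EXPECT_BREAKAGE, OTHER_CHECK_WITH_PROJECT }

    // Parses "1.9.2.3" into [1, 9, 2, 3]; non-numeric parts throw NumberFormatException.
    static List<Integer> parse(String version) {
        List<Integer> parts = new ArrayList<>();
        for (String p : version.split("\\.")) {
            parts.add(Integer.parseInt(p));
        }
        return parts;
    }

    static Hint hint(String snapshotVersion, String latestRelease) {
        List<Integer> snap = parse(snapshotVersion.replaceFirst("-SNAPSHOT$", ""));
        List<Integer> rel = parse(latestRelease);
        // A higher first component, e.g. "2.0.0-SNAPSHOT" after "1.9.2.3".
        if (snap.get(0) > rel.get(0)) {
            return Hint.MAJOR_BUMP_EXPECT_BREAKAGE;
        }
        // Same shape, all components equal except the last one, which is bumped by one.
        int last = snap.size() - 1;
        if (snap.size() == rel.size()
                && snap.subList(0, last).equals(rel.subList(0, last))
                && snap.get(last) == rel.get(last) + 1) {
            return Hint.LIKELY_BUGFIX;
        }
        // Anything else (e.g. "1.10.0-SNAPSHOT"): ask the project before depending on it.
        return Hint.OTHER_CHECK_WITH_PROJECT;
    }

    public static void main(String[] args) {
        System.out.println(hint("1.9.2.4-SNAPSHOT", "1.9.2.3")); // LIKELY_BUGFIX
        System.out.println(hint("2.0.0-SNAPSHOT", "1.9.2.3"));   // MAJOR_BUMP_EXPECT_BREAKAGE
        System.out.println(hint("1.10.0-SNAPSHOT", "1.9.2.3"));  // OTHER_CHECK_WITH_PROJECT
    }
}
```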

Conclusion

Snapshots are not flawed, but they do need care. Sadly, developers' knowledge of how to use them usually is.
