Tuesday, December 06, 2011

There’s a time and a place for everything and it’s called college

I really, really despise L10N (localization). Especially when it is applied in the blunt "just translate whatever text you see on screen" way. In my opinion, that's just plain wrong. Just like South Park's Chef explains about drugs, there's a time and a place for localization too.

Since my first computer (good old times, 1984, with an Atari 800 XL), I have always used my gadgets (computers, dumb phones, smartphones, iPods, etc.) with the Language setting set to English (while setting Regional Settings to some continental value, since I use the metric system). And this is why: I do not understand what the fucking machine wants to tell me in Hungarian. Nor in German. Or in any other language except English.

One typical example: my wife and I have exactly the same smartphones. As I described above, mine uses English as the menu language, while she insisted on "Hungarian menus", so she got them. One day she realized that on my phone she was able to write way longer messages, while on her phone text messages were about a third as long (and if longer, automatically cut into multiple messages by the phone)! So she asked me "to do the same I did with my phone, since it's annoying to squeeze her messages into so few characters" (she's not a Twitter user either). Sure, no problem! I started wandering around her phone's menus near the messaging options, and one menu caught my eye: "Input mode" (naturally, localized into Hungarian). Okay, enter here, and there were three options: "Automatic" (disposed of right away, I don't like gadgets making decisions instead of me), "GSM standard alphabet", and "Accented characters"… Hm, nothing suspicious… So I continued the search, but failed, naturally. She was still only able to send "short" text messages. Then it hit me, and I looked at my phone, same menu: "Input mode", options are "Automatic", "GSM Alphabet" and… "Unicode"! The precious translator had translated the name "Unicode" into "Accented characters"! Dumbass. That explained everything: an SMS in the GSM alphabet holds 160 characters, while a Unicode (UCS-2) one holds only 70. Setting her phone to use "GSM Alphabet" solved her problem of short messages, but I bet examples like these are easily found in many other places.

Another great example is Apple's OS X. Naturally, her Mac uses the Hungarian localization (available since Lion). I frowned when I found that not only Finder and the menus of those applications that are localized into Hungarian are in Hungarian, but Apple localized application names too! Usually, when she asks me for help (usually I need to kill the Flash plugin), I start with what I do on my machine: Cmd+Space, type "activi"(ty monitor), Spotlight brings it up, press Enter and start looking for the rogue process. But not on her machine… Spotlight does not report "Activity Monitor" as something that exists on my wife's Mac, while we both use the same OS X! Really annoying: the application name "Activity Monitor" is localized too! This reminds me of the old Microsoft fiasco, when they "localized" Excel for Hungarian in a way that even the functions were localized, hence non-Hungarian and Hungarian spreadsheets were simply incompatible! Way too stupid. I mean, okay, localize Finder, but an OS tool???

So, just like Chef says: there's a time and a place for localization too. I believe if Mary (Mariska) types her email, it's okay for her Mail.app menus to be in English (Hungarian). Same for typing in a word processor. But.

English (it could be Latin or Esperanto, I don't care) is a language well fit for these one-word commands, like "Save", "Quit", "Copy" and "Paste". Many times the forced one-word translations are hilarious, or the other way around, the almost sentence-like translations ruin the UI design. And they regularly differ enough in meaning to at least make you wonder what the original label was. Natural languages are like that "by design": you will never be able to translate the exact meaning, due to language constructs, cultural differences, a sloppy translator, or all of these. You just ruin the application doing it, and waste a lot of resources and money doing it (a waste, just like some companies suing each other instead of turning that money to R&D). Again, there's a time and a place for doing it.

But if you make an application used by some narrow "set" of users, like a tool for developers or tools for IT technicians, I'd never bother localizing it. There is a "lingua franca" for them, and that's English. Just accept that as a fact.

In my opinion, Chef was right. But he was talking about drugs: "Look children: this is all I’m gonna say about drugs. Stay away from them. There’s a time and a place for everything and it’s called college." Well said!

Friday, May 20, 2011

Trick: gather outbound GETs made by Nexus for a $reason

The $reason might differ a lot; I was just curious how to do this in a "lab" environment, to get a list of URLs fetched by Nexus (fetches that were actually made to fulfill client requests). Again, this is a test setup, not quite usable in production environments -- unless you spice it up, maybe.

All I wanted was a list of URLs (artifacts) that my Nexus fetched during a test. I wanted to check that list, sort it, count the distinct URLs, check for dupes -- if any, etc. This is here just as a reference for my future self, or maybe it may help somebody else too.

How to do it:

  1. Set up a "clean" Nexus installation by, let's say, unzipping the bundle somewhere.

  2. Fire it up, log in as the "admin" user and set logging to DEBUG level over the UI -- Nexus will spit out outgoing HTTP GETs at DEBUG level like these:
    jvm 1    | 2011-05-20 14:44:58 ... - Invoking HTTP GET method against remote location http://repo1.maven.org/maven2/...
  3. Start some clients fetching against Nexus; I did this:
    [cstamas@marvin test]$ mvn -s settings-1.xml clean install > b1.txt & mvn -s settings-2.xml clean install > b2.txt & mvn -s settings-3.xml clean install > b3.txt &
    ... and went for a coffee.
  4. Process the logs.

Processing the logs

  1. Concatenate the logs into a single file -- if needed. I had to, since I ended up with two log files: the DEBUG level made the wrapper roll the log file based on size, I guess.
  2. Filter the logs appropriately; I used a combination of tools like grep and awk to produce my list of URLs.

Example session:

$ cp ~/worx/sonatype/nexus/nexus/nexus-distributions/nexus-oss-webapp/target/nexus-oss-webapp-1.9.2-SNAPSHOT-bundle.zip .

$ unzip nexus-oss-webapp-1.9.2-SNAPSHOT-bundle.zip

$ cd nexus-oss-webapp-1.9.2-SNAPSHOT/bin/jsw/macosx-universal-32/

$ ./nexus console

$ cd ../../../logs

$ less wrapper.log

$ less wrapper.log.1

$ cat wrapper.log.1 wrapper.log > remoteFetches.txt

$ less remoteFetches.txt

$ cat remoteFetches.txt | grep "Invoking HTTP GET method against remote location" > remoteFetches-filtered.txt

$ less remoteFetches-filtered.txt

$ awk 'BEGIN{FS=" "}{ printf "%s\n", $18}' remoteFetches-filtered.txt > remoteFetches-urls.txt

$ less remoteFetches-urls.txt
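Once you have the URL list, sorting it and counting distinct URLs and dupes (the things I was after in the first place) is one more short pipeline. The log lines in the here-document below are made-up stand-ins for real wrapper.log content; with the real thing you'd skip that part and start from remoteFetches-filtered.txt directly:

```shell
# Stand-in for remoteFetches-filtered.txt (real content comes from wrapper.log)
cat > remoteFetches-filtered.txt <<'EOF'
jvm 1    | 2011-05-20 14:44:58 DEBUG - Invoking HTTP GET method against remote location http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2.pom
jvm 1    | 2011-05-20 14:44:59 DEBUG - Invoking HTTP GET method against remote location http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2.pom
jvm 1    | 2011-05-20 14:45:01 DEBUG - Invoking HTTP GET method against remote location http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2.jar
EOF

# The URL is always the last field, so $NF is sturdier than counting columns ($18)
awk '{ print $NF }' remoteFetches-filtered.txt > remoteFetches-urls.txt

# Number of distinct URLs
sort -u remoteFetches-urls.txt | wc -l

# Dupes, most-fetched first
sort remoteFetches-urls.txt | uniq -c | sort -rn
```

The exact column number ($18 in my session above) depends on the logging pattern, which is why `$NF` is the safer bet.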


It gave me a plain list of URLs, unsorted, in the order Nexus fetched them.

Have fun!

Friday, May 06, 2011


Dear visitors and commenters! Please excuse my negligence: I simply got no notification from the blog that I had comments, because they were plainly marked as SPAM!

Sorry again; I will look to fix this issue, and prevent valid comments from being declared spam (by the Blogger engine).

Tuesday, May 03, 2011

Home networking solved (ugh)

This is an interesting story that baffled me for two days, affecting even my work-hours presence, hence I was eager to solve it.

My provider is the Hungarian T-Home (a DT subsidiary, just like T-Mobile and all the other "magenta" companies). Since January we have had IPTV set up, which meant a new cable modem too. The old "dumb" cable modem had to be replaced, and I wanted to migrate my home networking infrastructure without any disturbance. I did, and it worked. Up to yesterday, when something happened...

The switch

The day the serviceman appeared to install the units was interesting, so let's start with that. He brought two boxes, one with the modem and one with the IPTV Set Top Box (STB). Let's forget the fact that he initially brought the wrong STB -- the one without HDD, while we ordered the one with HDD -- he installed them quickly. The interesting part came when he spotted my infra sitting next to the modem while he was swapping the old modem for the new one (which turned out to be not just a modem at all). He put on a blunt smile and told me: "You have to stop using your own router, it interferes with the STB. The new modem has router capabilities too." I asked "how" does it interfere? He just kept repeating "You have to uninstall your router" and smiling. I believe he had to tell me that due to some company policy (the contract has some stupid limit on the number of machines allowed to connect, but nowadays, when even microwave ovens have WiFi, those policies may wipe out my... um). Okay, "I will remove everything the moment you finish," I lied.

So, what he installed looked very promising. Both pieces of gear wear a "Cisco" sticker. The modem (and router, and AP, as it later turns out) is a "Cisco EPC3925 EuroDOCSIS 3.0 2-PORT Voice Gateway", model EPC3925. It features 4 LAN ports, 2 phone ports (I am not using those, SIP phone rulez), and an 802.11n WiFi AP. The STB is a Cisco ISB6030MT.

Both "high quality" Cisco gear, not some cheap shit. Yeah. I believed that for a few days, until I tried to google them. It's cheap shit with nice stickers on it. Cisco did acquire a few companies and blatantly rebranded their products (why are they ruining their own trademark?). I did not care about the TV as long as it works and does what we want (it does, even if it runs ancient WinCE!!!), but this was one more reason not to rely on this modem as a router. I wanted to use it as little as possible. So, I decided to change the network segment for my home stuff. This is what I ended up with:


In short, the modem was set up on IP and I did not want to fiddle with it too much, so I switched my home network to the 192.168.1.x segment. Modem, STB and WRT-H are directly wired (better to reduce multicast group latency), and WRT-H (H as in "home") routes to the 192.168.1.x segment, but also does DHCP and DNS for the home network (and "fixes" the damn Apache Software Foundation SVN server to work with git-svn, but that's another story) and QoS. Wired connections from it go to an Apple Time Capsule (TC) and a Gigaset SIP phone's base station. It also serves as the WiFi AP for home machines like Macs and phones and such. Both WRTs are actually good old Linksys WRT54GLs running the best custom firmware I've had the chance to find, the Tomato firmware. And the WDS is here just to "hop" the internet over to my office, and to be able to use the printer (it's actually an MFP) from home.

Not wanting to fiddle with the modem, all I did (that changes the "factory" preset config T-Home ships them with) was shut down the WiFi on it. Yes, T-Home ships them with WiFi on, and my neighborhood is full of WiFi noise with meaningless SSIDs (they are randomly generated), and many of my neighbors are simply unaware they have WiFi! Why oh why is T-Home shipping them like this? Why not turn WiFi on on the spot, if the customer asks for it in the first place?

And everything was working like a charm. Until yesterday.

The drops

The network, since the change to IPTV and the new modem, had been fairly stable and fast. I did notice some small "drops" (like a browser trying too long to get a page), but they were intermittent and rare, so I did not fiddle with those.

Yesterday it started falling apart. My wife was unable to browse anything; my browser, git and svn were timing out (not connection refused, but as if TCP packets went to /dev/null somehow)... It was a nightmare. And the most interesting thing is that UDP was working without a problem! Initially I thought it was a network outage (or brownout) recurring on the provider side, but it was suspicious that Skype, for example, worked without interruption (same for TV reception, which uses UDP multicast). So I phoned my provider, asking about an outage and describing the problem, but after a long session (they did some remote measurements and other checks), they convinced me the problem was on my side: they had good signal quality readouts, and no packet loss reported (I did confirm the signal quality, since the modem prints those out on its ugly UI). To convince myself even more, I hooked up a Mac directly over the wire the STB was using, to try the network (to rule out WiFi, any in-the-middle router, etc.). It was working like a charm. So, really, it must be my equipment.

Tracing the problem clearly showed that TCP packets were somehow disappearing in my network, and WRT-H became the target of suspicion. But it was reporting no problem, and to make things worse, the "outage" was simply sporadic: one moment the network was working just fine (the TCP part at least, since UDP services had no outage at all), and the next moment it stopped and packets were lost. The routing table looked okay there, but still, I wanted to check.

Macs (actually all BSD kernels, I believe) have a nice monitoring tool, route -n monitor, and it clearly showed that packets were being lost:

got message of size 124 on Tue May  3 11:31:17 2011
RTM_LOSING: Kernel Suspects Partitioning: len 124, pid: 0, seq 0, errno 0, ifscope 0,
flags:<UP,GATEWAY,HOST,DONE,WASCLONED,IFSCOPE>
locks:  inits:
sockaddrs: <DST,GATEWAY>
got message of size 124 on Tue May  3 11:31:32 2011
RTM_LOSING: Kernel Suspects Partitioning: len 124, pid: 0, seq 0, errno 0, ifscope 0,
flags:<UP,GATEWAY,HOST,DONE,WASCLONED,IFSCOPE>
locks:  inits:
sockaddrs: <DST,GATEWAY>

The gateway was WRT-H's IP address, meaning the TCP packet did leave the Mac, but was lost. There were a LOT of these messages while the problem was present, but the next moment they stopped and the network worked. I was freaking out. I disassembled my network to its bits to rule out the WDS, one router, then another router, shut down the Time Capsule, but nothing reliable came out of it. Btw, try to google for these kernel messages above: NOTHING, but nothing really, you can discover about them.

So, I googled the Hungarian hacker community, knowing I was not alone in having this piece of crap equipment. And what luck, I did find the answer here. Many thanks to the Hungarian Unix Portal and the people participating in this forum! The guy starting that thread had exactly the same symptoms as I had, but using different HW and OSes; he used Ubuntu (I had started suspecting Apple's OS X and who knows what; actually, I was clueless).

The limit

In short, it turned out that the crappy wannabe-Cisco modem has a conntrack connection limit set to 1024! And there is no admin UI where you can find that out, or at least read the value! When the connection count goes over that threshold, it starts dropping connections! This applies to TCP (stateful) connections, hence UDP is unaffected. It turns out -- luckily, the guy in the forum experimented it out with his modem -- that the modem's "SPI Firewall" is doing this, limiting the connection count to 1024 when turned on. And guess what the modem's default is! I did not apply the other fixes he proposed (again, I am not using the modem's AP), but shutting down the modem's firewall did make it work! Again, many thanks, HUP user "ufoka"!

Later, I figured out what had happened. At home we have two laptops and two smartphones going out (to the internet, making connections on the modem); the printer, for example, is just a "local" connection. But the phones, while having WiFi set up for the home network, were mostly left on 3G to conserve battery. When I bumped their firmware to the latest Froyo, I started using mine with WiFi constantly on (since battery consumption proved very good and durable). Over the weekend my wife's phone was updated too, and her WiFi got turned on as well. It seems we were already near the 1k connections, and this just pushed us closer.

Simply, the blunt modem, when the threshold was hit, started silently dropping TCP connections, since it detected the traffic as a "flood" or whatnot, and this is why we hit it. Enabling WiFi on the phones just made things worse. And this explains the "sporadic nature" of the problem too: the phones sync now and then; when my wife pressed Enter in the browser she actually created a connection "burst", same for me, etc. Blah.

Long story short: It's solved!

Tuesday, April 12, 2011

SLF4J Logging + Maven

Just as a "foreword": I assume SLF4J is used for logging; I'm not opening a discussion about this, do that elsewhere. Also, I assume you use Apache Maven 3.x for builds. These two facts are NOT what this post is about; I take them for granted and don't want to argue about them.

This post is more about your SLF4J-related artifact dependencies and their scoping. Usually the things that make you swear (as a real productive coder). I really love how the SLF4J project is "layered". There is not one monolithic artifact with all-or-nothing capability; you actually have real freedom (see below) as a developer (and even as an integrator) to choose the actual backend and to "normalize" your logging if needed. But sadly, too many times I see this "layered" structure either unused (a backend dragged along as a transitive dependency) or misused in some other way.

This is why I quickly drafted this little post, for quick reference. First the rules:

Rule #0

When you use more than one artifact of some project (SLF4J has 2-3 of them usually needed in an application), define a property for their shared version. Also, with proper use of the dependencyManagement section, you ensure you get what you want on your build's classpath. This is general advice.

It relieves you of later changes (bumping to a newer version) and also clearly states what you expect as a dependency.
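A sketch of how that looks in a (parent) POM -- the version number here is just illustrative:

```xml
<properties>
  <slf4j.version>1.6.1</slf4j.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```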

Rule #1

From the SLF4J project, slf4j-api is the one and only artifact you want to depend on in compile scope. Full stop.
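In POM terms (using a version property as in Rule #0), something like:

```xml
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>${slf4j.version}</version>
  <!-- "compile" is the default scope; spelled out here for emphasis -->
  <scope>compile</scope>
</dependency>
```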

Rule #2

From the SLF4J project, the bridge ("-over-" and "-to-") artifacts should be "runtime" scoped, if needed at all. You need them only in cases when you use another library as a dependency that relies on, for example, commons-logging (a typical example is Apache HttpClient 3.x). In that case, you have to have the commons-logging API on the classpath (during tests and also at runtime), and jcl-over-slf4j provides exactly that. But you don't want to compile against it -- even by mistake. Hence the "runtime" scope. It will be dragged along transitively, but will not interfere with the classpath you want to code against.
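A sketch of the HttpClient 3.x case (versions illustrative):

```xml
<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
  <exclusions>
    <!-- keep the real commons-logging off the classpath... -->
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- ...and let the bridge provide the JCL API, routed to SLF4J, at runtime only -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>${slf4j.version}</version>
  <scope>runtime</scope>
</dependency>
```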

Rule #3

From SLF4J project, you never ever want to include any backend artifact in a scope that is transitively propagated. Full stop.
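Concretely: the backend goes only into the deliverable-producing module, and even there "runtime" scope is enough. slf4j-log4j12 below stands for whichever backend you actually pick:

```xml
<!-- in the WAR/assembly module only, never in a reusable library's POM -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>${slf4j.version}</version>
  <scope>runtime</scope>
</dependency>
```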

To illustrate the rules above with examples, I'll try to look at them from different perspectives.

From SLF4J's artifacts perspective

slf4j-api is what your code compiles against. Easy-peasy: "compile" scope is what you need.

The SLF4J bridges are never used by your code -- otherwise, why would you compile against slf4j-api in the first place? So it must be some dependency that needs them. There are some strange cases when you extend some class and do have to have a bridge on the compile classpath, but those are rare exceptions. Drag these in only if you must (as in the example above).

As a reusable library publisher, you don't want to assume which backend the project consuming your library uses, right? So, do not make your library drag one along.

From publishing developer role perspective

As a reusable library developer, you want to ease the lives of your consumers, right? But you also love SLF4J, right? This makes your situation the simplest: just add slf4j-api as a "compile" scoped dependency (it will be dragged along), and use slf4j-simple in your tests, naturally as a "test" scoped dependency. Done.

As an application developer, you actually pull in multiple libraries, add some "glue" code, and create one or more deliverables. Here, you always want to keep your modules "clean": slf4j-api is the only dependency you want dragged across those modules. Some simple backend like slf4j-simple will probably pop up in your tests, but the "one and only real" logging backend appears only in the modules that actually build/package/produce the final deliverable (WAR, bundle, app, etc.).

From consuming developer role perspective

As a library consumer, you really hate excludes, since that's the only way to fight back against developers not setting scopes right. So, bummer, that's the bad news. But believe me, after writing your 5th exclusion in a POM (IDE integrations are of great help here!), you'll start giving back patches for POMs and eagerly waiting for the next dot releases containing those same patches. Good work!

From the project module hierarchy perspective

If we take a Mavenized build and look at the order of modules built by the Maven reactor, we may say -- and this is a very, very huge simplification of things -- that the list begins with some "API-ish" and "Util-ish" modules, next come the "Meat-ish" or "Impl-ish" modules, and last in the chain are the "Deliverable-ish" module(s).

Initial modules like APIs and Utils might drag in slf4j-api as a dependency, just to clearly advocate "we use SLF4J" and let API users know what is used for logging here. Usually we do not log much in those Util or API modules.

The in-the-middle modules usually have bridge dependencies (along with the slf4j-api one), mainly because of dependencies needing some other logging API, like commons-logging, JUL or who-knows-what. That's fine.

The modules usually last in reactor build order are the ones producing deliverables. This is where you actually, intentionally, make the decision about which logging backend to use, and add the proper SLF4J backend dependencies to the POM.


Keeping these simple rules in mind will make your libraries more consumer-friendly, and easier to maintain!

Wednesday, February 09, 2011

Version or aversion from version

For the past 5 years I have been sniffing around Maven and its repositories a lot (Proximity was initially released at the end of 2005, yay!). And almost always I found myself facing the following, or at least a somewhat related, problem: version sorting.

A question that looks so simple and logical, but is still not resolved -- at least that's what user requests show -- even today, after 6 years. So, what is it about?

I'll use the term "artifact" in its broadest sense: it's just a blob of bytes in a Maven repository, which might be a library in a JAR, a self-executable app packed as a JAR, a full-blown application packed as a ZIP... you know, an artifact. Naturally, the complete Maven coordinates are GAV (G = groupId, A = artifactId, V = version), but let's stick with the version only for now.

People tend to think that the "version" of an artifact is just like any other sequence of numbers: it's known where it starts, and which one is next or previous. Well, that's not always true. Actually, it's almost never true.

Latest as what?

And here comes the notion of "LATEST" too. It was a concept in Maven2 that proved wrong. What was wrong with it? Well, exactly its nature: "latest" as latest what? Latest released? Biggest version number?

And finally, the worst problem: people tend to confuse the version (let's call it by its FQN, the "Maven artifact version") and the marketing version. Or worse: they tend to do all sorts of "magic" and eloquent versions like "1.0-alpha-1" or "1.0-RC", etc. And this is the point where people introduce those strange, and usually very funny, versions, usually not obeying any "standard" convention, dooming software that tries its best to sort versions to fail miserably.

One important thing to know before you choose the version of a release artifact is that Maven repositories are eternal. Once released, your artifact will stay there forever. Later, when you realize your mistake, or that your "alpha" is actually a "pre-alpha", you can't do anything about it, except violate the eternity contract. And that's wrong to do.

Or, change your way of thinking about Maven version.

Artifact version should not be used for "marketing versioning". Full stop.

No for marketing version

I'll go further and state that the version could be just a number N, increasing on every release. So, a sequence of 1, 2, 3, ... etc., similar to Mercurial's "local revision" or Subversion's "revision" numbers. You cannot "engineer" and tag your release with some "cool" revision number in Subversion (not that you would want to; this is just for example's sake); it forces you to simply "pick the next one".

Yes, the above will not work if your project uses branches (they usually do). So let's say then that the version is N.M, and so on, and finally we arrive at the general contract of X.Y.Z, used across many Java and Maven but also non-Java projects (think Linux kernel versioning).

In general, you should NOT go beyond the X.Y.Z form. Personally, I'd change Maven to enforce this version form, and just fail the build if any other form of version is found in a POM. Done.

But still, with branches, your artifact might produce the following (simple!) timeline of releases on the time axis:

  • 1.0 (initial cut, from trunk)
  • 1.1 (from trunk, and then branched as 1.x, since you intend a major rewrite)
  • 2.0 (from trunk)
  • 1.1.1 (from branch 1.x, a bugfix release)
  • etc

So, which one of these is the "latest"? Before you say "it's 2.0", think again. The last released is "1.1.1". The greatest version is "2.0". Okay. But I believe you noticed the (at least) two different semantics of "latest". There you go, and you can easily add new meanings to LATEST too.

Not to forget different scenarios: the sorting has to be "stable", since either you have a long-running repository you deploy to from time to time, or you are actually doing a restore from a backup after a system failure and, let's say, you have the artifacts only and intend to restore the repository metadata using some tool... you'd expect the same ordering in the metadata, right?

Do you "compare" groupIds (let's put aside the usual processing of them, like sorting them alphabetically)? Or artifactIds? Do they have metrics? Are they comparable? Not in this sense. So why would the version have these properties? Just take a peek at the examples below:

A good example of broken versions: your sorting algorithm would have to know what "pre" means! Until some "final" 4.0 release, a "4.0-pre-alpha" will always be taken as the latest among the "sorted versions"!
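You can see the effect with any version-aware sort that doesn't know the qualifier's semantics. GNU `sort -V`, for instance, happily ranks the pre-alpha above the final-looking releases (the version strings here are made up):

```shell
# sort -V compares the numeric parts first, so anything starting with "4"
# beats anything starting with "3" -- the "pre-alpha" suffix doesn't demote it
printf '%s\n' 3.0 4.0-pre-alpha 3.2.1 | sort -V
# 4.0-pre-alpha comes out last, i.e. as the "latest" version
```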


For a great "source" of existing versions in the Maven Central repository, I always go to this file I gathered once:


Message in the bottle

Obviously, the most frequent reason for eloquent versions is that developers want to add some meaning to them. Usually for marketing reasons. And that's wrong. Introduce a "marketing version" somewhere else, use that on your site and explain the reasons behind it, but the Maven version should not be used for marketing versioning. Or stop expecting "proper" sorting behavior for those same versions. Use your site for "explaining"; the artifact version is just a "pointer" to the proper place on your site.

In this light, I'd say you could release all the time! You could simply consider every successful CI build (or every 5th, or whatever) a release (and THAT would be agile)!

I mean, no project out there (let's consider the "better" Maven3-only world, as compared to the Maven2 world, for a moment) will pick up a new release "by mistake". All the versions are locked down in your POM, right? Plugins, dependencies, everything.

A bit of a digression here: yes, version ranges. Well, I look at them as a bit of a mistake -- a mistake in how they are implemented in Maven. Having ranges at runtime (à la OSGi) -- sourced from deployed POMs -- is fine, but at build time it is totally wrong in my opinion. To be more precise, ranges skew the "factual truth about the build": their purpose is more meaningful at runtime (when someone consumes your artifact and wants more freedom to calculate runtime dependencies), but you, when building your artifact, built it against one single fixed version! With a range in the version tag, the information "against which exact version was this artifact built" is lost.

I'd rather have some solution (like introducing a new "runtimeVersion" in the dependencies tag, used by consumers of deployed POMs) to state my "compatibility" against a range of versions, but have my POM properly describe my build.

Version is just a pointer: 0xdeadbeef

Yes, Maven versions can have "holes"; they need not strictly follow each other. The Maven version is primarily just an element of the artifact coordinate triplet, and only after that is it meant to carry some "light" (pointing?) information for people to comprehend just by "looking at it". Not the other way around!

Change the way of thinking about version, and you will save yourselves from a lot of grief.

Monday, January 24, 2011

Filter streams and nasty libraries

Have you ever needed a library that does some cool stuff, but you wanted it as a "filter stream" and the library did not offer filter stream implementations? And you got either the "Ugh, I have to redo it all" or the "Ugh, I have to reorganize it all" feeling, even though all the work was already done in that utility, right in front of your nose?

Or, did you ever want to GZIP-compress an input stream, or decompress an output stream (the opposite of what GZIPInputStream and GZIPOutputStream do)?

No problem.

Here is a small reusable utility (not deployed anywhere yet; you need to build it locally to toy with it):
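The core trick for the "compress on an InputStream" direction is to pump the raw stream through a GZIPOutputStream into a pipe on a helper thread. A minimal sketch of that idea (the class and method names here are mine, not the library's):

```java
import java.io.*;
import java.util.zip.*;

public class GzipCompressingStream {
    // Returns an InputStream whose bytes are the gzip-compressed form of 'raw'.
    public static InputStream gzip(final InputStream raw) throws IOException {
        final PipedOutputStream sink = new PipedOutputStream();
        PipedInputStream source = new PipedInputStream(sink);
        Thread pump = new Thread(new Runnable() {
            public void run() {
                try (OutputStream gz = new GZIPOutputStream(sink)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = raw.read(buf)) != -1) {
                        gz.write(buf, 0, n);
                    }
                } catch (IOException e) {
                    // a real implementation would propagate this to the reader
                }
            }
        });
        pump.start();
        return source;
    }

    public static void main(String[] args) throws Exception {
        InputStream compressed =
            gzip(new ByteArrayInputStream("hello filter streams".getBytes("UTF-8")));
        // round-trip through the JDK's own GZIPInputStream proves the bytes are valid gzip
        BufferedReader r = new BufferedReader(
            new InputStreamReader(new GZIPInputStream(compressed), "UTF-8"));
        System.out.println(r.readLine());
    }
}
```

Note the pipe's small internal buffer: if the consumer stops reading, the pump thread blocks. Fine for a sketch, but something real code has to handle.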


and an example of using this stuff with one of the "nastiest Java libraries", the canonical LZMA compression utility:


This above is even compatible with the OS lzma CLI tools!
Please be gentle, this is still baby code, but any improvements are accepted!
Both are ASL 2.0, so enjoy!