Quick Mac Tip: install Java 1.5 on OS X Lion

If you're like me, you might need to test compatibility with Java 1.5 even though it's old and nobody should be using it anymore :) Unfortunately, the reality is that some users migrate slowly, so it can be useful to be able to test with old versions of the JDK.

As you probably know, JDK 1.5 is not officially available for Mac OS X Lion. But there is a way to install it anyway, and it is very well documented here: http://www.s-seven.net/java_15_lion

Let me know if you liked this tip; I might post more in the future :)

A few new things...

For those of you who have been following my blog (by the way, thanks!), and for the newcomers, I'd like to announce a few new things coming to this blog. First and foremost, there should be more activity here, as I have been able to shift some of my time towards outside communication now that we have version 6.5 out the door.

What this means is that you will get a new kind of post on this blog: samples. I will use these samples to illustrate what can be achieved using Jahia as a basis for integrating all kinds of technologies, such as personalization, real-time server statistics, and much more!

Of course I won't forget to have some fun, and post some videos that we have made to promote or illustrate some of the lighter sides of our work.

I must say I am very excited about the new Jahia 6.5 release, and I can't wait for you to try it! You can download it here: http://www.jahia.com/cms/home/download.html

 

Integrate Maven and Growl

Do you ever launch a Maven build that takes some time to execute, and switch to something else in the meantime? The problem is that the build often finishes in the background, and you only come back to it later, once you've finished your foreground task.

Well, I do that a lot, and I found this great little tutorial on how to integrate Maven and Growl on the Mac. I'm quite sure there are ways to do this on Windows and Linux too, but as the Mac is my main development platform, I will leave others to take care of the other platforms :)

  1. Install growlnotify from the Extras folder of the standard Growl package.
  2. Add a script called "gmvn" to your MAVEN_HOME/bin directory, with the following content:
    #!/bin/bash
    # Pass all arguments through to Maven, preserving their quoting
    mvn "$@"
    status=$?
    if [ $status -ne 0 ]
    then
        echo "Build failed!" | growlnotify "Maven" --name maven --image ABSOLUTE_PATH_TO_IMAGES/maven-growl-failure.jpg > /dev/null 2>&1
    else
        echo "Build completed successfully." | growlnotify "Maven" --name maven --image ABSOLUTE_PATH_TO_IMAGES/maven-growl-success.jpg > /dev/null 2>&1
    fi
    # Return Maven's exit code, so gmvn can be chained just like mvn
    exit $status
  3. Make the script executable: chmod +x gmvn
  4. Then use gmvn instead of mvn to build your projects.

You can retrieve the images from Geoffrey's website. Note that you will have to specify an absolute path to the images. For some reason a relative path doesn't seem to work.
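Since growlnotify seems to need an absolute image path, one option is to let gmvn compute it rather than hard-coding ABSOLUTE_PATH_TO_IMAGES. Here is a minimal sketch of such a helper, assuming the images are kept in a directory next to the script; the `abs_path` function and the `growl-images` directory name are hypothetical, not part of the original tutorial.

```shell
#!/bin/bash
# Hypothetical helper: turn a possibly relative path into an absolute one.
# Only the parent directory has to exist; the last component is appended as-is.
abs_path() {
  local dir base
  dir=$(dirname "$1")
  base=$(basename "$1")
  # cd inside a subshell so the caller's working directory is left untouched;
  # pwd -P prints the physical path, with symlinks resolved
  (cd "$dir" && printf '%s/%s\n' "$(pwd -P)" "$base")
}
```

Inside gmvn you could then write `IMAGES=$(abs_path "$(dirname "$0")/growl-images")` and pass `--image "$IMAGES/maven-growl-failure.jpg"` to growlnotify, so the script keeps working wherever it is invoked from.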

JCR is not dead, and neither is CMIS

Recently, an article on CMSWire caused quite a stir, mostly because it was asking the controversial question "Is the JCR dead ?". In reply, a few opinions posted by myself and other CMS actors/vendors were quick to appear, but I think some clarification is needed in order to explain what I think is really relevant for developers, integrators and end-users.

The quick answer is: neither JCR nor CMIS is really important for end-users. Fortunately, most of them will never have to deal with either; only integrators and developers will have to bother.

The article pitted these two standards against each other, probably in the hope that the controversy would attract readers, but that doesn't really make sense. Many developers use both, and the only real difficulty in integrating the two is translating queries; apart from that, they map onto each other pretty well.

The JCR is not dead and is still relevant, in the same way that other standards such as JDBC are not dead and are still relevant. Many of the people who want to see the JCR die have probably had bad experiences with it, or have moved on to other technologies, but this standard still makes a lot of sense in the Java CMS and WCM world. Where else can you get a native-language API that offers powerful queries, versioning, flexible content definitions, import/export, etc.? Sure, CMIS offers some of these, but you probably wouldn't want to use CMIS in the middle of your Java project. CMIS is mostly oriented towards being a service interface.

One may ask: if my technology exposes its content as a service, why even bother with a middleware standard such as JCR? Mostly because JCR addresses some standard features that are hard to implement well, such as advanced queries or versioning, and standardizing the interface provides a reliable layer at which to write tests and implement things properly. For example, query parsing and execution can be complex, and having a standard that defines them is good for interoperability, migration and overall quality.

This doesn't mean that JCR has no drawbacks. One example is the number of query languages supported, which really should be reduced in the next version, because it complicates the implementations. In JCR 2.0 it is possible to query for content using the SQL-2 language or the Abstract Query Model, but also with the SQL-1 or XPath languages that were part of the 1.0 specification and are now deprecated. This means that the reference implementation, Jackrabbit, currently has to support all four of these query systems, which makes the implementers' job all the more difficult and makes optimizations equally hard. This problem is a transitional one, and one can hope it will be resolved in the next version, as the old languages are already deprecated.

Another important and difficult-to-implement subsystem is versioning. We at Jahia have noticed how complex this can become when doing advanced versioning operations on trees. A good example is CVS, which couldn't support moves because of the complexity needed to version such operations properly. It was only much later, and with a different model, that SVN managed to fill the gap. All this to say that standardizing these features ensures that they are well defined, and that JCR users can rely on them to build value on top. It also means that, in the case of open source implementations of the JCR, people can collaborate to develop and maintain such complex code. The alternative is to redevelop all of this from scratch in a non-standard way, and coming from that world, I can tell you it is not the best option for developers, integrators or end-users.

These are just two examples of how this standard helps build common infrastructure and makes sure that back-end features are available. Many more features are part of the standard, such as ACLs, content definitions, import/export, observation, workspaces and transactions, and they are equally part of the infrastructure that most developers want to take for granted rather than re-implement.

As I have mentioned previously, my biggest gripes with CMIS are that it is too file-oriented and that it lacks a simple interface. Maybe I should explain what I mean by a simple interface. Today, in the world of Ruby on Rails, Apache Sling, Jahia 6.5 and Day's Communiqué, people want simple (yet still powerful) access to their content objects. All of these tools allow simple REST-style HTTP mapping of URIs and attributes to back-end content objects. CMIS 1.0 still requires that you use either ATOM or SOAP to interact with content objects, and even then a lot of wiring is still required. Of course, one of the goals of CMIS is that tools and libraries will become available to help with the integration, but it will be hard to beat the simplicity of a plain HTTP POST request to update content.
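To make the "plain HTTP POST" point concrete, here is a minimal sketch. It assumes a server that maps form fields onto properties of the content node at the target URL, the way the Apache Sling POST servlet does. The `post_properties` function, the URL, the `admin:admin` default credentials and the `CONTENT_USER`, `CONTENT_PASS` and `CURL` variables are hypothetical placeholders, not a documented API of any of the products above.

```shell
#!/bin/bash
# Hypothetical helper: update a content node with one HTTP POST, in the style
# of the Apache Sling POST servlet, where each form field (-F name=value)
# sets one property of the node at the target URL.
post_properties() {
  local url=$1
  shift
  local args=() prop
  for prop in "$@"; do
    args+=(-F "$prop")
  done
  # Placeholder credentials; CURL can be overridden (e.g. CURL=echo) for a dry run
  "${CURL:-curl}" -s -u "${CONTENT_USER:-admin}:${CONTENT_PASS:-admin}" "${args[@]}" "$url"
}
```

For example, `post_properties http://localhost:8080/content/mynode "title=Hello"` sends a single request that, on a Sling-style server, sets the `title` property of that node. Doing the same through CMIS 1.0 means building an AtomPub or SOAP request first.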

I have been following the open source CMIS implementation at Apache, and the efforts are really great, but it hit a major roadblock when it switched implementations in the middle of last year, moving from the old Chemistry codebase contributed by Nuxeo to the one contributed by OpenText. It took a little while to merge all the functionality, and at version 0.2.0 it has now reached a really interesting point. Of course, at Jahia we are integrating with Apache Chemistry, but as we had worked with the old codebase, we had to start over when the implementation changed. This is probably true of other people working with CMIS as well. So in no way do I want to diminish the importance of CMIS, but between the wild claims out there that CMIS will revolutionize the world and the hard truth of the code available in open source or closed source implementations, there is still a lot of work to do.

I hope a service standard will emerge that fulfills the promise of being both flexible and easy to integrate, and if that ends up being CMIS, so much the better. But let's not forget that before CMIS there were iECM and WebDAV, and before that many more that didn't work because of the complexity of the implementations, or because vendors never really committed to interoperability. It is also hard to keep a standard minimal, and at version 1.0 CMIS is already much more complex than I had hoped it would be.

I have also said that one of the main reasons people are interested in CMIS is that Microsoft is on board, with an implementation included in SharePoint 2010. But let's not forget that Microsoft has a pretty bad history of keeping up with standards. One often forgotten example is WebDAV, called Web Folders in Windows. Initially, a lot of people were very happy to see Microsoft implement it at the OS level, but very quickly, as the standard evolved, Microsoft's WebDAV implementation was not properly maintained, a sign that it was not seen as very important to their interoperability strategy, especially compared to .NET and SOAP. As time passed, Microsoft started replacing WebDAV with SOAP interfaces, and this is reflected in the current state of the integration between MS Office and SharePoint.

On the Java side, I believe there is currently no standard alternative to the JCR, and the alternative, vendor lock-in at the lower levels, is actually a little scary. We had a custom-built content repository for years before we integrated with the JCR, and maintaining it within our company was not the best way to focus our resources, since it should be common infrastructure, much in the same way that we would never think of re-implementing our own database.

When working on a WCM, you really need to be able to handle tree-like structures, with properties that vary depending on the position in the tree, and advanced features such as versioning, permissions, locking and structure definitions, in an API that is native to the language in which you are writing your WCM product. This is what the JCR is for and what it is good at. It is NOT a service API. It would be equally crazy to use CMIS within the implementation of a WCM, because each access to a content object would have to go through layers of transformations to handle the ATOM or SOAP calls. But if you are integrating loosely coupled data repositories, with a lower volume of calls, then using CMIS makes a lot more sense. This is why CMIS works well with files, but not as well for generic content objects.

So in conclusion, I think a lot of noise was generated around all this, but the good news is that standards do exist and are alive and well, and let's hope their evolution will make them even better. But the really interesting work happens above all this: making content easier to generate, curate, share and retrieve.

Open source makes your customers happier

I often get the question of why open source code is important, aside from the usual benefits of code review, security auditing, and the general idea that more eyeballs make for better implementations.

Well, where it really shines is during support, when you are investigating a bug.

Let's say that you have a bug, and that for once it is not in your code. It seems to come from a library used by your software. When this happens, for example when I'm working on an iPhone application, it usually takes a long time to find the method responsible for the bug. If the libraries are closed source, this can take even longer, simply because you are not sure what the dependencies between libraries are, and you can spend time trying to pinpoint the location of the problem. Once you have managed to track it down, all you can do is file a bug report with the author and hope it will be addressed. In the case of an iPhone application, this might take a long time, if it ever gets fixed at all (as bugs are prioritized by project managers).

In the case of an open source product, such as Jahia, you have the full source code, and you are free to modify it for your own needs, or to redistribute your modifications under the same license. This means that when tracking down a bug, I can not only find the origin of the problem more rapidly, but also, if I know how, correct it myself without having to rely on external resources that may not be available at the time. Also, when debugging, it is really great to be able to trace through the code to understand what is going on. Maybe the bug is in your own code after all, but seeing the source code of the external library helps you understand what was wrong in your own code a lot faster. You could also ask an external consultant with good knowledge of the code to work on it and help you fix the issue faster. Finally, if the issue was found in an external piece of software, you can contribute the fix back to its author.

So in the end, bugs get identified and fixed faster, you spend less time on them, and the customer gets an answer sooner: basically, everyone wins.

For me, this is one of the biggest commercial advantages of open source over closed source, and if it makes the customer happier, it means you will retain them.

 

Goodbye Google Wave, Google, please open source it.

I have just heard that Google Wave will be discontinued by the end of the year, and I must say my feelings are mixed. For me, the real reason for its failure is not user adoption, as Google and others seem to put it, but mostly execution.

Google Wave had a lot of technical hurdles, because it was very ambitious. For me, the most promising part of the technology was that Google announced they would, at some point, open source the back-end server and let everyone install Wave servers in their own infrastructure, much in the same way that people install mail servers. This is what was really great about Wave, not gadgets like real-time typing and correction, because it meant that the technology would be truly decentralized and that no one could "own" your information. In hindsight, this was actually very strange coming from Google, which tries to get as much of people's data as it can into its infrastructure. Maybe this is the real reason they cancelled the project? :)

Anyway, at Jahia we were initially very skeptical about the technology, and to be frank, when we got our first accounts we didn't really know what to do with it. We had all kinds of ideas for integrating it with Jahia, and we played a lot with wavelets, but apart from the basic fun, we never found good use cases for it. Then one of our developers really insisted that we use it as a brainstorming tool, and we quickly discovered how great it could be! It was really the killer application for this type of usage, and despite some hurdles (no notifications at first, and a lack of integration with browsers or desktop clients), we started using it regularly and became quite good at it.

So we really found a good way to use Google Wave, and despite its flaws we were happy with it. And now Google has decided to cancel the project. Fine, we'll go back to our old ways of doing things, maybe having learned a little about how to collaborate better.

I think that Google now has a real opportunity, if they are ready to take it, to fully open source the technology. Sure, there might be some Google-specific parts, or maybe even some missing parts, but both the GWT client and the server could be very interesting to a lot of developers, and some might even use them as a basis for building libraries that implement the Wave protocol. They already have some source code available on the Wave Protocol website, and they should just expand on that and make it into something easy to build and test locally, including the GWT client.

I hope this is what they will do, and who knows, maybe this could even save Wave ?

Thoughts on Adobe's planned acquisition of Day


As the web is still reacting to the news of Day's announced acquisition by Adobe, I couldn't resist sharing my own thoughts about this interesting new development in the field.

The first thing that comes to mind is a conflict of company ideologies. On one side is a company built around a single product with a partly open source model, where all the infrastructure is open-sourced as Apache Software Foundation projects, and on the other a company that couldn't be more traditionally closed source.

Which ideology will prevail? At this point I think no one can really tell. I believe the deal was probably reached because Adobe made some medium-term commitment to keeping the same strategy in the business unit that Day will join, mostly in an effort not to scare away some of the most valuable open source contributors, such as Roy Fielding or David Nuescheler, and probably also because Adobe knows that there are benefits to going in that direction.

But if we look historically at acquisitions in this field, it takes a very real and very strong commitment to open source to stay the course. Even we at Jahia know and appreciate that: we distribute our own full code under the GPL license. Open source makes a lot of sense for a small company like ours, but for larger companies like Adobe the commitment is more difficult, because it is not entirely rational; it is also ideological. Some large companies, such as Google, partly use it as a recruiting tool, since researchers usually prefer sharing knowledge and having independent peer reviews to working alone.

In a company like Adobe, or even Microsoft, going the open source route usually takes a strong will from key executives. This has partly happened at Microsoft, which has now started contributing to the Apache Software Foundation. Other examples include Novell and IBM, which were able to reinvent themselves by embracing open source.

Coming back to Adobe, only time will tell whether they will indeed commit to open source as a company in the long term. But I do think it will not be that easy for some Day employees to become part of a much larger structure that will change the way they have worked until now. This is true of any acquisition, not just this one. Hopefully they will help Adobe become a more open company.

What does all this mean for Jahia? Well, many things. First of all, it means that there is definitely great value in open source Java companies. It also validates the standards-based approach that has been part of our software from the start. It means that customers who want a very agile relationship with their solution provider will most likely come to us rather than to Adobe. If Adobe doesn't commit to open source in the long term, we will benefit by offering a more open product, and if they do continue Day's strategy over the long term, we will be able to collaborate with them to share and improve common infrastructure, technology and code. So all in all, I think this is really good news for us :)

So in conclusion, I would like to salute all my friends at Day, well done guys !

Debugging EHCache in a cluster

Yesterday I had to track down a bug in EHCache in a cluster installation, and wanted to use the EHCache remote debugger, as described here: http://ehcache.org/documentation/remotedebugger.html
 
It turns out the documentation wasn't very clear, and even less clear was where the package could be found. In fact, it can be retrieved here: http://sourceforge.net/projects/ehcache/files/ehcache-debugger/ (note that the debugger seems to be called either "remote debugger" or "debugger", but it's the same code base).
 
Now the tricky part was making it work with Jahia. The debugger works by actually joining the cluster as an EHCache cluster node. What the documentation doesn't tell you is that in order to participate in the cluster, it needs to be able to deserialize all the objects it receives in the cluster messages, which is why your application JARs must be on its classpath. Also, I have tested it with JGroups replication and it seems to work fine, so it can safely be used in setups other than RMI replication.
 
Another problematic part of the documentation is that the example command line mixes the -classpath and -jar command line options, which doesn't work with JDK 1.5: when -jar is used, the java launcher ignores the class path settings. So the example command line from the documentation will not work. Also, as is the case with Jahia, there may be a lot of application JARs, so it can be quite tedious to list them all. I put together a little shell script that automatically builds the classpath for a Jahia installation, shown here:
 
debug_ehcache.sh
--------------------------
 
#!/bin/bash
# Prints a colon-separated classpath made of every JAR in the given directory.
buildClassPath() {
  if [ $# -ne 1 ]; then
    echo "Jar directory must be specified." >&2
    exit 1
  fi
  jar_dir=$1
  class_path=
  # Iterate with a glob instead of parsing `ls`, so paths with spaces are safe
  for jar in "$jar_dir"/*.jar
  do
    [ -e "$jar" ] || continue   # no match: the pattern is left unexpanded
    if [ -z "$class_path" ]; then
      class_path=$jar
    else
      class_path=$class_path:$jar
    fi
  done
  echo "$class_path"
}
# Adjust these two paths to your own Jahia installation
JAHIA_LIBS=/Users/loom/java/deployments/jahia-6-0-hotfix/apache-tomcat-6.0.18/webapps/ROOT/WEB-INF/lib
JAHIA_SHARED_LIBS=/Users/loom/java/deployments/jahia-6-0-hotfix/apache-tomcat-6.0.18/lib
JAHIA_CLASSPATH=$(buildClassPath "$JAHIA_LIBS")
JAHIA_SHARED_CLASSPATH=$(buildClassPath "$JAHIA_SHARED_LIBS")
# Jahia JARs first, then the debugger and its dependencies from the current directory
CLASSPATH=${JAHIA_CLASSPATH}:${JAHIA_SHARED_CLASSPATH}:./backport-util-concurrent-3.1.jar:./commons-logging-1.0.4.jar:./commons-collections-3.2.jar:./jsr107cache-1.0.jar:./ehcache-debugger-1.5.0.jar
export CLASSPATH
# Pass the script's arguments (e.g. the cache name) on to the debugger
java net.sf.ehcache.distribution.RemoteDebugger ehcache-jahia_cluster.xml "$@"

-------------------
 
Before launching this, make sure you copy your EHCache configuration file, WEB-INF/classes/ehcache-jahia_cluster.xml, into the directory you run the script from. Also, as this file uses variables injected from the jahia.properties file, you will have to replace the variables with the real values, as in the example below:
   <cacheManagerPeerProviderFactory
       class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
       properties="connect=TCP_NIO(start_port=7870;bind_addr=127.0.0.1;loopback=true;recv_buf_size=20000000;send_buf_size=640000;discard_incompatible_packets=true;max_bundle_size=64000;max_bundle_timeout=30;use_incoming_packet_handler=true;enable_bundling=true;use_send_queues=false;sock_conn_timeout=300;skip_suspected_members=true;use_concurrent_stack=true):
       TCPPING(initial_hosts=127.0.0.1[7870],127.0.0.1[7871];port_range=10;timeout=3000;num_initial_members=2):
       MERGE2(max_interval=100000;min_interval=20000):
       FD_SOCK:
       FD(timeout=10000;max_tries=5;shun=true):
       VERIFY_SUSPECT(timeout=1500):
       pbcast.NAKACK(gc_lag=100;retransmit_timeout=3000;discard_delivered_msgs=true):
       pbcast.STABLE:
       pbcast.GMS(join_timeout=5000;shun=true;print_local_addr=true):
       VIEW_SYNC(avg_send_interval=60000):
       FC(max_credits=2000000;min_threshold=0.10):
       FRAG2(frag_size=60000)"
       propertySeparator="::" />
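Doing that substitution by hand is error-prone, so here is a minimal sketch of a helper that does it from a copy of jahia.properties. It assumes the placeholders in the XML use the `${name}` syntax and that the properties file only contains simple `key=value` lines (no continuation lines or escapes); the `expand_placeholders` function is hypothetical, and placeholders without a matching key are left untouched.

```shell
#!/bin/bash
# Hypothetical helper: print template_file with every ${key} placeholder
# replaced by the matching value from a simple key=value properties file.
# Only plain "key=value" lines are supported (no continuations or escapes).
expand_placeholders() {
  local props_file=$1 template_file=$2
  local content key value placeholder
  content=$(<"$template_file")
  # "|| [[ -n $key ]]" also processes a last line that lacks a trailing newline
  while IFS='=' read -r key value || [[ -n $key ]]; do
    key=${key//[[:space:]]/}                      # allow "key = value"
    [[ -z $key || $key == \#* ]] && continue      # skip blanks and comments
    value=${value#"${value%%[![:space:]]*}"}      # trim leading whitespace
    value=${value%"${value##*[![:space:]]}"}      # trim trailing whitespace
    placeholder='${'"$key"'}'
    # Quoted pattern and replacement: both are taken literally
    content=${content//"$placeholder"/"$value"}
  done < "$props_file"
  printf '%s\n' "$content"
}
```

For example, `expand_placeholders /path/to/jahia.properties /path/to/WEB-INF/classes/ehcache-jahia_cluster.xml > ehcache-jahia_cluster.xml` writes the resolved file into the directory the debugging script runs from.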

You can then start using the script to listen to a cache in a Jahia cluster installation. Here is an example command line:
 
./debug_ehcache.sh SkeletonCache
 

The output looks like this:
 
Received removeAll notification.
Cache: SkeletonCache Notifications received: 1 Elements in cache: 0
Cache: SkeletonCache Notifications received: 1 Elements in cache: 0
Cache: SkeletonCache Notifications received: 1 Elements in cache: 0
Received put notification for element [ key = 2-normal-en-administrators|administrators|guest|users-$$$#$#G_ContentPage_2WORKFLOWSTATE-normalLANGUAGECODE-en#$#G_SITE-2#$#G_USERNAME-administrators|administrators|guest|users, value=org.jahia.services.cache.CacheEntry@9d04dc, version=1, hitCount=0, CreationTime = 1252940741198, LastAccessTime = 0 ]
Cache: SkeletonCache Notifications received: 2 Elements in cache: 1
Cache: SkeletonCache Notifications received: 2 Elements in cache: 1
Cache: SkeletonCache Notifications received: 2 Elements in cache: 1
Received put notification for element [ key = 2-normal-en-guest:0-$$$#$#G_ContentPage_2WORKFLOWSTATE-normalLANGUAGECODE-en#$#G_USERNAME-guest:0#$#G_SITE-2, value=org.jahia.services.cache.CacheEntry@8caee7, version=1, hitCount=0, CreationTime = 1252940747195, LastAccessTime = 0 ]
Cache: SkeletonCache Notifications received: 3 Elements in cache: 2
Cache: SkeletonCache Notifications received: 3 Elements in cache: 2
Cache: SkeletonCache Notifications received: 3 Elements in cache: 2
Received put notification for element [ key = 302-normal-en-guest:0-$$$#$#G_USERNAME-guest:0#$#G_SITE-2#$#G_ContentPage_302WORKFLOWSTATE-normalLANGUAGECODE-en, value=org.jahia.services.cache.CacheEntry@535057, version=1, hitCount=0, CreationTime = 1252940754196, LastAccessTime = 0 ]
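When the debugger runs for a while, these lines quickly become hard to follow. As a rough sketch, the following function counts the notifications by type. It only relies on the "Received ... notification" lines visible in the output above, so any other message format would be ignored; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read RemoteDebugger output on stdin and print how many
# notifications of each kind were received, most frequent first.
# Only lines of the form "Received <kind> notification..." are counted.
summarize_notifications() {
  sed -n 's/^Received \([A-Za-z]*\) notification.*/\1/p' | sort | uniq -c | sort -rn
}
```

You could capture a session with `./debug_ehcache.sh SkeletonCache | tee skeleton.log` and, after stopping the debugger, run `summarize_notifications < skeleton.log`. For the sample output above it would report 3 put notifications and 1 removeAll.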
 

This can be a really neat tool to diagnose problems, or simply to get a feel for how Jahia uses EHCache to communicate within a cluster. I hope this little blog entry will help you use this debugger, because despite the tricky setup it is very useful and nicely designed.