The importance of standards

I was going through some CMS and portal software implementations yesterday, and looked at them from a standards point of view. You might wonder, why are standards important ? Isn't it easier to build something minimal that will do just the job ? Well, in terms of software engineering it probably seems to be, but you end up with a very proprietary system, and using standards doesn't necessarily mean that you will have to write more code.
 
One comparison that is often mentioned nowadays is the one between SQL and CMIS. Some call CMIS the "SQL for content", while others (like my good colleague Stéphane) view it more as the SQL for file systems :) But anyway, what do these standards really bring us, except for the hassle of implementing them and the even harder task of testing interoperability ?
 
I think one of the best examples in this area is what has happened in the browser world. Browsers would never have existed if it wasn't for standards. When Mosaic was started, there wasn't really a standard for HTML; the specification was written based on the implementation, but that's ok. When Mosaic gave way to Netscape, the new browser added a lot of extensions to HTML, like layers, that weren't part of the standard, and when Internet Explorer came to the market, it had a different implementation of layers. So basically for a while, people building web sites either had to do two versions of their websites, or just refrain from using layers until they were standardized.
 
So what did people gain from the standards ? Well on the end users' part : choice. On the web developers' side : less work. On the browser's side : a very good API to implement and especially to test against ! The last point is very important; it is one of the reasons that Java became such a success. The API was clearly a "standard", established through the Java Community Process, to make sure that various implementations would comply with the same API. Sure there were some glitches here and there, but globally it was quite a success.
 
Giving customers choice is not necessarily something they will immediately understand, but it is good for them. When Internet Explorer became the dominant browser, the best way to attack it was through its less than compliant implementation of CSS. Suddenly web developers started complaining about high development costs and were building better looking layouts on other browsers, and Firefox became a very strong competitor.
 
In the CMS market, this standardization is still in the process of happening, but it is nonetheless important. Customers must understand that the investment they are making in their content management system must include the possibility to import and export the content easily, and especially to interoperate between various content systems. This is especially true when you go into the semantic web, where content systems will need to create semantic links across vendors, and that is still a bit of a pipe dream at this moment.
 
Some standards are de-facto standards. When these de-facto standards are actually owned and controlled by a single company, this is more of a problem. Look for example at Microsoft's Internet Explorer or the iPhone's AppStore. Both these de-facto standards are really creating frustration on both the user's and the developer's sides. In the iPhone's AppStore for example, the end-user cannot use applications that run in the background or fully interact with the phone, the applications only run on the iPhone or iPod Touch, and any complaints are completely ignored by the company because it cannot handle the sheer volume of customer requests. On the developer's side, the closed platform means that you can spend 6 months developing an application that will never be accepted. Again, the fact that a standard is de-facto doesn't guarantee its success.
 
But when the de-facto standard is established by an open-source foundation such as the Apache Foundation, things can be very different. Even in the case of SpringSource, fathers of the excellent Spring Framework, the de-facto standard can become a really strong force. So the combination of de-facto and open source is a really powerful one, especially if the implementation has a large public. But what is even better is a real standard with open-source implementations, like the Apache Jackrabbit or Apache Chemistry projects. Even if the projects might run out of steam one day, the standards on which they are based will still be there, so the legacy can still be understood and the interfaces will remain available for future developers to interact with the systems.
 
In the aviation industry, they do the exact opposite of what most of the software industry does: they only use "old" bricks that they know have been proven to work, and adhere very strictly to standards and specifications. This allows constructors to protect human lives from harm. This has an impact on cost of course, but that is also because they are doing this mostly within a single company, and because of the materials and manpower needed. But there is software running in airplanes and space shuttles, so standards and high reliability don't need to be incompatible with the business of building software.
 
The real hard part is building software quickly and reliably, without incurring too many costs, and this is where the open source community comes into play. Some projects in the Apache Foundation have been incredible in that regard. The Apache Jackrabbit project is such an example, the Spring Framework is another, and there are many such stories of high quality software, adhering to standards, that have been developed much quicker than ever before. But they wouldn't be interesting if they didn't adhere to a standard, be it de-facto (because of community size and open-source) or "real".
 
Jahia was born out of a proprietary content management system, and is moving all of its sub-systems to be built on top of both the "real" and de-facto standards. It builds value on top of bricks that have been proven to be reliable, modern and standard such as the JCR, the Portlet API 2.0, WebDAV, GWT, the Spring Framework, Log4J, and many more. We are working on integrating CMIS and possibly other new standards (such as the work being started in the IKS project). Of course the real value to the customer is not in the bricks, but in the building you have constructed using the bricks.

IE 6 not supported in SharePoint 2010, even Microsoft likes Safari :)

At Jahia, we constantly run into the problem of IE 6 support, which (unfortunately) is still a requirement for us. But I couldn't suppress a huge smile when I read the following excerpt from this page : 
"SharePoint Server 2010 won’t support Internet Explorer 6. From the SharePoint Team blog: SharePoint 2010 will be “targeting standards based browsers (XHTML 1.0 compliant) including Internet Explorer 7, Internet Explorer 8 and Firefox 3.x. running on Windows Operating Systems. In addition we’re planning on an increased level of compatibility with Firefox 3.x and Safari 3.x on non-Windows Operating Systems,” according to the SharePoint Team Blog."
So basically even Microsoft is now phasing out IE 6 support and improving compatibility with Safari ! I love the "targeting standards based browsers", clearly implying that IE 6 is far from it, which I think nobody in their right mind will contest.
The real problem is that according to a Forrester Research report, IE 6 still has a 60% market share in the enterprise. I believe that the only real way that we can get this situation to change is for companies to introduce a browser policy, and to possibly use Firefox as the "new browser", and whatever version of IE is required for "legacy applications". But the reality is that IE 6 must die, and if the newer versions of IE are not interesting enough, then users should switch to other implementations, like Firefox, Safari or Chrome.
Another interesting point, although less surprising, is that CMIS is confirmed for the next version of SharePoint.


AtomPub a failure ? No, just not as good as JSON :)

I found a very interesting blog post by Joe Gregorio entitled AtomPub is a failure. Well of course the title is there to get your attention, but the underlying argument is interesting : we now have other options for interchange formats.
We definitely live in interesting times, where so much focus on ease of integration is moving interoperability a *lot* faster. This can be a problem for standards that are trying to keep up, and it is mostly why I strongly believe that it can be a good thing for implementation to drive specification, as in the case of Apache Chemistry, the Apache project to build a strong CMIS implementation. Standards are very important, but in the end people also want and need strong reference implementations of those standards. After all, even HTTP was specified together with its first implementation :)
JSON is incredibly powerful because it is so simple. I can explain it to a junior developer in a few minutes. I have trouble discussing XML formats with senior developers. Sure JSON is not as complete, but it also serves a lot of purposes. Don't get me wrong, I'm not saying that we must get rid of XML, it is here to stay and quite useful, but I'm mostly talking about simplicity.
Another aspect of JSON that most tend to ignore is that with its simplicity also comes its efficiency. It is very hard to find another interchange format that is more compact and easier to parse. In web systems that are constantly trying to handle bigger loads, the cost of XML processing can be avoided by using JSON.
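To make the compactness argument a bit more concrete, here is a minimal Python sketch that serializes the same small record both ways and compares the lengths. The record, its field names and the element-per-field XML layout are all assumptions made up for this illustration; the actual difference depends heavily on the data and on the XML vocabulary used.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document metadata, used only for illustration.
record = {
    "name": "report.pdf",
    "author": "jdoe",
    "size": 48213,
    "tags": ["finance", "draft"],
}


def to_json(data):
    """Serialize the record as compact JSON (no extra whitespace)."""
    return json.dumps(data, separators=(",", ":"))


def to_xml(data):
    """Serialize the same record as a simple element-per-field XML document."""
    root = ET.Element("document")
    for key, value in data.items():
        if isinstance(value, list):
            # Lists become a parent element with one <item> child per entry.
            parent = ET.SubElement(root, key)
            for item in value:
                ET.SubElement(parent, "item").text = str(item)
        else:
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(record)
xml_text = to_xml(record)
print("JSON length:", len(json_text))
print("XML length:", len(xml_text))
```

For this particular record the JSON form comes out shorter, mostly because XML repeats every field name in a closing tag. It also keeps numbers and lists as native types, whereas the XML version here flattens everything to text.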
I really hope that the CMIS committee will approve the upcoming proposal to introduce JSON as a binding, even if it means using it as an AtomPub interchange format in a first version, but I would much prefer that they go the whole way and make it an extremely simple binding that can be implemented by many in very little time. They have so far had this focus, so I'm pretty sure they are interested in this idea.
Oh and yes I am thinking about joining the OASIS TC :)


Will CMIS get a JSON binding ?

During the CMIS PlugFest, an idea that was mentioned by a few participants consisted of using JSON as a transport mechanism for CMIS repository operations. The idea was to be able to easily develop thin clients such as Javascript or PHP clients, and to avoid the pains of generating or parsing XML data.
I think this might potentially be the "revolutionary" binding in the specification, one that really has the potential to surpass all the others over time. Sure it is absolutely great (and at the same time a double-edged sword since it means more work for implementors) that CMIS has all these bindings for a first version, but I think that we will eventually see a natural selection where only the easiest to integrate with all technologies will survive.
It takes one line of PHP (5.2) to serialize/deserialize JSON data (json_encode), and there are high quality libraries available for all the modern languages out there. Personally I've implemented JSON for an iPhone application in Objective C and it proved to be the most efficient way of transferring UTF-8 data over the wire.
The hard part of course is that JSON does not specify structure, only serialization format, and therefore it would be very important that the CMIS specification strongly describe the structure of the JSON payloads that are going back and forth between the client and the server.
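Here is a rough Python sketch of what "describing the structure" could mean in practice for a client: the parser accepts any valid JSON, so the agreed field names and types have to be checked separately. The field names below (objectId, name, baseType, properties) are purely hypothetical placeholders of mine and are not taken from the CMIS specification.

```python
import json

# Hypothetical field names and types; a real binding would have to
# define these in the specification itself.
REQUIRED_FIELDS = {
    "objectId": str,
    "name": str,
    "baseType": str,
    "properties": dict,
}


def parse_object(payload):
    """Parse a JSON payload and check it against the agreed structure."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError("missing field: " + field)
        if not isinstance(data[field], expected_type):
            raise ValueError("wrong type for field: " + field)
    return data


good = '{"objectId": "42", "name": "a.txt", "baseType": "document", "properties": {}}'
print(parse_object(good)["name"])
```

A payload such as `{"name": "a.txt"}` is perfectly valid JSON but is rejected here, which is exactly the gap a specification has to close so that every client and server agrees on what goes over the wire.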
JSON handles complex serialization a lot more elegantly than SOAP web services, and is simple enough for anyone to understand in a matter of minutes. Wouldn't that be a great basis for the content management interoperability standard ?
I know that a few people involved in the CMIS specification are already hard at work on proposing JSON as a binding for the 1.0 specification. This will be tricky since the deadline for the approval of the specification cannot change, so it means that all this must happen without slowing down the rest of the work. This will be a challenge, but I'm really interested in helping out, so please let me know how :)
I dream of a world where I can directly interface a lightweight Javascript CMIS client with any CMIS compliant server. My mind is racing with the possibilities...


Unlocking the full potential of Apache Chemistry : C++, C#, PHP, Javascript, (insert your favorite language here)

The Apache Chemistry project, the incubator project that was just approved, has incredible potential. Started as a place to experiment with a Java implementation of the CMIS specification, it can become much more. There are already implementations out there in Javascript and C++, although not yet contributed to the project, but this might happen sooner than you think.
The hardest thing to achieve in any standard is true interoperability. But what if everyone was using the same code base from the Apache Foundation, freely available to businesses ? What if all this code was fully tested by automated integration test matrices ? Even achieving this is a challenge in itself but it is possible, and I really think it should happen. My dream, although probably unrealistic, is that even major vendors such as Microsoft, IBM & EMC could use the code developed in Chemistry as the basis for their CMIS implementation, and contribute back to the project whenever they see problems.
A lot of what is happening with CMIS is reminiscent of the SOAP craze. After all SOAP is used as a binding for CMIS, so it is quite normal to see similarities. One of the biggest interoperability problems for SOAP lay in the potentially complex serialization of custom objects, and this proved in real life to be very difficult to get to work between implementations. It is a testament to the great work of the contributors to the Apache projects related to SOAP implementations (Axis, and before that Apache SOAP) that you could get web services to really talk to each other.
The Apache foundation is the perfect place for such standards to be implemented and grow as the basis for the network infrastructure of the entire industry. It is a place where even competing interests can find common ground for sharing development costs.
Of course it's not necessarily easy to directly use Apache-licensed code inside corporations such as Microsoft or IBM, as they are very concerned about code auditing, especially as they are often the target of copyright infringement lawsuits, but at least IBM is known to use such code, and in some cases simply packages Apache products (such as the IBM HTTP Server). So we know that even if it is not trivial to achieve, it is possible. And for Microsoft, well maybe this will convince you ? http://port25.technet.com/archive/2008/07/25/oscon2008.aspx
I'm dreaming of an Apache Chemistry project with the following implementations available to all : Java, PHP, C#, C++, Javascript. Then of course you could have more such as Ruby, Python or whatever else you love, but the initial list would be perfect for integration with most systems, and provide truly interoperable systems, not just at the specification level, but truly at the implementation level.
Maybe it's just a pipe dream, but it is possible, so maybe we should get together and try ?


CMIS PlugFest interoperability demo videos

While at the CMIS PlugFest held in Basel (Switzerland) in April 2009, I managed to quickly get my Flip out to record some videos of the demonstrations of the interoperability between client and servers. The videos you will see here are organised by CMIS client implementations. Please excuse the low quality of the audio and the sometimes difficult to read screens, I hadn't planned to do this but I thought it might still be interesting to watch for all that weren't present. I highly recommend, if you have the bandwidth, that you view the videos in HD, it will be easier to read the screens.
So here are the videos :
 
SourceSense CMIS Portlet client

 
Flex CMIS Explorer

 
Javascript Chemistry CMIS client

 
Alfresco JUnit CMIS Testsuite

 
OpenText Windows CMIS integration

 
SAP ECM Explorer CMIS plug-in

 
Unfortunately I don't have a video of myself (I was first to talk and it's kind of awkward to talk and take a video of yourself :)) talking about the integration I wrote during the CMIS PlugFest, but what I said was mostly that it could connect to and navigate both the Jackrabbit Chemistry and Alfresco repositories. It is a goal for Jahia to work with Apache Chemistry to be able to talk to CMIS repositories, and also to expose its own repository through Apache Chemistry.
Let me know if you thought these videos were helpful, I might make some more regularly in that case.


CMIS PlugFest : Day 2

The second day of the CMIS PlugFest was less about setting things up than actually trying to fill the grid of possible tests (over 28 possible tests) that cover all the combinations of client and server implementations that were available there.
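As a small illustration of what such a grid looks like, here is a Python sketch that builds one cell per client/server combination and tracks which ones still need testing. The client and server names are just a few of those mentioned in these posts, the recorded result is made up, and the lists are not the actual PlugFest grid.

```python
from itertools import product

# Illustrative names only; this is not the actual PlugFest grid.
clients = ["Chemistry Javascript", "CMIS Portlet", "Flex Explorer"]
servers = ["Alfresco", "Day CRX"]


def build_grid(client_names, server_names):
    """Return one cell per (client, server) pair, initially untested."""
    return {pair: "untested" for pair in product(client_names, server_names)}


def remaining(grid):
    """List the combinations that have not been tested yet."""
    return [pair for pair, status in grid.items() if status == "untested"]


grid = build_grid(clients, servers)
grid[("Flex Explorer", "Alfresco")] = "pass"  # hypothetical result
print(len(grid), "cells,", len(remaining(grid)), "still untested")
```

The number of cells is simply the number of clients multiplied by the number of servers, which is why the grid grows quickly every time a new implementation shows up.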
I was myself hacking most of the day to get a brand new CMIS client implementation up and running, and I got the first unit tests working against both the Alfresco CMIS server and the Day CRX server. I even managed to navigate through the whole repositories. I then went on to work on the integration with Jahia, to be able to browse those repositories directly from Jahia's file manager. Unfortunately, despite the amount of code that was produced in the last 24 hours, it just wasn't possible. Despite this I believe this is really close to happening, and we should have something experimental to test against the various servers quite soon.
The good news of the day came from Jukka Zitting, who was visiting the PlugFest and announced that Apache Chemistry was officially accepted by the Apache Foundation as an incubator project, so over the course of the week-end Florent Guillaume, from Nuxeo, who contributed most of the code for Chemistry, will finally be able to commit all this work and let the community hack away at it.
We were also joined during the day by IBM, who was also providing a server against which to test. We had very interesting discussions about what was needed in the CMIS specification. A few of us believed that CMIS should really make use of JSON, and it seems that a proposal for going in that direction might just happen, but it must happen fast, as there is a lot of pressure to get CMIS 1.0 out the door soon.
The idea behind using JSON with CMIS is quite a natural one. Why have to deal with all the XML parsing involved in AtomPub and Web services when you could use the extremely simple JSON format to exchange all the data you need ? I really hope that this will be added to CMIS as this could make it one of its most flexible bindings. It is much simpler to develop lightweight CMIS clients without the need for XML parsing. I'm mostly thinking of PHP, Javascript and other technologies.
In the interoperability tests, search was still a problem for many clients and servers, and this was of course expected as it is the most complex part to implement. This also means that it will be necessary in the near future to perform another interoperability PlugFest to go further into the testing and make sure that even search works properly.
The day was concluded with demonstrations of a few combinations of client and servers. Among the demonstrated clients were : Chemistry Javascript against Day CRX + Chemistry, SourceSense's CMIS Portlet against Alfresco, the CMIS Flex Explorer against Alfresco (which was also the client that worked against most of the servers !), OpenText's C++/Java client that integrates with Windows Explorer, Outlook and MS Word, and last but not least SAP's client that plugs into an existing infrastructure of explorer tools that plug into various SAP backend repositories (CMIS or not) and also exposes all the various back ends as WebDAV resources. This made me smile because there has been a lot of discussion about whether CMIS should have a WebDAV binding, and basically SAP was demonstrating WebDAV on top of CMIS :)
During the demonstrations, a lot of discussions were focused on making sure that the points that were problematic to implementors were noted and then discussed in the OASIS TC. The first point was the upload mechanism for binary data, which is not very clear when using the AtomPub binding: there are two ways of doing this, which is confusing to client implementors. The second point concerned path navigation, and the fact that CMIS currently doesn't offer a way to look up a content object by its path. Currently, when given a path, the CMIS client implementations must navigate from the parent path down through all the descendants to find the object corresponding to the path, which is not very efficient.
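To show why the path navigation issue matters, here is a minimal Python sketch of a client resolving a path by walking down from the root, one folder listing per path segment. The in-memory repository, the object ids and the get_children function are hypothetical stand-ins I made up for this illustration; they are not part of any CMIS binding.

```python
# Hypothetical in-memory repository: each folder id maps to its children
# as {name: object id}. A real client would query the server instead.
CHILDREN = {
    "root": {"docs": "f1", "images": "f2"},
    "f1": {"specs": "f3", "readme.txt": "d1"},
    "f2": {},
    "f3": {"cmis.pdf": "d2"},
}


def get_children(folder_id):
    """Stand-in for one round trip to the server listing a folder's children."""
    return CHILDREN.get(folder_id, {})


def find_by_path(path, root_id="root"):
    """Resolve a path by walking down from the root.

    Returns (object id or None, number of round trips needed).
    """
    current = root_id
    round_trips = 0
    for segment in [s for s in path.split("/") if s]:
        children = get_children(current)
        round_trips += 1
        if segment not in children:
            return None, round_trips
        current = children[segment]
    return current, round_trips


object_id, trips = find_by_path("/docs/specs/cmis.pdf")
print(object_id, "found after", trips, "round trips")
```

Each path segment costs one listing request in this sketch, so a deep path means many round trips, where a direct lookup by path on the server side would need only one.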
It will be very interesting to see if these points can be addressed efficiently to avoid sidetracking the specification. It seems that major vendors such as IBM, Microsoft and EMC want CMIS to be completed in Q3 2009, so this doesn't leave much time to solve issues.
All in all, I'd like to thank Day for hosting this PlugFest. This event, although it was not too certain initially how it would turn out, proved to be really necessary, and gave a really good picture of the real state of CMIS interoperability. It is certain that this must happen again, when implementations are more mature, and especially when Apache Chemistry is fully available and ready to be tested. I would also like to thank Dave Caruana (Alfresco), who has been really great, helping me out with getting my tests up and running.


CMIS PlugFest : Day 1

I'm currently in Basel, participating in the CMIS PlugFest. For those of you that are not familiar with CMIS, you might think of it as being to content management (well mostly document management right now, but that might change) what SQL is to databases. It is a standard that will hopefully help interoperability between content management systems.
Day 1 started with some informal introductions, and setup of the existing servers. We now have OpenText, Alfresco and Jackrabbit+Chemistry servers up and running, and are running interoperability tests against those using a variety of clients : OpenText, which has a C++ plugin that connects Windows Explorer as well as Office tools to CMIS back ends, SAP, Alfresco test units, two Flex-based clients (CMIS Explorer and CMIS Spaces), an Apache Chemistry CMIS Javascript client written by David Nuescheler at Day, and an Apache Chemistry Java client that Florent Guillaume (Nuxeo software) is working hard to commit, hopefully before the end of the week.
It's really great to see so much effort going into interoperability tests. The most interesting thing about CMIS is the momentum behind it, more than the technology, that will probably still evolve over the year. It is also very important to get the first version of the specification out as early as possible, because so many specifications fall into tech-limbo, to never be completed (802.11n, etc...).
Apache Chemistry is also looking better than ever, having just been accepted into the incubator at Apache, and the code will probably be committed over the week-end. From then on hopefully the community will be able to have a look at it. This will also help interoperability, as even major vendors could use this code base to ensure compatibility.
