Wednesday, April 25, 2007

What I am doing these days

These are the two things I am doing these days, apart from my regular work at the office.

1. Developing the next version of HarvestMan
2. Developing the next version of HarvestMan

Well, that is right... It was not a typing mistake :-)

HarvestMan is going to get a major update this coming May, and it will be the result of more than 1.5 years of work. In fact, the program has changed so much that I am changing the major version. It is going to be HarvestMan-2.0.

There are certain surprises in this HarvestMan release. One of the interesting changes for NLP and Computational Linguistics programmers is the addition of a plugin API that makes developing extensions for HarvestMan a breeze. In fact, the current CVS of HarvestMan already features an extension which binds it to an existing open source indexing engine. Apart from that, the program features a number of changes from the earlier version (1.4.6), so that it is well on its way to becoming a "platform for web crawler software development", which is what I envision it to be.

Apart from that, this time HarvestMan will consist of two apps in one - that's right. There will be two applications using the same codebase: the crawler application (HarvestMan, of course) and a brand new web downloader application which supports multipart downloads. Let the name of this application be a mystery for the time being. The application just might change the command-line download experience of Unix/Linux users from the typical wget one. I will write more about it in the coming weeks.
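For readers unfamiliar with the idea, a multipart download generally means splitting a file into byte ranges and fetching each range separately (over HTTP, typically with a Range request header). The snippet below is only a minimal sketch of that splitting step, not the code from HarvestMan or the new downloader; the function names `split_ranges` and `range_header` are made up for illustration.

```python
def split_ranges(total_size, parts):
    """Split total_size bytes into contiguous, inclusive (start, end) ranges.

    Hypothetical helper for illustration; not part of HarvestMan.
    """
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    # Never create more parts than there are bytes.
    parts = min(parts, total_size)
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` parts carry one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Build the HTTP Range request header for one inclusive byte range."""
    return {"Range": "bytes=%d-%d" % (start, end)}


if __name__ == "__main__":
    for start, end in split_ranges(10, 3):
        print(range_header(start, end))
```

Each range could then be fetched by a separate worker and the pieces written back in order, assuming the server honours range requests.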

In fact, all of this is already in CVS. Anyone interested can check out the latest code from the BerliOS repository using anonymous CVS. There is not much documentation apart from the documentation in the code, but the code is pretty stable at the moment.

This should get released by mid-May.

Tuesday, April 24, 2007

Open source and Innovation

What makes a successful open source project? What makes a successful open source business? How do successful open source projects make the transition to a good business that still keeps with the spirit of open source and open standards?

These are questions any developer who is a serious contributor to open source would be interested in. Even if you are not an open source contributor, you will be interested in these questions, if your company is working in open source.

In my experience so far, with my own projects and with some of the international projects I have participated in, a successful open source project brings some new ideas to the table. New ideas should not be confused with a new way of doing things or new I.P. It can be a new implementation of an existing protocol, it can be an open implementation of a proprietary standard, or it can be a project that uses existing open source components or applications to solve an existing problem (or a new one) in interesting and innovative ways. These need not always generate new kinds of intellectual property.

Yes, the key here is innovation. Good open source projects bring a fresh way of solving existing problems; they give a fresh perspective on existing ways of doing things. Sometimes they are able to rewrite the rules by capturing the imagination of many hundreds of developers and thousands of supporting community members - a good example is the Firefox community. Sometimes, it will be a rather closeted group of skilled people in a rather niche area that finds a void in the experience of open source applications/operating systems and tries to fill the gap - a good example is the Beryl/Compiz projects, which are working hard to bring display compositing to the Linux and open source crowd.

However, a common thread to all these projects is this - they innovate. They innovate in fresh ideas, in simplifying the user experience and sometimes in performance. They often open up an entirely new facet of an existing problem, which makes programming a joy.

What do successful companies in open source have in common? They understand the importance of keeping the developer crowd happy. They are keen to become good citizens of the open source community and contribute either their manpower or projects to the community - some do both. They understand that it is important not just to be consumers of open source but also stakeholders and participants.

When a company fails to understand this, or fails to create a working, effective developer policy towards open sourcing, it is prone to be treated as a second- or third-rate citizen in the open source community. By just being a consumer of open source and not contributing enough, it risks alienating the coding crowd, which tends to think of such a company as a predator, not as an ally.

Most often, such companies never learn to use open source the right way either. By not participating enough, they fail to understand the driving force behind open source and the people working in such projects. This in turn makes them less effective users of such software. For example, a company that brands itself as an open source integrator can never be quite effective if it does not understand the open source projects it is integrating and does not contribute to such projects. In fact, it is not even necessary to contribute developer resources directly most of the time. Indirect participation, such as hosting meetings, contributing tools and toolchains, and providing a platform for discussion and the creation of new ideas, is also a good way of contributing.

A company not doing any of these and still claiming to work in open source is not doing the right thing. Such strategies are doomed to fail in the long term and can even prove counterproductive. In the long run, a company like this is bound to move away from open source or bound to fail.