These are the two things I am doing these days, apart from my regular work at the office.
1. Developing the next version of HarvestMan
2. Developing the next version of HarvestMan
Well, that is right... It was not a typing mistake :-)
HarvestMan is going to get a major update this coming May, the result of more than 1.5 years of work. In fact, the program has changed so much that I am bumping the major version. It is going to be HarvestMan-2.0.
There are a few surprises in this HarvestMan release. One of the interesting changes for NLP and Computational Linguistics programmers is the addition of a plugin API that makes developing extensions for HarvestMan a breeze. In fact, the current CVS of HarvestMan already features an extension which binds it to an existing open source indexing engine. Beyond that, the program has changed a great deal since the earlier version (1.4.6), so it is well on its way to becoming a "platform for web crawler software development", which is where I envision it going.
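To give a rough idea of what a plugin API buys you, here is a minimal sketch of the general hook pattern. It is not the actual HarvestMan plugin API: the HookRegistry class, the "page_fetched" event name and the index_page function are all made up for illustration, and the "indexer" is just a dictionary.

```python
class HookRegistry:
    """Maps event names to the callbacks registered for them.

    Hypothetical class for illustration only, not part of HarvestMan.
    """

    def __init__(self):
        self._hooks = {}

    def register(self, event, callback):
        # Several extensions may listen to the same event.
        self._hooks.setdefault(event, []).append(callback)

    def fire(self, event, *args):
        # Call every callback for the event in registration order
        # and collect the return values.
        return [callback(*args) for callback in self._hooks.get(event, [])]


# Stand-in for an indexing engine: maps each URL to its sorted unique words.
indexed = {}


def index_page(url, text):
    """Hypothetical extension: index the words of a fetched page."""
    indexed[url] = sorted(set(text.lower().split()))
    return url


registry = HookRegistry()
registry.register("page_fetched", index_page)

# The crawler core would fire the event after each download.
fired = registry.fire("page_fetched", "http://example.com/", "Hello crawler hello")
```

The point of the pattern is that the crawler core only knows about events, so an extension such as an indexer binding can be added without touching the crawling code.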
On top of that, this time HarvestMan will consist of two apps in one - that's right. There will be two applications sharing the same codebase: the crawler application (HarvestMan, of course) and a brand new web downloader application which supports multipart downloads. Let the name of this application be a mystery for the time being. It just might change the command-line download experience of Unix/Linux users from the typical wget one. I will write more about it in the coming weeks.
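For anyone wondering what "multipart download" means in practice: the usual approach is to split the file into byte ranges, request each range separately with an HTTP Range header, and join the pieces in order. The sketch below shows only that bookkeeping, with no network access. It is not code from the new downloader, and the function names split_ranges, range_header and reassemble are made up for this example.

```python
def split_ranges(total_size, parts):
    """Split total_size bytes into contiguous inclusive (start, end) ranges.

    Returns at most `parts` ranges; any remainder bytes are spread over
    the first ranges so that sizes differ by at most one byte.
    """
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start, end):
    """Build the HTTP Range header for one inclusive byte range."""
    return {"Range": "bytes=%d-%d" % (start, end)}


def reassemble(chunks):
    """Join downloaded chunks, given as {start_offset: bytes}, in offset order."""
    return b"".join(chunks[offset] for offset in sorted(chunks))


# Simulate the download locally: slice a byte string instead of hitting a server.
data = b"0123456789"
ranges = split_ranges(len(data), 3)
chunks = {start: data[start:end + 1] for start, end in ranges}
result = reassemble(chunks)
```

In a real downloader each range would be fetched by its own connection or thread, and the server has to support range requests for this to work at all.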
In fact, all of this is already in CVS. Anyone interested can check out the latest code from the BerliOS repository using anonymous CVS. There is not much documentation apart from what is in the code itself, but the code is pretty stable at the moment.
This should get released by mid-May.