I dream of the paperless office

I know pretty well it's something almost impossible to have; yet I'd like to have it, because right now I'm drowning in an ocean of paper. And paperwork as well.

While I have the Sony Reader to avoid dealing with tons of dead-tree books (although I still have quite a few, many of which I still consult), I had never before tried to clean up my archive of receipts, packing slips, and the like.

The time has come now, since I have to keep a fuller, cleaner archive of invoices sent and received for my new activity as a self-employed “consultant”. I decided to scan and archive away (in a plastic box in my garage, that is) all the job papers I had from before, as well as all my medical records and the rest of the archive. The idea was that by starting anew I could actually start keeping some kind of accounting of what I receive and spend, both for work and for pleasure. Together with the fact that it's less stuff to carry around with me, that makes two things that bring me nearer to actually moving out of home.

Sunday and Monday I spent about eight hours a day scanning and organising documents, trashing everything I'm not interested in keeping an original of (that is, stuff I'm glad I can archive, but that wouldn't be a big loss if I lost it), and putting the important stuff away in the plastic box (job and medical records, receipts for items still under warranty, and so on). I think I got through around 400 pages on a flatbed scanner without a document feeder, assigning a name to each, and switching between 150 and 300 dpi and between colour, grayscale and lineart scans.

I guess I'll try to keep my archive more up to date from now on by scanning everything as it arrives, instead of letting it pile up for twelve years (yes, I have receipts dating back twelve years, like the one for my first computer, a Pentium 133) and then trying to crunch through it all in a few days. My wrist is aching like it never has before, from the sheer number of sheets put on and removed from the scanner (I sincerely hope it's not going to give up on me, that would be a bad thing).

Now I'm looking for a way to archive this stuff that is quick and searchable. File-based structures don't work that well; tagging would work better, but I have no idea what to use for that. If anybody has a free-software-based solution for archiving (being queryable over the network is a bonus, working on Mac OS X with Spotlight is a huge bonus), I'd be glad to hear it.

I'm also going to try out some accounting software; I've heard good things about GnuCash but never tried it before, so I'm merging it right now. For now I don't have enough invoices to send out to justify writing my own software, but if there is something out there customisable enough, I'd be glad to bite the bullet and get to use it. Spending my free time working on software I need for work is not my ideal way to solve the problem.

Up to now I've worked only very low-profile, without having to invoice or keep records; luckily I have an accountant who can tell me what to do. But there are personal matters, including personal debts, credit cards and other expenses, that I finally want to take a good look at, so that I can pay them off as soon as possible and then start putting some money away for a car and a place to move to. Not easy, I guess, but that's what I hope to be able to do.

Frontends to command-line or libraries?

I know I'm still convalescing and should be resting, not thinking about development problems, but this ended up on my mind because there's one thing I absolutely have to do (scan the discharge papers from the hospital to send to my GP), and I'm missing an easy interface that I could teach my mother, or my sister, to use.

Don't get me wrong, I know there are a few SANE frontends, starting with xsane itself, but the problem is that I don't usually stop at scanimage when I work from the console. What I usually do is launch a batch scan to TIFF format, then use tiffcp to join the different TIFF files into a single multi-page file, and then use tiff2pdf to convert that to a PDF file, which can be opened on any computer I might need to send the data to (and is also quite a bit smaller than the original TIFF). Lately I've started trying to add unpaper to the chain, a tool that removes some of the defects usually found when scanning pages from books (black borders and the like); it works on PNM, thus requiring a change in the scanning command and a further conversion later on.
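For the record, the chain above can be sketched as a small shell function; the function name, the defaults, and the `RUN=echo` dry-run guard are my own additions here, not a fixed interface, and you'd have to adjust resolution and mode for your scanner (and splice unpaper in between steps if you scan to PNM instead):

```shell
# Sketch of the scan-to-PDF chain: scanimage → tiffcp → tiff2pdf.
# RUN=echo makes this a dry run that only prints the commands;
# set RUN= (empty) to actually execute them against a scanner.
RUN=${RUN:-echo}

scan_to_pdf() {
    name=$1
    dpi=${2:-300}      # 150 for rough documents, 300 for keepers
    mode=${3:-Gray}    # Lineart, Gray or Color

    # 1. batch-scan every page to numbered single-page TIFF files
    $RUN scanimage --batch="${name}-%d.tif" --format=tiff \
         --resolution "$dpi" --mode "$mode"

    # 2. join the single-page TIFFs into one multi-page TIFF
    $RUN tiffcp "${name}"-*.tif "${name}.tif"

    # 3. convert the multi-page TIFF into a smaller, portable PDF
    $RUN tiff2pdf -o "${name}.pdf" "${name}.tif"
}
```

Running `scan_to_pdf receipts` with the dry-run default just prints the three commands, which is handy for checking the parameters before committing a stack of paper to the scanner.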

But I don't want to start fleshing out how to actually write such a tool, or I might actually start writing it right now, which is not what I'm supposed to do while convalescing.

What I want to think about is the big debate between writing frontends for command-line tools, which interface with them by spawning processes, and writing a full-fledged program that interfaces with the low-level libraries to provide similar functionality.

I happen to discuss this quite often already, since xine and mplayer embody the two spirits: xine has its library, frontends interface with that, and a single process is used; mplayer instead has a command interface, and frontends execute a new process to play videos.

There are of course advantages and disadvantages to both. One easy-to-spot disadvantage of xine's approach is that a crash or freeze in xine results in a crash or freeze of the frontend, something Amarok users have unfortunately become familiar with.

In the case of the scanning toolchain, though, I guess it'd probably be easier to write a frontend for the tools, as re-implementing all the different functionality would be non-trivial work.

The disadvantage of doing it this way, though, is that you have to make sure the tools don't change their parameters between versions, otherwise ensuring the frontend keeps working correctly becomes a problem. Also, the tools need to provide enough options to control their task with granularity, which unpaper actually does, but that in turn makes them almost unusable for end users.

I know I'm not going anywhere much with this post, I'm afraid, but I just wanted to reflect on the fact that when you design a command-line tool to be used by frontends, you almost certainly make its syntax so complex that users fail to grasp the basic concepts, and in turn you need a friendlier command-line interface to the tool too… which is why there are so many scripts interfacing with ffmpeg for converting videos, I guess.

On the other hand, one can easily write such a frontend in a scripting language, even a graphical one, for instance with ruby-gtk2 (think RubyRipper).