PDFs and Metadata

You might remember that a few months ago I was thinking about archiving data. So far I have only gone as far as scanning the documents to PDF (trying to keep up with the inbound flow of paper), so that I could have easier access to them and get rid of the large amount of useless paper around the house.

The experiment seems to be working out decently well so far: the amount of paper around the house has started to drop, and at the same time I’ve been able to archive most of my stuff in a decent way just by using proper paths. Unfortunately, things are now starting to get more complex.

What I need now is some method to arbitrarily tag PDF files. The archive is all in PDF: while Stuart noted that TIFF would also be a decent way to store the data, TIFF files sometimes don’t display correctly on OS X, and since I mix operating systems I needed something that worked on both. Obviously, I also need an easy way to get the data back out by searching for those tags.

I have been told that XMP from Adobe should do what I need. I remembered the name of the technology, and I’m pretty sure that the way it was designed allows for what I’m looking for. The problem is whether there is software that lets me write the kind of metadata I need; I’m not really keen on writing my own right now.

There is also the other problem of finding the data. I remember that some years ago Beagle could be used for on-disk document search. I also remember, though, that it was tremendously heavy, eating up lots of CPU and RAM, and only partly because of Mono; the rest was down to Beagle itself. Does anybody know whether it has improved, or can anybody suggest alternative software that does something similar? I tried merging Tracker, but it doesn’t seem to be interested in indexing anything on my system, and I have no idea why…

Ideally, I’d like something where searching for “H3G July 2009” would find the PDF with the cellphone bill for July 2009, and searching for “Amazon Office 2007” would find the invoice for Office 2007 from Amazon UK. I’m fine with writing my own descriptions for the files to get the right one.
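
To make the kind of lookup I have in mind a bit more concrete, here is a minimal sketch in Python. It assumes the descriptions live in a plain dictionary keyed by file path; the paths, the descriptions and the function names are all made up for illustration, and this says nothing about how XMP, Beagle or Tracker would actually store or query the data. A file matches when every word of the query appears somewhere in its description, ignoring case.

```python
def matches(query, description):
    """Return True if every word of the query occurs in the description."""
    haystack = description.lower()
    return all(term in haystack for term in query.lower().split())


def search(query, catalog):
    """Return the paths whose hand-written description matches the query."""
    return [path for path, description in catalog.items()
            if matches(query, description)]


# Hypothetical archive: path -> description I would write myself.
catalog = {
    "bills/h3g-2009-07.pdf": "H3G cellphone bill July 2009",
    "bills/h3g-2009-08.pdf": "H3G cellphone bill August 2009",
    "invoices/amazon-office.pdf": "Amazon UK invoice Office 2007",
}

if __name__ == "__main__":
    print(search("H3G July 2009", catalog))
    print(search("Amazon Office 2007", catalog))
```

The matching is a plain substring check, so it is only as good as the descriptions; a real indexer would presumably do something smarter.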

If somebody has suggestions, they are definitely welcome. Thanks!
