This Time Self-Hosted

Stash your cache away

While I’m now spending a week away from home (I’m at my sister’s family place, while she’s at the beach), I’ll still be working, writing blog posts, and maybe taking care of some smaller issues in Gentoo. I’m just a bit hindered because while I type on the keyboard I often click something away with the trackpad; I didn’t think about getting a standalone keyboard. I guess if somebody wanted to send an Apple Bluetooth keyboard my way, I wouldn’t say no.

While finally setting up a weekly backup of my /home directory yesterday, I noticed quite a few issues with the way software makes use of it. The first thing, of course, was to find the right software to do the job; I opted for a simple rsync in cron. After all, I don’t care much about having multiple incremental backups à la Time Machine; a single weekly copy of my basic data is good enough.

The second problem was that, some time ago, I had found that a 4GB USB flash drive was enough to hold a copy of my home, but when I looked at it yesterday, I found it was well over 5GB. How did that happen? Some baobab later, I found the problems. On one side, my medical records (over 500 pages), scanned with a high-grade all-in-one laser printer (no, not by me at home), are too big. They might have been scanned as colour documents (they are photocopies, so that’s not really right) or they might be at a huge resolution; I have to check that, since having over half a gig of just printed records is a bit too much for me (I also have another full CD of CT scan data).

The other problem is that a lot of software misuses my home by writing cache and temporary files in it rather than in the proper locations. Let me explain: if you need to create a temporary file or socket to communicate between different programs on the same host, rather than writing it to my home, you should probably use TMPDIR (like a lot of software, fortunately, does). The same goes if you write cache data, and yes, I’m referring to you, Evolution and Firefox, but also to Adobe Flash, Sun JDK and IcedTea.
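To make the TMPDIR point concrete, here is a minimal sketch in Python of how a program could put its scratch files under the temporary directory instead of the home. The function name make_scratch_dir and the "myapp-" prefix are made up for illustration; the standard tempfile module is what does the actual lookup.

```python
import os
import tempfile


def make_scratch_dir(prefix="myapp-"):
    """Create a private scratch directory in the system temp location.

    tempfile consults the TMPDIR environment variable (then TEMP and TMP)
    before falling back to platform defaults such as /tmp, so nothing
    ends up in the user's home directory.
    """
    return tempfile.mkdtemp(prefix=prefix)


if __name__ == "__main__":
    scratch = make_scratch_dir()
    print("temporary files and sockets go in", scratch)
    os.rmdir(scratch)  # clean up the (still empty) directory
```

mkdtemp creates the directory readable and writable only by the current user, which is usually what you want for a socket or a temporary file shared between your own processes.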

Indeed, the FreeDesktop specifications already provide an XDG_CACHE_HOME variable that can be used to change the place where cache data should be saved, defaulting to ~/.cache, and on my system set to /var/cache/users/flame. This way, all the (greedy) cache systems would be able to write as much data as they want, without either wasting my space on the backup flash drive, or forcing me to write it to two disks (/var/cache is on a sort-of throwaway disk).
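A minimal sketch of how an application could look up its cache directory, assuming the XDG Base Directory convention: use XDG_CACHE_HOME when it is set to an absolute path, otherwise fall back to ~/.cache. The function name user_cache_dir and the application name in the usage lines are hypothetical.

```python
import os
from pathlib import Path


def user_cache_dir(app_name, environ=None):
    """Return the directory where app_name should keep its cache data."""
    env = os.environ if environ is None else environ
    base = env.get("XDG_CACHE_HOME", "")
    # Unset, empty, or relative values are ignored in favour of the default.
    if not base or not os.path.isabs(base):
        home = env.get("HOME") or os.path.expanduser("~")
        base = os.path.join(home, ".cache")
    return Path(base) / app_name


if __name__ == "__main__":
    # Hypothetical application name, just to show the call.
    print(user_cache_dir("some-mail-client"))
```

With XDG_CACHE_HOME set to /var/cache/users/flame, as on my system, the cache would land under /var/cache/users/flame/some-mail-client and stay out of the backup.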

For now I worked around it by making some symlinks, hoping they stay stable, and creating a ~/.backup-ignore file, akin to .gitignore, with the paths to the stuff that I don’t want backed up. The only problem I really have is with Evolution, because that one has so many subdirectories and I can’t really understand what I should back up and what not.
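As a rough sketch of how such an ignore file could be wired into the weekly job: rsync has an --exclude-from option that reads exclude patterns from a file, one per line. The helper below only builds the command line; the function name rsync_command and the paths in the usage note are assumptions for illustration, not my actual cron script. Note that rsync patterns are similar to .gitignore patterns but not identical.

```python
import os


def rsync_command(source, dest, ignore_file):
    """Build an rsync invocation that skips the paths listed in ignore_file."""
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps permissions and times)
    if os.path.isfile(ignore_file):
        # One exclude pattern per line, relative to the source directory.
        cmd.append("--exclude-from=" + ignore_file)
    # A trailing slash on the source copies its contents rather than
    # creating an extra directory level inside the destination.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True), for example with a source of /home/flame, a hypothetical destination such as /mnt/usb/home, and /home/flame/.backup-ignore as the ignore file.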

Oh, and there are a few more problems there: the first is that a lot of software over the past two years migrated from the top level of the home to ~/.config, but the old files were kept around (nautilus is an example); the second is that a few directories contained very, very old and dusty session data that wasn’t cleared up properly.

Providing too many configuration options to tell where the stuff is can definitely lead to bad problems, but using the right environment variable to decide where stuff should go, and where it should be looked up, can definitely solve lots of your problems!

Comments 4
  1. Image compression (or, to be more precise, the lack of it) is an issue where I’m working (public health care). Sometimes we really need high resolution, for example for digital mammographies, but most of the time high resolution is just not needed. We acquire PDF scans of medical records from hospitals, and we get 16.7M-colour scans of photocopies O_o But most of the problem comes from ignorance and default resolution… PowerPoint presentations are made with cut&paste, which uses BMP encoding, resulting in 40+ MB files O_o Video captures are even worse, with 10-minute videos taking up to 500 MB. Users need to be trained to pay attention to these issues, but they usually ignore me (and complain later, when our server runs out of disk space).

  2. saw some of those…use to be a lot of .(somethings) I noted. Sometimes I just remove all the subdirs but ya gotta watch that crap with things like .mozilla and .someemailclient cause if’n ya don’t… well there went all the email or bookmarks or saved configs anyways

  3. Try to convert your medical records to the free DjVu format. This format is designed especially for scanned documents and should reduce their size dramatically. http://djvu.org/

  4. The variable is called XDG_CACHE_HOME, not XDG_CACHE_DIR. But thanks for the pointer! 🙂
