This Time Self-Hosted

Why oh why, sqlite…

Today I returned to working on my ELF symbol collision detection script: basically a simple script that tells you about shared objects or programs exporting symbols with the same name, which will cause symbol collisions at runtime.

It’s a nice script not only for that, but also because it allows you to spot internal copies of libraries, as I wrote a long time ago. I resumed working on it after talking a bit with Patrick, to allow executing it on an arbitrary set of directories, rather than only on the currently-running system.

There are already a few improvements in the git repository; notably, the harvest script no longer looks only for files ending in .so, but checks all ELF files. Unfortunately, to do so it currently has to open the files one by one, and then close them. As soon as I can, I want to rewrite this to do a single pass through the files, so that they are not all opened twice.
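The single-pass idea could look something like this; a sketch in Python rather than the script's actual Ruby, with hypothetical names, where each candidate file is opened exactly once and the still-open handle is handed to the caller:

```python
import os

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def walk_elf(root):
    """One pass over a directory tree: open each regular file once, check
    the ELF magic, and yield the still-open file object so the caller does
    not need to reopen it to harvest symbols."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # skip symlinks and anything that is not a regular file
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                f = open(path, "rb")
            except OSError:
                continue
            if f.read(4) == ELF_MAGIC:
                f.seek(0)
                yield path, f  # caller is responsible for closing f
            else:
                f.close()
```

The point of yielding the handle instead of the path alone is that the ELF check and the later symbol harvesting share one open/close cycle per file.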

There is one bad performance problem with SQLite (3) though: whenever I do an INSERT (which happens quite a lot, as I’m doing one per file and then one per symbol), sqlite seems to open the directory where its database file lives (/home/flame/mytmpfs), create a journal file for the transaction, and then unlink it.

It would certainly take less time if:

  • the directory was opened once, and left open till needed;
  • the journal file was created once, and truncated every time as needed until the database is closed (and then could be unlinked).
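As it turns out, later SQLite versions expose pretty much this behaviour through the journal_mode pragma: TRUNCATE empties the journal on commit instead of unlinking it, and PERSIST leaves its contents in place entirely. A minimal sketch using Python’s stdlib sqlite3 module (the database name is hypothetical, standing in for the script’s symbol database):

```python
import sqlite3

# hypothetical database name, standing in for the script's symbol database
conn = sqlite3.connect("symbols-database.sqlite3")

# TRUNCATE keeps one journal file per database and empties it on commit,
# instead of creating and unlinking a fresh journal for every transaction;
# the pragma returns the journal mode actually in effect.
mode = conn.execute("PRAGMA journal_mode = TRUNCATE").fetchone()[0]
print(mode)  # prints "truncate"
conn.close()
```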

Anyway, I’ll be working on the script today, as I’m taking a day off from work, and then tomorrow I’ll leave it to someone else to handle the bugs I’ll be filing 😉
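For what it’s worth, wrapping the whole batch of INSERTs in one explicit transaction makes the journal dance happen once per batch instead of once per row; a minimal sketch with Python’s stdlib sqlite3 module and a made-up schema (the actual script is Ruby):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbols (file TEXT, symbol TEXT)")

# made-up sample data: (object, exported symbol) pairs
rows = [("libfoo.so", "init"), ("libbar.so", "init"), ("libbar.so", "fini")]

# One explicit transaction around all the INSERTs: the journal is created
# and committed once, rather than once per row.
with conn:  # emits BEGIN ... COMMIT (or ROLLBACK on error)
    conn.executemany("INSERT INTO symbols VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM symbols").fetchone()[0]
print(count)  # prints 3
```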

Update: I have one extra reason to hate SQLite now. While running the consumer script (link_collisions.rb) on the SQLite database I just finished generating on a tmpfs filesystem, a few complex queries are being run. To cache those, sqlite tries to be smart, and creates a temporary file:

flame@enterprise openoffice % ls /proc/31582/fd -l
total 0
lrwx------ 1 flame flame 64 21 gen 19:19 0 -> /dev/pts/5
l-wx------ 1 flame flame 64 21 gen 19:19 1 -> /home/flame/mytmpfs/collisions
lrwx------ 1 flame flame 64 21 gen 19:18 2 -> /dev/pts/5
lrwx------ 1 flame flame 64 21 gen 19:19 3 -> /home/flame/mytmpfs/symbols-database.sqlite3
lrwx------ 1 flame flame 64 21 gen 19:19 4 -> /var/log/faillog
lrwx------ 1 flame flame 64 21 gen 19:19 5 -> /var/tmp/etilqs_R4Cy7sH2VuPIYDH (deleted)

The problem is that /var/tmp is on disk, rather than in memory. I suspected this when I saw that the ruby instance was not CPU-bound like I thought it would be (with everything in memory it should be CPU-bound rather than I/O-bound).

Note that if it used /tmp, I was using tmpfs there too. I wonder why on earth it uses /var/tmp for that stuff.
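If I understand SQLite’s Unix temp-directory lookup correctly, it tries the SQLITE_TMPDIR and TMPDIR environment variables before falling back to /var/tmp, which would explain the behaviour above; and the temp_store pragma can skip scratch files entirely by keeping temporary tables and indices in RAM. A hedged sketch, again through Python’s stdlib bindings:

```python
import os
import sqlite3

# Pointing SQLITE_TMPDIR at a tmpfs mount (hypothetical path) before
# opening the database redirects SQLite's scratch files away from /var/tmp.
os.environ["SQLITE_TMPDIR"] = "/tmp"

conn = sqlite3.connect(":memory:")

# temp_store = MEMORY sidesteps scratch files entirely: temporary tables
# and indices used by complex queries are kept in RAM.
conn.execute("PRAGMA temp_store = MEMORY")
store = conn.execute("PRAGMA temp_store").fetchone()[0]
print(store)  # prints 2, the numeric code for MEMORY
```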

(And this reminds me that I need a new, faster box; any help with that would make me happy 😉 ).

Later I’ll post a summary of what I gathered from this run. I can say as a preview that I filed two bugs already.

Comments 5
  1. Hi Diego, the trick with multiple INSERTs in SQLite is to wrap them in a transaction. That makes the whole process fast enough to be useful. So: begin work; insert; insert; insert; commit; When you don’t wrap INSERTs, a transaction will be created and committed for every insert. And in SQLite, transactions are expensive, as you have noticed *grin* Philip

  2. Thanks Philip 🙂 jkt also suggested that to me after posting this; I have modified the code already, although I haven’t tried it yet because I’m still waiting for the link_collision script output (sloooow especially since it’s I/O bound thanks to the second issue).

  3. Judging from my experience with SQLite INSERT performance, you are better off cancelling that script, making the change and then re-running it. Transactions are really, really, really slow in SQLite.

  4. Hi! I have the same problem. Did you resolve it? Please leave a comment if yes. Thanks very much.

