A case study: enchant’s internal Hunspell copy

Okay Ryan, let me see if I can write something up to at least help you understand how to solve the problem (as to why it is a problem, well, I think I have explained that a couple of times already ;)

I took enchant as that’s the bug you linked to, and it also seems an easy one.

First off, enchant is an abstraction library for spellchecking. It supports ispell, aspell, hspell (all three of them have a command-line interface, so they work “at arm’s length”) and MySpell (the OpenOffice.org spellchecker). As it turns out, nobody really uses the original MySpell; everybody uses the Hunspell fork instead, even our OpenOffice.org.

As Enchant builds plugins (modules), they have to use PIC libraries; up to a year ago, more or less, Hunspell did not install any PIC-compatible library, just static archives, unsuitable for shared linking, and especially unsuitable for building shared objects on PIC architectures like almost everything besides x86. This is probably the reason why an internal copy of it was used up to now.

It turns out nowadays Hunspell does install shared libraries, and even a pkg-config datafile. So it should be easy to use those instead. The first step is to give a way to choose whether to use system hunspell or not. The easiest way is usually to add a new --with-system-foo option during build. I sincerely prefer a different approach when possible: rather than having just --enable-myspell and --disable-myspell, provide an --enable-myspell=external option that allows using the system copy of hunspell.

So it’s just a matter of changing the current conditional not to think of the $build_myspell variable as a yes/no one, and add a new conditional:

-AM_CONDITIONAL(WITH_MYSPELL, test "x$build_myspell" = "xyes")
+AM_CONDITIONAL(WITH_SYSTEM_HUNSPELL, test "x$build_myspell" = "xexternal")
+AM_CONDITIONAL(WITH_MYSPELL, test "x$build_myspell" != "xno")
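To make the three-state logic concrete, here is a small sketch in plain shell. The function name myspell_conditionals is hypothetical and not part of enchant's build system; it only mirrors the two test expressions from the diff above and reports which conditionals would end up true for each value of $build_myspell.

```shell
#!/bin/sh
# Hypothetical helper (not part of enchant's configure script): prints which
# Automake conditionals would be true for a given value of $build_myspell.
myspell_conditionals() {
    build_myspell=$1
    result=""
    # Mirrors: AM_CONDITIONAL(WITH_SYSTEM_HUNSPELL, test "x$build_myspell" = "xexternal")
    if test "x$build_myspell" = "xexternal"; then
        result="WITH_SYSTEM_HUNSPELL"
    fi
    # Mirrors: AM_CONDITIONAL(WITH_MYSPELL, test "x$build_myspell" != "xno")
    if test "x$build_myspell" != "xno"; then
        result="${result:+$result }WITH_MYSPELL"
    fi
    echo "${result:-none}"
}

# Walk through the three values the option can take.
for value in no yes external; do
    echo "$value: $(myspell_conditionals "$value")"
done
```

The point of the != "xno" test is that the backend is built for both yes and external, while only external additionally selects the system library.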

Good. By the way, it seems the build system already supposedly had some support for external myspell, but it’s not actually used, so let’s remove the old cruft, just to avoid messing with obsolete codepaths, and rename MYSPELL_CFLAGS/MYSPELL_LIBS to HUNSPELL_CFLAGS/HUNSPELL_LIBS; I’ll show in a moment why:

-if test "x$with_system_myspell" != "xno"; then

Now of course it isn’t nice to ask for system hunspell if it’s not installed, let’s see if it is there. The easiest way is to use pkg-config. As I’m not sure since which version they fixed Hunspell to install shared libraries, I check what I have installed in my system:

flame@enterprise ~ % pkg-config --modversion hunspell

and then add a test for that:

+if test "x$build_myspell" = "xexternal"; then
+   PKG_CHECK_MODULES([HUNSPELL], [hunspell >= 1.1.9])

This shows why I wanted to rename MYSPELL_CFLAGS into HUNSPELL_CFLAGS: the PKG_CHECK_MODULES macro uses the first parameter both as the prefix of the variables and to output “Checking for FOO…” during configure run. I didn’t want users to see “Checking for MYSPELL…” while Hunspell was checked instead.
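For those who want to see roughly what that check boils down to, here is a sketch using pkg-config directly. The check_module helper is hypothetical (it is not what the macro expands to, just an approximation), and the output format is mine; the --exists, --cflags and --libs options are the standard pkg-config ones.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: approximates the test behind
# PKG_CHECK_MODULES([HUNSPELL], [hunspell >= 1.1.9]) with plain pkg-config calls.
check_module() {
    prefix=$1   # variable prefix, e.g. HUNSPELL
    spec=$2     # module name plus optional version constraint
    if pkg-config --exists "$spec" 2>/dev/null; then
        # The prefix decides the names of the resulting variables.
        echo "${prefix}_CFLAGS=$(pkg-config --cflags "$spec")"
        echo "${prefix}_LIBS=$(pkg-config --libs "$spec")"
        return 0
    fi
    echo "${prefix}: requirement '$spec' not met" >&2
    return 1
}

if check_module HUNSPELL "hunspell >= 1.1.9"; then
    echo "system hunspell is usable"
else
    echo "system hunspell is missing or too old"
fi
```

With a MYSPELL prefix the same call would produce MYSPELL_CFLAGS and MYSPELL_LIBS, which is exactly the naming mismatch the rename avoids.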

Now the changes to the configure are finished, it’s time to start with the changes to the Makefiles, in particular src/myspell/Makefile.am.

First off, let’s replace MYSPELL_CFLAGS and MYSPELL_LIBS:



-libenchant_myspell_la_LIBADD= $(MYSPELL_LIBS) $(ENCHANT_LIBS) $(top_builddir)/src/libenchant.la
+libenchant_myspell_la_LIBADD= $(HUNSPELL_LIBS) $(ENCHANT_LIBS) $(top_builddir)/src/libenchant.la

Then, let’s not build all the Hunspell sources when we’re using external Hunspell. To do so, we create a convenience variable, hunspell_sources, that lists all the hunspell sources, but only if WITH_SYSTEM_HUNSPELL is unset:

-libenchant_myspell_la_SOURCES =        
+hunspell_sources = 

and then use that variable in the list of sources for the module:

+libenchant_myspell_la_SOURCES =        
+       $(hunspell_sources)     

Nice, let’s try it then :)

./configure --disable-myspell:

        Build Myspell/Hunspell backend: no


./configure --enable-myspell:

        Build Myspell/Hunspell backend: yes

flame@enterprise enchant-1.3.0 % nm -D --defined-only \
  src/myspell/.libs/libenchant_myspell.so |
  c++filt | fgrep -c Hunspell::

okay here too…

./configure --enable-myspell=external:

checking for hunspell >= 1.1.9... yes
checking HUNSPELL_CFLAGS... -I/usr/include/hunspell
checking HUNSPELL_LIBS... -L/usr/lib -lhunspell-1.1
        Build Myspell/Hunspell backend: external

flame@enterprise enchant-1.3.0 % nm -D --defined-only \
  src/myspell/.libs/libenchant_myspell.so |
  c++filt | fgrep -c Hunspell::
flame@enterprise enchant-1.3.0 % scanelf -n \
  src/myspell/.libs/libenchant_myspell.so |
  fgrep -c libhunspell

Perfect! Now it’s time for the ebuild. First off, I copy the enchant ebuild over into my overlay, and the patch I’ve just made into files/.

I add eutils and autotools eclasses to inherit, as I need epatch to apply the patch, and eautoreconf to re-run autotools. Then I add this to src_unpack before elibtoolize:

        epatch "${FILESDIR}/${P}-external-hunspell.patch"
        AT_M4DIR="ac-helpers" eautoreconf

Now it’s time to add a USE flag. As it is, the dependencies of enchant are a bit broken, as foser wanted to put a || dependency over aspell or ispell or hunspell. Up to now, hunspell was not requested at all; at the very best one had to install the myspell dictionaries.

As hunspell is now linked to, we do NOT want it to be a non-deterministic dependency, otherwise it is as good as automagic, so it has to become a USE flag. There’s no hunspell or myspell USE flag in the use.desc and use.local.desc files, which means that we are free to choose what we like. I think hunspell is the best option.

To avoid depending on aspell or ispell when we have hunspell enabled, we can just move the rest of the || dependency under !hunspell, this way:

        hunspell? ( >=app-text/hunspell-1.1.9 )
        !hunspell? ( || ( virtual/aspell-dict app-text/ispell ) )"

Now we have to enable/disable hunspell as requested. There’s no src_compile already, so I’m going to add one. --disable-dependency-tracking is just a nice option in general, there’s no reason to enable dependency tracking as ebuilds are one-time builds. In most cases it won’t make much difference anyway, I just always add it whenever available.

src_compile() {
        econf \
                --disable-dependency-tracking \
                $(use_enable hunspell myspell external) \
                || die "econf failed"
        emake || die "emake failed"
}

The three-parameter use_enable call states that the hunspell USE flag is related to --enable/--disable-myspell and that when we enable it we have to pass external as the value, making the two alternatives --enable-myspell=external and --disable-myspell, just like we need.
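To see what that call expands to without a full Portage environment, here is a small stand-alone sketch. The use and use_enable functions below are simplified stand-ins that I wrote for illustration (the real ones are provided by Portage), and USE_FLAGS is a hypothetical variable playing the role of the enabled USE flags.

```shell
#!/bin/sh
# Hypothetical stand-ins, for illustration only: in a real ebuild both
# `use` and `use_enable` come from Portage.
USE_FLAGS="hunspell"

# Succeeds if the given flag is listed in USE_FLAGS.
use() {
    case " $USE_FLAGS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# use_enable <flag> [<name> [<value>]]
use_enable() {
    flag=$1
    name=${2:-$1}   # option name defaults to the flag name
    value=$3        # optional value appended as =value when enabled
    if use "$flag"; then
        echo "--enable-${name}${value:+=$value}"
    else
        echo "--disable-${name}"
    fi
}

use_enable hunspell myspell external   # with USE=hunspell
USE_FLAGS=""
use_enable hunspell myspell external   # with USE=-hunspell
```

The first call prints --enable-myspell=external and the second --disable-myspell, which are the only two configurations the ebuild should ever request.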

Okay, now it’s emerge time, and let’s see the results:

flame@enterprise profiles % qsize enchant
app-text/enchant-1.3.0: 36 files, 19 non-files, 1536.433 KB
flame@enterprise profiles % qsize enchant
app-text/enchant-1.3.0: 33 files, 19 non-files, 567.91 KB
flame@enterprise profiles % qsize enchant
app-text/enchant-1.3.0: 37 files, 19 non-files, 719.313 KB
flame@enterprise profiles % qlop -tgH enchant
enchant: Fri Jan 18 14:25:30 2008: 1 minute, 19 seconds [Original]
enchant: Mon Feb  4 10:00:27 2008: 1 minute, 2 seconds  [USE=-hunspell]
enchant: Mon Feb  4 10:02:28 2008: 58 seconds           [USE=hunspell]

As you can see, the size is reduced quite a bit, and the build time also decreased even with the hassle of running autotools. And now hunspell can be shared between processes using enchant and OpenOffice, for instance.

The complete patch, and the ebuild, are available as usual on my overlay. I’ll attach them to the bug and open an upstream bug in a few minutes.

The hard return to ruby-hunspell and rust

As you can easily imagine from what I wrote the past few days, I’ve been busy trying to clean up after myself in old projects that are close to abandoned. I wrote about my resolution to spend more time working starting with the new year, to save some money for getting my driving license and a car, and in the past days I cleaned up both the rbot bugzilla plugin (as well as rbot’s ebuild) and then nxhtml today, so it was quite obvious that I also had to take a look at long ignored projects like ruby-hunspell and rust (and of course rubytag++).

I started with ruby-hunspell as with that I can also fix the hunspell plugin for rbot (and thus put back in the only feature left over from ServoFlame). The first problem I wanted to tackle was removing the cmake dependency. As I said yesterday, I’ve started feeling the power of GNU make, and I also have enough reasons not to use cmake that if I could convert the build of the extensions (they are quite simple after all) to simple GNU make, I would do it gladly.

Indeed switching the build system to simple GNU make with some tricks around was not difficult at all, and the result is good enough to me. It’s not (yet) perfect, but it’s nicer. I also hope to generalise it a bit so that I can reuse it for rubytag++ too, and hide some of the dirty tricks I use.

Thankfully there is some good news: in the five releases between the previous time I worked on ruby-hunspell and today (1.1.4 then, 1.1.9 today), hunspell added support for pkg-config files, making the build system quite a bit nicer. Also, thanks to git improvements, making the tarball is quite a bit easier. And thanks to the power of GNU make, instead of having a tarball.sh script, it’s now simply make tarball (although I will probably switch to make dist, I thought about this just now).

The problems weren’t lying too far away though. First, I changed something in rust some time ago it seems, and now the Ruby-to-C type function has changed name, so I had to update the ruby-hunspell description script to suit that change. Then there is the problem that Hunspell now hides some of the functions as experimental (and by the way, do they have any consideration for the object ABI? Dropping some functions behind a preprocessor conditional inside a C++ class isn’t the soundest of ideas, I’m afraid…), so I had to comment those out. The biggest problem came with the parsers extension, which used to provide bindings for the libparsers library installed by hunspell.

The libparsers library is now installed only in static form, and its headers are not installed at all. This is probably half intentional: they probably consider libparsers an internal library that other projects shouldn’t use, so they removed the header files. The problem is that they still install the library at this point, making its possible use a bit ambiguous. At any rate, for now I disabled the parsers extension; it wasn’t very much hunspell related anyway, so I would certainly prefer it if they stopped installing the library entirely. That extension was also the only one that had a unit test; I should write a testsuite for ruby-hunspell and the hunspell extension too, so that at least I have something to test with.

There is one big problem though: to release a new ruby-hunspell, which is a requirement for rbot-hunspell, I need to do a release of rust, too, but I don’t remember much of rust’s details, as it has been almost a year since I last worked on it :( Additionally, my /tmp is noexec now (it wasn’t when I prepared the testsuite), so the tests fail as the shared object built in /tmp can’t be loaded in memory. I’ll have to test tomorrow whether the TMPDIR environment variable is respected, in which case I’d be using /var/tmp. I’ll also add a make dist target to rust so that I don’t need extra stuff to prepare the packages.

Finally, there is the problem of the git repositories: for some reason pushing to the remote repository accessed through this trick fails as if there was nothing to push. Considering I now have my own (v)server, I’ll probably just move rust and ruby-hunspell back together with the other git repositories I export. This will also simplify things when I put gitarella back too.

Tomorrow will be dedicated to work for most of the time, but if I can squeeze some time for this I’ll try to address the issues, and I promise this time I’ll write more comments and documentation.

Rust is almost ready

Okay, so today I started eating almost normally again; I probably got the flu that seems to have travelled through Europe in the last few days, judging from what a few people already told me. I wasn’t sure if I could sustain my usual workflow today though, because I was still lacking strength due to the two days I spent without eating at all, so I spent the day working half the time on job stuff and half the time on Rust. The latter came almost to completion in the meantime.

If you forgot about it, Rust is my bindings generator for Ruby, an evolution of the bindings generator I used for RubyTag++ and Ruby-Hunspell up to now, that allows you to generate a working Ruby extension binding a C++ library by just describing its interface. While the original bindings generator used a YAML description of the interface, this wasn’t as extensible as I needed, so this time it is using Ruby itself, kinda like ActiveRecord and rails. I’ll show you all the syntax tomorrow when I upload the code, as I’ll need to write some documentation for it anyway.

So what is working in Rust and what will still require work? First of all, Rust is now up to the same level my previous generator was, which means it’s far from being as complete as I want it to be (the target is to be able to write bindings just as easily for C libraries that use OOP interfaces, like Avahi, PulseAudio or xine), but it is important to me that it reached this point because this way I won’t have to maintain both Rust and my other generator, and I can test its working state by using Ruby-Hunspell and RubyTag++ as regression tests until I finish the regression tests themselves (I only wrote two up to now).

I’ve now asked for hosting on Rubyforge; if all goes well, in the next few days I’ll put up a draft of a site there, and then start pushing my changes to the GIT repository there (sshfusefs is quite slow, but it works nicely for what I need to do). I’ll need a logo, as most Ruby projects have an appealing one; if anybody has an idea or a proposal, it is welcome (my graphics skills don’t exist at all).

Hopefully, it will be possible through Rust to have bindings for the libraries I named above without too much work. I still wonder if it makes sense to have them as separate projects (as Ruby-Hunspell currently is), or if it would be simpler to leave them all live under Rust itself; but for that there will be time to decide.

For tonight, I can feel happy, and work on a few more testcases. I’d like to be able to watch some Anime too, but this depends on a series of factors, like which bed I’ll sleep in tonight (while I wasn’t feeling well I took possession of my mother’s bed as it is more stable than mine, and I had enough nausea without adding the mattress deforming every time I moved, and here I don’t have a power socket to connect the laptop to while I watch Anime).

Today is day of releases

As I’ve stated on my site.

First of all, I’ve released gitarella 0.003, after fixing a load of bugs and display issues that now should make gitarella way more solid when compared with the previous versions. I wanted to do this release because I’ll be probably working on SCGI support in the next days. SCGI seems to be an interesting technique, although the Simple part is really not related to its implementation (you actually need to implement more stuff on web-app side); it’s more interesting for the ability of restarting a single webapp without taking the whole webserver down while updating stuff.

Also, I finally released the hunspell rbot plugin that I already blogged about. Grab it while it’s new ;) Please keep in mind that it requires ruby-hunspell, which in turn requires the Gentoo-patched hunspell (for now; the patch is merged upstream, or at least it should be at this point), and that it clashes with rbot’s own spell.rb plugin, which should then be deleted or disabled for this to work.

I don’t count on this to be useful to anyone, but it would be good if there was someone interested in it :)

Oh, I’ve also updated typo once again, now that the development seems to be actually going somewhere :) The theme for the admin interface is entirely changed… somehow, I preferred the old one. I hope the default theme will remain available still, as I don’t want to change it with another different default, but I’m not good enough to create my own theme.

And for those wondering: to let typo use the system’s rails, you just need to remove vendor/rails and remove its definition from vendor/’s svn:externals property. At that point, typo won’t find its own copy of rails and will use the system’s one.

Released ruby-hunspell

So today I wanted to spend those few minutes to release ruby-hunspell, prepare an ebuild for it (in my overlay for now) and write the hunspell plugin for rbot, and I’ve done the release.

If you want to try it out, you can grab it from the project page. There you can also find the link to the gitarella to browse the sources and the address to get the GIT repository.

The packaging is done with a custom tarball script that’s derived from the one I used for gitarella; I should probably clean it up and just share it between the two projects. The build system is CMake as I said, which is incidentally also the one chosen for KDE 4. It has a lot of troubles upstream I’m afraid, and the syntax could have been better (starting from avoiding that damn all-caps style this time, instead of making it worse), but it’s not that bad to use after all. I have to replace the FindRuby module with my own (which I already wrote for TagLib) as the one provided by original CMake is pure and simple garbage (it hardcodes the i386-linux path), but now it should be just fine. The good part of all this is that hunspell and ruby are available for Windows, so ruby-hunspell should work there, too.

The ebuild was easy to write: among all the defects of CMake, being unusable from an ebuild is not one. In contrast to qmake (which is a hell of its own) and scons (which is stupid in its own way and difficult to use in an ebuild), cmake just requires generating the makefiles, comparable to running a configure script, and then the classic make/make install. It respects the CC/CXX/LD variables and CFLAGS and LDFLAGS as-is without editing them. On that, it’s fantastic.

The ebuild as I said is in my overlay, currently marked ~amd64 and ~x86-fbsd. I’ll add it to portage soon, most likely, but first I want to release the hunspell.rb script for rbot. Right now it works just fine, but hardcodes the path of en_GB dictionary and affix files in myspell-en, I want to make it configurable before release.

Oh and of course… Grandi Azzurri!, even if I don’t like soccer at all, yesterday’s game was fun to watch, once again. It was quite a while that here in Italy soccer was more played like politics rather than a sport, and they deserved to win after playing again.

Writing ruby extensions: hunspell

So, after my RubyTag++ bindings work, today I wanted to try writing a new set of bindings.

This time it is for a really trivial reason, that is writing a better spell module for rbot, as the default one uses ispell, which kinda sucks. What I had in mind was hunspell, as that is what is being used by OpenOffice now, and that is quite interesting indeed.

I first had to work on hunspell to fix a problem with PIC on AMD64 that ended up with a patch (currently specific to Gentoo) to make use of libtool and thus building a shared version of libhunspell. This way it also fixed a TEXTREL issue with OpenOffice on x86, and was a step forward for porting OpenOffice 2 to AMD64 (yuppie).

This time then, I decided to give hunspell a good try, and so I started working on an extension. Luckily now that I wrote the bindings generator for RubyTag++ I know well enough how to start writing an extension, so I just fired up emacs and started writing the code.

The problem came when I wanted to add an extconf.rb script, as all its checks assume C libraries and C includes, rather than C++ as Hunspell is. I looked around but I couldn’t find anything useful for that, so I decided to go with cmake, once again. I already had the FindRuby module written for RubyTag++, and I just needed to write the CMakeLists.txt file.

Now I do have everything ready, and I just need to find time to add some documentation and then I’ll make a tarball, and then I’ll write the rbot module I had in mind when I started.

For those who want to follow the development, I have put ruby-hunspell on gitweb.

Update (2017-04-22): The repository is gone, and I don’t know if I have it in any backup; Rubyforge shut down and I think the only other copy was on Gitorious, which also shut down.

Please note: it needs the patched hunspell from Gentoo right now, until the patch is merged upstream.