A case study: enchant’s internal Hunspell copy

Okay Ryan, let me see if I can write something up to at least help understanding how to solve the problem (as to why it is a problem, well, that I think I explained already a couple of times 😉

I took enchant as that’s the bug you linked to, and it also seems an easy one.

First off, enchant is an abstraction library for spellchecking. It supports ispell, aspell, hspell (all three of them have a CLI interface, so theyw ork “at an arm’s length”) and MySpell (OpenOffice.org spellchecker). As it turns out, nobody really use original MySpell and rather use the Hunspell fork, even our OpenOffice.org.

As Enchant creates plugins (modules), they have to use PIC libraries; up to an year ago, more or less, Hunspell did not install any PIC compatible library, just static archives, unsuitable for shared linking, and especially unsuitable for building shared objects on PIC architectures like almost everything beside x86. This is probably the reason why an internal copy of it was used up to now.

It turns out nowadays Hunspell does install shared libraries, and even a pkg-config datafile. So it should be easy to use those instead. The first step is to give a way to choose whether to use system hunspell or not. The easiest way is usually to add a new --with-system-foo option during build. I sinerely prefer a different approach when possible: rather than having just --enable-myspell and --disable-myspell, provide a --enable-myspell=external option, that allows to use the system copy of hunspell.

So it’s just a matter of changing the current conditional not to think of the $build_myspell variable as a yes/no one, and add a new conditional:

-AM_CONDITIONAL(WITH_MYSPELL, test "x$build_myspell" = "xyes")
+AM_CONDITIONAL(WITH_SYSTEM_HUNSPELL, test "x$build_myspell" = "xexternal")
+AM_CONDITIONAL(WITH_MYSPELL, test "x$build_myspell" != "xno")

Good. By the way, seems like the build system already supposedly had some support for external myspell, but it’s not actuall used, so let’s remove the old cruft, just to avoid messing with obsolete codepaths, and rename MYSPELL_CFLAGS/MYSPELL_LIBS into HUNSPELL_CFLAGS/HUNSPELL_LIBS, I’ll show in a moment why:

-if test "x$with_system_myspell" != "xno"; then

Now of course it isn’t nice to ask for system hunspell if it’s not installed, let’s see if it is there. The easiest way is to use pkg-config. As I’m not sure since which version they fixed Hunspell to install shared libraries, I check what I have installed in my system:

flame@enterprise ~ % pkg-config --modversion hunspell

and then add a test for that:

+if test "x$build_myspell" = "xexternal"; then
+   PKG_CHECK_MODULES([HUNSPELL], [hunspell >= 1.1.9])

This shows why I wanted to rename MYSPELL_CFLAGS into HUNSPELL_CFLAGS: the PKG_CHECK_MODULES macro uses the first parameter both as the prefix of the variables and to output “Checking for FOO…” during configure run. I didn’t want users to see “Checking for MYSPELL…” while Hunspell was checked instead.

Now the changes to the configure are finished, it’s time to start with the changes to the Makefiles, in particular src/myspell/Makefile.am.

First off, let’s replace MYSPELL_CFLAGS and MYSPELL_LIBS:



-libenchant_myspell_la_LIBADD= $(MYSPELL_LIBS) $(ENCHANT_LIBS) $(top_builddir)/src/libenchant.la
+libenchant_myspell_la_LIBADD= $(HUNSPELL_LIBS) $(ENCHANT_LIBS) $(top_builddir)/src/libenchant.la

Then, let’s not build all the Hunspell sources when we’re using external Hunspell, to do so, we create a commodity variable hunspell_sources to list all the hunspell sources, but only if WITH_EXTERNAL_HUNSPELL is unset:

-libenchant_myspell_la_SOURCES =        
+hunspell_sources = 

and then use that variable in the list of sources for the module:

+libenchant_myspell_la_SOURCES =        
+       $(hunspell_sources)     

Nice, let’s try it then 🙂

./configure --disable-myspell:

        Build Myspell/Hunspell backend: no


./configure --enable-myspell:

        Build Myspell/Hunspell backend: yes

flame@enterprise enchant-1.3.0 % nm -D --defined-only 
  src/myspell/.libs/libenchant_myspell.so | 
  c++filt | fgrep -c Hunspell::

okay here too…

./configure --enable-myspell=external:

checking for hunspell >= 1.1.9... yes
checking HUNSPELL_CFLAGS... -I/usr/include/hunspell
checking HUNSPELL_LIBS... -L/usr/lib -lhunspell-1.1
        Build Myspell/Hunspell backend: external

flame@enterprise enchant-1.3.0 % nm -D --defined-only 
  src/myspell/.libs/libenchant_myspell.so | 
  c++filt | fgrep -c Hunspell::
flame@enterprise enchant-1.3.0 % scanelf -n 
  src/myspell/.libs/libenchant_myspell.so | 
  fgrep -c libhunspell

Perfect! Now it’s time for the ebuild. First off, I copy over the enchant ebuild in my overlay, and the patch I’ve just made into files/

I add eutils and autotools eclasses to inherit, as I need epatch to apply the patch, and eautoreconf to re-run autotools. Then I add this to src_unpack before elibtoolize:

        epatch "${FILESDIR}/${P}-external-hunspell.patch"
        AT_M4DIR="ac-helpers" eautoreconf

Now it’s time to add an USE flag. As it is, the dependencies of enchant are a bit broken, as foser wanted to put a || dependency over aspell or ispell or hunspell. Up to now, hunspell was not requested at all, at the very best one had to install the myspell dictionaries.

As now hunspell is linked to, we do NOT want it to not be a non-deterministic dependency, otherwise is as good as automagic, so it has to become an USE flag. There’s no hunspell or myspell USE flag in the use.desc and use.local.desc files, which means that we are free to choose what we like. I think hunspell is the best option.

To avoid depending on spell and the other when we have hunspell enabled, we can just move the rest of the || dependency in !hunspell, this way:

        hunspell? ( >=app-text/hunspell-1.1.9 )
        !hunspell? ( || ( virtual/aspell-dict app-text/ispell ) )"

Now we have to enable/disable hunspell as requested. There’s no src_compile already, so I’m going to add one. --disable-dependency-tracking is just a nice option in general, there’s no reason to enable dependency tracking as ebuilds are one-time builds. In most cases it won’t make much difference anyway, I just always add it whenever available.

src_compile() {
                $(use_enable hunspell myspell external) 
                || die "econf failed"
        emake || die "emake failed"

The three parameters use_enable call states that the hunspell USE flag is related to --enable/--disable-myspell and that when we enable it we have to pass external as value, making the two alternatives --enable-myspell=external and --disable-myspell, justlike we need.

Okay, now it’s emerge time, and let’s see the:

flame@enterprise profiles % qsize enchant
app-text/enchant-1.3.0: 36 files, 19 non-files, 1536.433 KB
flame@enterprise profiles % qsize enchant
app-text/enchant-1.3.0: 33 files, 19 non-files, 567.91 KB
flame@enterprise profiles % qsize enchant
app-text/enchant-1.3.0: 37 files, 19 non-files, 719.313 KB
flame@enterprise profiles % qlop -tgH enchant
enchant: Fri Jan 18 14:25:30 2008: 1 minute, 19 seconds [Original]
enchant: Mon Feb  4 10:00:27 2008: 1 minute, 2 seconds  [USE=-hunspell]
enchant: Mon Feb  4 10:02:28 2008: 58 seconds           [USE=hunspell]

As you can see, the size is reduced quite a bit, and the build time also decreased even with the hassle of running autotools. And now hunspell can be shared betwen processes using enchant and OpenOffice, for instance.

The complete patch, and the ebuild, are available as usual on my overlay. I’ll attach them to the bug and open an upstream bug in a few minutes.