:PostID: 89
:Title: ghc-7.6: weak symbols and ghci
:Keywords: gentoo, ghc, ghci, haskell, bug, stat, glibc
:Categories: news
Time to time internetz stumble upon a **ghci** bug
which is seen as inability to load **base** **haskell** package:
::
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... ghc: /usr/lib64/ghc-7.4.2/base-4.5.1.0/HSbase-4.5.1.0.o: unknown symbol `stat'
ghc: unable to load package `base'
But the bug was most popular across rare **gentoo** users.
An interesting correlation!
This post is about the root of this problem:
the gory implementation details of **ghci** dynamic loader down
to **libC** and even **ELF** symbols!
Not scared? Fasten your belts and Read On!
.. raw:: html
**GHC** (`this one `_) is both:
1. a compiler (**ghc** binary)
2. and **REPL** (**ghci** binary)
To put simplistic **ghc** allows you to create final binaries
out of haskell sources while **ghci** allows runtime loading
of haskell sources.
Typical session starts like that:
::
$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude>
**ghci**'s implementation allows loading arbitrary shared library:
::
$ ghci -lpcre
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading object (dynamic) /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.1/../../../../lib64/libpcre.so ... done
final link ... done
Prelude>
and even object file:
::
$ echo 'void foo(){}' > a.c &&
gcc -c a.c -o a.o &&
ghci a.o
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading object (static) a.o ... done
final link ... done
Prelude>
**ghci** libraries are basically the same object files built of many source files.
It took me a while reproduce the bug mentioned in the very start of the post.
First time I've heard of a bug was in December 2012 by nand but I had no idea
where it comes from.
First I though it was a problem of missing headers somewhere in **C** code due to **glibc**
upgrade, but no matter what combinations of **binutils**/**gcc**/**glibc** I tried
bug did not want to show up.
6 months after after some `bugs `_ got collected I've noticed
dreadful **CFLAGS=-Os** common amongst reporters which was a trigger.
Let's explore exported symbol difference of 2 files:
1. CFLAGS=-O2 **/usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o**
2. CFLAGS=-Os **/usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o**
::
$ nm --undefined-only /usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o > base-O2
$ nm --undefined-only /gentoo/chroots/amd64-unstable//usr/lib64/ghc-7.6.3/base-4.6.0.1/HSbase-4.6.0.1.o > base-Os
$ diff -U0 base-O2 base-Os
- U __fxstat
+ U fstat
- U __lxstat
+ U lstat
- U memset
- U __xstat
+ U stat
Do you see it?
**-Os** build has **stat** call while **-O2** has **__xstat**.
It was the right track.
Let's try simpler example:
::
cat >stat-test.c <<-EOF
#include
#include
#include
int f()
{
struct stat s;
return stat("/", &s);
}
EOF
::
$ gcc -c stat-test.c -Os -o stat-test-Os.o
$ gcc -c stat-test.c -O2 -o stat-test-O2.o
$ nm --undefined-only stat-test-O[2s].o
stat-test-O2.o:
U __xstat
stat-test-Os.o:
U stat
The symbols differ on different types of optimization.
Let's see if **ghci** treats them differently:
::
$ ghci stat-test-O2.o
...
Loading object (static) stat-test-Os.o ... done
final link ... done
$ ghci stat-test-Os.o
...
final link ... ghc: a.o: unknown symbol `stat'
linking extra libraries/objects failed
And it does! It means that **__xstat** comes from **libc.so.6**,
but **stat** codes from somewhere else.
After some investivation I've found it's definition:
::
$ nm --defined-only --extern-only /usr/lib/libc_nonshared.a
...
stat.oS:
00000000 T __i686.get_pc_thunk.bx
00000000 W stat
00000000 T __stat
...
We see here the file defining two symbols for us:
1. global weak **stat** (the one we really need)
2. global **__stat** (useless and potentially harmful as it might
lead to symbol collision)
Now we can build-up working **stat-test-Os.o** by linking
that weak **stat** symbol to our **ghci**:
::
$ ar x /usr/lib/libc_nonshared.a
$ mv stat.oS stat.o # ghci dislikes non-'*.o' extensions for object files
$ ghci a.o stat.o
Loading object (static) a.o ... done
Loading object (static) stat.o ... done
final link ... ghc: stat.o: unknown symbol `stat'
Almost works! Well, no. Nothing changed.
But the reason is missing support for loading weak symbols to **ghci**.
Whick is known as `bug 3333 `_.
I've pulled series of patches by **akio** and actualized it to **ghc-7.6.3**
(`the result `_)
After pulling that patch into **ghc** I've got previous example to load on **x86_64**:
::
$ ar x /usr/lib/libc_nonshared.a
$ mv stat.oS stat.o # ghci dislikes non-'*.o' extensions for object files
$ ghci a.o stat.o
Loading object (static) a.o ... done
Loading object (static) stat.o ... done
final link ... done
Ideally, I should load all those and only those **libc_nonshared.a** symbols into **ghci**
as a first library. I've decided to biggyback on **ghc-prim** module and stuff all those
nonshared symbols `there `_.
Perhaps, that bit of shell is the worst piece of code I have ever written.
It weakens all needed symbols, localizes all the rest, and merges the result into **ghc-prim**.
It also known to break on **i386** as I have hidden **GOT** and module base
required for **PIC** code.
**ghci**'s loader interface used not only for interactive use but also for
**TemplateHaskell** (where we have seen **vector** package failing to compile)
thus.
This post is a great example why using native system's loader is a good idea
proposed in `bug 4244 `_.
I don't know if it fixes all our cases but it will be way more clean than it is now.
If the workaround will show itself as too fragile I'll have to roll back
to force **CFLAGS=-O2** when building **ghc**.
**UPDATE:**
Now **x86** works as well: **libc_nonshared.a** contained **PIC** code thus
I've picked implementation from **libc.a** `directly `_.
Our workaround even passes test from `bug 7072 `_!