By MandyWaite on Jun 19, 2009
We've been working on building Ruby 1.9.1 (p129) packages on Solaris Nevada (the platform that OpenSolaris is mostly built from). We hit a couple of problems on the way: one was easy to fix, the other not so much.
The first issue was Sun Studio borking when it found a function declared with a return type of void that nevertheless contained a return statement with a value. gcc only warns about this, which seems odd to me, though maybe it's just our version of gcc. When gcc hits this it says:
pty.c:425: warning: `return' with a value, in a function returning void
But with Sun Studio cc you get:
pty.c:425: void function cannot return a value
cc: acomp failed for pty.c
The file causing the error is ext/pty/pty.c and line 425 has the offending line in function getDevice() (declared as 'static void'):
If you comment out this line it will build ok. There is a bug for this issue......
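To make the difference between the two compilers concrete, here is a minimal sketch of the general pattern. It is not the code from pty.c: the function names `set_flag_bad` and `set_flag` are made up for illustration, and the rejected form is kept inside a comment so that the file still compiles.

```c
#include <stddef.h>

/*
 * The rejected pattern (hypothetical example, not taken from pty.c):
 *
 *     static void set_flag_bad(int *flag)
 *     {
 *         *flag = 1;
 *         return 0;   // value returned from a void function
 *     }
 *
 * gcc reports this as a warning; Sun Studio cc treats it as an error.
 */

/* Accepted by both compilers: a void function uses a bare return, or none. */
static void set_flag(int *flag)
{
    if (flag == NULL)
        return;         /* bare return is fine in a void function */
    *flag = 1;
}
```

Dropping the returned value, rather than the whole statement, keeps any early-exit behaviour intact, which may matter if the return sits inside a conditional.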
The other problem initially occurred only on SPARC. We were using one of the public build systems that we have for the SFW consolidation (the part that delivers most of the F/OSS into Solaris Nevada and OpenSolaris), and that was running Solaris Nevada build 114. For x64 we were building on Solaris Nevada build 116, which was the latest version available at the time of writing. x64 built fine.
The error seen on SPARC was:
Undefined                       first referenced
 symbol                             in file
rb_cFalseClass                      enc/emacs_mule.o  (symbol scope specifies local binding)
rb_cTrueClass                       enc/emacs_mule.o  (symbol scope specifies local binding)
rb_cFixnum                          enc/emacs_mule.o  (symbol scope specifies local binding)
rb_cSymbol                          enc/emacs_mule.o  (symbol scope specifies local binding)
rb_cNilClass                        enc/emacs_mule.o  (symbol scope specifies local binding)
ld: fatal: symbol referencing errors. No output written to .ext/sparc-solaris2.11/enc/emacs_mule.so
when linking enc/emacs_mule.so. It's not a well documented error, but the implication was obvious: the required symbols had been found, but they had been declared as local and so couldn't be used to build this shared object. Using nm on emacs_mule.o and on libruby.so.1 seemed to indicate that the symbols were needed by emacs_mule.o (UNDEF) and were available in libruby.so.1. We asked the compiler experts and they thought that perhaps the symbols were declared as HIDDEN. We tried elfdump on both emacs_mule.o and libruby.so.1 and guess what: when analyzing libruby.so.1, elfdump threw up loads of errors of the type:
"bad symbol entry: <address> lies outside of containing section"
This suggested that the shared library was broken in some way.
We isolated the linker lines from the build for libruby.so.1 and ran them individually (after touching ruby.c). There were two lines. The first was the linker line that actually built the shared library; that ran OK, and when we ran elfdump on the resultant library there were no errors. The second line was:
/usr/sfw/bin/gobjcopy -w -L "Init_*" libruby.so.1
After running this manually we saw the same error when using elfdump on the resultant library.
At the same time as this we were running a build on a SPARC system that we'd had upgraded to Nevada build 116, and that completed OK. A check on the version of gobjcopy on the two systems showed that we had gobjcopy 2.15 on the build 114 system and 2.19 on the build 116 system. Further checking showed that gobjcopy was delivered into Solaris Nevada in SUNWbinutils and that this had been updated in Nevada build 116. So the problem wasn't that we were building on SPARC but that we were building on different OS revs; the problem also exists on x64.
At the moment we haven't looked into what was going wrong when gobjcopy tried to make the Init_* symbols local, but it was apparently corrupting the library.
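For anyone unfamiliar with symbol binding, the sketch below shows the distinction at the source level. It is only an illustration with made-up function names (`shared_value`, `private_value`, `sum_values`), not anything from the Ruby sources: a `static` function gets local binding when it is compiled, whereas localizing a symbol with objcopy makes the equivalent change to an already built object.

```c
/* Global binding: other object files can reference this symbol at link time. */
int shared_value(void)
{
    return 42;
}

/* Local binding: `static` confines the symbol to this object file, so a
 * reference to it from another object file would not be resolved. */
static int private_value(void)
{
    return 7;
}

/* Code in the same file can still call the local function. */
int sum_values(void)
{
    return shared_value() + private_value();
}
```

If this is compiled without optimization and inspected with GNU nm, the global functions should show an uppercase type letter and the static one a lowercase letter, which is a quick way to see which symbols another object is allowed to bind to.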
For now this makes it tough to build Ruby 1.9 on OpenSolaris, which is based on Nevada build 111b, and we are looking at how best to get around this; maybe we'll make the packages available from the /webstack repository. In the meantime we'll file a bug against OpenSolaris and come up with a workaround.