Wednesday Sep 03, 2008

strsep() in libc

As of today, the strsep() function lives in Nevada's libc (tracked by CR 4383867 and PSARC 2008/305). This is another step in the quest for a more feature-complete (in terms of compatibility) libc in OpenSolaris. In binary form, the changes will be available in build 99. The documentation will be part of the string(3C) man page.

Here's a small example of how to use it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int parse(const char *str) {
        char *p = NULL;
        char *inputstring, *origstr;
        int ret = 1;

        if (str == NULL)
                errx(1, "NULL string");

        /*
         * We have to remember the original pointer because strsep()
         * will change the 'inputstring' pointer.
         */
        if ((origstr = inputstring = strdup(str)) == NULL)
                errx(1, "strdup() failed");

        printf("=== parsing '%s'\n", inputstring);
        /* p is never NULL inside the loop body, so only test for "" */
        for ((p = strsep(&inputstring, ",")); p != NULL;
           (p = strsep(&inputstring, ","))) {
                if (*p != '\0')
                        printf("%s\n", p);
                else {
                        /* empty field, e.g. two adjacent commas */
                        warnx("syntax error");
                        ret = 0;
                        break;
                }
        }

        printf("=== finished parsing\n");
        free(origstr);
        return (ret);
}

int main(int argc, char *argv[]) {
        if (argc != 2)
                errx(1, "usage: prog ");

        if (!parse(argv[1]))
                exit(1);

        return (0);
}

This example was actually used as a unit test (use e.g. "1,22,33,44" and "1,22,,44,33" as input strings) and it also nicely illustrates important properties of strsep() behavior:

  • While searching for tokens, strsep() modifies the original string. This property is shared with strtok().
  • Unlike strtok(), strsep() is able to detect empty fields.
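
To make the second point concrete, here is a minimal sketch (not the unit test itself) that counts the fields each function reports for the same input. The helper names count_fields_strtok() and count_fields_strsep() are made up for this illustration, and the _DEFAULT_SOURCE define is an assumption about building against glibc; other libcs may not need it.

```c
#define _DEFAULT_SOURCE         /* assumption: asks glibc to declare strsep() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the fields strtok() reports. */
int
count_fields_strtok(const char *str)
{
        char *copy;
        int n = 0;

        assert(str != NULL);
        copy = strdup(str);             /* strtok() writes into its input */
        assert(copy != NULL);
        for (char *p = strtok(copy, ","); p != NULL; p = strtok(NULL, ","))
                n++;                    /* empty fields never show up here */
        free(copy);
        return (n);
}

/* Hypothetical helper: count the fields strsep() reports. */
int
count_fields_strsep(const char *str)
{
        char *copy, *cursor;
        int n = 0;

        assert(str != NULL);
        copy = cursor = strdup(str);    /* strsep() advances 'cursor' */
        assert(copy != NULL);
        while (strsep(&cursor, ",") != NULL)
                n++;                    /* an empty field is returned as "" */
        free(copy);                     /* free the original pointer */
        return (n);
}
```

For the input "1,22,,44,33" the strtok() variant counts four fields while the strsep() variant counts five, because only strsep() hands back the empty field between the two adjacent commas.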

There is a function in Solaris' libc that can be used for token splitting and does not modify the original string: strcspn(). The other notable property of strsep() is that, unlike strtok(), it is not part of ANSI C. Time to draw a table:

 function(s)  | ISO C90  | modifies | detects
              |          | input    | empty fields
--------------+----------+----------+--------------
 strsep()     | No       | Yes      | Yes
 strtok()     | Yes      | Yes      | No
 strcspn()    | Yes      | No       | Sort of
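
The "Sort of" entry for strcspn() deserves a short illustration. This is my reading of it, not something taken from the libc sources: strcspn() only returns the length of the span up to the next delimiter, so it is the caller who has to notice a zero-length field. The function name scan_fields() and its return convention are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: walk 'str' field by field with strcspn(),
 * never writing to it. Prints each field and returns the number of
 * fields, or -1 as soon as an empty field is seen.
 */
int
scan_fields(const char *str, const char *delims)
{
        const char *p = str;
        int n = 0;

        assert(str != NULL && delims != NULL);
        for (;;) {
                size_t len = strcspn(p, delims);  /* length of this field */

                if (len == 0)
                        return (-1);    /* empty field: detected by caller */
                /* the field is not NUL-terminated, so print by length */
                printf("%.*s\n", (int)len, p);
                n++;
                if (p[len] == '\0')
                        break;          /* reached the end of the input */
                p += len + 1;           /* step over the delimiter */
        }
        return (n);
}
```

The price for leaving the input untouched is that the tokens are not NUL-terminated, so every consumer has to carry the length around.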

None of the above functions is bullet-proof. The bottom line is that the user should decide which one is the most suitable for the given task and use it with its properties in mind.

Tuesday Aug 28, 2007

Getting code into libc

In my previous entry about the BSD compatibility gap closure process I promised to provide a guide on how to get new code into libc. I will use the changes done via CR 6495220 to illustrate the process with examples.

Process-related and technical changes that are usually needed:

  • get a PSARC case done
  • File a CR to create a manual page according to the man page draft supplied with the PSARC case. You will probably need to go through the functions being added and assign them an MT-Level according to the attributes(5) man page (if this was not done prior to filing the PSARC case).
  • actually add the code into libc
    This includes moving/introducing files from the SCM point of view and doing necessary changes to the Makefiles.
    In terms of symbols, the functions actually need to be delivered twice: once as an underscored (strong) symbol and once as a WEAK alias to the strong symbol. This allows libraries to use their own private implementation of the functions. (This works because the weak symbol is silently overridden by the private symbol in the runtime linker.)
  • add entries to c_synonyms.h and synonyms.h
    synonyms.h is used in libc for symbol alias construction (see above). c_synonyms.h provides access to underscored symbols for other (non-libc) libraries. This provides a way to call the underscored symbols directly without risking namespace clashes/pollution.
    This step has to be done in conjunction with the previous step. nm(1) can be used to check that this worked as expected:
    $ nm -x /usr/lib/libc.so.1 | grep '|_\?err$'
    [5783] |0x00049c40|0x00000030|FUNC |GLOB |0  |13 |_err
    [6552] |0x00049c40|0x00000030|FUNC |WEAK |0  |13 |err
    
  • Do the necessary packaging changes
    If you're adding a new header file, change SUNWhea's prototype_* files (most probably just prototype_com).
    If the file was previously installed into the proto area during the build, it needs to be removed from the exception files (for i386 and sparc).
  • modify the library's mapfile
    This is needed for the symbols to become visible and versioned. Public symbols belong to the latest SUNW section. After you have compiled the changes, you can check this via a command similar to the following:
    pvs -dsv -N SUNW_1.23 /usr/lib/libc.so.1 \
      | sed -n '1,/SUNW.*:/p' | egrep '((v?errx?)|(v?warnx?));'
                  vwarnx;
                  ...
      
    If you're adding private (underscored) symbols, do not forget to add them to the SUNWprivate section. This is usually the case because the strong symbols are accompanied by weak symbols. Weak symbols go to the global part of the most recent SUNW section and strong symbols go to the global part of the SUNWprivate section.
  • update libc's lint library
    If you are adding private symbols, add them here as well. See the entries _vwarnfp et al. for an example.
    After you're done, it's time to run nightly with lint checks and fix the noise (see below).
  • Add per-symbol filters
    If you are moving stuff from a library to libc you will probably want to preserve the existing interfaces. To accomplish this, per-symbol filters can be added to the library you're moving from. So, if symbol foo is moved from libbar to libc, change the line in the global section of libbar's mapfile to look like this:
    foo = FUNCTION FILTER libc.so.1;
    
    This was done with the *fp functions in libipsecutils' mapfile. The specialty in that case was that the *fp functions were renamed to underscored variants while moving them via redefines in errfp.h.
  • Fix build/lint noise introduced by the changes
    The following kinds of noise can appear:
    • build noise
      Can be caused by a symbol type clash (there is a symbol of the same name defined in libc as FUNC and in $SRC/cmd as OBJT), which is not harmful because ld(1) will do due diligence and prefer the non-libc symbol. This can be fixed by renaming the local symbol. There could also be a #define clash caused by the inclusion of c_synonyms.h, which is fixed via renaming as well.
    • lint noise
      In the 3rd pass of the lint checks an inconsistency in function declarations can be found, such as this:
      "/builds/.../usr/include/err.h", line 43: warning: function argument declared
      inconsistently: warn(arg 1) in utils.c(62) char * and llib-lc:err.h(43) const char *
      (E_INCONS_ARG_DECL2)
      
      The problem with this output is that there are about 23 files named utils.c in ONNV. CR 6585621 is waiting for someone to provide a remedy by adding the -F flag to LINTFLAGS in $SRC/lib and $SRC/cmd.
      After the right file(s) are found, the fix is usually renaming again. Where renaming is not possible, -erroff=E_FUNC_DECL_VAR_ARG2 can be passed to lint(1).
  • Make sure there are no duplicate symbols in libc after the changes
    This is necessary because duplicates might confuse debugging tools (mdb, dtrace). For the err/warn stuff there was one such occurrence:
    [6583] | 301316| 37|FUNC |GLOB |0 |13 |_warnx
    [1925] | 320000| 96|FUNC |LOCL |0 |13 |_warnx
    
    This can usually be solved by renaming the local variant.
  • Test thoroughly
    • test with different compilers
      SunStudio does different things than gcc, so it is a good idea to test the code with both.
    • Try to compile different consolidations (e.g. Companion CD, SFW) on top of the changes. For the err/warn project a bug was filed to get the RPM build fixed.
    • Test if the WEAK symbols actually work
    • Test the programs in ONNV affected by the changes
      e.g. the programs which needed to be modified because of the build/lint noise.
  • Produce an annotated putback list explaining the changes
    This is handy for a CRT advocate and saves time.
  • If the change requires some sort of action from fellow gatelings, send a heads-up, e.g. like the heads-up for err/warn.
  • If you are actually adding code to libc (this includes moving code from other libraries to libc), send an e-mail similar to the heads-up e-mail to the opensolaris-code mailing list, e.g. like this message about err/warn.
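
The strong/weak pairing from the "add the code into libc" step can be sketched outside of libc as well. This is only an illustration under stated assumptions: it uses the GCC/clang attribute syntax on an ELF target rather than whatever synonyms.h does inside libc, and the names _my_strlen and my_strlen are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Strong, underscored symbol that carries the implementation.
 * '_my_strlen' is a made-up name; the leading underscore only mimics
 * the libc convention described above.
 */
size_t
_my_strlen(const char *s)
{
        assert(s != NULL);
        return (strlen(s));
}

/*
 * Weak alias under the public name (GCC/clang attribute, ELF only).
 * A program or library that defines its own my_strlen() takes
 * precedence over this one; _my_strlen() stays reachable regardless.
 */
size_t my_strlen(const char *s) __attribute__((weak, alias("_my_strlen")));
```

Running nm(1) on the resulting object should show the same pattern as the _err/err pair above: one symbol bound GLOB, the other bound WEAK, both with the same address.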