Monday Mar 09, 2009

Python tips - Handle text file

Suppose there are two text files, each with about 10 million lines and roughly 100 MB in size. We want to know how many lines appear in both files. Every line is unique within its own file, so neither file contains duplicate rows. A Python set handles this very easily, and more efficiently than shell or awk.
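#!/usr/bin/python
# Each line (including its trailing newline) becomes one set element.
a = set(open("data.uniq.1"))
b = set(open("data.uniq.2"))
# The & operator is set intersection: the lines present in both files.
print(len(a & b))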
I also found a blog post in Chinese that describes this tip.

Thursday Feb 26, 2009

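FW: Macro processing in C

It has been a while since I posted anything about C. I happened to come across an interesting article that mainly explains function-like macros, so I am reposting it here. Original: reposted from here. The author also has quite a few other interesting, in-depth articles on the C language.

Macro preprocessing looks like a shallow pit, but it is actually quite deep. It is also one of the most easily overlooked areas. I will venture to discuss it here. To be honest, as I write this sentence I do not know how deep the pit really is. No matter: we will cross the river by feeling for the stones and see together how deep it turns out to be. This article will also become part of "The Art of C Programming" (《C语言编程艺术》).

Let's start with a relatively simple example.

#define f(a,b) a##b
#define g(a)   #a
#define h(a) g(a)
h(f(1,2))
g(f(1,2))

Many people have probably seen this example. Let's analyze carefully how it is parsed. It should go like this:

For g(f(1,2)), the preprocessor first sees g and then (, which tells it this is a function-like macro. It then substitutes the argument f(1,2), giving #f(1,2) (note: written this way it is not legal; it is only a convenient notation). Because the parameter is preceded by #, the next step does not expand f in the argument, so the result is "f(1,2)" and parsing ends. For h(f(1,2)), the preprocessor first sees h and then (, and substitutes the argument f(1,2), giving g(f(1,2)). Note that in the next step the preprocessor keeps moving forward and processes the f(1,2) it has just obtained, rather than going back to process g. That yields 12, so at this point we have g(12). The whole macro is then rescanned, g is replaced, and the final result is "12".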
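The two results above can be checked with a small C sketch. The wrapper functions expand_g and expand_h are hypothetical names introduced here for illustration; they simply return whatever string literal the preprocessor produces from the macros in the example.

```c
#define f(a,b) a##b
#define g(a)   #a
#define h(a)   g(a)

/* g stringifies its argument directly, so f(1,2) is never expanded. */
const char *expand_g(void) { return g(f(1,2)); }

/* h has no # or ## next to its parameter, so f(1,2) is expanded to 12
 * before g stringifies it. */
const char *expand_h(void) { return h(f(1,2)); }

/* Keep the one-letter macro names from leaking into later code. */
#undef f
#undef g
#undef h
```

Printing the two return values with printf("%s\n", ...) should show f(1,2) for expand_g and 12 for expand_h.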

Section 6.10.3.1 of the standard describes this fairly clearly. It says:

After the arguments for the invocation of a function-like macro have been
identified, argument substitution takes place. A parameter in the replacement
list, unless preceded by a # or ## preprocessing token or followed by a ##
preprocessing token (see below), is replaced by the corresponding argument
after all macros contained therein have been expanded.

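Note the key phrases here (the exception for # and ##, and "after all macros contained therein have been expanded"). At this point we can briefly summarize the basic replacement flow for a function-like macro:

First, the preprocessor has to recognize that this is a function-like macro. How? By the ( that appears in the invocation; yes, the left parenthesis. The next step is argument substitution: following the macro's definition, all the arguments are substituted in. The preprocessor then continues forward and, unless it meets # or ## (as with g in the example above), replaces or expands any known macros in the substituted text, until it reaches the end and all parameters have been substituted (or the # and ## operators have been processed). Finally, the preprocessor scans the whole macro once more, because the previous round of replacement may have produced something new earlier in the text (as with h in the example above).

At first glance this looks fine, but there are actually many questions. Why? Because macro replacement comes into play not only when a macro is "invoked" but also in its definition!

Question 1: Is the macro name itself replaced?

The question can also be put this way: may a macro be redefined? No, although an identical redefinition is allowed. The standard says: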

An identifier currently defined as an object-like macro shall not be
redefined by another #define preprocessing directive unless the second definition
is an object-like macro definition and the two replacement lists are identical.
Likewise, an identifier currently defined as a function-like macro shall not be
redefined by another #define preprocessing directive unless the second definition
is a function-like macro definition that has the same number and spelling of
parameters, and the two replacement lists are identical.

Question 2: Are the macro's parameters (formal parameters) replaced?

Here is an example to illustrate the question:

#define foo 1
#define bar(foo) foo + 2
bar(a)

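Do we get a+2 or 1+2? a+2! Formal parameters are not replaced; think about it: if they were, the macro would be useless. What about the arguments? They are replaced, because an argument is expanded before it is passed in: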

Before being substituted, each argument’s preprocessing tokens are
completely macro replaced as if they formed the rest of the preprocessing file

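Question 3: Are the tokens in a macro other than its parameters replaced?

Yes. As quoted above, "after all macros contained therein have been expanded"; in other words, this happens before parameter substitution. But there is a very odd problem here: what if a token produced by replacement happens to have the same name as a formal parameter? As in this example:

#define foo bar
#define baz(bar) bar + foo
baz(1)

Do we get 1+1 or 1+bar? The latter, because the bar produced by replacement does not count as the formal parameter, although the standard does not say so explicitly. Think about it: if it did, the macro's definition would be broken!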
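The answers to questions 2 and 3 can be checked with a short sketch. STR, XSTR, question2, and question3 are hypothetical helper names added for illustration, not part of the original article; XSTR uses the usual two-level trick so that its argument is fully expanded before being turned into a string.

```c
#define STR(x)  #x
#define XSTR(x) STR(x)      /* expands x fully, then stringifies it */

/* Question 2: the parameter named foo is not affected by the macro foo. */
#define foo 1
#define bar(foo) foo + 2
const char *question2(void) { return XSTR(bar(a)); }
#undef foo
#undef bar

/* Question 3: the bar produced by expanding foo is not treated as the
 * formal parameter bar. */
#define foo bar
#define baz(bar) bar + foo
const char *question3(void) { return XSTR(baz(1)); }
#undef foo
#undef baz
```

question2() should return the string "a + 2" and question3() should return "1 + bar", matching the two answers given in the text.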

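Another example:

#define foo bar
#define mac(x) x(foo)
mac(foo)

According to the above, we first get foo(foo), then foo is replaced by bar, and finally we get bar(bar).

Now we can finally look at a more complicated example:

#define m !(m)+n
#define n(n) n(m)
m(m)

This example is quite complicated; it is the most complicated macro I have seen. :-) It may look baffling at first, but never mind, let's take it one step at a time.

The first step is easy: the first m is replaced directly, giving !(m)+n(m). Don't hesitate; keep going and replace the last m, giving !(m)+n(!(m)+n). At this point this scan is complete. To continue we need to bring up one more thing, which you may already know about: recursion. The standard describes it as follows: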

If the name of the macro being replaced is found during this scan of the
replacement list (not including the rest of the source file’s preprocessing
tokens), it is not replaced.

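In the previous replacement it was m that was replaced, so when m appears again here it is not replaced. The next step is therefore to replace the first n, giving !(m)+!(m)+n(m). Note that a new m has been produced here; this m will be replaced, because this scan has not finished yet! The next step gives !(m)+!(m)+n(!(m)+n); the second scan ends and all replacement is complete.

To sum up, we can state two important macro replacement rules: 1) however complicated a macro is, it is scanned only twice, and recursion is not allowed to happen, even in the second pass; 2) if the scan has not finished when a replacement completes, scanning continues from the point of the replacement.

(End of article)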

Friday Oct 17, 2008

How to compile with gtk on OpenSolaris (supplement): install the required packages

My previous post, "how to compile with gtk on OpenSolaris", described how to compile a gtk application on OpenSolaris. In practice, though, if you install only OpenSolaris and Sunstudio, you still cannot build a gtk application. Three packages are required to compile and develop gtk applications:

SUNWgnome-common-devel
SUNWxwinc
SUNWxorg-headers

So you need to install these three packages, either with Package Manager or from the command line: pkg install package_name

BTW, there is a discussion about the dependencies of these three packages, FYI: http://opensolaris.org/jive/thread.jspa?messageID=259469
I also filed a bug for it: http://defect.opensolaris.org/bz/show_bug.cgi?id=2561


Monday Oct 13, 2008

Extensions to the C Language - GCC and Sunstudio

My previous post, "C99 new feature: designated initializer", covered one of the new features in C99. Here are more extensions to the C language, for GNU gcc and Sunstudio.

GNU C:
GNU C provides several language features not found in ISO standard C. (The -pedantic option directs GCC to print a warning message if any of these features is used.) To test for the availability of these features in conditional compilation, check for a predefined macro __GNUC__, which is always defined under GCC.
These extensions are available in C and Objective-C. Most of them are also available in C++. See Extensions to the C++ Language, for extensions that apply only to C++.
Some features that are in ISO C99 but not C89 or C++ are also, as extensions, accepted by GCC in C89 mode and in C++.
For more details, click here.

Sunstudio C 12
New Language Extensions in the Sun Studio 12 C Compiler
This article gives an overview of the following C-language extensions (part of the GNU C-implementation) introduced in the Sun Studio 12 C compiler. Although these extensions are not part of the latest ISO C99 standard, they are supported by the popular gcc compilers.

  • The typeof keyword which allows references to an arbitrary type
  • Statement expressions that make it possible to specify declarations and statements in expressions
  • Block-scope label names
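As a rough illustration of the first two extensions in the list, here is a minimal sketch of a MAX macro built from typeof and a statement expression. The names MAX and max_once are hypothetical, and the sketch uses the __typeof__ spelling, which GCC accepts in its strict ISO modes as well; it assumes a compiler that supports these GNU extensions.

```c
/* MAX evaluates each argument exactly once, unlike the classic
 * ((a) > (b) ? (a) : (b)) macro. Relies on GNU extensions:
 * __typeof__ and the ({ ... }) statement expression. */
#define MAX(a, b)                              \
    ({ __typeof__(a) max_a_ = (a);             \
       __typeof__(b) max_b_ = (b);             \
       max_a_ > max_b_ ? max_a_ : max_b_; })

int max_once(int *counter, int other)
{
    /* (*counter)++ has a side effect; with this MAX it runs only once. */
    return MAX((*counter)++, other);
}
```

With the classic conditional-expression macro, the increment could run twice when the first argument is the larger one; the statement expression avoids that by copying each argument into a local variable first.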

Saturday Oct 11, 2008

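C99 new feature: designated initializer

Reference: http://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
Standard C89 requires the elements of an initializer to appear in a fixed order, the same as the order of the elements in the array or structure being initialized. In ISO C99 you can give the elements in any order, specifying the array indices or structure field names they apply to, and GNU C allows this as an extension in C89 mode as well. This extension is not implemented in GNU C++. In fact, gcc had this extension first; ISO later adopted it and wrote it into the C99 standard, with a small change to the syntax. So if you use this feature, it is best to write it in the C99 syntax, so that the code compiles with both Sunstudio and gcc.

GNU gcc's extension to the standard:

Standard C requires the initial values of an array or structure variable to appear in a fixed order. In GNU C, initial values may appear in any order if you specify the index or the structure field name. To specify an array index, write '[INDEX] =' before the initial value; to specify a range, use the form '[FIRST ... LAST] ='. For example:

arch/i386/kernel/irq.c
static unsigned long irq_affinity [NR_IRQS] = { [0 ... NR_IRQS-1] = ~0UL };

This initializes every element of the array to ~0UL and can be seen as a shorthand.

To specify a structure element, write 'FIELDNAME:' before the element's value. For example: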

++++ fs/ext2/file.c
struct file_operations ext2_file_operations = {
        llseek:         generic_file_llseek,
        read:           generic_file_read,
        write:          generic_file_write,
        ioctl:          ext2_ioctl,
        mmap:           generic_file_mmap,
        open:           generic_file_open,
        release:        ext2_release_file,
        fsync:          ext2_sync_file,
};

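This initializes the llseek element of the structure ext2_file_operations to generic_file_llseek, the read element to generic_file_read, and so on. I think this is one of the best features among the GNU C extensions: when the structure definition changes so that the offsets of its elements change, this style of initialization still keeps the known elements correct. Elements that do not appear in the initializer get an initial value of 0.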


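The C99 extension

To specify an array index, write “[index] =” before the element value. For example:
     int a[6] = { [4] = 29, [2] = 15 };

is equivalent to:
     int a[6] = { 0, 0, 15, 0, 29, 0 };

The index values must be constant expressions, even if the array being initialized is automatic.

An alternative syntax for this is to write “[index]” before the element value, with no “=”; it has been obsolete since GCC 2.5, but GCC still accepts it. To initialize a range of elements to the same value, write “[first ... last] = value”. This is a GNU extension. For example:
     int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };

If the value in it has side effects, the side effects happen only once, not once for each element initialized by the range.
Note that the length of the array is the highest index specified plus one.
In a structure initializer, specify the name of the field to initialize with “.fieldname =” before the element value. For example, given the following structure,
     struct point { int x, y; };

the following initialization,
     struct point p = { .y = yvalue, .x = xvalue };

is equivalent to:
     struct point p = { xvalue, yvalue };

Another syntax with the same meaning is “fieldname:”, obsolete since GCC 2.5, as shown here:
     struct point p = { y: yvalue, x: xvalue };

The “[index]” or “.fieldname” is known as a designator. You can also use a designator (or the obsolete colon syntax) when initializing a union, to specify which element of the union should be used. For example:
     union foo { int i; double d; }; union foo f = { .d = 4 };

This converts 4 to a double and stores it in the union using the second element. By contrast, casting 4 to type union foo would store it into the union as the integer i, since it is an integer. (See section 5.24, Cast to a Union Type.)
You can combine this technique of naming elements with ordinary C initialization of successive elements. Each initializer element that does not have a designator applies to the next consecutive element of the array or structure. For example,
     int a[6] = { [1] = v1, v2, [4] = v4 };

is equivalent to

     int a[6] = { 0, v1, v2, 0, v4, 0 };

Labeling the elements of an array initializer is especially useful when the indices are characters or belong to an enum type. For example:
     int whitespace[256] = { [' '] = 1, ['\t'] = 1, ['\v'] = 1, ['\f'] = 1, ['\n'] = 1, ['\r'] = 1 };

You can also write a series of “.fieldname” and “[index]” designators before an “=” to specify a nested subobject to initialize; the list is taken relative to the subobject corresponding to the closest surrounding brace pair. For example, with the struct point declaration above:

     struct point ptarray[10] = { [2].y = yv2, [2].x = xv2, [0].x = xv0 };

If the same field is initialized multiple times, it takes the value from the last initialization. If any such overridden initialization has a side effect, it is unspecified whether the side effect happens or not. Currently, gcc discards them and issues a warning.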
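The C99 forms described above can be put together in one small sketch. The object names sparse, pt, and pts are hypothetical and chosen only for this illustration; the sketch sticks to standard C99 syntax and leaves out the GNU-only range and colon forms.

```c
struct point { int x, y; };

/* Array designators: elements that are not named are set to zero. */
const int sparse[6] = { [4] = 29, [2] = 15 };

/* Structure designators may appear in any order. */
const struct point pt = { .y = 2, .x = 1 };

/* Nested designators combine [index] and .fieldname. */
const struct point pts[3] = { [2].y = 7, [2].x = 5, [0].x = 1 };
```

Here sparse ends up as { 0, 0, 15, 0, 29, 0 }, pt.x is 1 and pt.y is 2, and every member of pts[1] is zero because it is never mentioned in the initializer.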

See also:


http://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html

http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=/com.ibm.xlcpp8a.doc/language/ref/designators.htm

how to compile with gtk on OpenSolaris

This is for people using gtk+ on OpenSolaris for the first time. When you compile an application that uses the gtk+ library, you must pass the necessary compile and link options. For gtk+ there are too many options to list by hand, so "pkg-config" is a useful way to generate them.
Here is the simplest possible gtk application:

/*
 * base.c
 */
#include <gtk/gtk.h>

int main( int argc, char *argv[] )
{
    GtkWidget *window;
    gtk_init (&argc, &argv);
    window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
    gtk_widget_show (window);
    gtk_main ();
    return 0;
}

Compile it with the following steps:
1> cc -c `pkg-config --cflags gtk+-2.0` base.c
2> cc -o base `pkg-config --libs gtk+-2.0` base.o

Wednesday Feb 27, 2008

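dbx notes (2): all settable dbx environment variables

(Continued from the previous post) All the dbx environment variables that can be set:

dbx environment variable
    What the variable does

array_bounds_check on|off
    If set to on, dbx checks array bounds. Default: on.

CLASSPATHX
    Lets you specify to dbx a path for Java class files that are loaded by custom class loaders.

core_lo_pathmap on|off
    Controls whether dbx uses pathmap settings to locate the correct libraries for a "mismatched" core file. Default: off.

disassembler_version autodetect|v8|v9|v9vis
    SPARC platform: sets the version of dbx's built-in disassembler for SPARC V8, V9, or V9 with the Visual Instruction Set. The default is autodetect, which sets the mode dynamically according to the type of machine on which a.out is running.
    IA platform: the only valid choice is autodetect.

fix_verbose on|off
    Controls the printing of the compilation line during a fix. Default: off.

follow_fork_inherit on|off
    When following a child process, inherits or does not inherit breakpoints. Default: off.

follow_fork_mode parent|child|both|ask
    Determines which process is followed after a fork; that is, when the current process executes fork, vfork, or fork1. If set to parent, dbx follows the parent. If set to child, it follows the child. If set to both, it follows the child but the parent process remains active. If set to ask, you are asked which process to follow whenever a fork is detected. Default: parent.

follow_fork_mode_inner unset|parent|child|both
    Applies when, after a fork has been detected, follow_fork_mode is set to ask and you chose stop. With this variable set, you do not need to use cont -follow.

input_case_sensitive autodetect|true|false
    If set to autodetect, dbx selects case sensitivity automatically based on the language of the file: false for Fortran files, otherwise true. If true, case matters in variable and function names; otherwise case is not significant. Default: autodetect.

JAVASRCPATH
    Specifies the directories in which dbx looks for Java source files.

jdbx_mode java|jni|native
    Stores the current dbx mode. It can have the following settings: java, jni, or native.

jvm_invocation
    The jvm_invocation environment variable lets you customize the way the JVM software is started. (The terms "Java virtual machine" and "JVM" mean a virtual machine for the Java platform.) For details, see "Customizing Startup of the JVM Software" in the dbx manual.

language_mode autodetect|main|c|c++|fortran|fortran90
    Controls the language used for parsing and evaluating expressions.
    • autodetect sets the expression language to the language of the current file. Useful for debugging programs written in mixed languages (default).
    • main sets the expression language to the language of the main routine in the program. Useful for debugging homogeneous programs.
    • c, c++, fortran, or fortran90 sets the expression language to the selected language.

mt_scalable on|off
    When enabled, dbx limits its resource usage and is able to debug processes with more than 300 LWPs. The downside is a significant slowdown. Default: off.

output_auto_flush on|off
    Automatically calls fflush() after each call. Default: on.

output_base 8|10|16|automatic
    Default base for printing integer constants. Default: automatic (pointers in hexadecimal, everything else in decimal).

output_class_prefix on|off
    Used to prefix a class member with its class name when its value or declaration is printed. If set to on, the prefix is added to the class member. Default: on.

output_dynamic_type on|off
    If set to on, -d is the default for printing, displaying, and inspecting. Default: off.

output_inherited_members on|off
    If set to on, -r is the default for printing, displaying, and inspecting. Default: off.

output_list_size num
    Controls the default number of lines printed by the list command. Default: 10.

output_log_file_name filename
    Name of the command log file. Default: /tmp/dbx.log.uniqueID

output_max_string_length number
    Sets the number of characters printed for char *s. Default: 512.

output_pretty_print on|off
    Sets -p as the default for printing, displaying, and inspecting. Default: off.

output_short_file_name on|off
    Displays short path names for files. Default: on.

overload_function on|off
    For C++, if set to on, enables automatic function overload resolution. Default: on.

overload_operator on|off
    For C++, if set to on, enables automatic operator overload resolution. Default: on.

pop_auto_destruct on|off
    If set to on, automatically calls the appropriate destructors for locals when a frame is popped. Default: on.

proc_exclusive_attach on|off
    If set to on, keeps dbx from attaching to a process when another tool is already attached. Warning: be aware that if more than one tool attaches to a process and tries to control it, chaos ensues. Default: on.

rtc_auto_continue on|off
    Logs errors to rtc_error_log_file_name and continues. Default: off.

rtc_auto_suppress on|off
    If set to on, an RTC error at a given location is reported only once. Default: off.

rtc_biu_at_exit on|off|verbose
    Used when memory use checking is turned on explicitly or through check -all. If the value is on, a non-verbose memory use (blocks in use) report is produced at program exit. If the value is verbose, a verbose memory use report is produced at program exit. The value off produces no output. Default: on.

rtc_error_limit number
    Number of RTC errors to be reported. Default: 1000.

rtc_error_log_file_name filename
    Name of the file to which RTC errors are logged (if rtc_auto_continue is set). Default: /tmp/dbx.errlog.uniqueID

rtc_error_stack on|off
    If set to on, stack traces show the frames that correspond to RTC internal mechanisms. Default: off.

rtc_inherit on|off
    If set to on, enables runtime checking of child processes executed from the debugged program and causes the LD_PRELOAD environment variable to be inherited. Default: off.

rtc_mel_at_exit on|off|verbose
    Used when memory leak checking is on. If the value is on, a non-verbose memory leak report is produced at program exit. If the value is verbose, a verbose memory leak report is produced at program exit. The value off produces no output. Default: on.

run_autostart on|off
    If set to on when there is no active program, step, next, stepi, and nexti implicitly run the program and stop at the language-dependent main routine. If set to on, cont starts a run when necessary. Default: off.

run_io stdio|pty
    Controls whether the user program's input/output is redirected to dbx's stdio or to a specific pty. The pty is provided by run_pty. Default: stdio.

run_pty ptyname
    Sets the name of the pty to use when run_io is set to pty. Ptys are used by graphical user interface wrappers.

run_quick on|off
    If set to on, no symbolic information is loaded. The symbolic information can be loaded on demand using prog -readsyms. Until then, dbx behaves as if the program being debugged were stripped. Default: off.

run_savetty on|off
    Multiplexes tty settings, process group, and keyboard settings (if -kbd was used on the command line) between dbx and the program being debugged. Useful for debugging editors and shells. Set it to on if dbx gets SIGTTIN or SIGTTOU and pops back into the shell. Set it to off to gain a slight speed advantage. The setting is irrelevant if dbx is attached to the program being debugged or is running in the Sun Studio IDE. Default: on.

run_setpgrp on|off
    If set to on, when a program is run, setpgrp(2) is called right after the fork. Default: off.

scope_global_enums on|off
    If set to on, enumerators are placed in global scope rather than file scope. Set it before debugging information is processed (~/.dbxrc). Default: off.

scope_look_aside on|off
    If set to on, looks for file static symbols outside the current scope. Default: on.

session_log_file_name filename
    Name of the file where dbx logs all commands and their output. Output is appended to the file. Default: "" (no session logging).

stack_find_source on|off
    If set to on, when the program being debugged stops in a function that was not compiled with -g, dbx tries to find and automatically activate the first stack frame with source. Default: on.

stack_max_size number
    Sets the default size for the where command. Default: 100.

stack_verbose on|off
    Controls the printing of arguments and line information in where. Default: on.

step_abflow stop|ignore
    If set to stop, dbx stops in longjmp(), siglongjmp(), and throw statements when single stepping. If set to ignore, dbx does not detect abnormal control flow changes for longjmp() and siglongjmp().

step_events on|off
    If set to on, allows breakpoints while stepping through code with the step and next commands. Default: off.

step_granularity statement|line
    Controls the granularity of source line stepping. If set to statement, the following code:
    a() ; b() ;
    takes two next commands to execute. If set to line, a single next command executes the code. Line granularity is particularly useful when dealing with multi-line macros. Default: statement.

suppress_startup_message number
    Sets the release level below which the startup message is not printed. Default: 3.01.

symbol_info_compression on|off
    If set to on, reads the debugging information for each include file only once. Default: on.

trace_speed number
    Sets the speed of tracing execution. The value is the number of seconds to pause between steps. Default: 0.50.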

 

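dbx notes (1)

I have read some chapters of the dbx manual on and off. Although most of the content is rarely needed, it is quite useful for debugging in certain specific situations. The purpose of this post is to excerpt the parts I personally find useful, for easy reference.

1>

To eliminate library problems and debug a "mismatched" core file with dbx, you can do the following:
1. Set the dbx environment variable core_lo_pathmap to on.
2. Use the pathmap command to tell dbx where the correct libraries for the core file are located.
3. Use the debug command to load the program and the core file.
For example, assuming that the root partition of the core host has been exported over NFS and can be accessed through /net/core-host/ on the dbx host, you would use the following commands to load the program prog and the core file prog.core for debugging:
 (dbx) dbxenv core_lo_pathmap on
 (dbx) pathmap /usr /net/core-host/usr
 (dbx) pathmap /appstuff /net/core-host/appstuff
 (dbx) debug prog prog.core
If the root partition of the core host is not exported, you must copy the libraries by hand. You do not need to re-create the symbolic links. (For example, you need not make a link from libc.so to libc.so.1; just make sure libc.so.1 is available.)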


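Things to remember
Keep the following in mind when debugging a mismatched core file:
■ The pathmap command does not recognize a pathmap for "/", so you cannot use the following command:
  pathmap / /net/core-host
■ The single-argument mode of the pathmap command does not work with loadobject pathnames, so use the two-argument mode, from-path to-path.

■ Debugging the core file is likely to work better if the dbx host runs the same or a more recent version of the Solaris operating environment than the core host, though this is not always necessary. The system libraries that you might need are:
   ■ For the runtime linker:
        /usr/lib/ld.so.1
        /usr/lib/librtld_db.so.1
        /usr/lib/64/ld.so.1
        /usr/lib/64/librtld_db.so.1
   ■ For the threads library, depending on which implementation of libthread you are using:
        /usr/lib/libthread_db.so.1
        /usr/lib/64/libthread_db.so.1
        /usr/lib/lwp/libthread_db.so.1
        /usr/lib/lwp/64/libthread_db.so.1
   The /usr/lib/lwp files apply only if you are running dbx in the Solaris 8 operating environment, and only if you are using the alternate libthread library.
   If dbx is running on a 64-bit capable version of the Solaris OS, you need the 64-bit versions of the xxx_db.so libraries, because these system libraries are loaded and used as part of dbx, not as part of the target program.
   The ld.so.1 libraries are part of the core file image, like libc.so or any other library, so you need the 32-bit or 64-bit ld.so.1 library that matches the program that created the core file.
■ If you are looking at a core file from a threaded program and the where command does not display a stack, try using the lwp commands. For example: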
  (dbx) where
  current thread: t@0
  [1] 0x0(), at 0xffffffff
  (dbx) lwps
  o>l@1 signal SIGSEGV in _sigfillset()
  (dbx) lwp l@1
  (dbx) where
  =>[1] _sigfillset(), line 2 in "lo.c"
     [2] _liblwp_init(0xff36291c, 0xff2f9740, ...
     [3] _init(0x0, 0xff3e2658, 0x1, ...
  ...
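   The lack of a thread stack can indicate a problem with thread_db.so.1, so you should try copying the appropriate libthread_db.so.1 library from the core host.

2>
Order in which startup files are loaded:
dbx -s lets you specify the startup file to load.
dbx looks for .dbxrc in this order:
/installation_directory/lib -> the current directory (./.dbxrc) -> $HOME ($HOME/.dbxrc)
On Solaris platforms the default installation_directory is /opt/SUNWspro; on Linux platforms the default installation_directory is /opt/sun/sun/sunstudio10u1

3>
Creating a .dbxrc file:
(dbx) help .dbxrc>$HOME/.dbxrc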

Tuesday Sep 18, 2007

How to set the environment for using GNU gcc on Solaris

Some GNU software must be built with gcc, but for most Solaris developers the default compiler is SunStudio, so building such software directly may not work. To use gcc on Solaris, you need to set a few environment variables.

First, some commonly used variables:

CC: specifies the C compiler

CPP: specifies the C preprocessor command

CXX: specifies the C++ compiler

PKG_CONFIG_PATH: specifies the path to the .pc files used by pkg-config; the default is /usr/lib/pkgconfig. A .pc file is a text file with the .pc extension that defines a development package's installation path, its Libs and Cflags parameters, and so on.

Solaris usually has the GNU compilers installed in /usr/sfw/bin/ (you can of course install gcc in any directory you like and point the variables at that directory instead), so the usual settings are:

export CPP="/usr/sfw/bin/gcc -E"

export CC=/usr/sfw/bin/gcc

export CXX=/usr/sfw/bin/g++

Tuesday Jun 12, 2007

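Calling C++ functions from C and C functions from C++

  • Calling a C function from C++

This is fairly easy: use the extern keyword with a linkage specification. There are two forms:

extern "language_name" declaration ;
extern "language_name" { declaration ; declaration ; ... }

For example, given this C function:

int C_Coding(int a, float b);

the approach is either to wrap the C header:

/* cpp_a.h */

extern "C" {

#include "a.h"

}

or to redeclare the functions:

/* cpp_a.h */

extern "C" {

int C_Coding(int a, float b); /* redeclare all the C functions */

}

This way the C function C_Coding can be called from a C++ program.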
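A common variation, sketched below, is to put the linkage guard in the C header itself so the same file can be included from both languages. The header name a.h comes from the example above; the include guard A_H and the body of C_Coding are assumptions made up for this illustration so that the sketch is complete.

```c
/* a.h (sketch): one header usable from both C and C++. */
#ifndef A_H
#define A_H

#ifdef __cplusplus
extern "C" {            /* only a C++ compiler sees this line */
#endif

int C_Coding(int a, float b);

#ifdef __cplusplus
}
#endif

#endif /* A_H */

/* a.c (sketch): placeholder body, not from the original post. */
int C_Coding(int a, float b)
{
    return a + (int)b;
}
```

A C compiler never defines __cplusplus, so it sees only the plain declaration, while a C++ compiler sees it wrapped in extern "C" and therefore links against the unmangled name.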

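  • Calling a C++ function from C

To call a public member function of a class:

extern "C" int call_M_foo(M* m, int i) { return m->foo(i); }

Broken down, the main steps are:

1.         In a C++ header file, declare a C++ class that has a public function.

2.         In a C++ source file, implement that class.

3.         Wrap it a second time: in another C++ file, implement a C function interface, and inside that C function call the class's public function.

4.         Export this interface function with extern "C" so that C code can call it.

There are many examples online for reference: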

http://blog.donews.com/xzwenlan/archive/2005/05/31/405799.aspx

http://blog.csdn.net/AbnerChai/archive/2006/11/29/1419214.aspx

 

I recommend this article on how to mix C and C++ code:

Reference:http://developers.sun.com/sunstudio/articles/mixing.html

Monday Jun 11, 2007

memcpy VS memmove

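What is the difference between memcpy() and memmove()?

If the memory regions pointed to by the source and destination arguments overlap, memmove() offers guaranteed behavior. memcpy() makes no such guarantee, and can therefore be implemented more efficiently. When in doubt, it is safer to use memmove().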
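A minimal sketch of the overlapping case: shift_right_by_one is a hypothetical helper written for this illustration. Its source and destination ranges overlap by len - 1 bytes, which is exactly the situation where memmove() is required.

```c
#include <string.h>

/* Shifts the first len bytes of buf one position to the right.
 * The caller must make sure buf has room for len + 1 bytes.
 * Source and destination overlap, so memmove() is used; calling
 * memcpy() here would have undefined behavior. */
void shift_right_by_one(char *buf, size_t len)
{
    memmove(buf + 1, buf, len);
}
```

For example, applying it to the first 6 bytes of a buffer holding "abcdef" leaves buf[0] unchanged and moves the six letters up by one, giving "aabcdef" if the buffer is large enough and zero-filled after the original string.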

Reference :
http://c-faq.com/ansi/memmove.html
http://blog.csdn.net/swguru/archive/2002/08/05/17175.aspx 

bzero VS memset

bzero:

bzero is not a standard (ANSI) C function; it first appeared in 4.3BSD (the Berkeley UNIX C library). Its prototype used to live in <string.h> before it was moved to <strings.h> for IEEE Std 1003.1-2001 ("POSIX.1") compliance.

To use it in ANSI C you have to define it yourself first (some systems do this for you). memset, on the other hand, is already part of standard C and has a well-known meaning.
bzero is deprecated (marked as LEGACY in POSIX.1-2001): use memset in new programs.

This is a partially obsolete alternative for memset, derived from BSD. Note that it is not as general as memset, because the only value it can store is zero.
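As a small sketch of the suggested replacement, the helper below does what bzero(s, n) does using only standard memset(). The name zero_bytes is hypothetical, chosen to avoid clashing with a system-provided bzero.

```c
#include <string.h>

/* Zeroes the first n bytes at s, like the legacy bzero(s, n),
 * but using only the standard C memset(). */
void zero_bytes(void *s, size_t n)
{
    memset(s, 0, n);
}
```

Because memset() takes the fill value as a parameter, the same call also covers cases bzero cannot, such as memset(s, 0xFF, n).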

Reference:
http://www.opengroup.org/onlinepubs/000095399/functions/bzero.html

 

http://people.redhat.com/drepper/posix-option-groups.html

memset:

The memset function conforms to ISO/IEC 9899:1990 ("ISO C90"). It was adopted by the System V, ANSI C, and POSIX standards, while bzero was deprecated.

Reference:http://www.opengroup.org/onlinepubs/000095399/functions/memset.html


When porting to Solaris, here are possible ANSI/POSIX/SVR4 replacements for popular BSD functions:

Reference:http://ns.uoregon.edu/portability-faq/soltopic4.html

Friday Jun 08, 2007

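Printing an integer in binary format

Here is a piece of code I wrote to print an integer in binary. I use it often, so I am posting it.

#include <stdio.h>

/* Prints the bit pattern of nNum in binary, starting at the highest
 * nonzero bit; prints a single 0 when nNum is zero. */
void printBinary(int nNum)
{
    unsigned int u = (unsigned int)nNum;   /* avoids negative remainders */
    char outBin[sizeof(int) * 8 + 1];      /* one char per bit, plus '\0' */
    int i = (int)(sizeof(int) * 8);

    outBin[i] = '\0';
    do {
        /* digits come out lowest first, so fill from the right */
        outBin[--i] = (char)('0' + u % 2);
        u = u / 2;
    } while (u != 0);
    printf("%s\n", &outBin[i]);
}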

 
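How the conversion works:
To convert a decimal number N to base d, a simple algorithm is to repeat the following two steps until N equals zero:
X = N mod d  (where mod is the remainder operation)
N = N div d   (where div is integer division)
In this computation, the first X obtained is the lowest digit of the base-d number and the last X obtained is the highest digit,
so the algorithm produces the digits of the base-d number in order from the lowest to the highest.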
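The two steps above generalize directly to any base. The following is a minimal sketch, assuming bases from 2 to 16; to_base is a hypothetical helper name and its buffer-based interface is a design choice made for this illustration, not the code from the post.

```c
#include <stddef.h>

/* Writes n in base d (2..16) into buf and returns a pointer to the
 * first digit inside buf, or NULL if d is out of range or buf is
 * too small. Digits are produced lowest first, so the buffer is
 * filled from its end toward its start. */
char *to_base(unsigned int n, unsigned int d, char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdef";
    size_t i = size;

    if (d < 2 || d > 16 || size < 2)
        return NULL;
    buf[--i] = '\0';
    do {
        if (i == 0)
            return NULL;            /* ran out of room */
        buf[--i] = digits[n % d];   /* X = N mod d */
        n /= d;                     /* N = N div d */
    } while (n != 0);
    return &buf[i];
}
```

The loop is a do-while rather than a plain while so that an input of zero still yields the single digit 0.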

 

Tuesday Jun 05, 2007

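Duff's Device

This is a wonderfully devious way of unrolling a loop, devised by Tom Duff while he was at Lucasfilm. In its "classic" form, it was used to copy bytes:

register int n = (count + 7) / 8;   /* count > 0 assumed */
switch (count % 8)
{
case 0: do { *to = *from++;
case 7:      *to = *from++;
case 6:      *to = *from++;
case 5:      *to = *from++;
case 4:      *to = *from++;
case 3:      *to = *from++;
case 2:      *to = *from++;
case 1:      *to = *from++;
           } while (--n > 0);
}

Here count bytes are copied from the array pointed to by from to the memory location pointed to by to (which is a memory-mapped output register, which is why to is not incremented). It interleaves a switch statement with the loop that copies 8 bytes at a time, and in this way solves the problem of handling the leftover bytes (when count is not a multiple of 8). Believe it or not, it is legal to put case labels inside blocks nested in a switch statement like this. When he announced the technique to C's developers and the world, Duff noted that C's switch syntax, in particular its "fall through" behavior, had long been controversial, and that "this code forms some sort of argument in that debate, but I'm not sure whether it's for or against".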

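The function contains a switch statement whose case labels also sit inside the body of a while loop (one case label is outside it). The expression in the switch computes the remainder of division by eight. That remainder determines where inside the while loop execution begins, and the loop eventually exits (there is no break). In this way Duff's Device solves the boundary-condition problem simply and elegantly. By the way, why is the "case 0" label outside the loop? Doesn't that spoil the symmetry? The only reason is to handle the empty sequence: when the remainder is zero, "case 0" needs to perform one extra test to check for a possibly empty sequence. All in all, it is a very cool algorithm.

Duff's Device is a C coding trick for speeding up loops. The basic idea is to reduce the number of times the loop test is executed.

In a for loop, if the operation being performed is fast enough (say, an assignment), then testing the loop condition takes up a large part of the time spent in the loop. The loop should be partially unrolled, so that several operations are done at once and the test is performed less often. In effect, the switch statement decides in advance (through the position of the case label) how many consecutive operations to run, and they are then executed in sequence without testing the condition each time.
Here Duff's Device is a novel and creative solution. There is a practical use of this pattern: fast copy and fill.

The negative effects of Duff's Device on efficiency can come from code bloat (some processors handle tight loops better than large ones) and from its unusual structure: an optimizer may be thrown off by such tricky constructs and generate more conservative code.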
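For experimenting with the technique on ordinary memory, here is a minimal memory-to-memory sketch. duff_copy is a hypothetical helper, not Duff's original routine: unlike the classic form it increments the destination pointer, and it returns early for an empty input because the do-while body always runs at least once.

```c
#include <stddef.h>

/* Copies count bytes from src to dst with a Duff's Device style
 * unrolled loop. The switch jumps into the middle of the loop body
 * to handle the count % 8 leftover bytes on the first pass. */
void duff_copy(unsigned char *dst, const unsigned char *src, size_t count)
{
    size_t n;

    if (count == 0)
        return;                     /* the loop below assumes count > 0 */
    n = (count + 7) / 8;            /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

In practice memcpy() is usually the better choice for plain memory copies; this sketch is only meant to make the control flow easy to trace.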
 

Besides this general form, here is a simplified form that may help with understanding. It prints a asterisks in a loop (from Steve's 'Cute Code' collection):

int a = some_number ;   /* a > 0 assumed */
int n = ( a + 4 ) / 5 ;
switch ( a % 5 )
{
case 0: do
        {
            putchar ( '*' ) ;
case 4:     putchar ( '*' ) ;
case 3:     putchar ( '*' ) ;
case 2:     putchar ( '*' ) ;
case 1:     putchar ( '*' ) ;
        } while ( --n ) ;
}
printf ( "\n" ) ;

FYI:
duffs-device page :http://www.lysator.liu.se/c/duffs-device.html

C language knowledge (4) -- Library

C (programming language)

From Wikipedia, the free encyclopedia

Design

The name and characteristic of each function are included into a computer file called a header file but the actual implementation of functions are separated into a library file. The naming and scope of headers have become common but the organization of libraries still remains diverse. The standard library is usually shipped along with a compiler. Since C compilers often provide extra functionalities that are not specified in ANSI C, a standard library with a particular compiler is mostly incompatible with standard libraries of other compilers.

Much of the C standard library has been shown to have been well-designed. A few parts, with the benefit of hindsight, are regarded as mistakes. The string input functions gets() (and the use of scanf() to read string input) are the source of many buffer overflows, and most programming guides recommend avoiding this usage. Another oddity is strtok(), a function that is designed as a primitive lexical analyser but is highly "fragile" and difficult to use.

History

The C programming language, before it was standardized, did not provide built-in functionalities such as I/O operations (unlike traditional languages such as Cobol and Fortran). Over time, user communities of C shared ideas and implementations of what is now called C standard libraries to provide that functionality. Many of these ideas were incorporated eventually into the definition of the standardized C programming language.

Both Unix and C were created at AT&T's Bell Laboratories in the late 1960s and early 1970s. During the 1970s the C programming language became increasingly popular. Many universities and organizations began creating their own variations of the language for their own projects. By the beginning of the 1980s compatibility problems between the various C implementations became apparent. In 1983 the American National Standards Institute (ANSI) formed a committee to establish a standard specification of C known as "ANSI C". This work culminated in the creation of the so-called C89 standard in 1989. Part of the resulting standard was a set of software libraries called the ANSI C standard library.

Later revisions of the C standard have added several new required header files to the library. Support for these new extensions varies between implementations.

The headers <iso646.h>, <wchar.h>, and <wctype.h> were added with Normative Addendum 1 (hereafter abbreviated as NA1), an addition to the C Standard ratified in 1995.

The headers <complex.h>, <fenv.h>, <inttypes.h>, <stdbool.h>, <stdint.h>, and <tgmath.h> were added with C99, a revision to the C Standard published in 1999.

ANSI Standard

The ANSI C standard library consists of 24 C header files which can be included into a programmer's project with a single directive. Each header file contains one or more function declarations, data type definitions and macros. The contents of these header files follows.

In comparison to some other languages (for example Java) the standard library is minuscule. The library provides a basic set of mathematical functions, string manipulation, type conversions, and file and console-based I/O. It does not include a standard set of "container types" like the C++ Standard Template Library, let alone the complete graphical user interface (GUI) toolkits, networking tools, and profusion of other functionality that Java provides as standard. The main advantage of the small standard library is that providing a working ANSI C environment is much easier than it is with other languages, and consequently porting C to a new platform is relatively easy.

Many other libraries have been developed to supply equivalent functionality to that provided by other languages in their standard library. For instance, the GNOME desktop environment project has developed the GTK+ graphics toolkit and GLib, a library of container data structures, and there are many other well-known examples. The variety of libraries available has meant that some superior toolkits have proven themselves through history. The considerable downside is that they often do not work particularly well together, programmers are often familiar with different sets of libraries, and a different set of them may be available on any particular platform.

 

ANSI C library header files

<assert.h> Contains the assert macro, used to assist with detecting logical errors and other types of bug in debugging versions of a program.
<complex.h> A set of functions for manipulating complex numbers. (New with C99)
<ctype.h> This header file contains functions used to classify characters by their types or to convert between upper and lower case in a way that is independent of the used character set (typically ASCII or one of its extensions, although implementations utilizing EBCDIC are also known).
<errno.h> For testing error codes reported by library functions.
<fenv.h> For controlling floating-point environment. (New with C99)
<float.h> Contains defined constants specifying the implementation-specific properties of the floating-point library, such as the minimum difference between two different floating-point numbers (_EPSILON), the maximum number of digits of accuracy (_DIG) and the range of numbers which can be represented (_MIN, _MAX).
<inttypes.h> For precise conversion between integer types. (New with C99)
<iso646.h> For programming in ISO 646 variant character sets. (New with NA1)
<limits.h> Contains defined constants specifying the implementation-specific properties of the integer types, such as the range of numbers which can be represented (_MIN, _MAX).
<locale.h> For setlocale() and related constants. This is used to choose an appropriate locale.
<math.h> For computing common mathematical functions
<setjmp.h> Declares the macros setjmp and longjmp, which are used for non-local exits
<signal.h> For controlling various exceptional conditions
<stdarg.h> For accessing a varying number of arguments passed to functions.
<stdbool.h> For a boolean data type. (New with C99)
<stdint.h> For defining various integer types. (New with C99)
<stddef.h> For defining several useful types and macros.
<stdio.h> Provides the core input and output capabilities of the C language. This file includes the venerable printf function.
<stdlib.h> For performing a variety of operations, including conversion, pseudo-random numbers, memory allocation, process control, environment, signalling, searching, and sorting.
<string.h> For manipulating several kinds of strings.
<tgmath.h> For type-generic mathematical functions. (New with C99)
<time.h> For converting between various time and date formats.
<wchar.h> For manipulating wide streams and several kinds of strings using wide characters - key to supporting a range of languages. (New with NA1)
<wctype.h> For classifying wide characters. (New with NA1)

The C standard library in C++

The C++ programming language includes the functionality of the ANSI C standard library, but makes several modifications, such as changing the names of the header files from <xxx.h> to <cxxx> (however, the C-style names are still available, although deprecated), and placing all identifiers into the std namespace.

Common support libraries

While not standardized, C programs may depend on a runtime library of routines which contain code the compiler uses at runtime. The code that initializes the process for the operating system, for example, before calling main(), is implemented in the C Run-Time Library for a given vendor's compiler. The Run-Time Library code might help with other language feature implementations, like handling uncaught exceptions or implementing floating point code.

The C standard library only documents that the specific routines mentioned in this article are available, and how they behave. Because the compiler implementation might depend on these additional implementation-level functions to be available, it is likely the vendor-specific routines are packaged with the C Standard Library in the same module, because they're both likely to be needed by any program built with their toolset.

Though often confused with the C Standard Library because of this packaging, the C Runtime Library is not a standardized part of the language and is vendor-specific.

Compiler built-in functions

Some compilers (for example, GCC[1]) provide built-in versions of many of the functions in the C standard library; that is, the implementations of the functions are written into the compiled object file, and the program calls the built-in versions instead of the functions in the C library shared object file. This reduces function call overhead, especially if function calls are replaced with inline variants, and allows other forms of optimisation (as the compiler knows the control-flow characteristics of the built-in variants), but may cause confusion when debugging (for example, the built-in versions cannot be replaced with instrumented variants).


About

williamxue
