Monday Oct 20, 2008

Another common task for Python

I'm in the midst of debugging a snoop implementation and I wanted to recompile with gcc and use gdb. I saved the output from the make command and basically used vi to put each .o file on a single line:

[th199096@jhereg snoop]> more files.make 
nfs4_xdr.o
snoop.o
snoop_aarp.o
snoop_adsp.o
snoop_aecho.o
snoop_apple.o

Note that I could strip off the '.o's manually, but typically I would leave them there. What I want to do is take the filename and use it twice in a command. I.e.

% gcc -g -c -o nfs4_xdr.o nfs4_xdr.c

So I decided to use Python to learn a bit more about it:

[th199096@jhereg snoop]> more tran.py 
#!/usr/sfw/bin/python

l1 = []
print "#!/bin/sh -x"
print "# Make no changes here, machine generated!"
print "rm snoop *.o"

for line in open("files.make"):
        [name, ext] = line.split('.')
        print "gcc -g -DUSE_FOR_SNOOP -c -I/builds/th199096/snoop/proto/root_i386/usr/include" \
                " -I. -I/builds/th199096/snoop/usr/src/common/net/dhcp -o %s.o %s.c" % (name, name)
        l1.append(name)

print 'gcc -g -DTEXT_DOMAIN="SUNW_OST_OSCMD" -D_TS_ERRN -Bdirect -o snoop ',
for name in l1:
        print "%s.o" % (name),

print "-L/builds/th199096/snoop/proto/root_i386/lib -L/builds/th199096/snoop/proto/root_i386/usr/lib" \
        " -ldhcputil -ldlpi -lsocket -lnsl -ltsol"

One thing that jumped out was that since I threw away the 'ext', I didn't have to worry about stripping off the '\n'. I also made use of the ',' on the end of the print statements to keep a line going.
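
The split('.') trick relies on every name containing exactly one dot. Here is a minimal sketch of a more defensive variant, written in Python 3 syntax (the script above is Python 2) and using os.path.splitext. The helper name compile_commands and the shortened flag string are placeholders of mine, not part of the real build:

```python
import os.path


def compile_commands(lines, flags="-g -c"):
    """Build one gcc command per object file name (hypothetical helper)."""
    commands = []
    for line in lines:
        name = line.strip()                  # drop the trailing newline and stray spaces
        if not name:
            continue                         # skip blank lines
        stem, _ext = os.path.splitext(name)  # splits at the last dot only
        commands.append("gcc %s -o %s.o %s.c" % (flags, stem, stem))
    return commands


for cmd in compile_commands(["nfs4_xdr.o\n", "snoop.o\n"]):
    print(cmd)
```

Because splitext only looks at the last dot, a name such as a.b.o would not blow up the two-element unpacking the way split('.') does.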

I liked the ease of adding to the list of filenames. And in general, I found it easy to make a quick change and retest.

Could I have done this another way, say with the Makefile? Sure, but it wouldn't have been a learning experience. And off I go, the gdb prompt is calling me!


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Wednesday Oct 08, 2008

Another reader suggestion for the Python script

So Justin suggested:

try

template =  """"%(started)s - %(ended)s: %(title)s for      
%(company)s
    %(jobdesc)s"""


print template % row

I'm trying to capture what he put in, the line wrap on the first 'template' line is probably the comment system.

So I entered this, making a reformat tweak, as:


template = """"%(started)s - %(ended)s: %(title)s for %(company)s %(description)s"""

if __name__ == "__main__":
        for row in main(sys.argv[1],'!'):
                print template % row

And I got the wrong output. The error is of course mine:

> ./justin.py r4.txt
"1/05 - present: Staff Engineer Software for Sun Microsystems NFS development
"6/01 - 12/05: File System Engineer for Network Appliance WAFL and NFS development
"4/01 - 6/01: Manager for Network Appliance Manager of Engineering Internal Test
"10/99 - 4/01: System Administrator for Network Appliance Perl hacker and filer administrator

I shouldn't have joined the lines in the 'template' format. But why the extra '"'?

Ahh, Justin had one in his entry.

Okay, so the new changes work, here is the full program:

#!/usr/bin/env python

import csv, sys

def main(dfile,format,delimiter=","):
        db=open(dfile,'U')
        start=0
        for line in db:
                if line.startswith(format):
                        db.seek(start+len(format))
                        return csv.DictReader(db,delimiter=delimiter)
                else:
                        start+=len(line)+(len(db.newlines)==2) #windows hackery
        raise ValueError("There is no %s header line in %s" % (format,dfile))

template = """%(started)s - %(ended)s: %(title)s for %(company)s
        %(description)s"""

if __name__ == "__main__":
        for row in main(sys.argv[1],'!'):
                print template % row

And now I need to go figure out what Justin did...

Okay, the '%(name)s' has to be a formatting option. Can I duplicate it?

>>> for row in csv.DictReader(file("r4.txt")):
...     print "%(title)s" % row
...
Staff Engineer Software
File System Engineer
Manager
System Administrator

And now I do understand it - 3.6.2 String Formatting Operations:

A conversion specifier contains two or more characters and has the following components, which must occur in this order:

   1. The "%" character, which marks the start of the specifier.
   2. Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)). 

So, this code would be good for printing, but not necessarily for doing more complex manipulations.
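
To convince myself how the mapping key works outside of csv, here is a small sketch in Python 3 syntax. The row dictionary is made-up sample data shaped like one line of my flat file, and the 'unused' key is only there to show that extra keys are ignored:

```python
# A plain dict stands in for one row produced by csv.DictReader.
row = {
    "started": "1/05",
    "ended": "present",
    "title": "Staff Engineer Software",
    "company": "Sun Microsystems",
    "description": "NFS development",
    "unused": "ignored",   # keys the template never names are simply skipped
}

# Each %(key)s looks up 'key' in the mapping on the right of the % operator.
template = "%(started)s - %(ended)s: %(title)s for %(company)s\n\t%(description)s"

formatted = template % row
print(formatted)
```

A key that is named in the template but missing from the mapping raises KeyError, so the template and the header line have to agree.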

By the way, I'm having fun figuring this stuff out...



Tuesday Oct 07, 2008

Finally, the Python version of the old Perl script

I played about in the interactive Python shell trying to understand the data and how to tie it together. I learned about the difference between exec and eval for Python. I learned about capturing stdio and stdout for exec, but I couldn't figure out a way to automatically create variables in the proper scope in Python.

I even finally found a good quote on this at http://mail.python.org/pipermail/tutor/2005-January/035253.html:

> This is something I've been trying to figure out for some time.  Is
> there a way in Python to take a string [say something from a
> raw_input] and make that string a variable name?  I want to to this so
> that I can create class instances on-the-fly, using a user-entered
> string as the instance name.

This comes up regularly from beginners and is nearly always a bad
idea!

The easy solution is to use a dictionary to store the instances.

Nice to know I'm not the first to want to do this. But it did get me thinking, I have been calling this set of Perl scripts 'data dictionaries' for longer than I care to remember. And the code is not very legible at times. So, I decided to redo the script as:

#!/usr/bin/python

import sys

first_line = True

lang = []
iCounter = 0
for line in open(sys.argv[1]):
        line2 = line.lstrip()
        iCounter += 1

        if line2.startswith("!") or line2.startswith("#"):
                if first_line:
                        lang = line2[1:].split(",")
                        first_line = False
                continue
        splity = line2.split(",")
        dtemp = {}

        if len(splity) != len(lang):
                print "Error - args do not match header on line %d" % (iCounter)
                continue

        for i in range(len(splity)):
                dtemp[lang[i]] = splity[i]

        print "%s - %s: %s for %s\n\t%s\n" % (
                dtemp['started'],
                dtemp['ended'],
                dtemp['title'],
                dtemp['company'],
                dtemp['description'])

dtemp['started'] is more verbose than $started, but it is clearer how I am generating the data. And I have more error checking (which I have yet to sanity check :->).
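
As an aside, the index loop that fills dtemp can probably be collapsed with zip. This is only a sketch in Python 3 syntax; row_to_dict is a name I made up, and raising ValueError instead of printing an error is my assumption about how a caller would want to hear about a bad row:

```python
def row_to_dict(header, fields):
    """Pair each header name with the field in the same column (hypothetical helper)."""
    if len(header) != len(fields):
        raise ValueError("expected %d fields, got %d" % (len(header), len(fields)))
    return dict(zip(header, fields))


header = "!started,ended,title".lstrip("!").split(",")
record = row_to_dict(header, "1/05,present,Staff Engineer Software".split(","))
print(record["title"])
```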

Anyway, this fails and I knew why almost right off the bat:

> ./r3.py r2.txt
Traceback (most recent call last):
  File "./r3.py", line 33, in <module>
    dtemp['description'])
KeyError: 'description'

I was suspicious about that extra newline I mentioned way back in The simple version of the old perl script. I suspected that the entry line still had an extra one that I needed to remove. I.e., the data dictionary has a key for 'description\n' and not 'description'.

The following change proved that:

for line in open(sys.argv[1]):
        line1 = line.lstrip()
        line2 = line1.rstrip()
        iCounter += 1
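
The effect is easy to reproduce in isolation. A short sketch in Python 3 syntax, assuming a header line shaped like the one in my flat file:

```python
header_line = "!started,ended,title,company,description\n"

# Without stripping, the newline rides along on the last column name.
raw_keys = header_line[1:].split(",")

# Stripping first gives the key the print statement actually asks for.
clean_keys = header_line.strip()[1:].split(",")

print(repr(raw_keys[-1]))
print(repr(clean_keys[-1]))
```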

And some quick sanity checking of removing a column in one row and adding one in another row shows that my error checking works:

> ./r3.py r3.txt
Error - args do not match header on line 2
Error - args do not match header on line 3
4/01 - 6/01: Manager for Network Appliance
        Manager of Engineering Internal Test

10/99 - 4/01: System Administrator for Network Appliance
        Perl hacker and filer administrator

So I learned what I set out to do. I may never use this script, but it helped me learn some things the hard way. I didn't show all of the little syntax errors I had to fix (forgetting the ':', not indenting in the interactive shell, etc). But hopefully, I'll remember them.

I'll also claim that the script does meet my needs as did the old one. If I add a new field to the flat file, I won't have to change the script to get the current output! And yes, I just tried that and I didn't have a problem.

I could do some more error checking (i.e., don't access an entry unless it is set), but I've already gone above the error checking in the Perl script.
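
If I ever do add that check, dict.get looks like the least painful way. A minimal sketch in Python 3 syntax; format_entry and the "(not set)" placeholder are my own inventions for illustration:

```python
def format_entry(entry, missing="(not set)"):
    """Format one record, substituting a placeholder for absent keys."""
    fields = ("started", "ended", "title", "company", "description")
    # dict.get returns the default instead of raising KeyError.
    values = tuple(entry.get(name, missing) for name in fields)
    return "%s - %s: %s for %s\n\t%s\n" % values


partial = {"started": "4/01", "ended": "6/01", "title": "Manager",
           "company": "Network Appliance"}
print(format_entry(partial))
```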

Final Copy

#!/usr/bin/python

import sys

first_line = True

lang = []
iCounter = 0
for line in open(sys.argv[1]):
        line1 = line.lstrip()
        line2 = line1.rstrip()
        iCounter += 1

        if line2.startswith("!") or line2.startswith("#"):
                if first_line:
                        lang = line2[1:].split(",")
                        first_line = False
                continue
        splity = line2.split(",")
        dtemp = {}

        if len(splity) != len(lang):
                print "Error - args do not match header on line %d" % (iCounter)
                continue

        for i in range(len(splity)):
                dtemp[lang[i]] = splity[i]

        print "%s - %s: %s for %s\n\t%s\n" % (
                dtemp['started'],
                dtemp['ended'],
                dtemp['title'],
                dtemp['company'],
                dtemp['description'])


Python strings are Immutable

So I added some code to my simple script that wasn't in the Perl:

for line in open(sys.argv[1]):
        line.lstrip()

And my intent was to strip out all of the leading spaces. I didn't have to, but I created a simple test case with the first line pushed over by a tab and the second line pushed over by 8 spaces. The first one worked correctly and the second did not:

Update: I wasn't thinking correctly here. I knew I had two bad lines and I didn't know why. After solving the coding problem, I can see that both lines of input failed. The header line is being treated here as if it were a normal line and being processed.

> ./simple2.py r2.txt
        !started - ended: title for company
        description



     1/05 - present: Staff Engineer Software for Sun Microsystems
        NFS development

Well d'oh, even in Perl I just told it to strip out one character. I've got to tell it that while there is whitespace, strip it out:

for line in open(sys.argv[1]):
        while line.isspace(): line.lstrip()

And this doesn't work either. At which point I realize it must be because strings are immutable, right? I mean it is never changing! Note I get to the right conclusion, but for the wrong reasons. If it were immutable and the string had whitespace at the start, I should be stuck in an endless loop here. See the ending section for that analysis.

It also points out that I never did anything with that line.lstrip(). It never changes line, but does create a reference to a new string. Which we can see here:

>>> st2 = "     This is the radio clash!"
>>> print st2.lstrip()
This is the radio clash!
>>> print st2
     This is the radio clash!
>>>

See, st2.lstrip() actually works!
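
The same point as a script rather than a shell session, in Python 3 syntax. Nothing here is new; it only spells out that the method returns a new string and that I have to rebind a name to keep it:

```python
original = "     This is the radio clash!"

# lstrip() returns a new string; 'original' is left untouched.
stripped = original.lstrip()

# Calling it and discarding the result, as my first script did, changes nothing.
original.lstrip()

# To "change" a string, rebind the name to the returned value.
line = original
line = line.lstrip()

print(repr(original))
print(repr(line))
```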

I've fixed up the script (in a boring way) and it works:

for line in open(sys.argv[1]):
        line2 = line.lstrip()
        if line2.startswith("!") or line2.startswith("#"): continue
        print "%s - %s: %s for %s\n\t%s\n\n" % tuple(line2.split(","))

Another mistake I just made

Okay, to try to understand this, I did the following in the shell:

>>> st2 = "     This is the radio clash!"
>>> while st2.isspace():
...     print st2
...     st2.lstrip()
...

Which should be an endless loop according to what I know now. But nothing gets done. Which means that st2.isspace() is FALSE. And a help(st2.isspace) shows that:

Help on built-in function isspace: isspace(...) S.isspace() -> bool Return True if all characters in S are whitespace and there is at least one character in S, False otherwise.

I.e., my misunderstanding of st2.lstrip() being immutable made me think that st2.isspace() worked on the first character of the string. Actually, I made a bad assumption based on what I thought C would do. My bad.

So I don't ever want to do that while loop on a string which is really all whitespace.

All the reading in the world about Python strings will not help me understand the immutability of them as much as this simple example.
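
For the record, here is a sketch in Python 3 syntax of the loop I thought I was writing, one that tests only the first character. The function name strip_leading is mine, and in real code lstrip() already does this:

```python
st2 = "     This is the radio clash!"

whole_is_space = st2.isspace()      # tests every character in the string
first_is_space = st2[:1].isspace()  # tests only the first character


def strip_leading(text):
    """Remove leading whitespace one character at a time (illustration only)."""
    # text[:1] is "" for an empty string, and "".isspace() is False,
    # so the loop also terminates on an all-whitespace input.
    while text[:1].isspace():
        text = text[1:]             # rebind: strings are never modified in place
    return text


print(whole_is_space, first_is_space)
print(repr(strip_leading(st2)))
```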



The simple version of the old perl script

My first pass at a Python version of An old perl script reveals my inner C programmer. I've restricted the program to the simple version, which does not generate the column names as local variables. First I want to get my proof of concept correct:

#!/usr/bin/python

for line in open("resume.txt"):
        line.lstrip()
        if line.startswith("!") or line.startswith("#"): continue
        (started, ended, title, company, description) = line.split(",")
        print "%s - %s: %s for %s\n\t%s\n\n", started, ended, title, company, description

It looks like it will reformat, but I've messed up the print statement:

> ./simple_1.py
%s - %s: %s for %s
        %s

1/05 present Staff Engineer Software Sun Microsystems NFS development

%s - %s: %s for %s
        %s

6/01 12/05 File System Engineer Network Appliance WAFL and NFS development

%s - %s: %s for %s
        %s

4/01 6/01 Manager Network Appliance Manager of Engineering Internal Test

%s - %s: %s for %s
        %s

10/99 4/01 System Administrator Network Appliance Perl hacker and filer administrator

I.e., I treated print like a C printf. Okay, I can try again with this one:

#!/usr/bin/python

for line in open("resume.txt"):
        line.lstrip()
        if line.startswith("!") or line.startswith("#"): continue
        (started, ended, title, company, description) = line.split(",")
        print "%s - %s: %s for %s\n\t%s\n\n" % (started, ended, title, company, description)

And get more of what I want to see:

> ./simple.py
1/05 - present: Staff Engineer Software for Sun Microsystems
        NFS development



6/01 - 12/05: File System Engineer for Network Appliance
        WAFL and NFS development



4/01 - 6/01: Manager for Network Appliance
        Manager of Engineering Internal Test



10/99 - 4/01: System Administrator for Network Appliance
        Perl hacker and filer administrator

I'm getting an extra line I don't want and I have to hard code the file to process. I can easily fix these both up:

#!/usr/bin/python

import sys

for line in open(sys.argv[1]):
        line.lstrip()
        if line.startswith("!") or line.startswith("#"): continue
        (started, ended, title, company, description) = line.split(",")
        print "%s - %s: %s for %s\n\t%s\n" % (started, ended, title, company, description)

Feeling adventurous

Okay, with this simple example, I could get rid of the names in Perl and make it really simple. Can I do so in Python?

#!/usr/bin/python

import sys

for line in open(sys.argv[1]):
        line.lstrip()
        if line.startswith("!") or line.startswith("#"): continue
        print "%s - %s: %s for %s\n\t%s\n\n" % line.split(",")

No, not as I have tried:

> ./simple2.py resume.txt
Traceback (most recent call last):
  File "./simple2.py", line 8, in <module>
    print "%s - %s: %s for %s\n\t%s\n\n" % line.split(",")
TypeError: not enough arguments for format string

I've got a type error, hmm, I'm going to try this by hand:

> python
>>> st1 = "This is the radio clash!"
>>> st1.split()
['This', 'is', 'the', 'radio', 'clash!']
>>>

So I have a '[]' instead of a '()'. What does that mean? It means I have a list versus a tuple. And I find a converter called strangely enough, tuple:

        print "%s - %s: %s for %s\n\t%s\n\n" % tuple(line.split(","))

And that works.
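
Here is the list versus tuple difference boiled down, in Python 3 syntax with made-up sample fields. As far as I can tell, the % operator treats a list as one single value, which is why a multi-slot format runs out of arguments:

```python
fields = "1/05,present,Staff Engineer Software".split(",")
fmt = "%s - %s: %s"

# A list counts as ONE argument, so three slots cannot be filled.
try:
    fmt % fields
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# A tuple is unpacked, one element per slot.
formatted = fmt % tuple(fields)

# With a single slot the whole list is formatted as one value.
single = "%s" % fields

print(list_error)
print(formatted)
print(single)
```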



Monday Oct 06, 2008

Trying to figure out printing and variables in Python

I'm pretty used to referencing variables inside print blocks in Perl. I'm not at all comfortable doing that in Python. I have a block of code in which I want the 'onhg' to come out of a config file. So I set up a scratch directory and make a bare bones implementation:

[th199096@jhereg etc]> ls -laiR ~/scratch/
/home/th199096/scratch/:
total 42
       749 drwxr-xr-x   3 th199096 staff          4 Oct  6 16:36 .
         3 drwxr-xr-x  39 th199096 staff         55 Oct  6 16:35 ..
       752 drwxr-xr-x   2 th199096 staff          6 Oct  6 16:42 etc
       750 -rwxr-xr-x   1 th199096 staff       1297 Oct  6 16:42 updateoso.py

/home/th199096/scratch/etc:
total 25
       752 drwxr-xr-x   2 th199096 staff          6 Oct  6 16:42 .
       749 drwxr-xr-x   3 th199096 staff          4 Oct  6 16:36 ..
       753 -rw-r--r--   1 th199096 staff       1052 Oct  6 16:40 __init__.py
       754 -rw-r--r--   1 th199096 staff        243 Oct  6 16:41 __init__.pyc
       751 -rwxr-xr-x   1 th199096 staff         94 Oct  6 16:42 config.py
       756 -rw-r--r--   1 th199096 staff        257 Oct  6 16:42 config.pyc

Where the config file simply has:

GATE_USER = "onhg"
GATE_GROUP = "gk"

OSOREPO = "ssh://hg.opensolaris.org/hg/onnv/onnv-gate"

And the updateoso has:

import os, pwd, subprocess, sys

from mercurial import hg, repo
from mercurial.node import hex

sys.path.insert(1,
    os.path.realpath(os.path.join(os.path.dirname(__file__), "..")))
from etc import config

__USAGE = """
updateoso.py [-n] <-R repo root>
    -n: dry run, no email sent (displayed on stdout)
    -R: root dir of repo (where .hg is)

Attempt to send changes to 
%s

This script must be run as user "onhg".
You should set up RBAC and use pfexec(1).
""" % (config.OSOREPO)
__USAGE = __USAGE.strip()

print >> sys.stderr, __USAGE

Well, in isolation, I can already see what I am going to have to do. All I need to do is replace the 'onhg' with a %s and add a second argument:

[th199096@jhereg ~/scratch]> diff updateoso.py updateoso.py.first 
39c39
< This script must be run as user "%s".
---
> This script must be run as user "ohng".
41c41
< """ % (config.OSOREPO, config.GATE_USER)
---
> """ % (config.OSOREPO)

And we get:

[th199096@jhereg ~/scratch]> ./updateoso.py
updateoso.py [-n] <-R repo root>
    -n: dry run, no email sent (displayed on stdout)
    -R: root dir of repo (where .hg is)

Attempt to send changes to 
ssh://hg.opensolaris.org/hg/onnv/onnv-gate

This script must be run as user "onhg".
You should set up RBAC and use pfexec(1).

I ought to be able to test this inside the interactive shell:

[th199096@jhereg ~/scratch]> python
Python 2.4.4 (#1, Aug 25 2008, 03:30:42) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import updateoso
updateoso.py [-n] <-R repo root>
    -n: dry run, no email sent (displayed on stdout)
    -R: root dir of repo (where .hg is)

Attempt to send changes to 
ssh://hg.opensolaris.org/hg/onnv/onnv-gate

This script must be run as user "onhg".
You should set up RBAC and use pfexec(1).
>>> config.GATE_USER = "duke"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'config' is not defined

Okay, I should have known that wasn't going to work. It would probably work in the code (we'll see later), but for now this will work:

>>> updateoso.config.GATE_USER = "duke"
>>> reload(updateoso)
updateoso.py [-n] <-R repo root>
    -n: dry run, no email sent (displayed on stdout)
    -R: root dir of repo (where .hg is)

Attempt to send changes to 
ssh://hg.opensolaris.org/hg/onnv/onnv-gate

This script must be run as user "duke".
You should set up RBAC and use pfexec(1).
<module 'updateoso' from 'updateoso.pyc'>

To be honest, I knew the reference would work, but I expected it to be reset. In retrospect, I can see that I reloaded updateoso but not etc/config, so the modified value was still cached. Just something to get used to. I could force it to 'reset' via:

>>> reload(etc/config)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'etc' is not defined
>>> from etc reload(config)
  File "<stdin>", line 1
    from etc reload(config)
                  ^
SyntaxError: invalid syntax
>>> reload(updateoso.config)
<module 'etc.config' from 'etc/config.pyc'>
>>> reload(updateoso)
updateoso.py [-n] <-R repo root>
    -n: dry run, no email sent (displayed on stdout)
    -R: root dir of repo (where .hg is)

Attempt to send changes to 
ssh://hg.opensolaris.org/hg/onnv/onnv-gate

This script must be run as user "onhg".
You should set up RBAC and use pfexec(1).
<module 'updateoso' from 'updateoso.pyc'>

Took me a bit to figure out the syntax.
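
The whole experiment can be scripted. This is a sketch in Python 3 syntax, where reload lives in importlib; demo_config and demo_usage are throwaway stand-ins I made up for etc/config.py and updateoso.py, written to a temporary directory so nothing real is touched:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-ins for etc/config.py and updateoso.py.
CONFIG_SOURCE = 'GATE_USER = "onhg"\n'
USAGE_SOURCE = (
    "import demo_config as config\n"
    "USAGE = 'This script must be run as user \"%s\".' % config.GATE_USER\n"
)


def reload_demo():
    """Return the usage text seen at each step of the experiment."""
    seen = []
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_config.py"), "w") as handle:
            handle.write(CONFIG_SOURCE)
        with open(os.path.join(tmp, "demo_usage.py"), "w") as handle:
            handle.write(USAGE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            usage = importlib.import_module("demo_usage")
            seen.append(usage.USAGE)          # value from the config file

            usage.config.GATE_USER = "duke"   # patch the cached config module
            importlib.reload(usage)           # re-runs demo_usage only
            seen.append(usage.USAGE)          # the patched value survives

            importlib.reload(usage.config)    # re-runs demo_config, restoring the default
            importlib.reload(usage)
            seen.append(usage.USAGE)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_usage", None)
            sys.modules.pop("demo_config", None)
    return seen


for text in reload_demo():
    print(text)
```

Reloading the outer module re-executes its import statement, but that import just finds the already loaded config module, patched value and all.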

Okay, can I see the change from the script:

[th199096@jhereg ~/scratch]> diff updateoso.py updateoso.py.second 
45,48d44
< 
< config.GATE_USER = "gark"
< print >> sys.stderr, __USAGE
< 

I don't expect this to work. And it doesn't.

This script must be run as user "onhg".
...
This script must be run as user "onhg".
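
The reason is that the % substitution happens once, when the string is built, and the result is an ordinary string with no memory of where its pieces came from. A small sketch in Python 3 syntax; the SimpleNamespace object is just a stand-in for the config module, and the usage() function is a hypothetical alternative, not something in the real script:

```python
from types import SimpleNamespace

# Stand-in for the config module.
config = SimpleNamespace(GATE_USER="onhg")

# The substitution happens right here, once.
USAGE = 'This script must be run as user "%s".' % config.GATE_USER

config.GATE_USER = "gark"
stale = USAGE                     # still the text built before the change


def usage():
    """Build the text on demand so it reflects the current config."""
    return 'This script must be run as user "%s".' % config.GATE_USER


fresh = usage()
print(stale)
print(fresh)
```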

How about a test driver script?

[th199096@jhereg ~/scratch]> more test.py 
import updateoso

print "Now change the user"

updateoso.config.GATE_USER = "nark"

reload(updateoso)

And that works:

This script must be run as user "onhg".
...
Now change the user
...
This script must be run as user "nark".


Sunday Oct 05, 2008

Learning a new language - python

So I decided to learn python - why? because it is used by Mercurial. And there is at least one of the gatekeeping scripts which I needed to hack for the nfs41-gate.

I bought Learning Python, Third Edition by Mark Lutz because the local Borders did not have Programming Python, Third Edition. From some reviews I read, it would probably have been a better fit for me.

I know I can find most of what I want on the net, but I wanted a printed resource.

Anyway, I had a question right off the bat: if file a imports modules b and c, what happens if c also imports b? Deeper in the book than I've read so far, it does state that an import loads the file only if it is not already loaded. But just reading that doesn't help me learn the language. :->

The following example is quite simple, but effective in answering the question for me:

a.py

> cat a.py
#!/usr/bin/python

title = "This is the file a.py!"

print title

print "importing b from a"
import b

print "importing c from a"
import c

b.py

> cat b.py
#!/usr/bin/python

title = "This is the file b.py!"

print title

c.py

> cat c.py
#!/usr/bin/python

import b

title = "This is the file c.py!"

print title

Test 1

> ./a.py
This is the file a.py!
importing b from a
This is the file b.py!
importing c from a
This is the file c.py!

So we see that if a has loaded it, then c will not. How about the other way?

a2.py

> cat a2.py
#!/usr/bin/python

title = "This is the file a.py!"

print title

print "importing c from a"
import c

print "importing b from a"
import b

print "and b's title is"
print b.title

Test 2

> ./a2.py
This is the file a.py!
importing c from a
This is the file b.py!
This is the file c.py!
importing b from a
and b's title is
This is the file b.py!

We see that c loads b and that b's attributes are visible from a.
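
The bookkeeping behind this is the sys.modules dictionary. A quick sketch in Python 3 syntax, using the standard json module only because it is always available; any module would do:

```python
import sys

import json                      # first import: loads the module and caches it

first = sys.modules["json"]      # the cache entry created by that import

import json as again             # second import: just a lookup in sys.modules

same = again is first            # both names refer to one module object
print(same)
```

So whichever file asks first pays for the load, and everyone after that gets the same module object.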

What would really help me here is if b could state the call stack of what is importing it.

Test 3

A simple change fails:

> cat b.py
#!/usr/bin/python

title = "This is the file b.py!"

print title

print "Called from", __file__

but it does show the effect of byte code compilation:

> ./a.py
This is the file a.py!
importing b from a
This is the file b.py!
Called from /home/tdh/python/b.py
importing c from a
This is the file c.py!
> ./a2.py
This is the file a.py!
importing c from a
This is the file b.py!
Called from /home/tdh/python/b.pyc
This is the file c.py!
importing b from a
and b's title is
This is the file b.py!

I can see the "nesting" if I pop into an interactive session:

> python
>>> import a2
This is the file a.py!
importing c from a
This is the file b.py!
Called from b.pyc
This is the file c.py!
importing b from a
and b's title is
This is the file b.py!
>>> a2.__dict__.keys()
['c', 'b', 'title', '__builtins__', '__file__', '__name__', '__doc__']
>>> a2.c.__dict__.keys()
['b', 'title', '__builtins__', '__file__', '__name__', '__doc__']
>>> a2.c.b.__dict__.keys()
['__builtins__', '__name__', '__file__', '__doc__', 'title']

But this doesn't answer my question of how to figure this out recursively. I.e., I guess I am looking for a parent "pointer" and I could walk it to get my answer.
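
One way to get something like a parent pointer is to walk the interpreter's frame stack while the module is being loaded. The sketch below is in Python 3 syntax and leans on sys._getframe, which is a CPython detail, so treat it as an experiment rather than a supported technique. The modules chain_b and chain_c are throwaway stand-ins for my b.py and c.py, and skipping frames whose filename starts with "<frozen importlib" is my assumption about how the import machinery shows up on the stack:

```python
import os
import sys
import tempfile

# Hypothetical module source: chain_b records who is on the stack while it loads.
B_SOURCE = """\
import sys

importers = []
frame = sys._getframe(1)          # caller of this module's top-level code
while frame is not None:
    filename = frame.f_code.co_filename
    # Skip the import machinery's own frames.
    if not filename.startswith("<frozen importlib"):
        importers.append(filename)
    frame = frame.f_back
"""
C_SOURCE = "import chain_b\n"


def find_importers():
    """Import chain_c and return the file names chain_b saw above itself."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in (("chain_b", B_SOURCE), ("chain_c", C_SOURCE)):
            with open(os.path.join(tmp, name + ".py"), "w") as handle:
                handle.write(source)
        sys.path.insert(0, tmp)
        try:
            import chain_c        # loads chain_b as a side effect
            recorded = sys.modules["chain_b"].importers
            return [os.path.basename(path) for path in recorded]
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("chain_b", None)
            sys.modules.pop("chain_c", None)


print(find_importers()[0])
```

The first entry is the file whose import statement actually triggered the load; later entries are whatever code was further up the stack at the time.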

But I've still learned more than just reading the book linearly.

