By templedf on Aug 17, 2005
I'm still working on my new test for the Grid Engine test suite. The test is supposed to confirm that we're using backslashes correctly for continuation of long lines, both in user input and in tool output. As part of the test, I read in output from a tool, like this:
name1 NONE
name2 value1,value2, \
      value3
name3 NONE
and then use the following line of Tcl to extract the values from only name2:
set tokens [lrange [split [join [lrange [split $output "\n"] 1 end-1]] ", "] 1 end]
That line takes the output from the tool (stored in the variable output), splits it on newlines, removes the first and last entries (the name1 and name3 lines), reassembles the output, splits it on commas and spaces, removes the first entry (the word name2), and stores the resulting list in the variable tokens. The result should be a list that contains:
value1 value2 \ value3
I can then walk through that list and make sure the values and the backslashes appear where I expect them. In theory.
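To make the nesting easier to follow, here is a rough sketch of the same one-liner unrolled into separate steps. The sample string and the indentation of its continuation line are assumptions made for illustration, and the final loop that discards empty elements is an addition that the original line does not have.

```tcl
# Assumed sample of the tool output; the continuation indent is a guess.
set output "name1 NONE\nname2 value1,value2, \\\n      value3\nname3 NONE"

set lines  [split $output "\n"]      ;# one list element per line
set middle [lrange $lines 1 end-1]   ;# drop the name1 and name3 lines
set joined [join $middle]            ;# glue the lines back together with spaces
set fields [split $joined ", "]      ;# split on every comma and every space
set tokens [lrange $fields 1 end]    ;# drop the leading word name2

# split does not merge adjacent separators, so ", " and runs of
# spaces leave empty elements behind; filter them out here.
set clean {}
foreach token $tokens {
    if {$token ne ""} { lappend clean $token }
}

# Print one token per line, bracketed so stray whitespace is visible.
foreach token $clean {
    puts "<$token>"
}
```

With this clean sample the backslash comes out as a token of length one, which is the result the test expects.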
As I was working on this test, I had a strange problem. I added debug output that would print the value of output before being parsed and the value of tokens after being parsed. The pre-parse output was fine, but the post-parse output was completely broken. Words were spliced together. Parts were missing. Things were out of order. In short, it looked like a nasty memory overrun. The problem is that Tcl doesn't have pointers.
On the suggestion of my office mate, Stephan, I tried a different Tcl interpreter. Same result. Obviously, if it was a pointer problem, it was inherent in the language.
I dug a little deeper and discovered that the strange output problems didn't start until I encountered a token that was a backslash. Sounded fishy. I figured that Tcl must somehow be interpreting the backslash and combining it with the following character, to result in something that causes pointer problems for the interpreter. I quoted all of the occurrences of the token variable and made sure that each was followed by an innocuous character. No difference.
I dug deeper. I decided to print out the lengths of the tokens. Sure enough, the backslash token was two characters long, not one. I hunted through the Tcl man pages until I found the scan and format commands, which allowed me to read in the individual characters of the token as decimal numbers and print them out as hexadecimal integers. Guess what it printed! "5c 0a". A quick glance at LookupTables.com confirmed my suspicion.
5c is the code for backslash.
0a is the code for line feed, the newline character! That's why my output was so jumbled!
I still don't know why there are stray newlines in my tokens. I used od(1) to confirm that they're not in the original output from the tool. It may be a quirk of our test suite or a quirk of Tcl. Either way, I spent several hours banging my head against my desk over some newlines that weren't supposed to be there. God, I love programming!
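For anyone who wants to repeat the character-by-character check described above, a minimal sketch with scan and format might look like this. The token value is hypothetical, built by hand to match what the debugging turned up, and the variable names are made up for the example.

```tcl
# Hypothetical token: a backslash followed by a line feed.
set token "\\\n"

puts "length: [string length $token]"

set codes {}
foreach char [split $token ""] {
    scan $char %c code                  ;# character -> decimal code
    lappend codes [format %02x $code]   ;# decimal code -> two hex digits
}
puts $codes
```

Splitting on the empty string yields one element per character, and the %c conversion in scan does not skip whitespace, so the trailing line feed is reported instead of being silently dropped.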