Python tips - Handling text files

There are two text files, each with 10 million lines and about 100 MB in size. We need to know how many lines the two files have in common, in other words, the number of lines that exist in both files at the same time. The lines within each file are unique, so neither file contains duplicate rows. A Python set can do this very easily, and more efficiently than shell or awk.
#!/usr/bin/python
# Load each file into a set of lines, then count the lines present in both sets.
a = set(open("data.uniq.1"))
b = set(open("date.uniq.2"))
print(len(a & b))
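If holding both files in memory at once is a concern, one possible variant is to load only the first file into a set and stream the second. This is a minimal sketch, not the original script: the function name `count_common_lines` is hypothetical, and it relies on the assumption stated above that neither file contains duplicate lines. It also strips line endings, so a last line without a trailing newline can still match.

```python
def count_common_lines(path_a, path_b):
    """Count the lines that occur in both files.

    Assumes each file has no duplicate lines, as described in the post.
    The name count_common_lines is hypothetical, chosen for illustration.
    """
    with open(path_a) as file_a:
        # Keep only the first file in memory, with line endings removed.
        seen = set(line.rstrip("\n") for line in file_a)
    with open(path_b) as file_b:
        # Stream the second file and count the lines already seen.
        return sum(1 for line in file_b if line.rstrip("\n") in seen)
```

Calling `count_common_lines("data.uniq.1", "date.uniq.2")` should give the same count as the set intersection above when every line ends with a newline, while keeping only one set in memory.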
I also found a blog post in Chinese that describes this tip.
Comments:

I don't believe that Python is faster than AWK, and even if it were, which I *know* he isn't, I can always compile AWK code into a straight binary executable, so as long as Python doesn't get a state of the art compiler, he will *NEVER* be faster than AWK!

Posted by UX-admin on March 10, 2009 at 07:19 PM CST #

Thanks, because it works, but
how can I print the number of times each word appears in each line, taking file a as an example?
Please email me your idea.
Thanks.

Posted by ojulla julius on March 15, 2010 at 06:40 AM CST #

About

williamxue
