C++ STL: still paying for what I don't use.

One of the great tenets of C++ has always been: "you don't pay for what you don't use". Bjarne seems to use this very line every time he bashes Java or LISP-like languages. The latter is especially noteworthy, since he is usually pretty honest in acknowledging that functional languages do possess a number of important qualities, such as the expressive power to write qsort as a trivial two-liner
 qsort []     = []
 qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
at the expense of wasting quite a lot of precious resources such as RAM and CPU cycles.

Now, from time to time I see folks showing examples of C++ code bordering on the same expressive power. Take this little word counter ($ wc -w) for example:
#include <iostream>
#include <iterator>
#include <string>

int main() {
   std::cout << std::distance(std::istream_iterator<std::string>(std::cin),
                              std::istream_iterator<std::string>());
   return 0;
}
It looks impressive if nothing else, and since it is, after all, C++ everybody expects it to perform quite well. But does it?

To answer this question without dragging the reader into the dark realms of assembly language or the black art of performance measurement, I would really love to have a good old Cfront around. Or any other tool, for that matter, that would be able to retrace what exactly all the templates and overloaded functions got expanded into. Alas, I don't know of any such tool (if you do -- please leave a comment!). So bear with me while I use my stopwatch ;-)

For the speed trial let's compare it to similar code written in C (and to make it fair I am even going to use scanf instead of handcrafted code):
#include <stdio.h>

int main()
{
    int count;
    char buf[65535];
    for (count = 0; scanf("%s", buf) != EOF; count++)
         ;
    return printf("count %d\n", count);
}
Not that I am surprised, but the C++ version ended up being 1.5 times slower on my machine. And if you compile the above example into a .s file and look at what main() turned out to be, you can see the reason why: there are about six function calls in there. Pretty much nothing got inlined or computed in place.

Bad implementation of a fine idea? Perhaps (I tried two: G++ and Sun Studio). But it makes one wonder why in 28 years the world hasn't yet seen a good implementation. It's not that the industry hasn't tried, you know.

Am I hearing the ghostly murmur of Algol 68 or is it just my imagination?
Comments:

After 28 years, you ought to move on. What does it matter to non-C++ programmers what Bjarne has to say about languages other than C++?

Posted by guest on June 16, 2007 at 06:19 PM PDT #

I hear you. But move on to what? I'm really comfortable with C, but I have a nagging feeling that there might be something beyond it as well.

Posted by Roman Shaposhnik on June 19, 2007 at 04:18 AM PDT #

I'm very tempted to have a line that is 65536 characters long.

Posted by Sohail on July 06, 2007 at 05:00 PM PDT #

To: Sohail
That's OK, you can have a string that is 65536 characters long. What you cannot have is a word that is 65536 characters long (and that would be quite a reasonable assumption if you ask me). But your comment brings me to another problem with C++. You see, with C++ code you cannot really tell whether something like words longer than X characters is allowed or not. And I bet you wouldn't be able to figure it out by reading the C++ standard. At least not quickly enough. With C -- it is all out there for you.

Posted by Roman Shaposhnik on July 07, 2007 at 05:04 AM PDT #

I was thinking more along the lines of typical C code with malicious input = the world as it is today. It's OK to expect your word to not be 65536 characters long, but if I want to break your code, I can make one of those :) I like the simplicity of C, but one thing I cannot fathom is why it is normal not to use the "n" equivalent of the bugs-gonna-happen functions (sprintf vs snprintf). Also not using scanf with a field width. That and fixed-size buffers. That kills me. Disclaimer: I write C++ code for a living with a little bit of C thrown in for compiler-friendly library interfaces.

Posted by Sohail on July 07, 2007 at 05:15 AM PDT #

To: Sohail
I disagree. It is not that difficult to write secure code in C. It is more a question of practice, I guess. Curiously enough, modern C++ (think STL and template metaprogramming in general) actually makes it more difficult to analyze the source code (you still haven't answered my question about what kind of restrictions the C++ version might have ;-)). In fact Bjarne acknowledges this very problem in his latest article on the past, present and future of C++: the language complexity makes it quite difficult to write comprehensive tools analyzing code for you (he doesn't acknowledge the human side of the problem though, which to me is even more ominous).
I guess I don't buy the security argument (buffers and 'n' functions) these days for one simple reason: the majority of the most critical pieces of software, such as OS kernels and security/crypto libraries, are written in C.
Now, one point I agree with you on is that the classical C library seems to be showing its age in the 21st century. I don't perceive it as too much of a problem though -- you can always stick with basics for control I/O and write your own wrappers. After all, even with C++ everybody seems to write their own wrappers anyway.
Disclaimer: I started at Sun ~10 years ago as an engineer developing a C++ front end (now known as Sun Studio). Now I do have a sausage problem with C++ ;-)

Posted by Roman Shaposhnik on July 07, 2007 at 05:48 AM PDT #

I didn't mean to say it's not possible to write secure code in C, just that the average C programmer doesn't know how.
As far as the restrictions in the C++ sample go, I agree with your assertion that it's difficult to be sure. However, you can be sure that you are limited by std::numeric_limits<ptrdiff_t>::max() :-) I also agree with you about the complexity of the language being a problem.
Now as far as the crypto/kernels being written in C: by the very nature of the project, you are NOT going to put an average programmer on the job. So you don't get silly things like buffer overruns and everything else no matter what language you write it in.
To sum up my pov: I am focused on making things correct first and fast second. Thirdly, it is usually the design that makes a thing fast or slow. So in my mind, if you have a bottleneck due to a misunderstanding of how C++ works, it's not a big deal and you can fix it.
10 years eh, that's a long time!

Posted by Sohail on July 07, 2007 at 06:35 AM PDT #

To: Sohail
Well put. Personally, I wish C++ stopped at being C with classes plus some syntax sugaring. But! They decided to swing for two languages in one (for which they finally are supposed to pay the full price with concepts). I am, however, very curious to find out whether given your set of priorities (making things correct first and fast second) C++ is really an ideal choice. Wouldn't Java or something like Haskell (in case you're into functional languages) be a better fit? If not, why?

Thanks,
Roman.

Posted by Roman Shaposhnik on July 07, 2007 at 07:20 AM PDT #

They are both still requirements that need to be satisfied most of the time. It is pretty useless to have a fast program that doesn't work! So, if the program works, and it's fast enough, I'm pretty much done. A lot of times the choice comes down to Python or C++, but I find that these are complementary choices. I may write computationally intensive code in C++, but then I'll export it to Python as an extension and write a bunch of tests there. In an ideal world, I would use Common Lisp, but that's another conversation!

Posted by Sohail on July 07, 2007 at 05:00 PM PDT #

hello,
a little note: to get the best performance from the c++ i/o system your program
must call sync_with_stdio() before performing any i/o operation.
on my system the version with sync_with_stdio(false) is faster than the
version with sync_with_stdio(true) (the default behavior).
obviously the c version is faster too ;-)

#include <ios>
#include <iterator>
#include <string>
#include <fstream>
#include <iostream>

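int main(int argc, char **argv) {
    // Decouple C++ streams from C stdio before any I/O happens.
    std::ios_base::sync_with_stdio(false);

    // Read from the named file if one is given, otherwise from stdin.
    std::ifstream file;
    if (argc > 1)
        file.open(argv[1]);
    std::istream &in = (argc > 1) ? static_cast<std::istream &>(file) : std::cin;

    std::cout << std::distance(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>()
    ) << std::endl;

    return 0;
}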

Posted by lorenzo on October 23, 2007 at 08:11 PM PDT #

Sohail, you will also get memory leaks with that code, run that baby through valgrind

Posted by jn on November 20, 2008 at 12:06 AM PST #
