Maps in the STL

I was looking at some code with a colleague, and we observed that a lot of time was being spent in code that used std::map to set up mappings between strings. The source code looked rather like the following:

#include <map>
#include <string>

using namespace std;

int func(map<string,string>&mymap, string &s1, string &s2)
{
  mymap.insert(pair<string,string>(s1,s2));
  return 0;
}

When compiled with CC -O -c -library=stlport4 map.cc, this expands into a horrendous set of calls; here are the first few:

$ er_src -dis func map.o|grep call 
        [?]     188d:  call    std::basic_string...::basic_string
        [?]     189f:  call    std::basic_string...::basic_string
        [?]     18b2:  call    std::basic_string...::basic_string
        [?]     18c2:  call    std::basic_string...::basic_string
        [?]     18d8:  call    std::_Rb_tree...::insert_unique
        [?]     18f8:  call    std::__node_alloc...::_M_deallocate
        [?]     190c:  call std::_STLP_alloc_proxy...::~_STLP_alloc_proxy
                ...

What's happening is that constructing the pair object makes copies of the two strings passed into the pair constructor. The pair object is then passed into the insert method of std::map, which results in two more copies of the strings being made. There's a bunch of other stuff going on as well, and the resulting code is a mess.
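
To make the copying visible, here's a small sketch (not from the original post, and not what the compiler literally generates) that wraps the strings in a type that counts copy constructions; the exact count depends on the STL implementation and language standard:

#include <cstdio>
#include <map>
#include <string>

// Hypothetical instrumented wrapper: counts copy constructions so the
// copies described above can be observed directly.
struct Counted
{
  static int copies;
  std::string s;
  Counted(const char *p) : s(p) {}
  Counted(const Counted &other) : s(other.s) { ++copies; }
  bool operator<(const Counted &other) const { return s < other.s; }
};
int Counted::copies = 0;

int main()
{
  std::map<Counted,Counted> mymap;
  Counted s1("123456789"), s2("987654321");

  // One copy of each string goes into the pair, and further copies are made
  // when insert() places the data into the map's internal node (how many
  // depends on the library).
  mymap.insert(std::pair<Counted,Counted>(s1, s2));
  std::printf("copy constructions: %d\n", Counted::copies);
  return 0;
}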

There's an alternative way of assigning the mapping:

#include <map>
#include <string>

using namespace std;

int func(map<string,string>&mymap, string &s1, string &s2)
{
  mymap[s1]=s2;
  return 0;
}

When compiled, the resulting code looks a lot neater:

$ er_src -dis func map.o|grep call
        [?]     28e6:  call    std::map...::operator[]
        [?]     2903:  call    std::basic_string...::_M_assign_dispatch

Of course a neater chunk of code is nice, but the question is whether the code for ::operator[] contains the same ugly mess. Rather than disassembling to find out, it's simpler to time the two versions and see which does better. A simple test harness looks like:

#include <map>
#include <string>

using namespace std;

// func() is one of the two versions shown above
int func(map<string,string>&mymap, string &s1, string &s2);

int main()
{
  map<string,string>mymap;
  string s1,s2;
  long long i;
  s1="123456789";
  s2="987654321";
  for (i=0; i<100000000; i++)
  {
    func(mymap,s1,s2);
  }
}

It's a less than ideal harness since it uses constant strings, and one version of the code might end up bailing out early because of this. The performance of the two versions is quite surprising:

operator[] version:

real           6.79
user           6.77
sys            0.00

insert(pair<string,string>) version:

real        1:03.53
user        1:03.26
sys            0.01

So the version that creates the pair object is about 10x slower!

Comments:

You missed some < and > (update, you still have a < instead of >).

In the first version, replacing pair<string,string> by map::value_type (i.e. pair<const string,string>) may help a bit. In the second one, if you compile with -xO5, it will inline operator[] so you can see better what string operations are done. Now, the benchmark is extremely specific, as you only ever insert the same element again and again (the map never grows).

By the way, most of the time seems to be spent in mutex locks, as mentioned in one of your previous blog entries :-(

Posted by Marc on August 28, 2009 at 05:59 AM PDT #
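
For reference, a sketch of the map::value_type variant suggested above (hypothetical code, not part of the original post or its measurements):

#include <map>
#include <string>

using namespace std;

// Construct the map's own value_type (pair<const string,string>) directly,
// so insert() doesn't first have to convert from a pair<string,string>.
int func(map<string,string>&mymap, string &s1, string &s2)
{
  mymap.insert(map<string,string>::value_type(s1,s2));
  return 0;
}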

Thanks.

Yes, the fact that the time disappears in mutex locks is _very_ frustrating.

Posted by Darryl Gove on August 28, 2009 at 06:10 AM PDT #

The point about the map never growing is very telling; the second version of the code can take advantage of cache locality of the reference being returned by operator[], and of course the string assignment operator might be optimized to skip the copy if the string values are equal (presumably faster to compute than copying the string). The first version, on the other hand, always copies the strings first, before having the opportunity to check. It'd be nice to see results from a set of random strings.

Posted by Christopher Currie on August 31, 2009 at 07:45 AM PDT #

@Christopher

The modified code looks like:

int main()
{
  map<string,string>mymap;
  string s1,s2;
  long long i;
  for (i=0; i<100000000; i++)
  {
    // note: these assignments pick string::operator=(char), so s1 and s2
    // are single-character strings whose value cycles with i
    s1=i;
    s2=100000000-i;
    func(mymap,s1,s2);
  }
}

The timing (real time) for the original code on the system I've just run on was:

operator[]             18s
mymap.insert(pair...)  74s

About 4x.

The modified code takes:

operator[]             38s
mymap.insert(pair...)  97s

About 2.5x.

Note that this adds about 20s of overhead, and the overhead is more obvious for the shorter-running code.

Darryl.

Posted by Darryl Gove on August 31, 2009 at 08:18 AM PDT #
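
For completeness, a hypothetical variant of the harness (not from the original thread, and not timed) that gives every iteration a genuinely distinct multi-character key, so the map really grows as Christopher suggested. It formats the counter with an ostringstream, uses far fewer iterations since the map now keeps every entry, and links against either version of func():

#include <map>
#include <sstream>
#include <string>

using namespace std;

// one of the two versions of func() shown earlier
int func(map<string,string>&mymap, string &s1, string &s2);

int main()
{
  map<string,string> mymap;
  string s1, s2;
  long long i;
  for (i = 0; i < 1000000; i++)   // fewer iterations: the map grows this time
  {
    ostringstream os1, os2;
    os1 << i;                     // distinct multi-character key
    os2 << 1000000 - i;
    s1 = os1.str();
    s2 = os2.str();
    func(mymap, s1, s2);
  }
  return 0;
}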
