OpenSolaris Beats Linux on memcached Sun Fire X2270

OpenSolaris provided 25% better performance on memcached than Linux on the Sun Fire X2270 server. memcached 1.3.2 on OpenSolaris delivered a maximum throughput of 352K ops/sec, compared to 281K ops/sec for the same server running RHEL5 (with kernel 2.6.29).

memcached is the de facto distributed caching server used to scale many web2.0 sites today. As sites grow and must support very large numbers of users, memcached aids scalability by effectively cutting down on MySQL traffic and improving response times.

  • memcached is a very lightweight server but is known not to scale beyond 4-6 threads. Some scalability improvements have gone into the 1.3 release (still in beta).
  • As customers move to the newer, more powerful Intel Nehalem-based systems, it is important that they be able to use these systems efficiently, with appropriate software and hardware components.

Performance Landscape

memcached performance results: ops/sec (bigger is better)

System           C/C/T    Processor                  Memory   Operating System                            Ops/Sec
Sun Fire X2270   2/8/16   2.93 GHz Intel X5570 QC    48GB     OpenSolaris 2009.06                         352K
Sun Fire X2270   2/8/16   2.93 GHz Intel X5570 QC    48GB     RedHat Enterprise Linux 5 (kernel 2.6.29)   281K

C/C/T: Chips, Cores, Threads

Results and Configuration Summary

Sun's results used the following hardware and software components.

Hardware:

    Sun Fire X2270
    2 x Intel X5570 QC 2.93 GHz
    48GB of memory
    10GbE Intel Oplin card

Software:

    OpenSolaris 2009.06
    RedHat Enterprise Linux 5 (kernel 2.6.29)

Benchmark Description

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. The memcached benchmark was based on Apache Olio - a web2.0 workload.

The benchmark initially populates the server cache with objects of different sizes to simulate the types of data that real sites typically store in memcached:

  • small objects (4-100 bytes) to represent locks and query results
  • medium objects (1-2 KBytes) to represent thumbnails, database rows, resultsets
  • large objects (5-20 KBytes) to represent whole or partially generated pages

The benchmark then runs a mixture of operations (90% gets, 10% sets) and measures the throughput and response times when the system reaches steady state. The workload is implemented using Faban, an open-source benchmark development framework. Faban not only speeds benchmark development; its harness is also a convenient way to queue, monitor, and archive runs for analysis.
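
For illustration, the gets and sets the workload issues map directly onto memcached's text protocol. The following is a minimal sketch using netcat (the key name, value, and nc flags are examples, not from the benchmark; nc options vary by variant):

    # Store a 5-byte object: set <key> <flags> <exptime> <bytes>
    printf 'set lock1 0 0 5\r\nhello\r\n' | nc -q1 localhost 11211
    # Retrieve it; the benchmark issues roughly nine gets per set
    printf 'get lock1\r\n' | nc -q1 localhost 11211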

Key Points and Best Practices

OpenSolaris Tuning

The following /etc/system settings were used to configure MSI-X interrupt allocation and the interrupt routing policy (a verification sketch follows the list):

  • set ddi_msix_alloc_limit=4
  • set pcplusmp:apic_intr_policy=1
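
These settings take effect at the next reboot. As a quick check (not from the original post), the resulting interrupt-to-CPU assignments can be inspected with the kernel debugger:

    # List interrupt vectors and the CPUs servicing them
    echo "::interrupts" | mdb -k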

For the ixgbe interface, 4 transmit and 4 receive rings gave the best performance (a configuration sketch follows):

  • tx_queue_number=4, rx_queue_number=4
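
A minimal sketch of how these ring counts would typically be persisted, assuming the standard OpenSolaris driver configuration file:

    # /kernel/drv/ixgbe.conf (assumed path; takes effect after a
    # driver reload or reboot)
    tx_queue_number = 4;
    rx_queue_number = 4;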

The Crossbow threads were bound to specific CPUs:

dladm set-linkprop -p cpus=12,13,14,15 ixgbe0
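
The binding can then be confirmed with a standard dladm query:

    # Show the CPUs currently bound to the ixgbe0 link
    dladm show-linkprop -p cpus ixgbe0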

Linux Tuning

Linux was more complicated to tune. The following tunables were changed in an attempt to get the best performance (a sketch of applying them follows the list):

  • net.ipv4.tcp_timestamps = 0
  • net.core.wmem_default = 67108864
  • net.core.wmem_max = 67108864
  • net.core.optmem_max = 67108864
  • net.ipv4.tcp_dsack = 0
  • net.ipv4.tcp_sack = 0
  • net.ipv4.tcp_window_scaling = 0
  • net.core.netdev_max_backlog = 300000
  • net.ipv4.tcp_max_syn_backlog = 200000
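
As a sketch, these can be applied at runtime with sysctl (values taken from the list above); placing them in /etc/sysctl.conf and running sysctl -p makes them persistent:

    sysctl -w net.ipv4.tcp_timestamps=0
    sysctl -w net.ipv4.tcp_sack=0
    sysctl -w net.core.wmem_max=67108864
    sysctl -w net.core.netdev_max_backlog=300000
    # ...and likewise for the remaining tunables in the list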

Here are the ixgbe-specific settings that were used (2 transmit, 2 receive rings; a sketch of applying them follows):

  • RSS=2,2 InterruptThrottleRate=1600,1600
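
These are module parameters, so one way to apply them is the following sketch (parameter names per the Intel ixgbe driver; the modprobe.conf path is the RHEL5-era convention):

    # Load the driver with 2 TX/2 RX rings and a fixed interrupt rate
    modprobe ixgbe RSS=2,2 InterruptThrottleRate=1600,1600
    # To persist across reboots, add to /etc/modprobe.conf:
    #   options ixgbe RSS=2,2 InterruptThrottleRate=1600,1600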

Linux Issues

The 10GbE Intel Oplin card on Linux required the following driver and kernel rebuilds:

  • With the default ixgbe driver from the RedHat distribution (version 1.3.30-k2 on kernel 2.6.18), the interface simply hung during the benchmark test.
  • This led to downloading the driver from the Intel site (1.3.56.11-2-NAPI) and re-compiling it. This version worked, giving a maximum throughput of 232K operations/sec on the same Linux kernel (2.6.18). However, that kernel does not support multiple TX rings.
  • Kernel 2.6.29 supports multiple TX rings but does not include the 1.3.56.11-2-NAPI ixgbe driver, so both the kernel and the driver were downloaded, built, and installed (typical build steps are sketched after this list). This combination worked well, giving a maximum throughput of 281K ops/sec with some tuning.
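
For reference, a sketch of the driver build steps (tarball name and layout assumed from Intel's standard source drops of that era):

    # Build and install the out-of-tree ixgbe driver against the
    # running kernel's headers
    tar xzf ixgbe-1.3.56.11-2.tar.gz
    cd ixgbe-1.3.56.11-2/src
    make install
    # Reload the driver
    modprobe -r ixgbe && modprobe ixgbe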

Disclosure Statement

Sun Fire X2270 server with OpenSolaris 352K ops/sec. Sun Fire X2270 server with RedHat Linux 281K ops/sec. For memcached information, visit http://www.danga.com/memcached. Results as of June 8, 2009.
