
May 2009 Archives

May 9, 2009

Migrating to Exadata / HP Oracle DB Machine

Intro
Suppose you have to migrate your data to Exadata and you have already done your capacity planning.

Then your next step is probably to plan how to actually move your existing database to Exadata / HP Oracle Database Machine (from now on simply called Exadata in this blog).

For this scenario, there are a few things to take care of.

In my experience, preparing such a migration is all about knowing Oracle's Maximum Availability Architecture (MAA) strategy. With that knowledge you know which options you do and do not have.

Off-line or On-line
First of all, you have to decide whether the migration is 'off-line' or 'on-line'.
'On-line' in this case still means a couple of minutes of downtime, because you always have to switch from source database 'A' to target database 'B'.

Think of this as, for example, the time it takes to switch over to a standby database.

'Off-line' is the most comfortable solution, but of course only when the off-line window is large enough to migrate your data. In a migration to Exadata the amount of data you have to move will probably be 'large', so consider yourself lucky if you have an 'off-line' option and a window large enough to do the job.

For Exadata it is recommended to have an extent size that is a multiple of 4 MB, and an ASM allocation unit (AU) size of 4 MB as well. This makes sure that, both at database level (for the extents) and at ASM level (for the AUs), data is read in contiguous chunks of at least 4 MB, which is what Exadata needs to perform best.

So, if you want to stick to this recommendation, you have to check the current size of your AUs and extents.
When the sizes turn out not to be the recommended ones, the second choice pops up: do you do a 'logical' migration and stick to the recommendation, or do you let it go and do a 'physical' migration? Of course, when your source system 'A' is not on ASM, there are no AUs to check.
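The check behind this choice is plain arithmetic: 4 MB is 4 * 1024 * 1024 = 4,194,304 bytes, and an extent meets the recommendation when its size divides evenly by that number. A minimal Python sketch of the decision, assuming you have already pulled the extent sizes (for example the BYTES column of DBA_EXTENTS) into a list of integers; the function names are hypothetical:

```python
# Hypothetical helpers: extent sizes are assumed to come from DBA_EXTENTS.BYTES.
FOUR_MB = 4 * 1024 * 1024  # 4,194,304 bytes


def is_4mb_multiple(size_bytes: int) -> bool:
    """True when a size is a positive whole multiple of 4 MB."""
    return size_bytes > 0 and size_bytes % FOUR_MB == 0


def suggest_migration(extent_sizes: list, stick_to_recommendation: bool = True) -> str:
    """Return 'physical' or 'logical' following the reasoning in the text."""
    extents_ok = all(is_4mb_multiple(b) for b in extent_sizes)
    if extents_ok or not stick_to_recommendation:
        # Sizes are fine already, or you accept them as they are: move the datafiles.
        return "physical"
    # Only a logical migration lets you rebuild the segments with 4 MB extents.
    return "logical"


print(suggest_migration([8 * 1024 * 1024, 64 * 1024]))  # logical
```

The 64 KB extent in the usage line is what makes the result 'logical'; with only 4 MB or 8 MB extents the sketch would suggest a physical migration.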

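To check the extent sizes, a query like this will do (it lists the segments that have extents whose size is not a multiple of 4 MB):

select segment_name, segment_type, count(*) as extents
from dba_extents
where mod(bytes, 4*1024*1024) <> 0
and owner = 'SCHEMA_NAME' -- the (upper case) name of the schema you are migrating
group by segment_name, segment_type;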

If the extents already have the recommended size, you can of course still choose a logical migration, but my personal preference would be a physical migration in that case.

Physical migration
Physical migration to me means 'transferring the datafiles from platform A to platform B'. This leads to new choices, such as how to transfer the datafiles from A to B. That depends on the platforms, because A and B can have different architectures.

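As we know, Exadata / DB Machine is based on Linux on Intel x86-64, which is little endian, while the source can for example be a RISC platform such as Sun SPARC or IBM Power (P-series), which are big endian. (Itanium is little endian under Linux and Windows, but big endian under HP-UX.)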
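To see why datafiles cannot simply be copied between such platforms, it helps to look at how one and the same number is laid out in bytes. A small Python sketch using the standard struct module; the value is arbitrary, and this only illustrates byte order, not how Oracle converts datafiles internally. In Oracle itself, the endian format of each platform can be looked up in the ENDIAN_FORMAT column of V$TRANSPORTABLE_PLATFORM.

```python
import struct

value = 0x0A0B0C0D  # an arbitrary 32-bit integer

big = struct.pack(">I", value)     # big-endian layout, as on SPARC or IBM Power
little = struct.pack("<I", value)  # little-endian layout, as on x86-64 (Exadata)

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Reading big-endian bytes as if they were little endian yields a different number,
# which is why datafiles need an endian conversion between these platforms.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0xd0c0b0a
```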

Fortunately Oracle has some good answers for these kinds of questions:
- transportable tablespaces (combined with RMAN CONVERT when the endian format has to change)
- DBMS_STREAMS_TABLESPACE_ADM.PULL_TABLESPACES

To a certain extent, physical migrations can also be done 'on-line' (or at least with as little downtime as possible). Think of Data Guard, for example.

Finally, which solution you choose also depends on the infrastructure you are working in. Questions you may ask yourself are:
- Do I have staging space on the source system?
- Do I have network attached storage (NAS) available that I can use?
- Can this NAS be connected to both source and target?
- Are platforms A and B actually on the same network at all?
- If they are, can I safely use that network to transfer my terabytes of data without hurting the performance of other systems?
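For the last question, a quick back-of-the-envelope calculation helps. A minimal Python sketch; the 4 TB size matches the example used later in this post, while the link speeds and the 70% efficiency factor are assumptions for illustration only, so measure the real throughput on your own network:

```python
def transfer_hours(size_tb: float, link_gbit: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to copy size_tb terabytes over a link_gbit Gbit/s link.

    efficiency is an assumed fraction of the nominal bandwidth actually achieved
    (protocol overhead, other traffic on the same network).
    """
    size_bits = size_tb * 1e12 * 8                 # decimal terabytes to bits
    bits_per_second = link_gbit * 1e9 * efficiency
    return size_bits / bits_per_second / 3600


for gbit in (1, 10):
    print(f"4 TB over {gbit} Gbit/s: {transfer_hours(4, gbit):.1f} h")
# 4 TB over 1 Gbit/s: 12.7 h
# 4 TB over 10 Gbit/s: 1.3 h
```

Even at full nominal speed a single 1 Gbit/s link needs close to 9 hours for 4 TB, which is why staging, NAS or restoring from backup can be attractive alternatives.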

These are all realistic questions you have to deal with when you are talking about migrations.

Logical migration
You have to investigate the available solutions for logical migrations as well.
Before I continue, let me first explain what I mean by a logical migration.

To me, a logical migration means 'exporting' the data from the source and 'importing' it into the target. The export can be done with any kind of tool, such as Data Pump, exp, or even SQL*Loader or CTAS over a database link. A 'logical' migration can even be done 'on-line' if you consider solutions such as a logical standby database (same endian) or Oracle Streams (endian independent). 'Logical' migrations often need staging space (a location where you temporarily dump your data). Note that a Data Pump import can also be done over the network (the impdp NETWORK_LINK parameter) without dumping anything to file.

As said, if you set yourself the goal of changing the extent sizes to the recommended values, and your extents do not already have that size, you automatically end up with a logical migration.

There may be even more things to consider, for example:
- you need to build more than one replica of the source database on Exadata;
- you want to put as little load on the source system as possible, because it is a highly critical production system.

When dealing with these kinds of questions, keep in mind that Exadata can probably do the job quicker than your source system, because it is so powerful.

Say, for instance, you realize that exporting a 4 TB database with Data Pump will cause too much load on the production system, but you still want to do a logical migration. In that case my answer would be to just transfer the datafiles to Exadata (and convert their endian format if needed) and perform the export (expdp) and import (impdp) there. You may not even need to transfer the files: perhaps you can retrieve them from a backup.

You will usually want to do a Data Pump export in a read-consistent way. For that, think of:
- the expdp parameter FLASHBACK_TIME (or FLASHBACK_SCN)
- the database parameter UNDO_RETENTION
- RETENTION GUARANTEE on the undo tablespace
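A minimal Python sketch of how these pieces fit together: it builds the text of an expdp parameter file with a FLASHBACK_TIME, so the whole export is consistent as of one moment, and checks that UNDO_RETENTION covers the expected export run time. DIRECTORY, DUMPFILE, LOGFILE, SCHEMAS and FLASHBACK_TIME are real expdp parameters; the directory object DP_DIR, the schema name SOE, the file names and the run-time estimate are hypothetical placeholders.

```python
from datetime import datetime


def expdp_parfile(schema: str, directory: str, as_of: datetime) -> str:
    """Text of a (hypothetical) expdp parameter file for a read-consistent schema export."""
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    lines = [
        f"DIRECTORY={directory}",              # an existing Oracle directory object
        f"DUMPFILE={schema.lower()}_%U.dmp",   # %U lets expdp number the dump files
        f"LOGFILE={schema.lower()}_exp.log",
        f"SCHEMAS={schema}",
        # Every table is exported as of this one moment; this needs enough undo.
        f"FLASHBACK_TIME=\"TO_TIMESTAMP('{ts}', 'YYYY-MM-DD HH24:MI:SS')\"",
    ]
    return "\n".join(lines) + "\n"


def undo_retention_ok(undo_retention_s: int, expected_export_s: int) -> bool:
    """UNDO_RETENTION (in seconds) should at least cover the expected export run time.

    Without RETENTION GUARANTEE on the undo tablespace this is only a target
    that Oracle may not honour when undo space runs short.
    """
    return undo_retention_s >= expected_export_s


print(expdp_parfile("SOE", "DP_DIR", datetime(2009, 5, 9, 22, 0, 0)))
print(undo_retention_ok(undo_retention_s=900, expected_export_s=3 * 3600))  # False
```

Saved to a file, the text would be passed to expdp with its PARFILE parameter; the retention check is only a sanity check, the guarantee itself still has to be set on the undo tablespace.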

Note 1. Exporting LOBs in a read-consistent way cannot be accomplished only by tuning the undo retention of the database. For LOBs, undo is a property of the column: old versions of LOB data are kept in the LOB segment itself, governed by the PCTVERSION or RETENTION setting of the LOB column. To make sure the retention of these objects is set well, you may have to alter your table first. For more information, see the "Application Developer's Guide - Large Objects".

Note 2. Transferring or exporting indexes to Exadata seems rather useless to me. First, ask yourself whether the index is needed at all. And if it is, why not just recreate it on the target? Exadata probably has the muscle to do that in a small fraction of the time it took on your source database!

Summary
Migrations to Exadata are not more complicated than normal migrations. When dealing with large amounts of data, you should always think of a strategy before hitting the keyboard. Since production environments have their limitations, it has proved helpful to work out alternative scenarios as well. With expdp, transportable tablespaces, transportable databases, Streams, and physical and logical standby databases, Oracle has every option you can think of available to migrate to this wonderful database machine!

Helpful documents:
- Technical White Paper: Best Practices for Migrating to HP Oracle Exadata Storage Server
- Oracle Database High Availability Documentation - Features and Best Practices

Rene Kundersma
Oracle Expert Services, The Netherlands

dbmachine01.png

May 15, 2009

Jumbo Frames for RAC Interconnect

At the moment I am investigating, for a customer, whether it is a good idea to use "Jumbo Frames" within a RAC environment. In the next couple of days I will post some results here.

Until then, I would like to share some thoughts.

First: why would you want "Jumbo Frames"?

Answer: Jumbo Frames (in my Oracle world) are Ethernet frames that do not carry the conventional payload of 1500 bytes, but 9000 bytes.

As you can imagine, the standard 1500 bytes is too small for our 'regular' 8 KB Oracle database block, so each block has to be split over several frames. When the frames are 9000 bytes, a block fits into one frame, which eliminates this overhead.

Eliminating the overhead could improve performance and decrease CPU usage. This sounds like a good theory, and one that applies well to the RAC interconnect, where we send 8 KB blocks over UDP from instance A to instance B.
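The arithmetic behind this can be sketched in Python, assuming plain IPv4 (20-byte header) and UDP (8-byte header) and ignoring any extra header bytes Oracle's interconnect messages add on top of the 8 KB block. Every IP fragment except the last must carry a multiple of 8 data bytes; the function name is hypothetical.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes


def ip_packets(payload: int, mtu: int) -> int:
    """Number of IP packets needed for one UDP datagram carrying `payload` bytes."""
    datagram = payload + UDP_HEADER          # what IP has to carry
    if datagram <= mtu - IP_HEADER:
        return 1                             # fits in a single, unfragmented packet
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


block = 8192  # one 8 KB database block
for mtu in (1500, 9000):
    n = ip_packets(block, mtu)
    print(f"MTU {mtu}: {n} packet(s), {n * IP_HEADER} bytes of IP headers")
# MTU 1500: 6 packet(s), 120 bytes of IP headers
# MTU 9000: 1 packet(s), 20 bytes of IP headers
```

Each extra fragment also carries its own Ethernet header and has to be handled separately by the sending and receiving hosts, which is where the expected savings would come from.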

Investigation tells us Jumbo Frames are not (yet) an IEEE standard. This means you could (I am not saying you will) run into problems between the devices that are supposed to be configured to handle frames of this size.

The OS, the network card and the switch all need to 'talk' the same size of "Jumbo Frames". Note that some vendors call 4000-byte frames Jumbo, while others call 9000-byte frames Jumbo.

In theory, properly configured Jumbo Frames can eliminate 10% of the overhead on UDP traffic.

So how do we test this?

I think an 'end-to-end' test is the best way. So my first test is a 30-minute Swingbench run against a two-node RAC, without too much stress to begin with.

The MTU of the network bond (and of its slave NICs) is 1500 initially.

After the test, collect the results on the total transactions, the average transactions per second and the maximum transaction rate (results.xml), the interconnect traffic (AWR) and the CPU usage. Then do exactly the same, but now with an MTU of 9000 bytes. For this we need to make sure the switch is also configured to use an MTU of 9000.

By the way: yes, it is possible to measure the network only, but real-life end-to-end testing, with a real Oracle application talking to RAC, feels like the best approach to see the impact on, for example, the average transactions per second.

To make the test as reliable as possible:
- use guaranteed restore points to flash back the database to its original state;
- stop and start the database between runs (to clean the cache).

By the way: before starting the test with an MTU of 9000 bytes, the setting had to be verified.

One way to do this is to use ping with a payload size (-s) of 8972 bytes while prohibiting fragmentation (-M do).
This way you send Jumbo Frames and see whether they get through without being fragmented.
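Where does 8972 come from? The MTU covers the whole IP packet, so the largest ICMP payload that fits unfragmented is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A tiny Python sketch of that calculation, assuming IPv4 without options; the function name is hypothetical:

```python
IPV4_HEADER = 20  # bytes, no IP options
ICMP_HEADER = 8   # bytes


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one packet when fragmentation is prohibited."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(max_ping_payload(9000))  # 8972
print(max_ping_payload(1500))  # 1472
```

So ping -s 8972 -M do should get through on a path with a 9000-byte MTU, and -s 8973 should be refused, which is exactly what the ping runs in this post show.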

[root@node01 rk]# ping -s 8972 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8972(9000) bytes of data.
8980 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.914 ms

--- node02-ic. ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.859/0.955/1.167/0.109 ms, pipe 2

As you can see, this is not a problem. For packets larger than 9000 bytes, however, it is:

[root@node01 rk]# ping -s 8973 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8973(9001) bytes of data.
From node02-ic. (192.168.23.52) icmp_seq=0 Frag needed and DF set (mtu = 9000)

After setting the MTU back to 1500, the same unfragmentable 9000-byte packets should no longer get through:

[root@node01 rk]# ping -s 8972 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8972(9000) bytes of data.
--- node02-ic. ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

With the MTU at 1500, sending 'normal' packets works again:

[root@node01 rk]# ping node02-ic -M do -c 5
PING node02-ic. (192.168.23.32) 56(84) bytes of data.
64 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.174 ms

--- node02-ic. ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.174/0.186/0.198/0.008 ms, pipe 2

Another way to verify the MTU size is the command 'netstat -a -i -n' (the MTU column should show 9000 for the interfaces you are testing Jumbo Frames on):

Kernel Interface table
Iface     MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0    1500   0  10371535      0      0      0  15338093      0      0      0 BMmRU
bond0:1  1500   0 - no statistics available -                                   BMmRU
bond1    9000   0  83383378      0      0      0  89645149      0      0      0 BMmRU
eth0     9000   0        36      0      0      0  88805888      0      0      0 BMsRU
eth1     1500   0   8036210      0      0      0  14235498      0      0      0 BMsRU
eth2     9000   0  83383342      0      0      0    839261      0      0      0 BMsRU
eth3     1500   0   2335325      0      0      0   1102595      0      0      0 BMsRU
eth4     1500   0 252075239      0      0      0 252020454      0      0      0 BMRU
eth5     1500   0         0      0      0      0         0      0      0      0 BM

As you can see, my interconnect is on bond1 (built on eth0 and eth2), all with an MTU of 9000 bytes.
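On more nodes and interfaces, checking this by eye gets tedious. A minimal Python sketch that parses 'netstat -i' style text (the column layout shown above is assumed) and reports interfaces that are missing or do not have the expected MTU; the interface names bond1, eth0 and eth2 come from this setup and the function names are hypothetical. On Linux the value can also be read from /sys/class/net/<interface>/mtu.

```python
def mtu_by_interface(netstat_output: str) -> dict:
    """Map interface name to MTU from `netstat -i` style output."""
    mtus = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data lines start with the interface name followed by a numeric MTU;
        # the title and header lines are skipped because their 2nd field is not a number.
        if len(fields) >= 2 and fields[1].isdigit():
            mtus[fields[0]] = int(fields[1])
    return mtus


def wrong_mtu(mtus: dict, interfaces: list, expected: int) -> list:
    """Interfaces from `interfaces` that are missing or do not have the expected MTU."""
    return [i for i in interfaces if mtus.get(i) != expected]


sample = """Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond1 9000 0 83383378 0 0 0 89645149 0 0 0 BMmRU
eth0 9000 0 36 0 0 0 88805888 0 0 0 BMsRU
eth1 1500 0 8036210 0 0 0 14235498 0 0 0 BMsRU
eth2 9000 0 83383342 0 0 0 839261 0 0 0 BMsRU
"""
print(wrong_mtu(mtu_by_interface(sample), ["bond1", "eth0", "eth2"], 9000))  # []
```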

The tests are not finished and there are no conclusions yet, but here are my first results.
You will notice the differences are not that significant.

MTU 1500:
TotalFailedTransactions : 0
AverageTransactionsPerSecond : 1364
MaximumTransactionRate : 107767
TotalCompletedTransactions : 4910834

MTU 9000:
TotalFailedTransactions : 1
AverageTransactionsPerSecond : 1336
MaximumTransactionRate : 109775
TotalCompletedTransactions : 4812122
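To put 'not that significant' into numbers, a small Python sketch computing the relative change from the MTU 1500 run to the MTU 9000 run, using only the figures above:

```python
mtu_1500 = {"AverageTransactionsPerSecond": 1364,
            "MaximumTransactionRate": 107767,
            "TotalCompletedTransactions": 4910834}
mtu_9000 = {"AverageTransactionsPerSecond": 1336,
            "MaximumTransactionRate": 109775,
            "TotalCompletedTransactions": 4812122}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


for metric, before in mtu_1500.items():
    print(f"{metric}: {pct_change(before, mtu_9000[metric]):+.1f}%")
# AverageTransactionsPerSecond: -2.1%
# MaximumTransactionRate: +1.9%
# TotalCompletedTransactions: -2.0%
```

All three metrics move by only about 2%, and not all in the same direction, so a single lightly loaded run does not say much either way.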

In a chart this will look like this:
udp_traf01.png

As you can see, the difference in the number of transactions between the two tests is not really significant, but there is less UDP traffic! Still, I expected more from this test, so I have to put more stress on the systems.

I looked into the failed transaction and found "ORA-12155 TNS-received bad datatype in NSWMARKER packet". I verified this and I am sure it is not related to the MTU size, because I only changed the MTU size for the interconnect and there is no TNS traffic on that network.

As said, I will now continue with tests that put much more stress on the systems:
- number of users changed from 80 to 150 per database
- number of databases changed from 1 to 2
- more network traffic:
  - rebuilt the Swingbench indexes without the 'REVERSE' option
  - altered the sequences, lowering the increment by value to 1 and the cache size to 3 (instead of 800)
  - full table scans all the time on each instance
- run longer (4 hours instead of half an hour)

Now the results are already improving. In the 4-hour test, about 2.5 to 3 million more UDP packets were sent with an MTU size of 1500 than with an MTU size of 9000, see this chart:

udptraf02.png

Imagine the impact this has. Each packet you do not send saves you the network overhead of the packet itself and CPU cycles you no longer need to spend.
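To get a feel for the scale: spread over the 4-hour (14,400-second) run, 2.5 to 3 million extra packets is a steady stream of roughly 170 to 210 extra packets per second on the interconnect. The arithmetic as a trivial Python sketch, using only the numbers from the text:

```python
RUN_SECONDS = 4 * 60 * 60  # the 4-hour test

for extra_packets in (2_500_000, 3_000_000):
    rate = extra_packets / RUN_SECONDS
    print(f"{extra_packets:,} extra packets = {rate:.0f} packets/s")
# 2,500,000 extra packets = 174 packets/s
# 3,000,000 extra packets = 208 packets/s
```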

The load average of the Linux box also decreases, from an average of 16 to 14.
load_avg01.png

In terms of completed transactions on different MTU sizes within the same timeframe, the chart looks like this:

trans01.png

To conclude this test, two very high load runs were performed: again, one with an MTU of 1500 and one with an MTU of 9000.

In the charts below you will see lower CPU consumption when using an MTU of 9000 bytes.

Fewer packets are sent as well, although I think that number is not that significant compared to the total number of packets sent.

cpu_load_01.png

packets_01.png

My final thoughts on this test:

1. you will hardly notice the benefits of Jumbo Frames on a system with no stress;
2. you will notice the benefits of Jumbo Frames on a stressed system, and such a system will then use less CPU and have less network overhead.

This means Jumbo Frames help you scale out better than regular frames.

Depending on the interconnect usage of your applications, the results may of course vary. With interconnect-intensive applications you will see the benefits earlier than with applications that have relatively little interconnect activity.

I would use Jumbo Frames to scale better, since they save CPU and reduce network traffic, and this way leave room for growth.

Rene Kundersma
Oracle Expert Services, The Netherlands

About May 2009

This page contains all entries posted to Oracle XPS The Netherlands On HA in May 2009. They are listed from oldest to newest.
