Disk relocation in SVM


Disk relocation in SVM (Solaris Volume Manager)

Blogging is probably the best place for me to share my experience with SVM and what I know about SVM internals. My name is Steve Peng. I have been with Sun since 1994, having come from Big Blue when they downsized the AIX development division. During my 11 years at Sun, I have spent most of my time working on the Solaris Volume Manager (called Solstice DiskSuite before its integration into Solaris in Solaris 9) and have been a key developer on most of its cool projects, such as 64-bit Solaris support, disk relocation support, 64-bit SVM and import of named disksets. In this debut post, I would like to talk a bit about the disk relocation support.

So what happens if a user runs an old SDS (Solstice DiskSuite) release and moves disks around, for example by recabling them? The best thing he or she can do is pray that the devt and ctds (cXtXdXsX) names remain the same; if they do not, the configuration is dead. Why? In the old SDS releases, it is the driver name and minor number, along with the device name, that are stored in the private configuration database and used to bring up the configuration at system reboot. When a disk is moved and ends up with a different devt and name, SDS simply cannot locate it and fails to bring up the existing configuration. Worse, this inability to relocate disks can be catastrophic if the wrong disk is located and configured.

When SDS was integrated into Solaris 9, disk relocation support was added by storing each disk's unique device id, such as its WWN, in the private database. Now when a configuration is brought up, those stored device ids are used to locate the disks instead of the stored devt/name tuples. This cool feature boosts the flexibility of SVM and makes the upgrade story even greater. If you know the upgrade story on the old SDS releases, then you probably know what I mean when I say 'great'.

So one may wonder how the devt and ctds name of a disk device can change when the disk is moved around. When a disk is moved from one controller to another, the device instance number can change, and since the disk is now attached to the new controller, the device name changes as well. One thing that does not change is the disk's unique device id, such as its WWN. So exactly how does SVM address this disk relocation issue? Let's use a simple stripe as an example to see how SVM attacks the issue internally. Say a user creates a simple stripe d1 on top of /dev/dsk/c1t2d0s0. When d1 is created, the following database records are created to store its configuration. A dump of the database shows the following configuration information:

RecId 0x00000003: Type:NM [0002] Type2: 1 Size = 512
sizeof(struct nm_rec)=52
sizeof(struct nm_rec_hdr)=24
sizeof(struct nm_name)=28
r_rec_hdr->r_alloc_size=512
r_rec_hdr->r_used_size=60
r_rec_hdr->r_next_recid=0x00000000
r_rec_hdr->r_next_key=0
n_side=0
n_key=0x00000001 <--- matched key
n_count=1
n_minor=0x00000010
n_drv_key=0x000003ef
n_dir_key=0x000003f0
n_namlen=9
n_name="c1t2d0s0"


RecId 0x00000007: Type:MD_STRIPE [1005] Type2: 0 Size = 208
==== ms_unit
==== mdc_unit
un_revision: 0 (MD_32BIT_META_DEV)
un_type: 1 (MD_DEVICE)
un_status: 0x00000000
un_self_id: 1(0x00000001)
un_record_id: 0x00000007
un_size: 208
un_flag: 0000
un_total_blocks: 35358848
un_actual_tb: 35358848
un_nhead: 19
un_nsect: 248
un_rpm: 7200
un_wr_reinstruct: 119
un_rd_reinstruct: 0
un_vtoc_id: 0x00000000
un_capabilities: 0x0000000b
un_parent: 0xffffffff (MD_NO_PARENT)
un_user_flags: 0x00000000
--
un_hsp_id: -1
un_nrows: 1
un_ocomp: 136
row: 0
un_icomp: 0
un_ncomp: 1
un_blocks: 35358848
un_cum_blocks: 35358848
un_interlace: 32
comp: 0
un_key: 0x1 <--- used to search for the match
un_dev: 0x2000000010
un_start_block: 0
un_mirror:
ms_flags: 0x0
ms_state: 1 (CS_OKAY)
ms_lasterrcnt: 0
ms_orig_dev: 0x0
ms_orig_blk: 0
ms_hs_key: 0x00000000
ms_hs_id: 0
ms_timestamp: Wed Jun 8 15:03:53 2005

As you can see, RecId 7 describes the stripe d1 that was just created. Stripe d1 has only one row and one component, and un_key is used to locate the underlying device. In this case the value 1 is used to locate the component by scanning the "NM" record for an entry whose n_key matches 1; here that entry is c1t2d0s0. Once the entry is located, its stored minor number is used to construct the devt for the component, and the devt is then used to access the device.
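The lookup just described can be sketched in a few lines of Python. This is only an illustration, not SVM source: the dictionaries stand in for the NM and stripe records from the dump, `find_nm_entry` and `make_devt64` are hypothetical helpers, and both the major number 32 and the 32-bit minor field are assumptions read off the un_dev value 0x2000000010 in the dump.

```python
# Illustrative sketch only; not SVM code. Values come from the dump above.

MINOR_BITS = 32  # assumption: a 64-bit devt keeps the minor in the low 32 bits

# Stand-in for the NM record: one entry per component key.
nm_record = [
    {"n_key": 0x1, "n_minor": 0x10, "n_name": "c1t2d0s0"},
]

# Stand-in for the single component of stripe d1.
stripe_component = {"un_key": 0x1, "un_dev": 0x2000000010}


def find_nm_entry(records, key):
    """Scan the NM entries for the one whose n_key matches the component key."""
    for entry in records:
        if entry["n_key"] == key:
            return entry
    return None


def make_devt64(major, minor):
    """Pack major and minor into one 64-bit devt (hypothetical helper)."""
    return (major << MINOR_BITS) | minor


entry = find_nm_entry(nm_record, stripe_component["un_key"])
# Assumption: n_drv_key resolves to a driver whose major number is 32,
# which is what the upper half of un_dev (0x20) suggests.
devt = make_devt64(32, entry["n_minor"])
print(entry["n_name"], hex(devt))  # c1t2d0s0 0x2000000010
```

The rebuilt devt matches the un_dev stored in the stripe record only for as long as the minor number stays the same.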

The above information is sufficient to bring up stripe d1 as long as the minor number does not change. As mentioned above, the minor number can change whenever the underlying disk is moved. So some kind of persistent information needs to be stored to solve the disk relocation problem. The approach taken by SVM is to use the disk's unique device id (WWN). Whenever an SVM device is created, the device ids of all the underlying disk components are stored in its database as well. When a metadevice is snarf'd, the stored device id is used instead of the traditional devt/name to locate all the component devices, which is what guarantees that the snarf operation succeeds.
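To make the difference concrete, here is a minimal Python sketch of the two lookup strategies. It is not SVM code: the two functions are hypothetical, and the "after the move" name c2t2d0s0 and minor 0x48 are made-up values chosen only to show a disk that changed controller.

```python
# Illustrative sketch only; not SVM code.

STORED_NAME = "c1t2d0s0"
STORED_DEVID = "id1,sd@SSEAGATE_ST318404LSUN18G_3BT2LPC400002201AA6K"

# Hypothetical view of the system before and after recabling:
# device id -> (ctds name, minor number).
before_move = {STORED_DEVID: ("c1t2d0s0", 0x10)}
after_move = {STORED_DEVID: ("c2t2d0s0", 0x48)}  # made-up new name and minor


def resolve_by_name(stored_name, present):
    """Old SDS style: trust the stored name; fails once the disk has moved."""
    for name, minor in present.values():
        if name == stored_name:
            return name, minor
    return None


def resolve_by_devid(stored_devid, present):
    """SVM style: find whichever disk currently reports the stored device id."""
    return present.get(stored_devid)


print(resolve_by_name(STORED_NAME, before_move))   # found at the old location
print(resolve_by_name(STORED_NAME, after_move))    # None: the name is stale
print(resolve_by_devid(STORED_DEVID, after_move))  # found at the new location
```

The device id is the only key in this sketch that survives the move, which is why it is the one worth persisting.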

The stored unique device id record looks like this:

RecId 0x00000006: Type:DID_SHR_NM [0008] Type2: 1 Size = 1024
sizeof(struct devid_shr_rec)=40
sizeof(struct nm_rec_hdr)=24
sizeof(struct did_shr_name)=16
did_rec_hdr->r_alloc_size=1024
did_rec_hdr->r_used_size=96
did_rec_hdr->r_next_recid=0x00000000
did_rec_hdr->r_next_key=0
did_key=0x00000001(1)
did_count=1
did_data=0x00000001
did_size=56
did_devid=696400010002002c73640000534541474154452053543331383430344c53554e31384720334254324c50433430303030323230314141364b
ascii did_devid=id1,sd@SSEAGATE_ST318404LSUN18G_3BT2LPC400002201AA6K
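The hex did_devid and its ASCII form in the dump above carry the same information, which a short Python sketch can show. The field layout used here (2-byte magic, 2-byte revision, 2-byte type, 2-byte length, 4-byte driver hint, then the id bytes) is inferred from the bytes in this one dump, and the mapping of type 2 to the letter "S" and of spaces to underscores is likewise an assumption taken from this sample rather than from any specification. The function names are hypothetical.

```python
# Illustrative sketch only; the field layout is inferred from the dump above.

DID_DEVID_HEX = "696400010002002c73640000534541474154452053543331383430344c53554e31384720334254324c50433430303030323230314141364b"

# Assumption: type 2 is written as the letter "S" in the ASCII form.
TYPE_LETTERS = {2: "S"}


def decode_devid(hex_string):
    """Split the binary device id into the fields visible in the dump."""
    raw = bytes.fromhex(hex_string)
    length = int.from_bytes(raw[6:8], "big")
    return {
        "magic": raw[0:2].decode("ascii"),
        "revision": int.from_bytes(raw[2:4], "big"),
        "type": int.from_bytes(raw[4:6], "big"),
        "length": length,
        "driver": raw[8:12].rstrip(b"\x00").decode("ascii"),
        "id": raw[12:12 + length].decode("ascii"),
        "total_size": len(raw),
    }


def to_ascii(fields):
    """Rebuild the printable form, replacing spaces with underscores."""
    letter = TYPE_LETTERS[fields["type"]]
    ident = fields["id"].replace(" ", "_")
    return "{}{},{}@{}{}".format(
        fields["magic"], fields["revision"], fields["driver"], letter, ident
    )


fields = decode_devid(DID_DEVID_HEX)
print(fields["total_size"])  # 56, the same as did_size in the dump
print(to_ascii(fields))
```

Read this way, the device id is the drive's vendor, model and serial number plus a small header, which is why it stays the same no matter which controller the disk is attached to.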



Comments:

Congrats on joining the blogosphere! (and who says nobody ever reads these things? :)

Posted by Dave Linder on June 14, 2005 at 03:03 AM PDT #

So, I have a question/something to ponder: what do you do when the WWN on a device changes? I know what I do--I get to run "metareplace -e ..." every time the machine reboots and hope that the other submirror doesn't fail in the meantime. :-) This is on a non-Sun drive that we have in one of our PC servers, and it is also a refurbished drive. Here's what I see from metastat across reboots:

/var/tmp/metastat.1:c1t5d0 Yes id1,sd@f0d50cf9542b5d335000de5a00000
/var/tmp/metastat.2:c1t5d0 Yes id1,sd@f0d50cf9542b652ac0003f8d70000
/var/tmp/metastat.3:c1t5d0 Yes id1,sd@f0d50cf9542cd5ce3000ddb700000

Posted by DJ Gregor on July 16, 2005 at 04:49 AM PDT #

Don't you blog anymore?

Posted by Delta on December 01, 2005 at 02:52 PM PST #

So if I understand what you are saying, using SVM in Solaris 9 or later I can move disks off of and onto other controllers and SVM will update appropriately? Let me give you an example: on EMC, frames are sometimes replaced. In this scenario, using Veritas, I would deport the disk groups, then EMC would copy the data over to the new frame (the cxtxdxsx will change). Once the data migration is complete I would then import the disk groups using the -C flag, and all of my data is on the new disk, no fuss or mess. Is this a manual step in SVM, or does SVM handle it automatically? I have never tried it with SVM, so I am not sure how it works or if it would work. Thanks, Bob

Posted by guest on December 08, 2005 at 09:46 PM PST #

To answer the first question asked: what happens if the WWN changes? If the WWN changes but the devt is the same (i.e. the disks have not moved), then on boot up metadevadm runs and detects this. It provides directions on how to update the metadb. Are you saying that metadevadm did not detect this? To answer Bob's questions: if you move controllers or disks, SVM will automatically detect this and update the database so that the device names are consistent with the change that has occurred. However, I am not sure I understand what effect you are getting by deporting and importing the diskgroup.

Posted by sanjay on April 18, 2006 at 06:55 AM PDT #

To follow up on Bob's question, the ultimate goal is to swing disks from one host to another, and the disks are just plain SCSI, no WWP. What should we do? VXVM could deport on the old host, enable the disks on the new host and just import. It's a very useful feature. HPUX's LVM also has this feature. Thanks. Wei

Posted by Wei Gao on March 23, 2007 at 03:23 AM PDT #
