Power saving with SDM 1.0u5

To use the power saving feature of Sun Grid Engine you have to install the Service Domain Manager (SDM). Please watch the video by Marcin and Andre that explains the basic steps to get an SDM system running.

SDM can boot and shut down a host automatically if the host has a service controller that supports IPMI 2.0. The LAN interface of the service controller must be enabled and reachable from the master host of the SDM system.

SDM uses the external tool ipmitool to communicate with the service controller. Please install ipmitool via the package manager of your system. On OpenSolaris use:

# pkg install SUNWipmi

The power saving feature in SDM is implemented in the IPMI cloud adapter. To use it, a new IPMI cloud service must be installed. This can be done with the sdmadm command:

% sdmadm add_cloud_service -s powersave -j cs_vm -h master_host -ct ipmi -start

The -j switch specifies the name of the JVM in which the IPMI cloud service should run, and -h defines the host. -ct defines the type of the cloud service (SDM 1.0u5 supports ec2 and ipmi), and -s defines the name of the service. The -start switch tells the command to start the component once it has been added.

The add_cloud_service command opens the default configuration of the IPMI cloud service in an editor (vi). The default configuration should fit most environments. You only have to define the path to the file that contains the password for IPMI access:

<common:componentConfig ...>
  ...
  <gef:param value="@@@path to ipmi password file@@@" name="ipmiPasswordFile"/>
  ...
</common:componentConfig>

Open a new terminal and create the password file. Ensure that only authorized users have access to this file.

# touch /var/spool/sdm/.ipmipw
# chmod 600 /var/spool/sdm/.ipmipw
# echo "secret" > /var/spool/sdm/.ipmipw
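A slightly more defensive variant of these steps, as a sketch only: setting umask 077 inside a subshell means the file is created without group or world access from the start. The helper name write_ipmi_password is made up for this example and is not part of SDM.

```shell
#!/bin/sh
# write_ipmi_password FILE PASSWORD   (hypothetical helper name)
# Creates FILE accessible by its owner only, then stores PASSWORD in it.
write_ipmi_password() {
    (
        umask 077                      # new files get at most mode 600
        printf '%s\n' "$2" > "$1"      # printf avoids echo's escape handling
    ) && chmod 600 "$1"                # tighten the mode if FILE already existed
}
```

It could be called as write_ipmi_password /var/spool/sdm/.ipmipw "secret", using the path from this post.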

Replace the placeholder @@@path to ipmi password file@@@ in the configuration of the cloud adapter with the path to the IPMI password file.

You may also have to adjust the path to ipmitool (the default is /usr/bin/ipmitool). If it is installed in a different location, add the parameter ipmiTool to the configuration:

<common:componentConfig ...>
  ...
  <gef:param value="/var/spool/sdm/.ipmipw" name="ipmiPasswordFile"/>
  <gef:param value="/usr/sbin/ipmitool" name="ipmiTool"/>
  ...
</common:componentConfig>
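If you are not sure which value to put into ipmiTool, a small lookup like the following may help. It is only a sketch: the helper name find_ipmitool is hypothetical, and the two default directories are simply the ones that appear in this post.

```shell
#!/bin/sh
# find_ipmitool [DIR...]   (hypothetical helper name)
# Prints the first executable ipmitool found in the given directories,
# defaulting to /usr/bin and /usr/sbin. Returns 1 if none is found.
find_ipmitool() {
    [ "$#" -gt 0 ] || set -- /usr/bin /usr/sbin
    for dir in "$@"; do
        if [ -x "$dir/ipmitool" ]; then
            printf '%s\n' "$dir/ipmitool"
            return 0
        fi
    done
    return 1
}
```

Running find_ipmitool without arguments prints the path to use as the value of the ipmiTool parameter, or nothing if ipmitool is in neither directory.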

Save your changes and close the editor. The power saving cloud service will be added to your system and finally started.
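Before adding hosts it can be worth confirming that the master host really can talk to a service controller with the password file. The sketch below wraps a plain ipmitool call; the helper name ipmi_power_status is hypothetical, and the IPMI user name you pass in depends on how your service controllers are set up (this post does not say which user SDM uses).

```shell
#!/bin/sh
# Path to ipmitool; /usr/bin/ipmitool is the default named in this post.
IPMITOOL=${IPMITOOL:-/usr/bin/ipmitool}

# ipmi_power_status BMC_HOST USER PASSWORD_FILE   (hypothetical helper name)
# Asks the service controller for the chassis power state over the IPMI 2.0
# "lanplus" interface. -f makes ipmitool read the password from a file.
ipmi_power_status() {
    "$IPMITOOL" -I lanplus -H "$1" -U "$2" -f "$3" chassis power status
}
```

For example, ipmi_power_status ipmi-foo root /var/spool/sdm/.ipmipw, where root is only an assumed user name. For a host that is ready to be added, the reply should report that the chassis power is off.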

Next, add every host that should take part in power saving as a resource to the power saving cloud service. Please note that a host must be powered off before it is added. For each host define the following resource properties:

unbound_name  hostname of the host
ipmiHostname  hostname of the service controller

The following command adds the resource for host foo to the power saving cloud service. The hostname under which the IPMI service controller is reachable is ipmi-foo:

% printf "unbound_name=foo\nipmiHostname=ipmi-foo\n" | sdmadm ar -s powersave -f -
resource message
-----------------------------------------------------------
foo      Resource was added to the system.
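With more than a handful of hosts, the same command can be wrapped in a loop. This is a sketch under two assumptions: every service controller follows the ipmi-<host> naming used in the example, and the SERVICE variable holds the name that was given to the cloud service with -s when it was created. The helper names resource_props and add_hosts are made up for this example.

```shell
#!/bin/sh
# Name given to the cloud service with -s when it was created.
SERVICE=${SERVICE:-powersave}

# resource_props HOST   (hypothetical helper name)
# Prints the two resource properties for HOST. Assumes, as in the example,
# that the service controller is reachable as "ipmi-<HOST>".
resource_props() {
    printf 'unbound_name=%s\nipmiHostname=ipmi-%s\n' "$1" "$1"
}

# add_hosts HOST...   (hypothetical helper name)
# Adds each HOST to the cloud service, stopping at the first failure.
add_hosts() {
    for host in "$@"; do
        resource_props "$host" | sdmadm ar -s "$SERVICE" -f - || return 1
    done
}
```

A call such as add_hosts foo bar baz would then add three powered-off hosts, one sdmadm ar call per host.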

The resource has been added to the system. The sdmadm show_resource command displays it as an unbound resource (U in the flags column):

% sdmadm sr -s powersave
service   id     name state    type flags usage annotation
--------------------------------------------------------------------------
powersave res#91 foo  ASSIGNED host U     2     New virtual resource added
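To get just the names of the powered-off hosts out of this listing, the output can be filtered with awk. The sketch assumes the column layout shown here (service, id, name, state, type, flags, usage, annotation) and a hypothetical helper name unbound_hosts. It is a text filter, not an SDM feature, so it will need adjusting if the layout differs.

```shell
#!/bin/sh
# unbound_hosts   (hypothetical helper name)
# Reads "sdmadm sr" output on stdin and prints the name of every host
# resource whose flags column contains U. Header and separator lines are
# skipped because their fifth field is not "host".
unbound_hosts() {
    awk '$5 == "host" && $6 ~ /U/ { print $3 }'
}
```

Typical use would be sdmadm sr -s powersave | unbound_hosts.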

For the IPMI cloud service the U flag indicates that the host is powered off. To power the host on, move the resource into another service, e.g. the spare_pool:

% sdmadm mvr -r foo -s spare_pool -static
resource message
-----------------------------------------------------------
res#126  Resource move triggered


Depending on your SLO setup, the resource may be moved back to the powersave service automatically as soon as it arrives at spare_pool. To avoid this I added the -static option to the mvr command, which makes the resource static once it has arrived in spare_pool. SLOs cannot request static resources.

After a while you will see that the resource has ended up in spare_pool. The U flag has disappeared and the host has been started up:

% sdmadm sr -r foo
service    id      name   state    type flags usage annotation
--------------------------------------------------------------
spare_pool res#126 foo    ASSIGNED host S     1
% ping -s foo
PING foo: 56 data bytes
64 bytes from foo (192.168.4.12): icmp_seq=0. time=0.244 ms
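Since booting takes a while, a script that continues after the move needs to wait for the host. The following retry loop is a generic sketch with a hypothetical helper name (wait_until); it knows nothing about SDM and simply repeats whatever check command you hand it.

```shell
#!/bin/sh
# wait_until TRIES DELAY COMMAND [ARG...]   (hypothetical helper name)
# Runs COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Pass it whatever single-shot ping invocation your platform offers, for example wait_until 30 10 ping -c 1 foo on a system whose ping accepts -c (an assumption about your platform).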
 

To power off the host, move it back into the powersave service. Depending on your SLO setup this happens automatically once you clear the static flag. In my system (which is a default system) spare_pool has a PermanentRequestSLO with urgency 1 and the powersave service has a PermanentRequestSLO with urgency 2:

% sdmadm sslo
service    slo                 quantity urgency request
--------------------------------------------------------------------------------------------------
powersave  PermanentRequestSLO 10       2       type = "host" & owner = "powersave"
spare_pool PermanentRequestSLO 10       1       type = "host"

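Because the request of the powersave service has the higher urgency and resource foo matches it (its owner is powersave), foo will be moved back to the powersave service automatically once the static flag has been cleared.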

% sdmadm mr -r foo
---- editor output -----
static = true    <-- change this line to static=false
unbound_name = foo
powerCycleCount = 1
owner = powersave
ipmiHostname = ipmi-foo
:x!
--- end of editor output ---  

SDM moves the resource to the powersave service. The service shuts down the host.

% sdmadm sr -r foo
service   id      name   state    type flags usage annotation
-------------------------------------------------------------------------
powersave res#126 foo ASSIGNED host U     2     Resource was shut down

That is it. For more information please have a look at http://wikis.sun.com/display/gridengine62u5/Service+Domain+Manager.   

About

rhierlmeier
