How SDM Manages to Share Resources Between Two Stand-alone Clusters

I have set up two SGE services, called ge_blue and ge_red, one for each of two stand-alone SGE cluster installations.
Here are the XML descriptions for these two SGE services:

ge_blue service:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<common:componentConfig xsi:type="ge_adapter:GEServiceConfig"
                        mapping="default"
                        xmlns:executor="http://hedeby.sunsource.net/hedeby-executor"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter"
                        xmlns:security="http://hedeby.sunsource.net/hedeby-security"
                        xmlns:resource_provider="http://hedeby.sunsource.net/hedeby-resource-provider"
                        xmlns:common="http://hedeby.sunsource.net/hedeby-common"
                        xmlns:ge_adapter="http://hedeby.sunsource.net/hedeby-gridengine-adapter">
    <common:slos>
        <common:slo xsi:type="common:FixedUsageSLOConfig"
                    urgency="50"
                    name="fixed_usage"/>
        <!-- When there are more than 10 pending jobs, SDM generates an
             additional host request -->
        <common:slo xsi:type="ge_adapter:MaxPendingJobsSLOConfig"
                    averageSlotsPerHost="10"
                    max="10"
                    urgency="70"
                    name="maxPendingJobsBlue"/>
        <!-- When there are more than 5 pending jobs, SDM generates an
             additional host request, restricted to resources owned by
             groupBlue (both request and resourceFilter are needed) -->
        <common:slo xsi:type="ge_adapter:MaxPendingJobsSLOConfig"
                    averageSlotsPerHost="10"
                    max="5"
                    urgency="99"
                    name="maxPendingJobsGroupBlue">
            <common:request>owner = "groupBlue"</common:request>
            <common:resourceFilter>owner = "groupBlue"</common:resourceFilter>
        </common:slo>
    </common:slos>

    <ge_adapter:connection keystore="/var/sgeCA/port6236/default_blue/userkeys/sdmadmin/keystore"
                           password=""
                           username="sdmadmin"
                           jmxPort="6238"
                           execdPort="6237"
                           masterPort="6236"
                           cell="default_blue"
                           root="/opt/sge/6.2"
                           clusterName="blue"/>
    <ge_adapter:sloUpdateInterval unit="minutes"
                                  value="3"/>
    <ge_adapter:jobSuspendPolicy suspendMethods="reschedule_jobs_in_rerun_queue reschedule_restartable_jobs suspend_jobs_with_checkpoint">
        <ge_adapter:timeout unit="minutes"
                            value="2"/>
    </ge_adapter:jobSuspendPolicy>

    <ge_adapter:execd adminUsername="root"
                      defaultDomain=""
                      ignoreFQDN="true"
                      rcScript="false"
                      adminHost="true"
                      submitHost="false"
                      cleanupDefault="true">
        <ge_adapter:localSpoolDir>/var/spool/sge/execd</ge_adapter:localSpoolDir>
        <ge_adapter:installTemplate executeOn="exec_host">
            <ge_adapter:script>/opt/sdm/latest/util/templates/ge-adapter/install_execd.sh</ge_adapter:script>
            <ge_adapter:conf>/opt/sdm/latest/util/templates/ge-adapter/install_execd.conf</ge_adapter:conf>
        </ge_adapter:installTemplate>
        <ge_adapter:uninstallTemplate executeOn="exec_host">
            <ge_adapter:script>/opt/sdm/latest/util/templates/ge-adapter/uninstall_execd.sh</ge_adapter:script>
            <ge_adapter:conf>/opt/sdm/latest/util/templates/ge-adapter/uninstall_execd.conf</ge_adapter:conf>
        </ge_adapter:uninstallTemplate>
    </ge_adapter:execd>
</common:componentConfig>

ge_red service:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<common:componentConfig xsi:type="ge_adapter:GEServiceConfig"
                        mapping="default"
                        xmlns:executor="http://hedeby.sunsource.net/hedeby-executor"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter"
                        xmlns:security="http://hedeby.sunsource.net/hedeby-security"
                        xmlns:resource_provider="http://hedeby.sunsource.net/hedeby-resource-provider"
                        xmlns:common="http://hedeby.sunsource.net/hedeby-common"
                        xmlns:ge_adapter="http://hedeby.sunsource.net/hedeby-gridengine-adapter">
    <common:slos>
        <common:slo xsi:type="common:FixedUsageSLOConfig"
                    urgency="50"
                    name="fixed_usage"/>
        <common:slo xsi:type="ge_adapter:MaxPendingJobsSLOConfig"
                    averageSlotsPerHost="10"
                    max="10"
                    urgency="70"
                    name="maxPendingJobsRed"/>
        <common:slo xsi:type="ge_adapter:MaxPendingJobsSLOConfig"
                    averageSlotsPerHost="10"
                    max="5"
                    urgency="99"
                    name="maxPendingJobsGroupRed">
            <common:request>owner = "groupRed"</common:request>
            <common:resourceFilter>owner = "groupRed"</common:resourceFilter>
        </common:slo>
    </common:slos>

    <ge_adapter:connection keystore="/var/sgeCA/port62360/default_red/userkeys/sdmadmin/keystore"
                           password=""
                           username="sdmadmin"
                           jmxPort="62380"
                           execdPort="62370"
                           masterPort="62360"
                           cell="default_red"
                           root="/opt/sge/6.2"
                           clusterName="red"/>
    <ge_adapter:sloUpdateInterval unit="minutes"
                                  value="3"/>
    <ge_adapter:jobSuspendPolicy suspendMethods="reschedule_jobs_in_rerun_queue reschedule_restartable_jobs suspend_jobs_with_checkpoint">
        <ge_adapter:timeout unit="minutes"
                            value="2"/>
    </ge_adapter:jobSuspendPolicy>
    <ge_adapter:execd adminUsername="root"
                      defaultDomain=""
                      ignoreFQDN="true"
                      rcScript="false"
                      adminHost="true"
                      submitHost="false"
                      cleanupDefault="true">
        <ge_adapter:localSpoolDir>/var/spool/sge/execd</ge_adapter:localSpoolDir>
        <ge_adapter:installTemplate executeOn="exec_host">
            <ge_adapter:script>/opt/sdm/latest/util/templates/ge-adapter/install_execd.sh</ge_adapter:script>
            <ge_adapter:conf>/opt/sdm/latest/util/templates/ge-adapter/install_execd.conf</ge_adapter:conf>
        </ge_adapter:installTemplate>
        <ge_adapter:uninstallTemplate executeOn="exec_host">
            <ge_adapter:script>/opt/sdm/latest/util/templates/ge-adapter/uninstall_execd.sh</ge_adapter:script>
            <ge_adapter:conf>/opt/sdm/latest/util/templates/ge-adapter/uninstall_execd.conf</ge_adapter:conf>
        </ge_adapter:uninstallTemplate>
    </ge_adapter:execd>
</common:componentConfig>


In the GE service configurations above, two different MaxPendingJobs SLOs are defined per GE service: maxPendingJobsBlue and maxPendingJobsGroupBlue for the ge_blue service, and maxPendingJobsRed and maxPendingJobsGroupRed for the ge_red service. The difference between the two SLOs is that maxPendingJobsBlue and maxPendingJobsRed generate a need for additional host resources for the corresponding service regardless of who owns those resources, whereas the other two SLOs are limited to hosts owned by the corresponding group.
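
A MaxPendingJobs SLO can be read, roughly, as a function that turns the current pending-job count into a host request with a quantity and an urgency. The following Python sketch is only a mental model of the configuration above, not SDM code, and the exact formula (dividing the excess pending jobs by averageSlotsPerHost) is my assumption:

from dataclasses import dataclass
from math import ceil
from typing import Optional

@dataclass
class MaxPendingJobsSLO:
    """Toy model of a MaxPendingJobsSLOConfig element (not SDM source code)."""
    name: str
    max_pending: int              # the "max" attribute
    urgency: int                  # the "urgency" attribute
    avg_slots_per_host: int       # the "averageSlotsPerHost" attribute
    owner: Optional[str] = None   # owner used in <request>/<resourceFilter>, if any

    def hosts_needed(self, pending_jobs: int) -> int:
        """Estimate how many additional hosts this SLO would request.

        Assumption: pending jobs beyond `max` are divided by
        averageSlotsPerHost to estimate the number of hosts.
        """
        excess = pending_jobs - self.max_pending
        return 0 if excess <= 0 else ceil(excess / self.avg_slots_per_host)

# The two SLOs of the ge_blue service from the configuration above:
blue_any   = MaxPendingJobsSLO("maxPendingJobsBlue",      10, 70, 10)
blue_owned = MaxPendingJobsSLO("maxPendingJobsGroupBlue",  5, 99, 10, owner="groupBlue")

print(blue_any.hosts_needed(14))    # 1 host from anywhere, at urgency 70
print(blue_owned.hosts_needed(14))  # 1 host owned by groupBlue, at urgency 99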

As soon as the number of pending jobs exceeds 5, the maxPendingJobsGroupBlue and maxPendingJobsGroupRed SLOs request additional host resources, limited to those owned by the corresponding group. This request has the highest urgency value (99), so any host that is owned by the requesting group but currently used by the other service will be pulled from that SGE service and moved to the SGE service of its owner group. The move happens very quickly if the jobs running on the host are rerunnable, restartable, or checkpoint-enabled, as defined in the jobSuspendPolicy configuration. Otherwise, the queue instance on the host is disabled and SDM waits until all running jobs have completed before moving the host resource.
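
The displacement itself can be pictured as a simple urgency comparison. The sketch below reflects my reading of the rule, namely that a request may take a resource whose current usage value is strictly lower and that an owner filter further restricts the candidates; it is not the resource provider's actual algorithm:

from typing import Optional

def can_take(resource_usage: int, resource_owner: Optional[str],
             request_urgency: int, owner_filter: Optional[str]) -> bool:
    """Sketch of the rule: a higher-urgency request may displace a resource
    whose current usage value is lower; an owner filter limits the candidates."""
    if owner_filter is not None and resource_owner != owner_filter:
        return False
    return request_urgency > resource_usage

# A host owned by groupRed, currently busy in the ge_blue cluster (usage 70):
print(can_take(70, "groupRed", 99, "groupRed"))  # True:  maxPendingJobsGroupRed reclaims it
print(can_take(70, "groupRed", 70, None))        # False: maxPendingJobsRed cannot
print(can_take(60, None,       70, None))        # True:  a spare_pool host (usage 60) can be taken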

If the number of pending jobs exceeds 10, SDM will also try to provide any available resources from the spare_pool service. However, because this request has a lower urgency value (70), it cannot take resources that are already in use by the other GE service.

If there are not many jobs running in the SGE clusters, the SDM spare_pool service can reclaim the idle host resources by taking them out of service from the GE services and holding them in the spare pool. For that to happen, as shown in the spare_pool service description below, its urgency value (60) must be greater than the urgency value (50) of the FixedUsageSLO defined in the GE services.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<common:componentConfig xsi:type="spare_pool:SparePoolServiceConfig"
                        xmlns:executor="http://hedeby.sunsource.net/hedeby-executor"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xmlns:spare_pool="http://hedeby.sunsource.net/hedeby-sparepool"
                        xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter"
                        xmlns:security="http://hedeby.sunsource.net/hedeby-security"
                        xmlns:resource_provider="http://hedeby.sunsource.net/hedeby-resource-provider"
                        xmlns:common="http://hedeby.sunsource.net/hedeby-common"
                        xmlns:ge_adapter="http://hedeby.sunsource.net/hedeby-gridengine-adapter">
    <common:slos>
        <common:slo xsi:type="common:PermanentRequestSLOConfig"
                    quantity="10"
                    urgency="60"
                    name="PermanentRequestSLO">
            <common:request>type = "host"</common:request>
        </common:slo>
    </common:slos>

</common:componentConfig>
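
For reference, these are all of the urgency values at play in this demonstration. Combined with the comparison rule sketched above, they determine which service may take a host from which:

# Urgency values taken from the three service configurations above:
urgencies = {
    "fixed_usage (ge_blue / ge_red)":      50,  # baseline usage of an assigned, idle host
    "PermanentRequestSLO (spare_pool)":    60,  # pulls idle hosts back into the spare pool
    "maxPendingJobsBlue / Red":            70,  # >10 pending jobs: take spare/idle hosts
    "maxPendingJobsGroupBlue / GroupRed":  99,  # >5 pending jobs: reclaim own group's hosts
}

# 60 > 50 : spare_pool drains idle hosts out of a GE service
# 70 > 60 : a busy GE service takes hosts from the spare_pool
# 70 = 70 : it cannot take hosts that are busy in the other GE service
# 99 > 70 : the owner-group SLO can, but only hosts owned by that group
for name, value in sorted(urgencies.items(), key=lambda kv: kv[1]):
    print(f"{value:3d}  {name}")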

In this demonstration, as shown below, there are five nodes: one node (node1) is the qmaster host of the ge_blue service, one node (node4) is the qmaster host of the ge_red service, and the three remaining nodes are assigned to the spare_pool service. (The S flag in the sdmadm output marks the two qmaster hosts as static resources, which never leave their service.)


node0-391# sdmadm sr
service    id    state    type flags usage annotation            
-----------------------------------------------------------------
ge_blue    node1 ASSIGNED host S     50    Got execd update event
ge_red     node4 ASSIGNED host S     50    Got execd update event
spare_pool node2 ASSIGNED host       60                          
           node3 ASSIGNED host       60                          
           node5 ASSIGNED host       60    

As SGE jobs are submitted to each SGE cluster, SDM monitors the total number of pending jobs, as described in the GE service configurations, and tries to provide more nodes when needed.
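
SDM's GE adapter obtains the pending-job count from the qmaster over its JMX connection. To watch the same number from the command line during the test, a small helper like the following is sufficient (it assumes qstat is in the PATH and simply counts the lines of the pending-job listing):

import subprocess

def pending_job_count() -> int:
    """Count pending jobs by listing only jobs in the pending state.

    `qstat -s p` prints a two-line header followed by one line per pending
    job (and nothing at all when no jobs are pending).
    """
    result = subprocess.run(["qstat", "-s", "p"],
                            capture_output=True, text=True, check=True)
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return max(0, len(lines) - 2)   # drop the header and the separator line

if __name__ == "__main__":
    print("pending jobs:", pending_job_count())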

If both clusters become very busy and the number of pending jobs exceeds the limit defined by maxPendingJobsGroupBlue or maxPendingJobsGroupRed, SDM starts to add resources owned by the respective owner group. If a node owned by group Red is currently being used by the ge_blue service (cluster), SDM disables the queue instances on that node and, depending on the jobs already running there, either reschedules them, checkpoints them, or waits for them to finish, as described above. Once the queue instances are clear, SDM uninstalls the execution host and moves the resource to the ge_red service, since group Red owns the resource.
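
How quickly a host can be cleared is governed by the jobSuspendPolicy shown in the service configurations. The sketch below merely restates that policy as code; the ordering and the fall-through to waiting are my assumptions, not the adapter's actual logic:

def clear_method(in_rerun_queue: bool, restartable: bool,
                 checkpointable: bool) -> str:
    """Toy model of the configured jobSuspendPolicy with
    suspendMethods="reschedule_jobs_in_rerun_queue reschedule_restartable_jobs
                    suspend_jobs_with_checkpoint"."""
    if in_rerun_queue:
        return "reschedule now (queue has rerun TRUE)"
    if restartable:
        return "reschedule now (job submitted as restartable, e.g. qsub -r y)"
    if checkpointable:
        return "suspend via its checkpointing environment"
    return "wait for the job to finish; the queue instance stays disabled meanwhile"

print(clear_method(True, False, False))   # the demo case: all.q is a rerun queue
print(clear_method(False, False, False))  # SDM has to wait for the running jobs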

This ensures that, when both GE services become very busy, each group gets higher priority on its own resources. Otherwise, all the shared resources are freely available to the other group.

It should be noted that when filtering on a specific resource property, both the request and the resourceFilter should be defined in the SLO, as shown below:

            <common:request>owner = "groupBlue"</common:request>
            <common:resourceFilter>owner = "groupBlue"</common:resourceFilter>

The following test shows how SDM manages the two GE services under various job loads. After submitting a large number of jobs to the ge_blue cluster, the following status is reached:

 node1-1374# qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node1                    BIP   0/64/64        0.21     sol-sparc64   
    596 0.55500 sleep      root         r     10/10/2008 17:15:27    16        
    597 0.55500 sleep      root         r     10/10/2008 17:15:45    16        
    598 0.55500 sleep      root         r     10/10/2008 17:15:48    16        
    599 0.55500 sleep      root         r     10/10/2008 17:15:48    16        

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    600 0.00000 sleep      root         qw    10/10/2008 17:15:31    16        
    601 0.00000 sleep      root         qw    10/10/2008 17:15:31    16        
    602 0.00000 sleep      root         qw    10/10/2008 17:15:32    16        
    603 0.00000 sleep      root         qw    10/10/2008 17:15:33    16        
    604 0.00000 sleep      root         qw    10/10/2008 17:15:34    16        
    605 0.00000 sleep      root         qw    10/10/2008 17:15:35    16        
    606 0.00000 sleep      root         qw    10/10/2008 17:15:36    16        
    607 0.00000 sleep      root         qw    10/10/2008 17:15:37    16        
    608 0.00000 sleep      root         qw    10/10/2008 17:15:38    16        
    609 0.00000 sleep      root         qw    10/10/2008 17:15:42    16        
    610 0.00000 sleep      root         qw    10/10/2008 17:16:30    16        
    611 0.00000 sleep      root         qw    10/10/2008 17:16:32    16        
    612 0.00000 sleep      root         qw    10/10/2008 17:16:33    16        
    613 0.00000 sleep      root         qw    10/10/2008 17:16:34    16     

Since the number of pending jobs already exceeds 5, SDM will try to provide more resources to the ge_blue cluster. After a while, the following status was observed:

node1-1380# qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node1                    BIP   0/64/64        0.35     sol-sparc64  
    596 0.55500 sleep      root         r     10/10/2008 17:15:27    16       
    597 0.55500 sleep      root         r     10/10/2008 17:15:45    16       
    598 0.55500 sleep      root         r     10/10/2008 17:15:48    16       
    599 0.55500 sleep      root         r     10/10/2008 17:15:48    16       
---------------------------------------------------------------------------------
all.q@node2                    BIP   0/64/64        0.36     sol-sparc64  
    608 0.55500 sleep      root         r     10/10/2008 17:19:56    16       
    609 0.55500 sleep      root         r     10/10/2008 17:19:56    16       
    610 0.55500 sleep      root         r     10/10/2008 17:19:56    16       
    611 0.55500 sleep      root         r     10/10/2008 17:19:56    16       
---------------------------------------------------------------------------------
all.q@node3                    BIP   0/64/64        0.35     sol-sparc64  
    604 0.55500 sleep      root         r     10/10/2008 17:18:36    16       
    605 0.55500 sleep      root         r     10/10/2008 17:18:36    16       
    606 0.55500 sleep      root         r     10/10/2008 17:18:36    16       
    607 0.55500 sleep      root         r     10/10/2008 17:18:36    16       
---------------------------------------------------------------------------------
all.q@node5                    BIP   0/64/64        0.35     sol-sparc64  
    600 0.55500 sleep      root         r     10/10/2008 17:18:36    16       
    601 0.55500 sleep      root         r     10/10/2008 17:18:36    16       
    602 0.55500 sleep      root         r     10/10/2008 17:18:36    16       
    603 0.55500 sleep      root         r     10/10/2008 17:18:36    16       

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    612 0.00000 sleep      root         qw    10/10/2008 17:16:33    16       
    613 0.00000 sleep      root         qw    10/10/2008 17:16:34    16       

If more jobs are submitted to the ge_red cluster, the ge_red service will need additional host resources.

 node4-672# qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node4                    BIP   0/64/64        0.34     sol-sparc64   
    196 0.55500 sleep      root         r     10/10/2008 17:20:51    16        
    197 0.55500 sleep      root         r     10/10/2008 17:21:01    16        
    198 0.55500 sleep      root         r     10/10/2008 17:21:01    16        
    199 0.55500 sleep      root         r     10/10/2008 17:21:04    16        

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    200 0.00000 sleep      root         qw    10/10/2008 17:21:02    16        
    201 0.00000 sleep      root         qw    10/10/2008 17:21:04    16        
    202 0.00000 sleep      root         qw    10/10/2008 17:21:05    16        
    203 0.00000 sleep      root         qw    10/10/2008 17:21:06    16        
    204 0.00000 sleep      root         qw    10/10/2008 17:21:07    16        
    205 0.00000 sleep      root         qw    10/10/2008 17:21:10    16 

Since the number of pending jobs exceeds the predefined threshold (5), SDM will try to provide additional resources. At this moment, however, all the resources from the spare_pool are already serving the ge_blue cluster. So SDM disables the ge_blue queue instance on node5, reschedules the jobs running on that queue instance (since the queue is configured with rerun=TRUE), and uninstalls the execution host. Then node5 is moved to the ge_red cluster to serve its jobs.
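
The immediate requeue of the jobs that were running on node5 only works because all.q has rerun set to TRUE. That setting can be checked on the qmaster, for example with a small wrapper around qconf (written in Python only to keep all the examples in one language):

import subprocess

# Show the rerun setting of all.q; it must be TRUE for the
# reschedule_jobs_in_rerun_queue suspend method to requeue running jobs.
queue_conf = subprocess.run(["qconf", "-sq", "all.q"],
                            capture_output=True, text=True, check=True).stdout
for line in queue_conf.splitlines():
    if line.startswith("rerun"):
        print(line)   # prints e.g. "rerun                 TRUE"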

Please note the Rq (requeued and pending) state of job IDs 600 through 603, which were requeued on the ge_blue cluster by SDM. Also note that node5 is now part of the ge_red cluster.

Before the action:

 node0-394# sdmadm sr
service id    state     type flags usage annotation            
---------------------------------------------------------------
ge_blue node1 ASSIGNED  host S     70    Got execd update event
        node2 ASSIGNING host       inf   Installing execd      
        node3 ASSIGNED  host       inf   Got execd update event
        node5 ASSIGNED  host       inf   Got execd update event
ge_red  node4 ASSIGNED  host S     50    Got execd update event




After the action: 

 node1-1382# qstat -f   [ge_blue cluster]
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node1                    BIP   0/64/64        0.35     sol-sparc64   
    596 0.55500 sleep      root         r     10/10/2008 17:15:27    16        
    597 0.55500 sleep      root         r     10/10/2008 17:15:45    16        
    598 0.55500 sleep      root         r     10/10/2008 17:15:48    16        
    599 0.55500 sleep      root         r     10/10/2008 17:15:48    16        
---------------------------------------------------------------------------------
all.q@node2                    BIP   0/64/64        0.35     sol-sparc64   
    608 0.55500 sleep      root         r     10/10/2008 17:19:56    16        
    609 0.55500 sleep      root         r     10/10/2008 17:19:56    16        
    610 0.55500 sleep      root         r     10/10/2008 17:19:56    16        
    611 0.55500 sleep      root         r     10/10/2008 17:19:56    16        
---------------------------------------------------------------------------------
all.q@node3                    BIP   0/64/64        0.34     sol-sparc64   
    604 0.55500 sleep      root         r     10/10/2008 17:18:36    16        
    605 0.55500 sleep      root         r     10/10/2008 17:18:36    16        
    606 0.55500 sleep      root         r     10/10/2008 17:18:36    16        
    607 0.55500 sleep      root         r     10/10/2008 17:18:36    16        

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    600 0.55500 sleep      root         Rq    10/10/2008 17:15:31    16        
    601 0.55500 sleep      root         Rq    10/10/2008 17:15:31    16        
    602 0.55500 sleep      root         Rq    10/10/2008 17:15:32    16        
    603 0.55500 sleep      root         Rq    10/10/2008 17:15:33    16        
    612 0.00000 sleep      root         qw    10/10/2008 17:16:33    16        
    613 0.00000 sleep      root         qw    10/10/2008 17:16:34    16    

node4-673# qstat -f [ge_red cluster]
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node4                    BIP   0/64/64        0.35     sol-sparc64  
    196 0.55500 sleep      root         r     10/10/2008 17:20:51    16       
    197 0.55500 sleep      root         r     10/10/2008 17:21:01    16       
    198 0.55500 sleep      root         r     10/10/2008 17:21:01    16       
    199 0.55500 sleep      root         r     10/10/2008 17:21:04    16       
---------------------------------------------------------------------------------
all.q@node5                    BIP   0/64/64        0.34     sol-sparc64  
    200 0.55500 sleep      root         r     10/10/2008 17:23:13    16       
    201 0.55500 sleep      root         r     10/10/2008 17:23:13    16       
    202 0.55500 sleep      root         r     10/10/2008 17:23:13    16       
    203 0.55500 sleep      root         r     10/10/2008 17:23:13    16       

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    204 0.00000 sleep      root         qw    10/10/2008 17:21:07    16       
    205 0.00000 sleep      root         qw    10/10/2008 17:21:10    16      

node0-395#  sdmadm sr [host resource assignment details]
service id    state    type flags usage annotation           
--------------------------------------------------------------
ge_blue node1 ASSIGNED host S     70    Got execd update event
        node2 ASSIGNED host       99    Got execd update event
        node3 ASSIGNED host       99    Got execd update event
ge_red  node4 ASSIGNED host S     70    Got execd update event
        node5 ASSIGNED host       99    Got execd update event


Once all jobs are completed, the resources, except for those marked as static, are moved back into the spare_pool service so that they can be provided to any GE service that needs them in the future.

node0-391# sdmadm sr
service    id    state    type flags usage annotation            
-----------------------------------------------------------------
ge_blue    node1 ASSIGNED host S     50    Got execd update event
ge_red     node4 ASSIGNED host S     50    Got execd update event
spare_pool node2 ASSIGNED host       60                          
           node3 ASSIGNED host       60
           node5 ASSIGNED host       60   

