Friday Nov 16, 2007

Two Storage Commands I Don't Know How I Lived Without

I am working on an issue involving a 3510 and two dual-connected hosts (a home-grown active/active configuration). The customer's equipment had just been moved within the cage, and when the systems were rebooted one of them reported multipath failures and both reported SCSI errors.


While I was investigating the problems I used cfgadm, luxadm and sccli. The first two are common to Solaris; the last is an additional package for management of arrays, including the 3510. These commands are not new to me, but while I was searching for alternative solutions to my problems I found fcinfo and mpathadm, two fairly new (and definitely new to me) commands.


Using luxadm to display the state of the ports:
(One of the ports was not connected, but I wasn't thinking about blogging it, so I missed my chance to capture the output.)

luxadm -e port
/devices/pci@1d,0/pci1022,7450@1/pci1077,100@1/fp@0,0:devctl CONNECTED
/devices/pci@1d,0/pci1022,7450@2/pci1077,100@1/fp@0,0:devctl CONNECTED

Using luxadm to show link errors:
luxadm -e rdls /dev/es/ses0 

Link Error Status information for loop:/dev/es/ses0
al_pa   lnk fail   sync loss   signal loss   sequence err   invalid word   CRC
9e      0          1           1             0              2794           0
9f      0          0           0             0              243            0
1       0          0           0             0              0              0

Link Error Status information for loop:/dev/es/ses0
al_pa   lnk fail   sync loss   signal loss   sequence err   invalid word   CRC
a3      0          2           2             0              65535          0
a5      0          0           0             0              28481          0
1       0          0           0             0              0              0

Using cfgadm to see the configuration of the devices:
(In the original investigation, c2 was displaying type fc and Occupant unconfigured.)
cfgadm -al
Ap_Id                  Type         Receptacle   Occupant     Condition
c0                     scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0         disk         connected    configured   unknown
c0::dsk/c0t2d0         disk         connected    configured   unknown
c0::dsk/c0t3d0         disk         connected    configured   unknown
c0::es/ses1            processor    connected    configured   unknown
c1                     fc-private   connected    configured   unknown
c1::256000c0ffc86cfb   disk         connected    configured   unknown
c1::256000c0ffd86cfb   ESI          connected    configured   unknown
c2                     fc-private   connected    configured   unknown
c2::226000c0ffa86cfb   ESI          connected    configured   unknown
c2::226000c0ffb86cfb   ESI          connected    configured   unknown
I tried 'cfgadm -c configure c2', then 'cfgadm -f -c configure c2', and finally 'cfgadm -o force_update -c configure c2', none of which succeeded in letting me recover the path. I just now found a bug report for a path that shows NOT CONNECTED; it appears that I might have been able to recover using 'luxadm -e forcelip'. Since I needed to clear the 3510 error counters anyway, it was decided to take the systems down and power cycle the 3510.
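
If the bug report is right, recovery would presumably have looked something like the following. This is a hedged sketch, not something I ran; the device path is the c2 devctl path from the 'luxadm -e port' output above. forcelip sends a loop initialization primitive (LIP) down the port, forcing the loop to re-initialize and rediscover its devices.

luxadm -e forcelip /devices/pci@1d,0/pci1022,7450@2/pci1077,100@1/fp@0,0:devctl
cfgadm -c configure c2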


Now on to the hook for this post: fcinfo and mpathadm. While looking at various documentation I found both of them.
fcinfo was added in S10u1. Using fcinfo I saw some of the same information that I got from 'luxadm -e rdls', and more.
Using fcinfo to see local hba-port info:
fcinfo hba-port -l
HBA Port WWN: 210000e08b1cdb34
        OS Device Name: /dev/cfg/c1
        Manufacturer: QLogic Corp.
        Model: QLA2340
        Firmware Version: 3.3.117
        FCode/BIOS Version: N/A
        Type: L-port
        State: online
        Supported Speeds: 1Gb 2Gb
        Current Speed: 2Gb
        Node WWN: 200000e08b1cdb34
        Link Error Statistics:
                Link Failure Count: 0
                Loss of Sync Count: 0
                Loss of Signal Count: 0
                Primitive Seq Protocol Error Count: 0
                Invalid Tx Word Count: 0
                Invalid CRC Count: 0
HBA Port WWN: 210000e08b1124bf
        OS Device Name: /dev/cfg/c2
        Manufacturer: QLogic Corp.
        Model: QLA2340
        Firmware Version: 3.3.117
        FCode/BIOS Version: N/A
        Type: L-port
        State: online
        Supported Speeds: 1Gb 2Gb
        Current Speed: 2Gb
        Node WWN: 200000e08b1124bf
        Link Error Statistics:
                Link Failure Count: 0
                Loss of Sync Count: 0
                Loss of Signal Count: 0
                Primitive Seq Protocol Error Count: 0
                Invalid Tx Word Count: 0
                Invalid CRC Count: 0

Nothing extremely interesting on the hba-port side; however, fcinfo also shows remote-port information.
Using fcinfo to see remote-port info (the -p option selects which local HBA port's view you get, using the HBA port WWNs seen above):
fcinfo remote-port -l -p 210000e08b1124bf
Remote Port WWN: 226000c0ffb86cfb
        Active FC4 Types:
        SCSI Target: yes
        Node WWN: 206000c0ff086cfb
        Link Error Statistics:
                Link Failure Count: 0
                Loss of Sync Count: 2
                Loss of Signal Count: 2
                Primitive Seq Protocol Error Count: 0
                Invalid Tx Word Count: 65535
                Invalid CRC Count: 0
Remote Port WWN: 226000c0ffa86cfb
        Active FC4 Types:
        SCSI Target: yes
        Node WWN: 206000c0ff086cfb
        Link Error Statistics:
                Link Failure Count: 0
                Loss of Sync Count: 0
                Loss of Signal Count: 0
                Primitive Seq Protocol Error Count: 0
                Invalid Tx Word Count: 28481
                Invalid CRC Count: 0

fcinfo remote-port -l -p 210000e08b1cdb34
Remote Port WWN: 256000c0ffd86cfb
        Active FC4 Types:
        SCSI Target: yes
        Node WWN: 206000c0ff086cfb
        Link Error Statistics:
                Link Failure Count: 0
                Loss of Sync Count: 1
                Loss of Signal Count: 1
                Primitive Seq Protocol Error Count: 0
                Invalid Tx Word Count: 2794
                Invalid CRC Count: 0
Remote Port WWN: 256000c0ffc86cfb
        Active FC4 Types:
        SCSI Target: yes
        Node WWN: 206000c0ff086cfb
        Link Error Statistics:
                Link Failure Count: 0
                Loss of Sync Count: 0
                Loss of Signal Count: 0
                Primitive Seq Protocol Error Count: 0
                Invalid Tx Word Count: 243
                Invalid CRC Count: 0


fcinfo and luxadm are clearly showing me that there are problems being reported for the remote ports in the 'Invalid Tx Word Count' (the 65535 on c2 is almost certainly a 16-bit counter pegged at its maximum).
The primary recommendation is to reseat the cables and SFPs and blow out the ports. We are moving along with that process now, having replaced one of the cables, reseated everything and blown out the ports.
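
To confirm that the reseating actually helped, the counters need to be watched for further growth. Something like this loop (my sketch, not part of the original troubleshooting; the WWN is the c2 HBA port from above) checks every five minutes:

while true; do
        date
        fcinfo remote-port -l -p 210000e08b1124bf | grep 'Invalid Tx Word Count'
        sleep 300
done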

On to mpathadm. It was added in S10u3 and lets you discover and manage multipathing (shocking, given its name).
I am using mpathadm to display information about the current configuration. Prior to the reboot, the system with only one link in the CONNECTED state showed only one path to all devices.
Output from 'mpathadm list lu':
mpathadm list lu
        /scsi_vhci/enclosure@g600c0ff000000000086cfb0000000000
                Total Path Count: 3
                Operational Path Count: 3
        /dev/rdsk/c3t600C0FF000000000086CFB359771241Bd0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c3t600C0FF000000000086CFB359771241Ad0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c3t600C0FF000000000086CFB3597712419d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c3t600C0FF000000000086CFB3597712418d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c3t600C0FF000000000086CFB3597712417d0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c3t600C0FF000000000086CFB3597712416d0s2
                Total Path Count: 2
                Operational Path Count: 2

Specific detail from 'mpathadm show lu' for one LU:
mpathadm show lu /dev/rdsk/c3t600C0FF000000000086CFB3597712416d0s2
Logical Unit:  /dev/rdsk/c3t600C0FF000000000086CFB3597712416d0s2
        mpath-support: libmpscsi_vhci.so
        Vendor: SUN
        Product: StorEdge 3510
        Revision: 415G
        Name Type: unknown type
        Name: 600c0ff000000000086cfb3597712416
        Asymmetric: no
        Current Load Balance: round-robin
        Logical Unit Group ID: NA
        Auto Failback: on
        Auto Probing: NA

        Paths:
                Initiator Port Name: 210000e08b1cdb34
                Target Port Name: 256000c0ffc86cfb
                Override Path: NA
                Path State: OK
                Disabled: no

                Initiator Port Name: 210000e08b1124bf
                Target Port Name: 226000c0ffb86cfb
                Override Path: NA
                Path State: OK
                Disabled: no

        Target Ports:
                Name: 256000c0ffc86cfb
                Relative ID: 0

                Name: 226000c0ffb86cfb
                Relative ID: 0
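
mpathadm can also operate on individual paths, which is handy when testing a suspect link. A sketch of standard mpathadm usage, not something I needed here (the initiator, target and LU values are taken from the output above):

mpathadm disable path -i 210000e08b1cdb34 -t 256000c0ffc86cfb -l /dev/rdsk/c3t600C0FF000000000086CFB3597712416d0s2
mpathadm enable path -i 210000e08b1cdb34 -t 256000c0ffc86cfb -l /dev/rdsk/c3t600C0FF000000000086CFB3597712416d0s2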
This is in no way a full exploration of the capabilities of fcinfo and mpathadm.
I hope that the next time you are (or I am) looking at FC or multipath issues these commands will be helpful.
Please see the links to the manual pages below for more specific information and examples for the fcinfo and mpathadm commands.

References:
fcinfo – Fibre Channel HBA Port Command Line Interface
mpathadm – multipath discovery and administration

EDIT: Fixed some strange formatting issues
EDIT1: A bit more touch up

Thursday Feb 15, 2007

Thumper Zones and Clones

I have access to a fully loaded thumper till about the end of the month.


I have had access to a thumper before; after flailing at it and using cfgadm to disable large numbers of disks in a zpool while writing to them, I got it to panic to protect my data.


That was fun, but what I want now is Solaris with support for cloning zones (a minimal clone sketch follows the list below), to see how many zones I can pack in while playing with resource controls before:

  1. It goes back
  2. It becomes silly slow
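
For reference, cloning is a two-step affair: copy an existing zone's configuration, then clone its installed files. A minimal sketch, with hypothetical zone names and zonepath:

zonecfg -z zone2 'create -t zone1; set zonepath=/zones/zone2'
zoneadm -z zone2 clone zone1
zoneadm -z zone2 boot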

 

Now I am off to live upgrade!

Tuesday Feb 13, 2007

Solaris 10 all AMPed up

The latest news in (S/L)AMP systems is pre-built, packaged coolstack software. To quote the coolstack site discussing compile-time optimization: "This results in anywhere between 30-200% performance improvement (depending on workload/application) over standard binaries." To which I say WOOT!

I have to say Solaris AMP or AMPS. SAMP really doesn't do it for me.

 

In other vaguely related news, I saw mail recently saying that PHP 5.5 should be available by default in OpenSolaris some time soon.

 

Also if you are quick about it, it appears that you can have a DVD of the latest SXDE (Solaris Express Developer Edition) sent to you for free. The DVD includes the Solaris AMP packages (so does the download) so you don't have to get them separately.

 

I am currently one rev prior to the SXDE release in my Parallels instance; I guess I should start downloading a new ISO.

Monday Feb 12, 2007

The Solaris 10 telnet exploit


So at this point you have probably heard about the "0-day" telnet exploit, which appears to be a problem with user authentication in in.telnetd. I have seen one proposed workaround for the problem that I think may cause some heartburn if implemented.

In Another Good Reason to Stop Using Telnet, Donald Smith reports a workaround that appears to work. However, in simple testing it appears to break normal applications of telnet:
e.g. if you ARE USING PASSWORD BASED USER AUTHENTICATION you will no longer be able to log in.

The vulnerability allows any user, including root, to log in without a password. The proposed mitigation:
inetadm -m svc:/network/telnet:default exec="/usr/sbin/in.telnetd -a user"

I don't have a Kerberos-enabled environment to test with, so I don't know if ticket-based authentication would still work in this configuration.

My general thought process would be:

If you are still using telnet, hopefully it is because you absolutely need to use it,
e.g. a hard-coded legacy application that uses telnet.

If you need to use telnet, enable tcp_wrappers
and allow telnet only from your trusted and required hosts:
inetadm -m svc:/network/telnet:default tcp_wrappers=TRUE
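
With tcp_wrappers enabled, the allowed hosts are controlled by the usual wrapper files. A minimal sketch, with hypothetical addresses:

# /etc/hosts.allow
in.telnetd: 192.168.10.11 192.168.10.12

# /etc/hosts.deny
in.telnetd: ALL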


UPDATE 1 (02/13 9:48):

Interim Security Relief (ISR) patches are available in the Sun Alert document. The README does not seem to indicate that a reboot is required. If you need telnet, it would seem appropriate to install these patches ASAP. (No, really, read the README: installing an IDR limits your ability to re-patch the affected areas without first removing the IDR.)


Sun Alert ID: 102802 Security Vulnerability in the in.telnetd(1M) Daemon May Allow Unauthorized Remote Users to Gain Access to a Solaris Host

 US-CERT VU#881872 Sun Solaris telnet authentication bypass vulnerability


Thursday Sep 07, 2006

CEC 2006: Procrastinating

I am currently working on our presentation for the Sun Continuing Engineering Conference 2006.

We are presenting "Managing Systems at Grid Scale"

In December 2005 Sun Managed Operations (aka SevenSpace) took over the infrastructure management for commercial and retail sungrid deployments. Managing at grid scale raises many challenges to tools and operational mind-sets when compared to traditional enterprise systems management.

I am going to go home and try to work on my slides.

Things to do at CEC:
Find Bill Walker and try to get a very cool fridge magnet...

I will have to bring my oh so tiny Solaris Laptop with me.


Friday Mar 17, 2006

Dynamic Ipfilter Rules for RPC Services via SMF

How do you allow access to rpc services through ipf?


Use SMF with custom methods and dependencies to create Dynamic Ipfilter Rules for RPC Services.


Searching turned up a number of people with the same questions and no good answers (Darren Reed: SunRPC proxy, OpenSolaris Forums), in which Darren states "There is a proxy, of sorts, in the IPFilter source code at present, but it is of questionable integrity". Unfortunately, questionable integrity is right out in this environment.


A simple solution was implemented: a startup script, written by Borgan Chu, parses 'rpcinfo -p' and creates ipfilter rules to allow traffic from the desired source addresses to the dynamic rpc service ports. The script uses a configuration file with the following syntax, similar to the syntax of hosts.allow/hosts.deny.

rpcservice: addrmask addrmask
rpcservice: addrmask addrmask

It created rules using the following logic:


Split each line into service and source.
For each source, add a rule to ipf to allow the desired traffic,
e.g. echo "pass in quick proto rpcproto from source to dest port = rpcport keep state" | ipf -f -

pass in quick proto udp from pool/1001 to any port = 32782 keep state
pass in quick proto tcp from pool/1001 to any port = 32808 keep state
pass in quick proto udp from pool/1002 to any port = 32782 keep state
pass in quick proto tcp from pool/1002 to any port = 32808 keep state


The previous code acknowledged one major problem: the script only runs at boot, and any restart of rpcbind or an rpc service could result in a different port assignment, invalidating the previous rules. From an operational standpoint I pictured repeatedly troubleshooting the same issue: a service mysteriously stops working; Tier 2 engineers look into the problem and find that the service is running and can be reached locally, and possibly from other hosts, but not from the problem host.


One other issue with the script was apparent to me: future manual script executions would continue to add entries to the rules, with no way to clean up without flushing the existing rule set. This required a change to both the script and the default ipf.conf rules. With the following changes the script supports both a stop and a start method, as well as creating slightly different rules.

New base ipf rules:

# Allow Dynamic RPC entries
pass in on bge0 all head 100
# useless rule to allow for deletion of all inbound dynamic pass rules
# i.e. you can't delete the first rule in a group
pass in on lo0 all group 100


start()
Split each line into service and source.
For each source add a rule to ipf to allow the desired traffic,
e.g. echo "pass in quick proto rpcproto from source to dest port = rpcport keep state group 100" | ipf -f -

stop()
/usr/sbin/ipfstat -i | /usr/bin/grep "group 100" | /usr/sbin/ipf -r -f -

pass in quick proto udp from pool/1002 to any port = 626 keep state group 100
pass in quick proto tcp from pool/1002 to any port = 35913 keep state group 100
pass in quick proto udp from pool/1001 to any port = 626 keep state group 100
pass in quick proto tcp from pool/1001 to any port = 35913 keep state group 100
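
For concreteness, here is roughly what the method script looks like. This is a minimal sketch reconstructed from the description above, not the original code; the config file path is the one the manifest checks for, and the parsing details are assumptions.

#!/bin/sh
# Sketch of /lib/svc/method/ipfilter-dynamic_rpcbind (a reconstruction).
# Config format, per above: "rpcservice: addrmask addrmask"
CFG=/etc/ipf/ipfilter-dynamic_rpcbind.cfg

case "$1" in
start)
        while read svc sources; do
                svc=`echo $svc | sed 's/:$//'`    # strip trailing colon
                # find every proto/port pair the service has registered
                /usr/bin/rpcinfo -p | /usr/bin/nawk -v s="$svc" \
                        '$5 == s { print $3, $4 }' | /usr/bin/sort -u |
                while read proto port; do
                        for src in $sources; do
                                echo "pass in quick proto $proto from $src to any port = $port keep state group 100"
                        done
                done
        done < $CFG | /usr/sbin/ipf -f -
        ;;
stop)
        # remove (-r) every rule previously added to group 100
        /usr/sbin/ipfstat -i | /usr/bin/grep "group 100" | /usr/sbin/ipf -r -f -
        ;;
esac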

The major problem of maintaining dynamic rules can be resolved using SMF: creating a service for our ipf rules script with a require_all dependency on ipfilter causes the new service to run only when ipfilter is enabled.

dependency   require_all/refresh svc:/network/ipfilter:default (online)
dependency   require_any/refresh svc:/network/rpc/bind:default (online)
The execution can be further tuned by creating require_any dependencies on various rpc services; a list of candidates comes from:
svcs "*rpc*"
After the manifest is loaded, the properties of the service can be bulk-updated to cover most standard rpc services with the following command, or manually updated to require_any other specific rpc services.
svccfg -s ipfilter:rpcbind setprop "rpc_services/entities = fmri: (`svcs -H \*rpc\* \*nis\* \*nfs\* | awk '$NF !~ /ipfilter|bind:default/{ print $3 }'`)"

This replaces the manifest-defined rpc_services dependencies with the following:
svc:/network/rpc/keyserv:default
svc:/network/rpc/nisplus:default
svc:/network/rpc/bootparams:default
svc:/network/rpc/gss:default
svc:/network/rpc/mdcomm:default
svc:/network/rpc/metamed:default
svc:/network/rpc/metamh:default
svc:/network/rpc/rex:default
svc:/network/rpc/rusers:default
svc:/network/rpc/spray:default
svc:/network/rpc/wall:default
svc:/network/rpc-100235_1/rpc_ticotsord:default
svc:/network/nfs/server:default
svc:/network/rpc/meta:default
svc:/network/rpc/smserver:default
svc:/network/rpc/rstat:default
svc:/network/nfs/rquota:default
svc:/network/nfs/client:default
svc:/network/nis/passwd:default
svc:/network/nis/update:default
svc:/network/nis/client:default
svc:/network/nis/server:default
svc:/network/nfs/cbd:default
svc:/network/nfs/mapid:default
svc:/network/nis/xfr:default
svc:/network/nfs/status:default
svc:/network/nfs/nlockmgr:default
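
A quick way to confirm the rewritten dependency list (a verification step I am adding here, not part of the original procedure) is to ask SMF directly:

svcs -d ipfilter:rpcbind
svcprop -p rpc_services ipfilter:rpcbind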

Using a script similar to the one described above you could specifically limit the dependent services to the ones specified in your configuration file.

The Service Manifest:
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!--
Shawn Ferry yakshaving <@> sun.com
Service manifest for maintaining dynamic ipfilter rules
-->
 
<service_bundle type='manifest' name='ipfilter:dynamic'>

  <service
          name='application/ipfilter/dynamic'
    type='service'
    version='1'>

   
    <!-- maybe more than one if this gets complex -->
    <single_instance />
   
        <!-- Require ipfilter, without an online ipfilter, don't try to add rules -->
        <dependency
            name='ipfilter'
            grouping='require_all'
            restart_on='refresh'
            type='service'>
            <service_fmri value='svc:/network/ipfilter:default' />
        </dependency>

 
      <!--
       An instance for rpcbind, additional instances
       could be created for additional services requiring
       dynamic rules (this would be the time to disable single_instance)
      -->
      <instance name="rpcbind" enabled="true">
              <!-- If rpcbind is offline, no rules to add, don't do anything -->
              <dependency
                      name='rpc_bind'
                      grouping='require_all'
                      restart_on='refresh'
                      type='service'>
                      <service_fmri value='svc:/network/rpc/bind:default' />
              </dependency>

              <!-- If rule creation config file is missing, don't do anything -->
              <dependency
                      name='ipfilter_rpcbind_config'
                      grouping='require_all'
                      restart_on='refresh'
                      type='path'>
                      <service_fmri value='file://localhost/etc/ipf/ipfilter-dynamic_rpcbind.cfg' />
              </dependency>

              <!--
               All of the dependencies in the rpc_services group
               are "require_any".

               With a "refresh" directive, on stop/start/refresh of those
               services the ipfilter/dynamic:rpcbind service will be
               restarted, keeping the ipf rules up to date.
              -->

              <dependency
                      name='rpc_services'
                      grouping='require_any'
                      restart_on='refresh'
                      type='service'>
                      <service_fmri value='svc:/network/nfs/server:default' />
                      <service_fmri value='svc:/network/nfs/client:default' />
              </dependency>

        <!-- On "start" run the ipf rule script with the argument start -->
        <exec_method
            type='method'
            name='start'
            exec='/lib/svc/method/ipfilter-dynamic_rpcbind start'
            timeout_seconds='30' />

        <!-- On "stop" run the ipf rule script with the argument stop -->
        <exec_method
            type='method'
            name='stop'
            exec='/lib/svc/method/ipfilter-dynamic_rpcbind stop'
            timeout_seconds='60' />

        <!--
          This is a transient service we are looking for a clean
          exit code. i.e. a svcs -p shows no associated processes
        -->
        <property_group name='startd' type='framework'>
                <propval name='duration' type='astring' value='transient' />
        </property_group>

        <!--
         Useful Info:
svcs -xv dynamic:rpcbind
svc:/application/ipfilter/dynamic:rpcbind (Dynamic rpc service rules for ipfilter)
 State: online since Fri Mar 17 19:41:14 2006
   See: man -M /usr/share/man -s 1M ipf
   See: man -M /usr/share/man -s 1M ipfstat
   See: man -M /usr/share/man -s 4 ipf.conf
   See: /var/svc/log/application-sungrid-ipfilter:rpcbind.log
Impact: None.
        -->
        <template>
            <common_name>
                <loctext xml:lang='C'>
                        Dynamic rpc service rules for ipfilter
                </loctext>
            </common_name>
            <description>
              <loctext xml:lang='C'>
                      Add Dynamic rpc services rules to ipfilter refresh rules on
                      restart/refresh of various services to maintain rules.

                      Manually clearing and reloading ipfilter rules with ipf will not
                      trigger this service to restart/refresh.
                      e.g. ipf -Fa -f /etc/ipf/ipf.conf
              </loctext>
            </description>
            <documentation>
                <manpage title='ipf' section='1M' manpath='/usr/share/man' />
                <manpage title='ipfstat' section='1M' manpath='/usr/share/man' />
                <manpage title='ipf.conf' section='4' manpath='/usr/share/man' />
            </documentation>
        </template>
      </instance>
 
      <stability value='Unstable' />
  </service>

</service_bundle>

References:
Liane Praza's: smf(5) fault/retry models
Service Developer Introduction
smf(5)

Notes:
Any ipf rules that are actively in use when the service restarts or is disabled are not removed. That particular aspect is not an issue for us, as it is assumed that if a rule is in use the service is still listening.

Tuesday Sep 06, 2005

SMF services

I will post the last part of my creating an SMF service saga shortly.

However, I feel that I should comment on a very obvious difference between a couple of commands.

smf: not a command

svcs: smf service status command

svn: a version control system and command

Various combinations of svn -l "*fmri*", svcs commit, and smf status are not helping me get work done.

I find that I am having occasional issues trying to use svn to track changes to smf services.

Friday Jul 29, 2005

Creating an SMF service (Part 3)

If you haven't already read them, you might want to start with Part 1 or continue with Part 2.
It took me a bit longer to get this written than I intended; now that I have figured out that textile and code/pre tags don't play well together, this should be easier.

In short, it does what I need it to do now. The whole thing wouldn't suffer from some more work, though.

In particular, it is currently very brittle. If my relocatable package is relocated, parts of this will break. I particularly don't know what to do with installations on a root server.

The following is the first part of the manifest.

<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Service manifest for the Sysedge monitoring program(s) -->

<service_bundle type='manifest' name='SUNWsmcsysedge'>

  <!-- A milestone to collect dependencies for both sysedge and sysedgeplus -->
  <service 
    name="application/monitoring/sysedge_deps"
    type="milestone"
    version="1">

      <!-- common sysedge dependencies -->
      <instance name="default" enabled="false">
          <dependency
              name='filesystem'
              grouping='require_all'
              restart_on='none'
              type='service'>
              <service_fmri value='svc:/system/filesystem/local' />
          </dependency>
          <dependency
              name='network'
              grouping='require_all'
              restart_on='refresh'
              type='service'>
              <service_fmri value='svc:/network/initial' />
          </dependency>
          <dependency
              name='sysedge_cf'
              grouping='require_all'
              restart_on='refresh'
              type='path'>
              <service_fmri
                value='file://localhost/etc/opt/SUNWsmcsysedge/sysedge.cf' />
          </dependency>
      </instance>

      <!-- sysedgeplus dependencies -->
      <instance name="plus" enabled="false">
        <dependency
            name='sysedgeplus'
            grouping='require_all'
            restart_on='refresh'
            type='path'>
            <service_fmri
              value='file://localhost/opt/SUNWsmcsysedge/plus/sysedgeplus' />
        </dependency>
      </instance>
      <stability value='Unstable' />
  </service>

The brief rundown including how I generally read it to myself:

<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Service manifest for the Sysedge monitoring program(s) -->

This is XML, and the DTD can be found at /usr/share/lib/xml/dtd/service_bundle.dtd.1.

<service_bundle type='manifest' name='SUNWsmcsysedge'>

This is a manifest called SUNWsmcsysedge

  <!-- A milestone to collect dependencies for both sysedge and sysedgeplus -->
  <service 
    name="application/monitoring/sysedge_deps"
    type="milestone"
    version="1">

The first "service" in the manifest is application/monitoring/sysedge_dep and it is a milestone.
(It is a milestone because somewhere the docs said roughly: A milestone is a syntectic service that collects dependencies)

      <!-- common sysedge dependencies -->
      <instance name="default" enabled="false">
          <dependency
              name='filesystem'
              grouping='require_all'
              restart_on='none'
              type='service'>
              <service_fmri value='svc:/system/filesystem/local' />
          </dependency>

As stated in the comment, this is an instance of the service/milestone called default, and it has a dependency called filesystem.
The dependency is part of the require_all group of dependencies. It has restart_on='none' set, which indicates that the local filesystems are required to start the service, but once it is started, changes to the filesystem service do not automatically restart it.

This service has multiple instances, with different instances being used or re-used to fulfill the requirements of various other services.

          <dependency
              name='network'
              grouping='require_all'
              restart_on='refresh'
              type='service'>
              <service_fmri value='svc:/network/initial' />
          </dependency>

The second dependency is a requirement for networking support to be enabled. Again it is part of the require_all grouping; however, in this case, if the network configuration is refreshed, the dependent service is restarted. Effectively, I believe that this propagates down through the dependencies.

      <dependency
              name='sysedge_cf'
              grouping='require_all'
              restart_on='refresh'
              type='path'>
              <service_fmri
                value='file://localhost/etc/opt/SUNWsmcsysedge/sysedge.cf' />
      </dependency>

The third dependency in sysedge_deps:default (that is, application/monitoring/sysedge_deps:default) is also part of the require_all grouping. Note that the type here is path. The path type indicates that the service_fmri is pointing at a path that must exist to meet the dependency requirement.

</instance>

The closing instance tag indicates that the first defined instance (in this case, named default) is finished.

      <!-- sysedgeplus dependencies -->
      <instance name="plus" enabled="false">
        <dependency
            name='sysedgeplus'
            grouping='require_all'
            restart_on='refresh'
            type='path'>
            <service_fmri
              value='file://localhost/opt/SUNWsmcsysedge/plus/sysedgeplus' />
        </dependency>
      </instance>

The second instance exists to validate that the sysedgeplus binary is present for the plus instance.

      <stability value='Unstable' />
  </service>

The stability value Unstable indicates that I am still making up my mind about how all of this should work and am apt to change it at any time. (Sun has actual definitions of the different stability levels.)
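
Once a manifest like this validates, loading it and checking the result is the standard sequence (the manifest path here is hypothetical):

svccfg validate /var/svc/manifest/application/monitoring/sysedge.xml
svccfg import /var/svc/manifest/application/monitoring/sysedge.xml
svcs -a | grep sysedge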

Stay tuned, in our next episode I will define a service that actually runs a process.

Wednesday Jul 20, 2005

Creating an SMF service (Part 1)


This is Part 1 of an attempt to document creating a new service in SMF.


Resources:
smf(5) and other related manpages
/usr/share/lib/xml/dtd/service_bundle.dtd.1
BigAdmin SelfHealing Site
BigAdmin Developer SMF Intro
BigAdmin SMF QuickStart
docs.sun.com Solaris 10 Admin Guide
Ben Rockwood's SMF Manifest Cheatsheet


The goal is to create an SMF manifest to start a monitoring daemon (and maybe earn one of those mugs as well).


Before doing much poking around, I was expecting that I would create a binary instance and a script instance. I would like to be able to have a single manifest that contains all of the services/instances/bits I need.


There are two complementary functions: a binary and a script. Recently the script was modified; it now has two operation modes, a "full" mode and a compatibility mode.


I will try to document the problems I run into and the solutions I find.

Part 2

Live Upgrade Rocks

I know that live upgrade has been available for a while... What I don't know is why I never really used it before.

Conceptually it makes sense, but it was cool how easy the upgrade was. I need to see it in larger environments with complex application deployments, but so far so good.


# lucreate -c "Nevada16" -m /:/dev/dsk/c0d0t4:ufs -n "Nevada18"
# lustatus (see output below)
# lofiadm -a /var/tmp/solarisdvd.iso /dev/lofi/1
# mount -F hsfs /dev/lofi/1 /mnt
# luupgrade -u -n "Nevada18" -s /mnt
# luactivate Nevada18
# init 6
# lustatus (see output below)




Before: 

lustatus
Boot Environment           Is       Active Active    Can    Copy     
Name                       Complete Now    On Reboot Delete Status   
-------------------------- -------- ------ --------- ------ ----------
Nevada16                   yes      yes    yes       no     -        
Nevada18                   yes      no     no        yes    -        



After:

lustatus
Boot Environment           Is       Active Active    Can    Copy     
Name                       Complete Now    On Reboot Delete Status   
-------------------------- -------- ------ --------- ------ ----------
Nevada16                   yes      no     no        yes    -        
Nevada18                   yes      yes    yes       no     -        
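
Once the new BE has proven itself, the old one can be dropped. A follow-up step, not part of the session above:

# ludelete Nevada16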

Monday Jul 18, 2005

Creating an SMF service (Part 2)

If you haven't already read it, you might be interested in starting with Creating an SMF service (Part 1).

The first major problem...the DTD

Or maybe not so much a problem with the DTD as with my understanding of the DTD.

Which generally means that I have either been lucky before when writing validated XML, or I knew it and forgot.

I spent a good 30 minutes trying to figure out what I was doing wrong.
I had well-formed XML, but it didn't validate.
My first thought was that I was misremembering the DTD occurrence syntax.

W3 Schools: DTD Elements Indicates:

When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document.

Error message line breaks added for readability.

svccfg validate sysedge.xml
sysedge.xml:94: element service: validity error :
Element service content does not follow the DTD, expecting (create_default_instance? , single_instance? ,
 restarter? , dependency* , dependent* , method_context? , exec_method* , property_group* , instance* ,
 stability? , template?), got (instance create_default_instance instance stability )
svccfg: Document is not valid.

Now, that error is clear: I have my children out of order. (The above error is synthetic.)

I have come up with the following "services":

svc:/application/monitoring/sysedgedeps:plus
svc:/application/monitoring/sysedgedeps:default
svc:/application/monitoring/sysedge:default
svc:/application/monitoring/sysedge:concord
svc:/application/monitoring/sysedge:compat
svc:/application/monitoring/sysedge:plus

sysedge_deps:* are classified as milestones

I don't understand why sysedge:default exists with <create_default_instance enabled='false'/> set. Or is that just "The default instance will be created but not enabled"?

I may try to reduce the possible confusion and break the regular and plus code into different services instead of instances, particularly since I want/need to set <single_instance/> for sysedgeplus.

It also seems that "application property groups" might do some good things.

I may also see if I can figure out how Dan Price did his dependency graphing. Stephen Hahn has a cool example of smf dependency graphing


Part 1
Part 3

Since I wrote this I have split the services and done a bunch of package integration.
I will go into details of the design and implementation in future installments.


Tuesday Jun 14, 2005

Disappointed and Excited All at Once

I downloaded snv_16, which is not quite as up to date as the downloadable OpenSolaris code.

PXE booting to a grub menu was really way too cool!

Not getting an address for the actual boot was lame, and now not being able to reproduce anything except DHCP not working is disappointing.

I was hoping to get snv_16 installed tonight and start compiling OpenSolaris tomorrow.

Taking one last look at my configuration shows that somehow I lost the client identifier for my statically assigned DHCP address. I have re-modified the entry and it is time for another shot.


Friday Jun 10, 2005

Nevada Build 16

I downloaded Nevada Build 16 last night. Hopefully I can get things together enough to try and rebuild my little laptop this weekend.

Actually, I think I will try an upgrade to see if I can get it to work; I think I allocated a partition to try live upgrade when I built it.

I really want to see "New Boot"

Vaguely on that note: if anyone has an external VGA connector for a Sony PCG-U3, I wouldn't mind if you wanted to send it to me.

Wednesday May 25, 2005

Solaris 10 ipfilter ipnat and PPTP

I have been having a problem setting up PPTP tunnels since I upgraded my firewall to Solaris 10 (nv_12).
Some basic tests clearly indicated that the problem was with my configuration.
  1. My BSD firewall with ipfilter worked
  2. My laptop direct worked
As part of the migration I copied the ipf.conf and ipnat.conf files that I had been using.
Once the firewall was up on Solaris 10, I installed the files and changed the interface names to match.

After installing the rules, I had to edit pfil.ap and add a new interface type. 'svcadm enable ipfilter' and everything started working... almost. All of my web browsing, inbound/outbound mail, inbound http and ssh worked. The only thing that I couldn't do was create a PPTP tunnel.

I had been poking at the config for a few weeks, never making the time to sit down and really think about the problem. Last night I took some time to start at the beginning and see if I could work it out.

After reading through section 4 of the ipf and ipnat man pages a few more times to make sure I wasn't doing anything obviously wrong, I practiced my google-scholar skills and looked at a bunch of mailing-list posts, the PPTP RFC and piles of other stuff. The trigger was seeing a post indicating that all GRE traffic needed to be redirected to the PPTP server.

Kicking off a number of snoops and an ipmon helped, and finally (I don't know why I didn't do this a while ago) I ran a tcpdump for proto gre on my laptop.

  • From the external snoop I was able to see the inbound and outbound traffic
  • From the ipmon I was able to see the inbound and outbound traffic
  • From my laptop, I could only see the outbound gre
The "fix" is to specifically route all gre traffic to the address of my laptop.

I need to see if I can do it without the hard coding of the IP addresses that part is lame.  

The rules that make everything work are:

:::::::: ipf.conf ::::::::
pass out quick on extint proto tcp from any to any port = 1723 flags S keep state
pass out quick on extint proto 47 from any to any
pass in  quick on extint proto 47 from any to any keep state

:::::::: ipnat.conf ::::::::
rdr extint PPTPserverip/32 port 0 -> laptopip port 0 gre

Friday May 20, 2005

Solaris 10 Zones and N1GE6

I am trying to decide if it is cool or if I have no life (this is generally rhetorical). I recently (last night) created some more zones on one of my machines. Subsequently I installed N1 Grid Engine 6. The install was surprisingly easy. Literally:

  1. Install packages
  2. run $SGE_ROOT/install_qmaster
  3. share $SGE_ROOT via nfs
  4. mount shared $SGE_ROOT at $SGE_ROOT on each node
  5. run $SGE_ROOT/install_execd on each node
  6. run jobs: $SGE_ROOT/examples/jobs/pascal.sh 200

Things I have found out:

  1. 50000 jobs in simple queuing results in horrible io wait on an underpowered PC
     e.g. qstat may as well never respond for how long it takes at 99% io wait
  2. 20000 jobs in BerkeleyDB queuing isn't too bad, but it will be a while before they are done running.
     e.g. qstat takes 3s to return the list (15727 entries currently)

Things to try:

  1. add the little PCG-U3 laptop as an execution host
  2. add my powerbook as an execution host
  3. add C-'s ibook as an execution host
  4. pascal.sh 500, just to see if 125250 jobs will kill it

Remaining jobs at 60s + qstat run time intervals:

Fri May 20 17:40:56 EDT 2005 | 15688
Fri May 20 17:42:04 EDT 2005 | 15673
Fri May 20 17:43:14 EDT 2005 | 15661
Fri May 20 17:44:26 EDT 2005 | 15646
Fri May 20 17:45:37 EDT 2005 | 15631
Fri May 20 17:46:44 EDT 2005 | 15616
Fri May 20 17:47:56 EDT 2005 | 15604
Fri May 20 17:49:07 EDT 2005 | 15589
Fri May 20 17:50:15 EDT 2005 | 15574
Fri May 20 17:51:27 EDT 2005 | 15562

About the server: s10_69 (still haven't gotten around to the upgrade to Nevada Build 14; I want to see New Boot)
System Configuration: Sun Microsystems i86pc
Memory size: 768 Megabytes
AMD K6 600MHz
Currently running 5 zones (3 execution hosts, apache, torrus collector)
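
The interval readings above came from a loop along these lines (my reconstruction, not the original one-liner; note that 'qstat | wc -l' also counts qstat's two header lines):

while true; do
        echo "`date` | `qstat | wc -l`"
        sleep 60
done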