Thursday Apr 10, 2008

Will I ever escape NIS+?

Today's blast from the past is what to do when your NIS+ (yes, I did say NIS+) namespace does not do what you are expecting. Contrary to popular myth, NIS+ can be reliable and can scale to large deployments, so much so that there are a number of customers that do have large deployments, and that does not include the two that I'm aware of in Sun. That said, even I would not advocate anyone setting up a NIS+ namespace now. LDAP is the future and the way to go.

Now back to NIS+. Today's problem was not atypical of the kind of issues you can see with NIS+, and it was also interesting because the SGRT questions, or at least the answers to the SGRT questions, did not immediately lead to a resolution. The problem statement was “New users are not correctly authenticated”. So when they logged in, “nisdefaults -p” would say they were “nobody”. Having them keylogin would then, it was claimed, resolve the issue.

After a bit of questioning it was clear that either I was asking the wrong questions, or the answers I was getting were not accurate, or someone had installed some randomizing function into the system. Shared Shell to the rescue. I could now see with my own eyes what was going on and then suggest the next command to run without worrying about translation. It became clear that the problem was indeed random. Successive calls to “nisdefaults -p” would give different results, and I would hazard a guess, although I did not confirm this, that this affected all the users and all the systems.

The key to tracking this down is the NIS_OPTIONS environment variable, which allows you to see each NIS+ call and its return status and, more interestingly in this case, lets you see which server served you:

: estale.eu FSS 6 $; env NIS_OPTIONS="debug_bind debug_calls" nisdefaults -p
nis_list([auth_name=14442,auth_type=LOCAL],cred.org_dir.eu.cte.sun.com., 0x30003, 0x0, 0x0)
binding to directory cred.org_dir.eu.cte.sun.com. (parent first)
bind succeeded
create handle: DG
release otis.cte.sun.com., status = 0
status=Success, 1 object, [z=427, d=363, a=3327, c=4918]
cg13442.eu.cte.sun.com.
: estale.eu FSS 7 $; 
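
Because the failure was intermittent, a single call like the one above may well hit a healthy server. A minimal sketch of how repeated calls could be summarised, assuming the debug output keeps the “release SERVER, status = N” line format shown above. The function name tally_servers is hypothetical and not part of any NIS+ tooling, and whether the debug lines go to stdout or stderr is an assumption, hence the 2>&1 in the usage comment:

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): count how often each
# server answered, and with which status, by reading NIS_OPTIONS debug
# output on stdin and picking out the "release SERVER, status = N" lines.
tally_servers()
{
	awk '
	$1 == "release" && $3 == "status" {
		server = $2
		# Drop the trailing comma after the server name, if present.
		if (substr(server, length(server)) == ",")
			server = substr(server, 1, length(server) - 1)
		count[server " status=" $5]++
	}
	END {
		for (key in count)
			print count[key], key
	}' | sort -k2
}

# Usage sketch (needs a NIS+ client, so it is left commented out):
#   i=0
#   while [ $i -lt 20 ]
#   do
#       env NIS_OPTIONS="debug_bind debug_calls" nisdefaults -p 2>&1
#       i=$((i + 1))
#   done | tally_servers
```

Each output line is a count followed by the server name and the status it returned, so a server that only ever shows a non-zero status stands out quickly.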

I got lucky with the customer and the problem fell out at the first attempt. They had half deleted a NIS+ replica server, so it was still in the org_dir directory object and was still running rpc.nisd but would respond with an error whenever it was called. If you got another NIS+ server you were o.k. In a way it was a pity to get there so quickly, as I never had the chance to send them this script:

#!/bin/ksh
# Run a command against one named NIS+ server (-h) and/or against each
# server listed in a directory object (-d).
unset dom
unset host
verbose=0

# Echo the arguments only when -v was given.
vecho()
{
	if [ $verbose -eq 1 ]
	then
		echo "$@"
	fi
}

while getopts vd:h: c
do
	case $c in
	d)	dom=$OPTARG ;;
	h)	host=$OPTARG ;;
	v)	verbose=1 ;;
	\?)	echo "USAGE ${0##*/} [-v] -h host -d domain -- command"
		exit 2 ;;
	esac
done
shift `expr $OPTIND - 1`

if [ "${host}" = "" -a "$dom" = "" ]
then
	echo one or both of -h and -d must be used
	exit 1
fi

if [ "$dom" != "" ]
then
	# Take the third field of every "Name" line from the first "Master"
	# line of the directory object onwards.
	for server in $(niscat -o ${dom} | nawk '/Master/ { master=1 } /Name/ { if (master==1) print $3 }')
	do
		echo server=$server
		vecho NIS_OPTIONS="server=$server" "$@"
		NIS_OPTIONS="server=$server" "$@"
		x=$?
		if [ $x -ne 0 ]
		then
			niserror $x
		fi
	done
fi

if [ "$host" != "" ]
then
	vecho NIS_OPTIONS="server=$host" "$@"
	NIS_OPTIONS="server=$host" "$@"
	x=$?
	if [ $x -ne 0 ]
	then
		niserror $x
	fi
fi
exit 0

Amongst other things, this will run the same command using each NIS+ server for a directory in turn. Great when you think something is misbehaving but can't quite put your finger on which server it is.
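
With more than a handful of servers, comparing the per-server answers by eye gets tedious. A minimal sketch of an automated comparison, assuming the “server=NAME” marker lines that the script prints before each server's output. The function name odd_servers is hypothetical, and it simply treats the first server's answer as the reference:

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read nis_server-style
# output on stdin and print every server whose answer differs from the
# answer given by the first server listed.
odd_servers()
{
	awk '
	/^server=/ {
		current = substr($0, 8)        # text after "server="
		order[++n] = current
		next
	}
	# Any other line belongs to the answer of the current server.
	current != "" { answer[current] = answer[current] $0 "\n" }
	END {
		for (i = 2; i <= n; i++)
			if (answer[order[i]] != answer[order[1]])
				print order[i]
	}'
}

# Usage sketch (needs a NIS+ client, so it is left commented out):
#   ./nis_server -d org_dir.eu.cte.sun.com nismatch ... 2>&1 | odd_servers
```

No output means every server gave the same answer. If the first server is the broken one, all the others are listed instead, which still points at the disagreement.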


: estale.eu FSS 10 $; ./nis_server -d org_dir.eu.cte.sun.com nismatch [auth_name=14442,auth_type=LOCAL],cred.org_dir
server=otis.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
server=enotty.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
server=pacrim-repzone-eu.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
server=eu-repzone-eu.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
: estale.eu FSS 11 $; ls -l ./nis_server
-rwxr-x--x   1 cg13442  staff        910 Mar 30  2001 ./nis_server
: estale.eu FSS 12 $; 

It appears that script is 7 years old. Again the problem was not really NIS+ at all but an admin error.

Thursday Sep 06, 2007

An Offer you can't refuse...

I just read this offer from Clive:

So in a one time offer only, if you are a customer who has never used SharedShell before and you want an hour or two of free remote Solaris Performance Consulting in the next week and you are happy for me to blog about it, drop me an email.

Given that Clive is one of the best performance engineers around (he would never make such a claim himself), this is an offer that is well worth taking up. It's interesting that Clive is such a convert to Shared Shell. The performance improvement from doing 2 hours in Shared Shell, compared with 2 hours travelling to a site, then 2 hours of work and 2 hours travelling back, is quite large, and that assumes the site is only 2 hours away.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
