Will I ever escape NIS+?

Today's blast from the past is what to do when your NIS+ (yes, I did say NIS+) namespace does not do what you expect. Contrary to popular myth, NIS+ can be reliable and can scale to large deployments; a number of customers run large NIS+ deployments, and that is without counting the two I'm aware of inside Sun. That said, even I would not advocate setting up a NIS+ namespace now. LDAP is the future and the way to go.

Now back to NIS+. Today's problem was not atypical of the kind of issues you can see with NIS+, and it was also interesting because the SGRT questions, or at least the answers to them, did not immediately lead to a resolution. The problem statement was “New users are not correctly authenticated”: when they logged in, “nisdefaults -p” would say they were “nobody”. Having them run keylogin would then, it was claimed, resolve the issue.

After a bit of questioning it was clear that either I was asking the wrong questions, the answers I was getting were not accurate, or someone had installed a randomizing function into the system. Shared Shell to the rescue. I could now see with my own eyes what was going on and suggest the next command to run without worrying about translation. It became clear that the problem was indeed random: successive calls to “nisdefaults -p” would give different results, and I would hazard a guess, although I did not confirm this, that it affected all the users on all the systems.
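One quick way to make that kind of randomness visible is to run the same command many times and count the distinct answers. This is a minimal sketch in plain POSIX shell, so it should also run under ksh; `tally_runs` is a hypothetical helper, not a Solaris command.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and count how often each
# distinct line of output appears, most frequent first.
# Usage: tally_runs count command [args...]
tally_runs()
{
	n=$1
	shift
	i=0
	while [ "$i" -lt "$n" ]
	do
		"$@"
		i=$((i + 1))
	done | sort | uniq -c | sort -rn
}
```

On an affected client, `tally_runs 20 nisdefaults -p` would be expected to show more than one distinct line (the user's principal on some runs, “nobody” on others) if the answer depends on something other than the user.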

The key to tracking this down is the NIS_OPTIONS environment variable, which lets you see each NIS+ call and its return status and, more interestingly in this case, which server served you:

: estale.eu FSS 6 $; env NIS_OPTIONS="debug_bind debug_calls" nisdefaults -p
nis_list([auth_name=14442,auth_type=LOCAL],cred.org_dir.eu.cte.sun.com., 0x30003, 0x0, 0x0)
binding to directory cred.org_dir.eu.cte.sun.com. (parent first)
bind succeeded
create handle: DG
release otis.cte.sun.com., status = 0
status=Success, 1 object, [z=427, d=363, a=3327, c=4918]
cg13442.eu.cte.sun.com.
: estale.eu FSS 7 $; 
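To watch the binding change from one call to the next, you can reduce that trace to the server named on each “release” line. This is a minimal sketch assuming, from the output above, that the release line names the server that handled the call, and that the trace is written to stderr (hence the 2>&1); `served_by` and `probe_servers` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: read NIS_OPTIONS="debug_bind debug_calls" output on
# stdin and print "server status" for each "release <server>, status = N" line.
served_by()
{
	sed -n 's/^release \([^,]*\), status = \([^ ]*\).*$/\1 \2/p'
}

# Hypothetical wrapper: call nisdefaults -p several times (default 5) and
# show which server was released after each call. Assumes the debug trace
# goes to stderr, so it is merged into the pipe with 2>&1.
probe_servers()
{
	count=${1:-5}
	i=0
	while [ "$i" -lt "$count" ]
	do
		env NIS_OPTIONS="debug_bind debug_calls" nisdefaults -p 2>&1 | served_by
		i=$((i + 1))
	done
}
```

Running `probe_servers 10` on a client prints one “server status” line per release, which makes it easy to see whether the wrong answers line up with one particular server.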

I got lucky with the customer and the problem fell out at the first attempt. They had half-deleted a NIS+ replica server, so it was still listed in the org_dir directory object and was still running rpc.nisd, but it would respond with an error whenever it was called. If you got one of the other NIS+ servers, you were OK. In a way it was a pity to get there so quickly, as I never had the chance to send them this script:
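A directory object lists its master and its replicas, and clients may bind to any of them, so a replica that was only half removed still gets traffic. As a sketch, assuming the `niscat -o` layout that the script's own nawk pattern relies on (a “Master Server” heading, later “Replicate” headings, each followed by a “Name : host” line), this hypothetical `dir_servers` helper labels each listed server by role:

```shell
#!/bin/sh
# Hypothetical helper: read `niscat -o <directory>` output on stdin and print
# "master <host>" or "replica <host>" for each server the object lists.
# Written for POSIX awk; nawk, as used in nis_server, should behave the same.
dir_servers()
{
	awk '
		/Master Server/ { role = "master"; next }
		/Replicate/     { role = "replica"; next }
		role != "" && $1 == "Name" && $2 == ":" { print role, $3 }
	'
}
```

Piping `niscat -o org_dir.eu.cte.sun.com.` into `dir_servers` would be expected to list a half-deleted replica alongside the healthy ones, which is why clients kept binding to it.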

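#!/bin/ksh
# nis_server: run a command once per NIS+ server, forcing the server with
# NIS_OPTIONS="server=...".
#   -d domain  use every server (master and replicas) listed in the
#              directory object for domain
#   -h host    use the named server
#   -v         echo each command line before running it
unset dom
unset host
verbose=0

# Echo the arguments only when -v was given.
vecho()
{
	if [ "$verbose" -eq 1 ]
	then
		echo "$@"
	fi
}

while getopts vd:h: c
do
	case $c in
	d)	dom=$OPTARG ;;
	h)	host=$OPTARG ;;
	v)	verbose=1 ;;
	\?)	echo "USAGE ${0##*/} [-v] [-h host] [-d domain] -- command" >&2
		exit 2 ;;
	esac
done
shift $((OPTIND - 1))

if [ -z "$host" ] && [ -z "$dom" ]
then
	echo "one or both of -h and -d must be used" >&2
	exit 1
fi

if [ -n "$dom" ]
then
	# List the Name of the master server and of every replica after it.
	for server in $(niscat -o "$dom" | nawk '/Master/ { master=1 } /Name/ { if (master==1) print $3 }')
	do
		echo server=$server
		vecho NIS_OPTIONS="server=$server" "$@"
		NIS_OPTIONS="server=$server" "$@"
		x=$?
		if [ $x -ne 0 ]
		then
			# Report the non-zero exit status through niserror.
			niserror $x
		fi
	done
fi

if [ -n "$host" ]
then
	vecho NIS_OPTIONS="server=$host" "$@"
	NIS_OPTIONS="server=$host" "$@"
	x=$?
	if [ $x -ne 0 ]
	then
		niserror $x
	fi
fi
exit 0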

Amongst other things, it runs the same command against each NIS+ server for a directory in turn, which is great when you think something is misbehaving but can't quite put your finger on which server it is.
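Reading a few blocks of identical output by eye is fine; with more servers or longer output, a small filter can flag the odd one out. A minimal sketch, assuming the `-d` output format (a `server=` line followed by that server's output) and that `-v` is not used, since the echoed command lines differ per server; `diff_servers` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read nis_server -d output on stdin (a "server=<host>"
# line followed by that server's output) and name every server whose output
# differs from the first server's. Prints nothing when all servers agree.
diff_servers()
{
	awk '
		/^server=/ { srv = substr($0, 8); order[++n] = srv; next }
		{ out[srv] = out[srv] $0 "\n" }
		END {
			for (i = 2; i <= n; i++)
				if (out[order[i]] != out[order[1]])
					print order[i] " differs from " order[1]
		}
	'
}
```

For example, `./nis_server -d org_dir.eu.cte.sun.com nismatch '[auth_name=14442,auth_type=LOCAL],cred.org_dir' 2>&1 | diff_servers` should stay silent when every server returns the same entry.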


: estale.eu FSS 10 $; ./nis_server -d org_dir.eu.cte.sun.com nismatch [auth_name=14442,auth_type=LOCAL],cred.org_dir
server=otis.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
server=enotty.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
server=pacrim-repzone-eu.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
server=eu-repzone-eu.cte.sun.com.
cg13442.eu.cte.sun.com.:LOCAL:14442:10,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
: estale.eu FSS 11 $; ls -l ./nis_server
-rwxr-x--x   1 cg13442  staff        910 Mar 30  2001 ./nis_server
: estale.eu FSS 12 $; 

It appears that script is seven years old. Again, the problem was not really NIS+ at all but an admin error.

Comments:

This reminds me of a time when someone we both knew claimed every problem they saw in the lab was a NIS+ problem...

Posted by Paul Humphreys on April 11, 2008 at 01:50 AM BST #

One of my happiest days was the day the NIS kit was made available for 2.x!

Posted by Mike Smith on April 11, 2008 at 03:59 AM BST #

Yes, YP lives on and it still sucks. You could write a paper on how not to deliver new technology around the introduction of NIS+. It was doomed from the start, first by being mandatory and second by not being ready for prime time when it was released. This meant all its bugs and idiosyncrasies were thrust onto an unforgiving audience. It only lives on in places that either had a reason to stick with it (Sun Service, as we had to support it) or were security conscious.

Posted by Chris Gerhard on April 11, 2008 at 04:06 AM BST #

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
