Sun Rays\* and Round Robin

\*While the title of this blog is misleading, know that my intent is to help people find answers via search engines.  In reality, this post is about using operating systems that perform caching of resolved DNS names and how it can affect name resolution.  Nothing to do with Sun Ray, except the fact that they are the most widely used client for readers of this blog.

Every once in a while a problem shows up where an application or script installed on a Sun Ray server always connects to the first server in a list of hosts that has been configured for round robin in DNS.  Typically, this is noticed in kiosk mode where something like the Sun Ray Windows Connector is pointed at a DNS name that is configured to round robin (RR) through multiple Windows Servers hosts running Terminal/Remote Desktop Services.  Sun Ray administrators are often left scratching their heads as to the root cause.

The most common initial reaction is that the DNS server is setup incorrectly (I'll plead guilty that I've had the same reaction).  It is true that some DNS servers can be configured will return the same IP on the initial DNS query.  This is called a cyclic ordering and  results in the first IP address of the resolved name on every client to be identical.   For example, if a DNS server is set for cyclic ordering, every first request for the IP address of xyz.com will be that of "Server A" on all clients. The second query from the same client for xyz.com should result in "Server B" being resolved (if not, then you do have a DNS problem).  However, in the case of Sun Ray Server, the "second query from the same client" would be from the second Sun Ray session.  As far as the DNS server is concerned, all the Sun Ray sessions (at least those on the same server) are coming from the same client.  Therefore, even with cyclic ordering, round robin should work as expected (or even better) in a Sun Ray environment as compared to a "fat" client environment. You can check the documentation for specific DNS servers on how to set a random order for round robin (which should be the default), but know that chasing down cyclic ordering of RR entries as a cause to this "issue" is basically a red herring in a Sun Ray environment.

The most likely culprit in this scenario is a side effect of the name service caching daemon (nscd) that is active on the operating system where Sun Ray Server Software (SRSS) is installed.  The job of nscd is to speed up name resolution of a lot of things; hosts, groups, and in the case of Solaris, even things like user attributes (i.e. RBAC).  However, we are primarily concerned about the caching of host IP addresses.

The truth is that nscd usually breaks round robin scheme as it caches the first server returned by the first query from the "caller" (i.e. the host doing the name lookup, aka in this case the Sun Ray server) and all applications/scripts on the caller will use that address for the lifetime of the cache (TTL). The default TTL of the nscd cache on Solaris and Linux is 3600 seconds (an hour).  If you think that TTL is long, consider the similar mechanism on Windows is 86,400 seconds (1 day). 

On Solaris to disable the caching of hosts, edit /etc/nscd.conf on every Sun Ray server in the host group (aka FOG) and uncomment the line that reads "enable-cache        hosts           no".  To completely disable the name service caching daemon for Solaris 10, run the following command on every Sun Ray server in the host group (aka FOG): svcadm disable system/name-service-cache

On Linux to disable the caching of hosts, edit /etc/nscd.conf on every Sun Ray server in the host group (aka FOG) and change "enable-cache        hosts           yes"  to "enable-cache        hosts           no".   Then restart nscd /etc/init.d/nscd restart  Note that utilities per distro can vary, for instance on Oracle Linux or Red Hat you could run authconfig --disablecache to completely disable nscd.

On Windows...Ok, I'll include it for fun even though SRSS does not run on Windows.  Every release of Windows after Windows 2000 has an nscd like feature called "dnscache".  To permanently disable dnscache, use the Services control panel applet to set the 'Windows DNS Client' service to disabled.  Note that this service may also appear as "Dnscache."  You can stop caching temporarily until next reboot with net stop dnscache

If the caching of host names is still desirable on your platform, yet you'd like to shorten the cache life to something that more suits your needs, that is possible too.  Just read the man pages for nscd, though I'm sure if you made the edits above you've figured the cache TTL already.  For Windows, google "windows client dns cache".  Normally I'd just to post a link to technet article, but in my experience, Microsoft has been about as reliable with "permalinks" on support articles as a 20 year old Yugo would be on cross country road trips five year old PC having Windows 7 drivers available.

For those running Sun Ray Server Software on Solaris, there's one final step.  An über smart colleague (Ed. Note: I just love that he is listed as a "Newbie") mentioned the possibility of another problem that could exist in a Solaris environment, even if nscd is disabled.  This has to do with hosts on the same subnet as the caller and the default behavior of the Solaris resolver library.  By default, the Solaris resolver library puts servers that reside on the same subnet as the caller at the top of the sort order. You can read the notes in the Solaris man pages for "gethostbyname" and  nss(4) for more information, but both reference how this behavior can break round robin. The fix is to edit /etc/default/nss and uncomment (or add) the statement SORT_ADDRS=FALSE

Following these changes should allow sessions on your Sun Ray Server to behave as one would expect with round robin DNS entries. 

Happy Holidays and Happy computing resolving!

Comments:

Within an SRSS environment what is the best practice for NSCD? Typically I run into lots of Solaris SRSS installs that utilize window DNS. Currently I have DNS caching set to the default. I am just curious to get your thoughts.Thanks Rob

Posted by Rob Jaudon on January 05, 2011 at 07:26 AM PST #

Hi Rob,
If in your environment round robin is used a lot of load balance hosts, I'd recommended disabling at least caching of host names and making the change to nss mentioned above. Generally I work on Kiosk based systems so the other benefits of nscd are (at least to my eye) unseen, so I'm comfortable just turning it off. Although, I'd much rather see customers use some form of load balancing methods that ensures hosts are reachable, which of course RR does not.

Posted by ThinGuy on January 07, 2011 at 10:17 PM PST #

Post a Comment:
Comments are closed for this entry.
About

Think Thin is a collection of bloggers that work with Oracle's Virtual Desktop portfolio of products.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today