Sun Rays\* and Round Robin
By ThinGuy on Dec 22, 2010
\*While the title of this post is misleading, my intent is to help people find answers via search engines. In reality, this post is about operating systems that cache resolved DNS names and how that caching can affect name resolution. It has nothing to do with Sun Ray, except that Sun Rays are the most widely used client among readers of this blog.
Every once in a while a problem shows up where an application or script installed on a Sun Ray server always connects to the first server in a list of hosts that has been configured for round robin in DNS. Typically, this is noticed in kiosk mode, where something like the Sun Ray Windows Connector is pointed at a DNS name that is configured to round robin (RR) through multiple Windows Server hosts running Terminal/Remote Desktop Services. Sun Ray administrators are often left scratching their heads as to the root cause.
The most common initial reaction is that the DNS server is set up incorrectly (I'll plead guilty to having had the same reaction). It is true that some DNS servers can be configured to return the same IP on the initial DNS query. This is called cyclic ordering, and it results in the first IP address of the resolved name being identical on every client. For example, if a DNS server is set for cyclic ordering, every first request for the IP address of xyz.com will return that of "Server A" on all clients. The second query from the same client for xyz.com should result in "Server B" being resolved (if not, then you do have a DNS problem). However, in the case of a Sun Ray server, the "second query from the same client" would come from the second Sun Ray session. As far as the DNS server is concerned, all the Sun Ray sessions (at least those on the same server) are coming from the same client. Therefore, even with cyclic ordering, round robin should work as expected (or even better) in a Sun Ray environment as compared to a "fat" client environment. You can check the documentation for specific DNS servers on how to set a random order for round robin (which should be the default), but know that chasing down cyclic ordering of RR entries as a cause of this "issue" is basically a red herring in a Sun Ray environment.
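If you want to see for yourself whether the DNS server or the local host is the one pinning you to a single address, a quick comparison helps. The following is only a rough sketch: `tally_first` is a helper name I made up, xyz.com is the same placeholder used above, and it assumes `dig` and `getent` are installed on the Sun Ray server.

```shell
#!/bin/sh
# Tally which address comes back first over repeated lookups.
# Usage: tally_first <count> <lookup-command> [args...]
# (tally_first is a hypothetical helper, not part of any product.)
tally_first() {
    count=$1; shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Keep only the first field of the first line of each answer.
        # If the name is a CNAME, dig +short prints the alias target first.
        "$@" | awk 'NR == 1 { print $1 }'
        i=$((i + 1))
    done | sort | uniq -c
}

# Ask the DNS server directly (does not go through nscd):
#   tally_first 10 dig +short xyz.com
# Ask through the OS name service (served from nscd if it is caching hosts):
#   tally_first 10 getent hosts xyz.com
```

If the `dig` tally is spread across several addresses while the `getent` tally shows a single one, the DNS server is doing its job and the answer is being pinned locally.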
The most likely culprit in this scenario is a side effect of the name service caching daemon (nscd) that is active on the operating system where Sun Ray Server Software (SRSS) is installed. The job of nscd is to speed up name resolution for a lot of things: hosts, groups, and, in the case of Solaris, even things like user attributes (e.g. RBAC). However, we are primarily concerned with the caching of host IP addresses.
The truth is that nscd usually breaks the round robin scheme, because it caches the first server returned by the first query from the "caller" (i.e. the host doing the name lookup, in this case the Sun Ray server), and all applications and scripts on the caller will use that address for the lifetime of the cache entry (TTL). The default TTL of the nscd cache on Solaris and Linux is 3600 seconds (an hour). If you think that TTL is long, consider that the similar mechanism on Windows uses 86,400 seconds (1 day).
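To check what your own server is actually configured to do, you can pull the active hosts directives out of nscd.conf. This is a minimal sketch: `show_hosts_cache` is a made-up helper name, and it assumes the `enable-cache`, `positive-time-to-live`, and `negative-time-to-live` directives described in the nscd.conf man page.

```shell
#!/bin/sh
# Print the hosts-cache directives that are active (not commented out)
# in an nscd.conf file. Defaults to /etc/nscd.conf.
# (show_hosts_cache is a hypothetical helper name.)
show_hosts_cache() {
    conf=${1:-/etc/nscd.conf}
    grep -E '^[[:space:]]*(enable-cache|positive-time-to-live|negative-time-to-live)[[:space:]]+hosts([[:space:]]|$)' "$conf"
}

# Example:
#   show_hosts_cache
#   show_hosts_cache /path/to/a/copy/of/nscd.conf
```

If nothing is printed, no hosts directive is set explicitly and nscd is running with its built-in defaults for hosts.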
On Solaris, to disable the caching of hosts, edit /etc/nscd.conf on every Sun Ray server in the host group (aka FOG) and uncomment the line that reads "enable-cache hosts no". To completely disable the name service caching daemon on Solaris 10, run the following command on every Sun Ray server in the host group (aka FOG):
svcadm disable system/name-service-cache
On Linux, to disable the caching of hosts, edit /etc/nscd.conf on every Sun Ray server in the host group (aka FOG) and change "enable-cache hosts yes" to "enable-cache hosts no". Then restart nscd:
/etc/init.d/nscd restart
Note that utilities vary per distro. For instance, on Oracle Linux or Red Hat you could run
authconfig --disablecache
to completely disable nscd.
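If you have more than a couple of servers in the FOG, you may prefer to script the nscd.conf edit. Here is a rough sketch that covers both cases above (a commented-out "no" line and an active "yes" line). `disable_hosts_cache` is a name I made up, and the sketch assumes a sed that understands POSIX character classes, so on Solaris you may need /usr/xpg4/bin/sed. It reads the file on stdin and writes the result to stdout, so nothing is changed until you redirect it yourself.

```shell
#!/bin/sh
# Rewrite nscd.conf text from stdin so that hosts caching is off.
# Any "enable-cache hosts <value>" line, commented out or not,
# becomes an active "enable-cache hosts no" line.
# (disable_hosts_cache is a hypothetical helper name.)
disable_hosts_cache() {
    sed 's/^[#[:space:]]*enable-cache[[:space:]]\{1,\}hosts[[:space:]]\{1,\}[a-z]\{1,\}[[:space:]]*$/enable-cache hosts no/'
}

# Example, keeping a backup of the original file:
#   cp /etc/nscd.conf /etc/nscd.conf.bak
#   disable_hosts_cache < /etc/nscd.conf.bak > /etc/nscd.conf
# then restart nscd as described above.
```

Lines for other caches (passwd, group, and so on) pass through untouched.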
On Windows... OK, I'll include it for fun even though SRSS does not run on Windows. Every release of Windows since Windows 2000 has an nscd-like feature called "dnscache". To permanently disable dnscache, use the Services control panel applet to set the 'Windows DNS Client' service to disabled. Note that this service may also appear as "Dnscache". You can stop caching temporarily, until the next reboot, with
net stop dnscache
If the caching of host names is still desirable on your platform, yet you'd like to shorten the cache life to something that better suits your needs, that is possible too. Just read the man pages for nscd, though I'm sure that if you made the edits above you've already figured out the cache TTL. For Windows, google "windows client dns cache". Normally I'd just post a link to a TechNet article, but in my experience, Microsoft has been about as reliable with "permalinks" on support articles as a 20-year-old Yugo on a cross-country road trip (or a five-year-old PC having Windows 7 drivers available).
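For nscd, the setting in question is the `positive-time-to-live` line for hosts in /etc/nscd.conf. As a hedged sketch (`set_hosts_ttl` is a made-up helper name, the 60 seconds in the example is just an illustration, and a sed with POSIX character classes is assumed), the edit could be scripted like this:

```shell
#!/bin/sh
# Rewrite nscd.conf text from stdin so that the active
# "positive-time-to-live hosts <seconds>" line uses the given TTL.
# (set_hosts_ttl is a hypothetical helper name.)
set_hosts_ttl() {
    ttl=$1
    sed 's/^[[:space:]]*positive-time-to-live[[:space:]]\{1,\}hosts[[:space:]]\{1,\}[0-9]\{1,\}[[:space:]]*$/positive-time-to-live hosts '"$ttl"'/'
}

# Example: cache host lookups for 60 seconds instead of an hour.
#   cp /etc/nscd.conf /etc/nscd.conf.bak
#   set_hosts_ttl 60 < /etc/nscd.conf.bak > /etc/nscd.conf
# then restart nscd so the new value takes effect.
```

A shorter TTL keeps some of the benefit of caching while letting sessions rotate through the round robin entries more often.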
For those running Sun Ray Server Software on Solaris, there's one final step. An über smart colleague (Ed. Note: I just love that he is listed as a "Newbie") mentioned the possibility of another problem
that could exist in a Solaris environment, even if nscd is disabled. This has to do with hosts on the same
subnet as the caller and the default behavior of the
Solaris resolver library. By default, the Solaris resolver library puts servers that reside on the same subnet
as the caller at the top of the sort order. You can read the notes in
the Solaris man pages for "gethostbyname" and nss(4) for
more information, but both reference how this behavior can break round
robin. The fix is to edit
/etc/default/nss and uncomment (or add) the
Making these changes should allow sessions on your Sun Ray server to behave as one would expect with round robin DNS entries.
Happy Holidays and Happy