Wednesday Mar 17, 2010

Source aware routing, aka "packets go out the right interface", is available in onnv_135.

A follow-up to the problem described in Jim's blog.

Solaris Nevada onnv_135 now provides a way for administrators to ensure "source-aware route selection". What this achieves is the following: consider a system trying to send out a locally originated packet. If there are multiple "longest match" routes for the IP destination of the packet going through different interfaces, and ip_strict_src_multihoming is set to a non-zero value, route selection will give preference to a route going through an interface on which the IP source of the outgoing packet is configured. If no such route is available (i.e., all the available routes go through interfaces that do not have the IP source of the outgoing packet), and ip_strict_src_multihoming is set to 1 (aka "preferred source-aware routing"), or if ip_strict_src_multihoming is set to 0, route selection will pick the next matching route permitted by the applicable ECMP parameters and longest match.

If ip_strict_src_multihoming is set to 2 (aka "strict source multihoming"), the IP source of the outgoing packet MUST be configured on the outgoing interface, so if no such matching route is available, the packet is dropped.
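
The three settings can be summarized in a small model. This is only an illustrative sketch, not the kernel's implementation: the names Route and select_route are hypothetical, the "first candidate" choice stands in for whatever ECMP policy actually applies, and the sketch assumes that only the longest-match routes are considered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Route:
    prefix_len: int   # length of the destination prefix that matched
    interface: str    # outgoing interface for this route


def select_route(routes: List[Route], src_ip: str,
                 addrs: Dict[str, Set[str]], strict_src: int) -> Optional[Route]:
    """Model of ip_strict_src_multihoming = 0, 1 or 2 (hypothetical helper).

    routes: routes that already match the packet's destination.
    addrs:  interface name -> set of IP addresses configured on it.
    Returns the chosen route, or None when the packet would be dropped.
    """
    if not routes:
        return None
    longest = max(r.prefix_len for r in routes)
    candidates = [r for r in routes if r.prefix_len == longest]

    if strict_src != 0:
        # Modes 1 and 2: prefer an interface that owns the source address.
        for r in candidates:
            if src_ip in addrs.get(r.interface, set()):
                return r
        if strict_src == 2:
            return None  # strict mode: no matching interface, drop

    # Mode 0, or mode 1 fallback: "weak" behavior. Picking the first
    # candidate is a placeholder for the real ECMP selection.
    return candidates[0]
```

For instance, with two equal-length routes via bge0 and bge1 and a source address configured only on bge1, modes 1 and 2 pick the bge1 route, while mode 0 ignores the source address altogether.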

What does all this mean for the Administrator?

If you want simple ECMP, with "weak multihoming", set your ip_strict_dst_multihoming and ip_strict_src_multihoming to 0.

If you would like symmetric path selection (i.e., request/response packets go in/out the same interface), or have to ensure that originated packets are not dropped due to downstream ingress filtering, you may choose one of the following settings for transmit behavior:

  • ip_strict_src_multihoming == 1, where the first preference would be for an interface matching the IP source, and if that's not available, the system would fall back to the "weak" behavior.
  • ip_strict_src_multihoming == 2, where routes would only be selected if the outgoing interface has the IP source.


ip_strict_dst_multihoming remains unchanged, and impacts the "receive-side" behavior of the system.

Stay tuned: we expect to be adding some user-friendly tunables for all this through ipadm in the very near future!



Wednesday Mar 26, 2008

Brussels support for nge driver now available in nv86.

Miles Xu and Jason King putback changes yesterday to plug nge into the Brussels framework in SNV, so that snv86 will now have an nge driver that is configurable through Brussels!

Check out Miles's blog describing this new feature!

Thursday Jan 24, 2008

Brussels framework putback to snv_83!

PSARC 2007/429 was putback Jan 24 2008! This putback provides a configuration framework for administering network drivers through the GLDv3 framework in Solaris Nevada. This feature should be available on bfu archives built on/after Jan 25 2008, and in snv_83.

This feature has a few ramifications for network device-driver developers: project teams that propose new network drivers should no longer need to provide ndd(1m) entry points for administering their driver. Instead, such drivers should provide setprop and getprop entry points as defined in PSARC 2007/429.

The putback of PSARC 2007/429 also converts the bge driver to the Brussels framework, so that the recommended method for administering properties of the bge driver is using the dladm(1m) command.

Information about the Brussels framework, including documentation of the driver interfaces for Brussels, can be found at http://opensolaris.org/os/project/brussels

A preview of the draft dladm(1M) man page is also available for those wishing to use the new features introduced by the Brussels/Clearview projects at http://www.opensolaris.org/os/project/clearview/dladm-uv-brussels.1m.txt

What's next?

  • convert more drivers to plug into the Brussels framework: the e1000g and igb drivers are next in the pipeline, with more to come soon,
  • implement the Persistence feature by leveraging the dlmgmtd provided by Project Clearview. This will allow dladm(1m) to be used for persistently setting tunable values, and the setting will automatically be incorporated at the next restart of the driver (or reboot),
  • for legacy drivers, clean up the ioctl code path so that the ndd ioctl goes through the Brussels framework (though we will continue to provide legacy support for existing ndd usage in datalink drivers). Details coming soon!
  • Phase 0 cleanup of ndd abuse/deficiencies in the TCP/IP layer: e.g., ndd(1m) is inappropriately used for debugging kernel data structures when better tools like dtrace and mdb are available, and we have a profusion of ndd commands (e.g., for /dev/rts) that are poorly documented and understood. Shrinking this abuse to the bare essentials will place us in a better position to evaluate alternatives/enhancements to ndd(1m).

Onwards, Upwards!

 


Thursday Oct 11, 2007

Brussels and data link administration

Solaris has a lot of cutting-edge tunable features implemented in its network drivers, but administering these features remains a somewhat chaotic story. Even a frequently tuned property like Jumbo Frame MTU can be quite a bear to administer, as Shantnu recently discovered.

And there are more of these instances. For example, driver writers are frequently confused about the expected semantics for Ethernet flow control in Solaris. Even Solaris engineers are sometimes confounded by the existing definitions.

The good news is that Sun is working on improving these interfaces and bringing in a cleaner administrative interface. The project is Brussels, and its objectives are to pull together the best of the existing methods, so that

  • like ndd, we can configure properties on the fly,
  • like driver.conf, we can configure properties like Jumbo Frames (but without the teeth-gnashing syntax) and have them persist across reboot,
  • we leverage the flexible syntax and other features (like the show- subcommands) introduced in dladm(1m), while making this play nicely with smf,
  • we have a uniform, intuitive syntax for configuring properties (no more head scratching over whether it's link_duplex or link_mode, and if it should be 0 for half-duplex, or 1 for half-duplex!)

The most important requirement to make this successful is feedback from administrators about what you would like to see implemented in Brussels, so please contribute your input! Here are some examples of the improvements being introduced by Brussels.

 For example, Brussels introduces the "show-ether" sub-command in dladm that will allow the administrator to view the status of the ethernet network. In the vanilla invocation,

# dladm show-ether bge1
LINK          PTYPE    STATE    AUTO SPEED-DUPLEX    PAUSE
bge1          current  up       yes  1G-f             tx


This says that the bge1 interface is UP (i.e., the driver is RUNNING) with autonegotiation enabled, speed 1Gbps, full duplex. It also shows that flow control is in the "tx" (Transmit) direction only, i.e., we will send pause frames when congested, but ignore any received pause frames from the peer.

I can get "extended" output:

# dladm show-ether -x bge1
LINK    PTYPE    STATE    AUTO SPEED-DUPLEX          PAUSE
bge1    current  up       yes  1G-f                  tx
--      capable  --       yes  1G-fh,100M-fh,10M-fh  bi
--      adv      --       yes  1G-f                  tx
--      peeradv  --       yes  1G-f                  bi

 
The additional rows tell me the hardware capabilities of the local endpoint ("capable"), those advertised to the peer ("adv") and those advertised by the peer ("peeradv"). Note that the speed-duplex capabilities of bge1 are (1G, full duplex), (1G, half duplex), (100 Mbps, full duplex), (100 Mbps, half duplex), (10 Mbps, full duplex), (10 Mbps, half duplex). Although my bge1 driver is capable of bi-directional flow control, it has been administratively configured to advertise TX only. Since the peer is advertising bi-directional flow control, I try:

 

# dladm set-linkprop -p flowctrl=bad bge1    # I have a syntax error!
dladm: warning: link property 'flowctrl' must be one of: no,rx,tx,bi

So I correct my error, and now have the state:

# dladm set-linkprop -p flowctrl=bi bge1  
# dladm show-ether -x bge1               
LINK    PTYPE    STATE    AUTO SPEED-DUPLEX          PAUSE
bge1    current  up       yes  1G-f                  both
--      capable  --       yes  1G-fh,100M-fh,10M-fh  both
--      adv      --       yes  1G-f                  both
--      peeradv  --       yes  1G-f                  both
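
The SPEED-DUPLEX column in the show-ether output above packs several (speed, duplex) pairs into one compact string such as 1G-fh,100M-fh,10M-fh. As a rough illustration of how to read that notation, here is a small sketch that expands it. The function name expand_speed_duplex is hypothetical and not part of dladm, and the sketch assumes that "f" means full duplex and "h" means half duplex, as the sample output suggests.

```python
from typing import List, Tuple

# Assumed meaning of the duplex flags seen in the sample output.
DUPLEX_NAMES = {"f": "full", "h": "half"}


def expand_speed_duplex(field: str) -> List[Tuple[str, str]]:
    """Expand e.g. '1G-fh,10M-f' into [('1G','full'), ('1G','half'), ('10M','full')]."""
    pairs = []
    for item in field.split(","):
        # Each item looks like '<speed>-<flags>', e.g. '100M-fh'.
        speed, _, flags = item.strip().partition("-")
        for flag in flags:
            pairs.append((speed, DUPLEX_NAMES[flag]))
    return pairs


if __name__ == "__main__":
    for speed, duplex in expand_speed_duplex("1G-fh,100M-fh,10M-fh"):
        print(f"{speed}, {duplex} duplex")
```

Applied to the "capable" row, this yields the six combinations listed earlier, from (1G, full) down to (10M, half).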


We are also adding better observability into the kernel state: see Artem's blog on the new mdb macros that will make it possible to look at a crash dump or a running kernel's mdb state and figure out what customizations (including Private properties, known only to the driver!) have been applied.

Friday Nov 24, 2006

surya delivered into Nevada!

Surya has been delivered into snv, and should be available via SX 11/06!

For my next challenge, I'm going to look at the problem of providing a simple, intuitive interface for driver configuration. The current methods (there is more than one of these!) involve arcane incantations, sometimes done via <driver>.conf files, sometimes via ndd, sometimes via kstat(1m), sometimes via system(4), and, on occasion, through all of these methods! All of this is needlessly complex: dladm(1m) interfaces should be used to provide a standardized interface for driver configuration. Watch this space for details!

 

Thursday May 04, 2006

Introduction

I've been in the kernel networking group at Sun for about 6 years now, working on various aspects of routing (in.routed, zebra/quagga, forwarding table performance). In the past, I have worked in diverse companies on various aspects of networking, including Parametric Technology Corp where I worked on interfaces to SUNRPC libraries for CAD tools, in DEC/Compaq where I was doing kernel networking for Tru64 UNIX (IPv6, iptunnels, 802.3ad, kernel routing), and even briefly at LANL where I worked on PVM/MPI.

My current passion is the Surya project, where Sangeeta Misra and I are attempting to streamline the packet-processing path, while replacing the existing forwarding-table data structures/code with the FreeBSD radix tree. Check out the details in the design document.

My other pastime is to train myself to become a dtrace "power user". I'm putting together a DTrace Networking Cookbook. Contributions of all shapes and sizes are invited!
About

sowmini
