Oracle Linux, virtualization , Enterprise and Cloud Management Cloud technology musings

  • July 24, 2013

The Ksplice differentiator

It's been exactly two years since we acquired a small startup called Ksplice. Ksplice as a company created the zero downtime update technology for the Linux kernel and they provided a service to their customers which tracked Linux kernel security fixes and providing these fixes as zero downtime Ksplice updates.

Essentially the ksplice technology allows us to create Linux kernel patches that can be applied in an online fashion. We are not talking about the ability to install a patch while the system is running and make it active after reboot. We are talking about a running system with a given kernel being patched and this patch becoming active instantly, without the need for a costly reboot (costly in terms of downtime caused by a reboot that has to be scheduled or coordinated and causing systems and applications to be unavailable during this time).

We offer this service as part of Oracle Linux Premier (and premier-limited) support, there's no extra $$ add-on option for this, anyone with Premier/Premier-limited has full access to this service. We support both the, what we call, Red Hat Compatible Kernel(RHCK) and the Unbreakable Enterprise Kernel (UEK). So whether a customer starts from RHEL, from OL with RHCK or OL with UEK, they're covered.

Essentially, when we release security errata for Oracle Linux, specifically for the Linux kernel itself, we release, as usual, a new kernel RPM and customers can just apply this RPM, reboot the server and they have the errata applied/active. Or, if they install the Ksplice update, then we provide what we call Ksplice zero downtime patches for each of the security fixes and they can then be applied to their running systems without reboot and the fixes are active/effective immediately. This can be done while production applications continue to run, at any point during the day or night, no need to bring applications down or do any specific planning.

Aside from continuing the model of providing a service for security updates, we have since done a few additional things for our customers :

1) We integrated the Ksplice portal with the Unbreakable Linux Network (ULN). When a customer logs into ULN they can generate a Ksplice key or when they run up2date to register a system they can automatically set it up to be enabled for ksplice tools. So there is no longer a need to have a separate registration, one with ULN and one with the Ksplice update server. We now do this behind the scenes. The customer can then install the uptrack tools (these are the tools that download and apply the updates) and be ready to use apply the updates. You can read more about that

2) While the above made it very convenient, it still meant that every server had to connect directly to servers hosted by us (oracle.com) and for many customers, it's very difficult to have servers in a datacenter be directly connected or have direct access to the internet. So to help these customers we created the offline client. A customer can create a local yum repository which contains RPMs that contain the ksplice updates for a given kernel and then distribute the updates locally within the company from a server on the intranet. This makes it easier to have one system that is registered with our system instead of having to register each server individually. You can read more about that here.

Now - there is another fundamental use-case for the Ksplice technology that we have incorporated in the Oracle Linux support service.

We obviously have a large and rapidly growing customer base that runs mission critical systems on Oracle Linux. Database servers running on top of Oracle Linux, WebLogic servers, Oracle Applications,... etc.. in order to provide the very best customer experience we have trained our engineers on the Oracle Linux side to make use of Ksplice technology in the case of gathering diagnostics and even fix specific problems.

Imagine a server that cannot have downtime but we are working on diagnosing a problem. In some cases, we would typically create a kernel with some extra debug or diagnostics code, provide this to the customer and then they schedule a reboot to apply this kernel, they run their system, after gathering the data, they re-install the original kernel and continue. Then, if we find out the issue and have a fix for this, we can provide them with a kernel that has the fix, they have to schedule downtime, apply the fix and reboot. Typically these systems are interconnected. What do I mean by that? Typically a database server has an application frontend, they are multitiered environments, you have, for instance, 3 middle tier servers connecting to a database server. So in order to reboot the database server, the application admin has to first schedule downtime for the app, bring down the app, and then the database admin can bring down the database and sysadmin can reboot the server. Yes it's that complex. So if you have to do three reboots, you can imagine the cost of that and the time impact. It's not just about a quick reboot of the server, there's a whole ecosystem that goes along with this.

In our case, when a customer has a critical production ticket open with us for their database server, we can do the following :

1) if we cannot get diagnostics without adding code to the Linux kernel, we can create a Ksplice update for the diagnosics and provide this Ksplice update. The customer can then apply this update onto their production system, without any downtime whatsoever. (1 reboot saved on the backend, and saved an application shutdown for each app)

2) once we gather the diagnostics, we can ask the customer to UNDO this Ksplice update, without downtime, the technology supports applying and removing patches online. (2 reboots saved and again saved an application shutdown for each app)

3) if we then determine what the problem is, and we come up with a patch/bugfix, we can create a Ksplice update for this fix and let the customer apply this on their production system, again, without downtime, without any additional work. (3 reboots saved and yet again saved an application shutdown for each app)

This is service is only provided in critical situations. We apply this model internally (1) on our own production system that run Oracle Linux (2) to provide fixes to customers running engineered systems like Exadata and Exalogic.

Another thing that is important to point out is that you do not have to reboot your system in order to start using this service. An existing environment that's up and running can be made Ksplice ready by just installing a few additional tools and no reboot needed. Once the tools are installed they can be used to apply the updates to your current running system, so even prepping the server is without downtime.

Some customers have procedures in place that apply security updates on a regular basis and as such they don't always want or need to be current when a security update is released, so they might have less use for the security update service but certainly still very much can rely on the latter model of fixing critical issues. Other customers have very strict requirements in patching vulnerabilities as soon as possible, for them, the service we offer by releasing security errata both as a separate kernel RPM and a Ksplice update, this service is just absolutely invaluable. They are released at the same time, so you can be constantly up to date without worry.

Let me give you an example using Oracle Linux 5 :

I went to the extreme and installed Oracle Linux 5 update 4 (released 9-3-2009). Installed the Ksplice uptrack tools and without doing any reboot, I can now start updating my system using uptrack-upgrade.

A timed 50 seconds later, the following errata updates were applied on this running system (without any impact on any running applications) :

Installing [v5267zuo] Clear garbage data on the kernel stack when handling signals.
Installing [u4puutmx] CVE-2009-2849: NULL pointer dereference in md.
Installing [302jzohc] CVE-2009-3286: Incorrect permissions check in NFSv4.
Installing [k6oev8o2] CVE-2009-3228: Information leaks in networking systems.
Installing [tvbl43gm] CVE-2009-3613: Remote denial of service in r8169 driver.
Installing [690q6ok1] CVE-2009-2908: NULL pointer dereference in eCryptfs.
Installing [ijp9g555] CVE-2009-3547: NULL pointer dereference opening pipes.
Installing [1ala9dhk] CVE-2009-2695: SELinux does not enforce mmap_min_addr sysctl.
Installing [5fq3svyl] CVE-2009-3621: Denial of service shutting down abstract-namespace sockets.
Installing [bjdsctfo] CVE-2009-3620: NULL pointer dereference in ATI Rage 128 driver.
Installing [lzvczyai] CVE-2009-3726: NFSv4: Denial of Service in NFS client.
Installing [25vdhdv7] CVE-2009-3612: Information leak in the netlink subsystem.
Installing [wmkvlobl] CVE-2007-4567: Remote denial of service in IPv6
Installing [ejk1k20m] CVE-2009-4538: Denial of service in e1000e driver.
Installing [c5das3zq] CVE-2009-4537: Buffer underflow in r8169 driver.
Installing [issxhwza] CVE-2009-4536: Denial of service in e1000 driver.
Installing [kyibbr3e] CVE-2009-4141: Local privilege escalation in fasync_helper().
Installing [jfp36tzw] CVE-2009-3080: Privilege Escalation in GDT driver.
Installing [4746ikud] CVE-2009-4021: Denial of service in fuse_direct_io.
Installing [234ls00d] CVE-2009-4020: Buffer overflow mounting corrupted hfs filesystem.
Installing [ffi8v0vl] CVE-2009-4272: Remote DOS vulnerabilities in routing hash table.
Installing [fesxf892] CVE-2006-6304: Rewrite attack flaw in do_coredump.
Installing [43o4k8ow] CVE-2009-4138: NULL pointer dereference flaw in firewire-ohci driver.
Installing [9xzs9dxx] Kernel panic in do_wp_page under heavy I/O load.
Installing [qdlkztzx] Kernel crash forwarding network traffic.
Installing [ufo0resg] CVE-2010-0437: NULL pointer dereference in ip6_dst_lookup_tail.
Installing [490guso5] CVE-2010-0007: Missing capabilities check in ebtables module.
Installing [zwn5ija2] CVE-2010-0415: Information Leak in sys_move_pages
Installing [n8227iv2] CVE-2009-4308: NULL pointer dereference in ext4 decoding EROFS w/o a journal.
Installing [988ux06h] CVE-2009-4307: Divide-by-zero mounting an ext4 filesystem.
Installing [2jp2pio6] CVE-2010-0727: Denial of Service in GFS2 locking.
Installing [xem0m4sg] Floating point state corruption after signal.
Installing [bkwy53ji] CVE-2010-1085: Divide-by-zero in Intel HDA driver.
Installing [3ulklysv] CVE-2010-0307: Denial of service on amd64
Installing [jda1w8ml] CVE-2010-1436: Privilege escalation in GFS2 server
Installing [trws48lp] CVE-2010-1087: Oops when truncating a file in NFS
Installing [ij72ubb6] CVE-2010-1088: Privilege escalation with automount symlinks
Installing [gmqqylxv] CVE-2010-1187: Denial of service in TIPC
Installing [3a24ltr0] CVE-2010-0291: Multiple denial of service bugs in mmap and mremap
Installing [7mm0u6cz] CVE-2010-1173: Remote denial of service in SCTP
Installing [fd1x4988] CVE-2010-0622: Privilege escalation by futex corruption
Installing [l5qljcxc] CVE-2010-1437: Privilege escalation in key management
Installing [xs69oy0y] CVE-2010-1641: Permission check bypass in GFS2
Installing [lgmry5fa] CVE-2010-1084: Privilege escalation in Bluetooth subsystem.
Installing [j7m6cafl] CVE-2010-2248: Remote denial of service in CIFS client.
Installing [avqwduk3] CVE-2010-2524: False CIFS mount via DNS cache poisoning.
Installing [6qplreu2] CVE-2010-2521: Remote buffer overflow in NFSv4 server.
Installing [5ohnc2ho] CVE-2010-2226: Read access to write-only files in XFS filesystem.
Installing [i5ax6hf4] CVE-2010-2240: Privilege escalation vulnerability in memory management.
Installing [50ydcp2k] CVE-2010-3081: Privilege escalation through stack underflow in compat.
Installing [59car2zc] CVE-2010-2798: Denial of service in GFS2.
Installing [dqjlyw67] CVE-2010-2492: Privilege Escalation in eCryptfs.
Installing [5mgd1si0] Improved fix to CVE-2010-1173.
Installing [qr5isvgk] CVE-2010-3015: Integer overflow in ext4 filesystem.
Installing [sxeo6c33] CVE-2010-1083: Information leak in USB implementation.
Installing [mzgdwuwp] CVE-2010-2942: Information leaks in traffic control dump structures.
Installing [19jigi5v] CVE-2010-3904: Local privilege escalation vulnerability in RDS sockets.
Installing [rg7pe3n8] CVE-2010-3067: Information leak in sys_io_submit.
Installing [n3tg4mky] CVE-2010-3078: Information leak in xfs_ioc_fsgetxattr.
Installing [s2y6oq9n] CVE-2010-3086: Denial of Service in futex atomic operations.
Installing [9subq5sx] CVE-2010-3477: Information leak in tcf_act_police_dump.
Installing [x8q709jt] CVE-2010-2963: Kernel memory overwrite in VIDIOCSMICROCODE.
Installing [ff1wrijq] Buffer overflow in icmpmsg_put.
Installing [4iixzl59] CVE-2010-3432: Remote denial of service vulnerability in SCTP.
Installing [7oqt6tqc] CVE-2010-3442: Heap corruption vulnerability in ALSA core.
Installing [ittquyax] CVE-2010-3865: Integer overflow in RDS rdma page counting.
Installing [0bpdua1b] CVE-2010-3876: Kernel information leak in packet subsystem.
Installing [ugjt4w1r] CVE-2010-4083: Kernel information leak in semctl syscall.
Installing [n9l81s9q] CVE-2010-4248: Race condition in __exit_signal with multithreaded exec.
Installing [68zq0p4d] CVE-2010-4242: NULL pointer dereference in Bluetooth HCI UART driver.
Installing [cggc9uy2] CVE-2010-4157: Memory corruption in Intel/ICP RAID driver.
Installing [f5ble6od] CVE-2010-3880: Logic error in INET_DIAG bytecode auditing.
Installing [gwuiufjq] CVE-2010-3858: Denial of service vulnerability with large argument lists.
Installing [usukkznh] Mitigate denial of service attacks with large argument lists.
Installing [5tq2ob60] CVE-2010-4161: Deadlock in socket queue subsystem.
Installing [oz6k77bm] CVE-2010-3859: Heap overflow vulnerability in TIPC protocol.
Installing [uzil3ohn] CVE-2010-3296: Kernel information leak in cxgb driver.
Installing [wr9nr8zt] CVE-2010-3877: Kernel information leak in tipc driver.
Installing [5wrnhakw] CVE-2010-4073: Kernel information leaks in ipc compat subsystem.
Installing [hnbz3ppf] Integer overflow in sys_remap_file_pages.
Installing [oxczcczj] CVE-2010-4258: Failure to revert address limit override after oops.
Installing [t44v13q4] CVE-2010-4075: Kernel information leak in serial core.
Installing [8p4jsino] CVE-2010-4080 and CVE-2010-4081: Information leaks in sound drivers.
Installing [3raind7m] CVE-2010-4243: Denial of service due to wrong execve memory accounting.
Installing [od2bcdwj] CVE-2010-4158: Kernel information leak in socket filters.
Installing [zbxtr4my] CVE-2010-4526: Remote denial of service vulnerability in SCTP.
Installing [mscc8dnf] CVE-2010-4655: Information leak in ethtool_get_regs.
Installing [8r9231h7] CVE-2010-4249: Local denial of service vulnerability in UNIX sockets.
Installing [2lhgep6i] Panic in kfree() due to race condition in acpi_bus_receive_event.
Installing [uaypv955] Fix connection timeouts due to shrinking tcp window with window scaling.
Installing [7klbps5h] CVE-2010-1188: Use after free bug in tcp_rcv_state_process.
Installing [u340317o] CVE-2011-1478: NULL dereference in GRO with promiscuous mode.
Installing [ttqhpxux] CVE-2010-4346: mmap_min_addr bypass in install_special_mapping.
Installing [ifgdet83] Use-after-free in MPT driver.
Installing [2n7dcbk9] CVE-2011-1010: Denial of service parsing malformed Mac OS partition tables.
Installing [cy964b8w] CVE-2011-1090: Denial of Service in NFSv4 client.
Installing [6e28ii3e] CVE-2011-1079: Missing validation in bnep_sock_ioctl.
Installing [gw5pjusn] CVE-2011-1093: Remote Denial of Service in DCCP.
Installing [23obo960] CVE-2011-0726: Information leak in /proc/[pid]/stat.
Installing [pbxuj96b] CVE-2011-1080, CVE-2011-1170, CVE-2011-1171, CVE-2011-1172: Information leaks in netfilter.
Installing [9oepi0rc] Buffer overflow in iptables CLUSTERIP target.
Installing [nguvvw6h] CVE-2011-1163: Kernel information leak parsing malformed OSF partition tables.
Installing [8v9d3ton] USB Audio regression introduced by CVE-2010-1083 fix.
Installing [jz43fdgc] Denial of service in NFS server via reference count leak.
Installing [h860edrq] Fix a packet flood when initializing a bridge device without STP.
Installing [3xcb5ffu] CVE-2011-1577: Missing boundary checks in GPT partition handling.
Installing [wvcxkbxq] CVE-2011-1078: Information leak in Bluetooth sco.
Installing [n5a8jgv9] CVE-2011-1494, CVE-2011-1495: Privilege escalation in LSI MPT Fusion SAS 2.0 driver.
Installing [3t5fgeqc] CVE-2011-1576: Denial of service with VLAN packets and GRO.
Installing [qsvqaynq] CVE-2011-0711: Information leak in XFS filesystem.
Installing [m1egxmrj] CVE-2011-1573: Remote denial of service in SCTP.
Installing [fexakgig] CVE-2011-1776: Missing validation for GPT partitions.
Installing [rrnm0hzm] CVE-2011-0695: Remote denial of service in InfiniBand setup.
Installing [c50ijj1f] CVE-2010-4649, CVE-2011-1044: Buffer overflow in InfiniBand uverb handling.
Installing [eywxeqve] CVE-2011-1745, CVE-2011-2022: Privilege escalation in AGP subsystem.
Installing [u83h3kej] CVE-2011-1746: Integer overflow in agp_allocate_memory.
Installing [kcmghb3m] CVE-2011-1593: Denial of service in next_pidmap.
Installing [s113zod3] CVE-2011-1182: Missing validation check in signals implementation.
Installing [2xn5hnvr] CVE-2011-2213: Denial of service in inet_diag_bc_audit.
Installing [fznr6cbr] CVE-2011-2492: Information leak in bluetooth implementation.
Installing [nzhpmyaa] CVE-2011-2525: Denial of Service in packet scheduler API
Installing [djng1uvs] CVE-2011-2482: Remote denial of service vulnerability in SCTP.
Installing [mbg8auhk] CVE-2011-2495: Information leak in /proc/PID/io.
Installing [ofrder8l] Hangs using direct I/O with XFS filesystem.
Installing [tqkgmwz7] CVE-2011-2491: Local denial of service in NLM subsystem.
Installing [wkw7j4ov] CVE-2011-1160: Information leak in tpm driver.
Installing [1f4r424i] CVE-2011-1585: Authentication bypass in CIFS.
Installing [kr0lofug] CVE-2011-2484: Denial of service in taskstats subsystem.
Installing [zm5fxh2c] CVE-2011-2496: Local denial of service in mremap().
Installing [4f8zud01] CVE-2009-4067: Buffer overflow in Auerswald usb driver.
Installing [qgzezhlj] CVE-2011-2695: Off-by-one errors in the ext4 filesystem.
Installing [fy2peril] CVE-2011-2699: Predictable IPv6 fragment identification numbers.
Installing [idapn9ej] CVE-2011-2723: Remote denial of service vulnerability in gro.
Installing [i1q0saw7] CVE-2011-1833: Information disclosure in eCryptfs.
Installing [uqv087lb] CVE-2011-3191: Memory corruption in CIFSFindNext.
Installing [drz5ixw2] CVE-2011-3209: Denial of Service in clock implementation.
Installing [2zawfk0b] CVE-2011-3188: Weak TCP sequence number generation.
Installing [7gkvlyfi] CVE-2011-3363: Remote denial of service in cifs_mount.
Installing [8einfy3y] CVE-2011-4110: Null pointer dereference in key subsystem.
Installing [w9l57w7p] CVE-2011-1162: Information leak in TPM driver.
Installing [hl96s86z] CVE-2011-2494: Information leak in task/process statistics.
Installing [5vsbttwa] CVE-2011-2203: Null pointer dereference mounting HFS filesystems.
Installing [ycoswcar] CVE-2011-4077: Buffer overflow in xfs_readlink.
Installing [rw8qiogc] CVE-2011-4132: Denial of service in Journaling Block Device layer.
Installing [erniwich] CVE-2011-4330: Buffer overflow in HFS file name translation logic.
Installing [q6rd6uku] CVE-2011-4324: Denial of service vulnerability in NFSv4.
Installing [vryc0xqm] CVE-2011-4325: Denial of service in NFS direct-io.
Installing [keb8azcn] CVE-2011-4348: Socket locking race in SCTP.
Installing [yvevd42a] CVE-2011-1020, CVE-2011-3637: Information leak, DoS in /proc.
Installing [thzrtiaw] CVE-2011-4086: Denial of service in journaling block device.
Installing [y5efh27f] CVE-2012-0028: Privilege escalation in user-space futexes.
Installing [wxdx4x4i] CVE-2011-3638: Disk layout corruption bug in ext4 filesystem.
Installing [cd2g2hvz] CVE-2011-4127: KVM privilege escalation through insufficient validation in SG_IO ioctl.
Installing [aqo49k28] CVE-2011-1083: Algorithmic denial of service in epoll.
Installing [uknrp2eo] Denial of service in filesystem unmounting.
Installing [97u6urvt] Soft lockup in USB ACM driver.
Installing [01uynm3o] CVE-2012-1583: use-after-free in IPv6 tunneling.
Installing [loizuvxu] Kernel crash in Ethernet bridging netfilter module.
Installing [yc146ytc] Unresponsive I/O using QLA2XXX driver.
Installing [t92tukl1] CVE-2012-2136: Privilege escalation in TUN/TAP virtual device.
Installing [aldzpxho] CVE-2012-3375: Denial of service due to epoll resource leak in error path.
Installing [bvoz27gv] Arithmetic overflow in clock source calculations.
Installing [lzwurn1u] ext4 filesystem corruption on fallocate.
Installing [o9b62qf6] CVE-2012-2313: Privilege escalation in the dl2k NIC.
Installing [9do532u6] Kernel panic when overcommiting memory with NFSd.
Installing [zf95qrnx] CVE-2012-2319: Buffer overflow mounting corrupted hfs filesystem.
Installing [fx2rxv2q] CVE-2012-3430: kernel information leak in RDS sockets.
Installing [wo638apk] CVE-2012-2100: Divide-by-zero mounting an ext4 filesystem.
Installing [ivl1wsvt] CVE-2012-2372: Denial of service in Reliable Datagram Sockets protocol.
Installing [xl2q6gwk] CVE-2012-3552: Denial-of-service in IP options handling.
Installing [l093jvcl] Kernel panic in SMB extended attributes.
Installing [qlzoyvty] Kernel panic in ext3 indirect blocks.
Installing [8lj9n3i6] CVE-2012-1568: A predictable base address with shared libraries and ASLR.
Installing [qn1rqea3] CVE-2012-4444: Prohibit reassembling IPv6 fragments when some data overlaps.
Installing [wed7w5th] CVE-2012-3400: Buffer overflow in UDF parsing.
Installing [n2dqx9n3] CVE-2013-0268: /dev/cpu/*/msr local privilege escalation.
Installing [p8oacpis] CVE-2013-0871: Privilege escalation in PTRACE_SETREGS.
Installing [cbdr6azh] CVE-2012-6537: Kernel information leaks in network transformation subsystem.
Installing [1qz0f4lv] CVE-2013-1826: NULL pointer dereference in XFRM buffer size mismatch.
Installing [s0q68mb1] CVE-2012-6547: Kernel stack leak from TUN ioctls.
Installing [s1c6y3ee] CVE-2012-6546: Information leak in ATM sockets.
Installing [2zzz6cqb] Data corruption on NFSv3/v2 short reads.
Installing [kfav9h9d] CVE-2012-6545: Information leak in Bluetooth RFCOMM socket name.
Installing [coeq937e] CVE-2013-3222: Kernel stack information leak in ATM sockets.
Installing [43shl6vr] CVE-2013-3224: Kernel stack information leak in Bluetooth sockets.
Installing [whoojewf] CVE-2013-3235: Kernel stack information leak in TIPC protocol.
Installing [7vap7ys6] CVE-2012-6544: Information leak in Bluetooth L2CAP socket name.
Installing [0xjd0c1r] CVE-2013-0914: Information leak in signal handlers.

That's 190 kernel fixes that were released between 2009 and now, applied in one go, on a running system. When I go and look at the number of kernels we have released to customers for this version since 2009, I find just over 60 kernel RPMS. So that means, if you wanted to be current using the traditional model of applying kernel updates, the model used by the other Linux vendors (or any other OS for that matter), you'd have scheduled 60 reboots on your servers (for each server with all the multitiered complexities), or if you did it once every several months, still at least 6-10 reboots per server.

Now, with us, 0. Rebootless updates, active when installed, not after reboot. Zero downtime updates, active when installed not after reboot.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.