
News, tips, partners, and perspectives for the Oracle Solaris operating system

An Introduction to PxFS and a Look Inside Global Mounts

Guest Author

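Anyone who uses Sun Cluster software ends up using the Proxy file system (PxFS). PxFS makes it possible to configure global devices, which are central to device management in a cluster. Now that the source code is available, it is a good time to introduce some of the magic inside PxFS. Over a series of blog posts I will outline the PxFS architecture with references to its source code. This post introduces PxFS and explains global mounts.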

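PxFS is a protocol layer that distributes a disk-based file system across cluster nodes in a POSIX-compliant and highly available manner. This allows simultaneous POSIX-compliant access from multiple nodes without requiring applications to do file-level locking. To perform a global mount, the only requirement for the administrator is to make sure the mount point exists on all cluster nodes. Adding "-g" to the mount command then makes the mount global. A separate blog post explains the terminology.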

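First, let me show that creating and globally mounting a UFS file system is very easy and does not even need a dedicated physical device. Below I create a lofi device, format it as UFS, and mount it globally.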

Note: Do not try this on Solaris 9. A lofs bug in that release causes the system to panic.


# mkfile 100m /var/tmp/100m
# LOFIDEV=`lofiadm -a /var/tmp/100m`
# yes | newfs ${LOFIDEV}

Let's mount the lofi device above cluster-wide (make sure the target directory exists on all nodes).



# mount -g ${LOFIDEV} /mnt

Done! You can now access /mnt from any node of the cluster and transparently reach the UFS file system on the lofi device on node 1.
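Because the one prerequisite for a global mount is that the mount point exists on every node, a small pre-flight check can be useful. The following is only a sketch: the node names, the `/dev/lofi/1` device path, and the `dir_exists` probe are placeholders introduced here for illustration (on a real cluster the probe might run a remote directory test), and the mount command is built but never executed.

```python
from typing import Callable, Iterable, List


def nodes_missing_mount_point(
    nodes: Iterable[str],
    mount_point: str,
    dir_exists: Callable[[str, str], bool],
) -> List[str]:
    """Return the nodes on which mount_point does not exist yet."""
    return [node for node in nodes if not dir_exists(node, mount_point)]


def global_mount_command(device: str, mount_point: str) -> List[str]:
    """Build the argv for a global mount; the command is not run here."""
    return ["mount", "-g", device, mount_point]


if __name__ == "__main__":
    # Hypothetical cluster state: node3 has no /mnt directory yet.
    existing = {("node1", "/mnt"), ("node2", "/mnt")}

    def probe(node: str, path: str) -> bool:
        return (node, path) in existing

    missing = nodes_missing_mount_point(["node1", "node2", "node3"], "/mnt", probe)
    if missing:
        print("create /mnt first on:", ", ".join(missing))
    else:
        print(" ".join(global_mount_command("/dev/lofi/1", "/mnt")))
```

Passing the probe in as a parameter keeps the check independent of how the nodes are actually reached.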

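Now let's dig into the details of a global mount. As the example, I will globally mount a file system on shared storage. We have a three-node cluster in which nodes 2 and 3 are directly connected to the shared storage. The SVM metadevice "/dev/md/mydg/dsk/d42" is being globally mounted from node 1 on the directory "/global/answer".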

The code that starts the PxFS service is available for reference.

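The mount subsystem is an HA service. In cluster terms, an HA service is one that can fail over. Every HA service has one primary and one or more secondaries. When the current primary dies, any one of the secondaries can become the primary. This promotion from secondary to primary is transparent to applications.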
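The primary and secondary roles can be pictured with a toy model. This is a minimal sketch, not the Sun Cluster replica framework: the class and method names are invented for illustration, and promoting the first surviving secondary is an arbitrary choice made here (the text only says that any secondary can take over).

```python
class HAService:
    """Toy model of an HA service: one primary plus a list of secondaries."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("an HA service needs at least one node")
        self.primary = nodes[0]
        self.secondaries = list(nodes[1:])

    def node_died(self, node):
        """Drop a dead node; promote a secondary if the primary died."""
        if node == self.primary:
            if not self.secondaries:
                raise RuntimeError("no secondary left to promote")
            # Arbitrary policy for this sketch: take the first secondary.
            self.primary = self.secondaries.pop(0)
        elif node in self.secondaries:
            self.secondaries.remove(node)

    def invoke(self, request):
        """Callers address the service, never a specific node."""
        return f"{request} served by {self.primary}"


if __name__ == "__main__":
    svc = HAService(["node1", "node2", "node3"])
    print(svc.invoke("mount"))   # mount served by node1
    svc.node_died("node1")
    print(svc.invoke("mount"))   # mount served by node2
```

The caller uses the same `invoke` call before and after the failure, which is the sense in which the promotion is transparent.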

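In any cluster setup there is always exactly one mount service primary, and every other node hosts a mount service secondary. When global mounts are first enabled on a cluster node, a mount client is also created for that node.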

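The mount primary and secondaries are two facets of the mount replica object that is created when a node joins the cluster. The referenced code creates the mount replica server. The replica framework ensures that there is only one primary at a time and promotes a secondary to primary when needed.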

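Here is the sequence of operations for a global mount. Refer to the figure below, in which each step performed during a global mount is numbered. Hovering the mouse pointer over a number pops up a tooltip that describes the step and links to the corresponding code.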





For readability, the steps from the figure above are summarized below.


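  1. The global mount command, mount -g, can be issued from any cluster node. It enters the kernel, where the generic mount code redirects the call to PxFS. At this point, the directory to be mounted on is locked.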

    http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/syscall/mount.c#125 http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/client/pxvfs.cc#594

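  2. The PxFS client tells the mount server about this global mount request through the mount client on that node. The mount client holds a reference to the server.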

    http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/client/pxvfs.cc#999

  3. The mount server in turn asks every client except the one on the originating node (node 1 in this example) to lock the mount point.

    http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/mount/mount_server_impl.cc#1204

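  4. For shared devices, the mount server creates a PxFS primary and secondary. The node on which the device is primaried becomes the PxFS primary. For local devices, the mount is non-HA and an unreplicated PxFS server is created. The lofi device example above results in an unreplicated PxFS server on node 1.

    The PxFS server does a hidden mount of the device. The details of the mount are kept in the PxFS server object.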

    http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/mount/mount_server_impl.cc#1237 http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/server/repl_pxfs_server.cc#125

  5. The mount server passes a reference to the newly created server to all mount clients and asks them to do a user-visible PxFS mount.

    http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/mount/mount_server_impl.cc#1438 http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/mount/mount_server_impl.cc#1684

  6. The mount client creates and adds a vfs_t entry of the same type as the underlying file system.

    http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/mount/mount_client_impl.cc#1804 http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/pxfs/mount/mount_client_impl.cc#1817
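The six steps above can be traced with a small simulation. This is a sketch under stated assumptions, not the code in mount_server_impl.cc: every class, field, and parameter name is invented for illustration, the `attached` list is assumed to name the node where the device is primaried first, a device attached to more than one node is treated as shared, and releasing the mount point locks is not modelled because the steps do not describe it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PxfsServer:
    device: str
    primary: str              # node hosting the PxFS primary
    secondaries: List[str]    # empty for an unreplicated (non-HA) server
    replicated: bool


@dataclass
class MountClient:
    node: str
    locked: Set[str] = field(default_factory=set)            # locked mount points
    vfs_table: Dict[str, str] = field(default_factory=dict)  # mount point -> fs type


class MountServer:
    def __init__(self, nodes: List[str]) -> None:
        self.clients = {n: MountClient(n) for n in nodes}
        self.steps: List[str] = []

    def global_mount(self, origin: str, device: str, mount_point: str,
                     fstype: str, attached: List[str]) -> PxfsServer:
        # Step 1: mount -g on the origin node locks the mount point there.
        self.clients[origin].locked.add(mount_point)
        self.steps.append(f"1: {origin} locks {mount_point}")
        # Step 2: the request reaches the mount server via the origin's client.
        self.steps.append(f"2: {origin} notifies the mount server")
        # Step 3: every other client locks the mount point.
        others = [n for n in self.clients if n != origin]
        for node in others:
            self.clients[node].locked.add(mount_point)
        self.steps.append(f"3: {', '.join(others)} lock {mount_point}")
        # Step 4: replicated server for a shared device, unreplicated otherwise.
        shared = len(attached) > 1
        server = PxfsServer(device, attached[0], list(attached[1:]), shared)
        kind = "replicated" if shared else "unreplicated"
        self.steps.append(f"4: {kind} PxFS server on {server.primary}")
        # Step 5: the server reference goes to all mount clients.
        self.steps.append("5: server reference passed to all mount clients")
        # Step 6: each client adds a vfs entry typed like the underlying fs.
        for client in self.clients.values():
            client.vfs_table[mount_point] = fstype
        self.steps.append(f"6: {fstype} vfs entry added on every node")
        return server


if __name__ == "__main__":
    ms = MountServer(["node1", "node2", "node3"])
    # Shared metadevice attached to nodes 2 and 3, mounted from node 1.
    ms.global_mount("node1", "/dev/md/mydg/dsk/d42", "/global/answer",
                    "ufs", ["node2", "node3"])
    print("\n".join(ms.steps))
```

Calling `global_mount` with `attached=["node1"]` corresponds to the lofi example: the server comes back unreplicated, with node 1 as its only host.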

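The mount is now visible on all clients. The mount subsystem has more magic in it, such as starting an fs replica when a node joins the cluster, or creating a new PxFS secondary or primary when a node connected to the storage joins the cluster. The next part will discuss how regular file access works in PxFS.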

Thanks to Walter Zorn for the JavaScript library that made creating the tooltips much easier.


Binu Philip
Solaris Cluster Engineering
