Saturday Aug 20, 2005

Comments on a gigabit network based on 48-port switches

I had an earlier blog entry commenting on IB networks built from 24-port chips and switches.
In the gigabit world, 48-port switches are very common, so how can one construct a non-blocking, full-bisection network based on 48-port switches?
Just like the IB case, one can use three tiers of switches to construct a non-blocking network:
each leaf switch uses 24 ports to connect to hosts and 24 ports to connect to the upper-tier switches.
  • for 1 link between the switches: 24x2x24 = 1152 nodes
  • for 2 links: 24x2x12 = 576
  • for 3 links: 24x2x8 = 384
  • for 4 links: 24x2x6 = 288
  • for 6 links: 24x2x4 = 192
  • for 8 links: 24x2x3 = 144
  • for 12 links: 24x2x2 = 96
  • for 24 links: 24x2 = 48
For a blocking network there are many more possible configurations.
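The arithmetic in the list above can be sketched as a few lines of Python (my own check of the numbers, assuming each 48-port leaf splits its ports 24 down to hosts and 24 up to spines):

```python
LEAF_PORTS = 48
HOST_PORTS = LEAF_PORTS // 2   # 24 ports down to hosts
UPLINKS    = LEAF_PORTS // 2   # 24 ports up to spine switches

def max_hosts(links_per_spine: int) -> int:
    """Non-blocking host count when each leaf runs `links_per_spine`
    parallel links to the spine tier: a 48-port spine can then fan
    out to 48 // links_per_spine leaves, each serving 24 hosts."""
    assert UPLINKS % links_per_spine == 0
    leaves = LEAF_PORTS // links_per_spine
    return HOST_PORTS * leaves

for r in (1, 2, 3, 4, 6, 8, 12, 24):
    print(f"{r:2d} links between switches -> {max_hosts(r)} hosts")
```

Running it reproduces the table: 1152 hosts for 1 link, down to 48 for 24 links.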

Tuesday Aug 09, 2005

IPMI observation

There are things to learn every day. IPMI is becoming an important part of Sun's Opteron server management environment.
It is interesting to find that IBM and AMD have also signed on as v2 adopters.
There was discussion about what "asserted" and "de-asserted" mean; one of the v2 specs says the following:
  • asserted: Active-high (positive true) signals are asserted when in the high electrical state (near power potential). Active-low (negative true) signals are asserted when in the low electrical state (near ground potential).
  • deasserted: A signal is deasserted when in the inactive state. Active-low signal names have "_L" appended to the end of the signal mnemonic. Active-high signal names have no "_L" suffix. To reduce confusion when referring to active-high and active-low signals, the terms one/zero, high/low, and true/false are not used when describing signal states.
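A toy sketch (my own illustration, not code from the IPMI spec) of how those two definitions map electrical level to asserted/deasserted:

```python
HIGH, LOW = 1, 0  # electrical levels: near power potential / near ground

def is_asserted(level: int, active_low: bool) -> bool:
    """A signal is asserted when at its active electrical level:
    near ground for active-low ("_L") signals, near power for
    active-high signals. Anything else is deasserted."""
    return level == LOW if active_low else level == HIGH

# RESET_L is active-low: driving it low asserts the reset.
print(is_asserted(LOW, active_low=True))    # True  (asserted)
print(is_asserted(HIGH, active_low=True))   # False (deasserted)
```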

Friday Dec 10, 2004

PowerPC 970 vs Opteron

IBM's JS20 blade server is a high-density 2P PowerPC 970 server:

  • 14 in a chassis
  • shared floppy drive and CD-ROM
  • shared other modules: SAN, IB, gigabit
  • 2 DIMM slots, both of which must be populated; needs 2GB DIMMs to reach 4GB
  • two 40GB IDE drives; does not seem to have RAID support
  • two gigabit NICs
  • three-year on-site limited warranty for parts and labor
  • supports SUSE 8, 9 and RHEL v3
  • CSM 1.3.2 for Linux
  • ESSL 4.1, Parallel ESSL 3.1 for Linux
  • VisualAge C++ for Linux
  • XL Fortran v8.1 for Linux
  • Myrinet-2000 for Linux
  • supports GPFS on Linux
  • supports LoadLeveler for Linux

The PPC 970 chip has the following characteristics:

  • improved POWER4 core
  • 0.13um process, 121 mm² die size, 52 million transistors, 1.3V core voltage, 42W at 1.8GHz; now available at 2.2GHz
  • I-cache = 64KB direct-mapped, D-cache = 32KB 2-way associative, 512KB L2
  • 1.1GHz FSB
  • 4 flops per clock peak
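A quick check of what that peak rate means (4 flops per clock corresponds to the 970's two FPU pipes each retiring a fused multiply-add):

```python
def peak_gflops(clock_ghz: float, flops_per_clock: int = 4) -> float:
    """Theoretical peak floating-point rate in GFLOPS."""
    return clock_ghz * flops_per_clock

print(peak_gflops(1.8))  # 7.2 GFLOPS at 1.8 GHz
print(peak_gflops(2.2))  # 8.8 GFLOPS at 2.2 GHz
```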

Since ppc/linux is not binary compatible with x86/linux, one needs to recompile every application to run it on ppc/linux. Also note:

  • no gridengine6 support
  • no pathscale support
  • no ROCKS support

Thursday Sep 30, 2004

Myrinet vs Infiniband

A few comments on Myrinet vs InfiniBand:

                      Myrinet          InfiniBand
  HCA ports           1/2              2
  Connector type      fibre            copper
  Max switch ports    128/256 (soon)   288
  Local memory        2/4 MB           128/256 MB

The dual ports on an HCA can be used to construct a highly available network. Because of the cost, most clusters only use one port per HCA; Myrinet provides a one-port HCA, while on the dual-port IB HCA one port simply goes unused.

MPI over Infiniband

There are many implementations of MPI over InfiniBand; from a Google search, one finds:
  • MVAPICH (MPI-1 over VAPI for InfiniBand): an MPI-1 implementation built on the Verbs Level Interface (VAPI) developed by Mellanox Technologies. The implementation is based on MPICH and MVICH. MVAPICH is pronounced "em-vah-pich". The latest release is MVAPICH 0.9.4 (includes MPICH 1.2.6) and supports G5, IA-32, IA-64 and x86_64.
  • MVAPICH2 (MPI-2 over VAPI for InfiniBand): an alpha version of an MPI-2 implementation over InfiniBand, based on the beta-test version of MPICH2. It is also built on the Verbs Level Interface (VAPI) developed by Mellanox Technologies. MVAPICH2 is currently available for the IA-32 architecture.
  • Scali MPI supports any interconnect providing a DAT interface (uDAPL). This enables the MPI to utilize the latest performance enhancements and uDAPL implementations provided by third-party vendors, with expanded support for InfiniBand. The following versions or later are supported: InfiniCon 2.1, Mellanox 3.1, Topspin 2.0 and Voltaire 2.1.
  • MPICH2-CH3 Device for InfiniBand: based on the Verbs interface, using the Mellanox Verbs implementation (VAPI), this project implements the CH3 interface of MPICH2, a so-called MPICH2-CH3-IB device.
  • LAM/MPI 7.1 makes it possible to use the high bandwidth and low latency of InfiniBand networks. LAM/MPI provides InfiniBand support using the Mellanox Verbs Interface (VAPI). Currently LAM/MPI uses the InfiniBand send/receive protocol for tiny messages and RDMA for long messages; RDMA for tiny messages will be explored in the near future.

Sunday Sep 26, 2004

parallel FS for HPTC cluster

In an HPC cluster, one needs a parallel FS for the compute nodes to access the data.

This weblog will try to summarize the different parallel filesystems existing today.

  1. NFS: the most universal parallel FS over IP; NFSv4 has many new features
    • state: an NFSv4 client uses state to notify an NFSv4 server of its intentions on a file: locking, reading, writing...
    • byte-range locking and share reservation
    • file delegation: an NFSv4 server can allow an NFSv4 client to access and modify a file in its own cache without sending any network request until another client tries to access the same file
    • compound RPCs: multiple operations combined into a single RPC request
    • protocol support for file migration and replication
    • combines disparate NFS protocols (stat, NLM, mount, ACL and NFS) into a single protocol
  2. QFS, Sun, runs on Solaris/SPARC, requires a FC SAN
    • separate metadata
    • DAU of 4k, 16k, 32k or 64k
    • direct I/O capability
    • SC (Sun Cluster) support
  3. PVFS, PVFS2, on Linux clusters, open source
    • distributed metadata
    • supports TCP/IP, InfiniBand, Myrinet
    • user-controlled striping of files across nodes
    • multiple interfaces, MPI-IO via ROMIO
    • support for heterogeneous clusters
  4. GPFS, IBM, on AIX and Linux, supports FC SAN or SAN simulation
    • scalable FS
    • parallel FS: stores blocks of a single file on multiple nodes
    • shared FS: uses a token management system to ensure data consistency
    • HA FS: provides automated recovery
    • client-side data caching
    • large blocksize options: 16k, 64k, 256k, 512k, 1024k
    • block-level locking via a distributed lock manager (DLM)
    • data protection: metadata and user data can be replicated
  5. Lustre FS, on Linux
    • supports TCP, Quadrics Elan, Myrinet GM, SCI, SDP and InfiniBand interconnects
    • open source
    • metadata server and object storage servers
    • supports 1000+ nodes
  6. Red Hat Global FS, on Linux
    • uses a SAN constructed with iSCSI or Fibre Channel
    • open source
    • supports direct I/O
    • supports 100+ nodes
    • has no single point of failure
  7. CXFS, SGI, supports IRIX, Solaris and Windows NT, requires a FC SAN
  8. OpenAFS
    • supports Solaris SPARC/x86 and many other OSes
    • open source
  9. TerraGrid from Terrascale
    • clients run TerraGrid iSCSI initiators
    • storage nodes run TerraGrid iSCSI targets
    • no need for metadata servers
    • built-in HA
    • supports Linux, AMD64 and EM64T
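To make the "striping files across nodes" idea concrete (as in PVFS above), here is a hedged sketch of simple round-robin striping; the stripe size and server count are illustrative assumptions, not the defaults of any particular filesystem:

```python
def stripe_location(offset: int, stripe_size: int, n_servers: int):
    """Map a file byte offset to (server index, offset within that
    server's local object) under round-robin striping: stripe 0 goes
    to server 0, stripe 1 to server 1, ..., wrapping around."""
    stripe = offset // stripe_size
    server = stripe % n_servers
    local = (stripe // n_servers) * stripe_size + offset % stripe_size
    return server, local

# 64 KB stripes across 4 I/O servers:
print(stripe_location(0, 65536, 4))              # (0, 0)
print(stripe_location(3 * 65536, 65536, 4))      # (3, 0)
print(stripe_location(4 * 65536 + 10, 65536, 4)) # (0, 65546)
```

Clients can thus compute data placement directly from the offset, which is why parallel filesystems like PVFS need so little metadata traffic on the I/O path.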


