Sunday Jul 19, 2009

Hadoop Architecture

Cloud computing is a convergence of High Performance Computing architectures, Web 2.0 data models, and Enterprise computing data scale.
Cloud Analytics should leverage Sun's compelling storage architecture.

Hadoop Distributed File System (HDFS) is scalable, with high availability and high performance. It is designed to store and stream extremely large datasets. HDFS runs on a minimum of 3 cluster nodes (1 master node and 2 slave nodes). Data blocks are 64 MB (default) or 128 MB, and every block is replicated 3 times (default). The NameNode holds the file system metadata. Files are divided into blocks and distributed across the DataNodes.
MapReduce is a data processing framework designed to process extremely large datasets in batch. It is not intended for realtime querying and does not support random access. The JobTracker schedules and manages jobs; a TaskTracker executes the individual map() and reduce() tasks on each cluster node.
HBase is a distributed, column-oriented, multi-dimensional storage system. It is well suited to managing very large structured datasets for the semantic web. HBase can manage billions of rows, millions of columns, thousands of versions and petabytes across thousands of servers. It supports realtime querying.
Hive is a data warehousing tool built on top of Hadoop for managing and querying structured data with a SQL-like language. It does not support realtime querying.
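
As a rough illustration of the map() and reduce() tasks described above, here is a minimal single-process sketch in Python. It does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only mimic the programming model, where a real cluster would run many map and reduce tasks in parallel on the TaskTrackers.

```python
from collections import defaultdict


def map_phase(line):
    """map(): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """reduce(): combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "hadoop processes data"]))
```

The sketch keeps everything in memory, which is exactly what a real job avoids: on a cluster, each map task reads one HDFS block and the grouping step moves data between nodes.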

High Availability

- The NameNode is a single point of failure (SPOF). Its transaction log is stored in multiple directories, each of which can be on the local file system or on a remote file system (NFS/CIFS).
- The Secondary NameNode copies the FsImage and the transaction log from the NameNode to a temporary directory.
- To increase the availability of the Hadoop cluster, it is possible to interconnect 2 master node servers (active/passive) with Solaris Cluster.


- To secure the Hadoop cluster, you should encrypt the data to safeguard all transactions on the web.

Proof Of Concept

- Create an architecture with a minimum of three nodes and test the performance and feasibility of Hadoop.
- To test Hadoop quickly, you can use the OpenSolaris Hadoop Live CD.
- The OpenSolaris LiveHadoop setup installs a Hadoop cluster of three virtual nodes:
        - Once OpenSolaris boots, two virtual servers are created using Zones
        - Zones are very lightweight, minimizing virtualization overheads and leaving more memory for your application
        - The "Global" zone hosts the NameNode and JobTracker, and two "Local" zones each host a DataNode and TaskTracker


- Interface your application with HDFS and implement the "Save as Cloud..." and "Open from Cloud..." functions. Use the Hadoop Java API for your development.

Service and Support

- HDFS, MapReduce, HBase and Hive are Open Source software and are supported on OpenSolaris.
- In the US, it is possible to contact Cloudera, which brings big data to the enterprise with Hadoop.
- Who supports Hadoop across the globe?

Architecture Overview

Sizing for HA Cluster

- Business Data Volume = Customer needs
- No RAID factor, No HBA port
- 2 quad-core CPUs in every server
- 2 System hard disks
- Number of replication blocks = 3
- Block size = 128 MB
- Temporary Space = 25% of the total hard disk
- Raw Data Volume = 1.25 * (Business Data Volume * Number of replication blocks)
- Number of NameNode Servers = 2
- Number of DataNode Servers = Raw Data Volume / Server Capacity Storage
- NameNode RAM = 64 GB
- DataNode RAM = 32 GB minimum
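
The sizing rules above can be turned into a small calculation. The following Python sketch applies the Raw Data Volume and DataNode formulas from the list; the function name size_cluster, the parameter names and the example figures (100 TB of business data, 24 TB of usable storage per server) are assumptions made for illustration only.

```python
import math


def size_cluster(business_tb, server_capacity_tb, replicas=3,
                 temp_factor=1.25, namenodes=2):
    """Apply the sizing rules from the list above.

    business_tb        -- Business Data Volume, in TB (customer need)
    server_capacity_tb -- usable storage per DataNode server, in TB (assumed)
    replicas           -- number of replication blocks (3 in the list)
    temp_factor        -- 1.25, i.e. 25% extra for temporary space
    """
    raw_tb = temp_factor * (business_tb * replicas)
    # Round up: a fraction of a server still means one more server.
    datanodes = math.ceil(raw_tb / server_capacity_tb)
    return {"raw_tb": raw_tb, "datanodes": datanodes, "namenodes": namenodes}


if __name__ == "__main__":
    # Hypothetical customer: 100 TB of business data, 24 TB per server.
    print(size_cluster(business_tb=100, server_capacity_tb=24))
```

With these example figures the raw volume is 375 TB, which needs 16 DataNode servers in addition to the 2 NameNode servers.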



Saturday Feb 14, 2009


Open Storage

Open Storage: The Best Performance At The Best Price

Key Business Drivers
  • Manufacturing : Cost reduction - Eco responsibility
  • Telecommunications : Outsourcing - Cost reduction - Eco responsibility
  • Banking & Finance : Increase in banking and financial transactions - Inheritance optimization and management - Cost reduction
  • Government : Accomplish more work with fewer resources - Eco responsibility
  • Retail : Need to manage profitability and control expenses - Eco responsibility
  • Media & Entertainment : Ongoing technological innovation - Cost reduction - Eco responsibility
  • Healthcare : Accelerating employer-led initiatives - Cost reduction - Eco responsibility
  • Education & Research : Enable anytime, anywhere access - Creating a new form of collaborative education - Cost reduction - Eco responsibility
  • Transportation & Travel : Social responsibility - Technology, exposure to other cultures - Collecting and sharing experiences - Adoption of online business travel - Green initiatives - Cost reduction
  • Energy : Energy education - Carbon emissions reduction - Power consumption reduction - Energy cost reduction
  • Pharmaceutical : Enhanced information dissemination - Cost reduction - Eco responsibility
  • IT Outsourcing : Resource consolidation - Green sourcing - Cost reduction
IT Drivers
  • Increasing data volume and processing
  • Faster deployment of new services
  • Green IT
  • Infrastructure consolidation
  • IT cost reduction
  • Open Source components
  • Data management simplification
  • OpenStorage Strategy : Freedom of use - Wider choice of hardware - More suppliers - Larger user community
  • Sun Storage 7000 Unified Storage Systems Appliance with SATA, SAS and SSD  technologies, JBOD Array, Opteron processors
  • OpenSolaris
  • ZFS Services : Snapshot, Encryption, Replication, Compression, RAID-Z, De-Duplication, Media Management, 1600 PB, Virtual Pools, Dynamic Striping, Embedded compression, Simplified administration
  • Data Protocols : NFS v3 and v4, CIFS, iSCSI, HTTP, WebDAV, FTP, NDMP v4
  • Data Services : Flash Hybrid Storage Pool, RAID-Z (5), RAID-Z DP (6), Mirroring, Striping, Active-active Clustering, Remote Replication, Antivirus via ICAP Protocol, Snapshots, Clones, Compression, Thin Provisioning, End-to-End Data Integrity, Multi-Path I/O, Fault Management
  • Management : DTrace Analytics, Dashboards, Role-Based Access Control, NIS, LDAP & AD, Alerts, Phone Home, SNMP, Scripting, Upgrade, Hardware View, Advanced Networking
Key Performance Indicators
  • Power consumption
  • Return On Investment
  • Total Cost of Ownership
  • Time to deploy a new service
  • Number of Open Source components
  • Number of contributors
  • Savings achieved by choosing Open Source
  • Service quality
Added Value Services
  • OpenStorage Workshop
  • Product Deployment Services
  • Sun Learning Services
  • Sun Managed Services
  • Sun Support Services
  • Sun Global Financial Services Operation
  • Objective : Cost effective network unified storage solution. Reduce administration. Reduce reliance on platform/OS knowledge
  • Solution : Sun Storage 7410 Cluster with 2 x J4400. On-board Flash Disk for increased data read performance. The Managed Ops contract was upgraded to 3-year 24x7 gold support
  • Customer Benefit : Open Source approach. ZFS capabilities today and in the future (pooled storage). Price. User interface. SSD integration, with easy and inexpensive expansion


Friday Jun 06, 2008

Cloud Computing

Cloud Computing Sun and Clouds

I think the Cloud Computing concept has existed for several years, but the technologies are only now available and mature enough to be implemented in the datacenter.
Cloud Computing is a real business opportunity for service providers and outsourcing companies. They will be able to manage many datacenters in different countries across the world with a lower total cost of ownership. In my view, Cloud Computing is the result of 2 major technologies: Grid Computing and the Virtualization of servers, storage, network and desktop. Imagine many datacenters distributed around the world and managed as a single resource. The new technologies now make this possible in real life!
The major difficulty for Cloud Computing is scaling an infrastructure that is distributed across many geographic locations. If an application needs more resources than one datacenter can provide, the cloud must run the application's processes simultaneously in a second datacenter, and so on.

Sun Value Proposition

  • AMD, Intel and CMT processor blades in the same box
  • Multi OS : Linux, Solaris, Windows
  • High Performance Network : Gigabit, 10G or InfiniBand. Reduced cabling with the Magnum switch
  • Sun Blade 6048 Modular System
  • Sun Datacenter Switch 3456
  • Sun StorageTek J4xxx
  • Sun Storage 7000 Unified Storage System
  • High Performance Storage (Lustre, pNFS, Sun Fire x4540 48TB, SAM-FS Archiving)
  • Sun Studio 12 (for free)
  • Sun Grid Engine (Open Source)
  • Sun HPC Cluster Tools (OpenMPI)
  • Hadoop : Distributed applications with high density of data
  • MogileFS : File system with horizontal storage extension across an unlimited number of machines
  • Dynamic System Domains, Solaris Containers, VMware, Microsoft Virtual Server
  • Sun xVM Infrastructure with Sun xVM Server (LDoms, Xen) and Sun xVM Ops Center
  • Solaris Cluster and Geo Cluster Edition
  • Storage virtualization : Sun StorageTek 99xxV and Sun Virtual Tape Library, Solaris ZFS
  • Sun Virtual Desktop Infrastructure Software
  • VirtualBox (Client virtualization)

Sunday May 25, 2008

Key Success Factors for Business Value

Key Success Factors

The Key Success Factors (KSF) are the strategic elements that a company must monitor in order to ensure its durability and its ability to outperform its competitors.
The Key Success Factors are conditioned by the company and market environment.

Some Key Success Factors for delivering Business Value in the company :

  • Products and Services standardization
  • IT Processes Optimization
  • Change Management
  • Environment Analysis, Monitoring, Deployment Automation
  • Increased Hardware Utilization Ratio, Infrastructure Flexibility, Dynamic Infrastructure
  • Business Continuity
  • Open Source
  • Service Level Agreement
  • ...

Saturday May 24, 2008

Measuring IT's Business Value

Measuring IT's Business Value: You can’t manage what you don’t measure!

The benefits of IT value go beyond cost reduction: they contribute to increasing the company's profitability.
It's not easy to prove that an infrastructure gives business value to a company. I will try to answer this difficult question.
To measure the performance of the IT infrastructure, it is necessary to understand the concept of a performance indicator.

What is Key Business Value ?
  • Focus on Industry : Bank/Finance, Government, Retail, Telco, Manufacturing...
  • Business Value is Large : Stakeholder Value, Customer Value, Employee Value, Partner Value, Supplier Value, Managerial Value, Societal Value
  • Key Business Indicators : Profitability, Revenue Growth, Customer Satisfaction, Market Share, Cross-Sell Ratio, Marketing Campaign Response Rates, Relationship Duration...
  • Common Language Management
  • IT Portfolio
  • IT Maturity
What is IT Value ?
  • Business/IT Alignment
  • Intellectual Properties
  • IT Process Automation
  • IT Performance
  • Innovation
  • Community
  • Know-How
  • Expertise
  • Service Level Agreement
Key Metrics
  • Key Metrics = Key Performance Indicators (KPIs)
  • Technical performance indicators : CPU, I/O, SAPS, SpecInt, TPC-H, Availability Ratio, Time To Repair, Data Loss Ratio...
  • Financial performance indicators : TCO, ROI, Depreciation...
  • Ecological performance indicators: Space, Watt, CO2, RoHS Ratio, WEEE Recycling Ratio...
What is a Performance Lever ?
A Performance Lever is a specific key performance indicator: acting on it increases system performance, and it interacts with the other key indicators.
It is a functional indicator, not a technical one!

  • #Concurrent Users Indicator is a Performance Lever (Business Indicator)
  • #CPU and #I/O are Key Performance Indicators
  • Calculate #CPU = f(#Concurrent users)
  • Calculate #I/O = f(#Concurrent users)
  • Start a Provisioning Process automatically with #CPU and #I/O Values
  • Activate #CPU and #I/O Cards with Capacity on Demand Process
  • Integrate a new Web server with xVM Ops Center and N1 Service Provisioning System process
  • Testing
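
To make the relation #CPU = f(#Concurrent users) concrete, here is a minimal Python sketch. It assumes a simple linear model; the coefficients USERS_PER_CPU and IOPS_PER_USER and the function names are hypothetical, and in practice they would come from measurements on the actual application.

```python
import math

# Hypothetical coefficients; real values must be measured per application.
USERS_PER_CPU = 250    # concurrent users one CPU is assumed to serve
IOPS_PER_USER = 2.0    # I/O operations per second assumed per user


def required_cpus(concurrent_users):
    """#CPU = f(#Concurrent users), rounded up to whole CPUs."""
    return math.ceil(concurrent_users / USERS_PER_CPU)


def required_iops(concurrent_users):
    """#I/O = f(#Concurrent users)."""
    return math.ceil(concurrent_users * IOPS_PER_USER)


def provisioning_request(concurrent_users, active_cpus, active_iops):
    """Compare the need with what is active and decide whether to provision."""
    extra_cpus = max(0, required_cpus(concurrent_users) - active_cpus)
    extra_iops = max(0, required_iops(concurrent_users) - active_iops)
    return {
        "extra_cpus": extra_cpus,
        "extra_iops": extra_iops,
        "provision": extra_cpus > 0 or extra_iops > 0,
    }


if __name__ == "__main__":
    print(provisioning_request(concurrent_users=1200, active_cpus=4, active_iops=2000))
```

The business indicator (concurrent users) drives the technical ones, and the result of provisioning_request is what an automated provisioning process would receive as input.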
Measuring IT Value Process
  • Place technical probes at different points of the infrastructure via scripts, software…
  • Gather the values of the technical, financial and ecological indicators
  • Load the native indicator values into the CMDB (MySQL, for example, because it's free software)
  • Calculate the complex indicators from the native indicators
  • Analyse the results with a reporting tool (StarOffice Calc, for example, because it's not an expensive solution)
  • Compare the results obtained with the expected results
  • Build a dashboard for the CIO and IT managers
  • Update the infrastructure based on the analysis results
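
The middle steps of this process (load native indicators, calculate complex indicators, compare with expected results) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the MySQL CMDB, and the indicator names, formulas and sample values are hypothetical.

```python
import sqlite3


def build_cmdb(samples):
    """Load native indicator values into a stand-in CMDB (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE indicator (name TEXT PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO indicator VALUES (?, ?)", samples.items())
    return conn


def native(conn, name):
    """Read one native indicator value back from the CMDB."""
    row = conn.execute(
        "SELECT value FROM indicator WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def complex_indicators(conn):
    """Calculate complex indicators from native ones (hypothetical formulas)."""
    return {
        "availability_ratio":
            1 - native(conn, "downtime_hours") / native(conn, "period_hours"),
        "watts_per_user":
            native(conn, "power_watts") / native(conn, "concurrent_users"),
    }


def compare(actual, expected):
    """Return the gap between obtained and expected results, per indicator."""
    return {name: actual[name] - expected[name] for name in expected}


if __name__ == "__main__":
    cmdb = build_cmdb({"period_hours": 720, "downtime_hours": 3.6,
                       "power_watts": 12000, "concurrent_users": 800})
    results = complex_indicators(cmdb)
    print(results)
    print(compare(results, {"availability_ratio": 0.999, "watts_per_user": 20.0}))
```

The gaps returned by compare are the kind of figures that would feed the dashboard for the CIO and IT managers.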

Monday Jan 28, 2008

IT Value Propositions


These are unique Sun IT propositions that bring value to the company.

They are an important part of the business value proposition, as they show our core business :
- Our assets compared with competitors
- Our service capabilities
- A significant customer reference, including a quantified customer benefit
- Functional and technical indicators to drive solution performance

The IT model is based on 5 axes :
  Scalability/Power : Horizontal/Vertical Scalability, Power (CPU, I/O...)
  ECO : Economy (Costs), Ecology (KVA, RoHS, WEEE...)
  Security : (Data, Access...)
  Availability : (Clustering, Component redundancy...)
  Flexibility : (Virtualization, Provisioning...)

A large part of Sun's IT value proposition is based on the fact that we master all the key elements of the IT value chain.
It does not mean we cannot address heterogeneous environments, but it creates the conditions to deliver strong IT solutions to our customers.
We know how to address a broader range of needs and when we answer a business problem from one of our customers,
we are in a position to consider all the aspects of it. This is a strong differentiator compared to some of our competitors who are specialized in one area.

We have defined the Sun IT value propositions, which can be seen as templates of the “Business/IT Alignment Approach” that are instantiated when we address a particular customer.
A given IT value proposition defines the typical key performance indicators that we use. It also describes the unique assets and services that Sun owns and that make Sun's proposition unique on the market. Finally, a real-life customer experience is presented.

Sun IT Value Propositions

  1. Industrialization and Best Practices : Products/Services, IT processes industrialization and best practices
  2. Standardized technical basis : Normalization and management of technical basis evolutions, architecture principles
  3. Optimization of computer rooms : physical room optimization, consolidation, cooling and electric security
  4. Provisioning : Environment analysis, monitoring and deployments automation
  5. Infrastructure Virtualization : Utilization ratio improvement, infrastructure flexibility
  6. Desktop Virtualization : Access to applications from everywhere in the world with complete security
  7. Web 2.0 : Technologies and Web use for next Internet generation
  8. Eco Datacenter : Economical and Ecological infrastructure for Datacenter
  9. Open Source : Freedom and software components choice
  10. Disaster Recovery Plan : Infrastructure for disaster recovery
  11. Infrastructure Business Application : Technical infrastructure for ERP, Business Intelligence, Data Warehousing
  12. High Performance Computing : Parallel computing grids
  13. Business Continuity : Availability and security infrastructure according  to Service Level Agreement
  14. Identity Management : Users identification and access management
  15. Security : Information access in full security
  16. Archiving : Data Management from its creation to its destruction. Data archiving
  17. Data Protection : Backup, restore, data replication
  18. Services Oriented Architecture : Systems interoperability, Web Services
  19. x86 : Servers and software with high performance at low cost
  20. CMT : Servers and software with high performance at low cost
  21. Cloud Computing : A Software Design and a Set Of Architectures (Grid Computing and Virtualization)

Tuesday Jan 22, 2008

Key Performance Indicators For Business Value

Key Performance Indicators (KPIs) to track IT infrastructure performance

The IT infrastructure is analyzed along the 5 axes of the IT model (scalability/power, flexibility, security, availability, economy/ecology), and its performance is measured by the Key Performance Indicators.
Controlling the IT infrastructure means having the means to measure variations in the state of the architecture and to anticipate the risks that degrade its level of maturity, and thus the business value delivered by the company.

I propose the following performance indicators, classified by IT infrastructure solution :
  • Industrialization and Best Practices : Number of incidents handled per day, Mean time to solve an incident, Number of changes, Mean time for a change, Infrastructure Maturity Level, Mean time to repair...
  • Standardized Technical Basis : OS Number, OS number releases, Administration software number, Open source components ratio, Technical basis change frequency, Total Cost of Ownership...
  • Computer Room Optimization : Servers consolidation ratio, Storage consolidation ratio, SwaP ratio, Hot points number in a room, Electric consumption ratio, Square meter reduction ratio, Return on Investment...
  • Provisioning : Time to deploy a new service, Update time, Services number deployed per year, OS number deployed per year, Applications number deployed per year, Administration ratio for a deployment...
  • Infrastructure Virtualization : Virtualized applications ratio, SWaP ratio, Utilization rate of the equipment, Return On Investment, Number of virtual machines, Availability ratio...
  • Desktop Virtualization : Virtualized terminals number, Virtual machines number deployed per year, Decibels decrease ratio in the call center, Productivity improvement ratio, Temperature decrease ratio in the call center....
  • Web 2.0 : Open Source components number, Electrical consumption, Costs saving, SwaP ratio, Web concurrent users number, Time to Repair, Return on Investment, Total Cost of Ownership...
  • Eco Datacenter : Servers consolidation ratio, Storage consolidation ratio, SwaP ratio, Electric consumption ratio, Square meter reduction ratio, Return On Investment...
  • Open Source : Number of Open Source projects, Number of Open Source components, Number of contributors, Savings achieved by choosing Open Source, Freedom of Choice, Service Quality, Open standards...
  • Disaster Recovery Plan : Data quality ratio after incident, Recovery point objective, Recovery time objective, Full recovery time objective...
  • Business Application Infrastructure : Response time, Number of concurrent users, Data flow integration time, Availability ratio, Mean time of intervention on site, Number of application modules composing the solution...
  • High Performance Computing : Mean calculation time, Gflops number, Watt/Flop number, Availability ratio, Processors number, Calculation hours number per year...
  • Business Continuity : Data quality ratio after incident, Recovery point objective, Recovery time objective, Full recovery time objective, Hardware availability ratio, Data restore period...
  • Identity Management : Propagation time of a new user, Exemptions number, Applications number integrating the SSO, Notifications number per user, Propagation anomalies number per year, Password number per user...
  • Archiving : Retention period of archived data, Resource consumption per service, Number of media through which the data has passed during its lifecycle, Return On Investment, Mean time to search for archived data...
  • Data Protection : Data availability ratio, Data retention duration, Data restore duration, Data quality ratio after incident...
  • Service Oriented Architecture : Time to make a new service available, Cost of inter-services cross charges, Applications ratio participating to the SOA, Response time, Data repository quality...

Saturday Jan 19, 2008

Eco Datacenter

Eco Datacenter: Ecology is good for the planet and good for business

OpenEco is a global on-line community that provides free, easy-to-use tools to help participants assess, track, and compare energy performance, share proven best practices to reduce greenhouse gas (GHG) emissions, and encourage sustainable innovation.

March 21, 2007
- Today is International Earth Day, a day celebrated each year around the world on the vernal equinox. It's also a good time to remind ourselves that even small changes in the way we conduct business can have a big impact on our environment.
At Sun, eco responsibility is about changing the way we approach business, IT, and the environment through sustainable computing. To do that, we innovate, act, and share.

Our Technology Assets

  • Sun's own experience with its Santa Clara datacenter (USA)

Thursday Jan 17, 2008

Business Intelligence


Business Intelligence drives Business and IT Performance

In today’s highly competitive business climate, making better decisions faster
can mean the difference between surviving and thriving. The challenges are
managing the exponential growth of data in a cost-effective and secure manner,
while transforming relevant data into information for decision support needs. Sun takes the cost and complexity out of today’s business intelligence and data warehouse requirements with a single open platform whose architecture can scale to meet your entire needs from deployment today to meeting your growth needs tomorrow. The results are faster access to information, the ability to make better decisions quickly and speed up time to market.

Sun has more than 2000 customer references in business intelligence and data warehousing worldwide, across all industries (banking/finance, manufacturing, retail, government, telco...).
Sun Microsystems has developed a worldwide network of competencies and experts in Business Intelligence and Data Warehousing, working with its partners : SAS, Oracle, Informatica, SAP, etc.
The Sun Microsystems Business Intelligence Solutions integrate specific services around Extraction, Transformation and Loading, Database, Reporting, OnLine Analytical Processing, Technical architectures, Proof-of-Concept and Benchmarks.

See performance results : DMreview, Wintercorp, TPC

The qualification of the Business Intelligence technical architecture is broken down according to three criteria :

  1. Data storage volume : for disk sizing. The technical architecture must support the data volume. The useful volume is the raw volume of the operational systems plus the indexes, aggregates, metadata and work data of the database system.
  2. Extract, transform and load : for sizing data extraction, transformation and loading. The technical architecture supporting the ETL process is based on the data flow volume and the data processing.
  3. Users volume : for sizing user activities (Reporting, OnLine Analytical Processing). The technical architecture supporting the reporting process is based on the number of concurrent users on the Data Warehouse and Data Marts.

Tuesday Jan 15, 2008

IT Trends


Sun solutions aligned with the Top Strategic Technologies for 2009

Cloud Computing
Servers - Beyond Blade Servers
Green IT
Web-Oriented Architectures
Enterprise Mashups
Specialized Systems
Social Software and Social Networking
Unified Communications
Business Intelligence

IT Trends according to Gartner



Business stakes are changing, and the IT infrastructure must become increasingly responsive to significantly reduce Time To Market. Today, we have the technology and methodology to address these new business challenges.

