Monday Apr 28, 2014

DBAs Guide to Deploying Sandboxes in the Cloud

Overview

The need for a private, secure and safe area for data discovery within the data warehouse ecosystem is growing rapidly as many companies start investing in and investigating "big data". Business users need space and resources to evaluate new data sources to determine their value to the business and/or explore news way of analyzing existing datasets to extract even more value.  These safe areas are most commonly referred to as "Sandboxes" or "Discovery Sandboxes" or "Discovery Zones".  If you are not familiar with the term then Forrester Research defines a "sandbox" as:

“data exploration environment where a power user can analyse production […] with near complete freedom to modify data models, enrich data sets and run the analysis whenever necessary, without much dependency on IT and production environment restrictions.” *1

These sandboxes are tremendously useful for business users because they allow them to quickly and informally explore new data sets or new ways of analyzing data without having to go through the formal rigour normally associated with data flowing into the EDW or deploying analytical scripts within the EDW. They provide business users with a high degree of freedom. The real business value is highlighted in a recent article by Ralph Kimball:

In several of the e-commerce enterprises interviewed for this white paper, analytic sandboxes were extremely important, and in some cases hundreds of the sandbox experiments were ongoing simultaneously.

As one interviewee commented “newly discovered patterns have the most disruptive potential, and insights from them lead to the highest returns on investment" *2

Key Characteristics

So what are they key characteristics of a sandbox? Essentially there are three:

  1. Used by skilled business analysts and data scientists
  2. Environment has fewer rules of engagement
  3. Time boxed

Sandboxes are not really designed to be used by CIOs or CEOs or general BI users. They are designed for business analysts and data scientists who have a strong knowledge of SQL, detailed understanding of the business and the source data that is being evaluated/analyzed. As with many data exploration projects you have to be able to understand the results that come back from a query and be able to determine very quickly if they make sense.

As I stated before, the normal EDW rules of engagement are significantly relaxed within the sandbox and new data flowing into the sandbox is typically disorganised and dirty. Hence the need for strong SQL skills to create simplified but functional data cleaning and transformation scripts with the emphasis being to make new data usable as quickly as possible. Part of the "transformation" process might be to generate new data points derived from existing attributes. A typical example of this is where a data set contains date-of-birth information, which in itself is quite a useful piece of information, that can be transformed to create a new data point of "age". Obviously the business analysts and data scientist need to be reasonably proficient in SQL to create the required transformation steps - it is not a complicated process but it highlights the point that the business community needs to have the necessary skills so that they are self-sufficient.

Most importantly the sandbox environment needs to have a time limit. In the past this is where most companies have gone wrong! Many companies fail to kill off their sandboxes. Instead these environments evolve and flourish into shadow marts and/or data warehouses which end up causing havoc as users can never be sure which system contains the correct data. Today, most enlightened companies enforce a 90-day timer on their sandboxes. Once the 90 day cycle is complete then ownership of the processes and data are either moved over to the EDW team, who can then start to apply the corporate standards to the various objects and scripts, or the environment and all its data is simply dropped.

The only way a business can support the hundreds of live sandbox experiments described in Kimball's recent report (*2) is by enforcing these three key characteristics.

Choosing your deployment model:

Over the years that I have spent working on various data warehouse projects I have seen a wide variety of  weird and wonderful deployment models designed to support sandboxing. In very general terms these various deployment models reduce down to one of the following types:

  1. Desktop sandbox
  2. Detached sandbox
  3. Attached sandbox

each one of these deployment models has benefits and advantages as described here:

1. Desktop Sandboxes

Many business users prefer to use their desktop tools, such as spreadsheet packages, because the simple row-column data model gives them a simplified and easily managed view of their data set. However, this approach places a significant processing load on the desktop computer (laptop or PC) and while some vendors offer a way to off-load some of that processing to bespoke middleware servers this obviously means implementing an additional specialised middleware server on dedicated hardware.  Otherwise, companies have to invest large amounts of money upgrading their desktop systems with additional memory and solid-state disks.

Creating a new sandbox is just a question of opening a new, fresh worksheet and loading the required data set. Obviously, the size and breadth of the dataset is limited by the resources on the desktop system and complicated calculations can take a considerable time to run with little or no scope for additional optimisation or tuning. Desktop sandbox are, by default, data-silos and completely disconnected from the enterprise data warehouse which makes it very difficult to do any sort of joined-up analysis. 

The main advantage of this approach is that power users can easily run what-if models where they redefine their data model to test new "hierarchies", add new dimensions or new attributes. They can even change the data by simply over-typing existing values. Collaboration is a simple process of emailing the spreadsheet model to other users for comments. The overriding assumption here is that users who receive the spreadsheet are actually authorised to view the data! Of course there is nothing to prevent recipients forwarding the data to other users. Therefore, it is fair to say that data security is non-existent.

For DBAs, the biggest problem with this approach is that it offers no integration points into the existing cloud management infrastructure. Therefore, it is difficult for the IT team to monitor the resources being used and make appropriate x-charges.  Of course the DBA has no control over the deletion of desktop based sandboxes so there is a tendency for these environments to take on a life of their own with business users using them to create "shadow" production systems that are never decommissioned.

Overall, the deployment of desktop sandboxes is not recommended.

2. Detached Sandboxes

Using a detached, dedicated sandbox platform resolves many of the critical issues related to desktop sandbox platforms most notably the issues relating to: data security and processing scalability. Assuming a relatively robust platform is used to manage the sandboxes then the security profiles implemented in the EDW can be replicated across to the stand-alone platform. This approach still allows users to redefine their data model to test new "hierarchies", add new dimensions or new attributes within what-if models and even change data points but this ability is "granted" by the DBA rather than being automatically taken and enforced by the business user. In terms of sharing results there is no need to distribute data via email and this ensures everyone gets the same consistent view of the results (and by default the original source, should there be a need to work backwards from the results to the source).

Key concerns for business users is the level of latency that occurs from the need to unload and reload not only the required data but also all the supporting technical and business metadata. Unloading, moving and importing large historical data sets can be very time consuming and can require large amounts of resources on the production system - which may or may not be available depending on the timing of the request. 

For the DBA issues arise around the need to monitor additional hardware and software services in the data center. For IT this means more costs because additional floor space, network bandwidth, power and cooling may be required. Of course, assuming that the sandbox platform fits into the existing monitoring and control infrastructure then x-charging can be implemented. In this environment the DBA has full control over the deletion of a sandbox so they can prevent the spread of "shadow" production data sets. For important business discoveries, the use of detached sandboxes does provide the IT team with the opportunity to grab the loading and analysis scripts and move them to the production EDW environment. This helps to reduce the amount of time and effort needed to "productionize" discoveries.

While detached sandboxes remove some of the disadvantages of desktop platforms it is still not an ideal way to deliver sandboxes to the business community.

3. Attached Sandboxes

Attached sandboxes resolve all the problems associated with the other two scenarios. Oracle provides a rich set of in-database features that allow business users to work with in-place data, which in effect, removes the issue of data latency. Oracle Database is able to guarantee complete isolation for any changes to dimensions, hierarchies, attributes and/or even individual data points so there is no need to unload, move and then reload data. All the existing data security policies remain in place which means there is no need to replicate security profiles to other systems where there is the inherent risk that something might be missed in the process.

For the DBA, x-charging can be implemented using existing infrastructure management tools. The DBA has full control over the sandbox in terms of resources (storage space, CPU, I/O) and duration. The only concern that is normally raised regarding the use of attached sandboxes is the impact on the existing operational workloads. Fortunately, Oracle Database, in conjunction with our engineered systems, has a very robust workload management framework (see earlier posts on this topic: https://blogs.oracle.com/datawarehousing/tags/Workload_Management). This means that the DBA can allocate sufficient resource to each sandbox while ensuring that the key operational workloads continue to meet their SLAs. Overall, attached sandboxes, within an Oracle Database environment, is a win-win solution: both the DBA and the business community get what they need.

Summary

Deployment Model

Benefit

Disadvantages

Desktop Sandbox

High degree of local control over data
“Fast” performance
Quick and easy sharing of results

Reduced data scalability
Not easy to integrate new data
Very costly to implement
Undermines data consistency-governance
Data security is compromised

Detached Sandbox

Reduces workload on EDW
Upload personal/external data to sandbox
Explore large volumes of data without limits

Requires additional hardware and software
Requires replication of corporate data
High latency
Replication + increased management of operational metadata

Attached Sandbox

Upload additional data to virtual partitions Easy to mix new data with corporate data
No replication of corporate data
Efficient use of DW platform resources
Data access controlled by enterprise security features

Requires robust workload management tools

From this list of pros and cons it is easy to see that the "Attached Sandbox"  is the best deployment model to use. Fortunately, Oracle Database 12c has a number of new features and improvements to existing features that mean it is the perfect platform for deploying and managing attached sandboxes.

B-O-X-D: the lifecycle of a sandbox

Now we know what type of sandbox we need to deploy (just in case you were not paying attention - attached sandboxes!) to keep our business users happy the next step is to consider the lifecycle of the sandbox along with the tools and features that support each of the key phases. To make things easier I have broken this down into four key DBA-centric phases as shown below:

Sandbox lifecycle

Over the next four weeks I will cover these four key phases of the sandbox lifecycle and explain which Oracle tools and Oracle Database features are relevant and how they can be used. 

Footnotes

*1 Solve the Data Management Conflict Between Business and IT, by Brad Peters - Information Management Newsletters, July 20, 2010

*2 The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics by Ralph Kimball

Tuesday Apr 15, 2014

OpenWorld call for Papers closes today!

 Just a gentle reminder - if you have not submitted a paper for this year's OpenWorld conference then there is still just enough time because the deadline is Today (Tuesday, April 15) at 11:59pm PDT. The call for papers website is here http://www.oracle.com/openworld/call-for-papers/index.html and this provides all the details of how and what to submit.

I have been working with a number of customers on some really exciting papers so I know this year's conference is going to be really interesting for data warehousing and analytics. I would encourage everyone to submit a paper, especially if you have never done this before. Right now both data warehousing and analytics are among the hottest topics in IT and I am sure all of you have some great stories that you could share with your industry peers who will be attending the conference. It is a great opportunity to present to your peers and also learn from them by attending their data warehouse/analytics sessions during this week long conference. And of course you get a week of glorious Californian sunshine and the chance to spend time in one of the World's most beautiful waterfront cities.

If you would like any help submitting a proposal then feel free to email during today and I will do my best to provide answers and/or guidance. My email address is keith.laker@oracle.com.

Have a great day and get those papers entered into our OpenWorld system right now! 

Wednesday Feb 05, 2014

OTN Virtual Developer Day Database 12c content now available on-demand

Thank you to everyone who attended the SQL pattern matching session during yesterday's OTN Virtual Developer Day event. We had a great crowd of people join our live workshop session. I hope everyone enjoyed using the amazing platform which the OTN team put together to host the event.  

The great news is that all the content from the event is now available for download and you can watch the all on-demand videos from the four tracks (Big Data DBA, Big Data Developer, Database DBA and Database Developer). 

The link to fantastic OTN VDD platform is here: https://oracle.6connex.com/portal/database2014/login?langR=en_US&mcc=aceinvite and this is what the landing pad page looks like:

OTNVDD Me

This page will give you access to the keynote session by Tom Kyte and Jonathan Lewis which covered the landscape of Oracle DB technology evolution and adoption.  The content looks at what's next for Oracle Database 12c looking at the high value technologies and techniques that are driving greater database efficiencies and innovation.

You will be able to access the videos, slides from each presentation and a huge range of technical hands-on labs covering big data and database technologies, including my SQL Pattern Matching workshop. If you want to download the the Virtualbox image for the Database tracks it is available here: http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html (this contains everything you need to run my SQL Pattern Matching workshop).

While you doing the workshop, if you have any questions then please feel free to email me - keith.laker@oracle.com.

Enjoy.

Friday Jan 24, 2014

How to create more sophisticated reports with SQL

SQL Pivot Cube

This is a continuation of a series of posts that cover Oracle's SQL extensions for reporting and analysis. As reviewed in earlier blog posts (Part 1 and Part 2), We have extended SQL's analytical processing capabilities by introducing a family of aggregate and analytic SQL functions. Over the last couple of days I have been exploring how to use some of these SQL features to create tabular-style reports/views. In the past when I worked as a BI Beans product manager I would typically build these types of reports using OLAP cubes and/or Java code. It is been very interesting to work through some of my old report scenarios and transfer my Java aggregation processing back into the database by using SQL to do all the heavy lifting without having to resort to low-level coding.

[Read More]

Tuesday Oct 22, 2013

OOW content for Pattern Matching....

If you missed my sessions at OpenWorld then don't worry - all the content we used for pattern matching (presentation and hands-on lab) is now available for download.

My presentation "SQL: The Best Development Language for Big Data?" is available for download from the OOW Content Catalog, see here: https://oracleus.activeevents.com/2013/connect/sessionDetail.ww?SESSION_ID=9101

For the hands-on lab ("Pattern Matching at the Speed of Thought with Oracle Database 12c") we used the Oracle-By-Example content. The OOW hands-on lab uses Oracle Database 12c Release 1 (12.1) and uses the MATCH_RECOGNIZE clause to perform some basic pattern matching examples in SQL. This lab is broken down into four main steps:
  • Logically partition and order the data that is used in the MATCH_RECOGNIZE clause with its PARTITION BY and ORDER BY clauses.
  • Define patterns of rows to seek using the PATTERN clause of the MATCH_RECOGNIZE clause. These patterns use regular expressions syntax, a powerful and expressive feature, applied to the pattern variables you define.
  • Specify the logical conditions required to map a row to a row pattern variable in the DEFINE clause.
  • Define measures, which are expressions usable in the MEASURES clause of the SQL query.
You can download the setup files to build the ticker schema and the student notes from the Oracle Learning Library. The direct link to the example on using pattern matching is here: http://apex.oracle.com/pls/apex/f?p=44785:24:0::NO:24:P24_CONTENT_ID,P24_PREV_PAGE:6781,2.

Wednesday Jun 27, 2012

FairScheduling Conventions in Hadoop

While scheduling and resource allocation control has been present in Hadoop since 0.20, a lot of people haven't discovered or utilized it in their initial investigations of the Hadoop ecosystem. We could chalk this up to many things:

  • Organizations are still determining what their dataflow and analysis workloads will comprise
  • Small deployments under tests aren't likely to show the signs of strains that would send someone looking for resource allocation options
  • The default scheduling options -- the FairScheduler and the CapacityScheduler -- are not placed in the most prominent position within the Hadoop documentation.

However, for production deployments, it's wise to start with at least the foundations of scheduling in place so that you can tune the cluster as workloads emerge. To do that, we have to ask ourselves something about what the off-the-rack scheduling options are. We have some choices:

  • The FairScheduler, which will work to ensure resource allocations are enforced on a per-job basis.
  • The CapacityScheduler, which will ensure resource allocations are enforced on a per-queue basis.
  • Writing your own implementation of the abstract class org.apache.hadoop.mapred.job.TaskScheduler is an option, but usually overkill.

If you're going to have several concurrent users and leverage the more interactive aspects of the Hadoop environment (e.g. Pig and Hive scripting), the FairScheduler is definitely the way to go. In particular, we can do user-specific pools so that default users get their fair share, and specific users are given the resources their workloads require.

To enable fair scheduling, we're going to need to do a couple of things. First, we need to tell the JobTracker that we want to use scheduling and where we're going to be defining our allocations. We do this by adding the following to the mapred-site.xml file in HADOOP_HOME/conf:

<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/path/to/allocations.xml</value>
</property>

<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>pool.name</value>
</property>

<property>
<name>pool.name</name>
<value>${user.name}</name>
</property>

What we've done here is simply tell the JobTracker that we'd like to task scheduling to use the FairScheduler class rather than a single FIFO queue. Moreover, we're going to be defining our resource pools and allocations in a file called allocations.xml For reference, the allocation file is read every 15s or so, which allows for tuning allocations without having to take down the JobTracker.

Our allocation file is now going to look a little like this

<?xml version="1.0"?>
<allocations>
<pool name="dan">
<minMaps>5</minMaps>
<minReduces>5</minReduces>
<maxMaps>25</maxMaps>
<maxReduces>25</maxReduces>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
</pool>
<mapreduce.job.user.name="dan">
<maxRunningJobs>6</maxRunningJobs>
</user>
<userMaxJobsDefault>3</userMaxJobsDefault>
<fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>

In this case, I've explicitly set my username to have upper and lower bounds on the maps and reduces, and allotted myself double the number of running jobs. Now, if I run hive or pig jobs from either the console or via the Hue web interface, I'll be treated "fairly" by the JobTracker. There's a lot more tweaking that can be done to the allocations file, so it's best to dig down into the description and start trying out allocations that might fit your workload.

[Read More]

Tuesday Apr 10, 2012

Self-Study course on ODI Application Adapter for Hadoop

For those of you who want to work with Oracle Data Integrator and its Hadoop capabilities, a good way to start is the newly released self-study course from Oracle University. You can find the course here.

Enjoy, and if you have any feedback, do send this into Oracle University by logging in (so we can unleash our big data analytics on it ;-) ).

Monday Jul 18, 2011

Is Parallelism married to Partitioning?

In the last couple of months I have been asked a number of times about the dependency of parallelism on partitioning. There seems to be some misconceptions and confusion about their relationship, and some people still believe that partitioning is required to perform operations in parallel. Well, this is not the case, so let me try to briefly explain how Oracle parallelizes operations. Note that 'parallel operation' in this context means to break down a single operation – e.g. a single SQL statement – into smaller unit of works that are processed in parallel to leverage more or even all resources in a system to return the result faster. More in-depth information and whitepapers on Parallel Execution and Partitioning can be found on OTN.

Let’s begin with a quick architecture recap. Oracle is a shared everything system, and yes, it can run single operations in a massively parallel fashion. All of the data is shared between all nodes in a cluster and the capability of how to parallelize operations is not restricted to any predetermined static data distribution done at database/table setup (creation) time. In a pure shared nothing architecture, database files (or tables) have to be partitioned to map disjoint data sets – “partitions” of data – to specific “nodes” of a multi-core or multi-node system to enable parallel processing as illustrated below (note that the distribution in a shared nothing system is normally done based on a hash algorithm; I just used letter ranges for illustration purposes here):

Design strategies for parallelization: Shared Nothing v Shared Everything architecture

In other words, with the Oracle Database any node in a cluster configuration can access any piece of data in an arbitrary parallel fashion, and the data does not have to be “partitioned” at creation time for parallelization.

Oracle’s parallel execution determines the most optimal subset of the smaller parallel units of work based on the user’s request – namely the SQL statement. Oracle does not rely on Oracle Partitioning to enable parallel execution, but takes advantage of partitioning wherever it makes sense, just like a shared nothing system; we will talk about this a bit later.

So how does Oracle dynamically determine these smaller units of work? In Oracle, the basic unit of work in parallelism is called a granule. Oracle divides the operation being parallelized (for example a table scan, table update, or index creation) into granules. Oracle distinguishes two different kinds of granules for initial data access: Block range granules and Partition based granules.

Block range granules are ranges of physical blocks from a table. Each parallel execution server works on its own block range granule. Block range granules are the basic unit of most parallel operations even on partitioned tables. Therefore, from an Oracle Database perspective, the degree of parallelism (DOP) is not related to the number of partitions or even the fact whether a table is partitioned or not. The number of granules and their size correlates with the DOP and the object size.

That said, Oracle is flexible in its decisions: some operations can benefit from an underlying partitioned data structure and leverage individual partitions as granules of work, which brings me to the second kind of granules; with partition based granules only one parallel execution server performs the work for all of the data in a single partition or subpartition (the maximum allowable DOP obviously then becomes the number of partitions). 

The best known operation that can benefit from parallel granules is a join between large equi-partitioned tables, a so-called partition-wise join. A partition-wise join takes place when tables to be joined are equi-partitioned on the join key or the operation is executed in parallel and the larger one of the tables to being joined is partitioned on the join key.

By joining the equivalent partitions the database has to do less data distribution and the sort/join resource requirements are reduced. The picture below illustrates the execution of a partition-wise join. Instead of joining all data of SALES with all data of CUSTOMERS, partition pairs are joined in parallel by multiple parallel execution servers, having each parallel execution server working on one pair of partitions. You can understand this a little bit like shared-nothing-like processing at runtime.

Now that we know about Oracle’s choice of granules how do we tell which type of granules was chosen?  The answer lies in the execution plan. In the example below, operation ID 7 (above the table access of SALES) it says PX BLOCK ITERATOR, which means that block range granules have been used to access table SALES in parallel.

To illustrate partition-wise joins and the use of partition based granules consider the following example: table SALES is range partitioned by date (8 partitions) and sub partitioned by hash on cust_id using 16 subpartitions for each range partition, for a total of 128 subpartitions. The CUSTOMER table is hash partitioned on cust_id with 16 hash partitions.

Below is the execution plan for the following query:

SELECT c.cust_last_name, COUNT(*)

FROM sales s, customers c

WHERE s.cust_id = c.cust_id AND

s.time_id BETWEEN TO_DATE('01-JUL-1999', 'DD-MON-YYYY') AND

     (TO_DATE('01-OCT-1999', 'DD-MON-YYYY'))

GROUP BY c.cust_last_name HAVING COUNT(*) > 100;

Operation ID 11 above the table access of SALES says PX PARTITION RANGE ITERATOR which means partition based granules were used to access partition 8 and 9 and subpartitions 113 to 144 (Partition  pruning which be discussed in another blog)

Operation ID 8 says PX PARTITIONS HASH ALL which means the corresponding 16 hash subpartitions of the SALES table are joined to customer table using a full partition-wise join. (For more information on Partition-Wise joins refer to  Oracle® Database VLDB and Partitioning Guide)

As we have seen,unlike shared nothing systems, Oracle parallel execution combines the best of both worlds, providing the most flexible way of parallelizing an operation but still take full advantage of pre-existing partitioning strategies for optimal execution when needed.

Wednesday May 25, 2011

Parallel Load: Uniform or AutoAllocate extents?

Over the last couple of years there has been a lot of debate about space management in Data Warehousing environments and the benefits of having fewer larger extents. Many believe the easiest way to achieve this is to use uniform extents but the obvious benefits can often be out weighed by some not so obvious drawbacks.

For performance reasons most loads leverage direct path operations where the load process directly formats and writes Oracle blocks to disk instead of going through the buffer cache. This means that the loading process allocates extents and fills them with data during the load. During parallel loads, each loader process will allocate it own extent and  no two processes work on the same extent. When loading data into a table with UNIFORM extents each loader process will allocate its own Uniform extent and begin loading the data, if the extents  are not fully populated the table is left with a lot of partially filled extents, effectively creating ‘HOLES’ in the table. Not only is this space wastage but it also impacts query performance as subsequent queries that scan the table have to scan all of the extents even if they are not fully filled.

What is different with AUTOALLOCATE? AUTOALLOCATE will dynamically adjust the size of an extent and trim the extent after the load in case it is not fully loaded (Extent Trimming)


Tom Kyte covers this problem in great details in his post Loading and Extents but below is a simple example just to illustrate what a huge difference there can be in space management when you load into a table with uniform extents versus a table with autoallocated extents.

1) Create two tablespaces: Test_Uniform (using uniform extent management), Test_Allocate (using auto allocate)

create tablespace test_uniform datafile '+DATA/uniform.dbf' SIZE 1048640K

AUTOEXTEND ON NEXT 100M

EXTENT management local uniform size 100m;

create tablespace test_allocate datafile '+DATA/allocate2.dbf' SIZE 1048640K

AUTOEXTEND ON NEXT 100M

EXTENT management local autoallocate segment space management auto;

2)Create a flat file with a 10,000,000 records.

-rw-r--r-- 1 oracle dba 1077320689 May 17 16:59 TEST.dat

3)Do a parallel direct path load of this file into each tablespace:

create table UNIFORM_TEST                                                                         parallel
tablespace Test_
_uniform                                                                        as  select * from big_table_ext;

create table AUTOALLOCATE_TEST
parallel
tablespace Test_allocate
as select * from big_table_ext;

Let's view the space usage using a PL/SQL package called show_space  .

 

 As you can see from the results there are no unformatted blocks in the autoallocate table as extent trimming has taken place after the load was complete. The same can not be said for the uniform_test table. It is quite evident from the numbers that there is substantial space wastage in the uniform_test table. Although the count of full blocks are the same in both cases, the Total Mbytes used is 10x greater in the Uniform table.

Conclusion

Space utilization is much better with autoallocate becuase of extent trimming. As I said before more information on this topic can be found on Tom's Kyte post.

 

Wednesday Apr 20, 2011

Calculating Parameter Values for Parallelism

[Read More]

Wednesday Mar 23, 2011

Best Practices: Data Warehouse in Archivelog mode?

[Read More]

Monday Feb 14, 2011

Auto DOP and Queuing Parameter Settings

[Read More]

Wednesday Sep 08, 2010

Session Queuing and Workload Management

[Read More]

Friday Apr 23, 2010

Oracle Communications Data Model

[Read More]
About

The data warehouse insider is written by the Oracle product management team and sheds lights on all thing data warehousing and big data.

Search

Archives
« February 2016
SunMonTueWedThuFriSat
 
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
     
       
Today