<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
   <title>The Data Warehouse Insider</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/" />
   <link rel="self" type="application/atom+xml" href="http://blogs.oracle.com/datawarehousing/xml/rss.xml" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237</id>
   <updated>2009-11-09T19:03:03Z</updated>
   
   <generator uri="http://www.sixapart.com/movabletype/">Movable Type Enterprise 4.23-en</generator>


<entry>
   <title>Next Generation Data Warehouse Platforms</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/11/next_generation_data_warehouse.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.15468</id>
   
   <published>2009-11-09T19:01:00Z</published>
   <updated>2009-11-09T19:03:03Z</updated>
   
   <summary>Rather than entertain you all with my own theories I decided to be a literature critic and discuss a paper / survey done by TDWI&apos;s Philip Russom. The report goes into &quot;what a next generation DW platform should try to...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Opinion" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="datawarehouse" label="Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="databasemachine" label="Database Machine" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="parallelexecution" label="Parallel Execution" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sql" label="SQL" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>Rather than entertain you all with my own theories I decided to be a literature critic and discuss a paper / survey done by TDWI's Philip Russom. The report goes into "what a next generation DW platform should try to address and what it could look like". You can read an Oracle sponsored copy <a href="http://www.oracle.com/database/docs/tdwi-nextgen-platforms.pdf">here</a>. Do note that the research was sponsored by a large number of vendors - not just Oracle - and that this is just our way of getting a complimentary copy to you...</p>  <p>I think there are a couple of interesting outcomes from the report and I'd like to summarize some of them in the following points:</p>  <ul>   <li>You should start thinking about either a new infrastructure or how to leverage the existing infrastructure for more advanced data warehousing in the next year or so.</li>    <li>Poor query performance, the lack of advanced analytics (OLAP, Data Mining, Statistics etc.) and inadequate load performance are the top three issues identified in the report. So look for a platform that will go fast (really fast!) and allows you to use advanced analytics for a large user community. </li>    <li>More and more companies are looking towards a more real-time (dare I say Right-Time) environment - for at least parts of the DW - in light of integrating DW data with operational systems or in a fancy way striving for operational excellence. I would think this is where the mixed workloads come from as well, which are an interesting trend in the paper. Again, you will want a platform that will allow for these mixed workloads and the trickle feeds while reading from the DW. </li>    <li>Hardware evolutions do drive platforms in that the capacity increases, the price of that capacity decreases and that storage media change to faster media over the next couple of years. 
Think flash, think memory, think (maybe even) solid state disks.</li>    <li>Private Clouds - in line with the previous statements - are expected to be seen as an infrastructure. In line with the big boss, I would prefer to call this a compute grid rather than referring to water vapor too much... But it means that low cost small commodity compute resources racked together and used across workloads (consolidation anyone?) are something in your near future... </li> </ul>  <p>As a true critic, I of course have to say something sour about the report... my sour grape about this report is the question as to whether you would like a DBMS built for transactions as your DW database... Not quite sure what a survey on DW workloads expects to get as an answer. It did surprise me that the DBMS built for DW has flat growth and moderate to good commitment. My conclusion is that this is not necessarily a great discussion to have, as neither seems to be interesting to too many people. I think people focus on finding a platform that qualifies by solving their pain points and/or the above points.</p>  <p>And then of course, to keep on the sour face, the fun discussion on MPP. I think that everyone agrees that large operational data stores (3NF like schemas in many cases) benefit from parallel processing capacities. What we are seeing is that more and more people need more than just full table scans on these schemas. In light of the above trends (and there is more on those in the report) people require multiple workloads on these systems, read consistency without readers blocking writers etc. In other words, a next generation platform needs to deal with both massively parallel queries and with single transaction level reads and writes. </p>  <p>A discussion around MPP versus SMP versus OLTP DB versus DW DB is not very useful I would think. It should be on the issues and merits of a DBMS and platform not on a theoretical argument. 
And I think most of the report does bring up the right points you should be thinking about while designing what the platform should be doing. I just think that people should keep in mind that theory is not always the best lead argument. Keith has written a long <a href="http://blogs.oracle.com/datawarehousing/2009/09/oltp_and_data_warehouse_how_do.html">post</a> on a related topic. One other thing, many new DBMS systems have sprung into the DW market, many just featuring a single core feature (compression, columnar storage, in-memory etc.). For a critical system that sees mixed workloads and lots of data, these point solutions may not be the right thing to base an architecture on...</p>  <p>I was also surprised by the Data Quality entry in Good Growth - Good Commitment. While I think that more and more people are paying attention, I still see this as the orphan in the DW environment. Lots of talk, not much action. I just wish it was more action and less talk, because, come on, this is really a problem! So I hope the report is spot on there, but I'm a skeptic (nothing to blame the report for, just me being in this business not seeing too much happening in DQ).</p>  <p>Ok, be done now with the sulking :-)</p>  <p>Couple of things around some new trends and things in Oracle that would help with building an infrastructure for the next generation data warehouse:</p>  <ul>   <li>Performance =&gt; Sun Oracle Database Machine. Yes, it really is fast!</li>    <li>In-memory processing =&gt; Oracle now has (11gR2) In-Memory Parallel Execution. More about this can be read in Maria's excellent post <a href="http://blogs.oracle.com/datawarehousing/2009/09/in-memory_parallel_execution_i.html">here</a>. 
</li>    <li>In-Database Analytics =&gt; As the report says in Exadata V2 and Oracle 11gR2 we are now offloading data mining model scoring to the storage side of the house, which allows us to embed mining models into more and more operational systems and get online (direct) feedback on transactions. We also have for years moved more and more OLAP and Stats functionality into the engine </li>    <li>Real-time data warehousing =&gt; First and foremost the read consistency model introduced in Oracle 4 (this is not a typo...) allowing readers to see consistent data during writes, secondly, the just completed acquisition of Golden Gate and the ETL capabilities (like streams) in Oracle allow for very nice real time data feeds. Oracle's MAA architecture allows us to be up and running 24*7 on commodity hardware and deliver an online experience to all customers... </li>    <li>Cloud computing =&gt; see the in-database mapreduce post <a href="http://blogs.oracle.com/datawarehousing/2009/10/in-database_map-reduce.html">here</a>. </li>    <li>Appliances =&gt; Sun Oracle Database Machine </li> </ul>  <p>So, rather than finding point solutions for all of these challenges as one of the conclusions shows, maybe a solution would be to choose a platform that actually allows you to do what you need from a single vendor... If you are really thinking operational BI, real time DW it is hard to imagine how you are going to do this with multiple DBMS solutions. If the analytics run on a secondary system, you have just created a delay as you now need to update the EDW and then this analytics system. This simply means you cannot actually do real time data warehousing... Loading just to the analytics system means that you are back into data silos. </p>  <p>Concluding, I think most conclusions of the report are spot on. Think Big Data, understand the business drivers, plan for your systems etc. 
I am not so sure a set of separate point solutions is what you need if you are going to do real time data warehousing...</p>  <p>Anyway, do read the report! I think it gives you good food for thought and a great insight into the trends you will be dealing with in your near future.</p>]]>
      
   </content>
</entry>

<entry>
   <title>Chunking of Updates</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/11/chunking_of_updates.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.15368</id>
   
   <published>2009-11-05T00:44:18Z</published>
   <updated>2009-11-09T19:03:57Z</updated>
   
   <summary>While I was looking at the Edition-based Redefinition I was puzzled by, or thinking about the performance of large data volumes moving across the edition. So I started thinking about how do I get that trigger to do stuff in...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="datawarehouse" label="Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="plsql" label="PL/SQL" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>While I was looking at the Edition-based Redefinition I was puzzled by, or thinking about the performance of large data volumes moving across the edition. So I started thinking about how do I get that trigger to do stuff in parallel. Turns out I was thinking upside down... or inside out is maybe better.</p>  <p>In 11g Release 2 a new DBMS package is introduced called DBMS_PARALLEL_EXECUTE, and in the RUN_TASK procedure it allows me to actually fire a cross-edition trigger, and there is my answer to how to get the redefinition of the data to happen in parallel. Eureka... we already figured this out, it just took me a while to figure out the connections...</p>  <p>If you are just interested in avoiding large rollback segments or trying to find some way of doing updates on data in a small controlled batch manner I would say, have a look at the documentation for DBMS_PARALLEL_EXECUTE. The nice thing about the chunking is that you get restartability and control over the update in smaller chunks without having to build the infrastructure for all of this yourself. </p>  <p>You can read some more <a href="http://www.oracle.com/technology/oramag/oracle/09-nov/o69asktom.html?_template=/ocom/print">here</a> and <a href="http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10577/d_parallel_ex.htm">here</a>.</p>]]>
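<![CDATA[<p>To make the chunking idea a bit more concrete, here is a minimal sketch of a chunked update with DBMS_PARALLEL_EXECUTE in 11g Release 2. The task name, the DW_OWN.SALES_FACT table, the AMOUNT column, the chunk size and the parallel level are all assumptions made up for illustration, so adjust them to your own schema. It also assumes the session has the CREATE JOB privilege, because the chunks are run through scheduler jobs.</p>

```sql
DECLARE
  -- Hypothetical task name; any name that is unique for the user will do.
  l_task    VARCHAR2(30) := 'sales_chunked_update';

  -- The statement has to reference :start_id and :end_id. They are bound
  -- to the ROWID range of each chunk when the task runs.
  -- DW_OWN.SALES_FACT and AMOUNT are made-up names for this sketch.
  l_sql     VARCHAR2(1000) :=
    'UPDATE dw_own.sales_fact s
        SET s.amount = s.amount * 1.1
      WHERE s.rowid BETWEEN :start_id AND :end_id';

  l_status  NUMBER;
  l_tries   PLS_INTEGER := 0;
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => l_task);

  -- Split the table into ROWID ranges of roughly 10,000 rows each.
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
    task_name   => l_task,
    table_owner => 'DW_OWN',
    table_name  => 'SALES_FACT',
    by_row      => TRUE,
    chunk_size  => 10000);

  -- Run the statement once per chunk, using up to 4 scheduler jobs.
  -- Each chunk is committed on its own, which keeps undo usage small.
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
    task_name      => l_task,
    sql_stmt       => l_sql,
    language_flag  => DBMS_SQL.NATIVE,
    parallel_level => 4);

  -- Restartability: retry the chunks that did not complete, at most twice.
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
  WHILE l_tries < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED LOOP
    l_tries  := l_tries + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
  END LOOP;

  -- Remove the task and its chunk bookkeeping.
  DBMS_PARALLEL_EXECUTE.DROP_TASK(l_task);
END;
/
```

<p>While the task exists, the USER_PARALLEL_EXECUTE_CHUNKS view shows the status of each chunk, so in a real job you would probably inspect it before dropping the task. For the cross-edition case, RUN_TASK also takes edition and apply_crossedition_trigger parameters; check the documentation linked above for the details before relying on them.</p>]]>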
      
   </content>
</entry>

<entry>
   <title>Edition Based Redefinition</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/10/edition_based_redefinition.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.15148</id>
   
   <published>2009-10-21T22:16:00Z</published>
   <updated>2009-10-22T22:46:29Z</updated>
   
   <summary>Or, how can I use 11gR2 to update my operational data warehouse with real time data loads and still patch the schema to deal with errors or with upgrades due to changing business demands. The Theory In a time when...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="activedatawarehouse" label="Active Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="database" label="Database" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="oracle11g" label="Oracle 11g" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="redefinition" label="Redefinition" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sql" label="SQL" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="upgrade" label="Upgrade" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>Or, how can I use 11gR2 to update my operational data warehouse with real time data loads and still patch the schema to deal with errors or with upgrades due to changing business demands?</p>  <h4>The Theory</h4>  <p>In a time when the data warehouse was refreshed weekly (wow, that often?) fixing the schema or the data was a piece of cake. The data was static, the users were querying (you hoped) away and you could tinker with the schema without really making a mess of it all. Take the DW down over the weekend, hey no problem. Those were the days, when you had nice batch windows to load data, weekends to upgrade the schema and the query layers and in general not too much pressure.</p>  <p>Both Operational Data Warehousing and Real Time Data Warehousing make all of the above a blast from the past and the fantasy dream world of all data warehouse architects.</p>  <p>Today chances are that your data warehouse is as critical to your business as the OLTP application that captures sales, orders or something else that makes you money. Taking the system down is not an option in many cases.</p>  <p>In OLTP land this reality has been around for a while... patching the app while the app is running is something that really saves time and money over there.</p>  <p>Oh, but we just said that the DW is like the OLTP app. Therefore, should I use the same patching process and functionality? Yes, and that is where we bring in Edition Based Redefinition.</p>  <h4>The Scenario</h4>  <p>We have a very small star schema, with a single fact table and two dimensions. The idea is to simply use this example and highlight some of the things that should be done to enable online patching. 
The goal is not to create a very complex scenario and cover every possible case to be looked at...</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/blog_redef_schema_2.png"><img title="Schema, user and objects" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="250" alt="Schema, user and objects" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/blog_redef_schema_thumb.png" width="379" border="0" /></a> </p>  <p>We are also looking at a real time or short load time interval. Again, the scenario will just be inserting a small number of rows, but it should actually show the principle quite well.</p>  <p>To simulate the query angle we have two users, one is the owner of the schema and the objects to be worked on, the other queries these objects. The owner is REDEF_OWN and the user is REDEF_QRY (and yes, very creative naming...).</p>  <h4></h4>  <h4>Redefinition in Action</h4>  <p>Let's get going. We have created the tables (all three) and loaded the initial data set. Users are all happy and business is booming. </p>  <p>Our change means that the product dimension gets a second and third level, but most importantly name needs to be changed to being a code to uniformly identify the product across the globe. This means there will be some re-linking of data as some facts will go to a different product code (the new "name").</p>  <p>First, develop and test all of the steps on a copy in the development environment, then go to work on the editions in production. 
For a complete list of the steps to do take a look in the documentation (<a href="http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10471/adfns_editions.htm#BABGGADB">here</a>) or download the script with the code for this example.</p>  <p>Now that you know your changes you will create Editioning Views that will regulate access to editioned data structures. This access is both ETL as well as the query access. This setup is shown in the following illustration.</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/blog_redef_procedure_2.png"><img title="blog_redef_procedure" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="278" alt="blog_redef_procedure" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/blog_redef_procedure_thumb.png" width="409" border="0" /></a> </p>  <p>What we did here is rename the actual tables which now have a _GEN suffix and created Editioning Views with the original names. The code looks like this (note the new keyword in the create view syntax!):</p>  <p><font face="Courier New" size="1">ALTER TABLE PRODUCTS RENAME TO PRODUCTS_GEN; </font></p>  <p><font face="Courier New" size="1">CREATE OR REPLACE EDITIONING VIEW PRODUCTS     <br />AS      <br />SELECT&#160; prd_id&#160; <br />,&#160;&#160;&#160;&#160;&#160;&#160; prd_name      <br />,&#160;&#160;&#160;&#160;&#160;&#160; prd_desc      <br />FROM PRODUCTS_GEN;</font></p>  <p>With this redirection in place (and we do this for all three tables to ensure we can modify all of these) we are ready to create the new edition and start modifying the actual objects. 
As a quick side note on grants and security, make sure the query users (in this case REDEF_QRY) have the correct grants on both the tables and the views to allow for selects.</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_change_edition_screen.png"><img title="redef_change_edition_screen" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="308" alt="redef_change_edition_screen" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_change_edition_screen_thumb.png" width="488" border="0" /></a> </p>  <p>Once we are in the new edition all changes can be made, tested and then published. The screen above shows how to change the Edition and see what edition this session is working with. In this case, we are working in the PHASETWO edition.</p>  <p>As we add the columns to the table, make sure to not change the primary key (yet). We will do that later and by using the view, re-point the foreign key to that new primary key. Also note that we are not changing the definition of the description, so we will leave it as is...</p>  <p>The change looks like this:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_change_objects_screen.png"><img title="redef_change_objects_screen" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="533" alt="redef_change_objects_screen" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_change_objects_screen_thumb.png" width="493" border="0" /></a> </p>  <p>The next step is to get the changes propagated from the live system into the new system. Remember data loads are going on as we set all of this up. 
For this we need a basic procedure that captures the logic and then use a cross edition trigger on the first edition to invoke the logic for each inserted row (or a batch).</p>  <p>We use a procedure to allow extensions to the example without directly modifying the trigger. For this simple example, the trigger probably would have been flexible enough to deal with the changes.</p>  <p>After creating the procedure (see the code for some simple PL/SQL procedure doing some work), create the forward edition trigger (this is in Edition PhaseTwo!) and enable it. For simplicity we will not go into waiting for pending DML. In a real scenario you may want to do this of course. See the links to the documentation for more information on that step.</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_create_trigger_screen.png"><img title="redef_create_trigger_screen" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="306" alt="redef_create_trigger_screen" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_create_trigger_screen_thumb.png" width="510" border="0" /></a> </p>  <p>At this point, both editions are still in sync, i.e. no data has been changed. First we transform the data to be upgraded by executing the transforms. 
Obviously, at this point, every insert going into the base schema will have the trigger fire and will show up transformed in the new edition.</p>  <p>Just to show what the actual status is, here is the view a query user will see when looking at products (he is looking at Ora$Base):</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_qry_view_current.png"><img title="redef_qry_view_current" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="224" alt="redef_qry_view_current" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_qry_view_current_thumb.png" width="512" border="0" /></a> </p>  <p>At the same time, if the owner looks at PRODUCTS (he is in PhaseTwo) he will see the following:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_owner_view_before_xform.png"><img title="redef_owner_view_before_xform" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="244" alt="redef_owner_view_before_xform" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_owner_view_before_xform_thumb.png" width="515" border="0" /></a> </p>  <p>As you can see only the non-modified columns are showing (why? 
- See the actual code of the view - in short, because we are selecting the new columns...).</p>  <p>Using the code in the documentation we will apply the transformations in the new edition and see what the result will be:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_owner_view_after_xform_1.png"><img title="redef_owner_view_after_xform" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="210" alt="redef_owner_view_after_xform" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_owner_view_after_xform_thumb_1.png" width="524" border="0" /></a> </p>  <p>We do the same steps to the fact table to cater for the redefinition of the dimension key (we went from 20 to 40). The details of this are in the code package.</p>  <p>At this point in time we have all data available in the PhaseTwo edition in its current form. Now a new ETL record with the following values comes in for the FACT table (SALES): 20, 10, 200, 3, 20. 
In other words we are still referring to dimension key 20.</p>  <p>This gives us the following results in the Ora$Base edition, note that the last record refers to dimension key 20 as a product_key:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_qry_after_etl_current.png"><img title="redef_qry_after_etl_current" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="276" alt="redef_qry_after_etl_current" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_qry_after_etl_current_thumb.png" width="524" border="0" /></a> </p>  <p>In the PhaseTwo edition we see the record transformed and we see that the product key field is always 40, even for the new record:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_qry_after_etl_phasetwo.png"><img title="redef_qry_after_etl_phasetwo" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="225" alt="redef_qry_after_etl_phasetwo" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_qry_after_etl_phasetwo_thumb.png" width="535" border="0" /></a> </p>  <p>In effect we have updated our data, redefined some codes, added some columns and all of this with no noticeable impact for the end users.</p>  <p>The situation is now as follows:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/blog_redef_procedure_step2_4.png"><img title="blog_redef_procedure_step2" style="display: inline" height="602" alt="blog_redef_procedure_step2" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/blog_redef_procedure_step2_thumb_1.png" width="489" /></a> </p>  <p>Now we should change the primary and foreign keys as well so it all refers to the new 
situation, we do this right before the switch of the users and the load processes. Because the cross edition trigger is in place you will get a key violation on the next generation primary key.</p>  <p>After we change the keys, the PK on PRODUCTS_GEN is named PRODUCT_PK_1. Inserting into the dimension in Ora$Base (there are no keys on PRD_ID_1 in that revision) will give the error coming from the trigger:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_etl_current_dupkeyerror.png"><img title="redef_etl_current_dupkeyerror" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="176" alt="redef_etl_current_dupkeyerror" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/EditionBasedRedefinition_9386/redef_etl_current_dupkeyerror_thumb.png" width="535" border="0" /></a> </p>  <p>Now we are all set and we can switch the edition for all users:</p>  <p><font face="Courier New" size="1">ALTER DATABASE DEFAULT EDITION = PhaseTwo;</font></p>  <p>After a disconnect and reconnect of any users that were connected, they will see the new status and are by default running on the upgraded edition. Next steps will include the removal of the triggers and the procedures. </p>  <p>Obviously your ETL is now changing as well as there are more fields to fill and the fields have different names. Some of the logic built into the triggers will move into ETL. And this is where the whole testing in development and test comes into play. Just make sure the ETL cut over is as simple as the above for the query users.</p>  <h4>A Word on Performance</h4>  <p>The interesting discussion is now whether or not you want to do this exact construct with large batch ETL routines. I would say that when doing this in batch, it may make sense to do a dual data load and use the new logic to populate the new system with the old data. 
But I'm open to any comments on this...</p>  <h4>A Word on Naming Conventions</h4>  <p>As you can see we are adding columns to the original table. If you start with a clean sheet of paper, you may consider using Redefinition Views as the query object, and keeping the underlying tables more generic. In other words, rather than using _1 suffixes maybe you can use more common or global names that will allow you to keep the tables within some set naming standards (and not have slightly ad hoc _1 version indicators).</p>  <p>The idea therefore is to work with a level of indirection to the query users... if that makes sense.</p>  <h4>So What?</h4>  <p>Why is this so interesting? Didn't you test this all anyways in your test environment? Yes, all of that is true, as we all test our OLTP app upgrade in the lab. The point is that with this edition based approach I can run the ETL process continuously, everyone keeps on querying without any downtime, and then everything is all of a sudden changed with no downtime for the users. That means no interruption on the business at all... that is what you need when you are running an active data warehouse (or should we call this OLDW for On-Line Data Warehousing?).</p>  <h4>More Resources</h4>  <p>Most of the above scripts and a different example can actually be found in the user guides <a href="http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10471/adfns_editions.htm#BABGGADB">here</a> and <a href="http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10471/adfns_editions.htm#ADFNS99913">here</a>. To download the script, <a href="http://blogs.oracle.com/datawarehousing/11gr2_redefinition_script.sql">click here</a>.</p>]]>
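<![CDATA[<p>For readers who cannot see the screenshots in the Redefinition in Action section, the edition switch and the forward crossedition trigger can be sketched roughly as follows. This is not the downloadable script. The trigger name and the transformation inside it are placeholders invented for this sketch, and the grants assume the REDEF_OWN and REDEF_QRY users from the scenario. The real logic lives in the procedure from the script.</p>

```sql
-- Run as a suitably privileged user.
-- The owning schema must be editions-enabled before it can hold
-- editioning views or crossedition triggers (skip if already done).
ALTER USER redef_own ENABLE EDITIONS;

-- Create the new edition as a child of the current leaf edition (ORA$BASE)
-- and let both the owner and the query user work in it.
CREATE EDITION PhaseTwo;
GRANT USE ON EDITION PhaseTwo TO redef_own;
GRANT USE ON EDITION PhaseTwo TO redef_qry;

-- Run as REDEF_OWN: switch this session to the new edition and check
-- which edition the session is working with.
ALTER SESSION SET EDITION = PhaseTwo;

SELECT SYS_CONTEXT('USERENV', 'CURRENT_EDITION_NAME') AS current_edition
FROM   dual;

-- Forward crossedition trigger, created in PhaseTwo on the base table.
-- It fires for DML issued from the older edition and fills the new column.
-- The trigger name and the "+ 20" rule are placeholders for illustration only.
CREATE OR REPLACE TRIGGER products_fwd_xed
  BEFORE INSERT OR UPDATE ON products_gen
  FOR EACH ROW
  FORWARD CROSSEDITION
  DISABLE
BEGIN
  :NEW.prd_id_1 := :NEW.prd_id + 20;
END products_fwd_xed;
/

-- Created disabled so it can be reviewed first; enable it when ready.
ALTER TRIGGER products_fwd_xed ENABLE;
```

<p>Creating the trigger disabled and enabling it in a separate step follows the pattern in the documentation linked above. In a real rollout you would also wait for pending DML before transforming the existing rows, which this sketch leaves out, just like the example in the post.</p>]]>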
      
   </content>
</entry>

<entry>
   <title>Impressions from Openworld</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/10/impressions_from_openworld.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.15061</id>
   
   <published>2009-10-16T01:20:00Z</published>
   <updated>2009-10-16T03:36:34Z</updated>
   
   <summary>Well, Openworld closed today with a wrap up party under (kind of) sunny skies. If nothing else it was a lot better than the weather on Tuesday. Wow, what a storm! Luckily the tents were all up and I think...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="OpenWorld 2009" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="datawarehouse" label="Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="oracledatabasemachine" label="Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>Well, Openworld closed today with a wrap up party under (kind of) sunny skies. If nothing else it was a lot better than the weather on Tuesday. Wow, what a storm! Luckily the tents were all up and I think everyone just enjoyed the show.</p>  <p>On the data warehousing front, I think this was a great show. For anyone - like me - mostly interested in DW related technology and customers there was a lot to enjoy. We had quite a few sessions where Database machine customers shared their stories. Four of them shared their (top 10) lessons learned while implementing a Database machine. Six of them talked about their experiences on a customer panel. Various other sessions discussed how Database machine changes the data warehousing game. All in all, I think no one should doubt that this is a product very much in demand.</p>  <p>Once my feet are all happy again I will update the blog and give all of you the links to the customer presentations.</p>  <p>That was also shown at the keynote hall where we had some pretty cool demonstrations on an actual V2 Database machine. The demogrounds had a nice steady flow of folks looking for more technical details and we got quite busy discussing what the machine can do. Lots of interest in Data Warehousing, but also a lot of interest around Consolidation and of course OLTP deployments.</p>  <p>A lot more happened at the show, and I'm sure the news outlets covered most of the things Larry Ellison talked about ($10 Million anybody?).</p>  <p>For those looking for some light entertainment, but with a serious message, have a look at the speech by Governor Arnold Schwarzenegger (<a href="http://gov.ca.gov/speech/13613/">here</a>).</p>  <p>For now, it is time to take the shoes off and enjoy a quiet evening in front of a TV... See you all next year!</p>]]>
      
   </content>
</entry>

<entry>
   <title>If you are coming to OpenWorld...</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/10/if_you_are_coming_to_openworld.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14843</id>
   
   <published>2009-10-08T20:40:02Z</published>
   <updated>2009-10-08T20:53:15Z</updated>
   
   <summary>A quick update on how to get organized at OpenWorld. The various product teams have created focus area booklets that list all the presentations related to a specific topic and these are now available for download from the OpenWorld site....</summary>
   <author>
      <name>keith.laker</name>
      
   </author>
   
      <category term="OpenWorld 2009" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="openworld" label="OpenWorld" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>A quick update on how to get organized at OpenWorld. The various product teams have created focus area booklets that list all the presentations related to a specific topic, and these are now available for download from the OpenWorld site. For the data warehousing area we have created the following PDF booklets to help you:</p>
<ul>
	<li><a href="http://www.oracle.com/us/openworld/034315.pdf">Data Warehousing</a></li>
	<li><a href="http://www.oracle.com/us/openworld/034653.pdf">Exadata</a></li>
</ul>
<p>In addition there is a whole series of subject-specific PDF booklets that you can download from <a href="http://www.oracle.com/us/openworld/030606.htm#database">here</a>:</p>
<ul>
	<li>Cloud Computing</li>
	<li>Data Integration and Oracle GoldenGate</li>
	<li>Database Application Development</li>
	<li>Database Security</li>
	<li>Database Upgrade</li>
	<li>Database Utilities</li>
	<li>Enterprise Manager</li>
	<li>Grid Computing</li>
	<li>Green/Environmental Sustainability</li>
	<li>High Availability</li>
	<li>Information Management</li>
	<li>Management and Infrastructure</li>
	<li>Modernization</li>
	<li>Oracle Consulting</li>
	<li>Oracle Multimedia</li>
	<li>Oracle Spatial and Oracle MapViewer</li>
	<li>Oracle@Oracle Best Practices</li>
	<li>Secure Enterprise Search and Text</li>
	<li>Semantic Technologies</li>
</ul>
<p>The data warehouse PM team will be in the demo grounds all week on the data warehouse demo pod, so please come and visit, say hi and let us know if you find the blog useful. We will be based in Moscone West in the database area. A map of the layout of the Moscone West demo grounds is <a href="http://www.oracle.com/us/openworld/018298.pdf">here</a>. Look forward to seeing you all next week.</p>]]>
      
   </content>
</entry>

<entry>
   <title>In-Database MapReduce (Map-Reduce)</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/10/in-database_map-reduce.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14706</id>
   
   <published>2009-10-01T21:35:59Z</published>
   <updated>2009-10-02T18:22:24Z</updated>
   
   <summary>The Map-Reduce model has become a popular way for programmers to describe and implement parallel programs. These custom map-reduce programs are often used to process a large data set in parallel. This post shows how to implement Map-Reduce Programs within...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="database" label="Database" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="mapreduce" label="Map-Reduce" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="mapreduce" label="MapReduce" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="plsql" label="PLSQL" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="parallel" label="Parallel" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sql" label="SQL" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>The Map-Reduce model has become a popular way for programmers to describe and implement parallel programs. These custom map-reduce programs are often used to process a large data set in parallel. This post shows how to implement Map-Reduce Programs within the Oracle database using Parallel Pipelined Table Functions and parallel operations.</p>  <h3>The Theory</h3>  <p>Pipelined Table Functions were introduced in Oracle 9i as a way of embedding procedural logic within a data flow. At a logical level, a Table Function is a function that can appear in the FROM clause and thus functions as a table returning a stream of rows. Table Functions can also take a stream of rows as an input. Since Pipelined Table Functions are embedded in the data flow they allow data to be 'streamed' to a SQL statement avoiding intermediate materialization in most cases. Additionally, Pipelined Table Functions can be parallelized. </p>  <p>To parallelize a Table Function the programmer specifies a key to repartition the input data. Table Functions can be implemented natively in PL/SQL, Java, and C. You can find more information and examples about Table Functions and the functionality mentioned above at the following URL:</p>  <p><a href="http://download.oracle.com/docs/cd/B10501_01/appdev.920/a96624/08_subs.htm#19677">http://download.oracle.com/docs/cd/B10501_01/appdev.920/a96624/08_subs.htm#19677</a></p>  <p>Pipelined Table Functions have been used by customers for several releases and are a core part of Oracle's extensibility infrastructure. Both external users and Oracle Development have used Table Functions as an efficient and easy way of extending the database kernel. </p>  <p>Examples of table functions being used within Oracle are the implementation of a number of features in Oracle Spatial and Oracle Warehouse Builder. Oracle Spatial usages include spatial joins and several spatial data mining operations. 
Oracle Warehouse Builder allows end users to leverage Table Functions to parallelize procedural logic in data flows such as the Match-Merge algorithm and other row-by-row processing algorithms.</p>  <h3>Step-by-Step Example</h3>  <p>All examples are available in plain text in this file: <a href="http://blogs.oracle.com/datawarehousing/omr.sql">omr.sql</a>.</p>  <p>To illustrate how parallelism and Pipelined Table Functions can be used to write a Map-Reduce algorithm inside the Oracle database, we describe how to implement the canonical map-reduce example: a word count. For those unfamiliar with the example, the goal of word count is to return all distinct words within a set of documents as well as a count of how often each word occurs within this set of documents.</p>  <p>The procedural code in this word count example is implemented in PL/SQL but, as said before, Oracle allows you to pick your language of choice to implement said procedural logic.</p>  <h4>Step 1 - Setting up the Environment</h4>  <p>We will be looking at a set of documents. These documents can be either files outside of the database, or they can be stored as SecureFiles/CLOB columns in a table within the database. In the latter case the table holds our documents, effectively reflecting a file system. </p>  <p>In this case we are going to create a table within the database using the following definition:</p>  <blockquote>   <p><font face="Courier New" size="2">CREATE TABLE documents </font><font face="Courier New" size="2">(a CLOB)       <br /></font><font face="Courier New" size="2">&#160; LOB(a) STORE AS SECUREFILE(TABLESPACE sysaux);</font></p> </blockquote>  <p>Each row in this table corresponds to a single document. 
We populate this table with a very simple corpus resulting in 3 documents with the text shown here:</p>  <blockquote>   <p><font face="Courier New" size="2">INSERT INTO documents VALUES ('abc def');       <br />INSERT INTO documents VALUES ('def ghi');        <br />INSERT INTO documents VALUES ('ghi jkl');        <br />commit;</font></p> </blockquote>  <p>The end result of both the map function and the reduce table function are going to live in a package, keeping the code nice and tidy. To show the steps to be taken we will take snippets from the overall package and show those in the section to follow. The actual package will contain a set of types, which are required for the code to work. All code was tested on Oracle Database 11g (11.1.0.6). </p>  <p>Download the full code here.</p>  <p>The following figures show the package being deployed.</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/CreatePackageHeader.jpg"><img title="CreatePackageHeader" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="225" alt="CreatePackageHeader" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/CreatePackageHeader_thumb.jpg" width="426" border="0" /></a> </p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/CreatePackageBody.jpg"><img title="CreatePackageBody" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="236" alt="CreatePackageBody" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/CreatePackageBody_thumb.jpg" width="431" border="0" /></a> </p>  <h4>Step 2 - Creating the Mapper and the Reducer</h4>  <p>First we need to create a generic function to "map" (as in map-reduce) or tokenize a document. 
Note that the goal is not to show the best map function, but how this will work in principle in the database. This specific map function is very basic and better implementations may be found elsewhere.&#160; </p>  <p>You can use only the mapper and let the SQL aggregation engine produce the results. A query and a result would look like this:</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/FirstSelect.jpg"><img title="FirstSelect" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="174" alt="FirstSelect" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/FirstSelect_thumb.jpg" width="435" border="0" /></a> </p>  <p>The aggregation is done in SQL, no reducer required.</p>  <p>Of course, you could write your own aggregation Table Function to count the occurrences of words in a document. That is what you would do if you were writing the map-reduce program without leveraging the Oracle aggregation engine as we did before. This aggregation Table Function is the reducer of the map-reduce program unit.</p>  <p>The Table Function specifies that its input be partitioned by word and could (to use the Oracle execution engine's sort) ask for the data to be ordered or clustered by word. We show a sample count program in this post to complete the example. </p>  <h4>Step 3 - In-Database Map-Reduce</h4>  <p>After you have completed both the mapper and the reducer you are ready to do the full map-reduce in the database. Running a query using this Table Function will give us a parallel workload on external documents, doing what the typical map-reduce programs do. 
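</p>  <p>For readers who cannot open the screenshots, here is a minimal sketch of what such a package and the final query could look like. It is not the code from omr.sql: the package name <code>wordcount_sketch</code>, the type names and the tokenizing logic are my own assumptions for illustration, and the sketch has not been tuned or hardened (for example, it assumes each word fits in 4000 bytes).</p>  <blockquote><pre>
-- Hypothetical package; all names below are illustrative.
CREATE OR REPLACE PACKAGE wordcount_sketch AS
  -- Row, collection and cursor types for the mapper output
  TYPE word_t     IS RECORD (word VARCHAR2(4000));
  TYPE words_t    IS TABLE OF word_t;
  TYPE word_cur_t IS REF CURSOR RETURN word_t;
  -- Row and collection types for the reducer output
  TYPE wordcnt_t  IS RECORD (word VARCHAR2(4000), cnt NUMBER);
  TYPE wordcnts_t IS TABLE OF wordcnt_t;

  -- Mapper: any parallel server may process any document
  FUNCTION mapper(doc IN SYS_REFCURSOR, sep IN VARCHAR2)
    RETURN words_t
    PIPELINED PARALLEL_ENABLE (PARTITION doc BY ANY);

  -- Reducer: rows are repartitioned and clustered by word
  FUNCTION reducer(in_cur IN word_cur_t)
    RETURN wordcnts_t
    PIPELINED PARALLEL_ENABLE (PARTITION in_cur BY HASH (word))
    CLUSTER in_cur BY (word);
END wordcount_sketch;
/

CREATE OR REPLACE PACKAGE BODY wordcount_sketch AS

  FUNCTION mapper(doc IN SYS_REFCURSOR, sep IN VARCHAR2)
    RETURN words_t
    PIPELINED PARALLEL_ENABLE (PARTITION doc BY ANY)
  IS
    document CLOB;
    istart   PLS_INTEGER;
    pos      PLS_INTEGER;
    out_rec  word_t;
  BEGIN
    LOOP
      FETCH doc INTO document;
      EXIT WHEN doc%NOTFOUND;
      istart := 1;
      LOOP
        -- Position of the next separator, 0 if there is none left
        pos := NVL(INSTR(document, sep, istart), 0);
        IF pos = 0 THEN
          out_rec.word := SUBSTR(document, istart);
        ELSE
          out_rec.word := SUBSTR(document, istart, pos - istart);
        END IF;
        -- Skip empty tokens caused by repeated separators
        IF out_rec.word IS NOT NULL THEN
          PIPE ROW (out_rec);
        END IF;
        EXIT WHEN pos = 0;
        istart := pos + LENGTH(sep);
      END LOOP;
    END LOOP;
    RETURN;
  END mapper;

  FUNCTION reducer(in_cur IN word_cur_t)
    RETURN wordcnts_t
    PIPELINED PARALLEL_ENABLE (PARTITION in_cur BY HASH (word))
    CLUSTER in_cur BY (word)
  IS
    in_rec  word_t;
    out_rec wordcnt_t;
  BEGIN
    LOOP
      FETCH in_cur INTO in_rec;
      EXIT WHEN in_cur%NOTFOUND;
      IF out_rec.cnt IS NULL THEN
        -- First row seen by this parallel server
        out_rec.word := in_rec.word;
        out_rec.cnt  := 1;
      ELSIF in_rec.word = out_rec.word THEN
        out_rec.cnt := out_rec.cnt + 1;
      ELSE
        -- Clustering guarantees the previous word is complete
        PIPE ROW (out_rec);
        out_rec.word := in_rec.word;
        out_rec.cnt  := 1;
      END IF;
    END LOOP;
    IF out_rec.cnt IS NOT NULL THEN
      PIPE ROW (out_rec);
    END IF;
    RETURN;
  END reducer;
END wordcount_sketch;
/

-- Full map-reduce: the mapper feeds the reducer inside one query
SELECT *
FROM   TABLE(wordcount_sketch.reducer(
         CURSOR(SELECT *
                FROM   TABLE(wordcount_sketch.mapper(
                         CURSOR(SELECT a FROM documents), ' ')))));
</pre></blockquote>  <p>Run against the three sample documents from Step 1, a query like this should return one row per distinct word together with its count.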
</p>  <p><a href="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/SecondSelect.jpg"><img title="SecondSelect" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="177" alt="SecondSelect" src="http://blogs.oracle.com/datawarehousing/WindowsLiveWriter/InDatabaseMapReduce_8E7B/SecondSelect_thumb.jpg" width="439" border="0" /></a> </p>  <h3>Summary</h3>  <p>Oracle Table Functions are a proven technology, used by many internal and external parties to extend Oracle Database 11g. </p>  <p>Oracle Table Functions are a robust scalable way to implement Map-Reduce within the Oracle database and leverage the scalability of the Oracle Parallel Execution framework. Using this in combination with SQL provides an efficient and simple mechanism for database developers to develop Map-Reduce functionality within the environment they understand and with the languages they know.</p>  <p>Download the code here: <a href="http://blogs.oracle.com/datawarehousing/omr.sql">omr.sql</a>. For the example, I ran this in OE (as you can see on the SQL screens). No special privileges required. </p>]]>
      
   </content>
</entry>

<entry>
   <title>OLTP And Data Warehouse? How Does That Work? </title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/09/oltp_and_data_warehouse_how_do.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14621</id>
   
   <published>2009-09-28T14:06:11Z</published>
   <updated>2009-09-28T14:10:53Z</updated>
   
   <summary>I recently came across a posting on another website that was questioning some of the messaging associated with our recent launch of the Sun Oracle Database Machine. The implication being made was that an appliance designed to support data warehousing...</summary>
   <author>
      <name>keith.laker</name>
      
   </author>
   
      <category term="Opinion" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="activedatawarehousing" label="Active Data Warehousing" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sunoracledatabasemachine" label="Sun Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>I recently came across a posting on another website that was questioning some of the messaging associated with our recent launch of the Sun Oracle Database Machine. The implication being made was that an appliance designed to support data warehousing could not support OLTP operations and an appliance designed to support OLTP operations could not be expected to support data warehousing. </p>

<p>Are these two types of operations mutually exclusive? Is it actually possible to deliver an appliance that truly does provide the best of both worlds? More importantly why would you need such an appliance? I would argue that we are rapidly reaching a point where there is very little difference between the types of features and operations needed to support OLTP compared to data warehousing.</p>

<p>Many of today's data warehouses are looking more and more like an OLTP application. This is what is often referred to as "active data warehousing". Active data warehousing is completely different to traditional data warehousing (as supported by the current data warehouse appliance vendors) and is based around three types of operations that overlap and intersect:</p>

<ul>
	<li>Data loading in near-real time using trickle feed ETL jobs to update small amounts of data</li>
	<li>Operational style queries requesting very small amounts of data to support transactional operations that run around the clock</li>
	<li>Analytical style queries, executed continuously to plan next year's budgets, uncover market trends or predict future growth patterns</li>
</ul>

<p>To support these types of "active" operations a data warehouse appliance must provide the following:</p>

<p><strong>1) Strong Transactional Heritage</strong><br />
The traditional data warehouse is becoming a thing of the past. Very few customers are looking to build a data warehouse based on single batch updates where the database is taken down overnight, or over the weekend, to load new data.  An active data warehouse is, as the name suggests, active. It is not supposed to be offline. By linking into operational systems the data warehouse becomes a 24*7 extension. In reality an active data warehouse cannot be unavailable. It is also wrong to assume that trickle feeds just deliver small amounts of data at regular intervals. For many customers, such as those in retail, financial organizations or telecommunications, this is not the case. The trickle feeds deliver terabytes of data that need to be loaded as quickly as possible so business users can extract competitive advantages as quickly as possible.</p>

<p>Active data warehousing is often seen as "the revenge of OLTP" systems because of the need to combine a strong robust transactional model with data warehouse features within a single database engine. Therefore, this latest announcement of a Sun Oracle Database Machine that supports both OLTP and data warehousing means it provides the perfect platform for delivering active data warehousing.</p>

<p>The traditional data warehouse appliance vendors have no background in transactional processing. Their databases lack a sophisticated locking model so queries struggle to deliver accurate and consistent results during data loads. Typically these appliance databases fail to support active data warehouse style operations because they can only support a single view of the data. Consequently when data is being loaded all readers are locked out of the system until the write operation(s) completes. </p>

<p>If read operations are running when the data load process starts the data load is locked out until the queries all complete. This simplistic model causes severe performance problems for customers looking for the benefits of moving to a truly active data warehouse.</p>

<p>In contrast Oracle has a long and distinguished history of both transactional processing and data warehousing. The world-class robust locking model within the Oracle Database allows both readers and writers to co-exist without blocking each other.  Oracle automatically provides read consistency to a query so that all the data that the query sees comes from a single point in time. Oracle can also provide read consistency to all of the queries in a transaction.  This means the Oracle Database is ideally suited to deliver a real active data warehouse. <br />
The leading analyst companies such as Gartner, Forrester and IDC recognize the Oracle Database as a market leader in both data warehousing and OLTP.</p>
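
<p>To make the two levels of read consistency a little more concrete, here is a small sketch. The <code>sales</code> table and its <code>amount</code> column are hypothetical and only serve as an illustration.</p>

<blockquote><pre>
-- Statement-level read consistency (the default): the query sees
-- the data as of the moment it started, even while loads are running.
SELECT SUM(amount) FROM sales;

-- Transaction-level read consistency: every query in the transaction
-- sees the data as of the moment the transaction began.
-- SET TRANSACTION must be the first statement of the transaction.
SET TRANSACTION READ ONLY;
SELECT COUNT(*)    FROM sales;
SELECT SUM(amount) FROM sales;   -- consistent with the COUNT above
COMMIT;                          -- ends the read-only transaction
</pre></blockquote>

<p>Rows that a load job commits in between the two queries of the read-only transaction are not visible to either of them, and neither query blocks the load.</p>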

<p><strong>2) Robust Security Model</strong><br />
Pushing data out to many different OLTP applications and business users means that the issue of data security becomes a critical focus area. There are many areas that have to be considered when attempting to secure a database. Security has become a multidimensional problem. Customers need to consider how best to:</p>

<ul>
	<li>Protect their network and media</li>
	<li>Develop a strong authentication framework</li>
	<li>Manage internal threats</li>
	<li>Deliver audit and configuration monitoring</li>
	<li>Create a central place to manage all users</li>
	<li>Encrypt and mask sensitive data</li>
</ul>

<p>In addition, many countries have passed laws that specify how data must be managed and who can access that data. Failure to meet the requirements of this legislation can result in damaging publicity as well as fines or even lengthy jail sentences.</p>

<p>Security within most data warehouse appliance systems is limited. The typical appliance database provides basic user authentication and authorization procedures so users who access the data can only see information to which they have been granted access. In some cases appliance vendors will offer some form of data encryption, which can be applied to a specific column of data. However, the encryption process is not transparent and usually requires changes to both ETL/data loading processes and the query access processes.</p>

<p>The auditing and logging of user activity is invariably limited to basic information such as details of logons and logoffs, number of failed logins, non-typical access periods and non-typical logon sources.</p>

<p>In today's security conscious world these security features are completely inadequate. In fact, most of the appliance vendors have now woken up to this fact and are slowly linking up with third party vendors in an attempt to bring 21st Century rigor to their security features. The problem for customers is that they have to deal with another software vendor and additional licenses, and have to broker solutions to support issues where the boundaries of responsibility are unclear.</p>

<p>In contrast Oracle is able to fully secure your data from end to end and across the whole data lifecycle. For over 30 years Oracle has been developing market-leading solutions to help customers secure all information within their data warehouses and OLTP systems. Oracle data security solutions help address numerous government and industry regulations including SOX, PCI, HIPAA, FISMA, JSOX and the European Union Data Privacy Directive. </p>

<p>To help customers secure and defend their data warehouse Oracle has developed its 'Defence-in-Depth Guide' along with its 'Maximum Security Architecture'. This is a series of solutions to help customers address today's top security concerns. The Oracle Maximum Security Architecture provides an easy to follow overview of Oracle's defence-in-depth approach to data security.</p>

<p>Oracle's market leading security and data encryption features are designed to work transparently, minimizing any impact on existing applications while addressing mandatory requirements found in many regulations.</p>
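
<p>As one illustration of what "transparent" means here, the sketch below uses column-level Transparent Data Encryption. The table and column names are hypothetical, and the sketch assumes the DBA has already created and opened an encryption wallet holding the master key.</p>

<blockquote><pre>
-- New (hypothetical) table with an encrypted column.
CREATE TABLE customer_cards (
  customer_id NUMBER PRIMARY KEY,
  card_number VARCHAR2(19) ENCRYPT USING 'AES256'
);

-- Encrypt a column of an existing (hypothetical) table in place.
ALTER TABLE customers MODIFY (national_id ENCRYPT);

-- Queries and ETL keep using ordinary SQL; encryption and decryption
-- happen inside the database.
SELECT card_number
FROM   customer_cards
WHERE  customer_id = 42;
</pre></blockquote>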

<p>Overall, Oracle provides everything you need to secure the data within both your data warehouse and your OLTP applications. There is no need to contact third party vendors, no need to change your database schemas, no need to change your existing business intelligence tools. Oracle Database security features are designed to be transparent to make customers' lives easier, and all Oracle security features work transparently with the Sun Oracle Database Machine.</p>

<p><strong>3) Strong Resource Management</strong><br />
The other implication made on the website mentioned earlier was that you would have to be an idiot to put your operational systems on the same appliance as your data warehouse. But why would the appliance vendors want to stop customers from running OLTP applications alongside their data warehouse? Apart from the lack of strong transactional features, the appliance vendors typically use a shared nothing architecture, which is optimized to support pre-defined batch queries that scan large volumes of data. These optimizations do not suit the types of queries that dominate active data warehouse and OLTP operations. Operational queries request small amounts of data but the shared nothing architecture has to run these queries across all nodes within the cluster, tying up resources that really are not needed. It does not take many of these operational queries to flood an appliance system and the DBA has little, if any, control over the amount of resource acquired by each query. </p>

<p>The Oracle DBA has complete control over how system resources are used because Oracle's 'shared everything' architecture provides the ability to dynamically optimize each query. Before the query is executed the current workload is examined and the amount of processing power, i.e. degree of parallelism, allocated to the query is adjusted rather than simply allocating the same amount of processing resource every time (as happens with the other appliance vendors).  The degree of parallelism is, therefore, optimized for each query and there is no requirement for a minimal degree of parallelism across all nodes. Operations can run in parallel using one, some or all nodes depending on the current workload, the characteristics and importance of the query. </p>
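
<p>One way a DBA can express this kind of control is a Database Resource Manager plan. The sketch below is only an illustration: the plan name, the consumer group names, the percentages and the parallel degree limits are all made up, and the mapping of users or services to the consumer groups is left out.</p>

<blockquote><pre>
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'OLTP_GROUP',
    comment        => 'Short operational queries');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'DW_GROUP',
    comment        => 'Long-running analytical queries');

  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'MIXED_WORKLOAD_PLAN',
    comment => 'OLTP and data warehouse work on one system');

  -- Operational work gets the largest CPU share and runs serially.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'MIXED_WORKLOAD_PLAN',
    group_or_subplan         => 'OLTP_GROUP',
    comment                  => 'Favor operational queries',
    mgmt_p1                  => 60,
    parallel_degree_limit_p1 => 1);

  -- Analytical work may run in parallel, up to a capped degree.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'MIXED_WORKLOAD_PLAN',
    group_or_subplan         => 'DW_GROUP',
    comment                  => 'Cap parallelism for analytics',
    mgmt_p1                  => 30,
    parallel_degree_limit_p1 => 16);

  -- Every plan needs a directive for OTHER_GROUPS.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'MIXED_WORKLOAD_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'Everything else',
    mgmt_p1          => 10);

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

-- Activate the plan.
ALTER SYSTEM SET resource_manager_plan = 'MIXED_WORKLOAD_PLAN';
</pre></blockquote>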

<p>Over the years the Oracle Database has been extended to provide fully automated performance optimization features. These query optimization features can learn over time which query plans are optimal and then lock subsequent queries into using those plans to ensure consistent performance.</p>
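
<p>One feature that matches this description is SQL Plan Management in Oracle Database 11g. Treat that mapping as my assumption for this sketch rather than an exhaustive list of the features meant above.</p>

<blockquote><pre>
-- Let the database record plan baselines for repeatable statements.
ALTER SYSTEM SET optimizer_capture_sql_plan_baselines = TRUE;

-- Make the optimizer choose only among accepted baseline plans.
ALTER SYSTEM SET optimizer_use_sql_plan_baselines = TRUE;

-- Inspect which plans have been captured and accepted so far.
SELECT sql_handle, plan_name, enabled, accepted, fixed
FROM   dba_sql_plan_baselines;
</pre></blockquote>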

<p>It is this ability to effectively manage the resources within the Sun Oracle Database Machine that allows Oracle to run both OLTP and data warehouse operations on the same platform. Oracle has a world-class database with strong transactional and data warehouse features, it has world-class resource management features and now with Sun Oracle Database Machine it has a world-class high performance platform.</p>

<p><strong>4) Deliver High Availability</strong><br />
Customers cannot afford to have their active data warehouse or OLTP applications inactive. Without access to the operational systems and/or the data warehouse, revenue and customers can be lost, penalties incurred and bad press generated, all of which can have a lasting effect on both customers and the company's reputation. Building a robust, high availability IT infrastructure to support both a data warehouse and OLTP operations is critical to the success and well being of all enterprises in today's fast moving economy. The data warehouse, like all OLTP systems, is now a mission-critical system. </p>

<p>Building a robust and resilient platform based on a typical data warehouse appliance is very difficult because of its shared nothing architecture. This means data is dedicated to a specific node and if a node fails then the data associated with that node also becomes unavailable, which by default implies that the whole system is unavailable. </p>

<p>To resolve this architectural challenge appliance vendors offer a number of different methods for delivering a resilient enterprise data warehouse, each governed by the size of the customer's budget. </p>

<p>In some cases the use of features to deliver a robust and resilient platform can affect performance. For example when using a 'fallback' configuration to protect data, write performance is reduced as data has to be written to both the primary and fallback locations to ensure consistency. This has a knock-on effect in terms of query performance because user requests can be blocked while a write operation is being performed. If that write operation takes longer because of the need to support a fallback copy of the data, business users will be locked out of the system for longer. </p>

<p>Oracle databases operate in highly resilient and recoverable configurations and run thousands of mission-critical systems around the world. Oracle's database is designed to support mission critical environments from the ground up. One of the key parts in providing a highly available solution is Oracle Real Application Clusters (RAC). This is the premier database clustering technology that allows two or more computers (also referred to as "nodes") in a cluster to concurrently access a single shared database. This effectively creates a single database system that spans multiple hardware systems yet appears to the application as a single unified database. </p>

<p>Oracle Automatic Storage Management (ASM) provides protection at the storage layer. With performance and high availability as a primary objective, ASM builds on the principle of stripe and mirror everything. Intelligent mirroring capabilities allow administrators to define two or three way mirrors for the ultimate protection of critical business data.</p>
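
<p>As a rough illustration of how the mirroring level is chosen, the statement below creates a disk group with two-way mirroring. It would be issued in the ASM instance. The disk group name, failure group names and disk paths are hypothetical.</p>

<blockquote><pre>
-- NORMAL REDUNDANCY gives two-way mirroring: each extent is mirrored
-- in a different failure group.
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP controller1 DISK '/devices/diska1', '/devices/diska2'
  FAILGROUP controller2 DISK '/devices/diskb1', '/devices/diskb2';
</pre></blockquote>

<p>Using HIGH REDUNDANCY instead gives three-way mirroring and needs at least three failure groups.</p>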

<p>Furthermore, Oracle has an extremely mature set of fully integrated and resilient solutions that enable customers to deliver highly available data warehousing. Oracle has developed "Maximum Availability Architecture"  (MAA) to provide superior data protection and availability by minimizing or eliminating planned and unplanned downtime at all technology stack layers including hardware or software components.</p>

<p><strong>OLTP And Data Warehouse? How Does That Work?<br />
</strong>For many appliance vendors it is just not an option to run an OLTP application on their servers because of the inherent limitations built in to their database and architecture. Therefore, to go back to the original question: "OLTP And Data Warehouse? How Does That Work?" For Oracle customers the answer is: very nicely thank you!</p>]]>
      
   </content>
</entry>

<entry>
   <title>In-Memory Parallel Execution in Oracle Database 11gR2</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/09/in-memory_parallel_execution_i.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14604</id>
   
   <published>2009-09-28T01:23:05Z</published>
   <updated>2009-09-28T01:42:56Z</updated>
   
   <summary>As promised, the next entry in our 11gR2 explorations is In-Memory Parallel Execution. If you are going to Oracle OpenWorld next month make sure you check out the following session: Tuesday, October 13 2009 5:30PM, Moscone South Room 308 Session...</summary>
   <author>
      <name>maria.colgan</name>
      
   </author>
   
      <category term="Best Practices" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="OpenWorld 2009" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="datawarehouse" label="Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="inmemoryparallelexecution" label="In-Memory Parallel Execution" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="oracledatabase11g" label="Oracle Database 11g" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="parallelexecution" label="Parallel Execution" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="parallelquery" label="Parallel Query" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="performance" label="Performance" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>As promised, the next entry in our 11gR2 explorations is In-Memory Parallel Execution. If you are going to Oracle OpenWorld next month make sure you check out the following session: </p>

<p>Tuesday, October 13 2009 5:30PM, Moscone South Room 308<br />
Session <a href="http://www28.cplan.com/cc221_new/session_details.jsp?isid=311420&ilocation_id=221-1&ilanguage=english">S311420</a> <br />
Extreme Performance with Oracle Database 11g and In-Memory Parallel Execution.</p>

<p>In this session you will get more details and insight from the folks who actually built this functionality! A must see if this is of any interest, so book that ticket now and register!</p>

<p>Down to business, what is "In-Memory Parallel Execution"? </p>

<p>Let's begin by having a quick trip down memory-lane back to Oracle Database 7 when Parallel Execution (PX) was first introduced.  The goal of PX then and now is to reduce the time it takes to complete a complex SQL statement by using multiple processes to go after the necessary data instead of just one process. Up until now these parallel server processes typically bypassed the buffer cache and read the necessary data directly from disk. The main reasoning for this was that the objects accessed by PX were large and would not fit into the buffer cache. Any attempt made to read these large objects into the cache would have resulted in trashing the cache content. </p>

<p>However, as hardware systems have evolved, the memory capacity of a typical database server has become extremely large. Take for example the 2 CPU socket Sun server being used in the new Sun Oracle Database Machine. It has an impressive 72GB of memory, giving a full Database Machine (8 database nodes) over ½ a TB of memory. Suddenly using the buffer cache to hold large objects doesn't seem so impossible any more. </p>

<p>In-Memory Parallel Execution (In-Memory PX) takes advantage of these larger buffer caches but it also ensures we don't trash the cache. </p>

<p>In-Memory PX begins by determining if the working set (group of database blocks) necessary for a query fits into the aggregated buffer cache of the system. If the working set does not fit, the objects will be accessed via direct path IO just as they were before. If the working set does fit, the blocks will be distributed among the nodes and each block will be affinitized to, or associated with, a specific node. </p>
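
<p>To make that first check a bit more concrete, here is a minimal Python sketch of the "does it fit?" test. It only models the size comparison described above; the function name and the block counts are made up for illustration, and the real decision also weighs other factors.</p>

```python
def choose_access_path(working_set_blocks, cache_blocks_per_node):
    """Pick an access path by comparing the working set to the aggregated cache.

    Hypothetical helper: it models only the size check, not Oracle's full heuristics.
    """
    aggregated_cache_blocks = sum(cache_blocks_per_node)
    if working_set_blocks <= aggregated_cache_blocks:
        return "buffer cache"   # distribute the blocks across the nodes
    return "direct path"        # read from disk, as in earlier releases


# Eight nodes with 100 cache blocks each (illustrative numbers only).
print(choose_access_path(600, [100] * 8))
print(choose_access_path(900, [100] * 8))
```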

<p>In previous releases, if the Parallel Execution of one statement read part of an object into the buffer cache, then subsequent SQL statements on other nodes in the cluster would access that data via Cache Fusion. This behavior could eventually result in a full copy of that table in every buffer cache in the cluster. In-Memory PX is notably different because Cache Fusion will not be used to copy the data from its original node to another node, even if a parallel SQL statement that requires this data is issued from another node. Instead Oracle uses the parallel server process on the same node (that the data resides on) to access the data and will return only the result to the node where the statement was issued.</p>

<p>The decision to use the aggregated buffer cache is based on an advanced set of heuristics that include the size of the object, the frequency at which the object changes and is accessed, and the size of the aggregated buffer cache. If the object meets these criteria it will be fragmented, or broken up into pieces, and each fragment will be mapped to a specific node. If the object is hash partitioned then each partition becomes a fragment; otherwise the mapping is based on the FileNumber and ExtentNumber. </p>
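
<p>The fragment-to-node mapping can be pictured with a small Python sketch. The round-robin assignment below is purely an assumption for illustration (the post does not describe the actual mapping function), and the partition names and file/extent numbers are invented. The point is simply that every fragment has exactly one owning node, and a request for that fragment is served by a parallel server process on that node.</p>

```python
def assign_fragments(fragment_ids, node_count):
    """Spread fragments across nodes round-robin (a stand-in for the real mapping)."""
    return {frag: i % node_count for i, frag in enumerate(fragment_ids)}


def node_for(fragment_id, fragment_map):
    """Return the node whose parallel server process should read this fragment."""
    return fragment_map[fragment_id]


# Hash-partitioned table: each partition is one fragment.
partition_map = assign_fragments(["P1", "P2", "P3", "P4"], node_count=2)

# Any other table: fragments identified by (file number, extent number).
extent_map = assign_fragments([(4, 0), (4, 1), (5, 0)], node_count=2)

print(partition_map)
print(node_for((5, 0), extent_map))
```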

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="InMemoryPX_final.jpg" src="http://blogs.oracle.com/datawarehousing/InMemoryPX_final.jpg" width="600" height="419" class="mt-image-none" style="" /></span></p>

<p>To leverage In-Memory PX you must set the initialization parameter PARALLEL_DEGREE_POLICY to AUTO (default MANUAL). Once this is set, the database controls which objects are eligible to be read into the buffer cache and which objects will reside there at any point in time. It is not possible to manually control the behavior for specific statements.</p>

<p>Obviously this post is more of a teaser. For in-depth discussions on this, go to OpenWorld and/or keep an eye out for a new white paper called Parallel Execution Fundamentals in Oracle Database 11gR2 that will be coming soon to <a href="http://www.oracle.com/technology/products/bi/db/11g/index.html">oracle.com</a>. This paper covers not only In-Memory PX but also Auto-DOP and parallel statement queuing. </p>

<p>Stay tuned for  more on 11gR2 coming soon...<br />
</p>]]>
      
   </content>
</entry>

<entry>
   <title>500GB/sec and Database Machine Generation 2</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/09/500gbsec_and_database_machine.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14499</id>
   
   <published>2009-09-21T17:41:32Z</published>
   <updated>2009-09-21T16:40:31Z</updated>
   
   <summary>Last Tuesday we announced the second generation of Database Machine. This second generation of Oracle Exadata is now running on Sun hardware. The premise for Database Machine is still the same: deliver extreme performance systems on commodity hardware with ease...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="datawarehouse" label="Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="databasemachine" label="Database Machine" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="exadata" label="Exadata" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>Last Tuesday we announced the second generation of Database Machine. This second generation of Oracle Exadata is now running on Sun hardware. The premise for Database Machine is still the same: deliver extreme performance systems on commodity hardware with ease of deployment. </p>

<p>The database machine is a prime example (as was the first generation) of software-enabled hardware. The software offers the real value; the hardware is off-the-shelf stuff, allowing a great price point and an easy way to quickly release a next generation system and get the benefits of faster chips and other components. The software allows the easy migration and the extreme benefits. </p>

<p>The Sun Oracle Database Machine comes with some new and very cool Exadata software features. It once again has InfiniBand (generation 2 delivers even higher throughput numbers), and it is now available in smaller configurations.</p>

<p><strong>So what is new here?</strong></p>

<p>For one, the addition of flash into the system is something very compelling and a leap forward in terms of performance and throughput. And yes, that is where the 500GB/sec comes from...</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="cache_hierarchy.JPG" src="http://blogs.oracle.com/datawarehousing/cache_hierarchy.JPG" width="460" height="280" class="mt-image-none" style="" /></span></p>

<p>Effectively what we did in generation 2 is add a very fast cache into the storage tier of the system, and by doing this we created a hierarchy as shown above. The fastest tier is the actual memory in the database nodes, which we increased on the machine. The bottom part of the hierarchy is the disk; here we increased the throughput for a whole rack to 21GB/sec. By adding flash cards (not flash drives!) to the storage tier we can leverage this as cache and get the benefits from a scale out strategy. As we scale out the storage, we scale out the flash and the throughput. </p>

<p>The Exadata cache is a smart cache that we carefully manage. If you deem it necessary you can pin objects into the cache as well. Since the Exadata Storage Server actually understands the structure of the data stored, the cache does so too. It is after all managed by the Exadata software. This means that we do not use a regular LRU (Least Recently Used) algorithm, but determine which data is hot and cache these sets when we deem it better to do so.</p>

<p>One distinct difference with the flash you see in traditional storage arrays is that we are not using flash disks in Exadata. We are using PCIe cards. This means we are not constrained by slow disk controllers and can get these massive throughput numbers of 50GB/sec for a full rack database machine.</p>

<p>On top of this, we are introducing Hybrid Columnar Compression with Exadata generation 2. We talked about this already in a previous post around the 11g Release 2 database new features. </p>

<p>In the data warehousing workload (assuming bulk loads for example and lots of querying) we can achieve a 10x compression of the data with almost no impact on query performance. That compression rate allows us to achieve up to 500GB/sec of scan rates from the flash cards.</p>

<p>To put that into perspective, in generation 1 of the Database Machine we achieved up to 14GB/sec of throughput from the disks (in a full rack). In generation 2 we are up to 21 GB/sec, both numbers are uncompressed. Flash gets us to around 50GB/sec. The truly staggering numbers come with that 10x Hybrid Columnar Compression rate... For anyone who has ever run queries on a system, 500 GB/sec is really, really fast!</p>
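
<p>The arithmetic behind the headline number is simple enough to write down. This little Python snippet just restates the figures quoted above (50GB/sec from flash, 10x compression); the variable names are ours, and real-world compression ratios will of course vary with the data.</p>

```python
flash_scan_gb_per_sec = 50   # physical scan rate from flash, full rack
compression_ratio = 10       # Hybrid Columnar Compression rate quoted above

# Every physical GB read from flash holds about 10 GB of uncompressed data,
# so the effective scan rate is the physical rate times the compression ratio.
effective_scan_gb_per_sec = flash_scan_gb_per_sec * compression_ratio
print(effective_scan_gb_per_sec)
```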

<p><strong>Storage Indexes</strong></p>

<p>That is not all though. Generation 2 of Exadata also introduces Storage Indexes. A storage index is something more akin to a range partition, but we evaluate this at the storage layer. Sometimes this is referred to as a negative index.</p>

<p>What happens is that for each commonly queried column we transparently store the min and max values of that column. We do this per region of a certain data size, i.e. as soon as we finish writing the data and filling up that predefined size we calculate the min and max for the relevant columns. The result is something like this:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="storage_index_schematic.JPG" src="http://blogs.oracle.com/datawarehousing/storage_index_schematic.JPG" width="225" height="275" class="mt-image-none" style="" /></span></p>

<p>If the user now issues a query asking something like SELECT * FROM TABLE WHERE B &lt; 2 the scans will only look at the first set of rows in the picture above. Since the minimum value in the second block is 3, no rows matching the query will be in that set of rows. This allows a Storage Index to give us transparent data elimination without overhead, making the scans more selective and therefore faster.</p>
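
<p>For those who prefer code over pictures, here is a rough Python sketch of the min/max pruning idea. It is a toy model under our own assumptions, not the Exadata implementation: the region contents, the column names and the helper functions are invented, apart from the second region having a minimum of 3 for column B.</p>

```python
def build_storage_index(regions, column):
    """Record (min, max) of one column for each storage region."""
    return [
        (min(row[column] for row in rows), max(row[column] for row in rows))
        for rows in regions
    ]


def scan_less_than(regions, index, column, bound):
    """Evaluate `column < bound`, skipping regions the index rules out."""
    hits, regions_read = [], 0
    for rows, (lo, hi) in zip(regions, index):
        if lo >= bound:
            continue  # smallest value is already too big: skip the whole region
        regions_read += 1
        hits.extend(row for row in rows if row[column] < bound)
    return hits, regions_read


regions = [
    [{"A": 1, "B": 1}, {"A": 2, "B": 3}, {"A": 3, "B": 5}],  # B ranges 1..5
    [{"A": 4, "B": 3}, {"A": 5, "B": 6}, {"A": 6, "B": 8}],  # B ranges 3..8
]
index_b = build_storage_index(regions, "B")
hits, regions_read = scan_less_than(regions, index_b, "B", 2)
print(hits, regions_read)
```

<p>Only the first region is read for B &lt; 2; the second is eliminated purely from its stored minimum, without touching its rows.</p>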

<p>So as you can see, the whole system is faster on all counts than the already fast generation 1 system. It is also much faster than anything else out there in the market.</p>

<p>Seeing that there is much more news, like the actual family details (half racks, quarter racks and smaller), the offloading of data mining scoring and all the 11g Release 2 details we haven't yet covered, expect quite a few follow-up posts on both 11gR2 and Generation 2 Exadata.</p>

<p>Next up, as promised earlier: 11gR2 In-Memory Parallel Execution.<br />
</p>]]>
      
   </content>
</entry>

<entry>
   <title>Introducing the next generation Database Machine</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/09/introducing_the_next_generatio.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14436</id>
   
   <published>2009-09-16T18:00:50Z</published>
   <updated>2009-09-16T16:53:57Z</updated>
   
   <summary>Yesterday, in a live webcast Larry Ellison introduced the next generation Database Machine. Before we dive into technical details, have a look at the announcement and the information out on the web: http://www.oracle.com/us/corporate/press/033684 (click here) http://www.oracle.com/database/database-machine.html (click here) Most importantly,...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="News" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="exadata" label="Exadata" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="flash" label="Flash" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="performance" label="Performance" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sunoracledatabasemachine" label="Sun Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>Yesterday, in a live webcast Larry Ellison introduced the next generation Database Machine. Before we dive into technical details, have a look at the announcement and the information out on the web:</p>

<p>http://www.oracle.com/us/corporate/press/033684 (click <a href="http://www.oracle.com/us/corporate/press/033684">here</a>)<br />
http://www.oracle.com/database/database-machine.html (click <a href="http://www.oracle.com/database/database-machine.html">here</a>)</p>

<p>Most importantly, have a look at the following page which has some excellent overview and introductory papers:</p>

<p>http://www.oracle.com/technology/products/bi/db/exadata/index.html (click <a href="http://www.oracle.com/technology/products/bi/db/exadata/index.html">here</a>)</p>

<p>Now, there are a number of very important implications of this next generation database machine. The most important one being Flash Cards! Imagine 500GB/sec of throughput from your system... </p>]]>
      
   </content>
</entry>

<entry>
   <title>Announcing something very cool...</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/09/announcing_something_very_cool.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14408</id>
   
   <published>2009-09-15T00:10:13Z</published>
   <updated>2009-09-15T00:14:13Z</updated>
   
   <summary>What are we announcing, well... you just have a look over here: http://www.oracle.com/features/larry-ellison-webcast.html Larry Ellison will be introducing a very cool new offering into the market. The webcast is tomorrow (Tuesday) September 15 at 1pm Pacific Time. Tune in, or...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="datawarehouse" label="Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="databasemachine" label="Database Machine" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="exadata" label="Exadata" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sun" label="Sun" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>What are we announcing, well... you just have a look over here:</p>

<p>http://www.oracle.com/features/larry-ellison-webcast.html</p>

<p>Larry Ellison will be introducing a very cool new offering into the market. The webcast is tomorrow (Tuesday) September 15 at 1pm Pacific Time.</p>

<p>Tune in, or come back after the session and read all about it here...</p>]]>
      
   </content>
</entry>

<entry>
   <title>11gR2 - What is in it for me?</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/09/11gr2_-_what_is_in_it_for_me.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14230</id>
   
   <published>2009-09-07T12:31:28Z</published>
   <updated>2009-09-07T12:53:00Z</updated>
   
   <summary>By now you probably heard the news of the Oracle Database 11g Release 2. It is available and comes with a number of very interesting features for data warehousing. Obviously now you want to try this, so here is the...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="Oracle Database Machine" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="compression" label="Compression" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="database11gr2" label="Database 11gR2" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="parallelexecution" label="Parallel Execution" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sql" label="SQL" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="tpch" label="TPC-H" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>By now you have probably heard the news of the Oracle Database 11g Release 2. It is available and comes with a number of very interesting features for data warehousing.</p>

<p>Obviously now you want to try this, so <a href="http://www.oracle.com/technology/software/products/database/index.html">here is the download site</a> for the software (OTN License required of course). The data warehouse collateral can be found on OTN as well, so here is the <a href="http://www.oracle.com/technology/products/bi/db/11g/index.html">link for that</a>.</p>

<p>I would recommend everyone to first and foremost read <a href="http://www.oracle.com/technology/products/bi/db/11g/pdf/twp_bidw_overview_11gr2.pdf">the update</a> on what the Oracle Database offers for data warehousing.</p>

<p>As a guide to the paper just mentioned you may want to skim over some of the sections and look for In-Memory Parallel Execution, look for either Archive Compression or Warehouse Compression (both use Exadata Hybrid Columnar Compression) and take a quick peek at the Warehouse Builder section. Those sections will give you a quick glance at some of the new features. Now these are only some of the features that made it into the highest-level overview paper.</p>

<p>In this blog post and in the next set of postings (yes, this is a series of posts...) we will discuss a myriad of new features (including the above mentioned ones) in much more detail.</p>

<p>One of the features that has already hit the blogosphere is the Exadata Hybrid Columnar Compression. Kevin commented on a post (<a href="http://kevinclosson.wordpress.com/2009/09/01/oracle-switches-to-columnar-store-technology-with-oracle-database-11g-release-2/">here</a>).</p>

<p>As there seems to be some confusion as to what we really did here: we did introduce a columnar compression mechanism. It consists of 2 compression grades - if I may use that word here - one called data warehouse compression that allows great compression (10x) with excellent query performance, and an archiving compression that really focuses on the compression rate at the cost of performance. You can mix and match these compression forms by applying different schemes to different partitions.</p>

<p>Technically, what we did here is something along these lines...</p>

<p>Data is still organized in tables and rows, however the database now organizes this data in so-called compression units. The compression units hold a set of rows (let's say a couple of thousand to make it a bit more concrete) and the data within the compression units is organized in columns and then compressed. This column orientation within a compression unit allows for larger rates of compression as we have more of the same values close together.</p>
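
<p>A quick way to see why column orientation helps is to count runs of identical values. The Python sketch below uses simple run counting as a stand-in for a real compression algorithm (the post does not say which algorithms are used inside a compression unit), and the six sample rows are invented.</p>

```python
from itertools import groupby


def run_count(values):
    """Number of runs of identical adjacent values (fewer runs compress better)."""
    return sum(1 for _ in groupby(values))


# A tiny "compression unit" of six rows with two columns (department, city).
rows = [("10", "NY")] * 3 + [("20", "SF")] * 3

row_major = [value for row in rows for value in row]              # 10,NY,10,NY,...
column_major = [value for col in zip(*rows) for value in col]     # 10,10,10,20,...

print(run_count(row_major), run_count(column_major))
```

<p>Stored row by row, the values alternate and nothing repeats back to back (12 runs); stored column by column within the unit, the same data collapses into 4 runs.</p>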

<p>The hybrid columnar compression is something other (just to be clear) than Advanced Compression (often referred to as OLTP compression) and you do not need one to get the other to work.</p>

<p>OLTP compression is a method to compress data that sees very frequent inserts and updates, and it will keep the data compressed at a high ratio (3-4x) while you write, update and read. </p>

<p>Hybrid Columnar Compression for Data Warehousing is focused on data that gets delivered in bulk loads (direct writes) and gets queried a lot rather than updated a lot. This hybrid columnar compression for data warehousing will deliver significantly higher compression rates (10x and we have seen more) than OLTP compression. But it is aimed at that write-once, read-many workload.</p>

<p>As Kevin said, Hybrid is the key word in the compression feature new in 11gR2 (ehm, sorry, new in Exadata...). By using a hybrid solution we get great compression rates, excellent table scans and good random row access. The latter is the one that is always a problem in pure columnar storage.</p>

<p>One of the fun little features - to switch gears here - that is added to SQL is in analytics. We now have something called LISTAGG. LISTAGG allows you to flatten a result set into a single record. This is often used in 1:many constructions such as hierarchical data and bill of materials.</p>

<p>If we use good old EMP and DEPT, you get something like this:</p>

<p>SELECT deptno, <br />
       LISTAGG(ename, '; ') WITHIN GROUP (order by ename) <br />
FROM emp GROUP BY deptno;</p>

<p>DEPTNO    LISTAGG(ENAME,';')WITHINGROUP(ORDERBYENAME)<br />
--------------   -----------------------------------------------------------------------------------   <br />
  10            CLARK; KING; MILLER<br />
  20            ADAMS; FORD; JONES; SCOTT; SMITH<br />
  30            ALLEN; BLAKE; JAMES; MARTIN; TURNER; WARD</p>

<p>Next up, we'll look at the in-memory parallel execution. If you want to see how fast it is, just take a look at the TPC benchmark sites <a href="http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=cluster">here</a>... Find the 1000GB benchmark or click <a href="http://www.tpc.org/tpch/results/tpch_result_detail.asp?id=109060301">here</a>.<br />
</p>]]>
      
   </content>
</entry>

<entry>
   <title>Data Warehousing @ OpenWorld 2009 - How to get organized</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/09/data_warehousing_openworld_200.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.14155</id>
   
   <published>2009-09-02T05:47:11Z</published>
   <updated>2009-09-02T06:04:25Z</updated>
   
   <summary>The OpenWorld site now has a complete list of dates and times for all the data warehouse presentations scheduled for this year&apos;s conference. You can see the complete schedule here. To help our data warehouse customers, partners and Oracle employees...</summary>
   <author>
      <name>keith.laker</name>
      
   </author>
   
      <category term="OpenWorld 2009" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="openworld" label="OpenWorld" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>The OpenWorld site now has a complete list of dates and times for all the data warehouse presentations scheduled for this year's conference. You can see the complete schedule <a href="http://www28.cplan.com/cc221_new/newCatalog.jsp?ilc=221-1&ilg=english&isort_sessions=&isort_demos=&isort_exhibitors=&is=yes&isort_sessions_type=&isort_exhibitors_type=&isort_demos_type=&icriteria2=&search_sessions=yes&icriteria1=Database&icriteria8=&openTagSearch=&icriteria6=&icriteria3=+&icriteria4=+&icriteria7=&horizontal1=20">here</a>.</p>

<p>To help our data warehouse customers, partners and Oracle employees get started on planning which sessions they would like to attend we have created an online calendar for the data warehouse presentations. In addition to this we will be creating a small booklet that you will be able to download and give to customers. This will provide a simplified calendar of events and information about the data warehouse demo pods that will be part of the demo grounds. The booklet will be ready for next month's newsletter.</p>

<p>Depending on your calendar software, you can subscribe directly to this new calendar and refresh it on a regular basis to receive updates/changes to the schedules or alternatively download a stand-alone calendar file and simply add this as a new local/offline calendar using your local calendar software. Once you have subscribed to, or added, this calendar to your local calendar application you can then synchronize the events with your PDA, iPhone, iPod, mobile phone etc.  All the times for each presentation have been entered in Pacific Time so you will need to enable time zone support within your calendar and PDA/iPhone/iPod/mobile phone to see the events at the correct local Pacific Time when you arrive in San Francisco. At the moment the master calendar in iCal looks like this:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="Cal.png" src="http://blogs.oracle.com/datawarehousing/2009/09/01/Cal.png" width="301" height="268" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p>Here are some options for linking this master calendar to your own calendar software:</p>

<p><strong>1) If you are using Apple iCal</strong><br />
<em>Subscribing to a live calendar</em><br />
This is the best way to ensure your calendar is as up to date as possible. It is suitable for anyone who plans to synchronize this calendar with an iPhone, iPod, PDA or mobile phone calendar. You can subscribe to the "DW OpenWorld 2009" calendar using the standard features in Apple iCal. In the panel showing all your calendars simply click on a blank area and then select the option to "Subscribe". The URL for the calendar is:</p>

<p>webcal://ical.me.com/keith_laker/OpenWorld2009.ics</p>

<p>This will add a new calendar to your Subscriptions node. All the events listed in this calendar are set for the US Pacific time zone, which means that to view the events at the correct scheduled time you will need to change your time zone to US Pacific both in iCal (Preferences->Advanced->Turn On Time Zone Support) and in the Apple System Preferences under Date & Time. To see changes and updates to this calendar you will need to use the "refresh" option.</p>

<p><em>Using the Web-based Calendar Viewer</em><br />
This is the ideal solution if you plan to use wifi access during the conference since you do not need to download anything or synchronize this calendar with any other devices. <br />
You can view this calendar online using any web browser using the following URL:</p>

<blockquote>http://ical.mac.com/WebObjects/iCal.woa/wa/default?d=11&u=keith_laker&v=0&y=2009&m=9&n=OpenWorld2009.ics</blockquote>

<p>This view defaults to US Pacific time since it is controlled by my master calendar. As we make changes to add additional presentations or change the date/time of a presentation to reflect changes made by the OpenWorld team these will be automatically published to this calendar. </p>

<p><em>Adding a stand-alone/disconnected Calendar</em><br />
If you want to create a stand-alone calendar that is not linked to the master calendar there is an option to download the associated .ics file for this calendar using the web-based calendar view. Look for the "download" link on the left side of the web page. The .ics file can be manually added to Apple iCal or any other .ics compatible calendar software. Obviously, changes made to the master calendar will not be pushed through to the stand-alone/disconnected calendar but you can simply download an updated .ics file from the web calendar view as required.</p>

<p><strong>2) If you are using Google Calendar</strong><br />
For customers, partners and employees who are already using Google calendar the DW OpenWorld 2009 calendar can be added as a new URL which will then appear in the "Other Calendars" panel, or it can be added as a local calendar to the "My Calendars" panel.</p>

<p>To subscribe to this new calendar using your own Google calendar account, simply use the "Add by URL" option and set the public calendar address to: </p>

<blockquote>http://www.google.com/calendar/ical/fdfm46n29mkv2r0j0kktsro6ok%40group.calendar.google.com/public/basic.ics</blockquote>

<p>You can also use this Google .ics file to add a calendar to Sunbird/Lightning or any other calendar software that supports the .ics format.</p>

<p>Google also provides an XML feed for integration with other calendar applications and the XML data feed is here:</p>

<blockquote>http://www.google.com/calendar/feeds/fdfm46n29mkv2r0j0kktsro6ok%40group.calendar.google.com/public/basic</blockquote>

<p><strong>3) If you are using Mozilla Sunbird/Lightning</strong><br />
Lightning is a Thunderbird plug-in version of Sunbird. Both these products can be downloaded from the Mozilla home page:</p>

<ul>
	<li>Lightning for Thunderbird</li>
	<li>Sunbird</li>
</ul>
<p>Before you add the DW OpenWorld 2009 calendar to either Lightning or Sunbird I would recommend first setting the correct time zone in the application (for Sunbird use Tools->Options->TimeZone, for Lightning use Preferences->Lightning->Time zone) and then restarting the application to update this setting.</p>

<p>To subscribe to the DW OpenWorld 2009 calendar use the following steps:</p>
<ul>
	<li>File->New Calendar</li>
	<li>Select "On the Network"</li>
	<li>Select iCalendar (ICS)</li>
	<li>Set the location to one of the following:
	<ul>
		<li>If you want to link to the Apple iCal calendar use this URL:<br />
		webcal://ical.me.com/keith_laker/OpenWorld2009.ics</li>
		<li>If you want to link to the Google Calendar use this URL:<br />
		http://www.google.com/calendar/ical/fdfm46n29mkv2r0j0kktsro6ok%40group.calendar.google.com/public/basic.ics</li>
	</ul></li>
	<li>Set the name to something like "DW OpenWorld 2009"</li>
	<li>Set the color to RED</li>
</ul>

<p>This will create a linked calendar and any changes or updates made to the master calendar by the DW PM team will be shown when/if you reload the remote calendar. </p>

<p><em>Adding a stand-alone/disconnected Calendar</em><br />
If you want to create a stand-alone calendar that is not linked to the master calendar there is an option to download the associated .ics file for this calendar using the web-based calendar view (the URL is listed in the Apple iCal section above). Look for the "download" link on the left side of the web page. The .ics file can be manually added to Apple iCal. Obviously, changes made to the master calendar will not be pushed through to the stand-alone/disconnected calendar.</p>

<p><strong>4) Enabling Time zone support for iPhone/iPod Touch</strong><br />
Please note that if you want to synchronize the "DW OpenWorld 2009" calendar with your iPhone/iPod you need to make sure the feature to manage time zones is correctly set. There are two settings for controlling time zone support. In the settings screen, first select "Mail, Contacts, Calendars" and turn time zone support off for calendars. Secondly, under the General settings select "Date & Time" and set the time zone to either "automatic" if you are using an iPhone or manually select the required local time zone if you are using an iPod Touch. This should ensure that if you set the time zone to US Pacific all the OpenWorld events will be shown as starting at the correct local time.</p>

<p>I hope this is useful and if you have any questions just let me know.</p>]]>
      
   </content>
</entry>

<entry>
   <title>Partitioning or Backup Tables...?</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/08/solving_the_ilm_problem.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.13909</id>
   
   <published>2009-08-19T20:16:21Z</published>
   <updated>2009-08-19T18:44:17Z</updated>
   
   <summary>Got this comment on the blog from Satish. Satish is proposing (or forced) to create backup tables instead of using partitioning. I&apos;m not sure you want to go there. If you do, I think you will need to create views...</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="ilm" label="ILM" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="partitioning" label="Partitioning" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p><a href="http://blogs.oracle.com/datawarehousing/2009/08/a_not_so_fabulous_new_release_1.html#comment-145065">Got this comment on the blog from Satish</a>.  </p>

<p>Satish is proposing (or being forced) to create backup tables instead of using partitioning. I'm not sure you want to go there.</p>

<p>If you do, I think you will need to create views to show the data as a single entity, this means that you need to change the application to read from the views rather than from the original tables (or rename the tables first, then create views with the original names). You then want to compress the backup tables. </p>

<p>So you do a CTAS (CREATE TABLE AS SELECT) with a filter to create your backup tables from the original ones, add them to the views, and you are in business. You have also just rebuilt by hand what we do with partitions, which is how we would recommend implementing a database ILM strategy.</p>
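<p>To make the moving parts concrete, here is a minimal sketch of the do-it-yourself approach. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; Oracle syntax and behavior differ. The table and column names (sales, sale_year, sales_2008_backup) are made up for illustration, and the step that deletes the archived rows from the original table is an assumption about how you would finish the job.</p>

```python
import sqlite3

# In-memory SQLite database as a stand-in; this is not Oracle syntax or behavior.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical original table with rows from two years.
cur.execute("CREATE TABLE sales (sale_id INTEGER, sale_year INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 2008, 10.0), (2, 2008, 20.0), (3, 2009, 30.0)],
)

# Step 1: CTAS with a filter copies the old rows into a backup table.
cur.execute(
    "CREATE TABLE sales_2008_backup AS SELECT * FROM sales WHERE sale_year = 2008"
)

# Step 2 (assumed): remove the archived rows from the original table.
cur.execute("DELETE FROM sales WHERE sale_year = 2008")

# Step 3: rename the original table and create a view under the original name,
# so the application keeps reading from "sales" without code changes.
cur.execute("ALTER TABLE sales RENAME TO sales_current")
cur.execute(
    "CREATE VIEW sales AS "
    "SELECT * FROM sales_current UNION ALL SELECT * FROM sales_2008_backup"
)

# The view presents both tables as a single entity again.
rows = cur.execute(
    "SELECT sale_id, sale_year, amount FROM sales ORDER BY sale_id"
).fetchall()
print(rows)
```

<p>Every new backup table means repeating these steps and recreating the view, which is the yearly ETL process described next.</p>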

<p>In other words, you are creating an ETL process that runs once a year (or at whatever interval). You will lose out on transparency and on ease of maintenance, and you will end up with a more complex system.</p>

<p>If you use partitioning, you will also get the benefit of faster statistics gathering (in 11g, using incremental statistics and the new synopses they rely on) and potentially of faster data loading.</p>

<p>So the choice is either to make this transparent and get various other benefits, but pay some money, or to build it all yourself...</p>

<p>Look at these posts:<br />
http://blogs.oracle.com/datawarehousing/2009/04/compressing_individual_partiti.html<br />
http://blogs.oracle.com/datawarehousing/2009/08/partitioning_andorwith_exadata.html</p>

<p>Anyhow, one other resource to look at for questions like this is the OTN Forums.</p>]]>
      
   </content>
</entry>

<entry>
   <title>Partitioning and/or/with Exadata?</title>
   <link rel="alternate" type="text/html" href="http://blogs.oracle.com/datawarehousing/2009/08/partitioning_andorwith_exadata.html" />
   <id>tag:blogs.oracle.com,2009:/datawarehousing//1237.13899</id>
   
   <published>2009-08-18T22:47:22Z</published>
   <updated>2009-08-18T23:41:48Z</updated>
   
   <summary>Got a little busy there with all the comments on the Netezza posting, but now we&apos;re back into some of the outstanding topics. This post is the next one on discussing some of the ODTUG session questions (see this post)....</summary>
   <author>
      <name>jean-pierre.dijcks</name>
      <uri>http://www.oracle.com/technology/products/warehouse/index.html</uri>
   </author>
   
      <category term="Functionality" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="datawarehouse" label="Data Warehouse" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="exadata" label="Exadata" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="partitioning" label="Partitioning" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="sql" label="SQL" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://blogs.oracle.com/datawarehousing/">
      <![CDATA[<p>Got a little busy there with all the comments on the Netezza posting, but now we're back into some of the outstanding topics. This post is the next one on discussing some of the ODTUG session questions (<a href="http://blogs.oracle.com/datawarehousing/2009/06/best_practices_at_odtug.html">see this post</a>).</p>

<p>One of the questions was about the use of partitioning and whether it is made obsolete by Exadata off-loading...</p>

<p>In other words, should you look at one, the other or both? The answer is that you will want both, and there are a variety of reasons for that.</p>

<p>First of all, on the query side you will hopefully be using partitioning (often range partitioning) for partition elimination. From an I/O perspective that looks roughly like this*:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="Partition_pruning.JPG" src="http://blogs.oracle.com/datawarehousing/Partition_pruning.JPG" width="500" height="140" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p>In essence, partition pruning allows you to reduce a 5TB I/O operation to a much smaller I/O operation and therefore to return the information much faster. Compression is something that may or may not be used. In this example we are compressing the data and further reducing the I/O numbers.</p>
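<p>As a rough sketch of the idea (not of how the database implements it), the following Python snippet models a range-partitioned table and keeps only the partitions whose date range overlaps the query predicate. The partition names, quarterly boundaries and sizes are invented for illustration; they simply add up to the 5TB used above.</p>

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    name: str
    start: date   # inclusive lower bound of the range
    end: date     # exclusive upper bound of the range
    size_gb: int  # illustrative size only


# Hypothetical quarterly range partitions adding up to 5TB.
PARTITIONS = [
    Partition("sales_2009_q1", date(2009, 1, 1), date(2009, 4, 1), 1250),
    Partition("sales_2009_q2", date(2009, 4, 1), date(2009, 7, 1), 1250),
    Partition("sales_2009_q3", date(2009, 7, 1), date(2009, 10, 1), 1250),
    Partition("sales_2009_q4", date(2009, 10, 1), date(2010, 1, 1), 1250),
]


def prune(partitions, query_start, query_end):
    """Keep only partitions whose range overlaps [query_start, query_end)."""
    return [p for p in partitions if p.end > query_start and query_end > p.start]


# A query on August 2009 only needs the Q3 partition.
full_scan_gb = sum(p.size_gb for p in PARTITIONS)
needed = prune(PARTITIONS, date(2009, 8, 1), date(2009, 9, 1))
pruned_scan_gb = sum(p.size_gb for p in needed)
print(full_scan_gb, pruned_scan_gb, [p.name for p in needed])
```

<p>Compression would then shrink the size of each remaining partition further; that part is not modeled here.</p>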

<p>So far there is nothing new here. With Exadata, however, you will see a further reduction. After applying a smart scan, both the rows returned and the columns returned are further reduced (remember, Exadata is smart storage and actually knows which rows and columns should be returned!). This is on top of partitioning. You will get something like this:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="ExadatawithPartition_pruning.JPG" src="http://blogs.oracle.com/datawarehousing/ExadatawithPartition_pruning.JPG" width="370" height="105" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p>The conclusion from a query and I/O perspective is therefore that you will benefit from both.</p>]]>
      <![CDATA[<p>Obviously, in the above case partitioning and smart scans work in conjunction to create nice dramatic effects... which are real in many scenarios. Often, however, a query will benefit from only one of the two: either pruning or a smart scan. Not having one or the other will therefore diminish overall performance.</p>

<p>There is another aspect that should be considered as well: parallel processing. With the Oracle Database Machine we put a whole busload of cores onto your data, at both the storage tier and the compute tier. This inevitably means you want to do most of the work on large data sets in parallel. Hash partitioning plays a crucial role in parallel processing, as it lets you do parallel joins within a set of query slaves. If both sides of the join are hash partitioned on the same hash key, each slave can join its own pair of matching partitions without having to talk to the other slaves.</p>
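<p>The reason the slaves can work independently can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the customers and orders data is invented, a simple modulo stands in for the real hash function, and a thread pool plays the role of the query slaves. Because both sides are partitioned on the same key, all rows that can possibly match end up in the same numbered partition.</p>

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical number of hash partitions and workers


def hash_partition(rows, key_index, num_partitions):
    """Split rows into buckets by join key (modulo as a stand-in hash function)."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[row[key_index] % num_partitions].append(row)
    return buckets


def join_pair(pair):
    """Join one pair of matching partitions; needs no rows from any other pair."""
    customer_part, order_part = pair
    names = {cust_id: name for cust_id, name in customer_part}
    return [
        (cust_id, names[cust_id], amount)
        for cust_id, amount in order_part
        if cust_id in names
    ]


# Invented sample data: (customer_id, name) and (customer_id, amount).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (6, "Di")]
orders = [(1, 10.0), (2, 20.0), (2, 5.0), (6, 7.5), (9, 1.0)]

# Both sides are partitioned on the same key with the same function.
customer_parts = hash_partition(customers, 0, NUM_PARTITIONS)
order_parts = hash_partition(orders, 0, NUM_PARTITIONS)

# Each worker joins partition i of one side with partition i of the other.
with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    partial_results = list(pool.map(join_pair, zip(customer_parts, order_parts)))

joined = sorted(row for part in partial_results for row in part)
print(joined)
```

<p>If only one side were partitioned on the join key, rows would first have to be redistributed between workers before the join could start, which is exactly the communication this layout avoids.</p>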

<p>So, on a Database Machine, partitioning will allow you to leverage the parallel power of the system. Again, you want partitioning on an Exadata system...</p>

<p>* Just as a disclaimer -- the numbers are for illustration only. </p>]]>
   </content>
</entry>

</feed>
