Thursday Jan 06, 2011

Controlling your Versions

A handy hint from one of my colleagues (thanks SG) is to populate the VERSION field on each record with a unique number for each of your data sources (e.g. 1000, 2000, 3000). This allows converted data to be easily identified at a later date, and also identifies the conversion stream that created each record (assuming each source has a different set of mapping rules).
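
As a minimal sketch (the values and the Account table are purely illustrative, and assume the scheme above was applied during the load), converted records and their originating stream can then be isolated with a simple range predicate, allowing for the fact that VERSION is incremented whenever a record is subsequently updated online:

SELECT acct_id, version
FROM   ci_acct
WHERE  version BETWEEN 1000 AND 1999;   -- source 1 records (range rather than exact match, as VERSION increments on update)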

Thursday Apr 23, 2009

My Generation..

I am aware of a number of performance issues with converting a large amount of transactional data, specifically around the Key Generation routines and the fact that these jobs cannot be multi-threaded (resulting in massive rollback segments as a single SQL statement attempts to process all transactions in one hit).

As a result, I have looked into the ability to tune this conversion phase and recommend the following:
1. Review the code in CIPVBSGK (via AppViewer) for an example of how CC&B conversion generates keys.
2. Consider dropping all indexes on the CK_* tables before running KeyGen to reduce the amount of I/O (a sketch follows this list). Ensure that these are all rebuilt as soon as KeyGen has completed, so that other dependent tables do not suffer slowdowns as a result of full table scans.
3. Ensure that Rollback Segments, Transaction tables (i.e. Bills, FTs, Payments, Meter Reads, etc.), Master Data tables (i.e. Account, SA, etc.) and Indexes are each allocated their own tablespace and I/O channel at the database level to maximise throughput (my personal preference is to split the Meter Read and Interval-related data sets out to yet another tablespace and I/O channel, but this will depend on the disk structure of your machines).
4. Assess whether your partitioning is efficient (or even needed). Partitioning is done to speed up DB access where the application is I/O bound; it is not required in order for CC&B to run. If the database is configured correctly on a single tablespace, feel free to leave it as-is.
5. Be aware that running the validation processes against every x-th record does expose you to the risk that not all errors will be encountered, but it does allow you to quickly identify the big-ticket items that are incorrect in your mapping and get these out of the way easily. I recommend validating every x-th record as part of your initial conversion runs and setting the value to 1 for all of your dress-rehearsal iterations.
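
As a minimal sketch of item 2 (all index, table and column names below are placeholders only; capture the real definitions from USER_INDEXES and USER_IND_COLUMNS before dropping anything):

-- Capture, then drop, the index before KeyGen (placeholder name):
DROP INDEX ck_acct_ix1;

-- ... run the KeyGen batch jobs ...

-- Rebuild immediately afterwards so that dependent steps do not full-scan:
CREATE INDEX ck_acct_ix1 ON ck_acct (acct_id) NOLOGGING PARALLEL 4;
ALTER INDEX ck_acct_ix1 NOPARALLEL LOGGING;   -- reset the attributes once the build is done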

Failing this, some sites have elected to customise the KeyGen. If this is your preferred approach, take care to:
1. Consider running custom SQL to perform your key generation, aligned with your partition ranges (i.e. 10 concurrent SQL statements with high and low ranges aligned with your 10 partitions, or a factor of this split, e.g. 20 or 30 SQL statements each processing bite-sized chunks of each partition range). See the sketch after this list.
2. Consider running the base KeyGen process in De-duplication mode after your custom SQL has completed to ensure that no duplicates are encountered (if they are, this process will 'bounce' the SQL-generated value to another unique value).
3. Note that no manipulation of the input keys is expected as part of your Extract or Load processes in order to complete any of this functionality.
4. Make sure that KeyGen (or your custom SQL) is executed in the order specified by the online help to ensure that inherited keys are generated correctly (e.g. Account should be run first, since most transactional data inherits a component of the primary key from the Account Id).
5. Ensure that you create the initial "null-key" value on the CK_* table to support optional foreign key logic performed as part of the insert into PROD.
6. Ensure that your key fields are all zero-filled numeric values with no trailing spaces (see my earlier posting in relation to Staging Table Keys).
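
The following is a minimal sketch of items 1 and 5 only, not the base KeyGen logic; the CK_ACCT column names (OLD_ACCT_ID, NEW_ACCT_ID), the ACCT_SEQ sequence and the range values are all placeholders, so check the actual table definition and CIPVBSGK before adapting it:

-- One statement per partition-aligned range, run concurrently:
INSERT INTO ck_acct (old_acct_id, new_acct_id)
SELECT acct_id,
       LPAD(TO_CHAR(acct_seq.NEXTVAL), 10, '0')   -- zero-filled, no trailing spaces
FROM   ci_acct
WHERE  acct_id BETWEEN '0000000001' AND '0099999999';

-- The "null-key" row from item 5, used by the optional foreign key logic
-- (a single-space value, mirroring what the base KeyGen creates):
INSERT INTO ck_acct (old_acct_id, new_acct_id) VALUES (' ', ' ');
COMMIT;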

Regardless of the approach adopted:
1. Consider bringing down the Staging CC&B application during your runs to reduce the risk of table/record locking by individuals logged on to the environment while the batches are running (feel free to bring it up at various points in the overall Conversion process to perform sanity checks, but these must be well-defined milestones in the conversion plan, and the application should be shut down again once each check has completed).
2. The performance of the Validation steps seems to be affected, to a large degree, by the existence of Characteristics on entities; as a result, expect your runs to take longer if your data model calls for a large number of characteristics.
3. Financial Transactions that are converted with REDUNDANT_SW = 'Y' will normally be exempt from all further processing in CC&B (with the exception of bill segment processing, direct queries and archiving); as a result, the actual key value assigned is less important than it is for active transactions. A mapping sketch follows this list.
4. Cross-partition population of transactions for a single account will not "break" CC&B; it will simply incur a minor performance hit when retrieving these transactions.
5. Because historic Bill Segments are normally accessed by the next bill creation process regardless of the REDUNDANT_SW, it is recommended that inherited key structures be retained for this entity.
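
On item 3, a minimal sketch of how a mapping rule might set the flag during extraction; the source table and column names are illustrative and the 12-month cutoff is only an example:

SELECT source_ft_id,
       CASE WHEN transaction_date < ADD_MONTHS(:conversion_cutoff_date, -12)
            THEN 'Y'
            ELSE 'N'
       END AS redundant_sw
FROM   source_financial_transactions;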

Sunday Nov 23, 2008

Foreign Key Characteristic Values

We have encountered issues where Conversion runs in CC&B 2.2.0 (and possibly previous versions) result in Foreign Key characteristics being populated with values that contain trailing spaces. Whilst CC&B handles these correctly, it appears that the majority of reporting steps defined as part of the reconciliation processes do not do the required trimming of values, resulting in no match being found.

I recommend that all reconciliation reporting of Foreign Key values includes the necessary TRIM function calls in the selection criteria.
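
As a minimal sketch, assuming (purely for illustration) an Account characteristic that points at a Premise, with the foreign key value held in CHAR_VAL_FK1 of CI_ACCT_CHAR; adjust the tables and columns for the entity you are reconciling:

SELECT c.acct_id, c.char_type_cd, c.char_val_fk1
FROM   ci_acct_char c
       LEFT JOIN ci_prem p ON p.prem_id = TRIM(c.char_val_fk1)   -- TRIM removes the trailing spaces
WHERE  p.prem_id IS NULL;                                        -- FK values with no match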

Sunday Oct 19, 2008

Running Keygen for Partial Conversion Runs

Hint: Whilst the majority of the CC&B Conversion process can be run as a subset of steps depending on the type of conversion run being executed (e.g. converting only Person will only require the Validation and Production steps specific to the Person entity), it is important to note that once a project moves on to a conversion run involving a more complete 'V' structure, running all KeyGen steps becomes mandatory.

KeyGen will attempt to allocate a new CC&B key value for each record on the associated parent table, but it also creates a single 'blank' key value which is used by all Production Insert steps to link to optional key values. A prime example of this is MAILING_PREM_ID on CI_ACCT: this field is optional on CI_ACCT records, but unless the associated CI_PREM KeyGen has been executed to create a ' ' entry on the CK_PREM table, none of the CI_ACCT records will be transferred from Staging to Production.
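
A quick sanity check before the Production Insert steps (the OLD_PREM_ID column name is an assumption; confirm it against your CK_PREM definition) is simply:

-- Expect one 'blank' row; if none exists, the CI_PREM KeyGen step has not been
-- run and the optional MAILING_PREM_ID lookup will filter out every account.
SELECT COUNT(*)
FROM   ck_prem
WHERE  old_prem_id = ' ';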

As a result, I recommend that once a site reaches the point where they are considering moving data from Staging to Production, they include the full suite of KeyGen steps in the Conversion run schedule (regardless of how many of the CC&B data structures they are converting over). Given that these additional runs are executed against a set of empty source tables, it is not anticipated that the overall run times will be extended to an unmanageable level.

Monday Sep 01, 2008

Threading...

A common question that I get asked is.. how do we force multi-threaded conversion jobs to split the key ranges evenly across each thread instance?

Unfortunately I don't have an easy answer. The base conversion jobs seem to take a simplistic approach of using the low and high override values (if supplied, otherwise they use values of 0000000000 and 9999999999) and then divide these by the number of threads to determine the ranges for each instance.

But.. we have found that we can force a predefined range of values and simulate threading by having separate batch controls defined for each thread, each with predefined high/low values; these can then be run concurrently through the standard batch submission processes. For example,
if we want to run VAL-ACCT in 4 threads and ensure that a known range of source keys between 0000000001 and 0000100000 is split between the 4 threads, we clone the existing batch control into
VAL-ACCT1, with OVRD-LOW-ID = 0000000001 and OVRD-HIGH-ID = 0000025000
VAL-ACCT2, with OVRD-LOW-ID = 0000025001 and OVRD-HIGH-ID = 0000050000
VAL-ACCT3, with OVRD-LOW-ID = 0000050001 and OVRD-HIGH-ID = 0000075000
VAL-ACCT4, with OVRD-LOW-ID = 0000075001 and OVRD-HIGH-ID = 0000100000

These four jobs can then be submitted alongside each other with Thread Number and Thread Count both set to 1, with no chance of them being rejected by the framework (which normally restricts the same batch control from running more than once at a time). The actual number of instances and the key ranges of each will still need to be adjusted in the same way as we normally do for our optimisation tasks; it is just a little harder, since we have to adjust the batch control parameters rather than just the thread counts.
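
One way to choose the OVRD-LOW-ID/OVRD-HIGH-ID boundaries so that each clone receives a similar number of records is to derive them from the staging data itself. A minimal sketch, assuming the staging Account table is CI_ACCT and the legacy key is held in ACCT_ID:

SELECT bucket,
       MIN(acct_id) AS ovrd_low_id,
       MAX(acct_id) AS ovrd_high_id
FROM  (SELECT acct_id,
              NTILE(4) OVER (ORDER BY acct_id) AS bucket   -- 4 = number of cloned batch controls
       FROM   ci_acct)
GROUP BY bucket
ORDER BY bucket;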

Tuesday Jul 29, 2008

Tidy Balances.. I don't really hate it

I have been asked why I tend to avoid running the Tidy Balances Conversion process, and the short answer is that 'I have never found a need to..'.

The Tidy Balances process has been designed to accept an input file of Accounts and balance 'buckets', and will produce adjustments of a specific type to ensure that the account has aged balances of these values. This could be handy in situations where minimal transaction data is being converted into CC&B via the traditional extract-transform-and-load means (i.e. convert all source data into CC&B Adjustments and Financial Transactions and run the required Validation and KeyGen processes), but it does not fit well with my experience of bringing 2-5 years of history over (which seems to be a common requirement on CC&B projects) and needing to create an opening balance for the Service Agreements, which is essentially just a placeholder transaction that is no longer subject to any aging or collection processing.

The effort required to create the Tidy Balances input table (i.e. summarise all historic transactions older than x days/months and generate a record on the work file for each account) can easily be serviced via the ConversionCentral data sources and some careful SQL, resulting in less effort required from the extraction team. The selection criteria for extracted data included in the standard mapping into Adjustments is normally in the format:

Select Source_Account, Adjustment_Type, Adjustment_Amount, ...
From Source System Tables
Where Transaction_Date >= Conversion_Cutoff_Date


My approach to ensuring that all transactional data is processed in the run is to include a separate ControlCentral data source which generates a single 'Opening Balance' Adjustment record per account as:

Select Source_Account, 'Opening Balance Adjustment Code', SUM(Adjustment_Amount), ...
From Source System Tables
Where Transaction_Date < Conversion_Cutoff_Date
Group By Source_Account

With a bit of manipulation to attach this adjustment to the earliest converted bill, we ensure that these transactions are not reprocessed by the first CC&B bill, whilst the Account and SA balances still conform to expected values.

Given that these adjustments are older than 120 days, they are normally not subject to traditional aging, and as such this solution proves to be easier (and quicker) than running a full Tidy Balances process.

Tuesday Jul 15, 2008

A picture is worth a thousand words...

CC&B Conversion Process
The attached image provides an overview of how the CC&B Conversion process hangs together. Hopefully this proves useful in understanding the various components of the process. For in-depth detail of the server-side conversion modules, check the CC&B online help.



Thursday Jun 19, 2008

Speeding up Conversion (or slowing down time)

Hint: One of the questions that I tend to get asked a lot is... how do we speed up conversion?

I tend to focus on a number of factors in an attempt to speed things up; my top 11 are:

  1. Are the Staging and Production schemas on the same database/tablespace?
  2. Are the disks experiencing hotspots as a result of this shared implementation?
  3. If the two schemas are not on the same database, what is the theoretical maximum capacity of the link between the two instances?
  4. Is the machine CPU or Memory constrained?
  5. Can multithreaded runs be threaded further?
  6. Can indexes be dropped for specific conversion steps? In most instances it is cheaper to drop the indexes before any mass insert steps and rebuild them afterwards than to wear the cost of multiple I/Os for each insert.
  7. In instances where the indexes cannot be dropped, have these been created on their own database tablespace, with a dedicated I/O channel?
  8. For multi-threaded jobs, are the thread ranges reasonable, or can better results be achieved by running a number of single-threaded jobs with artificial high/low limits? (CC&B threading does not dynamically assign high/low limits; it basically takes the maximum key range and divides it by the number of threads, which can result in all of the work falling to the first couple of threads.)
  9. How regularly are database statistics being gathered? Massive numbers of inserts can skew the optimum access path, so regular stats collections are essential (see the sketch after this list).
  10. Can complex data manipulation be better served with the creation of a temporary table on the Staging or Input schemas?
  11. Consider whether all documented Conversion steps are required, e.g. I tend to avoid running Tidy Balances and prefer to build all required components and associated values into my data extracts and mapping rules.
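
On point 9, a minimal sketch of a statistics refresh between mass insert steps; the CISADM schema name and the DEGREE value are assumptions to adjust for your environment:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'CISADM',     -- staging schema name is an assumption
    tabname => 'CI_ACCT',
    cascade => TRUE,         -- include the indexes
    degree  => 4);
END;
/
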
About

Stuart Ramage

I am a Consulting Technical Manager for Oracle Corporation, and a member of the OU Black Belt Team, based in Hobart, Tasmania.
I have worked in the utility arena since 1999 on the Oracle UGBU product line, in a variety of roles including Conversion, Technical and Functional Architect.
