Introduction

This blog describes the types of indexes that TimesTen Scaleout supports and when you should use them.
TimesTen Scaleout supports the following types of indexes:
- Hash Indexes
- Range Indexes
- Unique and non-unique indexes
- Indexes for primary keys and foreign keys
- Local Indexes
- Global Indexes
To better understand TimesTen Scaleout Indexes, you should first review TimesTen Scaleout data distribution and scalability.
Hash Indexes
Hash indexes in TimesTen are very fast O(1) [eg as little as a few microseconds for a unique lookup], but are only useful for equality tests on columns in the hash index.
ie they are great for PK or unique value lookups, but do not work for inequality [<, <=, > or >=] or in clauses.

Hash indexes can include up to 32 columns.
Under sized hash indexes are slow due to hash collisions

If your hash index pages [buckets] are under-sized then the hash function will result in hash collisions. If there is a hash collision, then TimesTen hash indexes use a linked list for the additional values. Traversing the linked list from a collision is a linear operation O(n). Hash collisions from under-sized pages [buckets] can significantly slow down TimesTen lookups. ie the lookup time will [eventaully] depend on the number of rows rather than taking a constant / small amount of time.
Over sized hash indexes waste RAM

If you make your hash indexes too large [ie allocate more pages than needed], then the index access will always be fast, but it wastes RAM. Computers have a finite amount of RAM and you want to use the RAM to store table data and only use sufficient RAM for indexes, but no more.
Sizing Hash Indexes
TimesTen hash indexes are allocated based on pages:
- You define the number of pages to size the hash index
- Page in TimesTen always have 256 entries
- A page can represent 256 rows
- 1000 rows in a table [or hash index] would need four pages [ie 256 * 4 = 1024]
A technique to size hash pages [buckets] for 10% growth is a follows:
For example, if your table has 100 million rows:
To size pages for 20% growth, use 1.2 rather than 1.1 in the calculation:
Use your knowledge of the workload to decide the growth factor. eg 5%, 10%, 20% etc.
Resizing Hash Indexes
Once your hash index has been created, you need to drop and re-create it if you need to resize it. You need to use the pages clause for a hash index to define the number of hash buckets.

You can display the indexes for a table via the ttIsql indexes command:

As long as the hash pages are correctly sized, the time to do a lookup for a unique value is independent of the number of rows in a table.
Range Indexes
Range indexes in TimesTen are more general purpose than hash indexes and are based on memory optimized B+-tree indexes.
Range indexes enable both equality and in-equality conditions for filtering. Range indexes are automatically balanced / re-sized at runtime by TimesTen.
Range indexes are slighty slower [O(log n)] than hash indexes as you may need to traverse several levels of branches to get to the leaf values.
Range indexes are the default index type.

The syntax to create and drop range indexes is similar to most other databases.

TimesTen range indexes can include up to 32 columns.
Unique and non-unique indexes
You can add the optional unique clause to either hash or range indexes.
If you do not specify the unique clause, then the index can have duplicate values.
Primary keys and foreign keys in TimesTen Scaleout
TimesTen Scaleout is a relational database, so by definition it supports both primary keys and foreign keys. Primary keys and foreign keys in TimesTen Scaleout are logically the same as for any other relational database. By default, a primary key creates a unique range index for the set of columns of the primary key.

Distribution Keys
As TimesTen Scaleout is a distributed database, it also has the concept of distribution keys. The distribution key is the set of columns which determines how rows of a table are spread across the different shards in TimesTen Scaleout.

By default, the distribution key for a table is based on the primary key.

The distribution key does not have to be based on the primary key [but it usually is].

You can choose to define a primary key with a hash index.

The TimesTen Scaleout primary key can use either a hash or range index.

Both hash and range indexes on the PK
TimesTen allows both hash and range indexes to be defined for the same set of columns. The TimesTen cost based optimizer will choose the hash index for equality conditions and the range index for in-equality conditions. As for any database, indexes tend to speed up data access, but will slow down writes as the indexes need to be maintained.

Local and Global Indexes in Timesten Scaleout
Any index which does not use the global clause is a local index. All of the above examples are local indexes.
As described in the TimesTen data distribution blog, a global index creates an underlying materialized view. This MV is distributed by hash and has a local range index.

Global indexes enable non PK columns to be distributed by hash. Any columns in the include clause are part of the MV, but are not part of the distribution key.

The benefit of global indexes
Joins based on global indexes can be fast and scalable.
Global indexes for the same column [types and values] on different tables will hash to the same shard.
This means that Scaleout table joins for columns which use global indexes will:
- Occur in-memory on the same element
- No network messages are required
- Will be fast and scalable

In the above example:
- There are global indexes on column Col5 on both table A and table B
- The data type for Col5 is the same for both table A and table B
- The hash value for value abc in table A is green and it mapps to shard 4
- The hash value for value abc in table B is green and it mapps to shard 4
- The hash value for value xyz in table A is blue and it mapps to shard 1
- The hash value for value xyz in table A is blue and it mapps to shard 1
- This means that an equality join condition for the global indexes will do the comparisons on the same shard. eg
When should you use global vs local indexes

Use Global Indexes When
Global indexes generally need the following pre-conditions to be faster than local indexes:
- You normally need at least six or more shards [replicasets] for global indexes to be faster than local indexes
- The more shards, the greater the benefit of global indexes
- Generally, the more rows the greater the benefit of global indexes
Global indexes can be significantly faster than local indexes when the above pre-conditions exist for the following use cases:
- Unique constraint checks for columns which are not the primary key
- The SYS.TINDEXES table gives lots of useful metadata about indexes
- Columns which are not the PK or FK that you want to join with other tables
- Queries that have an equality predicate that do not include all of the columns in the distribution key of the table.
- A group of columns that are frequently joined in queries with primary key columns.
- Non-index columns that are frequently used in queries. Define a global index with the
INCLUDEclause to include those non-index columns. In such a case, the table does not need to be accessed to satisfy the query. - An index where the index key is a prefix of the distribution key of the table
Global indexes are a space-time trade off:
- Global indexes require more space in RAM than local indexes as they require materialized views
- The resulting global indexes can be significantly faster than local indexes when used correctly
Use Local Indexes When
- The index key consists of only non-unique columns
- An index key that has the same columns as the distribution key for the table
- The situation where the distribution key of the table is a prefix of the index key
- Queries that have an equality predicate that includes all columns in the distribution key of the table
You can have both local and global indexes on the same columns, but there is not advantage in doing so. This will not speed up queries, and it will be slower for writes as both the indexes and MV need to be maintained.
Cost based optimizer
TimesTen uses a cost based optimizer.
You need to have up to date statistics to have good execution plans for any/all indexes types.

Summary
- Either hash or range indexes can be used for primary keys
- Both hash and range indexes can be used for non primary key columns
- Both hash and range indexes can support up to 32 columns
- You can have up to 500 indexes per table
- Both hash and range indexes can be unique or non-unique
- Hash indexes need to be sized via the pages clause
- Local indexes are local to the element
- Global indexes use distribute by hash via a materialized view
- Global indexes use a local range index on each elemnt in their MV
- Global indexes can be unique or non unique
- Keep you statistics up to date
Disclaimer: These are my personal thoughts and do not represent Oracle’s official viewpoint in any way, shape, or form.
