This article describes how table rows are distributed across nodes in a TimesTen Scaleout grid and how distributed SQL join queries are tested to ensure correct results in a wide range of operating conditions.
TimesTen Scaleout supports multiple data spaces for high availability and fault tolerance. A data space consists of a set of nodes containing a complete copy of the entire database. A node in a given data space has a duplicate node in another data space. These duplicate nodes are called replica sets.
A 3x2 grid is a TimesTen Scaleout grid configuration where there are three partitions of the database data set with two complete copies of the database for a total of 6 storage nodes.
A key feature of TimesTen Scaleout is that applications do not need to be aware of the physical data distribution and structure of a grid. Applications can connect to any node and execute any type of standard SQL read or write operation without restriction.
The storage node location of table rows in a TimesTen Scaleout grid is determined by selecting one of several different available methods.
- DISTRIBUTE BY HASH tables assign rows to particular nodes using a consistent hash key based on the values in one or more table columns.
- DISTRIBUTE BY REFERENCE tables co-locate child table rows with their foreign key linked parent rows. This optimizes join performance for related data.
- DUPLICATE tables store a complete copy of the table’s rows on each node, thereby avoiding the overhead of network operations.
Defining the Schema
For this particular join test, four tables (J1-J4) are defined with a common key on the ID column, but the row distribution method, which determines where each row is stored in the grid, is defined differently.
CREATE TABLE J1 (ID INTEGER PRIMARY KEY, VALUE VARCHAR2 (64)) DISTRIBUTE BY HASH (ID); CREATE TABLE J2 (ID INTEGER PRIMARY KEY, VALUE VARCHAR2 (64), FOREIGN KEY (ID) REFERENCES J1 (ID) ON DELETE CASCADE) DISTRIBUTE BY REFERENCE; CREATE TABLE J3 (ID INTEGER PRIMARY KEY, VALUE VARCHAR2 (64)) DISTRIBUTE BY HASH (ID, VALUE); CREATE TABLE J4 (ID INTEGER PRIMARY KEY, VALUE VARCHAR2 (64)) DUPLICATE;
Each table contains 16 rows and is populated with identical data.
INSERT INTO J1 SELECT ROWNUM, CASE WHEN MOD (ROWNUM, 2) = 0 THEN ROWNUM END FROM ALL_OBJECTS WHERE ROWNUM <= 16;
This graph shows where the individual rows for all four tables are stored in a 6x2 TimesTen Scaleout grid.
The y axis represents the value of the ID column for a particular table row and the x axis indicates the grid node where the row is stored. The top and bottom rows of nodes are exact duplicates of each other belonging to separate data spaces.
- The co-located rows for tables J1 and J2 are always stored on the same node given a foreign key relationship. Table J1 is distributed using HASH (ID) and table J2 is distributed BY REFERENCE based on the location of the corresponding row in J1.
- Table J3 uses a composite distribution key: HASH (ID, VALUE). The rows in J3 are distributed independently of the rows in J1 and J2.
- Table J4 is distributed using the DUPLICATE method. A complete copy of this table is located on each node for the best possible performance.
Note that the distribution of rows is not very uniform across nodes. This occurs because the test tables contain only 16 rows each. In practice, real tables containing thousands, millions or billions of rows are stored uniformly across grid nodes.
When joining table rows in a distributed database the best performance is realized when all of the joined rows are stored on the local executing node. These local joins are the ideal case, but in practice applications do require joins against rows stored on multiple nodes. TimesTen Scaleout executes distributed joins automatically via standard SQL without any special syntax, restrictions or operating modes.
When a distributed join is executed the SQL engine creates an access plan to retrieve the required rows. This equi join (shown in two equivalent forms) requires access to at least one copy of every row from each of the four tables in order to return the correct result to the application.
-- ANSI join syntax SELECT J1.*, J2.*, J3.*, J4.* FROM J1 JOIN J2 ON (J1.ID = J2.ID) JOIN J3 ON (J1.ID = J3.ID) JOIN J4 ON (J1.ID = J4.ID); -- equivalent traditional join syntax SELECT J1.*, J2.*, J3.*, J4.* FROM J1, J2, J3, J4 WHERE J1.ID = J2.ID AND J1.ID = J3.ID AND J1.ID = J4.ID;
This graph shows which rows were accessed to satisfy the equi join query request.
- The blue frame indicates that the query was executed on Node 7. Rows accessed on Node 7 did not require network operations.
- Since Node 7 is located in Data Space 2, only nodes in Data Space 2 were accessed for remote rows. The duplicate nodes in Data Space 1 (not shown) did not participate in the query.
- The row access pattern reveals that the 4-way join operation was executed on each of the participating nodes and the results from each node were then collected and returned to the application on Node 7.
Leveraging Randomness and Consistency Oracles
Testing the reliability and correctness of distributed SQL joins is challenging. The number of scenarios and combinations is endless. One effective strategy involves automated generation of SQL queries. This type of SQL test case automation is inspired by Don Slutz and the RAGS system described in Massive Stochastic Testing of SQL.
The SQL generation system used for TimesTen Scaleout relies on statement definitions expressed in a templating language. An abstract definition of a SQL join is first designed manually. Software then resolves the definition, using randomly selected elements, into a large number of concrete executable statements.
Any one of these machine generated queries looks like nonsense. But when thousands of these statements are systematically generated, executed and evaluated for correctness, a significant area of the input space can be sampled in a short period of time.
How are randomly generated queries evaluated for correct results? The answer involves consistency oracles. The SQL statements executed against TimesTen Scaleout can also be executed against other independent relational databases. The comparative results are stored and used to make automated judgements of what is correct in subsequent executions of the same statements. Thanks to the standardization of the SQL language, relational databases are full of consistency oracles that can be leveraged to find bugs.
Executing and Evaluating
The animation below uses data collected from an automated test procedure operating against a 6x2 TimesTen Scaleout grid. The row access patterns for twelve different machine generated join queries are displayed as the process executes each query on different nodes of the grid.
- All twelve nodes in the 6x2 grid are laid out on the x axis of each plot. The nodes belonging to Data Space 1 are on the left of each plot area and the duplicate nodes of Data Space 2 are on the right.
- In normal operating conditions, only rows located in the data space where the query is executing will be accessed.
- But when certain nodes become unavailable (depicted in dark grey), the system will automatically compensate by accessing duplicate nodes in the other data space.
Automated tests like this one are designed to make sure that the correct results for a wide range of potential queries are returned regardless of the grid topology, execution node, or system state - as long as at least one copy of the requested data is available.
This article describes just one of many different testing techniques applied during the development of Oracle TimesTen Scaleout. Effective testing of distributed systems is no less challenging than designing and building them. Automation of test case generation, not just execution checking, in conjunction with test process data analysis, makes a real difference in the quality of the final product.
Disclaimer: These are my personal thoughts and do not represent Oracle's official viewpoint in any way, shape, or form.