Good Database schema design is important for performance. One of
the most basic optimizations is to design your tables to take as little
space on the disk as possible
, this makes disk reads faster and
uses less memory for query processing.
You should use
the smallest data types possible
, especially for indexed fields.
The smaller your data types, the more indexes (and data) can fit into a
block of memory, the faster your queries will be.
Database Normalization eliminates redundant data, which usually makes
updates faster since there is less data to change. However a Normalized
schema causes joins for queries, which makes queries slower,
denormalization speeds retrieval. More normalized schemas are better
for applications involving many transactions, less normalized are
better for reporting types of applications. You should normalize
your schema first, then de-normalize later. Applications often
need to mix the approaches, for example use a partially normalized
schema, and duplicate, or cache, selected columns from one table in
another table. With JPA O/R mapping you can use the @Embedded
annotation for denormalized columns to specify a persistent field whose
type can be stored as an intrinsic part of the owning entity and share
the identity of the entity.
Database Normalization and Mapping Inheritance Hiearchies
The Class Inheritance hierarchy shown below will be used as an example
of JPA O/R mapping.
In the Single table per class mapping shown below, all
classes in the hierarchy are mapped to a single table in the database
This table has a discriminator column (mapped by @DiscriminatorColumn)
identifies the subclass. Advantages: This is fast for querying, no joins are
required. Disadvantages: wastage of space since all inherited fields are in every row, a deep inheritance hierarchy will result in wide tables with many, some empty columns
In the Joined Subclass mapping shown below, the root of the class hierarchy is
represented by a single table, and each subclass has a separate table
that only contains those fields specific to that subclass. This is
normalized (eliminates redundant data) which is better for storage and
updates. However queries cause joins which makes queries slower
especially for deep hierachies, polymorphic queries and relationships.
In the Table per Class mapping (in JPA 2.0, optional in JPA 1.0),
class is mapped
to a table in the database and all the inherited state is repeated in
that table. This is not
normlalized, inherited data is repeated which wastes space.
Queries for Entities of the same type are fast, however
polymorphic queries cause unions which are slower.
Know what SQL is executed
You need to understand the SQL queries your application makes and
evaluate their performance. Its a good idea to enable SQL logging, then
go through a use case scenario to check the executed SQL. Logging
is not part of the JPA specification, With EclipseLink you can enable logging
of SQL by setting the following property in the persistence.xml file:
<property name="eclipselink.logging.level" value="FINE"/>
With Hibernate you set the following property in the persistence.xml file:
<property name="hibernate.show_sql" value="true" />
Basically you want to make your queries access less data, is your
application retrieving more data than it needs, are queries accessing
too many rows or columns? Is the database query analyzing more rows
than it needs? Watch out for the following:
- queries which execute too often to retrieve needed data
- retrieving more data than needed
- queries which are too slow
- you can use EXPLAIN to see where you should add indexes
With MySQL you can use the slow query log
to see which queries are executing slowly, or you can use the MySQL query analyzer
to see slow queries, query execution counts, and results of EXPLAIN statements.
For slow queries, you can precede a SELECT statement with the keyword EXPLAIN
to get information about the query execution plan, which explains how
it would process the SELECT, including information about how
tables are joined and in which order. This helps find missing indexes
early in the development process.
You should index columns that are frequently used in Query WHERE, GROUP
BY clauses, and columns frequently used in joins, but be aware that
indexes can slow down inserts and updates.
Lazy Loading and JPA
With JPA many-to-one and many-to-many relationships lazy load by
default, meaning they will be loaded when the entity in the
relationship is accessed. Lazy loading is usually good, but if you need
to access all of the "many" objects in a relationship, it will cause
n+1 selects where n is the number of "many" objects.
You can change the relationship to be loaded eagerly as follows :
However you should be careful with eager loading which could cause
SELECT statements that fetch too much data. It can cause a Cartesian
product if you eagerly load entities with several related collections.
If you want to override the LAZY fetch type for specific use cases,
you can use Fetch Join. For example this query would eagerly load the
In General you should lazily load relationships, test your use case
scenarios, check the SQL log, and use @NameQueries with JOIN FETCH to
eagerly load when needed.
the main goal of partitioning is to reduce the amount of data read for
particular SQL operations so that the overall response time is reduced
splits tables with many columns into multiple tables with fewer
columns, so that only certain columns are included in a particular
dataset, with each partition including all rows.
segments table rows
so that distinct groups of physical row-based datasets are formed. All columns defined to a table are found in
each set of partitions. An
example of horizontal partitioning might be a table that contains
historical data being partitioned by date.
In the example of vertical partitioning below a table that contains a
number of very wide text or BLOB columns that aren't referenced often
is split into two tables with the most referenced columns in one table
and the seldom-referenced text or BLOB columns in another.
By removing the large data columns from the table, you get a faster
query response time for the more frequently accessed Customer data.
Wide tables can slow down queries, so you should always ensure that all
columns defined to a table are actually needed.
The example below shows the JPA mapping for the tables above. The
Customer data table with the more frequently accessed and smaller data
types is mapped to the Customer Entity, the CustomerInfo table
with the less frequently accessed and larger data types is mapped to
the CustomerInfo Entity with a lazily loaded one to one relationship to
The major forms of horizontal partitioning are by Range, Hash, Hash Key, List, and Composite.
Horizontal partitioning can make queries faster because the query
optimizer knows what partitions contain the data that will satisfy a
particular query and will access only those necessary partitions during
query execution. Horizontal Partitioning works best for large database
Applications that contain a lot of query activity that targets specific
ranges of database tables.
Partitioning data horizontally into "Shards"
is used by google
, and others to give extreme scalability for very large amounts of data. eBay "shards" data horizontally along its primary access path. Hibernate Shards
is a framework that is designed to encapsulate support for horizontal partitioning into the Hibernate Core.
Caching JPA Level 2 caching
avoids database access for already loaded entities, this make reading
reading frequently accessed unmodified entities faster, however it can
give bad scalability for frequent or concurrently updated entities.
You should configure L2 caching for entities that are:
- read often
- modified infrequently
- Not critical if stale
You should also configure L2 (vendor specific) caching for maxElements, time to expire, refresh...
References and More Information: JPA
Best Practices presentation MySQL
for Developers Article MySQL
for developers presentation MySQL
for developers screencast Keeping
a Relational Perspective for Optimizing Java Persistence Java Persistence with Hibernate Pro EJB 3: Java
Persistence API Java
Persistence API 2.0: What's New ? High Performance MySQL book Pro MySQL, Chapter 6: Benchmarking and Profiling EJB 3 in Action sharding the hibernate way JPA Caching Best Practices for Large-Scale Web Sites: Lessons from eBay