Friday Aug 28, 2009

JPA Performance, Don't Ignore the Database



Database Schema

Good database schema design is important for performance. One of the most basic optimizations is to design your tables to take as little space on disk as possible; this makes disk reads faster and uses less memory for query processing.

Data Types

You should use the smallest data types possible, especially for indexed fields. The smaller your data types, the more index (and data) entries can fit into a block of memory, and the faster your queries will be.

Normalization

Database normalization eliminates redundant data, which usually makes updates faster since there is less data to change. However, a normalized schema requires joins for queries, which makes queries slower; denormalization speeds retrieval. More normalized schemas are better for applications involving many transactions; less normalized schemas are better for reporting types of applications. You should normalize your schema first, then denormalize later. Applications often need to mix the approaches, for example using a partially normalized schema and duplicating, or caching, selected columns from one table in another table. With JPA O/R mapping you can use the @Embedded annotation for denormalized columns to specify a persistent field whose @Embeddable type can be stored as an intrinsic part of the owning entity and share the identity of the entity.
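For example, a minimal sketch (with hypothetical Customer and Address classes) might look like this:

import javax.persistence.*;

// Address.java -- stored inline in the owning entity's table, not in its own table
@Embeddable
public class Address {
    private String street;
    private String city;
    private String zip;
}

// Customer.java
@Entity
public class Customer {
    @Id private Long id;
    private String name;

    @Embedded
    private Address address;   // the street, city, zip columns live in the CUSTOMER table
}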



Database Normalization and Mapping Inheritance Hierarchies

The Class Inheritance hierarchy shown below will be used as an example of JPA O/R mapping.


In the Single table per class mapping shown below, all classes in the hierarchy are mapped to a single table in the database. This table has a discriminator column (mapped by @DiscriminatorColumn), which identifies the subclass. Advantages: this is fast for querying; no joins are required. Disadvantages: space is wasted because every inherited field is present in every row, and a deep inheritance hierarchy results in wide tables with many, often empty, columns.
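A minimal sketch of this strategy, using a hypothetical Pet/Cat hierarchy:

import javax.persistence.*;

// Pet.java -- hypothetical root class; all subclasses share the single PET table
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "PET_TYPE", discriminatorType = DiscriminatorType.STRING)
public class Pet {
    @Id private Long id;
    private String name;
}

// Cat.java -- its rows live in the PET table with PET_TYPE = 'CAT'
@Entity
@DiscriminatorValue("CAT")
public class Cat extends Pet {
    private boolean declawed;
}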



In the Joined Subclass mapping shown below, the root of the class hierarchy is represented by a single table, and each subclass has a separate table that contains only the fields specific to that subclass. This is normalized (it eliminates redundant data), which is better for storage and updates. However, queries require joins, which makes them slower, especially for deep hierarchies, polymorphic queries, and relationships.
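A sketch of the joined strategy for the same hypothetical hierarchy:

import javax.persistence.*;

// Pet.java -- the PET table holds the common columns
@Entity
@Inheritance(strategy = InheritanceType.JOINED)
public class Pet {
    @Id private Long id;
    private String name;
}

// Dog.java -- the DOG table holds only the Dog-specific columns, joined to PET by primary key
@Entity
public class Dog extends Pet {
    private String breed;
}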


In the Table per Class mapping (support for this strategy is optional in the JPA specification), every concrete class is mapped to a table in the database and all the inherited state is repeated in that table. This is not normalized; inherited data is repeated, which wastes space. Queries for entities of the same type are fast, but polymorphic queries require unions, which are slower.
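A sketch of the table-per-class strategy for the same hypothetical hierarchy:

import javax.persistence.*;

// Pet.java -- abstract root, so it has no table of its own; each concrete subclass gets a full table
@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public abstract class Pet {
    @Id private Long id;
    private String name;
}

// Bird.java -- the BIRD table repeats the inherited id and name columns
@Entity
public class Bird extends Pet {
    private boolean canFly;
}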



Know what SQL is executed

You need to understand the SQL queries your application makes and evaluate their performance. It's a good idea to enable SQL logging, then go through a use case scenario to check the executed SQL. Logging is not part of the JPA specification. With EclipseLink you can enable logging of SQL by setting the following property in the persistence.xml file:

<properties>
    <property name="eclipselink.logging.level" value="FINE"/>
</properties>


With Hibernate you set the following property in the persistence.xml file:

<properties>
    <property name="hibernate.show_sql" value="true" />
</properties>


Basically you want your queries to access less data. Is your application retrieving more data than it needs? Are queries accessing too many rows or columns? Is the database query analyzing more rows than it needs? Watch out for the following:
  • queries which execute too often to retrieve needed data
  • retrieving more data than needed
  • queries which are too slow
    • you can use EXPLAIN to see where you should add indexes

With MySQL you can use the slow query log to see which queries are executing slowly, or you can use the MySQL query analyzer to see slow queries, query execution counts, and results of EXPLAIN statements.

Understanding EXPLAIN

For slow queries, you can precede a SELECT statement with the keyword EXPLAIN to get information about the query execution plan, which shows how the database would process the SELECT, including how tables are joined and in which order. This helps you find missing indexes early in the development process.




You should index columns that are frequently used in WHERE and GROUP BY clauses, and columns frequently used in joins, but be aware that indexes can slow down inserts and updates.

Lazy Loading and JPA

With JPA, one-to-many and many-to-many relationships are loaded lazily by default, meaning they are loaded when the relationship is first accessed. Lazy loading is usually good, but if you need to access all of the "many" objects in a relationship, it causes the n+1 selects problem: one select to retrieve the owning entities plus one select per entity to load its collection.




You can change the relationship to be loaded eagerly as follows:
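The sketch below assumes a hypothetical Order entity with a collection of OrderLines:

import java.util.List;
import javax.persistence.*;

// Order.java -- hypothetical entity; a @OneToMany collection is LAZY by default
@Entity
public class Order {
    @Id private Long id;

    @OneToMany(mappedBy = "order", fetch = FetchType.EAGER)   // overrides the default lazy loading
    private List<OrderLine> orderLines;
}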




However, you should be careful with eager loading, which can cause SELECT statements that fetch too much data. It can cause a Cartesian product if you eagerly load entities with several related collections.


If you want to override the LAZY fetch type for specific use cases, you can use a fetch join. For example, the query below would eagerly load the employee addresses:
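The sketch below assumes an Employee entity with an address relationship; the JOIN FETCH loads each employee and its address in a single SELECT for this query only:

// Employee and its address relationship are hypothetical names
List<Employee> employees = entityManager.createQuery(
        "SELECT e FROM Employee e JOIN FETCH e.address", Employee.class)
        .getResultList();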

In general you should lazily load relationships, test your use case scenarios, check the SQL log, and use @NamedQueries with JOIN FETCH to eagerly load when needed.

Partitioning

The main goal of partitioning is to reduce the amount of data read for particular SQL operations so that the overall response time is reduced.

Vertical Partitioning  splits tables with many columns into multiple tables with fewer columns, so that only certain columns are included in a particular dataset, with each partition including all rows.

Horizontal Partitioning segments table rows so that distinct groups of physical row-based datasets are formed. All columns defined to a table are found in each set of partitions. An example of horizontal partitioning might be a table that contains historical data being partitioned by date.

Vertical Partitioning


In the example of vertical partitioning below, a table that contains a number of very wide text or BLOB columns that aren't referenced often is split into two tables, with the most referenced columns in one table and the seldom-referenced text or BLOB columns in another.

By removing the large data columns from the table, you get a faster query response time for the more frequently accessed Customer data. Wide tables can slow down queries, so you should always ensure that all columns defined to a table are actually needed.

The example below shows the JPA mapping for the tables above. The Customer table, with the more frequently accessed and smaller data types, is mapped to the Customer entity; the CustomerInfo table, with the less frequently accessed and larger data types, is mapped to the CustomerInfo entity with a lazily loaded one-to-one relationship to the Customer.
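A sketch of such a mapping, with hypothetical field names (note that some providers honor LAZY on a one-to-one relationship only with weaving or bytecode enhancement):

import javax.persistence.*;

// Customer.java -- the narrow, frequently accessed columns
@Entity
public class Customer {
    @Id private Long id;
    private String name;
    private String phone;

    @OneToOne(mappedBy = "customer", fetch = FetchType.LAZY)  // wide data loaded only when accessed
    private CustomerInfo info;
}

// CustomerInfo.java -- the seldom-referenced, large columns
@Entity
public class CustomerInfo {
    @Id private Long id;

    @Lob
    private String comments;

    @OneToOne
    @JoinColumn(name = "CUSTOMER_ID")
    private Customer customer;
}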



Horizontal Partitioning

The major forms of horizontal partitioning are by Range, Hash, Key, List, and Composite.

Horizontal partitioning can make queries faster because the query optimizer knows which partitions contain the data that will satisfy a particular query and will access only those partitions during query execution. Horizontal partitioning works best for large database applications with a lot of query activity that targets specific ranges of database tables.


Hibernate Shards

Partitioning data horizontally into "shards" is used by Google, LinkedIn, and others to provide extreme scalability for very large amounts of data. eBay "shards" data horizontally along its primary access path.

Hibernate Shards is a framework that is designed to encapsulate support for horizontal partitioning into the Hibernate Core.


Caching

JPA Level 2 (L2) caching avoids database access for already loaded entities. This makes reading frequently accessed, unmodified entities faster; however, it can give bad scalability for frequently or concurrently updated entities.

You should configure L2 caching for entities that are:
  • read often
  • modified infrequently
  • Not critical if stale
You should also configure L2 (vendor specific) caching for maxElements, time to expire, refresh...

References and More Information:

JPA Best Practices presentation
MySQL for Developers Article
MySQL for developers presentation
MySQL for developers screencast
Keeping a Relational Perspective for Optimizing Java Persistence
Java Persistence with Hibernate
Pro EJB 3: Java Persistence API
Java Persistence API 2.0: What's New?
High Performance MySQL book
Pro MySQL, Chapter 6: Benchmarking and Profiling
EJB 3 in Action
Sharding the Hibernate Way
JPA Caching
Best Practices for Large-Scale Web Sites: Lessons from eBay




Friday Aug 21, 2009

JPA Caching


JPA Level 1 caching

JPA has 2 levels of caching. The first level of caching is the persistence context.

The JPA Entity Manager maintains a set of Managed Entities in the Persistence Context.

The Entity Manager guarantees that within a single Persistence Context, for any particular database row, there will be only one object instance. However, the same entity could be managed in another user's transaction, so you should use either optimistic or pessimistic locking, as explained in JPA 2.0 Concurrency and locking.

The code below shows that a find on a managed entity with the same id and class as another in the same persistence context will return the same instance.

@Stateless
public class ShoppingCartBean implements ShoppingCart {

    @PersistenceContext EntityManager entityManager;

    public OrderLine createOrderLine(Product product, Order order) {
        OrderLine orderLine = new OrderLine(order, product);
        entityManager.persist(orderLine);   // Managed
        OrderLine orderLine2 = entityManager.find(OrderLine.class, orderLine.getId());
        // orderLine == orderLine2 is TRUE: same persistence context, same instance
        return orderLine;
    }
}

The diagram below shows the life cycle of an Entity in relation to the Persistence Context.

The code below illustrates the life cycle of an Entity. A reference to a container-managed EntityManager is injected using the @PersistenceContext annotation. A new Order entity is created, and the entity has the state of new. Persist is called, making it a managed entity. Because this is a stateless session bean, it uses container-managed transactions by default; when the transaction commits, the Order is made persistent in the database. When the Order entity is returned at the end of the transaction, it is a detached entity.
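A sketch of such a bean, using hypothetical OrderService and Order names:

import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class OrderServiceBean implements OrderService {

    @PersistenceContext
    EntityManager entityManager;

    // Container-managed transaction: starts when the method is invoked, commits when it returns
    public Order createOrder(Customer customer) {
        Order order = new Order(customer);   // NEW: not yet in the persistence context
        entityManager.persist(order);        // MANAGED: part of the persistence context
        return order;                        // DETACHED once the transaction commits and the context ends
    }
}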

The Persistence Context can be either Transaction Scoped, where the Persistence Context "lives" for the length of the transaction, or Extended, where the Persistence Context spans multiple transactions. With a transaction scoped Persistence Context, entities are "detached" at the end of a transaction.

As shown below, to persist the changes on a detached entity, you call the EntityManager's merge() operation, which returns an updated managed entity; the entity's updates will be persisted to the database at the end of the transaction.
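A sketch, assuming an injected entityManager and a hypothetical Order entity:

// The detached Order was modified outside a persistence context (for example in the web tier)
public Order updateOrder(Order detachedOrder) {
    Order managedOrder = entityManager.merge(detachedOrder);  // returns a managed copy of the detached entity
    return managedOrder;                                      // changes are written to the database at commit
}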

An Extended Persistence Context spans multiple transactions, and the set of Entities in the Persistence Context stay Managed. This can be useful in a work flow scenario where a "conversation" with a user spans multiple requests.

The code below shows an example of a Stateful Session EJB with an Extended Persistence Context in a use case scenario to add line items to an Order. After the Order is persisted in the createOrder method, it remains managed until the EJB's remove method is called. In the addLineItem method, the Order entity can be updated because it is managed, and the updates will be persisted at the end of the transaction.
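A sketch of such a stateful bean, using hypothetical names:

import javax.ejb.Remove;
import javax.ejb.Stateful;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.PersistenceContextType;

@Stateful
public class OrderManagerBean implements OrderManager {

    @PersistenceContext(type = PersistenceContextType.EXTENDED)
    EntityManager entityManager;

    private Order order;

    public void createOrder(Customer customer) {
        order = new Order(customer);
        entityManager.persist(order);                      // the Order stays managed across transactions
    }

    public void addLineItem(Product product) {
        order.addLineItem(new OrderLine(order, product));  // no find() needed, the Order is still managed
    }

    @Remove
    public void completeOrder() {                          // the persistence context closes when the bean is removed
    }
}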


The example below contrasts updating the Order using a transaction scoped Persistence Context versus an extended Persistence Context. With the transaction scoped persistence context, an EntityManager find must be done to look up the Order; this returns a managed entity which can be updated. With the extended Persistence Context the find is not necessary. The performance advantage of not doing a database read to look up the entity must be weighed against the disadvantages of memory consumption for caching and the risk of cached entities being updated by another transaction. Depending on the application and the risk of contention among concurrent transactions, this may or may not give better performance and scalability.
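A rough sketch of the two variants (hypothetical entity and method names; the extended variant assumes a stateful bean that holds the order as a field):

// Transaction scoped persistence context: the Order must be looked up again before it can be updated
public void addLineItem(Long orderId, Product product) {
    Order order = entityManager.find(Order.class, orderId);   // database read returns a managed Order
    order.addLineItem(new OrderLine(order, product));         // the update is persisted at commit
}

// Extended persistence context: the 'order' field is still managed, so no find() is required
public void addLineItem(Product product) {
    order.addLineItem(new OrderLine(order, product));         // the update is persisted at commit
}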

JPA second level (L2) caching

JPA second level (L2) caching shares entity state across various persistence contexts.


JPA 1.0 did not specify support for a second level cache; however, most persistence providers provided one. JPA 2.0 specifies support for basic cache operations with the new Cache API, which is accessible from the EntityManagerFactory, as shown below:
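A sketch of using the Cache interface (Order and orderId are hypothetical):

// javax.persistence.Cache, obtained from the EntityManagerFactory
Cache cache = entityManagerFactory.getCache();

boolean inCache = cache.contains(Order.class, orderId);  // is this entity currently in the L2 cache?
cache.evict(Order.class, orderId);                       // remove a single entity from the cache
cache.evict(Order.class);                                // remove all entities of a class
cache.evictAll();                                        // empty the entire second level cache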


If L2 caching is enabled, entities not found in the persistence context will be loaded from the L2 cache, if present there.

The advantages of L2 caching are:
  • avoids database access for already loaded entities
  • faster for reading frequently accessed  unmodified entities
The disadvantages of L2 caching are:
  • memory consumption for a large number of objects
  • Stale data for updated objects
  • Concurrency for writes (optimistic lock exception, or pessimistic lock)
  • Bad scalability for frequently or concurrently updated entities

You should configure L2 caching for entities that are:
  • read often
  • modified infrequently
  • Not critical if stale
You should protect any data that can be concurrently modified with a locking strategy:
  • Must handle optimistic lock failures on flush/commit
  • configure expiration, refresh policy to minimize lock failures
The query cache is useful for queries that are run frequently with the same parameters against tables that are not modified.

The EclipseLink JPA persistence provider caching Architecture

The EclipseLink caching architecture is shown below.


Support for the second level cache in EclipseLink is turned on by default, and entities that are read are cached in L2. You can disable the L2 cache. EclipseLink caches entities in L2, whereas Hibernate caches the entity id and state in L2. You can configure caching by entity type or persistence unit with the following configuration parameters:
  • Cache isolation, type, size, expiration, coordination, invalidation, refreshing
  • Coordination (cluster-messaging)
  • Messaging: JMS, RMI, RMI-IIOP, …
  • Mode: SYNC, SYNC+NEW, INVALIDATE, NONE
The example below shows configuring the L2 cache for an entity using the @Cache annotation:
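A sketch, using a hypothetical read-mostly Country entity and example attribute values:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.Cache;
import org.eclipse.persistence.annotations.CacheType;

@Entity
@Cache(type = CacheType.SOFT,     // how cached objects are held in memory
       size = 10000,              // maximum number of objects in the cache
       expiry = 600000)           // time to live, in milliseconds
public class Country {
    @Id private Long id;
    private String name;
}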

The Hibernate JPA persistence provider caching Architecture

The Hibernate JPA persistence provider caching architecture is different from EclipseLink's: it is not configured by default, it does not cache entities but only the id and state, and you can plug in different L2 caches. The diagram below shows the different L2 cache types that you can plug into Hibernate.

The configuration of the cache depends on the type of caching plugged in. The example below shows configuring the Hibernate L2 cache for an entity using the @Cache annotation:
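A sketch, again using a hypothetical Country entity; the chosen concurrency strategy must be supported by the cache provider that is plugged in:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE, region = "country")
public class Country {
    @Id private Long id;
    private String name;
}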

For More Information:

Introducing EclipseLink
EclipseLink JPA User Guide
Hibernate Second Level Cache
Speed Up Your Hibernate Applications with Second-Level Caching
Hibernate caching
Java Persistence API 2.0: What's New?
Beginning Java™ EE 6 Platform with GlassFish™ 3
Pro EJB 3: Java Persistence API (JPA 1.0)




