Hibernate - Tuning queries Using Paging, Batch Size, and Fetch Joins

Hi, 
Another entry about queries - this time a tuning test case and the relations between simple queries, join fetch queries, paging query results, and batch size.

Paging the Query Results

Since I have never said anything about paging in this blog I start with a short introduction about paging in EJB3:

To support paging the EJB3 Query interface defines the following two methods:
  • setMaxResults - sets the number of maximum rows to retrieve from the database
  • setFirstResult - sets the first row to retrieve

For example if our GUI displays a list of customers and we have 500,000 customers (database rows) in out database we wouldn't like to display all 500,000 records is one view (even if we put performance considerations aside - nobody can do anything with a list of 500,000 rows). The GUI design would usually include paging - we break the list of records to display into logical pages (for example 100 records per page) and the user can navigate between pages (same as Google's results navigator down the search page). 

When using the paging support it is important to remember that the query has to be sorted otherwise we can't be sure that when fetching the "next page" it will really be the next page (since in the absence of the 'order by' clause form a SQL query the order in which rows are fetch is unpredictable).

Here is a sample use, for fetching the first tow pages of 100 rows each:


        Query q = entityManager.createQuery("select c from Customer c order by c.id");
        q.setFirstResult(0).setMaxResults(100);

        .... next page ...
        Query q = entityManager.createQuery("select c from Customer c order by c.id");
        q.setFirstResult(100).setMaxResults(100);

This is a simple API and it's important (for performance) to remember using it when we need to fetch only parts of the results. 

Test Case Description

This test cased is based on a real tuning I did for an application, I just changed the class names to Customer and Order. Let's assume that I have a Customer entity with a set of orders (lazily fetched - but it happens in eager fetch as well) and we need to:

  1. Fetch customers and their orders
  2. Do it in a "paging mode" - 100 customers per page


Tuning Requirement #1 - Fetch Customers and Their Orders


There are two possibilities to perform this kind of fetch:

    

        Simple select: select c from customer c order by c.id 

        Join fetch: select distinct c from Customer c left outer join fetch c.orders order by c.id
 

The simple select is as simple as it can be, we load a list of customers with a proxy collection in their orders field. The orders collection will be filled with data once I access it (for example c.getOrders().getSize() ). The 'join fetch' means that we want to fetch an association as an integral part of the query execution. The joined fetched entities (in the example above: c.orders) must be part of an association that is referenced by an entity returned from the query (in the example above: c). The 'join fetch' is one of the tools used for improving queries performance (see more in here). The Hibernate core documentations explains that "a 'fetch' join allows associations or collections of values to be initialized along with their parent objects, using a single select" (see here). 


I have in my database 18,998 customer records, each with few orders. Let's compare execution time for the two queries. My code looks the same for both queries (except of the query itself), I execute the query, then I iterate the results checking the size of of each customer orders collection and print the execution time and number of records fetch (as a sanity for the query syntax):


    

        Query q = entityManager.createQuery(queryStr);

        long a = System.currentTimeMillis();
        List l = q.getResultList();
        for (Customer c : l) {
            c.getOrders().size();
        }
        long b = System.currentTimeMillis();

        System.out.println("Execution time: " + (b - a)+ "; Number of records fetch: " + l.size() );


And to the numbers (avg. 3 executions):
  • Simple select: 24,984 millis
  • Join fetch: 1,219 millis
The join fetch query execution time was 20 times faster(!) than the simple query. The reason is obvious, using the join fetch select I had only one round trip to the database. While using a simple select I had to fetch the customers (1 round trip to the database) and each time I accessed a collection I had another round trip (that's 18,998 additional round trips!).

The winner is 'join fetch'. But does it? wait for the next one - the paging...

Tuning Requirement #2 - Use Paging

The second requirement was to do it in paging - each page will have 100 customers (so we will have 18,900/100+1 pages - the last page has 98 customers). So let's change the code above a little bit:


    

        Query q = entityManager.createQuery(queryStr);
         
          q.setFirstResult(pageNum*100).setMaxResults(100);

        
        long a = System.currentTimeMillis();
        List l = q.getResultList();
        for (Customer c : l) {
            c.getOrders().size();
        }
        long b = System.currentTimeMillis();

        System.out.println("Execution time: " + (b - a)+ "; Number of records fetch: " + l.size() );

I added the second line which limits the query result to a specific page with up to 100 records per page. And the numbers are (avg. 3 executions):
  • Simple select: 328 millis
  • Join fetch: 1,660 millis
The wheel has turned over. Why? First a quote from the EJB3 Persistence specification:

"The effect of applying setMaxResults or setFirstResult to a query involving fetch joins over collections is undefined" (section 3.6.1 - Query Interface)

We could have stopped here but it is interesting to understand the issue and to see what Hibernate does. 

To implement the paging features Hibernate delegates the work to the database using its syntax to limit the number of records fetched by the query. Each database has its own proprietary syntax for limiting the number of fetched records, some examples:
  • Postgres uses LIMIT and OFFSET
  • Oracle has rownum 
  • MySQL uses its version of LIMIT and OFFSET
  • MSSQL has the TOP keyword in the select
  • and so on

The important thing to remember here is meaning of such limit: the database returns a subset of the query result. So if we asked for the first 100 customers which their names contain 'Eyal' the outcome is logically the same as building a table in memory out of all customers that match the criteria and take from there the first 100 rows. And here is the catch: if the query with the limit includes a join clause for a collection than the first 100 row in the "logical table" will not necessarily be the first 100 customers. the outcome of the join might duplicate customers in the "logical tables" but the database doesn't aware or care about that - it performs operations on tables not on objects!. For example think of the extreme case, the customer 'Eyal' has 100 orders. The query will return 100 rows, hibernate will identify that all belong to the same customer and return only one Customer as the query result - this is not what we were asking for.

This also works, of course, the other way around. If a customer had more than 100 orders and the result set size was limited to 100 rots the orders collection would not contain all of the customer's orders. 

To deal with that limitation Hibernate actually doesn't issue an SQL statement with a LIMIT clause. Instead it fetches all of the records and performs the paging in memory. This explains why using the 'join fetch' statement with paging took more than the one without paging - the delta is the in-memory paging done by Hibernate. If you look at Hibernate logs you will find the next warning issued by Hibernate:


    

WARNING: firstResult/maxResults specified with collection fetch; applying in memory!


Final Tuning - BatchSize

Does it mean that in the case of paging we shouldn't use a join fetch? usually it does (unless your page size is very close to the actual number of records). But even if you use a simple select this is a classic case for using the @BatchSize annotation. 

If my session/entity manager has 100 customers attached to it than, be default, for each first access to one of the customers' order collection Hibernate will issue a SQL statement to fill that collection. At the end I will execute 100 statements to fetch 100 collections. You can see it in the log:



Hibernate: /* select c from Customer c order by c.id */ select customer0_.id as id0_, customer0_.ccNumber as ccNumber0_, customer0_.name as name0_, customer0_.fixedDiscount as fixedDis5_0_, customer0_.DTYPE as DTYPE0_ from CUSTOMERS customer0_ order by customer0_.id limit ? offset ?

Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
............
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?

The @BatchSize annotation can be used to define how many identical associations to populate in a single database query. If the session has 100 customers attached to it and the mapping of the 'orders' collection is annotated with @BatchSize of size n. It means that whenever Hibernate needs to populate a lazy orders collection it checks the session and if it has more customers which their orders collections need to be populated it fetches up to n collections. Example: if we had 100 customers and the batch size was set to 16 when iterating over the customers to get their number of orders hibernate will go to the database only 7 times (6 times to fetch 16 collections and one more time to fetch the 4 remaining collections - see the sample below). If our batch size was set to 50 it would go only twice. 


    @OneToMany(mappedBy="customer",cascade=CascadeType.ALL, fetch=FetchType.LAZY)
    @BatchSize(size=16)
    private Set orders = new HashSet();

And in the log:
Hibernate: /* select c from Customer c order by c.id */ select customer0_.id as id0_, customer0_.ccNumber as ccNumber0_, customer0_.name as name0_, customer0_.fixedDiscount as fixedDis5_0_, customer0_.DTYPE as DTYPE0_ from CUSTOMERS customer0_ order by customer0_.id limit ? offset ?
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?)


Back to our test case. In my example setting the batch size to 100 looks like a nice tuning opportunity. And indeed when setting it to 100 the total execution time dropped to 188 millis (that's an 132 (!!!) times faster than worse result we had). The batch size can also be set globally by setting the hibernate.default_batch_fetch_size property for the session factory.



Please notice: I had recover this blog post from my old blog at http://www.jroller.com/eyallupu since jroller.com is no longer available. As such the styling might be a bit wobbly ... If something seems 'too broken' please contact me and I'll adjust

Comments

Popular posts from this blog

New in Spring MVC 3.1: CSRF Protection using RequestDataValueProcessor

Hibernate Exception - Simultaneously Fetch Multiple Bags

Hibernate Derived Properties - Performance and Portability