Hibernate Derived Properties - Performance and Portability

Hi all,

This time about derived properties, maybe this is not a commonly used feature, and maybe even a little bit hidden one (I don't think I have ever been asked about it in any of the Hibernate courses I had lectured in and this usually a sign that people are not familiar with that feature) but once you're familiar with that it is a powerful feature - however, as always, there are considerations regarding of how and when to use it.

What Is a Derived Property?

A derived, or calculated, property is a read only property which its value is calculated at fetch time using SQL expressions. For example a Product class might have a price and a calculated
final price which is the price including VAT. The first (not so good) solution might be something like that:

@Entity
@Table(name="PRODUCTS")
public class Product {

  @Column(name="PRICE")
  private float price;
  
  public float getFinalPrice() { 
    return VAT*price; 
  } 
}

The getFinalPrice() method in the example above is kind of a calculated property - but it doesn't use Hibernate support for at all - and it is not a derived property, this is only a simple getter which performs a calculation. In first look it seems good enough but it will work only for the simple cases when we wish to perform calculation in memory (for example for display purpose) but what if we would like to query ("I want all of the products which their final price is larger than $10") or sort by the calculated value? Here Hibernate comes to our help.

Configuring a Derived Property

To configure the derived property we use the @Formula (or the <formula> xml element) annotation:

@Entity
@Table(name="PRODUCTS")
public class Product {

  @Column(name="PRICE")
  private float price;

  @Formula("PRICE*1.155")
  private float finalPrice;

 .....

  public float getPrice() {
    return price;
  }

  public void setPrice(float price) {
    this.price = price;
  }

  public float getFinalPrice() {
    return finalPrice;
  }
}

In the example above we can see the @Formula annotation with the PRICE*1.55 expression as its value. It is important to remember that the value of the annotation is a SQL expression so the 'PRICE' in the formula is the column (not the property!) name. When Hibernate fetches such an entity it generated the following SQL statement:

/* select p from Product p */ select product0_.id as id2_, product0_.NAME as NAME2_, product0_.PRICE as
 PRICE2_, product0_.PRICE*1.155 as formula2_ from PRODUCTS product0_

We can see that hibernate copies the formula's expression into the SQL as is, so now we can use the derived property anywhere in our query:

Query q = entityManager.createQuery("select p.name, p.finalPrice from Product p where p.finalPrice > 5");
List<Product> l = q.getResultList();

And in the log we can see the following SQL:
/* select p from Product p where p.finalPrice > 5 */ select product0_.NAME as col_0_0_, product0_.PRICE*1.155
as col_1_0_ from PRODUCTS product0_ where product0_.PRICE*1.155>5

In the first example we couldn't reference the 'finalPrice' property in the HQL - Hibernate wasn't aware to that - but now we can query about it as if it was a regular property, from the example you can see that I use it both in the projection and in the where clause.

More Complicated Example

In the following example we have a Customer entity which owns a set of Orders, each order has a creation date (the date the order was created in the database), the customer has three derived properties:

firstOrderDate (type: java.util.Date) - the date of the oldest order the customer has placed
lastOrderDate (type: java.util.Date) - the date of the newest order the customer has placed
daysSinceLastOrder (int) - the number of days passed since the last order placed by the customer

As we can see from the definitions above the derived properties can be of any type supported by Hibernate, in my example I use dates and boolean, let's see and analyze the code:

@OneToMany(mappedBy="customer",cascade=CascadeType.ALL, fetch=FetchType.LAZY)
@BatchSize(size=100)
private Set<Order> orders = new HashSet<Order>();

@Formula("(select min(o.creation_date) from Orders o where o.customer_id = id)")
private Date firstOrderDate;

@Formula("(select max(o.creation_date) from Orders o where o.customer_id = id)")
private Date lastOrderDate;

@Formula("(select coalesce(extract ('day' from age(max(o.creation_date))), 9999) from Orders o where o.customer_id = id)")
private int daysSinceLastOrder;

 ...

public Date getFirstOrderDate() {
  return firstOrderDate;
}

public Date getLastOrderDate() {
  return lastOrderDate;
}

public int getDaysSinceLastOrder() {
   return daysSinceLastOrder;
}


 ....And the Order entity (just to complete the picture)...:

@Entity
@Table(name = "ORDERS")
public class Order {

 ...
  @NotNull
  @Column(name="CREATION_DATE")
  private Date creationDate;

  @ManyToOne
  @JoinColumn(name="CUSTOMER_ID")
  private Customer customer;

  @PrePersist 
  public void prePersist()  {
    creationDate = new Date();
  }
}

This example is more complicated - it includes few derived columns, more complicated SQL statements and queries. Personally I don't think I would have use this mapping tactic in real life application (we will see later what I would have done) but this is a good example.

Referencing the Owning Entity

Take a look at the "o.customer_id=id" part of the where clause: by not declaring an alias on a column (the "id" column in here) I can reference the owning entity columns - this is usually required when my formula needs to fetch associated entities, in my example I have to fetch all of the customer's orders, so I must use the customer id. In our first example the formula PRICE*1.155 was also referring to the owning entity's by using its price column. However remember that 'SOME TEXT' can also be used as a formula expression, same for 'select some_column from some_table' so referring to the owning entity is not a must.

Proprietary SQL Expressions

Another thing worth paying attention to is the daysSinceLastOrder property - the issue in here is that the SQL expression used to calculate the expression uses the target database specific SQL syntax (PostgreSQL 8.3 in here) - this can happen in any of the derived property but
in the daysSinceLastOrder property we can clearly see it. This is one of the important side effects of using formulas: since the formula syntax is an arbitrary SQL it is very easy to unintentionally use dialect specific SQL statements even if we planned for a portable code.

Performance

Let's take a look at the SQL statement generated by Hibernate when I fetched all customers ordered by their daysSinceLastOrder property (HQL: "select c from Customer c order by c.daysSinceLastOrder"):

/* select c from Customer c order by c.daysSinceLastOrder */ select 
    customer0_.id as id0_, 
    customer0_.CC_NUMBER as CC3_0_, 
    customer0_.NAME as NAME0_,
    customer0_.FIXED_DISCOUNT as FIXED5_0_,
    (select coalesce(extract ('day' from age(max(o.creation_date))), 9999) from Orders o  
       where o.customer_id = customer0_.id) as formula0_,
    (select min(o.creation_date) from Orders o where o.customer_id = customer0_.id) as formula1_, 
    (select max(o.creation_date) from Orders o where o.customer_id = customer0_.id) as formula2_,  
    customer0_.DTYPE as DTYPE0_ 
   from CUSTOMERS customer0_ 
   order by 
    (select coalesce(extract ('day' from age(max(o.creation_date))), 9999) from Orders o 
       where o.customer_id = customer0_.id)

The concern about this query is its performance, since whenever Hibernate needs to use a derived property he must replace it with its formula we might end up with complicated and inefficient queries when we fetch our entities - notice how the "coalesce(extract ('day' from age(max(o.creation_date))), 9999)" is calculated twice.

Another Approach

In this section I want to illustrate other approach for derived properties, the advantages of this approach is that it eliminates (to some extent) the portability issues and the performance penalty. Based on the fact that derived properties are read only values I would like to use database views and an additional entity which contains the calculated (derived) properties. Let's
start by creating our view:

CREATE OR REPLACE VIEW customers_derived AS
select 
  max(o.creation_date) as last_order_date,
  min(o.creation_date) as first_order_date,
  coalesce(extract ('day' from age(max(o.creation_date))), 9999) as days_since_last_order,
  c.id as customer_id
from Orders o, customers c where o.customer_id = c.id group by c.id;

Now I create an additional entity (CustomerDerived) which its only role is to be a container of the derived properties:

@Entity
@Table(name="CUSTOMERS_DERIVED")
@Immutable
public class CustomerDerived {

  @Id
  @Column(name="CUSTOMER_ID")
  long id;

  @Column(name="FIRST_ORDER_DATE")
  private Date firstOrderDate;

  @Column(name="LAST_ORDER_DATE")
  private Date lastOrderDate;

  @Column(name="DAYS_SINCE_LAST_ORDER")
  private int daysSinceLastOrder;

  public Date getFirstOrderDate() {
    return firstOrderDate;
  }

  public Date getLastOrderDate() {
   return lastOrderDate;
  }

  public int getDaysSinceLastOrder() {
    return daysSinceLastOrder;
  }
}

The different from regular entities is the @Immutable annotation which marks the CustomerDerived entity as read-only. The last step is to include the derived properties container entity in our customer entity, since the CustomerDerived entity is calculated by the view and its identity column equals the owning customer identity (the customer_id view's column) I use the
@PrimaryKeyJoinColumn annotation to create the link between the two entities:

public class Customer {

  ...
  @OneToOne
  @PrimaryKeyJoinColumn(name="CUSTOMER_ID")
  private CustomerDerived derived;

  ...
}

Lets analyze this structure:

Propriety SQL Expressions

In this solution I have no SQL expressions in my entity mapping - all of the dialect propriety SQL syntax can be done in the view definition. Using the view the entity sees an abstraction of the different SQL dialects. For example the generated SQL for the same query as before (notice that the HQL was slightly changed to use join fetch - see here for more details):

/* select c from Customer c join fetch c.derived d order by d.daysSinceLastOrder */ select customer0_.id as
 id0_0_, customerde1_.CUSTOMER_ID as CUSTOMER1_1_1_, customer0_.CC_NUMBER as CC3_0_0_, customer0_.NAME as
 NAME0_0_, customer0_.FIXED_DISCOUNT as FIXED5_0_0_, customer0_.DTYPE as DTYPE0_0_,
 customerde1_.DAYS_SINCE_LAST_ORDER as DAYS2_1_1_, customerde1_.FIRST_ORDER_DATE as FIRST3_1_1_,
 customerde1_.LAST_ORDER_DATE as LAST4_1_1_ from CUSTOMERS customer0_ inner join CUSTOMERS_DERIVED customerde1_
 on customer0_.id=customerde1_.CUSTOMER_ID order by customerde1_.DAYS_SINCE_LAST_ORDER

As we can see this is a simpler SQL but more important than that it is fully generated by Hibernate - there are no formulas in this SQL and as long as we are using a dialect supported by Hibernate we should be portable.

Performance

In the example we have seen earlier each of our SQL formulas was included in the statement as a subselect, and if we were referencing the same derived property more than once in a HQL query it would have been included multiple time (see how the formula for daysSinceLastOrder in the previous example is included both in the order by and the select clauses of the generated SQL). In the last example all of the derived properties are fetched using one join with the view which itself is calculated using a single select (few numbers just to see the difference: on my laptop the first example takes 15 seconds to fetch 2100 customers, the second one takes less than two second!).

Another nice option about this solution is that we can use database tools to optimize it without affecting our code. In the Customer/Orders sample maybe the system analyst would be satisfied enough if the first/last order dates and the days since last order will be calculated once an hour - in this case I could have replaced my view with a materialized view which refreshes once an hour.

For complicated calculated views with large datasets this can make a huge different without any changes to my code.

The Fine Prints

As illustrated above using entities and views we can improve performance and portability of derived properties, but as always there are the fine prints (otherwise where was the fun :-) ):

Object model - to use the strategy illustrated here we have to change our object model and to push our calculated properties into a container class. This is not of my favorite habits to change the object model to solve database issues but in this case I can live with that. It is interesting to
mention that I could try and workaround this limitation in the following way:

Use a secondary table (the @SecondaryTable annotation) which maps to the view
Map the derived properties using insertable=false and updateable=false in the @Column annotation and refer to the secondary table as the hosting table

This will work fine when fetching customers but I will not be able to persist any new customer - since even if all of my properties on the secondary table are immutable when persisting an entity Hibernate will try to create a record with the entity primary key in the secondary table (which is actually a view) resulting in an error. Maybe this can be workaround too using the sqlInsert and sqlUpdate attributes of Hibernate's @Table annotation (org.hibernate.annotations.Table) but by now it is starting to feel more like a hack than a solution. By the way, going back to the general solution described in here, I would expect to have the same problem when trying to persist a new entity if I had made the @OneToOne relation a none optional one. That configuration requires an object and when trying to persist the parent entity Hibernate will complain about a "none-transient entity…." (Everybody has seen this error at least once). If I keep the relation optional I can just leave it null when persisting the entity and Hibernate will ignore it.

We also have to remember that when the properties migrate from within the entity into another entity we have to notice our queries - notice the uses join fetch in my code, otherwise I might have additional round trips to the database (but this is as always with entities associations and Hibernate gives us the tools to tune it).

Use parenthesis - it's a good habit to use parenthesis in your formula. Remember that the formula is injected into the Hibernate generated SQL and parenthesis are sometimes required by the syntax.

Comments

Jeffrey said…

Very clear and useful! Thank you!

June 10, 2011 at 5:08 AM

Alexey Burov said…

Thanks for a post. Documentation about this opportunity is very poor.

July 27, 2011 at 11:24 PM

Anonymous said…

This is a very nice article. Thanks for the clear explanation.

Does it make any sense to use @Transient along with the @Formula annotation?

March 12, 2012 at 2:08 AM

Eyal Lupu said…

Hi
A @Formula annotated property is read only so the @Transient is implied.

Eyal

March 12, 2012 at 3:21 AM

Thanks for your quick reply.

March 12, 2012 at 11:56 PM

PHP guy learning Java said…

Thank you very much. I have just started working on a project that uses hibernate and I've been having a hard time figuring it out. Your article helped me quite a bit, so thanks!

June 27, 2012 at 12:00 PM

Turbo said…

It's a great article. Can you help me, though?

I've managed to get something almost working, but when I use "coalesce" Hibernate assumes that it is a columnname from the super table.

It transforms it into "tablename0_.coalesce" which does not work.

June 24, 2013 at 1:17 AM

My apologies. It turns out that "coalesce (stuff, stuff2)" is turned into "table0_.coalesce (stuff, stuff2)" because of the space after coalesce.

"coalesce(stuff, stuff2)" works as it should.

Darnit, spaces should not matter in this sort of stuff!

June 24, 2013 at 2:04 AM

Unknown said…

This is a very useful article. Thanks

May 11, 2016 at 2:11 PM

Search This Blog

Eyal Lupu Software Blog