Hibernate/JPA Identity Generators

Introduction

As usual, it has been a long time since I last posted on my blog, and even longer (about half a year) since I last wrote about Hibernate, but I have finally found the time for it. This post is about Hibernate's standard-compatible identity generators (TABLE, SEQUENCE, IDENTITY, and AUTO): it explains what identity generators are and illustrates the considerations to take into account when choosing an identity generation strategy.

Environment
  • Hibernate - 3.5.6-Final
  • PostgreSQL - 8.4

What are Surrogate Keys and Identity Generators?
Each persistent entity must have a primary key by which it can be identified in the database. A common pattern, called surrogate keys, is to make these keys totally independent of the application data, usually by generating their values automatically. Hibernate generates and assigns surrogate keys to persistent entities using instances of org.hibernate.id.IdentifierGenerator, which are automatically invoked when a transient entity is being persisted. The IdentifierGenerator interface is specific to Hibernate, but the concept exists in the JPA specification as well, which defines the following standard generators (section 11.1.17 in JSR-317):

  • Table - the persistence provider must assign primary keys for the entity using an underlying database table to ensure uniqueness
  • Sequence - specify the use of a database sequence to ensure uniqueness
  • Identity - specify the use of a database identity column
  • Auto - the persistence provider should pick an appropriate strategy for the particular database
Choosing an identity generation strategy seems to be an easy task, but there are a few factors to consider when assigning a generation strategy to entities. The most important ones are:

  • Performance - some of the generators hit the database for each entity being persisted, while others can preallocate identities and hit the database only once in a while
  • Portability - in this context there are two aspects to portability
    • Code Portability - using an identity column generator for an entity class means that this entity cannot be persisted on a database which doesn’t support this feature (there are workarounds for this type of portability issue, but using the entity ‘out-of-the-box’ is not always supported)
    • Data Portability - while code portability is about whether a specific entity can be used on all databases, data portability looks into the data. When migrating existing entity data from one database to another, after solving the code portability issue, one still has to make sure that the data is properly migrated. For example, suppose an entity using identity column generation is migrated to a database which doesn’t support identity columns, and the developer decides to change the generation strategy to table HiLo. The developer must not forget to populate the HiLo table with the correct Hi value based on the new Lo value, something like: max(existing ids)/new_Lo_value + x (for safety). More on HiLo later.
  • Application constraints - sometimes the application logic or the execution environment enforces its own restrictions on identity generators. Example: a new application sharing its key space with a legacy application. Such an application might force the use of triggers to assign database identities to new records (this type of generator, like the ‘select’ generator, is Hibernate specific and not part of JPA)
  • Mapping constraints - even the mappings themselves might affect our generation strategy. The classic case is mapping an entity hierarchy using the table per concrete class approach, which forbids the use of an identity column
  • Clustering - some generators (such as the Hibernate-specific ‘increment’ generator) cannot be safely used in a clustered environment
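
The Hi value calculation mentioned under data portability can be sketched in a few lines of Java. This is only a sketch, not Hibernate code: the class and method names are hypothetical, and it assumes the simple HiLo variant described later in this post, in which a series starts at high*low.

```java
// Hypothetical helper, not part of Hibernate: computes the high value to store
// in the HiLo table when migrating a table that already contains rows.
final class HiLoMigration {

    /**
     * @param maxExistingId the largest id already present in the migrated table
     * @param newLowValue   the low value (series size) of the new HiLo generator
     * @param safetyMargin  the "x" from the formula above, at least 1
     */
    static long initialHighValue(long maxExistingId, long newLowValue, long safetyMargin) {
        if (newLowValue <= 0 || safetyMargin < 1) {
            throw new IllegalArgumentException("low value must be positive and margin at least 1");
        }
        // max(existing ids) / new_Lo_value + x, using integer division
        return maxExistingId / newLowValue + safetyMargin;
    }

    public static void main(String[] args) {
        long high = initialHighValue(1_000_000L, 150L, 1L);
        // The first id of the next series is high * low and must exceed every existing id.
        System.out.println("high = " + high + ", first new id = " + high * 150L);
    }
}
```

With a maximum existing id of 1,000,000 and a new low value of 150 the result is 6,667, so the first newly generated id is 1,000,050 and cannot collide with migrated rows.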

The @GeneratedValue and @GenericGenerator Annotations

An entity's primary key generation strategy is set by applying the JPA standard @GeneratedValue annotation to the entity's primary key field or property. The annotation has two elements:

  • strategy - sets the generation strategy as defined in the javax.persistence.GenerationType enumeration; this can be one of AUTO, TABLE, SEQUENCE, or IDENTITY (more on that below)
  • generator - the name of a generator configuration; configurations are defined using @TableGenerator, @SequenceGenerator, or @GenericGenerator (Hibernate specific)

The sample below illustrates a simple usage of the @GeneratedValue annotation using its default (AUTO) strategy.



import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity @Table(name="ENTITY_WITH_ID")
public class EntityWithId {

    // No strategy given, so the default (AUTO) is used
    @Id @GeneratedValue
    private Long id;
...
}


The generator can be configured using either JPA standard annotations or Hibernate's specific @GenericGenerator annotation. The example below illustrates the usage of @GenericGenerator to configure a TableHiLoGenerator instance (which provides a single set of identities shared by all of the entities mapped using this generator configuration):



import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.GenericGenerator;
import org.hibernate.annotations.Parameter;

@Entity @Table(name="ENTITY_WITH_ID")
public class EntityWithId {

    // Configure a generator and name the configuration 'table-hilo-generator'
    @GenericGenerator(name="table-hilo-generator", strategy="org.hibernate.id.TableHiLoGenerator",
        parameters={@Parameter(name="table", value="hibernate_id_generation")})
    // Use the generator configured above
    @GeneratedValue(generator="table-hilo-generator") @Id
    private Long id;
...
}


The @GenericGenerator annotation is used to create a 'named generator configuration' which can be later referenced using the @GeneratedValue annotation. In the rest of this post I will use the standard JPA annotations to configure the generators and not the @GenericGenerator annotation.

The HiLo Algorithm

Later in this post I use the HiLo algorithm for key generation, so before moving on here is a short background. The HiLo (High/Low) algorithm generates unique series of numbers using two values: the high and the low. The high value is used as the base of a series (or range) of numbers, while the size of the series is determined by the low value. A unique series is generated using the following steps:
  1. Load and atomically increment the high value
  2. Multiply the high value by the low value (high*low); the result is the first number (lower bound) of the current series
  3. The last number (upper bound) of the current series is given by the following calculation: (high*low)+low-1 (another variation subtracts 1 from the lower bound and adds 1 to the upper bound, but it doesn't really matter as long as we make our boundaries clear)
  4. When a client needs a number, the next unused one from the current series is handed out; once the entire series has been exhausted the algorithm goes back to step 1
Here is a concrete example: suppose that the current high value in the database is 52 and the low value is configured to be 32,767. When the algorithm starts it loads the high value from the database and increments it in the same transaction (the new high value in the database is now 53). The range of the current series can now be calculated:
  • Lower bound = 52*32,767 = 1,703,884
  • Upper bound = 1,703,884+32,767-1 = 1,736,650
All of the numbers in the range of 1,703,884 to 1,736,650 can be safely allocated to clients. Once this pool of keys has been exhausted, the algorithm needs to access the database again to allocate a new pool. This time the high value is 53 (immediately incremented to 54) and the key range is:
  • Lower bound = 53*32,767 = 1,736,651
  • Upper bound = 1,736,651+32,767-1 = 1,769,417
And so on.
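
The four steps and the worked example above can be sketched in plain Java. This is a simplified illustration and not Hibernate's implementation: the class name is hypothetical, and an AtomicLong stands in for the high value that would normally live in a database row.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified HiLo generator following the four steps above (illustration only).
final class SimpleHiLoGenerator {
    private final AtomicLong highValueInDb; // assumption: plays the role of the database row
    private final long low;                 // size of each series
    private long next;                      // next id to hand out
    private long upperBound = -1;           // last id of the current series; -1 forces a load

    SimpleHiLoGenerator(AtomicLong highValueInDb, long low) {
        this.highValueInDb = highValueInDb;
        this.low = low;
    }

    synchronized long nextId() {
        if (next > upperBound) {
            long high = highValueInDb.getAndIncrement(); // step 1: load and increment
            next = high * low;                           // step 2: lower bound
            upperBound = next + low - 1;                 // step 3: upper bound
        }
        return next++;                                   // step 4: hand out the next number
    }

    public static void main(String[] args) {
        SimpleHiLoGenerator generator = new SimpleHiLoGenerator(new AtomicLong(52), 32_767);
        long first = generator.nextId();
        for (int i = 1; i < 32_767; i++) {
            generator.nextId();                          // use up the rest of the first series
        }
        long firstOfNextSeries = generator.nextId();
        System.out.println(first + " ... " + firstOfNextSeries);
    }
}
```

Starting from a high value of 52, the first id handed out is 1,703,884, and the first id of the following series is 1,736,651, matching the bounds calculated above.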

The big advantage of this algorithm is key preallocation, which can dramatically improve performance. Through the low value we can control how often the database is hit: as illustrated, with a low value of 32,767 we hit the database only once per 32,767 generated keys. The downside (at least according to some people, although in my opinion this is a non-issue) is that each time the algorithm restarts it leaves a 'hole' in the key sequence.
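
Both effects, the low number of database hits and the hole left by a restart, can be seen in a small simulation. The classes below are hypothetical and only mimic the behavior described in this post; FakeDatabase stands in for the row holding the high value, and creating a second Generator stands in for an application restart.

```java
// Illustration only: counts simulated database hits and shows the hole left
// in the key sequence when a generator restarts.
final class HiLoHitsAndHoles {

    /** Stands in for the table row holding the high value. */
    static final class FakeDatabase {
        private long high;
        private int hits;

        synchronized long loadAndIncrementHigh() {
            hits++;
            return high++;
        }

        synchronized int hits() {
            return hits;
        }
    }

    /** One generator instance, for example one run of the application. */
    static final class Generator {
        private final FakeDatabase db;
        private final long low;
        private long next;
        private long upperBound = -1; // forces a database hit on first use

        Generator(FakeDatabase db, long low) {
            this.db = db;
            this.low = low;
        }

        long nextId() {
            if (next > upperBound) {
                next = db.loadAndIncrementHigh() * low;
                upperBound = next + low - 1;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        Generator firstRun = new Generator(db, 32_767);
        long last = -1;
        for (int i = 0; i < 100_000; i++) {
            last = firstRun.nextId();
        }
        System.out.println("database hits for 100,000 keys: " + db.hits());

        // A restart discards the unused part of the current series.
        Generator secondRun = new Generator(db, 32_767);
        System.out.println("last key before restart: " + last);
        System.out.println("first key after restart: " + secondRun.nextId());
    }
}
```

In this run 100,000 keys cost four database hits; the last key before the restart is 99,999 and the first key after it is 131,068, so the keys in between are never used.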

Hibernate has several HiLo based generators:
TableHiLoGenerator
A simple HiLo generator which uses a table to store the HiLo high value. The generator accepts the following parameters:
  • table - the table name, defaults to 'hibernate_unique_key'
  • column - the name of the column to store the next high value, defaults to 'next_hi'
  • max_lo - the low number (the range), defaults to 32,767 (Short.MAX_VALUE)
MultipleHiLoPerTableGenerator
A table HiLo generator which can store multiple key sets (multiple high values, each for a different entity). This is useful when each entity (or some of the entities) needs its own key range. It supports the following parameters:

  • table - the table name, defaults to 'hibernate_sequences'
  • primary_key_column - key column name, defaults to 'sequence_name'
  • value_column - the name of the column to store the next high value, defaults to 'sequence_next_hi_value'
  • primary_key_value - key value for the current entity (or current key set), defaults to the entity's primary table name
  • primary_key_length - length of the key column in the database, represented as a varchar, defaults to 255
  • max_lo - the low number (the range), defaults to 32,767 (Short.MAX_VALUE)
All the defaults above are from the actual generator class; they can change depending on the configuration method (for example, the @TableGenerator annotation derives MultipleHiLoPerTableGenerator's max_lo from its default allocationSize of 50 instead of using 32,767).


The generator uses a single table to store multiple high values (multiple series). When multiple entities use the same generator, Hibernate matches an entity to its high value using primary_key_value, which is usually the name of the entity's table. Each row of the table therefore holds the next high value of one series. A sample table can look like this:

sequence_name    sequence_next_hi_value
ENTITY1          234
ENTITY2          876
ENTITY3          8
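
To make the idea of one series per key more concrete, here is a minimal sketch in which a map stands in for the table above. It is not Hibernate's implementation: the class name, the seed method, and the low value of 50 are assumptions made for the illustration, and it uses the simple high*low arithmetic from the HiLo section.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: one HiLo series per key, with a map playing the role of
// the table (sequence_name -> sequence_next_hi_value).
final class MultiHiLoSketch {
    private final Map<String, Long> nextHighByKey = new HashMap<>();
    private final Map<String, long[]> currentSeries = new HashMap<>(); // key -> {next id, upper bound}
    private final long low;

    MultiHiLoSketch(long low) {
        this.low = low;
    }

    /** Seeds a row, as if it already existed in the table. */
    void seed(String key, long nextHigh) {
        nextHighByKey.put(key, nextHigh);
    }

    synchronized long nextId(String key) {
        long[] series = currentSeries.get(key);
        if (series == null || series[0] > series[1]) {
            // Read this key's row and increment it; other rows are untouched.
            long high = nextHighByKey.getOrDefault(key, 0L);
            nextHighByKey.put(key, high + 1);
            series = new long[] { high * low, high * low + low - 1 };
            currentSeries.put(key, series);
        }
        return series[0]++;
    }

    public static void main(String[] args) {
        MultiHiLoSketch table = new MultiHiLoSketch(50);
        table.seed("ENTITY1", 234);
        table.seed("ENTITY2", 876);
        table.seed("ENTITY3", 8);
        System.out.println("ENTITY1: " + table.nextId("ENTITY1"));
        System.out.println("ENTITY3: " + table.nextId("ENTITY3"));
        System.out.println("ENTITY1: " + table.nextId("ENTITY1"));
    }
}
```

Each entity draws from its own range: with the sample rows and a low value of 50, ENTITY1 starts at 11,700 and ENTITY3 at 400, and asking for an ENTITY3 key does not consume anything from ENTITY1's series.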
 
SequenceHiLoGenerator
A simple HiLo generator which uses a sequence, instead of a table, as the high value provider. It supports the following parameters:

  • sequence - the sequence name, defaults to 'hibernate_sequence'
  • max_lo - the low number (the range), defaults to 9
All the defaults above are from the actual generator class; they can change depending on the configuration method (for example, the @SequenceGenerator annotation derives SequenceHiLoGenerator's max_lo from its default allocationSize of 50 instead of using 9).

The JPA Standard Generators

This section describes Hibernate's behavior when using the JPA standard strategies (TABLE, SEQUENCE, IDENTITY, and AUTO).
 
GenerationType.TABLE Strategy
Using the TABLE strategy we instruct the JPA provider (Hibernate) to assign primary keys to an entity using an underlying database table to ensure uniqueness. Each JPA provider is free to choose its own approach to meet this requirement - Hibernate's default is to use an instance of org.hibernate.id.MultipleHiLoPerTableGenerator (multiple HiLo sequences in a single table). The generator can be configured using the standard @TableGenerator annotation.


import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.TableGenerator;

...
 @Id 
 @GeneratedValue(strategy=GenerationType.TABLE, generator="tbl-gen")
 @TableGenerator(name="tbl-gen", 
   pkColumnName="ENTITY_TBL_NAME", allocationSize=150,
   table="GENERATORS")
 private Long id;
...


In the example above I configure 'tbl-gen' to store the high value in a table named GENERATORS, which has a column named ENTITY_TBL_NAME used as a key to identify the actual sequence. I also set the allocation size (the low value) to 150.


Pros and cons of this generator:

  • Performance - good; as explained above, key preallocation enables a low database hit ratio that can be configured using the low value
  • Portability - good; the generator uses very simple SQL statements and has built-in support from the dialects for locking while updating the high value (Oracle: "select ... for update", SQL Server: "select ... with (updlock, rowlock)", and so on)
  • Clusters - good; since this generator relies on the database's ACID guarantees to protect the value generation logic, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys
  • Cross database - none; this implementation of the HiLo algorithm generates identifiers that are unique only within a particular database

GenerationType.SEQUENCE Strategy
When this strategy is used Hibernate uses an instance of org.hibernate.id.SequenceHiLoGenerator as the id generator. This generator can be configured using the @SequenceGenerator annotation. In the sample below I configure the sequence generator to use a sequence named 'MY_SEQ_GEN' and set its initial value to 25 and the HiLo's low value to 12.


import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;
import javax.persistence.Table;

@Entity @Table(name="MY_ENT")
public class MyEntity {
 @Id 
 @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="seq-gen")
 @SequenceGenerator(name="seq-gen", sequenceName="MY_SEQ_GEN", initialValue=25, allocationSize=12)
 private Long id;
...
}


Pros and cons of the sequence generator

  • Performance - good; as explained above, the database hit ratio is low, and when the database is hit a sequence is very efficient
  • Portability - limited; it requires database support for sequences (for example Oracle and PostgreSQL, but not SQL Server at the time of writing)
  • Clusters - good; since this generator relies on the database's ACID guarantees to protect the value generation logic, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys
  • Cross database - none; this implementation of the HiLo algorithm generates identifiers that are unique only within a particular database

GenerationType.IDENTITY Strategy
This identity generator uses an identity column in the database. The generator is very simple but it has its problems, the two most important being portability and performance.


import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity @Table(name="MY_ENT")
public class MyEntity {
 @Id 
 @GeneratedValue(strategy=GenerationType.IDENTITY)
 private Long id;
...
}


Pros and cons of this generator:

  • Performance - moderate to poor. Since identities are generated by the table for each inserted row, there is no identity preallocation; instead Hibernate has to obtain the generated identity for each new entity. If a pre-JDBC 3 driver or a pre-1.4 JDK is used (or the hibernate.jdbc.use_get_generated_keys property is set to false), Hibernate has to issue two SQL statements when persisting an entity: the first to insert a row into the database and the second to obtain its identity. This performance penalty can be significant in batch processing.
  • Portability - limited. A prerequisite for using the identity strategy is that the underlying database supports it (similar to sequences). However, I have noticed an interesting workaround: when using PostgreSQL (which has sequences but not identity columns) a sequence named <the entity table name>_<id column name>_seq is created, and the entity's identities are taken from it (the next value each time; notice that Hibernate doesn’t use HiLo in that case). As far as I can tell this comes from the dialect mapping the id column to PostgreSQL's serial type. I haven’t tried it on another database, so I cannot tell how Hibernate will behave there.
  • Clusters - good; since the identities are generated by the database itself, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys.
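
For reference, the single-statement path mentioned in the performance point corresponds to JDBC 3's generated keys API. The sketch below is plain JDBC rather than Hibernate code; the MY_ENT table comes from the mapping above, while the NAME column, the class name, and the assumption that the key is the first column of the returned result set are made up for the illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Plain JDBC sketch: insert a row and read back the identity the database assigned.
final class IdentityInsertSketch {
    // NAME is a hypothetical column used only for this example.
    static final String INSERT_SQL = "insert into MY_ENT (NAME) values (?)";

    /** Inserts one row and returns the identity generated for it. */
    static long insertAndGetId(Connection connection, String name) throws SQLException {
        try (PreparedStatement ps =
                 connection.prepareStatement(INSERT_SQL, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, name);
            ps.executeUpdate();
            // The key comes back with the insert, so no second statement is needed.
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("the driver returned no generated key");
                }
                return keys.getLong(1); // assumption: the key is the first returned column
            }
        }
    }
}
```

Even on this path the key is only known after the row has been inserted, which is why identities cannot be preallocated the way HiLo keys are.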
The IDENTITY strategy has an additional restriction which the other strategies don't have: not all mapping options support it. For example, when mapping an entity hierarchy using a table per concrete class (javax.persistence.InheritanceType.TABLE_PER_CLASS), the identity strategy is forbidden. The reason can be easily explained. Assume that we have a base class named Base and a derived class named Derived, which are mapped as:
  • Base class into the BASE table using identity column
  • Derived class into the DERIVED table using identity column

If we try to load a base entity using the following code: em.find(Base.class, key) we might find more than one entity of type Base with the same id, because the BASE and DERIVED tables each generate their identities independently.
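
The clash can be reproduced with a toy model in which each table keeps its own identity counter. Everything here is hypothetical and only mimics the situation described above; no JPA provider is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: two tables of one hierarchy, each with its own identity column.
final class TablePerClassIdentityClash {

    /** A table with an identity column: ids are assigned per table, starting at 1. */
    static final class IdentityTable {
        private final String name;
        private final List<Long> ids = new ArrayList<>();
        private long nextIdentity = 1;

        IdentityTable(String name) {
            this.name = name;
        }

        long insert() {
            long id = nextIdentity++;
            ids.add(id);
            return id;
        }

        boolean contains(long id) {
            return ids.contains(id);
        }
    }

    /** Mimics a polymorphic lookup by id: every table of the hierarchy is searched. */
    static List<String> findInHierarchy(long id, IdentityTable... tables) {
        List<String> matches = new ArrayList<>();
        for (IdentityTable table : tables) {
            if (table.contains(id)) {
                matches.add(table.name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        IdentityTable base = new IdentityTable("BASE");
        IdentityTable derived = new IdentityTable("DERIVED");
        long baseId = base.insert();       // first identity of BASE
        long derivedId = derived.insert(); // first identity of DERIVED
        System.out.println("ids: " + baseId + " and " + derivedId);
        System.out.println("tables containing id 1: " + findInHierarchy(1L, base, derived));
    }
}
```

Both inserts receive the id 1, so a lookup of id 1 across the hierarchy matches a row in BASE and a row in DERIVED.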

GenerationType.AUTO Strategy
With this strategy the JPA provider decides, based on the underlying database vendor, which strategy is preferred (table, sequence, or identity). It makes the code very portable, but at the price of a difficult database migration: if a migration is planned from one database vendor to another, migrating the keys can be very complicated. As an example, assume a migration from a sequence strategy to a table strategy. It will have to include the careful work of building the table of HiLo base values, which should include:
  • Identifying the correct keys in the MultipleHiLoPerTableGenerator table (the 'sequence' names), with special care for entity hierarchies
  • Calculating the correct high value for each such 'sequence' based on the existing keys and the new low value

Conclusion

There are a few considerations to take into account when choosing a key generation strategy. In my opinion the most important ones are performance and portability. It is also important to remember that Hibernate has many proprietary generators (uuid, guid, select, increment, and others) which, even though they are not part of the spec, might be useful in some scenarios.

Comments

Anonymous said…
Thanks for writing a very good article on identity in JPA/Hibernate.

One change that would be helpful is to change the font color of your text. There is not much contrast between your text color and the background which made it harder to read.

I'm using a Macbook Pro, with FireFox.

Thanks again for providing such useful information.
Eyal Lupu said…
Thanks for the comment - I have made some CSS changes - I hope it is easier to read now.

Eyal
David said…
Hi Eyal,

I was trying to create a OneToMany association with a JoinTable using a surrogate key for the mapping table and your great blog is a big help with that.

Have you've been thinking on writing a book on Hibernate? Your blog is among the best source for Hibernate on the web.

If you do you've got yourself a tester (for both text and code) :).

Thanks!

David Zonsheine
Anonymous said…
Your explanation about MultipleHiLoPerTableGenerator is not clear.

Read it a few times, still not clear what is happening there.
Anonymous said…
It's Really nice and clear to novice..
Anonymous said…
Thanks for your detailed explanation, very helpful. However I suspect one point to be possibly misleading for readers: You choose 32767 as the "low value" - this is also the hibernate default for max_lo, see for example here. However with this value for max_lo the range between upper and lower bound is 32768 (max_lo + 1) and not 32767, see also here
Tobias Stening said…
Nice blog and thank you for the detailed information. I have a question about generated ids.

I am using a SequenceGenerator for a id-column table:

@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "BOOKMARK_SEQUENCE_GENERATOR")
@SequenceGenerator(name = "BOOKMARK_SEQUENCE_GENERATOR", sequenceName = "BOOKMARKS_SEQUENCE", allocationSize = 20)

In the ORACLE 11.2 database the LAST_NUMBER of the BOOKMARKS_SEQUENCE is 41 but when I am using Hibernate, the last generated id is 462!

Do you have any idea why the ids of the BOOKMARKS_SEQUENCE and the generated ids are different?

The BOOKMARKS_SEQUENCE is definately used by hibernate, cause if I delete it from the databse an exception is thrown during runtime.

Example:

I can see in the log, that hibernate uses

select BOOKMARKS_SEQUENCE.nextval from dual;

which returns the expected value, e.g. 42.

Later in the table the id is not 42, but 463!

Best Regards
Tobias
Eyal Lupu said…
Hi,
The sequence generator is an HiLo implementation (this is just the JPA standard name for Hibernate's SequenceHiLoGenerator ).


Eyal
Unknown said…
Thanks for this post--helped me set up my ID generation scheme in a JPA/Hibernate 4 app.
ORACLE PL/SQL said…
Hi,

I have one problem, my java application(using struts and EJB) is using sequence(with cache option) created on oracle database.When i try to insert value from my application using this sequence its not picking the nextval which it suppose to pick and using higher value than nextval for that sequence.

Whether this is code issue or sequence problem from backend?

Please help me on this?
Balaji Reddy said…
Thanks for writing good article. I feel the content is good but English has to be improved a bit.
Anonymous said…
Thanks for such a nice post on migration of DB for java web.
Unknown said…
Excellent and a dedicated effort. thanks very much for the post.
Anonymous said…
And I have a problem. I'm using Oracle sequences to generate the keys for a table:

@Id
@SequenceGenerator(name="SEQ_FORM_ID", sequenceName="SEQ_FORM_ID", allocationSize=1, initialValue=1)
@GeneratedValue(strategy=GenerationType.SEQUENCE, generator="SEQ_FORM_ID")
@Column(name="FORM_ID", unique=true, nullable=false)
private long formId;

This works quite well, but it looks like the object that I persist is not updated once the insert happens (ie the generated key is not inserted into the object). That means that, once I've persisted the object, I can't find it again! Any ideas on how I can get the generated value?
Kumaraguru said…
A very nice post even a beginner can understand :)
Syam said…
Nice topic Eyal.Could you plz let me know how to specify @GeneratedValue(strategy=GenerationType.TABLE, generator="tbl-gen") for composite primary keys(ie @EmbeddedId in Entity class)
This is really good article.
It would be interesting to see how these ID generation strategies work in multi-tenant systems.
We are using Hibernate 4 based multi-tenant systems. What we observed that, when we used Increment strategy the ID generation happens across all the multi-tenant schemas.
Well, I would say that There are some considerations need to be fulfill while choosing a keys generation strategy and the most important one are performance and portability.
Anonymous said…
Hi Eyal, really great and helpful article.
Need help on following problem. DB-Oracle. Application is in production.
1. WE have used @SequenceGenerator and strategy=GenerationType.SEQUENCE and not specified allocation size(taking default 50), results in huge gaps in the id column.
Want to avoid the gaps with minimum code and database impact.
One solution i.e set allocation size=1 but not sure whether its works correctly on existing database.
Please share your thoughts/inputs on this.

Thanks
