Hibernate/JPA Identity Generators
Introduction
As usual it has been a long time since I last posted in my blog, and even longer (about half a year) since I last wrote about Hibernate, but I have finally found the time. This post is about the Hibernate identity generators that are compatible with the JPA standard (TABLE, SEQUENCE, IDENTITY, and AUTO): it explains what identity generators are and illustrates the different considerations that need to be taken into account when choosing an identity generation strategy.
Environment
- Hibernate - 3.5.6-Final
- PostgreSQL - 8.4
What are Surrogate Keys and Identity Generators?
Each persistent entity must have a primary key by which it can be identified in the database. A common pattern, called surrogate keys, is to make these keys totally independent of the application data, usually by generating their values automatically. Hibernate generates and assigns surrogate keys to persistent entities using instances of org.hibernate.id.IdentifierGenerator, which are automatically invoked when a transient entity is being persisted. The IdentifierGenerator interface is Hibernate specific, but the concept exists in the JPA specification as well, which defines the following standard generators (section 11.1.17 in JSR-317):
- Table - the persistence provider must assign primary keys for the entity using an underlying database table to ensure uniqueness
- Sequence - specifies the use of a database sequence to ensure uniqueness
- Identity - specifies the use of a database identity column
- Auto - the persistence provider should pick an appropriate strategy for the particular database
When choosing an identity generation strategy there are several considerations to take into account:
- Performance - some of the generators hit the database for each entity being persisted, while others can preallocate identities and hit the database only once in a while
- Portability - in this context there are two aspects to portability:
  - Code Portability - using the identity column generator for an entity class means that this entity cannot be persisted on a database which doesn't support this feature (there are workarounds for this type of portability issue, but using the entity 'out of the box' is not always supported)
  - Data Portability - while code portability points out the fact that a specific entity may not be portable to all databases, data portability looks at the data. When migrating existing entity data from one database to another, after solving the code portability issue, one still has to make sure that the data is properly migrated. For example, suppose an entity using identity column generation is migrated to a database which doesn't support identity columns, and the developer decides to change the generation strategy to table HiLo. He must not forget to populate the HiLo table with the correct Hi values based on the new Lo value, something like: max(existing ids)/new_Lo_value + x (for safety). More on HiLo later.
- Application constraints - sometimes the application logic or the execution environment enforces its own restrictions on identity generators. Example: a new application sharing its key space with a legacy application. Such an application might force the use of triggers to assign database identities to new records (the generator for this case, the 'select' generator, is Hibernate specific and not part of JPA)
- Mapping constraints - even the mappings themselves might affect our generation strategy. The classic case is an entity hierarchy mapped using the table-per-concrete-class approach, which forbids the use of identity columns
- Clustering - some generators (such as the Hibernate specific 'increment' generator) cannot be safely used in a clustered environment
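The Hi value calculation from the Data Portability example can be sketched in plain Java. This is a minimal sketch, assuming the HiLo variant described later in this post (the first id of a series is high*low); the class and method names (HiLoMigration, startingHigh, firstId) and the safety margin parameter are my own and hypothetical, not part of Hibernate.

```java
/**
 * Hypothetical helper: computes the high value to seed a HiLo table with when
 * an entity that already has rows is switched to HiLo generation.
 * Assumes the first id of a series is high * low.
 */
final class HiLoMigration {

    private HiLoMigration() { }

    /**
     * @param maxExistingId largest id already stored (e.g. the result of select max(id))
     * @param low           the new low value, i.e. the size of each series
     * @param safetyMargin  extra whole series to skip, the "x (for safety)" term
     */
    static long startingHigh(long maxExistingId, long low, long safetyMargin) {
        if (low <= 0) {
            throw new IllegalArgumentException("low must be positive");
        }
        // Integer division yields the series that contains maxExistingId;
        // adding one moves to the first series lying entirely above it.
        return maxExistingId / low + 1 + safetyMargin;
    }

    /** The first id the generator hands out for a given high value. */
    static long firstId(long high, long low) {
        return high * low;
    }
}
```

With a margin of zero the first new id lands just above the existing maximum (for existing ids up to 1,000,000 and a low value of 32,767 the high value is 31 and the first new id is 1,015,777); a positive margin simply skips whole series.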
The @GeneratedValue and @GenericGenerator Annotations
An entity's primary key generation strategy is set by applying the JPA standard @GeneratedValue annotation to the entity's primary key field or property. The annotation has two elements:
- strategy - sets the generation strategy as defined in the javax.persistence.GenerationType enumeration; this can be one of AUTO, TABLE, SEQUENCE, or IDENTITY (more on that below). The default is AUTO
- generator - the name of a generator configuration; configurations are declared using the @TableGenerator, @SequenceGenerator, or (Hibernate specific) @GenericGenerator annotations
The simplest mapping relies on the defaults:
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
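@Entity @Table(name="ENTITY_WITH_ID")
public class EntityWithId {
    // No strategy is given, so the default GenerationType.AUTO is used
    @Id @GeneratedValue
    private Long id;
    public Long getId() { return id; }
}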
The generator can be configured using either the JPA standard annotations or Hibernate's specific @GenericGenerator annotation. The example below illustrates the usage of @GenericGenerator to configure a TableHiLoGenerator instance (which provides a single identity set shared by all of the entities mapped using this generator configuration):
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.annotations.GenericGenerator;
import org.hibernate.annotations.Parameter;
@Entity @Table(name="ENTITY_WITH_ID")
public class EntityWithId {
    // Configure a generator and name the configuration 'table-hilo-generator'
    @GenericGenerator(name="table-hilo-generator",
        strategy="org.hibernate.id.TableHiLoGenerator",
        parameters={@Parameter(value="hibernate_id_generation", name="table")})
    // Use the generator configured above
    @GeneratedValue(generator="table-hilo-generator")
    @Id
    private Long id;
    // ...
}
The @GenericGenerator annotation is used to create a 'named generator configuration' which can be later referenced using the @GeneratedValue annotation. In the rest of this post I will use the standard JPA annotations to configure the generators and not the @GenericGenerator annotation.
The HiLo Algorithm
Throughout this post I use the HiLo algorithm for key generation, so before moving on here is a short background on it. The HiLo (High/Low) algorithm generates unique number series using two values: the high and the low. The high value is used as a base for a series (or range) of numbers, while the size of this series is determined by the low value. A unique series is generated using the following steps:
1. Load and atomically increment the high value
2. Multiply the high value by the low value (high*low); the result is the first number (lower bound) of the current series
3. The last number (upper bound) of the current series is given by the following calculation: (high*low)+low-1 (another variation is to subtract 1 from the lower bound and add one to the upper bound, but it doesn't really matter as long as we make our boundaries clear)
4. When a client needs to obtain a number, the next one from the current series is used; once the entire series has been exhausted the algorithm goes back to step 1
For example, with a low value of 32,767 and a loaded high value of 52:
- Lower bound = 52*32,767 = 1,703,884
- Upper bound = 1,703,884+32,767-1 = 1,736,650
Once this series is exhausted the high value is incremented to 53:
- Lower bound = 53*32,767 = 1,736,651
- Upper bound = 1,736,651+32,767-1 = 1,769,417
The big advantage of this algorithm is key preallocation, which can dramatically improve performance. The low value lets us control the database hit ratio: with a low value of 32,767, as in the example, we hit the database only once per 32,767 generated keys. The downside (at least according to some people; in my opinion this is a non-issue) is that each time the algorithm restarts it leaves a 'hole' in the key sequence.
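The steps above can be sketched in plain Java. This is a minimal, in-memory sketch rather than Hibernate's implementation: the LongSupplier is a hypothetical stand-in for the database (a table row or a sequence) and is assumed to return a fresh, atomically incremented high value on every call.

```java
import java.util.function.LongSupplier;

/**
 * Minimal HiLo allocator following the steps above: fetch-and-increment a high
 * value, then hand out ids from [high*low, high*low + low - 1].
 */
final class HiLoAllocator {
    private final LongSupplier highSource; // stand-in for the database
    private final long low;                // size of each series
    private long next = 0;                 // next id to return from the current series
    private long upperBound = -1;          // last id of the current series (none loaded yet)
    private int databaseHits = 0;          // how many times highSource was called

    HiLoAllocator(LongSupplier highSource, long low) {
        if (low <= 0) {
            throw new IllegalArgumentException("low must be positive");
        }
        this.highSource = highSource;
        this.low = low;
    }

    synchronized long nextId() {
        if (next > upperBound) {
            long high = highSource.getAsLong(); // step 1: one database round trip
            databaseHits++;
            next = high * low;                  // step 2: lower bound
            upperBound = next + low - 1;        // step 3: upper bound
        }
        return next++;                          // step 4: next number in the series
    }

    synchronized int databaseHits() {
        return databaseHits;
    }
}
```

Feeding it an AtomicLong that starts at 52 reproduces the example above: the first id is 1,703,884, the 32,768th id is 1,736,651, and only two 'database hits' are needed for all of them.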
Hibernate has several HiLo based generators:
TableHiLoGenerator
A simple HiLo generator which uses a table to store the high value. The generator accepts the following parameters:
- table - the table name, defaults to 'hibernate_unique_key'
- column - the name of the column that stores the next high value, defaults to 'next_hi'
- max_lo - the low number (the range), defaults to 32,767 (Short.MAX_VALUE)
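To show how these parameters and their defaults fit together, here is a small sketch that resolves them from java.util.Properties (the way Hibernate hands parameters to its generators) and builds the kind of SQL a table-based HiLo generator needs. Note that Hibernate spells the range parameter max_lo. The TableHiLoSettings class and the exact SQL strings are illustrative assumptions, not Hibernate's code; the real statements come from the dialect.

```java
import java.util.Properties;

/**
 * Hypothetical holder for the TableHiLoGenerator parameters listed above,
 * resolved from a Properties object with the documented defaults.
 */
final class TableHiLoSettings {
    final String table;
    final String column;
    final int maxLo;

    private TableHiLoSettings(String table, String column, int maxLo) {
        this.table = table;
        this.column = column;
        this.maxLo = maxLo;
    }

    static TableHiLoSettings fromParameters(Properties params) {
        String table = params.getProperty("table", "hibernate_unique_key");
        String column = params.getProperty("column", "next_hi");
        int maxLo = Integer.parseInt(
                params.getProperty("max_lo", String.valueOf(Short.MAX_VALUE)));
        return new TableHiLoSettings(table, column, maxLo);
    }

    /** Illustrative read of the high value, locking the row until commit. */
    String selectSql() {
        return "select " + column + " from " + table + " for update";
    }

    /** Illustrative update: only succeeds if nobody changed the value meanwhile. */
    String updateSql() {
        return "update " + table + " set " + column + " = ? where " + column + " = ?";
    }
}
```

Only the parameters you want to change need to be supplied; for example setting table to hibernate_id_generation, as in the @GenericGenerator example earlier, keeps the next_hi column and the 32,767 range.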
MultipleHiLoPerTableGenerator
A table HiLo generator which can store multiple key sets (multiple high values, each for a different entity). This is useful when we need each entity (or some of the entities) to have its own key range. It supports the following parameters:
- table - the table name, defaults to 'hibernate_sequences'
- primary_key_column - key column name, defaults to 'sequence_name'
- value_column - the name of the column that stores the next high value, defaults to 'sequence_next_hi_value'
- primary_key_value - key value for the current entity (or current key set), defaults to the entity's primary table name
- primary_key_length - length of the key column in the DB, represented as a varchar, defaults to 255
- max_lo - the low number (the range), defaults to 32,767 (Short.MAX_VALUE)
The generator uses a single table to store multiple high values (multiple series). When multiple entities use the same generator, Hibernate matches an entity to its high value using the primary_key_value, which by default is the entity's primary table name. A sample table might look like this:
sequence_name | sequence_next_hi_value
ENTITY1       | 234
ENTITY2       | 876
ENTITY3       | 8
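An in-memory sketch of the per-key lookup: a HashMap stands in for the hibernate_sequences table (sequence_name mapped to the next high value), and each key gets its own series using this post's high*low formula. The class name, the seed method and the low value of 100 used below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory sketch of a "multiple HiLo per table" generator: one row per key,
 * each key with its own independent series.
 */
final class MultiHiLoTable {
    // Stand-in for the table: sequence_name -> sequence_next_hi_value
    private final Map<String, Long> nextHiByKey = new HashMap<>();
    // Current series per key: {next id, upper bound}
    private final Map<String, long[]> currentRange = new HashMap<>();
    private final long low;

    MultiHiLoTable(long low) {
        this.low = low;
    }

    /** Seeds a row, like inserting (sequence_name, sequence_next_hi_value). */
    synchronized void seed(String key, long nextHi) {
        nextHiByKey.put(key, nextHi);
    }

    synchronized long nextId(String key) {
        long[] range = currentRange.get(key);
        if (range == null || range[0] > range[1]) {
            // Read and increment only this key's row; a missing row starts at 0.
            long hi = nextHiByKey.getOrDefault(key, 0L);
            nextHiByKey.put(key, hi + 1);
            range = new long[] {hi * low, hi * low + low - 1};
            currentRange.put(key, range);
        }
        return range[0]++;
    }
}
```

Seeded with the ENTITY1 and ENTITY2 rows from the sample table and a low value of 100, the first ids are 23,400 for ENTITY1 and 87,600 for ENTITY2, and each entity continues independently in its own range.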
SequenceHiLoGenerator
A simple HiLo generator which uses a sequence instead of a table as the high value provider. It accepts the following parameters:
- sequence - the sequence name, defaults to 'hibernate_sequence'
- max_lo - the low number (the range), defaults to 9
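As far as I can tell from Hibernate 3.x's source, SequenceHiLoGenerator does not use the sequence value as the id itself: it multiplies the value by max_lo + 1 and hands out the following max_lo + 1 numbers (skipping 0). The sketch below mimics that arithmetic with a counter standing in for the database sequence; treat the exact formula as an assumption to check against your Hibernate version. It also explains why the ids in a table can be much larger than the sequence's current value.

```java
import java.util.function.LongSupplier;

/**
 * Sketch of sequence-based HiLo arithmetic (assumed to match Hibernate 3.x's
 * SequenceHiLoGenerator): each sequence value hival yields the ids
 * hival*(maxLo+1) .. hival*(maxLo+1)+maxLo.
 */
final class SequenceHiLoSketch {
    private final LongSupplier sequence; // stand-in for fetching the sequence's next value
    private final int maxLo;
    private long hi;
    private int lo;

    SequenceHiLoSketch(LongSupplier sequence, int maxLo) {
        this.sequence = sequence;
        this.maxLo = maxLo;
        this.lo = maxLo + 1; // forces a sequence call on first use
    }

    synchronized long nextId() {
        if (lo > maxLo) {
            long hival = sequence.getAsLong();
            lo = (hival == 0) ? 1 : 0; // never hand out id 0
            hi = hival * (maxLo + 1);
        }
        return hi + lo++;
    }
}
```

With the default max_lo of 9 and a sequence starting at 1, sequence value 1 yields ids 10 to 19, value 2 yields ids 20 to 29, and so on.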
The JPA Standard Generators
This section describes Hibernate's behavior when using the JPA standard strategies (TABLE, SEQUENCE, IDENTITY, and AUTO).
GenerationType.TABLE Strategy
Using the TABLE strategy we instruct the JPA provider (Hibernate) to assign primary keys to an entity using an underlying database table to ensure uniqueness. Each JPA provider is free to choose its own approach to meet this requirement; Hibernate's default is to use an instance of org.hibernate.id.MultipleHiLoPerTableGenerator (multiple HiLo sequences in a single table). The generator can be configured using the standard @TableGenerator annotation.
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.TableGenerator;
...
@Id
@GeneratedValue(strategy=GenerationType.TABLE, generator="tbl-gen")
@TableGenerator(name="tbl-gen", pkColumnName="ENTITY_TBL_NAME", allocationSize=150, table="GENERATORS")
private Long id;
...
In the example above I configure 'tbl-gen' to store the high value in a table named GENERATORS, which has a column named ENTITY_TBL_NAME used as a key to identify the actual sequence. I also set the allocation size (the low value) to 150.
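Since the allocation size is the low value, the number of round trips to the GENERATORS table is easy to estimate. A back-of-the-envelope sketch, assuming a single JVM that starts with no preallocated range and counting only the high value fetches (not the inserts themselves); the HiLoCost class is hypothetical:

```java
/** Back-of-the-envelope cost estimate for a HiLo generator in a single JVM. */
final class HiLoCost {

    private HiLoCost() { }

    /** Number of high value fetches needed to assign ids to `entities` new rows. */
    static long highValueFetches(long entities, long allocationSize) {
        if (allocationSize <= 0) {
            throw new IllegalArgumentException("allocationSize must be positive");
        }
        // Ceiling division: a partially used series still costs one fetch.
        return (entities + allocationSize - 1) / allocationSize;
    }
}
```

For example, persisting 1,000 new entities with allocationSize=150 needs 7 fetches of the high value, while a generator that hits the database per entity would need 1,000.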
Pros and cons of this generator:
- Performance - good; as explained above, key preallocation enables a low database hit ratio that can be configured using the low value
- Portability - good; the generator uses very simple SQL statements and has built-in support from the dialects for locking the row while updating the high value (Oracle: "select ... for update", SQL Server: "select ... with (updlock, rowlock)", and so on)
- Clusters - good; since this generator protects the value generation logic using database ACID transactions, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys
- Cross database - none; this implementation of the HiLo algorithm generates identifiers that are unique only within a particular database
GenerationType.SEQUENCE Strategy
When this strategy is used Hibernate uses an instance of org.hibernate.id.SequenceHiLoGenerator as the id generator. This generator can be configured using the @SequenceGenerator annotation. In the sample below I configure the sequence generator to use a sequence named 'MY_SEQ_GEN', set its initial value to 25 and set the HiLo low value to 12.
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;
import javax.persistence.Table;
@Entity @Table(name="MY_ENT")
public class MyEntity {
    @Id
    @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="seq-gen")
    @SequenceGenerator(name="seq-gen", sequenceName="MY_SEQ_GEN", initialValue=25, allocationSize=12)
    private Long id;
    // ...
}
Pros and cons of the sequence generator:
- Performance - good; as explained above, a low database hit ratio, and when the database is hit, fetching the next sequence value is very efficient
- Portability - not that good; it requires database support for sequences (for example Oracle and PostgreSQL have them, SQL Server does not)
- Clusters - good; since this generator protects the value generation logic using database ACID transactions, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys
- Cross database - none; this implementation of the HiLo algorithm generates identifiers that are unique only within a particular database
GenerationType.IDENTITY Strategy
This identity generator uses an identity column in the database. This generator is very simple but it has its problems, the two most important being portability and performance.
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
@Entity @Table(name="MY_ENT")
public class MyEntity {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Long id;
    // ...
}
Pros and cons of this generator:
- Performance - moderate to poor. Since identities are generated by the database for each inserted row there is no identity preallocation; instead Hibernate has to obtain the generated identity for each new entity. If a pre-JDBC 3 driver or a pre-JDK 1.4 runtime is used (or the hibernate.jdbc.use_get_generated_keys property is set to false), Hibernate has to issue two SQL statements when persisting an entity: the first to insert a row into the database and the second to obtain its identity. This performance penalty can be significant in batch processing.
- Portability - not that good. A prerequisite for using the identity strategy is that the underlying database supports it (similar to sequences). However, I noticed that Hibernate has an interesting workaround: PostgreSQL (8.4) has sequences but no real identity columns, so the id column is created as a serial column, which is backed by a sequence named <the entity table name>_<id column name>_seq, and each new entity gets the next value of that sequence as its identity (notice that no HiLo is used in that case). I haven't tried it on other databases, so I cannot tell how Hibernate behaves there.
- Clusters - good; since this generator protects the value generation logic using database ACID transactions, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys.
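- Mapping - the identity strategy cannot be used with the table-per-concrete-class inheritance mapping. With such a mapping each class in the hierarchy gets its own table and its own identity column, for example:
  - Base class into the BASE table using an identity column
  - Derived class into the DERIVED table using an identity column
  Since each table generates its identities independently, a base entity and a derived entity can end up with the same id, which breaks polymorphic queries and lookups by id.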
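A tiny simulation makes the duplicate id problem concrete. Two independent counters stand in for the identity columns of the BASE and DERIVED tables (an assumption of this sketch; no database or Hibernate code is involved), and the hypothetical PolymorphicLookup helper counts how many tables in the hierarchy hold a given id.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a table whose id column is an auto-incremented identity column. */
final class IdentityTable {
    private long lastId;                        // this table's own identity counter
    private final List<Long> ids = new ArrayList<>();

    long insert() {
        long id = ++lastId;                     // each table numbers its rows independently
        ids.add(id);
        return id;
    }

    boolean contains(long id) {
        return ids.contains(id);
    }
}

final class PolymorphicLookup {

    private PolymorphicLookup() { }

    /** Counts how many tables of the hierarchy hold a row with the given id. */
    static int rowsWithId(long id, IdentityTable... hierarchyTables) {
        int count = 0;
        for (IdentityTable table : hierarchyTables) {
            if (table.contains(id)) {
                count++;
            }
        }
        return count;
    }
}
```

The first BASE row and the first DERIVED row both get id 1, so a lookup of the base class by id 1 matches two rows.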
GenerationType.AUTO Strategy
Using this strategy the JPA provider decides, based on the underlying database vendor, which is the preferred strategy to use (table, sequence, or identity). This makes the code very portable, but at the price of difficult database migration: if a migration from one database vendor to another is planned, the key migration can be very complicated. As an example, assume a migration from a sequence strategy to a table strategy. It will have to include careful work of building the HiLo base values table, which should include:
- Identifying the correct keys in the MultiHiLo table (the 'sequence' names), with special care for entity hierarchies
- Calculating the correct high value for each such 'sequence' based on the existing keys and the new low value
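The two steps above can be sketched as follows. This is a hypothetical migration helper, assuming the HiLo variant described earlier (first id = high*low), that the existing maximum id of every table has already been queried, and that all tables of an entity hierarchy share one 'sequence' key. It generalizes the Hi value calculation from the Data Portability discussion to a whole MultiHiLo table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical migration helper: computes the sequence_next_hi_value row to
 * insert for every sequence_name of a MultiHiLo table.
 */
final class HiLoTableSeeder {

    private HiLoTableSeeder() { }

    static Map<String, Long> seedRows(Map<String, List<String>> tablesBySequence,
                                      Map<String, Long> maxIdByTable,
                                      long low) {
        Map<String, Long> rows = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : tablesBySequence.entrySet()) {
            long maxId = 0;
            // A hierarchy shares one key space, so take the max over all of its tables.
            for (String table : entry.getValue()) {
                maxId = Math.max(maxId, maxIdByTable.getOrDefault(table, 0L));
            }
            // First series lying entirely above maxId (first id = high * low).
            rows.put(entry.getKey(), maxId / low + 1);
        }
        return rows;
    }
}
```

For a hierarchy whose BASE and DERIVED tables hold ids up to 500 and 1,200, and a low value of 100, the shared 'sequence' row must start at 13 (first new id 1,300); computing it from BASE alone (6) could collide with existing DERIVED ids.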
Comments
One change that would be helpful is to change the font color of your text. There is not much contrast between your text color and the background which made it harder to read.
I'm using a Macbook Pro, with FireFox.
Thanks again for providing such useful information.
Eyal
I was trying to create a OneToMany association with a JoinTable using a surrogate key for the mapping table and your great blog is a big help with that.
Have you been thinking about writing a book on Hibernate? Your blog is among the best sources for Hibernate on the web.
If you do you've got yourself a tester (for both text and code) :).
Thanks!
David Zonsheine
Read it a few times, still not clear what is happening there.
I am using a SequenceGenerator for a id-column table:
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "BOOKMARK_SEQUENCE_GENERATOR")
@SequenceGenerator(name = "BOOKMARK_SEQUENCE_GENERATOR", sequenceName = "BOOKMARKS_SEQUENCE", allocationSize = 20)
In the ORACLE 11.2 database the LAST_NUMBER of the BOOKMARKS_SEQUENCE is 41 but when I am using Hibernate, the last generated id is 462!
Do you have any idea why the ids of the BOOKMARKS_SEQUENCE and the generated ids are different?
The BOOKMARKS_SEQUENCE is definitely used by Hibernate, because if I delete it from the database an exception is thrown at runtime.
Example:
I can see in the log, that hibernate uses
select BOOKMARKS_SEQUENCE.nextval from dual;
which returns the expected value, e.g. 42.
Later in the table the id is not 42, but 463!
Best Regards
Tobias
The sequence generator is a HiLo implementation (this is just the JPA standard name for Hibernate's SequenceHiLoGenerator).
Eyal
I have one problem: my Java application (using Struts and EJB) uses a sequence (with the cache option) created in an Oracle database. When I insert values from my application using this sequence, it doesn't pick the nextval it's supposed to pick and uses a value higher than the sequence's nextval.
Is this a code issue or a problem with the sequence on the backend?
Please help me on this?
@Id
@SequenceGenerator(name="SEQ_FORM_ID", sequenceName="SEQ_FORM_ID", allocationSize=1, initialValue=1)
@GeneratedValue(strategy=GenerationType.SEQUENCE, generator="SEQ_FORM_ID")
@Column(name="FORM_ID", unique=true, nullable=false)
private long formId;
This works quite well, but it looks like the object that I persist is not updated once the insert happens (ie the generated key is not inserted into the object). That means that, once I've persisted the object, I can't find it again! Any ideas on how I can get the generated value?
It would be interesting to see how these ID generation strategies work in multi-tenant systems.
We are using Hibernate 4 based multi-tenant systems. What we observed is that when we used the Increment strategy, the ID generation happened across all the multi-tenant schemas.
Need help on following problem. DB-Oracle. Application is in production.
1. We have used @SequenceGenerator and strategy=GenerationType.SEQUENCE without specifying an allocation size (so the default of 50 is used), which results in huge gaps in the id column.
Want to avoid the gaps with minimum code and database impact.
One solution is to set allocationSize=1, but I'm not sure whether it works correctly on an existing database.
Please share your thoughts/inputs on this.
Thanks