Hibernate/JPA Identity Generators
Friday, January 28, 2011
Introduction
As usual, it has been a long time since I last posted in my blog, and even longer (about half a year) since I last wrote about Hibernate, but I have finally found the time for it. This post is about Hibernate's JPA-standard-compatible identity generators (TABLE, SEQUENCE, IDENTITY, and AUTO). It explains what identity generators are and illustrates the different considerations to take into account when choosing an identity generation strategy.
Environment
- Hibernate - 3.5.6-Final
- PostgreSQL - 8.4
What are Surrogate Keys and Identity Generators?
Each persistent entity must have a primary key by which it can be identified in the database. A common pattern, called surrogate keys, is to make these keys totally independent of the application data, usually by generating their values automatically. Hibernate generates and assigns surrogate keys to persistent entities using instances of org.hibernate.id.IdentifierGenerator, which are invoked automatically when a transient entity is being persisted. The IdentifierGenerator interface is Hibernate specific, but the concept exists in the JPA specification as well, which defines the following standard generators (section 11.1.17 in JSR-317):
- Table - the persistence provider must assign primary keys for the entity using an underlying database table to ensure uniqueness
- Sequence - specify the use of a database sequence to ensure uniqueness
- Identity - specify the use of a database identity column
- Auto - the persistence provider should pick an appropriate strategy for the particular database
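When choosing an identity generation strategy, the following considerations should be taken into account:
- Performance - some of the generators hit the database for each entity being persisted, while others can preallocate identities and hit the database only once in a while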
- Portability - in this context there are two aspects to portability:
  - Code portability - using the identity column generator for an entity class means that this entity cannot be persisted to a database that doesn't support identity columns (there are workarounds for this type of portability issue, but using the entity 'out of the box' is not always supported)
  - Data portability - while code portability points out that a specific entity may not be portable to all databases, data portability looks at the data itself. When migrating existing entity data from one database to another, one still has to make sure, after solving the code portability issue, that the data is properly migrated. For example, suppose an entity using identity column generation is migrated to a database that doesn't support identity columns, and the developer decides to switch the generation strategy to table HiLo. They must not forget to populate the HiLo table with the correct high values based on the new low value, something like: max(existing ids)/new_Lo_value + x (for safety). More on HiLo later.
- Application constraints - sometimes the application logic or the execution environment enforces its own restrictions on identity generators. For example, a new application sharing its key space with a legacy application might be forced to use triggers to assign database identities to new records (this type of generator, like the 'select' generator, is Hibernate specific and not part of JPA)
- Mapping constraints - even the mappings themselves might affect our generation strategy. The classic case is mapping an entity hierarchy using the table-per-concrete-class approach, which forbids the use of identity columns
- Clustering - some generators (such as the Hibernate specific 'increment' generator) cannot be safely used in a clustered environment
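The data-portability formula above, max(existing ids)/new_Lo_value + x, can be sketched as a small Java helper. This is a minimal illustration, assuming the HiLo scheme described later in this post (a series starts at high*low) and integer division; the class name HiLoMigration, the method initialHigh, and the safetyMargin parameter (the 'x') are hypothetical.

```java
// Hypothetical helper: computes the high value to seed a HiLo table with after a
// migration, following max(existing ids) / newLo + x from the text.
final class HiLoMigration {

    private HiLoMigration() {}

    /**
     * @param maxExistingId largest id already present in the migrated table
     * @param newLo         the low value (series size) the new HiLo generator will use
     * @param safetyMargin  the extra "x" from the text; must be at least 1
     */
    static long initialHigh(long maxExistingId, long newLo, long safetyMargin) {
        if (newLo <= 0 || safetyMargin < 1) {
            throw new IllegalArgumentException("newLo must be positive and safetyMargin >= 1");
        }
        // Integer division: the series of high h covers [h*newLo, h*newLo + newLo - 1],
        // so h = maxExistingId / newLo is the series that still contains maxExistingId.
        return maxExistingId / newLo + safetyMargin;
    }

    public static void main(String[] args) {
        long high = initialHigh(1_000_000L, 32_767L, 1);
        System.out.println("high = " + high + ", first new id = " + high * 32_767L);
    }
}
```

With a largest existing id of 1,000,000 and a new low value of 32,767, the helper returns 31, so the first new key is 31*32,767 = 1,015,777, safely above the migrated data. A margin of 1 is the smallest value that works, because the series for 1,000,000/32,767 = 30 still contains the largest existing id.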
The @GeneratedValue and @GenericGenerator Annotations
An entity's primary key generation strategy is set by applying the JPA standard @GeneratedValue annotation to the entity's primary key field or property. The annotation has two elements:
- strategy - sets the generation strategy as defined in the javax.persistence.GenerationType enumeration; this can be one of AUTO, TABLE, SEQUENCE, or IDENTITY (more on that below)
- generator - the name of a generator configuration; configurations are declared using the @TableGenerator, @SequenceGenerator, or @GenericGenerator (Hibernate specific) annotations
For example, the following entity uses @GeneratedValue without any elements, so the default AUTO strategy applies:
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity @Table(name="ENTITY_WITH_ID")
public class EntityWithId {
    // No strategy is given, so GenerationType.AUTO is used
    @Id @GeneratedValue
    private Long id;
}
The generator can be configured using either JPA standard annotations or Hibernate's specific @GenericGenerator annotation. The example below illustrates the use of @GenericGenerator to configure a TableHiLoGenerator instance (which provides a single set of identities shared by all the entities mapped with this generator configuration):
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.annotations.GenericGenerator;
import org.hibernate.annotations.Parameter;

@Entity @Table(name="ENTITY_WITH_ID")
public class EntityWithId {
    // I configure a generator and name the configuration 'table-hilo-generator';
    // the high value is stored in a table named hibernate_id_generation
    @GenericGenerator(name="table-hilo-generator", strategy="org.hibernate.id.TableHiLoGenerator",
        parameters={@Parameter(value="hibernate_id_generation", name="table")})
    // I use the generator configured above
    @GeneratedValue(generator="table-hilo-generator") @Id
    private Long id;
    // ... other fields and accessors
}
The @GenericGenerator annotation is used to create a 'named generator configuration' which can later be referenced using the @GeneratedValue annotation. In the rest of this post I will use the standard JPA annotations, rather than @GenericGenerator, to configure the generators.
The HiLo Algorithm
Throughout this post I refer to the HiLo algorithm for key generation, so before moving on, here is a short background on it. The HiLo (High/Low) algorithm generates unique number series using two values: the high and the low. The high value is used as the base for a series (or range) of numbers, while the size of the series is denoted by the low value. A unique series is generated using the following steps:
- Load and atomically increment the high value
- Multiply the high value by the low value (high*low); the result is the first number (lower bound) of the current series
- The last number (upper bound) of the current series is (high*low)+low-1 (another variation subtracts 1 from the lower bound and adds 1 to the upper bound, but it doesn't really matter as long as the boundaries are clear)
- When a client needs a number, the next one in the current series is used; once the entire series has been exhausted, the algorithm goes back to step 1
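For example, with a low value of 32,767 and a loaded high value of 52, the series is:
- Lower bound = 52*32,767 = 1,703,884
- Upper bound = 1,703,884+32,767-1 = 1,736,650
Once this series is exhausted, the next high value is 53:
- Lower bound = 53*32,767 = 1,736,651
- Upper bound = 1,736,651+32,767-1 = 1,769,417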
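The steps above can be sketched as a small in-memory allocator. This is a hypothetical class for illustration, not Hibernate's implementation: it assumes the high value source (here a LongSupplier standing in for the database row or sequence) returns the current high value and increments it atomically, and it uses the lower and upper bound formulas from the example.

```java
import java.util.function.LongSupplier;

// Minimal sketch of the HiLo steps described above (not Hibernate's code).
final class HiLoAllocator {

    private final LongSupplier highSource; // step 1: load and atomically increment the high value
    private final long low;                // size of each series
    private long next;                     // next number to hand out
    private long upper = -1;               // upper bound of the current series (-1 = none loaded yet)

    HiLoAllocator(LongSupplier highSource, long low) {
        if (low <= 0) throw new IllegalArgumentException("low must be positive");
        this.highSource = highSource;
        this.low = low;
    }

    synchronized long nextId() {
        if (upper < 0 || next > upper) {
            long high = highSource.getAsLong(); // the only "database hit"
            next = high * low;                  // step 2: lower bound
            upper = next + low - 1;             // step 3: upper bound
        }
        return next++;                          // step 4: hand out the next number in the series
    }

    public static void main(String[] args) {
        long[] storedHigh = {52};               // pretend the high value in the database is 52
        HiLoAllocator ids = new HiLoAllocator(() -> storedHigh[0]++, 32_767);
        long first = ids.nextId();              // lower bound of series 52
        for (int i = 1; i < 32_767; i++) {
            ids.nextId();                       // exhaust series 52 (last id is its upper bound)
        }
        long firstOfNext = ids.nextId();        // lower bound of series 53
        System.out.println(first + " " + firstOfNext);
    }
}
```

Running main prints 1703884 1736651, matching the bounds above. Only two calls reach the high value source for the first 32,768 ids; if the process restarts in the middle of a series, the unused remainder of that series is never handed out.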
The big advantage of this algorithm is key preallocation, which can dramatically improve performance. The low value lets us control the database hit ratio: as illustrated with 32,767, we hit the database only once for every 32,767 generated keys. The downside (at least according to some people; in my opinion it is a non-issue) is that each time the algorithm restarts, it leaves a 'hole' in the key sequence.
Hibernate has several HiLo based generators:
TableHiLoGenerator
A simple HiLo generator that uses a table to store the high value. The generator accepts the following parameters:
- table - the table name, defaults to 'hibernate_unique_key'
- column - the name of the column storing the next high value, defaults to 'next_hi'
- max_lo - the low number (the range), defaults to 32,767 (Short.MAX_VALUE)
MultipleHiLoPerTableGenerator
A table HiLo generator which can store multiple key sets (multiple high values, each for a different entity). This is useful when we need each entity (or some of the entities) to have its own key range. It supports the following parameters:
- table - the table name, defaults to 'hibernate_sequences'
- primary_key_column - key column name, defaults to 'sequence_name'
- value_column - the name of the column to store the next high value, defaults to 'sequence_next_hi_value'
- primary_key_value - key value for the current entity (or current key set), defaults to the entity's primary table name
- primary_key_length - length of the key column in DB represented as a varchar, defaults to 255
- max_lo - the low number (the range), defaults to 32,767 (Short.MAX_VALUE)
The generator uses a single table to store multiple high values (multiple series). When multiple entities use the same generator, Hibernate matches an entity to its high value using the primary_key_value, which by default is the entity's primary table name. A sample table can look like this:
| sequence_name | sequence_next_hi_value |
| ENTITY1 | 234 |
| ENTITY2 | 876 |
| ENTITY3 | 8 |
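To make the role of primary_key_value more concrete, here is a simplified in-memory model of the idea (a sketch, not Hibernate's MultipleHiLoPerTableGenerator): a map plays the role of the table, keyed by sequence name, and every sequence name gets its own independent HiLo series computed with the formulas from the HiLo section. The class and method names, the choice to start missing rows at 0, and the low value of 10 used below are hypothetical and chosen for readability.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: one "table" maps sequence_name -> next high value,
// and each sequence name has its own independent HiLo series.
final class MultiHiLoTable {

    private final long low;
    private final Map<String, Long> nextHigh = new HashMap<>();  // the "table": sequence_name -> next high value
    private final Map<String, long[]> current = new HashMap<>(); // sequence_name -> {next id, upper bound}

    MultiHiLoTable(long low) {
        if (low <= 0) throw new IllegalArgumentException("low must be positive");
        this.low = low;
    }

    /** Seeds a row of the table, e.g. seed("ENTITY1", 234) as in the sample above. */
    synchronized void seed(String sequenceName, long high) {
        nextHigh.put(sequenceName, high);
    }

    /** Returns the next id for the given sequence name (the primary_key_value). */
    synchronized long nextId(String sequenceName) {
        long[] range = current.get(sequenceName);
        if (range == null || range[0] > range[1]) {
            // Load and increment only this sequence's row; in this model missing rows start at 0.
            long high = nextHigh.getOrDefault(sequenceName, 0L);
            nextHigh.put(sequenceName, high + 1);
            range = new long[] {high * low, high * low + low - 1};
            current.put(sequenceName, range);
        }
        return range[0]++;
    }
}
```

With a low value of 10 and the sample rows above, the first ENTITY1 id is 2,340 and the first ENTITY3 id is 80; asking for another ENTITY1 id returns 2,341 without touching the ENTITY3 row.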
SequenceHiLoGenerator
A simple HiLo generator that uses a sequence, instead of a table, as the high value provider. It accepts the following parameters:
- sequence - the sequence name, defaults to 'hibernate_sequence'
- max_lo - the low number (the range), defaults to 9
The JPA Standard Generators
This section describes Hibernate's behavior with the JPA standard strategies (TABLE, SEQUENCE, IDENTITY, and AUTO).
GenerationType.TABLE Strategy
Using the TABLE strategy, we instruct the JPA provider (Hibernate) to assign primary keys to an entity using an underlying database table to ensure uniqueness. Each JPA provider is free to choose its own approach to meet this requirement; Hibernate's default is to use an instance of org.hibernate.id.MultipleHiLoPerTableGenerator (multiple HiLo series in a single table). The generator can be configured using the standard @TableGenerator annotation:
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.TableGenerator;
...
    @Id
    @GeneratedValue(strategy=GenerationType.TABLE, generator="tbl-gen")
    @TableGenerator(name="tbl-gen", pkColumnName="ENTITY_TBL_NAME", allocationSize=150, table="GENERATORS")
    private Long id;
...
In the example above I configure 'tbl-gen' to store the high value in a table named GENERATORS, which has a column named ENTITY_TBL_NAME used as the key identifying the actual series. I also set the allocation size (the low value) to 150.
Pros and cons of this generator:
- Performance - good; as explained above, key preallocation enables a low database hit ratio that can be tuned using the low value
- Portability - good; the generator uses very simple SQL statements, and the dialects provide built-in support for locking the generator table's row while updating the high value (Oracle: "select ... for update", SQL Server: "select ... with (updlock, rowlock)", and so on).
- Clusters - good; since this generator relies on the database's ACID guarantees to protect the value generation logic, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys.
- Cross database - none; this implementation of the HiLo algorithm generates identities that are unique only within a particular database.
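A back-of-the-envelope sketch of the performance point: under the HiLo scheme, persisting n entities costs roughly one high value fetch per allocationSize keys (ignoring restarts, which waste the rest of a series). The class HiLoRoundTrips and method highValueFetches are hypothetical, and the result is an estimate, not a measurement.

```java
// Estimate of high value fetches for n new entities with HiLo preallocation.
final class HiLoRoundTrips {

    private HiLoRoundTrips() {}

    static long highValueFetches(long entities, long allocationSize) {
        if (entities < 0 || allocationSize <= 0) {
            throw new IllegalArgumentException("entities must be >= 0 and allocationSize > 0");
        }
        return (entities + allocationSize - 1) / allocationSize; // integer ceiling division
    }

    public static void main(String[] args) {
        System.out.println(highValueFetches(10_000, 150)); // 67
    }
}
```

With allocationSize=150 as in the example above, 10,000 new entities need about 67 high value fetches, compared with one per entity when nothing is preallocated.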
GenerationType.SEQUENCE Strategy
When this strategy is used, Hibernate uses an instance of org.hibernate.id.SequenceHiLoGenerator as the id generator. This generator can be configured using the @SequenceGenerator annotation. In the sample below I configure the sequence generator to use a sequence named 'MY_SEQ_GEN', set its initial value to 25, and set the HiLo low value (allocationSize) to 12.
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;
import javax.persistence.Table;

@Entity @Table(name="MY_ENT")
public class MyEntity {
    @Id
    @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="seq-gen")
    @SequenceGenerator(name="seq-gen", sequenceName="MY_SEQ_GEN", initialValue=25, allocationSize=12)
    private Long id;
    // ... other fields and accessors
}
Pros and cons of the sequence generator:
- Performance - good; as explained above, the database hit ratio is low, and when the database is hit, fetching a sequence value is very efficient
- Portability - not very good; it requires database support for sequences (for example Oracle and PostgreSQL, but not SQL Server)
- Clusters - good; since this generator relies on the database's ACID guarantees to protect the value generation logic, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys.
- Cross database - none; this implementation of the HiLo algorithm generates identities that are unique only within a particular database.
GenerationType.IDENTITY Strategy
This identity generator uses an identity column in the database. This generator is very simple, but it has its problems; the two most important are portability and performance.
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity @Table(name="MY_ENT")
public class MyEntity {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Long id;
    // ... other fields and accessors
}
Pros and cons of this generator:
- Performance - moderate to poor. Since identities are generated by the database for each inserted row, there is no identity preallocation; instead, Hibernate has to obtain the generated identity for each new entity. If a pre-JDBC3 driver or a pre-JDK 1.4 runtime is used (or the hibernate.jdbc.use_get_generated_keys property is set to false), Hibernate has to issue two SQL statements when persisting an entity: the first inserts the row and the second obtains its identity. This performance penalty can be significant in batch processing.
- Portability - not very good. A prerequisite for using the identity strategy is that the underlying database supports it (similar to sequences). However, I noticed that Hibernate has an interesting workaround: when using PostgreSQL (which has sequences but not identity columns), Hibernate creates a sequence named <the entity table name>_<id column name>_seq from which it gets identities for the entity (taking the next value each time; notice that Hibernate doesn't use HiLo in that case). I haven't tried it on other databases, so I cannot tell how Hibernate behaves there.
- Clusters - good; since this generator relies on the database's ACID guarantees to protect the value generation logic, even if multiple VMs access the same database we are still guaranteed not to generate duplicate keys.
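- Mapping - identity columns cannot be used with the table-per-concrete-class inheritance mapping. Consider a hierarchy mapped as follows:
  - Base class into the BASE table using an identity column
  - Derived class into the DERIVED table using an identity column
  Each table generates its identities independently, so a Base instance and a Derived instance can receive the same id, and the keys are no longer unique across the hierarchy.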
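A toy model makes the table-per-concrete-class problem visible: each identity column is simulated by its own counter, one for BASE and one for DERIVED, under the assumption that both start at 1 as identity columns commonly do. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of two identity columns (one per concrete table) in a
// table-per-concrete-class hierarchy: each table keeps its own counter.
final class IdentityPerTableDemo {

    static final class IdentityColumn {
        private long next = 1; // assumption: identity columns start at 1
        long insert() { return next++; }
    }

    /** Inserts one Base row and one Derived row and returns both generated ids. */
    static List<Long> insertOneOfEach() {
        IdentityColumn baseTable = new IdentityColumn();    // BASE table
        IdentityColumn derivedTable = new IdentityColumn(); // DERIVED table
        List<Long> ids = new ArrayList<>();
        ids.add(baseTable.insert());    // Base instance gets id 1
        ids.add(derivedTable.insert()); // Derived instance also gets id 1
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(insertOneOfEach()); // [1, 1]
    }
}
```

A polymorphic query over Base, which unions both tables, would then see two rows with id 1, which is why a shared key source (such as a table or sequence generator) is needed for this mapping.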
GenerationType.AUTO Strategy
Using this strategy, the JPA provider decides, based on the underlying database vendor, which strategy to use (table, sequence, or identity). This makes the code very portable, but at the price of difficult database migration: if a migration from one database vendor to another is planned, migrating the keys can be very complicated. As an example, assume a migration from the sequence strategy to the table strategy. It requires careful work building the HiLo high values table, which includes:
- Identifying the correct keys in the MultipleHiLoPerTable table (the 'sequence' names), with special care for entity hierarchies
- Calculating the correct high value for each such 'sequence' based on the existing keys and the new low value
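The two steps above can be sketched as a hypothetical helper; the names are illustrative, and the per-table maximum ids and the table-to-sequence-name mapping are inputs you would have to gather yourself. Tables belonging to one entity hierarchy are mapped to the same 'sequence' name, so their maximum ids are merged before applying the max(existing ids)/new_Lo_value + x formula with x = 1.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical migration helper: computes the high value to store per 'sequence' name.
final class HiLoTableSeeder {

    private HiLoTableSeeder() {}

    static Map<String, Long> highValues(Map<String, Long> maxIdByTable,
                                        Map<String, String> sequenceByTable,
                                        long newLo) {
        if (newLo <= 0) throw new IllegalArgumentException("newLo must be positive");
        // Step 1: group tables by 'sequence' name; a hierarchy shares one name,
        // so keep the largest id found in any of its tables.
        Map<String, Long> maxIdBySequence = new TreeMap<>();
        for (Map.Entry<String, Long> e : maxIdByTable.entrySet()) {
            String sequence = sequenceByTable.get(e.getKey());
            if (sequence == null) {
                throw new IllegalArgumentException("no sequence name for table " + e.getKey());
            }
            maxIdBySequence.merge(sequence, e.getValue(), Long::max);
        }
        // Step 2: high value per sequence = max(existing ids) / newLo + 1.
        Map<String, Long> highBySequence = new TreeMap<>();
        for (Map.Entry<String, Long> e : maxIdBySequence.entrySet()) {
            highBySequence.put(e.getKey(), e.getValue() / newLo + 1);
        }
        return highBySequence;
    }
}
```

For instance, if BASE (max id 500) and DERIVED (max id 1,200) share the 'BASE' sequence name and the new low value is 100, the helper yields a high value of 13 for 'BASE', so new keys start at 1,300, above every id in both tables.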
7 comments:
Thanks for writing a very good article on identity in JPA/Hibernate.
One change that would be helpful is to change the font color of your text. There is not much contrast between your text color and the background which made it harder to read.
I'm using a Macbook Pro, with FireFox.
Thanks again for providing such useful information.
Thanks for the comment - I have made some CSS changes - I hope it is easier to read now.
Eyal
Thumbs up!
Hi Eyal,
I was trying to create a OneToMany association with a JoinTable using a surrogate key for the mapping table and your great blog is a big help with that.
Have you been thinking about writing a book on Hibernate? Your blog is among the best sources for Hibernate on the web.
If you do you've got yourself a tester (for both text and code) :).
Thanks!
David Zonsheine
Your explanation about MultipleHiLoPerTableGenerator is not clear.
Read it a few times, still not clear what is happening there.
It's really nice and clear for novices.
Thanks for your detailed explanation, very helpful. However, I suspect one point may be misleading for readers: you chose 32767 as the "low value" - this is also the Hibernate default for max_lo, see for example here. However, with this value for max_lo the range between upper and lower bound is 32768 (max_lo + 1) and not 32767, see also here