When might we need @Id generation strategy in Spring Boot?

If we are using Spring Data JPA in a Spring Boot application and have @Entity models, each model needs an identifier field annotated with @Id. Otherwise the IDE will complain that the entity has no primary key (even if you have a field named _id_).

IDE complaining as we have not defined a primary key

As you may notice, the errors are gone as soon as we add the @Id annotation from the jakarta.persistence package to the field, and we are not required to specify an ID generation strategy.

IDE does not display any more errors after marking the field as a primary key

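That’s because @Id on its own only marks the primary key: without @GeneratedValue nothing is generated, and the application has to assign the ID itself before saving. Once we add @GeneratedValue without naming a strategy, GenerationType.AUTO is applied. Either way, we might want to take a look at the available options.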

Why do we care about ID generation?

Choosing the right ID generation strategy matters because it can affect performance, scalability, and database compatibility, and each of these is (usually) important for our application.

What are the available strategies?

There are 4 classic strategies (Jakarta Persistence 3.1 also adds GenerationType.UUID, which is not covered here):

GenerationType.AUTO

@Entity
public class Blog {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
}

GenerationType.IDENTITY

@Entity
public class Blog {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
}

GenerationType.SEQUENCE

@Entity
public class Blog {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "blogSeqGen")
    @SequenceGenerator(name = "blogSeqGen", sequenceName = "blog_sequence", allocationSize = 100)
    private Long id;
}

GenerationType.TABLE

@Entity
public class Blog {
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE)
    private Long id;
}

What happens if I don’t choose a strategy myself?

If @GeneratedValue is present without a strategy, JPA defaults to AUTO, meaning that the persistence provider decides what to use, based on the database dialect (which is tied to the vendor).

This also means that the configured dialect must match your database, as Hibernate relies on it to decide how to map strategies.

Exactly why is GenerationType.SEQUENCE more efficient than GenerationType.IDENTITY for batch operations?

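It has to do with how the IDs are allocated and when inserts happen. With IDENTITY, the database assigns the ID while executing the INSERT, so Hibernate has to run every insert immediately to learn the ID, which rules out JDBC batch inserts. With SEQUENCE, Hibernate obtains the IDs before inserting, so the inserts can be delayed until flush time and grouped into batches.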
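To make the difference concrete, here is a back-of-the-envelope model in plain Java. It is only a sketch under stated assumptions: IDENTITY forces one INSERT round trip per entity (the ID only exists after the row is written), while SEQUENCE reserves IDs in blocks and sends the queued inserts in JDBC batches. The class and method names are made up for illustration, and the numbers are counts of round trips, not measurements.

```java
/** Back-of-the-envelope model; the results are round-trip counts, not benchmarks. */
final class BatchingModel {

    // IDENTITY: the ID exists only after the row is written, so every
    // persist triggers its own INSERT and nothing can be batched.
    static int roundTripsWithIdentity(int entities) {
        return entities;
    }

    // SEQUENCE: IDs are reserved in blocks of allocationSize, and the queued
    // INSERTs are sent in JDBC batches of batchSize at flush time.
    static int roundTripsWithSequence(int entities, int allocationSize, int batchSize) {
        int sequenceCalls = ceilDiv(entities, allocationSize);
        int insertBatches = ceilDiv(entities, batchSize);
        return sequenceCalls + insertBatches;
    }

    private static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        System.out.println("IDENTITY: " + roundTripsWithIdentity(1000));         // 1000
        System.out.println("SEQUENCE: " + roundTripsWithSequence(1000, 50, 50)); // 40
    }
}
```

In a real application the batch size comes from the hibernate.jdbc.batch_size property; without it, Hibernate does not batch inserts even with SEQUENCE.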

How does pooled sequence work for GenerationType.SEQUENCE?

It turns out that Hibernate can apply further optimizations to SEQUENCE, called optimizers, that reduce the number of round trips to the DB needed to get new ID values. They determine how Hibernate handles the ID range reserved via allocationSize.

No pooling

When allocationSize = 1, no pooling happens: each time a new ID is needed, nextval() is called on the sequence in the DB, so every insert costs an extra round trip.

Pooled-lo optimizer

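When allocationSize is greater than 1, Hibernate applies a pooling optimizer. Which one you get by default depends on configuration: recent Hibernate versions pick pooled unless hibernate.id.optimizer.pooled.preferred is set to pooled-lo. In both cases Hibernate fetches a single value from the database sequence and then hands out a whole block of IDs from memory, without touching the DB. With pooled-lo, the fetched value is the lowest ID of the block, and the sequence itself advances by allocationSize (with the example below, fetching 1 gives IDs 1 up to 50, and the next fetch returns 51). The formula final ID = hi * allocationSize + lo comes from the older hi/lo scheme, where the sequence advances by 1 and the fetched value is a block number: fetching 10 gives 10 * 50 = 500 up to 549, and after the block is used up the next value (11) gives 550 up to 599.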

@SequenceGenerator(
    name = "blogSeqGen", 
    sequenceName = "blog_sequence", 
    allocationSize = 50
)

One note from personal experience here: if your business logic somehow relies on ID values being continuous, you might need to set _allocationSize_ to 1. Otherwise, if the app crashes or restarts midway, the unused IDs in the current block are lost and you get a gap between ID values. Even then, a sequence does not guarantee gap-free values, because a rolled-back transaction still consumes the values it fetched.
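The block arithmetic from this section (final ID = hi * allocationSize + lo) and the gap after a restart can be sketched in plain Java. This is a toy model, not Hibernate's implementation: BlockAllocator and its fields are hypothetical names, and the "database sequence" is just a counter that advances by 1.

```java
/** Toy model of block-based ID allocation; not Hibernate's actual code. */
final class BlockAllocator {
    private final int allocationSize;
    private long dbSequence;    // stands in for the database sequence (advances by 1)
    private int roundTrips = 0; // how many times the "database" was asked
    private long hi;            // last value fetched from the sequence
    private int lo;             // position inside the current block

    BlockAllocator(int allocationSize, long sequenceStart) {
        this.allocationSize = allocationSize;
        this.dbSequence = sequenceStart;
        this.lo = allocationSize; // forces a fetch on first use
    }

    long nextId() {
        if (lo >= allocationSize) {
            hi = dbSequence++;    // simulated nextval(): one round trip
            roundTrips++;
            lo = 0;
        }
        return hi * allocationSize + lo++;
    }

    int roundTrips() { return roundTrips; }

    long sequenceValue() { return dbSequence; }
}

final class BlockAllocationDemo {
    public static void main(String[] args) {
        BlockAllocator app = new BlockAllocator(50, 10);
        for (int i = 0; i < 3; i++) {
            System.out.println(app.nextId()); // 500, 501, 502
        }
        // Simulated crash: the in-memory block is gone, the sequence has moved on.
        BlockAllocator restarted = new BlockAllocator(50, app.sequenceValue());
        System.out.println(restarted.nextId()); // 550, so 503..549 are never used
    }
}
```

With allocationSize = 1 the same class asks the "database" on every call, which is the no-pooling case described earlier.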

Pooled optimizer

@GenericGenerator(
    name = "blogSeqGen",
    strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
    parameters = {
        @Parameter(name = "sequence_name", value = "blog_sequence"),
        @Parameter(name = "optimizer", value = "pooled"),
        @Parameter(name = "increment_size", value = "50")
    }
)

The explicit pooled optimizer does not multiply anything in Hibernate’s memory. Instead, the block boundary is stored directly in the database sequence, which is increased by allocationSize on every fetch. Hibernate treats the value it reads as the upper bound of a block (pooled-lo treats it as the lower bound).

For example, the first query to the database returns 1. While a block of allocationSize IDs is handed out from memory, the database has already moved its sequence forward by allocationSize, so the next query returns 1 + allocationSize (51, following the above example). No math calculations: Hibernate just reads the sequence value and uses it as the boundary of the next block.

This works especially well if we have several instances that need to share the same sequence without overlapping values. However, be careful: for this optimizer to work, the database sequence must increment by the same allocationSize you define in your entity (increment_size in the configuration above). If they don’t match, you will end up with gaps or collisions.
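The "several instances, one sequence" case and the mismatch problem can be simulated in plain Java. This is a simplified sketch, not Hibernate's code: the class names are hypothetical, and for simplicity each instance treats the value it reads as the first ID of its block.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Stands in for the database sequence shared by all application instances. */
final class SharedSequence {
    private final AtomicLong value;
    private final int incrementBy;

    SharedSequence(long start, int incrementBy) {
        this.value = new AtomicLong(start);
        this.incrementBy = incrementBy;
    }

    long nextval() {
        return value.getAndAdd(incrementBy);
    }
}

/** One application instance handing out IDs from its in-memory block. */
final class PooledInstance {
    private final SharedSequence sequence;
    private final int allocationSize;
    private long next;         // next ID to hand out
    private long endExclusive; // first ID past the current block

    PooledInstance(SharedSequence sequence, int allocationSize) {
        this.sequence = sequence;
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next >= endExclusive) {   // block used up (or first call)
            next = sequence.nextval();
            endExclusive = next + allocationSize;
        }
        return next++;
    }
}

final class SharedSequenceDemo {
    // Two instances draw IDs alternately; returns how many IDs were handed out twice.
    static int countDuplicates(int dbIncrement, int allocationSize, int idsPerInstance) {
        SharedSequence seq = new SharedSequence(1, dbIncrement);
        PooledInstance a = new PooledInstance(seq, allocationSize);
        PooledInstance b = new PooledInstance(seq, allocationSize);
        Set<Long> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < idsPerInstance; i++) {
            if (!seen.add(a.nextId())) duplicates++;
            if (!seen.add(b.nextId())) duplicates++;
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println("matching increment: " + countDuplicates(50, 50, 120));
        System.out.println("sequence increments by 1: " + countDuplicates(1, 50, 120));
    }
}
```

When the sequence increment matches allocationSize, the blocks never overlap. When the sequence only advances by 1, the second instance starts inside the first instance's block and collisions appear. A sequence increment larger than allocationSize avoids collisions but leaves gaps.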

How is post-insertion fetching of ID in GenerationType.IDENTITY approach beneficial at all?

In fact, this design choice was deliberate, giving the following advantages:

What happens if I select a strategy, but my database does not support it?

Interestingly, several things can happen depending on the strategy we have chosen.

If GenerationType.TABLE is so slow, why does it still exist?

The reasons seem to be mostly historical, and partly technical. TABLE was designed to ensure portability across databases that did not support sequences (MySQL still has none) or had no true IDENTITY columns either. And since it just uses a regular table to emulate a sequence, it’s a good fallback that works everywhere (in theory).
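To see what "a regular table emulating a sequence" means, here is a minimal in-memory sketch. It assumes one row per generator with a name and a next value; the map stands in for that table, and the class, method and column names are illustrative rather than Hibernate's actual schema or code.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a table with one row per generator name. */
final class SequenceTable {
    // sequence_name -> next_val (illustrative column names)
    private final Map<String, Long> rows = new HashMap<>();
    private int statements = 0;

    // synchronized plays the role of the row lock a real database would take.
    synchronized long nextBlockStart(String name, int allocationSize) {
        long current = rows.getOrDefault(name, 1L); // like SELECT ... FOR UPDATE
        rows.put(name, current + allocationSize);   // like UPDATE ... SET next_val = ?
        statements += 2;                            // two statements per reserved block
        return current;
    }

    synchronized int statements() {
        return statements;
    }
}

final class SequenceTableDemo {
    public static void main(String[] args) {
        SequenceTable table = new SequenceTable();
        System.out.println(table.nextBlockStart("blog", 50));   // 1
        System.out.println(table.nextBlockStart("blog", 50));   // 51
        System.out.println(table.nextBlockStart("author", 50)); // 1
        System.out.println(table.statements());                 // 6
    }
}
```

Every reserved block costs a locked read plus an update on the same row, which hints at why TABLE is slower than a native sequence under concurrent load.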

There are several cases where it can be especially useful:

We can think of it like a spare tire.

How do I pick the right strategy?

Maybe start by answering the following questions:

… and then go deeper by carefully considering the pros and cons above. There is no single right answer; it depends on your circumstances.

Catch you in the next blog ✨