Native enums or CHECK constraints in PostgreSQL?
Recently, we had a discussion about whether we should use native enums in PostgreSQL, or rely on regular string columns with CHECK
constraints. In the end, we decided that we wanted to go with the latter.
Tag along if you want to learn why.
Native enums in PostgreSQL
Native enums in PostgreSQL are full-blown types. You declare a column as an enum the same way you would declare a column of any built-in type:
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (
name text,
current_mood mood
);
They come with ordering (from the order in which the values were declared), type safety (you cannot compare two values coming from different enums, even if their string or numerical representations are the same), and space efficiency (they are stored in the tuples as references to the actual enum values that exist in the catalogue tables). Although enums are intended for static sets of values, you can add values to the type and rename existing values. But enums also come with some limitations: for example, you cannot remove an existing value from an enum. To do that, you need to create a new enum in the form you want it to have, and then change all columns to use that new type (well, technically there are alternatives, see below).
Creating a new enum and swapping existing columns to use the new type can be tricky. For the most straightforward cases, you will need something like this:
-- Rename the original enum type.
ALTER TYPE mood RENAME TO mood_old;
-- Create the new enum.
CREATE TYPE mood AS ENUM ('sad', 'happy');
-- Perform data migration to remove the enum value that will be dropped.
-- Update existing columns to use the new enum.
ALTER TABLE person ALTER COLUMN current_mood
TYPE mood USING current_mood::text::mood;
-- Remove the old enum.
DROP TYPE mood_old;
However, depending on the size of the table, the ALTER TABLE
command can have serious performance implications, because it acquires an ACCESS EXLUSIVE
lock. This lock is the most restrictive of all locks in PostgreSQL: your transaction is the only transaction that can access that table while it exists. Once it has that lock, it will perform a full table scan to migrate the data and ensure it conforms to the new type. This could make this alternative unfeasible to your application.
String columns with CHECK
constraints
String columns with CHECK
constraints keep the most important property we want with enums: we can enforce data correctness in the database. It also comes with more flexibility: updating a CHECK
constraint in the most complex cases is more manageable. And because you use the same method in all cases of updating the constraint, you don't have to remember multiple options. But it does come with one big downside: it is less space efficient. Because the actual values are stored in the tuples themselves, and not just a reference to the values, it can potentially take a lot of disk space.
To update a CHECK
constraint, we do something like this:
CREATE TABLE person (
name text,
current_mood text CHECK (current_mood IN ('sad', 'ok', 'happy'))
);
-- Drop the original `CHECK` constraint.
-- The `person_current_mood_check` name is autogenerated by the database.
ALTER TABLE person
DROP CONSTRAINT person_current_mood_check;
-- Perform any data migrations that may be necessary to conform with the new
-- constraint.
-- Create the new one, but in a way that doesn't lock anything.
-- Note the `NOT VALID` syntax.
ALTER TABLE person
ADD CONSTRAINT person_current_mood_check
CHECK (current_mood IN ('sad', 'happy'))
NOT VALID;
-- Finally, tell the database to make sure all rows are good.
ALTER TABLE person
VALIDATE CONSTRAINT person_current_mood_check;
Let's go through each of these commands in more detail:
- Drop the previous
CHECK
constraint. This is aO(1)
operation. Nice and quick! - Migrate the data, to make it conform to the new constraint you are going to create. (If you are just dropping a value from the set of possible values, this can also be done before dropping the original constraint.)
- Create the constraint in the new form you need, but with
NOT VALID
. This is also anO(1)
operation: the constraint will not be enforced for existing rows, but it will be enforced for rows being created or updated. - Run
VALIDATE CONSTRAINT
to make sure all rows are good. The validation command acquires a more permissive lock, theSHARE UPDATE EXCLUSIVE
lock, which allows concurrent updates to the table: basically, only schema changes and vacuum operations are blocked while validating aCHECK
constraint.
Native enums: Alternatives to update without locking
As I mentioned above, there are alternatives to sidestep the locking of the database when you want to update an enum:
- You can alter the catalogue tables directly. However, by doing that, you will have to make sure data is consistent yourself instead of relying on the database for that, and there's always a chance of corrupting the database.
- Removing a value from an enum requires more care:
- Migrate the data.
- Add a constraint (with
NOT VALID
) requiring that the value that you want to drop from the enum is not used. - Validate the constraint.
- Run
REINDEX CONCURRENTLY
for all indexes that use the enum.- This is required because the enum value you are dropping may still be used internally in the index as a decision node to help the database know where to go when looking for a value. The internal functions that compare enums wouldn't know what to do when they find the dropped value.
- Remove the value from the catalogue tables directly.
- Drop the constraint.
However, these methods are too involved for most use cases, and they do carry some risk of corrupting the database, since you are directly modifying tables that are supposed to be internal.
Conclusion
Because of not having to fully lock down the database for the migration of the CHECK
constraint, and the relatively small disadvantages of the update procedure being a little more elaborate even in the simpler cases, and the usage of space being less efficient, we've decided to go with CHECK
constraints instead of native enums in PostgreSQL.