Tightening Constraints in Postgres

Imagine we have a user table, and users have emails.

1
create extension if not exists "uuid-ossp";
2
3
create table "user" (
4
id uuid primary key default uuid_generate_v4(),
5
email text not null
6
constraint is_valid_email_address
7
check (
8
email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
9
)
10
);

After a while, we realize our constraint was too permissive, and that invalid emails exist in our data.

1
insert into "user" (id, email)
2
values ('4e8a9373-bdef-4266-92e2-54b6b22fa970', 'john..rockefeller@standard-oil.com');

We can fix this like this.

Step 1. Add a new not valid constraint

When we mark a constraint as not valid, it is checked against new inserts and updates, but not upon constraint creation.

1
alter table "user"
2
add constraint is_valid_email_address_v2
3
check (
4
email ~* '^(?![.])(?!.*[.]{2})[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}(?<![.])$'
5
)
6
not valid;

1
insert into "user" (email)
2
values ('henry..flagler@standard-oil.com');
3
-- ERROR: new row for relation "user" violates check constraint "is_valid_email_address_v2"

Step 2. Find invalid rows and fix them

1
select id, email
2
from "user"
3
where not (
4
email ~* '^(?![.])(?!.*[.]{2})[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}(?<![.])$'
5
);

1
update "user"
2
set email = 'john.rockefeller@standard-oil.com'
3
where id = '4e8a9373-bdef-4266-92e2-54b6b22fa970';

Now, all data is valid.

Step 3. Validate the constraint

1
alter table "user"
2
validate constraint is_valid_email_address_v2;

Now we know for sure all rows in the table adhere to the constraint.

Step 4. Clean up the constraints

1
alter table "user"
2
drop constraint is_valid_email_address;
3
4
alter table "user"
5
rename constraint is_valid_email_address_v2
6
to is_valid_email_address;

Problem: After step 1, invalid rows are uneditable

The previous solution is great, but it has a problem. While the not valid constraint is in place, we can’t update other columns on existing, invalid rows, where they remain in an invalid state.

After step 1, we would get this error.

1
update "user" set id = id;
2
ERROR: new row for relation "user" violates check constraint "is_valid_email_address_v2"

And we’re not trying to edit email!

An alternate approach, that doesn’t have this problem, is to add a trigger initially, rather than a not valid constraint.

1
-- ✅ blocks new rows
2
insert into "user" (email)
3
values ('henry..flagler@standard-oil.com');
4
-- ERROR: invalid email: henry..flagler@standard-oil.com
5
6
-- ✅ allows updating of existing rows
7
update "user" set id = id;
8
-- UPDATE 1
9
10
-- ✅ allows us to fix the data
11
update "user" set email = 'john.rockefeller@standard-oil.com' where id = '4e8a9373-bdef-4266-92e2-54b6b22fa970';
12
UPDATE 1
13
14
-- ✅ does not allow us to break it again
15
update "user" set email = 'john..rockefeller@standard-oil.com' where id = '4e8a9373-bdef-4266-92e2-54b6b22fa970';
16
-- ERROR: invalid email: john..rockefeller@standard-oil.com

We can use this to validate new data, as we root out the invalid data.

Once we’ve fixed the data though, we should move to a constraint over a trigger, as it’s declarative and simpler.

”not valid” constraints work for domains as well

The same situation we’ve explored here applies to domains as well.

We can add a not valid constraint to a domain.

1
alter domain email
2
add constraint is_valid_email_address_v2
3
check (
4
value ~* '^(?![.])(?!.*[.]{2})[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}(?<![.])$'
5
)
6
not valid;

And we can validate it.

1
alter domain email
2
validate constraint is_valid_email_address_v2;

If that succeeds, we can drop the original, weaker constraint.

1
alter domain email drop constraint is_valid_email_address;
2
alter domain email rename constraint is_valid_email_address_v2 to is_valid_email_address;

Wow! You read the whole thing. People who make it this far sometimes want to receive emails when I post something new.

I also have an RSS feed.