Diffing Tables in PostgreSQL

I sometimes have to diff two tables in PostgreSQL. My time-honored way is to use except:

First, I'll create some temporary tables to play with. Note the use of table as a command that's syntactic sugar for select * from.

create temporary table t1 (a int not null, b int not null);
create temporary table t2 (a int not null, b int not null);
insert into t1 (a, b) values (1, 2), (3, 4), (5, 6);
insert into t2 (a, b) values         (3, 4), (5, 6), (7, 8);

table t1;
┌───┬───┐
│ a │ b │
├───┼───┤
│ 1 │ 2 │
│ 3 │ 4 │
│ 5 │ 6 │
└───┴───┘
(3 rows)

table t2;
┌───┬───┐
│ a │ b │
├───┼───┤
│ 3 │ 4 │
│ 5 │ 6 │
│ 7 │ 8 │
└───┴───┘
(3 rows)

Here's how I would usually diff the tables: use except to get the rows in t1 that were not in t2, and then again to get the rows in t2 that were not in t1.

using except.

table t1 except table t2;
┌───┬───┐
│ a │ b │
├───┼───┤
│ 1 │ 2 │
└───┴───┘
(1 row)

table t2 except table t1;
┌───┬───┐
│ a │ b │
├───┼───┤
│ 7 │ 8 │
└───┴───┘
(1 row)

Some tips at The jooq blog go way further. First, why not combine our efforts using union?

(table t1 except table t2)
union
(table t2 except table t1);
┌───┬───┐
│ a │ b │
├───┼───┤
│ 7 │ 8 │
│ 1 │ 2 │
└───┴───┘
(2 rows)

Getting fancier, it might be nice to order the returned rows:

(table t1 except table t2)
union
(table t2 except table t1)
order by a, b;
┌───┬───┐
│ a │ b │
├───┼───┤
│ 1 │ 2 │
│ 7 │ 8 │
└───┴───┘
(2 rows)

The jooq blog shows us something fancier using natural full join. It's fancier in two ways: 1) it uses natural full join to join on any identically-named columns from both tables (in this case, all of the columns, because our test tables have the same definitions), and 2) it only does a full table scan of t1 once and t2 once, rather than twice for each table as with the previous methods we were using.

First, let's look at the result of a natural full join where we purposefully add a column specific to each table we are comparing:

select * from
(select t1.*, 't1' as tbl_t1 from t1) as t1
natural full join
(select t2.*, 't2' as tbl_t2 from t2) as t2;
┌───┬───┬────────┬────────┐
│ a │ b │ tbl_t1 │ tbl_t2 │
├───┼───┼────────┼────────┤
│ 1 │ 2 │ t1     │ [NULL] │
│ 3 │ 4 │ t1     │ t2     │
│ 5 │ 6 │ t1     │ t2     │
│ 7 │ 8 │ [NULL] │ t2     │
└───┴───┴────────┴────────┘
(4 rows)

Nifty. Of course, we only want the rows that were different, so we would do:

select * from
(select t1.*, 't1' as tbl_t1 from t1) as t1
natural full join
(select t2.*, 't2' as tbl_t2 from t2) as t2
where tbl_t1 is null or tbl_t2 is null
order by a, b;
┌───┬───┬────────┬────────┐
│ a │ b │ tbl_t1 │ tbl_t2 │
├───┼───┼────────┼────────┤
│ 1 │ 2 │ t1     │ [NULL] │
│ 7 │ 8 │ [NULL] │ t2     │
└───┴───┴────────┴────────┘
(4 rows)

Nice. But BE WARE: once your tables have nulls in them, except will work more reliably, because natural full join will not join identical rows in both tables when those identical rows contain nulls; because null does not equal null. except will still work, though.