Making PostgreSQL 9.4 Vacuuming Keep Up with Heavy Updates
28 Dec 2015
Let's say PostgreSQL has a table of 100 million rows that gets heavily updated. My experience has shown that default vacuum settings don't work very well for such a table; the dead tuples accumulate very quickly, and autovacuum doesn't kick in soon enough. Manual vacuuming, kicked off by cron, is the only way to go.
But this blog post pointed out a great way of telling autovacuum to vacuum your tables more aggressively!
I figured I had to try it for myself.
Sadly, I don't have access to a beefy server with RAID 10 to play with, so I'll just have to use my laptop. Seeing as I don't want to wear down my SSD before its time, I'll plug in external spinning platters and do this over USB 3.0. This will not be a realistic test in terms of the hardware, but hopefully I can still learn something.
As root, set up the external hard drives for PostgreSQL.
# mkdir /mnt/pg_data
# mkdir /mnt/pg_data2
Let's be sure to mount these so that every read is not also a write:
# mount /dev/sdb1 /mnt/pg_data -o noatime
# mount /dev/sdc1 /mnt/pg_data2 -o noatime
# mkdir /mnt/pg_data/pg_data
# mkdir /mnt/pg_data2/pg_data2
# chown postgres:postgres /mnt/pg_data/pg_data
# chown postgres:postgres /mnt/pg_data2/pg_data2
# chmod 700 /mnt/pg_data/pg_data
# chmod 700 /mnt/pg_data2/pg_data2
Set the external HDDs to write to their platters right away on fsync so we have durability:
# hdparm -W0 /dev/sdb
# hdparm -W0 /dev/sdc
Now let's become the postgres user and initialize a database on one of our external HDDs:
# su - postgres
$ /usr/local/pgsql-9.4.5/bin/initdb \
    --pgdata=/mnt/pg_data/pg_data \
    --encoding=UTF8 \
    --no-locale
...
Success. You can now start the database server using:

    /usr/local/pgsql-9.4.5/bin/postgres -D /mnt/pg_data/pg_data
or
    /usr/local/pgsql-9.4.5/bin/pg_ctl -D /mnt/pg_data/pg_data -l logfile start
First, let's make PostgreSQL perform reasonably well on our system. Here's a diff of the changes I made to postgresql.conf.
$ su -
# su - postgres
$ cd /mnt/pg_data/pg_data/
$ diff postgresql.conf /usr/local/pgsql-9.4.5/data/postgresql.conf
59c59
< #listen_addresses = 'localhost'    # what IP address(es) to listen on;
---
> listen_addresses = '*'             # what IP address(es) to listen on;
115c115
< shared_buffers = 128MB             # min 128kB
---
> shared_buffers = 4GB               # min 128kB
119c119
< #temp_buffers = 8MB                # min 800kB
---
> temp_buffers = 80MB                # min 800kB
126,127c126,127
< #work_mem = 4MB                    # min 64kB
< #maintenance_work_mem = 64MB       # min 1MB
---
> work_mem = 10MB                    # min 64kB
> maintenance_work_mem = 1GB         # min 1MB
147c147
< #shared_preload_libraries = ''     # (change requires restart)
---
> shared_preload_libraries = 'pg_stat_statements'    # (change requires restart)
190c190
< #wal_buffers = -1                  # min 32kB, -1 sets based on shared_buffers
---
> wal_buffers = 16MB                 # min 32kB, -1 sets based on shared_buffers
199c199
< #checkpoint_segments = 3           # in logfile segments, min 1, 16MB each
---
> checkpoint_segments = 64           # in logfile segments, min 1, 16MB each
287c287
< #effective_cache_size = 4GB
---
> effective_cache_size = 8GB
Now let's start our postgres database.
$ /usr/local/pgsql-9.4.5/bin/postgres \
    -D /mnt/pg_data/pg_data
This is now running in a terminal. Let's go to another terminal and generate some mock data.
We will do this as a regular user (it doesn't have to be root, and it doesn't have to be postgres):
$ /usr/local/pgsql-9.4.5/bin/psql -U postgres -d postgres
Let's create a tablespace on our second drive and put our table there, so that our WAL file on the first drive does not have to share IO with our table.
postgres@[local]:5432/postgres# create tablespace data2 location '/mnt/pg_data2/pg_data2';
Let's create a table to hold an id and some data. Do not bother picking a primary key or creating any indexes, so that when we populate our table with test data, it will go quickly. But, do create it in the tablespace on the disk that does not hold WAL.
postgres@[local]:5432/postgres# create table things (id bigint not null, data text not null) tablespace data2;
Now let's create our test data. We have to do this in batches of 10 million, because doing all 100 million at once made the OOM killer kick in on my little laptop, which was certainly interesting!
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(1, 10000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(10000001, 20000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(20000001, 30000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(30000001, 40000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(40000001, 50000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(50000001, 60000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(60000001, 70000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(70000001, 80000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(80000001, 90000000) as id, md5(random()::text) as data;
commit;
postgres@[local]:5432/postgres# insert into things (id, data) select generate_series(90000001, 100000000) as id, md5(random()::text) as data;
commit;
Even over USB 3, this took a long while, but it did give me the large table I need to do experiments with:
$ ls /mnt/pg_data2/pg_data2/PG_9.4_201409291/12173/
16385    16385.2  16385.4  16385.6  16385_fsm  16390
16385.1  16385.3  16385.5  16385.7  16388
We have 8 files that our table is spread across.
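Each of those numbered segments tops out at 1 GB, so the heap is roughly 7 GB. If poking at the filesystem feels indirect, you can also just ask PostgreSQL directly; something like this should do it:

-- size of the things heap itself (excluding indexes and TOAST)
select pg_size_pretty(pg_relation_size('things'));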
Now I need to build an index to make my updates go quickly.
postgres@[local]:5432/postgres# set maintenance_work_mem = '8GB';
postgres@[local]:5432/postgres# alter table things add constraint things_pk primary key (id);
commit;
Let's decide how we will tune our table for vacuuming!
I think the following is a very good start. What we are saying is that instead of waiting for a certain percentage of dead tuples to accumulate in a 100-million-row table (with the default autovacuum_vacuum_scale_factor of 0.2, that would be roughly 20 million dead tuples), we use an absolute number: when more than 5000 rows have been updated, vacuum and gather stats.
postgres@[local]:5432/postgres# alter table things set (autovacuum_vacuum_scale_factor = 0.0);
postgres@[local]:5432/postgres# alter table things set (autovacuum_vacuum_threshold = 5000);
postgres@[local]:5432/postgres# alter table things set (autovacuum_analyze_scale_factor = 0.0);
postgres@[local]:5432/postgres# alter table things set (autovacuum_analyze_threshold = 5000);
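If you want to double-check that the per-table settings stuck, they are stored as reloptions on pg_class; a query along these lines should show them:

-- per-table autovacuum settings show up in reloptions
select relname, reloptions
from pg_class
where relname = 'things';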
Also, the default statistics target is anemic!
postgres@[local]:5432/postgres# show default_statistics_target;
rollback;
┌───────────────────────────┐
│ default_statistics_target │
├───────────────────────────┤
│ 100                       │
└───────────────────────────┘
So let's fix them. Apparently, the largest statistics target that can be set for a particular column of a particular table is 10,000, so that's what I will ask for, seeing as our table is 100 million rows large!
postgres@[local]:5432/postgres# alter table things alter column id set statistics 10000;
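The new target only takes effect once statistics are actually gathered again, so an explicit analyze (or just waiting for the next autoanalyze) is needed. Something like this should confirm the setting and refresh the stats:

-- the per-column setting lands in pg_attribute.attstattarget
select attname, attstattarget
from pg_attribute
where attrelid = 'things'::regclass and attname = 'id';

-- gather fresh statistics using the new target
analyze verbose things;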
I happen to have some pg extensions installed, so let's ask our table how many dead tuples it has:
postgres@[local]:5432/postgres# \x
Expanded display is on.
postgres@[local]:5432/postgres# select * from pgstattuple('things');
rollback;
┌─[ RECORD 1 ]───────┬────────────┐
│ table_len          │ 7656112128 │
│ tuple_count        │ 100000000  │
│ tuple_len          │ 6500000000 │
│ tuple_percent      │ 84.9       │
│ dead_tuple_count   │ 264        │
│ dead_tuple_len     │ 17160      │
│ dead_tuple_percent │ 0          │
│ free_space         │ 29923712   │
│ free_percent       │ 0.39       │
└────────────────────┴────────────┘
That's not a lot of dead tuples. (There would normally be zero in a freshly loaded table, but I had been testing my update program.)
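For the record, pgstattuple is a contrib extension; if it isn't installed yet, something like this should make it available (assuming the contrib modules were built):

-- install the pgstattuple extension into the current database
create extension if not exists pgstattuple;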
Also, how often does autovacuum check to see if a table should be vacuumed?
postgres@[local]:5432/postgres# show autovacuum_naptime;
rollback;
┌────────────────────┐
│ autovacuum_naptime │
├────────────────────┤
│ 1min               │
└────────────────────┘
(1 row)
Every minute. That should be sufficient. Also nice to know we can make it more aggressive!
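If I did want autovacuum to check more often, the naptime can be lowered; something like this should work in 9.4 (the 15-second value is just an illustration, and the setting needs a config reload):

-- shorten the interval between autovacuum launcher runs
alter system set autovacuum_naptime = '15s';
select pg_reload_conf();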
Here is the program I will use to perform a lot of updates against the things table:
package main

import (
	"fmt"
	"math/rand"
	"os"
	"time"

	"github.com/jackc/pgx"
)

func main() {
	rand.Seed(time.Now().Unix())
	for i := 1; i < 4; i++ {
		go pummel(100*(i+1), i)
	}
	// I could presumably also have a guy feed a channel forever here.
	pummel(100, 0)
}

func pummel(waitTime int, workerNumber int) {
	var id int
	var runtimeParams map[string]string
	runtimeParams = make(map[string]string)
	runtimeParams["application_name"] = "pummel"

	connConfig := pgx.ConnConfig{
		User:              "postgres",
		Password:          "postgres",
		Host:              "localhost",
		Port:              5432,
		Database:          "postgres",
		TLSConfig:         nil,
		UseFallbackTLS:    false,
		FallbackTLSConfig: nil,
		RuntimeParams:     runtimeParams,
	}

	conn := connect(connConfig)
	defer conn.Close()

	prep(conn)

	for {
		// random int, 1..100,000,000
		time.Sleep(time.Duration(waitTime) * time.Millisecond)
		id = rand.Intn(100000000) + 1
		fmt.Printf("Worker %d updating id %d\n", workerNumber, id)
		update(conn, id)
	}
}

func connect(connConfig pgx.ConnConfig) (conn *pgx.Conn) {
	conn, err := pgx.Connect(connConfig)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to establish connection: %v", err)
		os.Exit(1)
	}
	return conn
}

func prep(conn *pgx.Conn) (ps *pgx.PreparedStatement) {
	sql := "update things set data = md5(random()::text) where id = $1"
	ps, err := conn.Prepare("foo", sql)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to prepare statement: %v", err)
		os.Exit(1)
	}
	return ps
}

func update(conn *pgx.Conn, id int) {
	_, err := conn.Exec("foo", id)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to update: %v", err)
		os.Exit(1)
	}
}
When pg-pummel is running, output like this scrolls by.
Worker 1 updating id 5925228
Worker 0 updating id 19234380
Worker 3 updating id 71721442
Worker 2 updating id 6166060
Worker 0 updating id 85792830
Worker 1 updating id 77373282
Worker 0 updating id 9007396
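Since pg-pummel sets application_name to pummel, it's easy to spot its connections from psql; a quick check might look like this:

-- show the pummel workers and what they are currently running
select pid, application_name, state, query
from pg_stat_activity
where application_name = 'pummel';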
Now this is interesting: I am running a lot of updates and monitoring the dead tuples in the things table, and the count keeps going up, even though autovacuum is supposed to kick in after 5000 dead tuples:
postgres@[local]:5432/postgres# select * from pgstattuple('things');
rollback;
┌─[ RECORD 1 ]───────┬────────────┐
│ table_len          │ 7656816640 │
│ tuple_count        │ 100000000  │
│ tuple_len          │ 6500000000 │
│ tuple_percent      │ 84.89      │
│ dead_tuple_count   │ 9301       │
│ dead_tuple_len     │ 604565     │
│ dead_tuple_percent │ 0.01       │
│ free_space         │ 29938828   │
│ free_percent       │ 0.39       │
└────────────────────┴────────────┘
I went looking for the vacuum process using ps -ef, as well as checking for the vacuum query in pg_stat_activity. I could not find it.
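Another place worth looking is pg_stat_user_tables, which records when autovacuum last ran on a table and how many dead tuples the stats collector thinks it has; a query along these lines should show it:

-- has autovacuum ever touched the things table, and how many dead tuples does it see?
select relname, n_dead_tup, last_autovacuum, autovacuum_count
from pg_stat_user_tables
where relname = 'things';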
Meanwhile, I'm running my pg-pummel program, so more updates are happening all the time.
OK, let's try to manually vacuum.
postgres@[local]:5432/postgres# vacuum things;
commit;
At this point, with the vacuum running, my updates seemed to have blocked, and a quick look in pg_stat_activity showed that they had:
postgres@[local]:5432/postgres# select waiting from pg_stat_activity;
rollback;
┌─────────┐
│ waiting │
├─────────┤
│ f       │
│ f       │
│ t       │
│ t       │
│ t       │
│ f       │
└─────────┘
(6 rows)
However, while the manual vacuum was still running, the updates would occasionally continue, slowly, and then stop again.
The manual vacuum finally stopped after 764450.571 ms (roughly 12.7 minutes), but the dead tuple count did go down:
postgres@[local]:5432/postgres# select * from pgstattuple('things');
rollback;
┌─[ RECORD 1 ]───────┬────────────┐
│ table_len          │ 7657553920 │
│ tuple_count        │ 99999935   │
│ tuple_len          │ 6499995775 │
│ tuple_percent      │ 84.88      │
│ dead_tuple_count   │ 3666       │
│ dead_tuple_len     │ 238290     │
│ dead_tuple_percent │ 0          │
│ free_space         │ 31052188   │
│ free_percent       │ 0.41       │
└────────────────────┴────────────┘
I then ran another manual vacuum to see how long that would take, and it took only a few seconds. This reinforced something I'd pretty much already learned from dealing with other large PostgreSQL databases: frequent vacuuming rewards you with each individual vacuum taking less time.
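For more detail on what each pass actually does, a verbose vacuum reports the pages and tuples it touched; for example:

-- same vacuum, but with per-step progress and a summary printed
vacuum verbose things;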
Sadly, it also seemed to indicate that, at least with my setup, autovacuuming wasn't being as aggressive as I wanted, because it didn't kick in even after more than twice the allowable number of dead tuples had accumulated.
It was time to turn to cron.
It should also be noted that my external hard drives were running so slowly that I'd sometimes get this in the PostgreSQL log:
LOG: using stale statistics instead of current ones because stats collector is not responding
Anyway, here is the cron job I set up to run every 2 minutes:
$ crontab -l
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58 * * * * /usr/local//pgsql-9.4.5/bin/psql -X -U postgres -d postgres -c 'vacuum things' > /dev/null
Then, I re-launched my pg-pummel program and just left it on, periodically checking the things table with pgstattuple.
That worked! I watched things' dead_tuple_count make its way up to 1685 before falling back down to 349 after a vacuum. I checked pgstattuple a few more times, waiting for the next two-minute mark, and, again, the cycle repeated: the dead tuple count climbed past 1000, then fell into the low hundreds after another vacuum kicked off by our cron job.
Very nice!
I couldn't get autovacuum to do the aggressive vacuuming for me, but the cron job every 2 minutes on a heavily updated table worked quite well.