Recommended .psqlrc
Here's what this .psqlrc does:
- Show date on startup
- Set client encoding to SQL_ASCII rather than operating-system default (which would be UTF8 on Linux)
- Always commit manually.
- Command-line prompt formatting: show the db username at the database hostname, at the database, at the port; show if we are in a transaction, and show either > or # at the prompt depending on whether or not our database user has admin-like powers.
- Instead of displaying nulls as blank space, which look the same as empty strings (but are not the same!), show nulls as <<NULL>>.
- Show pretty unicode lines around rows and columns in select results
- Show pretty lines not just between columns and rows in select results, but around the outside too.
- Turn off the pager so that results just keep scrolling by, rather than stopping
- Within columns, wrap long lines so that select results still fit on the display
- Show how long it takes to run each query.
- For pg 9.0 and above, show the application_name in pg_stat_activity; good database citizens set this field so we know who to blame when a query hogs resources, or somebody stays idle in transaction for too long.
- Set bytea output to show as many ASCII letters as possible. (Handy if you are storing text whose encoding you do not know in bytea columns.)
\echo `date +"%Y-%m-%d %H:%M:%S"` \encoding SQL_ASCII \set AUTOCOMMIT off \set PROMPT1 '%n@%m:%>/%/%x%# ' \pset null '<<NULL>>' \pset linestyle unicode \pset border 2 \pset pager off \pset format wrapped \timing set application_name to manni_desktop; commit; set bytea_output to escape; commit;
Set up Linux kernel parameters for PostgreSQL
Run this bash script:
#!/bin/bash set -e set -u SYSCTL=/sbin/sysctl echo "# add the output of this script to /etc/sysctl.conf," echo "# and then, as root, run" echo echo "# sysctl -p /etc/sysctl.conf" echo echo "# to load change the kernel settings for these parameters." echo PAGE_SIZE=$(getconf PAGE_SIZE) echo "# page size is: $PAGE_SIZE" NUM_PHYS_PAGES=$(getconf _PHYS_PAGES) echo "# number of physical pages on this box: $NUM_PHYS_PAGES" CURR_SHMALL=$($SYSCTL -n kernel.shmall) PREF_SHMALL=$(expr $NUM_PHYS_PAGES / 2) echo "# kernel.shmall should be half of the number of pages. Current kernel.shmall, in pages, is: $CURR_SHMALL" echo "# kernel.shmall should be:" echo echo "kernel.shmall = $PREF_SHMALL" echo CURR_SHMMAX=$($SYSCTL -n kernel.shmmax) PREF_SHMMAX=$(expr $PREF_SHMALL \* $PAGE_SIZE) echo "# kernel.shmmax should be half of available RAM, in kB. Current kernel.shmmax, in kB, is: $CURR_SHMMAX" echo "# kernel.shmmax should be:" echo echo "kernel.shmmax = $PREF_SHMMAX" echo # CURR_SHMMIN=$($SYSCTL -n kernel.shmmin) # XXX: does not exist on linux # CURR_SHMSEG=$($SYSCTL -n kernel.shmseg) # XXX: does not exist on linux CURR_SHMMNI=$($SYSCTL -n kernel.shmmni) echo "# kernel.shmmni is usually set to a sane amount on Linux. Currently, it is: $CURR_SHMMNI" # CURR_SEMMNI=$($SYSCTL -n kernel.semmni) # XXX: does not exist on linux # CURR_SHMMNI=$($SYSCTL -n kernel.semmns) # XXX: does not exist on linux # CURR_SHMMSL=$($SYSCTL -n kernel.semmsl) # XXX: does not exist on linux # CURR_SHMMSL=$($SYSCTL -n kernel.semmap) # XXX: does not exist on linux # CURR_SHMMSL=$($SYSCTL -n kernel.semmvx) # XXX: does not exist on linux CURR_SEM=$($SYSCTL -n kernel.sem) echo "# kernel.sem usually has sane defauls. They are currently: $CURR_SEM"
Initialise a new database cluster
I create new db clusters just infrequently enough that I never remember the flags I use when creating the cluster. In this example, I create a database using the SQL_ASCII character encoding (which is sort of the non-character-encoding character encoding, as opposed to, let's say, UTF8 or LATIN1) and collating/sorting based on the C locale, which is the "no locale" (as opposed to, let's say, en_US or fr_CA):
initdb \
--pgdata=/u1/pg/data \
--encoding=SQL_ASCII \
--locale=C
Typical changes to postgresql.conf for large database installations
If you run a postgresql database with few connections but large jobs, here are some recommended settings for postgrsql.conf. Assume a sever with 75GB of RAM.
listen_address='localhost' ==> listen_address='*' # according to http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html, # shared_buffers should be 25% of RAM, but no higher than 8GB. shared_buffers = 32MB ==> shared_buffers = 8GB temp_buffers = 8MB ==> temp_buffers = 80MB work_mem = 1MB ==> work_mem = 1GB # set higher on a per-session basis maintenance_work_mem = 16MB ==> maintenance_work_mem = 20GB # set quite high; used to build indexes and do data loads # according to http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html, # wal_buffers should be 3% of shared_buffers up to a maximum of 16MB, the size of a single WAL segment. wal_buffers = 64kB ==> wal_buffers = 16MB # according to https://www.packtpub.com/article/server-configuration-tuning-postgresql, # checkpoint segments should be much higher than the default of 3: checkpoint_segments = 3 ==> checkpoint_segments = 64 # if you are on RAID 10, the cost of accessing a random page should be much closer to the cost # of sequential IO. (In fact, if you are on SSDs, seq_page_cost and random_page_cost should equal each other.) random_page_cost = 4.0 ==> random_page_cost = 2.0 # according to http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, # "Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting, # and 3/4 of memory is a more aggressive but still reasonable amount." effective_cache_size = 128MB ==> effective_cache_size = 37GB # settings for logging: log_destination = 'stderr' logging_collector = on log_directory = 'pg_log' log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' log_truncate_on_rotation = on log_rotation_age = 1d log_rotation_size = 0 log_min_messages = debug1 # obviously rather verbose; drop to warning if your logs fill up log_line_prefix = '%t'
Create a user who can, in turn, create databases and other users
Instead of using the postgres user to create and destroy databases and users, you can create a user for that task instead:
$ sudo su - postgres $ psql template1 template1=# create user myuser superuser createdb createrole password 'mypassword'; template1=# \q
Create a new database
I create new dbs just infrequently enough that I never remember the flags I use when creating the db. In this example, I create a database using the SQL_ASCII character encoding (which is sort of the non-character-encoding character encoding, as opposed to, let's say, UTF8 or LATIN1) and collating/sorting based on the C locale, which is the "no locale" (as opposed to, let's say, en_US or fr_CA):
createdb \
-h myhost \
-U myuser \
--echo \
--owner=myuser \
--locale=C \ # only psql 9.0 and above
--encoding=SQL_ASCII \
my_new_database
Or, from the db:
$ sudo su - postgres
$ psql template0
template1=# create database my_new_database with owner = myuser template = template1
encoding = 'SQL_ASCII' lc_collate = 'C' lc_ctype = 'C';
template1=# \q
database startup script for /etc/init.d
#!/bin/bash
set -e
set -u
# chkconfig: 2345 98 02
# description: PostgreSQL RDBMS
# where to find commands like su, echo, etc...
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DB_ENCODING=SQL_ASCII
DB_LOCALE=C
PG_INSTALL_DIR=/u99/local/pgsql-9.0
PG_DATA_DIR="${PG_INSTALL_DIR}/data"
PG_SERVER_LOG="$PG_DATA_DIR/serverlog"
PG_UNIX_USER=postgres
POSTGRES="$PG_INSTALL_DIR/bin/postgres"
PG_CTL="$PG_INSTALL_DIR/bin/pg_ctl"
INITDB="$PG_INSTALL_DIR/bin/initdb"
# die on first failure; do not keep trucking
set -e
if [ $# -ne 1 ]; then
echo "please enter start/stop/restart etc..." 1>&2
exit 1
fi
# Only start if we can find postgres and pg_ctl.
if [ ! -x $PG_CTL ]; then
echo "$PG_CTL not found" 1>&2
exit 1
fi
if [ ! -x $POSTGRES ]; then
echo "$POSTGRES not found" 1>&2
exit 1
fi
case $1 in
init)
su - $PG_UNIX_USER -c "$INITDB --pgdata='$PG_DATA_DIR' --encoding=$DB_ENCODING --locale=$DB_LOCALE"
;;
start)
echo -n "Starting PostgreSQL: "
su - $PG_UNIX_USER -c "$PG_CTL start -D '$PG_DATA_DIR' -l $PG_SERVER_LOG"
echo "ok"
;;
stop)
echo -n "Stopping PostgreSQL: "
su - $PG_UNIX_USER -c "$PG_CTL stop -D '$PG_DATA_DIR' -s -m fast"
echo "ok"
;;
restart)
echo -n "Restarting PostgreSQL: "
su - $PG_UNIX_USER -c "$PG_CTL stop -D '$PG_DATA_DIR' -s -m fast -w"
su - $PG_UNIX_USER -c "$PG_CTL start -D '$PG_DATA_DIR' -l $PG_SERVER_LOG"
echo "ok"
;;
reload)
echo -n "Reload PostgreSQL: "
su - $PG_UNIX_USER -c "$PG_CTL reload -D '$PG_DATA_DIR' -s"
echo "ok"
;;
status)
su - $PG_UNIX_USER -c "$PG_CTL status -D '$PG_DATA_DIR'"
;;
*)
# Print help
echo "Usage: $0 {start|stop|restart|reload|status}" 1>&2
exit 1
;;
esac
exit 0
copy a database
When you don't specify --template for createdb, the template1 database will get used. However, a fun trick to copy a database is to just specify the database you want copied using the --template flag:
createdb \
-h myhost \
-U myuser \
--echo \
--owner=myuser \
--template=my_old_database \
my_new_database
Get sizes of objects using built-in functions
Get the size of a database:
select pg_size_pretty(pg_database_size('my_db'));
Get the size of a table (but not its toast tables or indices):
select pg_size_pretty(pg_relation_size('my_table'));
Get the size of a table and its toast tables (but not its indices):
select pg_size_pretty(pg_table_size('my_table'));
Get the size of a table and its toast tables and its indices:
select pg_size_pretty(pg_total_relation_size('my_table'));
Force use of indexes by query planner
set enable_seqscan = off; commit;
Set a sequence to the max value of a table column
select setval('my_sequence', max(my_column)) from my_table;
commit;
Set work_mem higher for this session.
If you are going to build a large index or do some other large query that can benefit from access to more RAM, set work_mem larger for your session
set work_mem = '56GB'; commit; -- Do your work here. -- Then be a good neighbour. set work_mem = default; commit;
Get a partial database dump that can be restored to a different user
How to dump a PostgreSQL database in such a way that restoring to a slightly different version of PostgreSQL on a different machine using perhaps a different user will not be a problem:
pg_dump \
--file=/path/to/dump/file.dump \
--format=custom \
--ignore-version \
--schema=public \
--schema=otherschema \
--no-owner \
--verbose \
--no-privileges \
-U user -h myhost mydatabase
How to dump only particular tables from particular schemas
pg_dump \
--file=/path/to/dump/file.dump \
--format=custom \
--ignore-version \
--table=myschema.mytable \
--table=myschema.myothertable \
--table=myotherschema.mythirdtable \
--no-owner \
--verbose \
--no-privileges \
-U user -h myhost mydatabase
How to restore a dump flexibly
Restore a dump, not caring if the user matches nor if the exact point version of PostgreSQL matches.
pg_restore \
--format=custom \
--ignore-version \
--no-owner \
--no-privileges \
--verbose \
-U user -h myhost -d mydatabase \
/path/to/dump/file.dump
How to restore only some items from a dump file
If you have a dump file that has more stuff in it than you want, or that has commands to build indexes and you don't want to build indices at restore time, you can do this:
First, get a list of all the items in the dump file, directing that list to a text file.
pg_restore \
-l /path/to/dump/file.dump \
> /path/to/dump.list
Now edit dump.list, removing anything you do not want to restore (particularly indexes).
Now feed that edited list to pg_restore.
pg_restore \
--format=custom \
--ignore-version \
--no-owner \
--use-list /path/to/dump.list \
--verbose \
--no-privileges \
-U user -h myhost -d mydatabase \
/path/to/dump/file.dump
How to list all of a table's indices
Remember, the easy way, when you are in psql, is to just do
\d mytableand look for the "Indexes" portion of the table description. Otherwise...
select t.relname as table_name,
i.relname as index_name
from pg_class as i
join pg_index as idxjoin
on i.oid = idxjoin.indexrelid
join pg_class as t
on idxjoin.indrelid = t.oid
where t.relname = 'my_table'
and i.relkind = 'i'
order by t.relname,
i.relname;
How to drop all of a table's indices
create or replace function drop_indexes_on_table(a_schema text, a_table text)
returns int
language 'plpgsql'
as $$
declare
sql_to_run text;
idx_rec record;
num_idxs_dropped int;
begin
num_idxs_dropped := 0;
raise notice 'looking for indexes for table %.%', a_schema, a_table;
for idx_rec in
select idx_info.relname as index_name
from pg_index as idx
join pg_class as tbl
on tbl.oid = idx.indrelid
join pg_namespace as schm
on tbl.relnamespace = schm.oid
join pg_class as idx_info
on idx.indexrelid = idx_info.oid
where schm.nspname = a_schema
and tbl.relname = a_table
loop
num_idxs_dropped := num_idxs_dropped + 1;
raise notice 'about to drop index % for table %.%', idx_rec.index_name, a_schema, a_table;
sql_to_run := 'drop index if exists ' || a_schema || '.' || idx_rec.index_name;
execute sql_to_run;
end loop;
return num_idxs_dropped;
end;
$$;
commit;
Get the size of a database
select relname as relname,
pg_size_pretty(sum(relpages) * 8192) as size
from pg_class
group by relname;
Find all tables with a certain column name
select n.nspname,
c.relname,
c.relnamespace,
c.relkind,
a.attname,
a.attrelid
from pg_attribute as a
join pg_catalog.pg_class as c
on a.attrelid = c.oid
join pg_catalog.pg_namespace as n
on c.relnamespace = n.oid
where a.attname = 'my_column'
and n.nspname = 'my_schema'
and c.relkind = 'r';
Get the columns that an index operates on
First get the oid of the table, the oid of the indexes, and the column numbers used by those indexes. The column numbers will be something like "1 3 4", which means the first, third, and fourth columns of the searched-for table are used by the index. (Usually, there's just one number because many indexes are made for one column.)
select tbl.oid as table_oid,
idx.indexrelid as index_oid,
idx.indkey as column_numbers,
idx_info.relname as index_name
from pg_index as idx
join pg_class as tbl on tbl.oid = idx.indrelid
join pg_namespace as schm on tbl.relnamespace = schm.oid
join pg_class as idx_info on idx.indexrelid = idx_info.oid
where schm.nspname = 'my_schema'
and tbl.relname = 'my_table';
table_oid | index_oid | column_numbers | index_name
-----------+-----------+----------------+---------------------------------
48645160 | 215366881 | 3 | my_table_my_col_idx
48645160 | 48877631 | 1 | my_table_my_other_col_idx
Now you can get the names of the columns operated on by each index. For each row (representing one index) from the above query, split i.indkey by space into column numbers. For each column number, run the following query to get the column name.
select c.attname as column_name from pg_attribute as c where c.attrelid = $table_oid -- from previous query and c.attnum = $column_number -- from previous query
Find schemas associated with a particular user
Select schemas in current db associated with user myuser, even if there are no tables in the schema, including public schema:
select 'public' as nspname
union all
select nspname
from pg_catalog.pg_namespace as nsp
join pg_catalog.pg_roles as rls
on nsp.nspowner = rls.oid
where rls.rolname = 'myuser';
Get IO stats for a particular table
select relname as "table",
heap_blks_read as "heap from disc",
heap_blks_hit as "heap from cache",
idx_blks_read as "index from disc",
idx_blks_hit as "index from cache",
toast_blks_read as "toast from disc",
toast_blks_hit as "toast from cache",
tidx_blks_read as "toast index disc",
tidx_blks_hit as "toast index cache"
from pg_statio_user_tables
where relname = 'my_table';
Read activity stats for a particular table
select relname as "table",
seq_scan as "table scans",
idx_scan as "index lookups"
from pg_stat_user_tables
where relname = 'my_table';
Read enhanced activity stats for a particular table
select relname as "table",
seq_scan as "table scans",
seq_tup_read as "tuples scanned",
idx_scan as "index lookups",
idx_tup_fetch as "tuples fetched via index"
from pg_stat_user_tables
where relname = 'my_table';
Find the size of a table
In bytes:
select relpages * 8192 as size_in_bytes from pg_class where relname = 'mytable';
In human-readable form:
select pg_size_pretty(relpages * 8192) as size from pg_class where relname = 'mytable';
Find out how many pages and tuples are used by a table
select relname as "table",
reltuples as "number of tuples",
relpages as "number of 8kb pages"
from pg_class
where relname = 'my_table';
Find live vs. dead tuples of a table
select relname as "table",
n_live_tup as "live tuples",
n_dead_tup as "dead tuples"
from pg_stat_user_tables
where relname = 'my_table';
Find out when a table was last analyzed or vacuumed
select relname as "table",
last_vacuum as "last manual vacuum",
last_autovacuum as "last auto vacuum",
last_analyze as "last manual analyze",
last_autoanalyze as "last auto analyze"
from pg_stat_user_tables
where relname = 'my_table';
Find the number of connections to your database
select datname,
procpid,
usename,
client_addr,
client_port
from pg_catalog.pg_stat_activity;
Kill a user's session based on procpid from pg_stat_activity
pg_terminate_backend(procpid);
Cancel a user's query based on procpid from pg_stat_activity
pg_cancel_backend(procpid);
Set up automatic updating of tables with updated columns
Finally: a legitimate use for triggers!
Many of your table definitions likely have these two columns:
create table users (
name text not null,
-- ... other columns ...
created timestamp with time zone not null default now(),
updated timestamp with time zone not null default now());
create table customers (
name text not null,
-- ... other columns ...
created timestamp with time zone not null default now(),
updated timestamp with time zone not null default now());
But how to you ensure that every update to a row in this table actually updates the "updated" column? A trigger, that's how:
-- Define this stored procedure once
create or replace function set_updated_column_to_now()
returns trigger as
$body$
begin
NEW.updated = now();
return NEW; -- allow the row through for the actual update
end;
$body$
language plpgsql
volatile;
-- Define a trigger for each table that has an updated column
-- (The PostgreSQL docs say that multiple triggers on the same
-- table are executed in alphabetical order, so these are named
-- with a leading aaa_ on the assumption that we want them to
-- set the updated column before other triggers operate on the
-- row.)
create trigger aaa_set_users_updated_column_to_now
before update
on users
for each row
when (NEW.updated = OLD.updated)
execute procedure set_updated_column_to_now();
create trigger aaa_set_customers_updated_column_to_now
before update
on customers
for each row
when (NEW.updated = OLD.updated)
execute procedure set_updated_column_to_now();
A function to return the extension of a filename
create or replace function extract_file_extension(filename text) returns text as $$
declare
extension text;
begin
select regexp_replace(filename, E'^.*\\.', '') into extension;
if extension = filename then
return '';
else
return extension;
end if;
end;
$$ language plpgsql;
Something you might want to turn on before exporting floats with the copy command:
Why is this not the default? I do not know...
set extra_float_digits to 3; commit;