PostgreSQL and bash Stuff

Best way to execute sql scripts using psql

Usually, the most desired way to run an sql script through psql is 1) with the ability to feed it variables that get evaluated by the script, and 2) with the expectation that the script die after the first error that gets encountered (rather than continuing on, as is the default).

Here's a sample that shows these two traits in action.

First, make a wrapper shell script to control how you feed your sql script to psql. In this example, we assume that there are two exactly two required arguments we want to give our shell script that we want to feed to our sql script.

#!/bin/bash

set -e
set -u

if [ $# != 2 ]; then
    echo "please enter a db host and a table suffix"
    exit 1
fi

export DBHOST=$1
export TSUFF=$2

psql \
    -X \
    -U user \
    -h $DBHOST \
    -f /path/to/sql/file.sql \
    --echo-all \
    --set AUTOCOMMIT=off \
    --set ON_ERROR_STOP=on \
    --set TSUFF=$TSUFF \
    --set QTSTUFF=\'$TSUFF\' \
    mydatabase

psql_exit_status = $?

if [ $psql_exit_status != 0 ]; then
    echo "psql failed while trying to run this sql script" 1>&2
    exit $psql_exit_status
fi

echo "sql script successful"
exit 0

A few notes on the above shell script:

TNUM is fed to psql as both a bare value and a value wrapped in single quotes, so that we can use the value in table or schema names (TSUFF, unquoted) or in strings (QTSUFF, quoted)
We use -X to ensure we do not use the current unix users's .psqlrc file
We echo all messages to the console, so we know what sql is being executed (handy in the face of failures)
we use ON_ERROR_STOP to stop our sql script as soon as something goes wrong
We set AUTOCOMMIT off so that we do not get a commit after every statement in file.sql; instead, wherever there is a "commit;" in file.sql, there will be a commit. If you want the entire script to run as one transaction, put a "commit;" at then end of your sql script (and nowhere else); otherwise, the script will run successfully and then not get committed (roll back) at the end! See also the next tip on the --single-transaction flag.

Here are the contents of /path/to/sql/file.sql:

begin;
drop index this_index_:TSUFF;
commit;

begin;
create table new_table_:TSUFF (
    greeting text not null default '');
commit;

begin;
insert into new_table_:TSUFF (greeting)
values ('Hello from table ' || :QTSUFF);
commit;

Using the PG* environmental variables to make your shell scripts more terse.

For unit testing at a certain place I once worked for, this came in handy for database setup/teardown scripts. We would use the PG* environmental variables in our shell scripts, so that instead of running psql with -U for user and -h for host, we would allow it to pick up PGUSER and PGHOST from the environment we were running the shell script in. For safety, we would use bash's ability to provide a default for each PG* environmental variable as we were setting it. This worked quite well.

#!/bin/bash

set -e
set -u

# Set these environmental variables to override them,
# but they have safe defaults.
export PGHOST=${PGHOST-localhost}
export PGPORT=${PGPORT-5432}
export PGDATABASE=${PGDATABASE-my_database}
export PGUSER=${PGUSER-my_user}
export PGPASSWORD=${PGPASSWORD-my_password}

RUN_PSQL="psql -X --set AUTOCOMMIT=off --set ON_ERROR_STOP=on "

${RUN_PSQL} <<SQL
select blah_column 
  from blahs 
 where blah_column = 'foo';
rollback;
SQL

Run a bunch of sql commands in a single transaction.

Often times, you will have a text file filled with SQL commands that you want to execute as a single transaction, and it's easy to forget to put the "commit;" as the last command in the file. There's a way around this, using the --single-transaction flag:

psql \
    -X \
    -U myuser \
    -h myhost \
    -f /path/to/sql/file.sql \
    --echo-all \
    --single-transaction \
    --set AUTOCOMMIT=off \
    --set ON_ERROR_STOP=on \
    mydatabase

And so the contents of file.sql could be:

insert into foo (bar) values ('baz');
insert into yikes (mycol) values ('hello');

and both inserts will be wrapped in a begin/commit.

Make running lots of multi-line sql statements prettier

#!/bin/bash

set -e
set -u

RUN_ON_MYDB="psql -X -U myuser -h myhost --set ON_ERROR_STOP=on --set AUTOCOMMIT=off mydb"

$RUN_ON_MYDB <<SQL
drop schema if exists new_my_schema;
create table my_new_schema.my_new_table (like my_schema.my_table);
create table my_new_schema.my_new_table2 (like my_schema.my_table2);
commit;
SQL

$RUN_ON_MYDB <<SQL
create index my_new_table_id_idx on my_new_schema.my_new_table(id);
create index my_new_table2_id_idx on my_new_schema.my_new_table2(id);
commit;
SQL

Note too that you can use fun bash tricks to assign to multiline variables and feed those to psql later:

CREATE_MY_TABLE_SQL=$(cat <<EOF
    create table foo (
        id bigint not null,
        name text not null);
EOF
)

$RUN_ON_MYDB <<SQL
$CREATE_MY_TABLE_SQL
commit;
SQL

How to get a single scalar select value into a shell variable

CURRENT_ID=$($PSQL -X -U $PROD_USER -h myhost -P t -P format=unaligned $PROD_DB -c "select max(id) from users")
let NEXT_ID=CURRENT_ID+1
echo "next user.id is $NEXT_ID"

echo "about to reset user id sequence on other database"
$PSQL -X -U $DEV_USER $DEV_DB -c "alter sequence user_ids restart with $NEXT_ID"

How to get a one row select into bash variables named for each column in the row

This is my new favourite way:

read username first_name last_name <<< $(psql \
    -X \
    -U myuser \
    -h myhost \
    -d mydb \
    --single-transaction \
    --set ON_ERROR_STOP=on \
    --no-align \
    -t \
    --field-separator ' ' \
    --quiet \
    -c "select username, first_name, last_name from users where id = 5489")

echo "username: $username, first_name: $first_name, last_name: $last_name"

How sweet is that? Another way is to read it into an array:

#!/bin/bash

set -e
set -u

declare -a ROW=($(psql \
    -X \
    -h myhost \
    -U myuser \
    -c "select username, first_name, last_name from users where id = 5489" \
    --single-transaction \
    --set AUTOCOMMIT=off \
    --set ON_ERROR_STOP=on \
    --no-align \
    -t \
    --field-separator ' ' \
    --quiet \
    mydb))

username=${ROW[0]}
first_name=${ROW[1]}
last_name=${ROW[2]}

echo "username: $username, first_name: $first_name, last_name: $last_name"

How to iterate through a result set from a bash script

12 May 2014

This is my new favourite way:

#!/bin/bash

set -e
set -u

PSQL=/usr/bin/psql

DB_USER=myuser
DB_HOST=myhost
DB_NAME=mydb

$PSQL \
    -X \
    -h $DB_HOST \
    -U $DB_USER \
    -c "select username, password, first_name, last_name from users" \
    --single-transaction \
    --set AUTOCOMMIT=off \
    --set ON_ERROR_STOP=on \
    --no-align \
    -t \
    --field-separator ' ' \
    --quiet \
    -d $DB_NAME \
| while read username password first_name last_name ; do
    echo "USER: $username $password $first_name $last_name"
done

Also, you can read into an array:

#!/bin/bash

set -e
set -u

PSQL=/usr/bin/psql

DB_USER=myuser
DB_HOST=myhost
DB_NAME=mydb

$PSQL \
    -X \
    -h $DB_HOST \
    -U $DB_USER \
    -c "select username, password, first_name, last_name from users" \
    --single-transaction \
    --set AUTOCOMMIT=off \
    --set ON_ERROR_STOP=on \
    --no-align \
    -t \
    --field-separator ' ' \
    --quiet \
    $DB_NAME | while read -a Record ; do

    username=${Record[0]}
    password=${Record[1]}
    first_name=${Record[2]}
    last_name=${Record[3]}

    echo "USER: $username $password $first_name $last_name"
done

How use bash to do a job in chunks using a control table

Let's pretend you have a job so large that you want to do it a bit at a time. You decide you can do your job an item at a time, and that will be easier on the database instead of doing one long-running query. You create a table called my_schema.items_to_process, which has the item_id of each item you want to process, and you add a column to that items_to_process table called done, which defaults to false. You can then use a script that gets each not-done item from items_to_process, processes it, then updates that item to done = true in items_to_process. A bash script to do that would look something like this:

#!/bin/bash

set -e
set -u

PSQL="/u99/pgsql-9.1/bin/psql"
DNL_TABLE="items_to_process"
#DNL_TABLE="test"
FETCH_QUERY="select item_id from my_schema.${DNL_TABLE} where done is false limit 1"

process_item() {
    local item_id=$1
    local dt=$(date)
    echo "[${dt}] processing item_id $item_id"
    $PSQL -X -U myuser -h myhost -c "insert into my_schema.thingies select thingie_id, salutation, name, ddr from thingies where item_id = $item_id and salutation like 'Mr.%'" mydb
}

item_id=$($PSQL -X -U myuser -h myhost -P t -P format=unaligned -c "${FETCH_QUERY}" mydb)
dt=$(date)
while [ -n "$item_id" ]; do
    process_item $item_id
    echo "[${dt}] marking item_id $item_id as done..."
    $PSQL -X -U myuser -h myhost -c "update my_schema.${DNL_TABLE} set done = true where item_id = $item_id" mydb
    item_id=$($PSQL -X -U myuser -h myhost -P t -P format=unaligned -c "${FETCH_QUERY}" mydb)
    dt=$(date)
done

How to copy a table from one database to another

There are many ways to do this, but the psql client's \copy command can be used to copy data from one table to another. Let's say you have two databases, olddb and newdb. olddb has a table called users, and newdb has a new, empty table called users that is currently empty but needs the data from olddb's users. Here's how to accomplish this on the command line, piping the output of one psql session into a second psql session:

psql \
    -X \
    -U user \
    -h oldhost \
    -d olddb \
    -c "\\copy users to stdout" \
| \
psql \
    -X \
    -U user \
    -h newhost \
    -d newdb \
    -c "\\copy users from stdin"

A NOTE ON FAILING: I've gotten in teh habit of putting two things at the top of all of my bash scripts:

# fail on uninitialized vars rather than treating them as null
set -u
# fail on the first program that returns $? != 0
set -e

However, when two programs run at once (one piping into the other), the exit status of the last program ends up in $?, and the exit status of the first program gets lost. (Slight lie there: The exit status of the last program in the example above is also put in ${PIPESTATUS[1]}, and the exit status of the first program is put in ${PIPESTATUS[0]}.) When I want the shell script to fail on the failure of any program in the pipeline, I then set three things at the top of my shell script:

set -u
set -e
set -o pipefail

A MORE DIFFICULT EXAMPLE: Let's say your users table in olddb has three columns:

first_name
middle_name
last_name

But your users table in newdb has two columns:

first_name
last_name

You can still use psql's copy command to get what you want:

psql \
    -X \
    -U user \
    -h oldhost \
    -d olddb \
    -c "\\copy (select first_name, last_name from users) to stdout" \
| \
psql \
    -X \
    -U user \
    -h newhost \
    -d newdb \
    -c "\\copy users from stdin"

Here's a good way to get the table definition:

pg_dump \
    -U db_user \
    -h db_host \
    -p 55432 \
    --table my_table \
    --schema-only my_db

Shell script to dump contents of bytea column to a file

Please note that bytea columns, in pg 9.0 and above, display as hex, with an irritating leading '\x', which can be removed using pg's substring funciton.

#!/bin/bash

set -e
set -u

psql \
    -P t \
    -P format=unaligned \
    -X \
    -U myuser \
    -h myhost \
    -c "select substring(my_bytea_col::text from 3) from my_table where id = 12" \
    mydb \
| xxd -r -p > dump.txt

Shell script to show stats of a particular table in a particular database

#!/bin/bash

set -e
set -u

if [ -z "$1" ]; then
    echo "Usage: $0 table [db]"
    exit 1
fi

SCMTBL="$1"
SCHEMANAME="${SCMTBL%%.*}"  # everything before the dot (or SCMTBL if there is no dot)
TABLENAME="${SCMTBL#*.}"  # everything after the dot (or SCMTBL if there is no dot)

if [ "${SCHEMANAME}" = "${TABLENAME}" ]; then
    SCHEMANAME="public"
fi

if [ -n "$2" ]; then
    DB="$2"
else
    DB="my_default_db"
fi

PSQL="psql -U my_default_user -h my_default_host -d $DB -x -c "

$PSQL "
select '-----------' as \"-------------\", 
       schemaname,
       tablename,
       attname,
       null_frac,
       avg_width,
       n_distinct,
       correlation,
       most_common_vals,
       most_common_freqs,
       histogram_bounds
  from pg_stats
 where schemaname='$SCHEMANAME'
   and tablename='$TABLENAME';
" | grep -v "\-\[ RECORD "

And you use it like so:

./table-stats.sh myschema.mytable

or for a table in the public schema:

./table-stats.sh mytable

or for not your default database:

./table-stats.sh mytable myotherdb