What’s Blocking My Migration?

At Axial, as at many other companies, we have software written in the olden days still in active use. It does what it’s supposed to do, so replacing it is not the highest priority. However, sometimes these older scripts keep database transactions open longer than are necessary, and those transactions can hold locks.

This poses a problem when we’re running Alembic migrations; modifying a table requires a lock not just on that table, but on other tables, depending on the operation. (For example, adding a foreign key requires a lock on the destination table to make sure the foreign key constraint is enforced until the transaction is committed.) Sometimes we developers run a migration and it seems to be taking a long time; usually this is because it’s waiting for a lock. Fortunately, it’s really easy to check on the status of queries and locks in PostgreSQL.

There are three system views we care about: pg_class contains data about tables and table-like constructs, pg_locks contains the locking information (with foreign keys to pg_class), and pg_stat_activity contains information on every connection and what it’s doing. By combining data from these three views (technically pg_class is a system catalog, not a view, but whatever) we can determine which connections are holding locks on which tables, and on which tables a given connection is awaiting locks. We have a script called whats_blocking_my_migration that queries these views.

Rather than explain the columns on these views (which you can see very easily in the docs anyway), I’ll show you the queries we execute.

The first thing we do is get information on Alembic’s connection to the DB. In env.py, we set the “application_name” with the connect_args argument to create_engine; this way we can identify the connection Alembic is using. Once we do that, we can use application_name the check the status of the Alembic connection. If there are no rows, or if none of the connection rows are actually waiting, the script prints that info and exits.

SELECT pid, state, waiting FROM pg_stat_activity WHERE application_name = 'alembic';

However, if an Alembic connection is “waiting”, that means something’s holding a lock that it needs. To find out what, we use the pid (which is the pid of the Postgres process handling the connection) to query pg_locks. Our script actually does a join against pg_class to obtain the table name (and make sure the relation is kind ‘r’, meaning an actual table), but the table name can also be obtained directly from the relation by casting it from oid to regclass.

SELECT relation, mode FROM pg_locks WHERE lock type = 'relation' AND granted IS FALSE and pid IN alembic_pids;

This gets us the list of tables that the Alembic connection needs to lock, but there’s one more step we can take. pg_class will also tell us the pids of the connections that are holding locks on these tables, and we can join that pid back against pg_stat_activity to obtain the address and port number of the TCP connection holding the lock. (Unfortunately, most of our connections don’t yet populate application_name, so we have to provide this info, which the user can cross-reference with lsof -i to figure out which service is holding the lock.) The query looks a bit like this:

SELECT client_addr, client_port FROM pg_stat_activity JOIN pg_locks ON pg_stat_activity.pid = pg_locks.pid WHERE pg_locks.granted IS TRUE AND pg_locks.relation IN locked_relations;

There’s no real secret sauce here; it’s just some basic queries against the Postgres system views, but this makes the lives of our developers a little bit easier.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s