Monday, April 09, 2007

Determining who's blocking who in Postgres

Determining who's blocking who in Postgres
If you have databases like I do with lots of concurrent queries, you can sometime run into situations where you issue a query and it just hangs there blocked. Or, more likely somebody or something issues a query and then comes calling when it doesn't seem to be doing anything.

Of course, you have the handy pg_stat_activity and pg_locks views at your disposal, but when it comes to determining exactly which queries are blocking which others and on what table, querying those alone is a tedious way to get the answer. What you really need is a query that sums it all up, in one neat and tidy bundle. Well, my friends here is such a query:

SELECT
bl.procpid as blocked_pid,
bl.usename as user,
bl.current_query as blocked_query,
bl.query_start,
relname as blocked_on ,
lq.procpid as blocking_pid,
lq.usename as user,
lq.current_query as blocking_query,
lq.query_start,
pgl2.mode as lock_type
FROM pg_stat_activity bl, pg_locks pgl1,
pg_stat_activity lq, pg_locks pgl2, pg_class
WHERE bl.procpid = pgl1.pid
AND not pgl1.granted
AND pg_class.oid = pgl1.relation
AND pgl2.relation = pgl1.relation
AND pgl2.granted
AND lq.procpid = pgl2.pid;

In extended mode (\x) psql returns something along these lines for this query:

blocked_pid | 21418
user | sueuser
blocked_query | insert values ('foo', 'bar', 'baz')
into extremely_large_table;
query_start | 2007-02-13 15:14:06.77606-08
blocked_on | extremely_large_table
blocking_pid | 21417
user | joeuser
blocking_query | delete from extremely_large_table;
query_start | 2007-02-13 14:45:34.637675-08
lock_type | AccessExclusiveLock

python spell checker

How to Write a Spelling Corrector
In the past week, two friends (Dean and Bill) independently told me they were amazed at how Google does spelling correction so well and quickly. Type in a search like [speling] and Google comes back in 0.1 seconds or so with Did you mean: spelling. What surprised me is that I thought Dean and Bill, being highly accomplished engineers and mathematicians, would have good intuitions about statistical language processing problems such as spelling correction. But they didn't, and come to think of it, there's no reason they should. I figured they and many others could benefit from an explanation, and so on the plane back from my trip I wrote a toy spelling corrector, which I now share.

Let's get right to it. I figured that in less than a plane flight, and in less than a page of code, I could write a spelling corrector that achieves 80 or 90% accuracy at a rate of at least 10 words per second. And in fact, here, in 20 lines of Python 2.5 code, is the complete spelling corrector:

import re, string, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
model = collections.defaultdict(lambda: 1)
for f in features:
model[f] += 1
return model

NWORDS = train(words(file('Documents/holmes.txt').read()))

def edits1(word):
n = len(word)
return set([word[0:i]+word[i+1:] for i in range(n)] + ## deletion
[word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + ## transposition
[word[0:i]+c+word[i+1:] for i in range(n) for c in string.lowercase] + ## alteration
[word[0:i]+c+word[i:] for i in range(n+1) for c in string.lowercase]) ## insertion

def known_edits2(word):
return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
return max(known([word]) or known(edits1(word)) or known_edits2(word) or [word],
key=lambda w: NWORDS[w])

This defines the function correct, which takes a word as input and returns a likely correction of that word. For example:

>>> correct('speling')
'spelling'
>>> correct('korrecter')
'corrector'