PostgreSQL 9.1 now available

Making for some reasonably good news on 9/11, the next version of PostgreSQL, version 9.1, has been released.

Major enhancements include:

Synchronous replication
continuing the enhancements to built-in WAL-based replication
Per-column collations
to support linguistically-correct sorting down to the column level
Unlogged tables
improving performance for the handling of ephemeral data (e.g. caches)
K-Nearest-Neighbor Indexing
indexing on distances for geographical and text-search queries
Serializable Snapshot Isolation
implementing “true serializability”
Writable Common Table Expressions
recursive and similar queries can now update data
Security Enhanced Postgres
Similar to SE-Linux, providing Mandatory Access Controls for higher grade security
Foreign Data Wrappers
attach to other databases and data sources
Extensions
managing deployment of additional database features
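
A few of these are easy to try out directly. The following is a minimal sketch against a 9.1 server; the table and index names (session_cache, events, events_archive, places) are made up for illustration and do not come from the release notes.

```sql
-- Unlogged table: skips WAL, so it is fast but is emptied after a crash.
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    text,
    expires_at timestamptz
);

-- Writable CTE: move old rows into an archive table in one statement.
CREATE TABLE events (
    id      serial PRIMARY KEY,
    payload text,
    created timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE events_archive (LIKE events);

WITH moved AS (
    DELETE FROM events
     WHERE created < now() - interval '30 days'
 RETURNING *
)
INSERT INTO events_archive SELECT * FROM moved;

-- K-nearest-neighbour: a GiST index can return rows ordered by distance.
CREATE TABLE places (name text, location point);
CREATE INDEX places_location_idx ON places USING gist (location);

SELECT name
  FROM places
 ORDER BY location <-> point '(0,0)'
 LIMIT 5;
```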

Many of these continue the trend of enhancing features added in earlier versions (e.g. synchronous replication, KNN, writable CTEs).

Some introduce new kinds of functionality (e.g. SE-Postgres, FDW, Extensions), sowing seeds that we may expect to flower into further new features in future versions.

Work on version 9.2 continues apace; I’m particularly excited about Range Types, which weren’t quite ready for 9.1.

What’s Up Lately With Slony?

What’s up Lately? 2011-04-12 Tue

Git Changeover

In July 2010, we switched over to use Git, which has been working out quite fine so far. The official repository is at; note that some developers are publishing their repositories publicly at GitHub:

You can find details at those “private” repositories of branches that the developers have opened to work on various bug fixes and features.

The next big version

We have been working on what seems most likely to be called the “2.1 release.”

  • There are quite a lot of fixes and enhancements already in place. We have been quite faithful about integrating release notes in as changes are made, so Master RELEASE notes should be quite accurate in representing what has changed. Some highlights include:
    • Changes to queries against sl_log_* tables improve performance when undergoing large backlog
    • Slonik now supports commands to bulk-add tables and sequences
    • Integration of clustertest framework that does rather more sophisticated tests, obsolescing previous “ducttape” and shell script tests.
    • Cleanup of a bunch of things
      • Use named parameters in all functions.
      • Dropped SNMP support that doesn’t seem to run anymore, and which was never part of any regression tests.
  • It is unlikely that it will get dubbed “version 3,” as there aren’t the sorts of deep changes that would warrant such.
    • The database schema has not materially changed in any way that would warrant re-initializing clusters, as was the case between version 1.2 and 2.0.
    • The changes generally aren’t really huge, with the exception of a couple of features that aren’t quite ready yet (which deserve their own separate discussion).

Still Outstanding

There are two features being worked on, which we hoped would be ready around the time of PGCon 2011:

Implicit WAIT FOR EVENT
This feature causes most Slonik commands to wait for whatever event responses should be received before they may be considered properly finished. For instance, SUBSCRIBE SET would wait until the subscription has been completed before proceeding.
Multinode FAIL OVER
For clusters where there are multiple origins for different sets, this allows reshaping the entire cluster properly, which has historically been rather more troublesome than people usually were able to recognize.

Unfortunately, neither of these is quite ready yet. It is conceivable that the automatic waiting may be mostly ready, but complications and interruptions have gotten in the way of completion of multinode failover.

When will 2.1 be ready?

Three possibilities seem to present themselves:

  1. Release what we’ve got as 2.1, let the outstanding items arrive in a future version.

    Unfortunately, this would seem to dictate that we support a “version 2.1” for an extended period of time, complete with the trouble and effort of backpatching. It’s not very attractive.
  2. Draw in Implicit WAIT FOR EVENT, which would make for a substantially more featureful 2.1, and let multinode FAIL OVER come along later.

    We had been hoping that there would be common functionality between these two features, so had imagined it a bad idea to do one without the other. But perhaps that’s wrong, and Implicit WAIT FOR EVENT doesn’t need multinode failover to be meaningful. That does seem like it may be true.

    There is still the same issue as with 1. above, that this would mean having an extra version of Slony to support, which isn’t something anyone is too keen on.

  3. Wait until it’s all ready.

    This gets rid of the version proliferation problem, but means that it’s going to be a while (several months, perhaps quite a few) before users may benefit from any of these enhancements.

    Development of the failover facility seems like it will be bottlenecked for a while on Jan, so this suggests that it may be timely to solicit features that Steve and I might work on concurrently in the interim.

So, what might still go into 2.1?

  • We periodically get bug reports from people about this and that, and minor things will certainly get drawn in, particularly if they represent incorrect behaviour.
  • ABORT script

    I plan to send a note out soon describing my thoughts thus far.
  • Cluster Analysis Tooling

    I think it would be pretty neat to connect to a Slony cluster, pull out some data, and generate some web pages and GraphViz diagrams to characterize the status and health of the cluster.
  • There was evidently discussion at PGEast about trying to get the altperl scripts improved/cleaned up.

    My personal opinion (cbbrowne) is that they’re not quite general enough, and that making them so would be more trouble than it’s worth, so my “vote” would be to deprecate them.

    But that is certainly not the only opinion out there – there are apparently others that regularly use them.

    While I’m not keen on putting effort into them, if there is some consensus on what to do, I’d go along with it. That might include:

    • Adding scripts to address slonik features that have not thus far been included in altperl.
    • Integrating tests into the set of tests run using the clustertest framework, so that we have some verification that this stuff works properly.
  • Insert Your Pet Feature Here?

    Maybe there’s some low hanging fruit that we’re not aware of that’s worth poking at.

Fast COUNT(*) in PostgreSQL

One of the frequently-asked questions about PostgreSQL is “why is SELECT COUNT(*) FROM some_table doing a slow sequential scan?”

This has been asked repeatedly on mailing lists everywhere, and the common answer in the FAQ provides a fine explanation which I shall not repeat. There is some elaboration on slow counting.
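
When an approximate answer is good enough, the planner's statistics can be read instead of scanning the table. A minimal sketch, assuming some_table exists and has been vacuumed or analyzed reasonably recently; the figure is an estimate, not an exact count:

```sql
-- reltuples is maintained by VACUUM and ANALYZE, so it lags behind reality.
SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
 WHERE oid = 'some_table'::regclass;
```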

Regrettably, the proposed alternative solutions aren’t always quite so fine. The one that is most typically pointed out is this one: Tracking the row count.

How Tracking the row count works

The idea is fine, at least at first blush:

  • Set up a table that captures row counts
CREATE TABLE rowcounts (
  table_name text not null primary key,
  total_rows bigint);
  • Initialize row counts for the desired tables
DELETE FROM rowcounts WHERE table_name = 'my_table';
INSERT INTO ROWCOUNTS (table_name, total_rows) SELECT 'my_table', count(*) from my_table;
  • Establish trigger function on my_table which has the following logic
if tg_op = 'INSERT' then
   update rowcounts set total_rows = total_rows + 1
     where table_name = 'my_table';
elsif tg_op = 'DELETE' then
   update rowcounts set total_rows = total_rows - 1
     where table_name = 'my_table';
end if;
  • If you want to know the size of my_table, then query
SELECT total_rows FROM rowcounts WHERE table_name = 'my_table';

The problem with this approach

On the face of it, it looks fine, but regrettably, it doesn’t work out happily under conditions of concurrency. If there are multiple connections trying to INSERT or DELETE on my_table, concurrently, then all require an exclusive lock on the tuple in rowcounts for my_table, and there is a risk (heading towards unity) of:

  1. Deadlock, if different connections access data in incompatible orderings
  2. Lock contention, leading to delays
  3. If some of the connections are running in SERIALIZABLE mode, rollbacks due to inability to serialize this update

So, there is risk of delay, or, rather worse, that this counting process causes otherwise perfectly legitimate transactions to fail. Eek!

A non-locking solution

I suggest a different approach, which eliminates the locking problem, in that:

  • The triggers are set up to only ever INSERT into the rowcounts
  • An asynchronous process does summarization, to shorten rowcounts
  • I’d be inclined to use a stored function to query rowcounts

Table definition

CREATE TABLE rowcounts (
    table_name text not null,
    total_rows bigint,
    id serial primary key);
create index rc_by_table on rowcounts(table_name);

I add the id column for the sake of nit-picking normalization, so that anyone that demands a primary key gets what they demand. I’d not be hugely uncomfortable with leaving it off.

Trigger strategy

The triggers have the following form:

if tg_op = 'INSERT' then
   insert into rowcounts(table_name,total_rows) values ('my_table',1);
elsif tg_op = 'DELETE' then
   insert into rowcounts(table_name,total_rows) values ('my_table',-1);
end if;

Note that since the triggers only ever INSERT into rowcounts, they no longer interact with one another in a way that would lead to locks or deadlocks.
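
For completeness, here is one way the fragment above might be wrapped into a full trigger. This is a sketch under assumptions: the function name rowcount_track and the trigger name my_table_rowcount are invented for illustration, my_table is assumed to exist already, and tg_table_name is used in place of the hard-coded 'my_table' so that one function can serve several tables.

```sql
create or replace function rowcount_track() returns trigger as $$
begin
   if tg_op = 'INSERT' then
      insert into rowcounts(table_name, total_rows) values (tg_table_name, 1);
   elsif tg_op = 'DELETE' then
      insert into rowcounts(table_name, total_rows) values (tg_table_name, -1);
   end if;
   return null;  -- the return value of an AFTER row trigger is ignored
end;
$$ language plpgsql;

-- Row-level trigger; note that TRUNCATE does not fire it, so a truncated
-- table would need its count reset separately.
create trigger my_table_rowcount
   after insert or delete on my_table
   for each row execute procedure rowcount_track();
```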

Function to return row count

create or replace function row_count(i_table text) returns integer as $$
begin
   -- Add up all the +1/-1 deltas (and any summarized row) for this table
   return (select sum(total_rows) from rowcounts where table_name = i_table);
end;
$$ language plpgsql;

It would be tempting to have this function itself do a “shortening” of the table, but, that would reintroduce into the application the locking that we were wanting to avoid. So DELETE/UPDATE are still deferred.

Function to clean up row counts table

This function needs to be run once in a while to summarize the table contents.

create or replace function rowcount_cleanse() returns integer as $$
declare
   prec record;
begin
   for prec in select table_name, sum(total_rows) as sum, count(*) as count from rowcounts group by table_name loop
       -- Only tables with more than one row need collapsing
       if prec.count > 1 then
          delete from rowcounts where table_name = prec.table_name;
          insert into rowcounts (table_name, total_rows) values (prec.table_name, prec.sum);
       end if;
   end loop;
   return 0;
end;
$$ language plpgsql;
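
One caveat with a separate DELETE followed by an INSERT: under the default READ COMMITTED isolation level, the DELETE may remove delta rows that were committed after the loop's SELECT computed its sum, and those deltas would then be lost. On 9.1 or later, a writable CTE is one possible way around that, since it sums exactly the rows it deleted. This is a sketch of an alternative, not the function above:

```sql
-- Collapse all delta rows into one row per table, in a single statement.
-- Rows inserted by concurrent transactions are invisible to this statement's
-- snapshot, so they are neither deleted nor summed, and survive untouched.
WITH deleted AS (
    DELETE FROM rowcounts
 RETURNING table_name, total_rows
)
INSERT INTO rowcounts (table_name, total_rows)
SELECT table_name, sum(total_rows)
  FROM deleted
 GROUP BY table_name;
```

It could be run periodically from cron or a similar scheduler, in place of calling rowcount_cleanse().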

Initializing rowcounts for a table that is already populated

Nothing has yet been mentioned that would cause an initial entry to go into rowcounts for an already-populated table.

create or replace function rowcount_new_table(i_table text) returns integer as $$
declare
   query text;
begin
   delete from rowcounts where table_name = i_table;
   -- quote_literal/quote_ident guard the dynamic SQL; i_table is an unqualified table name
   query := 'insert into rowcounts(table_name, total_rows) select ' || quote_literal(i_table) || ', count(*) from ' || quote_ident(i_table) || ';';
   execute query;
   return (select total_rows from rowcounts where table_name = i_table);
end;
$$ language plpgsql;

If a table already has data in it, then it’s necessary to populate rowcounts with an initial count before relying on the triggers; the function above does that for a given table.

Further enhancements possible

It is possible to shift some of the maintenance back into the row_count() function, if we do some exception handling.

create or replace function row_count(i_table text) returns integer as $$
declare
   prec record;
begin
   -- Try to grab the table; if that succeeds, summarize while we are here
   lock table rowcounts nowait;
   select sum(total_rows) as sum, count(*) as count into prec from rowcounts where table_name = i_table;
   if prec.count > 1 then
       delete from rowcounts where table_name = i_table;
       insert into rowcounts (table_name, total_rows) values (i_table, prec.sum);
   end if;
   return prec.sum;
exception when lock_not_available then
   -- Someone else holds a lock: fall back to the plain, non-locking sum
   return (select sum(total_rows) from rowcounts where table_name = i_table);
end;
$$ language plpgsql;

This is more than a little risky: if this function wins the lock, it will block other processes that wish to access row counts until it’s done. This likely isn’t a worthwhile exercise.

Please Send A Patch

Recent Debian blog entries with this title (by Lucas Nussbaum, Matt Palmer) point out assortedly that:

  • Existing developers frequently know the code base so much better than newcomers that they’re likely way more effective at improving things than some callow newcomer.
  • Taking those developers’ time to do your pet thing instead of something they find useful mayn’t be more effective.

Both points are quite valid, and recent PostgreSQL CommitFest activity suggests a way to at least try to evaluate things.

The PostgreSQL project has a number of committers that are unusually productive developers (-1 from me, Tom? :-)), and there have certainly been times when the “best” outcome has been for someone to come in suggesting ideas, and for one of the notably productive folk to implement it.

But there has been some debate surrounding the 2011-01 CommitFest, which consists of some 98 proposed patches, all of which require review. These are all, in fact, patches that came as some sort of response to Please send a patch :-). The trouble with this particular CommitFest is that the patches have been overwhelming the reviewers in terms of sheer volume. Developers that should be considering working on their own “pet features” have been drawn into the review process to look at others’ features instead. None of these results are inherently a bad thing, except for the aggregate that falls out, which is that there’s so much stuff outstanding that it’s tough to get them all properly reviewed.

If a project is busy and vital, it’s pretty necessary for people to do a fair bit of “scratching their own itches” (in keeping with Matt Palmer’s comment) in order to grow the community of people capable of giving real assistance to managing the code base.

“Growing community” requires that some people struggle with the code base a bit so that they become familiar enough to become effective in the future.

NoSQL’s next step – stored procedures

The latest discovery is that the “bad old stored procedures” of SQL… Are what NoSQL needs…

They’re calling them coprocessors or plugins, and it’s truly not terribly surprising. The High Scalability article makes a Battlestar Galactica joke. The BSG line that kept coming back over and over was: All this has happened before, and all this will happen again. There’s a rather depressing possibility that people will consider coprocessors to be the greatest thing ever, not realizing that a substantial chunk of the issues that hold true (for better and worse) for SQL stored procedures will also hold true for coprocessors, issues that they may learn (or fail to learn!) from scratch.

The notion is that you colocate, along with your database, some kind of “coprocessor engine” that can run code locally, which solves a number of problems, some not new, but some somewhat unique to key/value stores:


You’re running your application in the cloud and have somewhat spotty connectivity between the place where your application logic runs and the database where the data is stored. A coprocessor brings logic right near the database, resolving this problem.

Bulk data transfer

A difference between SQL and key/value stores is that SQL is quite happy shovelling sets of data back and forth, whereas key/value stores are all about singular key/value pairs. An SQL request readily “scales” by transferring data in bulk, whereas key/value can get bogged down by there being a zillion network round trips. A coprocessor can keep a bunch of those “round trips” inside the database layer, which will be a big win.

Goodbye, foreign keys, hello, um, ???

You may be able to shove some combination of logic maintenance and such into the coprocessor area, thereby gaining back some of the things lost when NoSQL eschewed SQL foreign key references and triggers.

Data normalization analysis returns

One of the typical things to do with NoSQL is to “shard” the database so each database server only has part of the data, and may operate independently of other database servers.

Coprocessor use will require that all the data that is to be used is on the local server, otherwise you head back to the problem of shovelling tuples back and forth between DB servers with the zillions of network roundtrips problem.

To guard against that, the data needs to be normalized in such a way that the data relevant to the coprocessors is available locally. (Perhaps not exclusively, but generally so. A few round trips may be OK, but not zillions.)

It seems to me that people have been excited by NoSQL in part because they could get away from all that irritating SQL normalization rules stuff. But this bit implies that this benefit was something of a mirage. Perhaps the precise rules of Boyce-Codd Normal Form are no longer crucial, but you’ll still need to have some kind of calculus to ascertain which divisions work and which don’t.

Things still not clear about this…

Managing the coprocessors

One of the challenges faced in SQL systems that use a lot of stored procedures is that of managing these procedures, complete with versioning (because what goes into production on day #1 isn’t what will be there forever, right?).

Windows always used to suffer (may still suffer, for all I know) from dependency hell, where different applications may need competing versions of libraries. (Entertainment of the week was seeing that the Haskell folks are, of late, running into this. Not intended as insult; it’s a problem that is nontrivial to avoid.)

It’s surely needful to have some kind of coprocessor dictionary to keep this sort of thing under some control. It’s never been trivial for any system, so there’s room for:

* Repeating yesteryear’s errors

* Learning from other systems’ mistakes

* Discovering brand new kinds of mistakes

How rich should the coprocessor environment be?

On the powerful side, a rich environment surely is neat, but having the ability to run arbitrary code there is risky…

How auditable will these systems be?

On the positive side, it’s presumably plausible to add auditing coprocessors to capture interesting information for regulatory purposes.

On the other hand, arbitrarily powerful things like node.js might make it arbitrarily easy to evade regulation.

There aren’t necessarily easy answers to that.

Aside: org2blog mode is pretty nifty…  Made it pretty easy to build this without much tagging effort…

PostgreSQL 9.0 released!

A new release of the most advanced open source database is now available!

As always, as a new major release, there are great gobs of little features that have been added, most of which, individually, likely don’t matter to any particular individual. (For instance, there are a couple dozen enhancements to ECPG, and if you don’t know you’re using that, you almost certainly aren’t, and so those changes likely don’t affect you.)

But there are plenty that are liable to matter, and, indeed, to help improve behaviour of one’s streams of queries, often without even needing any changes to applications.

See also the official release notice, for “markety-speak.”

And see official release notes (that are part of the documentation tree) for deeper details of all the changes in the new release.

PGAN proposal

David Wheeler has recently published a proposal for PGAN, which is essentially like a CPAN for PostgreSQL.

It combines:

  • The use of PGXS to enable building C-based extensions
  • Standardized metadata about any given module

It assumes the use of some specific technologies, which probably warrants some further discussion (because people are sure to have vigorous disagreement about some of them, not because his choices are bad, but because people are a problem!):

  • PGXS, which is a given that seems entirely apropos
  • JSON as a data format for the metadata
  • A choice of pg_regress or pgTAP regression tests.
    (Aficionados of other test frameworks may think differently!)
  • HTML documentation
    It’s worth observing that PostgreSQL contrib material has been trending towards use of DocBook.
    Other analogous structures like CPAN have developed formats such as POD that enable multi-format output, as well as the notion of transforming documentation into deployable man pages.

None of the possible differences are forcibly objectionable; they’re just different options that may be worth considering.

See also…  David Wheeler’s Blog

    More Slony work

    I have been way way too busy to do substantial Slony work in a while. Very very engaged on internal (infernal?) DB apps work.

    At long last I reached a certain degree of completion that allowed me a breather, and a little time to look at Slony 2.0 issues.

    I have been experimenting with Git lately, in several contexts, so pulled the PostgreSQL Git repo, with a view to using that as my “PostgreSQL HEAD” for testing. While the “official” PG version for our apps is 8.3, I usually do my builds/tests on either 8.4 or CVS HEAD, or, I guess, now, Git “master” ;-).

    After checking out Git master, I found problems with both the internal app (a minor thing in accessing information_schema) and, alas, Slony :-(. A function now has 3 arguments (and, in the Klingon tradition, always wins them!), thus needing a bit of autoconf remediation. I hate autoconf… But absent some substantial Tilbrook contributions, that won’t be changing soon! 🙁

    I surely hope I can run through a set of regression tests this coming week so as to get 2.0.3 released!