pgpool-II + Slony-Iクラスタ構成

SRA OSS, inc.
北川俊広

はじめに
- Slony-Iとは
- pgpool-II と Slony-I を組み合わせる利点
pgpool-II + Slony-I クラスタの構築
動作確認
縮退動作
スレーブサーバのマスタサーバへの昇格
おわりに

はじめに

本稿では、pgpool-II 3.0 と Slony-I 2.0 を組み合わせたクラスタ構成について解説します。

Slony-Iとは

Slony-I は、PGDG (PostgreSQL Global Development Group: PostgreSQLの開発チーム) が主体となって開発している非同期方式のレプリケーションソフトウェアです。システムはマスタ・スレーブ構成をとり、マスタのみ更新が可能です。スレーブは複数台用意することができます。レプリケーションの仕組みは、PostgreSQL のトリガ機能を使用しており、あらかじめレプリケーション対象のテーブルにトリガを作成しておくことで、ユーザがデータを更新するとトリガ関数によって、更新内容が Slony-I の管理テーブルに記録されるようになっています。そして、その更新内容は、定期的に Slony-I のデーモンプロセスである slon がスレーブに転送し、それをスレーブで適用することによってレプリケーションを実現します。

非同期方式というと PostgreSQL 9.0 の Streaming Replication/Hot Standby との違いが気になりますが、Slony-I には以下の利点があります。

PostgreSQL のメジャーバージョンが異なっていてもレプリケーション可能
異なる CPU アーキテクチャのハードウェアや異なる OS 間でもレプリケーション可能
レプリケーション対象とするテーブルの選択が可能
テーブル毎にレプリケーション先のスレーブ選択が可能
レプリケーション対象のテーブル群ごとにマスタとなるサーバを定義可能

上記は、Slony-I のマニュアルを参考にしたものですが、このように Streaming Replication/Hot Standby よりも、柔軟にレプリケーション構成を組めるのが Slony-I の利点です。

ただし、レプリケーションの仕組みにトリガを使用している関係で、以下の変更をスレーブに反映できないという制限があります。

ラージオブジェクトの変更
DDLコマンドによる変更
ロールの変更
TRUNCATE コマンドによる変更（将来的には対応するかもしれない）

pgpool-II と Slony-I を組み合わせる利点

pgpool-II と Slony-I を組み合わせる利点は、Slony-I に不足している機能を pgpool-II で補えるという点です。

例えば、Slony-I を負荷分散目的で使用する場合、更新クエリをマスタサーバのみに送信し、参照クエリをマスタサーバかスレーブサーバに送信するように、アプリケーション側で送信先を制御する必要があります。また、Slony-I には障害を検知して自動的にフェイルオーバを行う仕組みがありません。しかし、それらの機能は pgpool-II が持っていますので、組み合わせることで補うことができます。

pgpool-II + Slony-I クラスタの構築

それでは、実際に pgpool-II + Slony-I クラスタを構築していきましょう。構築する際の前提として各サーバマシンの役割、ホスト名、ポート番号は次のとおりとします。

サーバ	ホスト名 (IPアドレス)	ポート番号
PostgreSQLサーバ (マスタ)	host0.example.org (192.168.1.1)	5432
PostgreSQLサーバ (スレーブ)	host1.example.org (192.168.1.2)	5432
pgpool-IIサーバ	host9.example.org (192.168.1.3)	9999

各サーバマシンには、執筆時点で最新の pgpool-II 3.0.1 と PostgreSQL 9.0.2 をインストールします。pgpool-II のインストール方法については第2回の記事を参考にしてください。以降、pgpool-II と PostgreSQL は「/usr/local」にインストールされており、データベースクラスタは「/var/pgsql/pgdata」にあるものとします。また、Slony-I のデーモン slon は、マスタサーバとスレーブサーバで動かします。

PostgreSQLの設定

まず、pgpool-II と Slony-I からの接続を許可するため、マスタサーバとスレーブサーバの pg_hba.conf ファイルに次の1行を追加します。今回は簡易的にパスワード認証を使用せず、trust 認証を使用します。

host    all             all             192.168.1.0/24          trust

そして、postgresql.conf ファイルの最低限必要な次のパラメータを変更します。

listen_addresses = '*'

以上で、PostgreSQL の設定は完了です。PostgreSQL を起動もしくは再起動します。

Slony-I の入手とインストール

Slony-I のソースコードはここのサイトからダウンロードできます。

現在、Slony-I は主にバージョン 2.0 と 1.2 のメンテナンスが行われています。バージョン 2.0 は PostgreSQL 8.3 以降でしか動作しませんので、実際は、PostgreSQL のバージョンに応じて Slony-I のバージョンを選択します。本稿では、執筆時点で最新の 2.0.6 を使用します。

インストールは、3台すべてのサーバにインストールします。pgpool-II サーバにはインストールが不要に思われますが、pgpool-II がフェイルオーバを行う際は Slony-I に対してもフェイルオーバを命令する必要があり、その際に Slony-I のコマンドを使用するため、インストールが必要になります。以降、3台のマシンに同じ手順でインストールします。

slony1-2.0.6.tar.bz2 を /usr/local/src に配置したあと、一通りディレクトリを作成してその所有者を postgres に変更します。

[root@host*]# mkdir /usr/local/src/slony1-2.0.6
[root@host*]# mkdir /usr/local/slony
[root@host*]# chown postgres:postgres /usr/local/src/slony1-2.0.6
[root@host*]# chown postgres:postgres /usr/local/slony

/usr/local/src/slony1-2.0.6 はソースコードの展開先、/usr/local/slony はインストール先です。

続いて、ソースコードを展開し、ビルド、インストールします。

[root@host*]# su - postgres
[postgres@host*]$ tar jxf slony1-2.0.6.tar.bz2
[postgres@host*]$ cd slony1-2.0.6
[postgres@host*]$ ./configure --prefix=/usr/local/slony \
                      --with-pgconfigdir=/usr/local/bin
[postgres@host*]$ make install

--prefix オプションでインストール先を、--with-pgconfigdir オプションで pg_config コマンドの場所を指定できます。--with-pgconfigdir は、複数の PostgreSQL をインストールしている場合や pg_config コマンドにパスが通っていない場合は指定する必要があります。

最後に、必要に応じて Slony-I のコマンドにパスを通して終了です。

[postgres@host*]$ vi ~/.bash_profile
PATHに「/usr/local/slony/bin」を追加

※ 上記はシェルが bash の場合です。

レプリケーションを行うデータベースの作成

まず、マスタサーバにて動作確認用の testdb3 データベースを作成し、psql で testdb3 データベースにログインして下記の SQL を実行します。

[postgres@host0]$ createdb testdb3
[postgres@host0]$ psql testdb3
psql (9.0.2)
Type "help" for help.

=# CREATE TABLE t1 (id SERIAL PRIMARY KEY, comment TEXT, 
                    ins_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP);

これで次のテーブルが作成されます。

testdb3=# \d t1
                                    Table "public.t1"
  Column  |            Type             |      Modifiers
----------+-----------------------------+--------------------------
 id       | integer                     | not null default nextval
                                         ('t1_id_seq'::regclass)
 comment  | text                        |
 ins_time | timestamp without time zone | default now()
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)

※ 幅の関係で1行のデータを改行して記述しています。

Slony-I は、PRIMARY KEY が付いていないテーブルをレプリケーションできません。そこで、レプリケーションを行うテーブルには必ず PRIMARY KEY を付けるようにします。また、DDL コマンドによる変更も自動的にレプリケーションできないため、pg_dump と psql を使用してスレーブサーバにスキーマをコピーします。

[postgres@host0]$ pg_dump -C -s testdb3 | psql -h host1.example.org

Slony-I の設定

Slony-I の設定は slonik という管理コマンドを使用して行います。下記のスクリプトはSlony-I のマニュアルに記載されているサンプルを修正したものです。これを、マスタサーバで実行します。

[postgres@host0]$ sh setup.sh

setup.sh
-----------------------------------------------------------------
#!/bin/sh

CLUSTERNAME=testdb3_cluster
MASTERDBNAME=testdb3
SLAVEDBNAME=testdb3
MASTERHOST=host0.example.org
SLAVEHOST=host1.example.org
REPLICATIONUSER=postgres

/usr/local/slony/bin/slonik <<_EOF_
    #--
    # define the namespace the replication system
    # uses in our example it is slony_example
    #--
    cluster name = $CLUSTERNAME;

    #--
    # admin conninfo's are used by slonik to connect to 
    # the nodes one for eachnode on each side of the cluster, 
    # the syntax is that of PQconnectdb in
    # the C-API
    # --
    node 1 admin conninfo = 'dbname=$MASTERDBNAME \
           host=$MASTERHOST user=$REPLICATIONUSER';
    node 2 admin conninfo = 'dbname=$SLAVEDBNAME \
           host=$SLAVEHOST user=$REPLICATIONUSER';

    #--
    # init the first node.  Its id MUST be 1.  This creates 
    # the schema _$CLUSTERNAME containing all replication 
    # system specific database objects.
    #--
    init cluster ( id=1, comment = 'Master Node');

    #--
    # Slony-I organizes tables into sets.  The smallest unit 
    # a node can subscribe is a set. The master or origin of 
    # the set is node 1.
    #--
    create set (id=1, origin=1, comment='All testdb3 tables');
    set add table (set id=1, origin=1, id=1, 
                   fully qualified name = 'public.t1',
                   comment='t1 table');
    set add sequence (set id=1, origin = 1, id = 1, 
                   fully qualified name = 'public.t1_id_seq',
                   comment = 't1 id sequence');

    #--
    # Create the second node (the slave) tell the 2 nodes how 
    # to connect to each other and how they should listen for events.
    #--

    store node (id=2, comment = 'Slave Node', event node=1);
    store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME \
                host=$MASTERHOST user=$REPLICATIONUSER');
    store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME \
                host=$SLAVEHOST user=$REPLICATIONUSER');
_EOF_
-----------------------------------------------------------------

スクリプト中の各種設定コマンドの意味は、Slony-I のマニュアルを参照してください。

上記のスクリプトを実行すると、Slony-I の各種管理テーブルが testdb3 データベースの「_testdb3_cluster」というスキーマに作成され、そこに設定内容が記録されます。

slon デーモンの起動

マスタサーバとスレーブサーバで slon デーモンを起動します。

[postgres@host0]$ /usr/local/slony/bin/slon testdb3_cluster \
                 "dbname=testdb3 user=postgres host=localhost" &
[1] 27969
2011-01-07 15:54:06 JSTCONFIG main: slon version 2.0.6 starting up
…

[postgres@host1]$ /usr/local/slony/bin/slon testdb3_cluster \
                 "dbname=testdb3 user=postgres host=localhost" &
[1] 7772
2011-01-07 15:54:32 JSTCONFIG main: slon version 2.0.6 starting up
…

レプリケーションの開始

マスタサーバにて、下記のスクリプトを実行するとレプリケーションが始まります。

[postgres@host0]$ sh subscribe.sh
:18: NOTICE:  subscribe set: omit_copy=f
…

subscribe.sh
----------------------------------------------------------------------
#!/bin/sh

CLUSTERNAME=testdb3_cluster
MASTERDBNAME=testdb3
SLAVEDBNAME=testdb3
MASTERHOST=host0.example.org
SLAVEHOST=host1.example.org
REPLICATIONUSER=postgres

/usr/local/slony/bin/slonik <<_EOF_
     # ----
     # This defines which namespace the replication system uses
     # ----
     cluster name = $CLUSTERNAME;

     # ----
     # Admin conninfo's are used by the slonik program to connect
     # to the node databases.  So these are the PQconnectdb arguments
     # that connect from the administrators workstation (where
     # slonik is executed).
     # ----
     node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST \
                              user=$REPLICATIONUSER';
     node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST \
                              user=$REPLICATIONUSER';

     # ----
     # Node 2 subscribes set 1
     # ----
     subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);
_EOF_
------------------------------------------------------------------------

Slony-Iの動作確認

マスタサーバにて psql で testdb3 データベースにログインし、テストデータを挿入します。

[postgres@host0]$ psql testdb3
psql (9.0.2)
Type "help" for help.

testdb3=# INSERT INTO t1(comment) VALUES('replication test')
INSERT 0 1

マスタサーバとスレーブサーバで t1 テーブルの内容を表示し、同じ行が表示されれば成功です。

testdb3=# SELECT * FROM t1;
 id |     comment      |          ins_time
----+------------------+----------------------------
  1 | replication test | 2011-01-07 16:29:36.780824
(1 row)

pgpool-IIの基本設定

第2回の記事を参考にして pgpool.conf を書き換えます。今回はテンプレートとして pgpool.conf.sample-master-slave を使用します。

[root@host9]# cd /usr/local/etc
[root@host9]# cp pgpool.conf.sample-master-slave pgpool.conf
[root@host9]# vi pgpool.conf

第2回と設定が異なるのは次のパラメータです。

replication_mode = false
master_slave_mode = true
master_slave_sub_mode = 'slony'

今回はSlony-Iクラスタ用の設定になりますので、master_slave_mode を true にし、master_slave_sub_mode を 'slony' にします。

pgpool-II のフェイルオーバ設定

まず、フェイルオーバ時に実行するスクリプトを作成します。

[root@host9]# vi /usr/local/bin/failover.sh

failover.sh
----------------------------------------------------------------------
#! /bin/sh
# Failover command for Slony-I.
# This script assumes that DB node 0 is master, and 1 is slave.
# 
# If slave goes down, does nothing. If master goes down, execute
# slonik command.
#
# Arguments: $1: failed node id.

CLUSTERNAME=testdb3_cluster
MASTERDBNAME=testdb3
SLAVEDBNAME=testdb3
MASTERHOST=host0.example.org
SLAVEHOST=host1.example.org
REPLICATIONUSER=postgres

failed_node=$1

# Do nothing if slave goes down.
if [ $failed_node = 1 ]; then
	exit 0;
fi

# execute slonik command
/usr/local/slony/bin/slonik <<_EOF_
cluster name = $CLUSTERNAME;
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST \
                         user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST \
                         user=$REPLICATIONUSER';
failover (id = 1, backup node = 2);
drop node (id = 1, event node = 2);
_EOF_

exit 0;
----------------------------------------------------------------------

作成したスクリプトに実行権限を与えます。

[root@host9]# chmod 755 /usr/local/bin/failover.sh

pgpool.conf の failover_command を以下のように設定します。

[root@host9]# vi /usr/local/etc/pgpool.conf

failover_command = '/usr/local/bin/failover.sh %d'

引数の %d は、実行時に pgpool-II によって「切り離されたノード番号」に置き換えられます。他にも以下の表のパターンが使用できます。

文字	意味
%d	切り離されたノード番号
%h	切り離されたノードのホスト名
%H	新しいマスターのホスト名
%p	切り離されたノードのポート番号
%D	切り離されたノードのデータベースクラスタパス
%M	古いマスターのノード番号
%m	新しいマスターのノード番号
%%	'%'文字

以上で、pgpool-II の設定は完了です。

動作確認

pgpool-II を起動し、pgpool-II 経由でテストデータを挿入します。

[postgres@host9]$ psql -h localhost -p 9999 testdb3
psql (9.0.2)
Type "help" for help.

testdb3=# INSERT INTO t1(comment) VALUES('from pgpool-II');
INSERT 0 1

マスタサーバとスレーブサーバで t1 テーブルの内容を表示し、同じ行が表示されれば正常です。

testdb3=# SELECT * FROM t1;
 id |     comment      |          ins_time
----+------------------+----------------------------
  1 | replication test | 2011-01-07 19:29:36.780824
  2 | from pgpool-II   | 2011-01-07 19:54:30.977498
(2 rows)

縮退動作

スレーブサーバを停止したときに、縮退運転に以降するかどうか確認します。まず、スレーブサーバの PostgreSQL を停止します。pg_ctl コマンドで immediate モードを指定して停止すると、終了処理をせずに直ちに停止するので、障害によって停止した場合と似た状態になります。

[postgres@host1]$ pg_ctl stop -m immediate -D /var/pgsql/pgdata

スレーブサーバの PostgreSQL を停止後、pgpool-II からアクセスすると接続エラーが発生します。

[postgres@host9]$ psql -h localhost -p 9999 testdb3
psql: server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.

ここで障害が検知され縮退運転に移行します。再度アクセスすると接続可能になります。SHOW pool_status で確認すると末尾の項目 backend status1 が、稼働中を表す 2 でなく、停止中を表す 3 になっていることが確認できます。

[postgres@host9]$ psql -h localhost -p 9999 testdb3
psql (9.0.2)
Type "help" for help.

testdb3=# SHOW pool_status;
        item       |       value       |        description
-------------------+-------------------+----------------------------
(中略) 
 backend_hostname1 | host1.example.org | backend #1 hostname
 backend_port1     | 5432              | backend #1 port number
 backend_weight1   | 0.500000          | weight of backend #1
 backend status1   | 3                 | status of backend #1
 standby_delay1    | 0                 | standby delay of backend #1
(75 rows)

次に、縮退運転の状態でテストデータを挿入し、スレーブサーバを復旧後にそのテストデータがレプリケーションされるかどうか確認します。

スレーブサーバが停止している状態でpgpool-II経由でテストデータを挿入します。

[postgres@host9]$ psql -h localhost -p 9999 testdb3
psql (9.0.2)
Type "help" for help.
  
testdb3=# INSERT INTO t1(comment) VALUES('degeneration test');
INSERT 0 1

testdb3=# SELECT * FROM t1;
 id |      comment      |          ins_time
----+-------------------+----------------------------
  1 | replication test  | 2011-01-07 19:29:36.780824
  2 | from pgpool-II    | 2011-01-07 19:54:30.977498
  3 | degeneration test | 2011-01-07 20:06:04.328413
(3 rows)

スレーブサーバを起動します。

[postgres@host1]$ pg_ctl start -w -D /var/pgsql/pgdata

スレーブサーバでt1テーブルの内容を表示し、3行表示されれば正常です。

[postgres@host1]$ psql -h localhost testdb3
psql (9.0.2)
Type "help" for help.
  
testdb3=# SELECT * FROM t1;
 id |      comment      |          ins_time
----+-------------------+----------------------------
  1 | replication test  | 2011-01-07 19:29:36.780824
  2 | from pgpool-II    | 2011-01-07 19:54:30.977498
  3 | degeneration test | 2011-01-07 20:06:04.328413
(3 rows)

最後に、pgpool-II に切り離されているスレーブサーバを再度追加しておきます。

[root@host9]# pcp_attach_node 100 localhost 9898 ユーザ名 パスワード 1

スレーブサーバのマスタサーバへの昇格

マスタサーバが停止したときにフェイルオーバが行われ、スレーブサーバがマスタサーバに昇格するかどうか確認します。マスタサーバのPostgreSQLを停止します。

[postgres@host0]$ pg_ctl stop -m immediate -D /var/pgsql/pgdata

host0のPostgreSQLを停止後、pgpool-II からアクセスすると接続エラーが発生します。

[postgres@host9]$ psql -h localhost -p 9999 testdb3
psql: server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.

ここで障害が検知されフェイルオーバーが発生し、再度アクセスすると接続可能になります。SHOW pool_status で確認すると、末尾の項目backend status0が、稼働中を表す 2 でなく、停止中を表す 3 になっていることが確認できます。

[postgres@host9]$ psql -h localhost -p 9999 testdb3
psql (9.0.2)
Type "help" for help.

testdb3=# SHOW pool_status;
        item       |       value       |        description
-------------------+-------------------+----------------------------
(中略) 
 backend_hostname0 | host0.example.org | backend #0 hostname
 backend_port0     | 5432              | backend #0 port number
 backend_weight0   | 0.500000          | weight of backend #0
 backend status0   | 3                 | status of backend #0
 standby_delay0    | 0                 | standby delay of backend #0

スレーブサーバがマスタサーバに昇格し、更新クエリを受け付けるようになっているかどうか確認するため、テストデータを挿入します。

[postgres@host9]$ psql -h localhost -p 9999 testdb3
psql (9.0.2)
Type "help" for help.

testdb3=# INSERT INTO t1(comment) VALUES('failover test');
INSERT 0 1

testdb3=# SELECT * FROM t1;
 id |      comment      |          ins_time
----+-------------------+----------------------------
  1 | replication test  | 2011-01-07 19:29:36.780824
  2 | from pgpool-II    | 2011-01-07 19:54:30.977498
  3 | degeneration test | 2011-01-07 20:06:04.328413
  4 | failover test     | 2011-01-07 20:18:44.127364
(4 rows)

テストデータが挿入できれば、正常にフェイルオーバが行われています。

おわりに

今回は、pgpool-II + Slony-I クラスタ構成について解説しました。pgpool-II と Slony-I の組み合わせは、pgpool-II のレプリケーションモードと比較すると非同期レプリケーションになってしまうという弱点があるものの、クエリの制限が少なくアプリケーション側に手を入れる必要が少ないことから、既存のシステムのデータベースをレプリケーション構成にする場合によく使われています。既存システムのデータベースを Streaming Replication/Hot Standby に対応した PostgreSQL 9.0 にバージョンアップできない場合は、有用な構成といえます。

(2011年2月22日公開)