pgpool 之二 stream replication 模式 + slave down up

news/2024/6/17 5:48:22

os: centos7.4
postgresql:9.6.8
pgpool:3.7.3

采用 streaming replication mode 模式,这是比较通用的方案。
The streaming replication mode can be used with PostgreSQL servers operating streaming replication. In this mode, PostgreSQL is responsible for synchronizing databases. This mode is widely used and most recommended way to use Pgpool-II. Load balancing is possible in the mode. The sample configuration file is $prefix/etc/pgpool.conf.sample-stream.

这篇文章主要是记录 pool 管理的 postgresql slave 实例宕掉后的恢复过程.slave 所在的操作系统宕掉后,需要首先恢复操作系统.

ip 规划

pgpool    192.168.56.100

pgsql1    192.168.56.101
pgsql2    192.168.56.102

online recovery

Streaming replication mode 模式
参考<<pgpool 之十一 参数文件 pgpool.conf>>

online recovery 的步骤如下(参考:http://www.pgpool.net/docs/37/en/html/runtime-online-recovery.html)

Online recovery is performed in two phases. The first phase is called “first stage” and the second phase is called “second stage”. You need to provide scripts for each stage. Only replication_mode requires the second stage. For other modes including streaming replication mode the second stage is not performed and you don’t need to provide a script for the stage in recovery_2nd_stage_command. i.e. you can safely leave it as an empty string.

Connections from cliens are not allowd only in the second stage while the data can be updated or retrieved during the first statge.

Pgpool-II performs the follows steps in online recovery:

  1. CHECKPOINT.

  2. Execute first stage of online recovery.

  3. Wait until all client connections have disconnected (only in replication_mode).

  4. CHECKPOINT (only in replication_mode). specified).

  5. Execute second stage of online recovery (only in replication_mode).

  6. Start up postmaster (perform pgpool_remote_start)

  7. Node attach

一定要配置 online recovery 参数,否则后面使用 pcp_recovery_node 会报如下错误

$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U postgres -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="E", len=95
ERROR:  node recovery failed, unable to connect to master node: 0 

DEBUG: send: tos="X", len=4

pgpool的使用

pgsql1(master) 上创建用户和数据库

postgres=# create user peiyb with password 'peiybpeiyb';
CREATE ROLE
postgres=# create database peiybdb owner = peiyb;
CREATE DATABASE

连接pgpool

$ pg_md5 -h
$ pg_md5 -m -p -u peiyb
$ cat pool_passwd
peiyb:md5bd0875843854575a4b7328813ea498cb

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
Password for user peiyb:
psql (9.6.9)
Type "help" for help.

peiybdb=>
peiybdb=> show pool_version;
     pool_version     
----------------------
 3.7.3 (amefuriboshi)
(1 row)

peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | false             | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | true              | 0
(2 rows)

关闭 pgsql2 的 slave

pgsql2的slave关闭,停1分钟,再启动

peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 8          | true              | 0
 1       | pgsql2   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
(2 rows)

此时,登录pgpool能看到 pgsql2,但是 show pool_nodes 的status 依然为 down
把pgsql1、pgsql2节点的 log_connectionslog_disconnections 都设置为 on后,从日志的数据来看,pgsql2 的slave关闭再启动后,pgpool 并没有去处理。
此时需要执行 pcp_recovery_node 把备库节点的状态变为正常。

$ which pcp_recovery_node
/usr/pgpool/pgpool3.7.3/bin/pcp_recovery_node

$ pcp_recovery_node --help
pcp_recovery_node - recover a node
Usage:
pcp_recovery_node [OPTION...] [node-id]
Options:
  -U, --username=NAME    username for PCP authentication
  -h, --host=HOSTNAME    pgpool-II host
  -p, --port=PORT        PCP port number
  -w, --no-password      never prompt for password
  -W, --password         force password prompt (should happen automatically)
  -n, --node-id=NODEID   ID of a backend node
  -d, --debug            enable debug message (optional)
  -v, --verbose          output verbose messages
  -?, --help             print this help
  

在 slave 节点执行 pcp_recovery_node 修复

$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="E", len=117
ERROR:  executing remote start failed with error: "ERROR:  pgpool_remote_start failed"
DEBUG: send: tos="X", len=4

分析pgsql1的日志后发现

sh: /var/lib/pgsql/9.6/data/pgpool_remote_start: No such file or directory
< 2018-05-15 14:00:29.546 CST > ERROR:  pgpool_remote_start failed
< 2018-05-15 14:00:29.546 CST > STATEMENT:  SELECT pgpool_remote_start('pgsql2', '/var/lib/pgsql/9.6/data');

说明在/var/lib/pgsql/9.6/data/下缺少 pgpool_remote_start 这个sh脚本文件
参考 pgpool 节点的模板文件

# cat /tmp/pgpool-II-3.7.3/src/sample/pgpool_remote_start

pgpool_remote_start 脚本

pgsql1、pgsql2上创建 pgpool_remote_start 脚本文件
参考<<pgpool 之七 脚本 pgpool_remote_start>>

再次执行 pcp_recovery_node

[postgres@pgpool etc]$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="c", len=20
pcp_recovery_node -- Command Successful
DEBUG: send: tos="X", len=4

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

可以看到两个节点都是up状态。

pgsql slave 手动处理

如果手动处理好 pgslq slave,查询 pg_stat_replication 的 state = streaming 状态,这时就不需要执行 pcp_recovery_node,仅需要执行 pcp_attach_node

$ pcp_attach_node  -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
pcp_attach_node -- Command Successful

status = waiting 的处理方式

show pool_nodes 时 status 可能显示 waiting,如下:

postgres=# show pool_nodes;
 node_id | hostname | port | status  | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+---------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up      | 0.500000  | primary | 0          | true              | 0
 1       | pgsql2   | 5432 | waiting | 0.500000  | standby | 0          | false             | 0
(2 rows)

这时需要重新登录一下 pgpool 即可

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

参考:
http://www.pgpool.net/docs/37/en/html/runtime-online-recovery.html


http://www.niftyadmin.cn/n/2124318.html

相关文章

Linux上基于网页管理mysql安装过程中的错误点

在安装的过程中&#xff0c;Documentation.html&#xff0c;容易写成Documentstation.html导致访问失败&#xff0c;在访问index.php是不能访问&#xff0c;出现乱码&#xff0c;在.../conf/http.conf中添加 AddType application/x-httpd-php .php .phtml AddType application/…

pgpool 之三 pgpool的几种模式

pgpool:3.7.3 文档 http://www.pgpool.net/docs/latest/en/html/configuring-pgpool.html 4.3.2. Running mode of Pgpool-II There are four different running modes in Pgpool-II: streaming replication mode, logical replication mode, master slave mode (slony mode…

linux上安装oracle11.2.0

软件环境 RedHat5.3&#xff0c;linux_11gR2_database_1of2.zip&#xff0c;linux_11gR2_database_2of2.zip oracle11.2.0的安装界面不同于oracle10g界面。 安装的前提是安装必须的软件包和内核参数&#xff1a; 加载光驱&#xff1a; 跟oracle10g的方式是一样的。这里就不说…

redis cluster 4.0.9 之一: make make install

os: centos7.4 redis:4.0.9 ruby:2.3.4 ip 规划 redis1 192.168.56.101 redis2 192.168.56.102用两台虚拟机模拟6个redis节点。 分别为 192.168.56.101 的 9001、9002、9003 端口&#xff0c; 192.168.56.102 的 9001、9002、9003 端口。 os安装 参考<<centos …

毕业感想

时间总是在我们不知不觉得时候从我们的身边溜走&#xff0c;让我们猝不及防。还记得自己还在小学打架&#xff0c;在初中玩耍&#xff0c;在高中奋斗着呢。转眼间培训结束的时间也到啦。 回想自己在培训的这一年中&#xff0c;感慨很多。还记得哪天为了来学校冒着大雪做了不到两…

oracle ebs r12 打补丁的步骤

os: linux 5.8 ebs:12.1.3 patch:14332598 这里以 p14332598 为例&#xff0c;列出打补丁的过程。 查看补丁 SELECT * FROM ad_applied_patches where 11and patch_name in (14332598) ; SELECT *FROM ad_bugs where 11and bug_number in (14332598) ; 下载patch h…

爱在时就努力去爱

爱在时就努力去爱 很多人都说过&#xff0c;爱情是一种化学反应&#xff0c;是一时的激情与冲动&#xff0c;时间久了&#xff0c;爱情就淡了&#xff0c;淡了之后或者分手&#xff0c;或者步入婚姻&#xff0c;于是爱情慢慢转变成一种亲情和一份责任了。于是爱情的魅力与风采也…

redis cluster 4.0.9 之二: failover

os: centos7.4 redis:4.0.9 ruby:2.3.4 ip 规划 redis1 192.168.56.101 redis2 192.168.56.102用两台虚拟机模拟6个redis节点。 分别为 192.168.56.101 的 9001、9002、9003 端口&#xff0c; 192.168.56.102 的 9001、9002、9003 端口。 连接redis # redis-cli -h 1…