Postgresql 如何选择正确的关闭模式
停止数据库的命令:
pg_ctlstop-D$PGDATA[-mshutdown-mode]
shutdown-mode有如下几种模式:
1.smart:等所有的连接中止后,关闭数据库。如果客户端连接不终止,则无法关闭数据库。
开启一个空会话:
[root@localhost~]#su-postgres [postgres@localhost~]$psql psql(9.4.4) Type"help"forhelp. postgres=#
用smart关闭数据库:
[postgres@localhost~]$pg_ctlstop-D$PGDATA-msmart waitingforservertoshutdown...............................................................failed pg_ctl:serverdoesnotshutdown HINT:The"-mfast"optionimmediatelydisconnectssessionsratherthan waitingforsession-initiateddisconnection
2.fast:快速关闭数据库,断开客户端的连接,让已有的事务回滚,然后正常关闭数据库。
[postgres@localhost~]$pg_ctlstop-D$PGDATA-mfast waitingforservertoshutdown....done serverstopped
查看关闭日志:
LOG:receivedfastshutdownrequest LOG:abortinganyactivetransactions FATAL:terminatingconnectionduetoadministratorcommand LOG:shuttingdown LOG:databasesystemisshutdown
会话被强制中断,然后关闭数据库。
起一个事务,然后测试关闭:
postgres=#createtablet(idintprimarykey,namevarchar(9)); CREATETABLE postgres=#begin; BEGIN postgres=#insertintotvalues(1,'a') postgres-#; INSERT01
不提交,然后用FASTMODE去关闭数据库:
[postgres@localhost~]$pg_ctlstop-D$PGDATA-mfast waitingforservertoshutdown....done serverstopped
查看日志:
LOG:receivedfastshutdownrequest LOG:abortinganyactivetransactions LOG:autovacuumlaunchershuttingdown FATAL:terminatingconnectionduetoadministratorcommand LOG:shuttingdown LOG:databasesystemisshutdown
同样是直接中断会话,而不去管事务有没有提交。
postgres=#select*fromt; id|name ----+------ (0rows)
没有提交的数据,在重启之后并不能查到。
3.immediate:立即关闭数据库,立即停止数据库进程,直接退出,下次启动时会进行实例恢复。
postgres=#insertintotvalues(1,'a') ; INSERT01 postgres=#select*fromt; id|name ----+------ 1|a (1row)
关闭数据库:
[postgres@localhost~]$pg_ctlstop-D$PGDATA-mimmediate waitingforservertoshutdown....done serverstopped
查看日志:
LOG:receivedimmediateshutdownrequest WARNING:terminatingconnectionbecauseofcrashofanotherserverprocess DETAIL:Thepostmasterhascommandedthisserverprocesstorollbackthecurrenttransactionandexit,becauseanotherserverprocessexitedabnormallyandpossiblycorruptedsharedmemory. HINT:Inamomentyoushouldbeabletoreconnecttothedatabaseandrepeatyourcommand. WARNING:terminatingconnectionbecauseofcrashofanotherserverprocess DETAIL:Thepostmasterhascommandedthisserverprocesstorollbackthecurrenttransactionandexit,becauseanotherserverprocessexitedabnormallyandpossiblycorruptedsharedmemory. HINT:Inamomentyoushouldbeabletoreconnecttothedatabaseandrepeatyourcommand.
启动数据库:
[postgres@localhost~]$pg_ctl-D/apps/pgsql/pgdata-l1.logstart serverstarting
查看日志:
LOG:databasesystemwasinterrupted;lastknownupat2017-04-2718:56:47PDT LOG:databasesystemwasnotproperlyshutdown;automaticrecoveryinprogress#提示非正常关机,自动开启恢复。 LOG:redostartsat0/181F910 LOG:recordwithzerolengthat0/181FA90 LOG:redodoneat0/181FA60 LOG:lastcompletedtransactionwasatlogtime2017-04-2718:59:13.727213-07 LOG:MultiXactmemberwraparoundprotectionsarenowenabled LOG:autovacuumlauncherstarted LOG:databasesystemisreadytoacceptconnections
查看数据:
[postgres@localhost~]$psql psql(9.4.4) Type"help"forhelp. postgres=#select*fromt; id|name ----+------ 1|a (1row)
提交的数据已通过实例恢复。
小结:
对比以上三种关库模式:
smart最为安全,但最慢,需要将所有连接都断开后,才会关库,默认关库模式。
fast强制中断会话,而不管有操作有没有提交,在做系统维护(系统维护时一般应用都正常关闭了,或者不再会有事务操作。)时,需要这种模式来关闭数据库。
immediate最暴力的方式,不管数据有没有落盘(POSGRE是遵循WAL机制),就直接关掉,待启动时进行实例恢复,如果在关库前有大量的事务没有写入磁盘,那这个恢复过程可能会非常的漫长。
补充:postgresql异步streamreplication环境关闭master的验证
os:ubuntu16.04
db:postgresql9.6.8
验证在异步streamreplication环境下,主动关闭master时,数据是否有丢失,能丢失多少。
版本
#lsb_release-a NoLSBmodulesareavailable. DistributorID:Ubuntu Description:Ubuntu16.04.5LTS Release:16.04 Codename:xenial #su-postgres-c"psql-c\"selectversion();\"" version ---------------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL9.6.8onx86_64-pc-linux-gnu(Ubuntu9.6.8-1.pgdg16.04+1),compiledbygcc(Ubuntu5.4.0-6ubuntu1~16.04.9)5.4.020160609,64-bit (1row)
用pgbench模拟数据库的大量数据操作
postgres=#createdatabasepgbenchdb; #su-postgres $pgbench-i-s20pgbenchdb $pgbench-r-j2-c4-T300pgbenchdb
关闭master
#su-postgres $/usr/lib/postgresql/9.6/bin/pg_ctlstop-mfast-D"/data/pg9.6/main"
提升slave
#su-postgres $/usr/lib/postgresql/9.6/bin/pg_ctlpromote-D"/data/pg9.6/main"
查看oldmaster的xloglocation
$/usr/lib/postgresql/9.6/bin/pg_xlogdump000000010000000000000016 rmgr:XLOGlen(rec/tot):106/106,tx:0,lsn:0/16000028,prev0/152C9A10,desc:CHECKPOINT_SHUTDOWNredo0/16000028;tli1;prevtli1;fpwtrue;xid0:118746;oid16432;multi1;offset0;oldestxid543inDB1;oldestmulti1inDB1;oldest/newestcommittimestampxid:550/118745;oldestrunningxid0;shutdown pg_xlogdump:FATAL:errorinWALrecordat0/16000028:invalidrecordlengthat0/16000098:wanted24,got0
可以看到lsn:0/16000028,prev0/152C9A10,desc:CHECKPOINT_SHUTDOWNredo0/16000028;
查看newmaster的.history文件
$ls-lt|more total360456 -rw-------1postgrespostgres16777216Nov3010:32000000020000000000000016 drwx------2postgrespostgres4096Nov3010:16archive_status -rw-------1postgrespostgres42Nov3010:1600000002.history -rw-------1postgrespostgres16777216Nov3010:16000000010000000000000016.partial -rw-------1postgrespostgres16777216Nov3010:16000000010000000000000015 -rw-------1postgrespostgres16777216Nov3010:16000000010000000000000014 -rw-------1postgrespostgres16777216Nov3010:05000000010000000000000013 $cat00000002.history 10/16000098norecoverytargetspecified $/usr/lib/postgresql/9.6/bin/pg_xlogdump000000010000000000000016 rmgr:XLOGlen(rec/tot):106/106,tx:0,lsn:0/16000028,prev0/152C9A10,desc:CHECKPOINT_SHUTDOWNredo0/16000028;tli1;prevtli1;fpwtrue;xid0:118746;oid16432;multi1;offset0;oldestxid543inDB1;oldestmulti1inDB1;oldest/newestcommittimestampxid:550/118745;oldestrunningxid0;shutdown rmgr:XLOGlen(rec/tot):42/42,tx:0,lsn:0/16000098,prev0/16000028,desc:END_OF_RECOVERYtli2;prevtli1;time2018-11-3010:16:57.249408CST rmgr:Standbylen(rec/tot):50/50,tx:0,lsn:0/160000C8,prev0/16000098,desc:RUNNING_XACTSnextXid118746latestCompletedXid118745oldestRunningXid118746 rmgr:XLOGlen(rec/tot):51/312,tx:0,lsn:0/16000100,prev0/160000C8,desc:FPI_FOR_HINT,blkref#0:rel1664/0/1260blk0FPW rmgr:Standbylen(rec/tot):50/50,tx:0,lsn:0/16000238,prev0/16000100,desc:RUNNING_XACTSnextXid118746latestCompletedXid118745oldestRunningXid118746 pg_xlogdump:FATAL:errorinWALrecordat0/16000238:invalidrecordlengthat0/16000270:wanted24,got0
可以看到关键记录
lsn:0/16000028,prev0/152C9A10,desc:CHECKPOINT_SHUTDOWNredo0/16000028 lsn:0/16000098,prev0/16000028,desc:END_OF_RECOVERY
而END_OF_RECOVERY对应的lsn为0/16000098,和00000002.history时间线文件的内容完全一致。
所以在异步streamreplication环境下,主动关闭master时,会将最后一条记录(CHECKPOINT_SHUTDOWN)发送给slave,不会造成数据的丢失。
而synchronous_commit=on保证事务有两份持久化的落盘数据。
分析pg_log日志
oldmaster上的最后几条日志
2018-11-3010:16:40.986CST,"postgres","pgbenchdb",7559,"[local]",5c009d79.1d87,4,"UPDATEwaiting",2018-11-3010:16:25CST,,0,LOG,00000,"disconnection:sessiontime:0:00:15.723user=postgresdatabase=pgbenchdbhost=[local]",,,,,,,,,"pgbench" 2018-11-3010:16:40.993CST,"postgres","pgbenchdb",7558,"[local]",5c009d79.1d86,3,"idle",2018-11-3010:16:25CST,4/0,0,FATAL,57P01,"terminatingconnectionduetoadministratorcommand",,,,,,,,,"pgbench" 2018-11-3010:16:40.994CST,"postgres","pgbenchdb",7560,"[local]",5c009d79.1d88,3,"idle",2018-11-3010:16:25CST,5/0,0,FATAL,57P01,"terminatingconnectionduetoadministratorcommand",,,,,,,,,"pgbench" 2018-11-3010:16:40.994CST,"postgres","pgbenchdb",7558,"[local]",5c009d79.1d86,4,"idle",2018-11-3010:16:25CST,,0,LOG,00000,"disconnection:sessiontime:0:00:15.729user=postgresdatabase=pgbenchdbhost=[local]",,,,,,,,,"pgbench" 2018-11-3010:16:40.994CST,"postgres","pgbenchdb",7560,"[local]",5c009d79.1d88,4,"idle",2018-11-3010:16:25CST,,0,LOG,00000,"disconnection:sessiontime:0:00:15.725user=postgresdatabase=pgbenchdbhost=[local]",,,,,,,,,"pgbench" 2018-11-3010:16:40.999CST,"postgres","pgbenchdb",7561,"[local]",5c009d79.1d89,3,"idle",2018-11-3010:16:25CST,6/0,0,FATAL,57P01,"terminatingconnectionduetoadministratorcommand",,,,,,,,,"pgbench" 2018-11-3010:16:41.001CST,"postgres","pgbenchdb",7561,"[local]",5c009d79.1d89,4,"idle",2018-11-3010:16:25CST,,0,LOG,00000,"disconnection:sessiontime:0:00:15.731user=postgresdatabase=pgbenchdbhost=[local]",,,,,,,,,"pgbench" 2018-11-3010:16:41.010CST,,,7156,,5c009735.1bf4,7,,2018-11-3009:49:41CST,,0,LOG,00000,"shuttingdown",,,,,,,,,"" 2018-11-3010:16:41.209CST,,,7156,,5c009735.1bf4,8,,2018-11-3009:49:41CST,,0,LOG,00000,"checkpointstarting:shutdownimmediate",,,,,,,,,"" 2018-11-3010:16:47.623CST,,,7156,,5c009735.1bf4,9,,2018-11-3009:49:41CST,,0,LOG,00000,"checkpointcomplete:wrote29357buffers(89.6%);0transactionlogfile(s)added,0removed,0recycled;write=5.931s,sync=0.399s,total=6.418s;syncfiles=53,longest=0.179s,average=0.007s;distance=311294kB,estimate=311294kB",,,,,,,,,"" 2018-11-3010:16:47.683CST,"repl","",7227,"192.168.56.90:52556",5c009795.1c3b,3,"streaming0/16000098",2018-11-3009:51:17CST,,0,LOG,00000,"disconnection:sessiontime:0:25:30.149user=repldatabase=host=192.168.56.90port=52556",,,,,,,,,"walreceiver" 2018-11-3010:16:47.730CST,,,7153,,5c009735.1bf1,5,,2018-11-3009:49:41CST,,0,LOG,00000,"databasesystemisshutdown",,,,,,,,,""
注意倒数第二条信息streaming0/16000098,说明当时的master关闭时,已经和salve沟通过,确认已经接收到END_OF_RECOVERY之前所有的数据了。
oldslave日志
2018-11-3010:16:47.660CST,,,7256,,5c009795.1c58,2,,2018-11-3009:51:17CST,,0,LOG,00000,"replicationterminatedbyprimaryserver","EndofWALreachedontimeline1at0/16000098.",,,,,,,,"" 2018-11-3010:16:47.660CST,,,7256,,5c009795.1c58,3,,2018-11-3009:51:17CST,,0,FATAL,XX000,"couldnotsendend-of-streamingmessagetoprimary:noCOPYinprogress ",,,,,,,,,"" 2018-11-3010:16:47.660CST,,,7255,,5c009795.1c57,5,,2018-11-3009:51:17CST,1/0,0,LOG,00000,"invalidrecordlengthat0/16000098:wanted24,got0",,,,,,,,,"" 2018-11-3010:16:47.854CST,,,7443,,5c009d8f.1d13,1,,2018-11-3010:16:47CST,,0,FATAL,XX000,"couldnotconnecttotheprimaryserver:couldnotconnecttoserver:Connectionrefused Istheserverrunningonhost""192.168.56.119""andaccepting TCP/IPconnectionsonport5432? ",,,,,,,,,"" 2018-11-3010:16:52.668CST,,,7444,,5c009d94.1d14,1,,2018-11-3010:16:52CST,,0,FATAL,XX000,"couldnotconnecttotheprimaryserver:couldnotconnecttoserver:Connectionrefused Istheserverrunningonhost""192.168.56.119""andaccepting TCP/IPconnectionsonport5432? ",,,,,,,,,"" 2018-11-3010:16:56.875CST,,,7255,,5c009795.1c57,6,,2018-11-3009:51:17CST,1/0,0,LOG,00000,"receivedpromoterequest",,,,,,,,,"" 2018-11-3010:16:56.875CST,,,7255,,5c009795.1c57,7,,2018-11-3009:51:17CST,1/0,0,LOG,00000,"redodoneat0/16000028",,,,,,,,,"" 2018-11-3010:16:56.875CST,,,7255,,5c009795.1c57,8,,2018-11-3009:51:17CST,1/0,0,LOG,00000,"lastcompletedtransactionwasatlogtime2018-11-3010:16:40.986869+08",,,,,,,,,"" 2018-11-3010:16:56.888CST,,,7255,,5c009795.1c57,9,,2018-11-3009:51:17CST,1/0,0,LOG,00000,"selectednewtimelineID:2",,,,,,,,,"" 2018-11-3010:16:57.166CST,,,7255,,5c009795.1c57,10,,2018-11-3009:51:17CST,1/0,0,LOG,00000,"archiverecoverycomplete",,,,,,,,,"" 2018-11-3010:16:57.267CST,,,7255,,5c009795.1c57,11,,2018-11-3009:51:17CST,1/0,0,LOG,00000,"MultiXactmemberwraparoundprotectionsarenowenabled",,,,,,,,,"" 2018-11-3010:16:57.267CST,,,7257,,5c009795.1c59,1,,2018-11-3009:51:17CST,,0,LOG,00000,"checkpointstarting:force",,,,,,,,,"" 2018-11-3010:16:57.275CST,,,7253,,5c009795.1c55,3,,2018-11-3009:51:17CST,,0,LOG,00000,"databasesystemisreadytoacceptconnections",,,,,,,,,"" 2018-11-3010:16:57.276CST,,,7447,,5c009d99.1d17,1,,2018-11-3010:16:57CST,,0,LOG,00000,"autovacuumlauncherstarted",,,,,,,,,""
信息也是相当的清晰。
wal_retrieve_retry_interval=5s控制salve到master失败时,再次重试的等待时间。
以上为个人经验,希望能给大家一个参考,也希望大家多多支持毛票票。如有错误或未考虑完全的地方,望不吝赐教。