Environment
HBase | ZooKeeper | Operating system |
---|---|---|
2.5.1 (cluster mode) | 3.4.8 | CentOS 7.9 |
Symptoms
Running list in the hbase shell works fine, but running create fails with "PleaseHoldException: Master is initializing":
hbase:001:0> list
TABLE
0 row(s)
Took 0.5356 seconds
=> []
hbase:002:0> create 'table1', 'cf1'
2022-03-15 12:41:58,876 INFO [main] client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(132)) - Call exception, tries=6, retries=8, started=5249 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3168)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:2301)
at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:690)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:387)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
, details=, see https://s.apache.org/timeout
The HMaster log contains the warning "hbase:meta,,1.1588230740 is NOT online;":
> tail hbase-root-master-host-1.out -n100
2022-03-15 12:29:24,336 INFO [RegionServerTracker-0] master.RegionServerTracker (RegionServerTracker.java:processAsActiveMaster(179)) - RegionServer ephemeral node created, adding [zookeeper-2,16020,1669350558974]
2022-03-15 12:29:24,337 INFO [RegionServerTracker-0] master.RegionServerTracker (RegionServerTracker.java:processAsActiveMaster(179)) - RegionServer ephemeral node created, adding [zookeeper-3,16020,1669350556241]
2022-03-15 12:29:24,337 INFO [RegionServerTracker-0] master.RegionServerTracker (RegionServerTracker.java:processAsActiveMaster(179)) - RegionServer ephemeral node created, adding [zookeeper-1,16020,1669350559278]
2022-03-15 12:29:25,833 INFO [master/zookeeper-1:16000:becomeActiveMaster] master.ServerManager (ServerManager.java:waitForRegionServers(805)) - Waiting on regionserver count=3; waited=1755ms, expecting min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=1504ms
2022-03-15 12:29:27,339 INFO [master/zookeeper-1:16000:becomeActiveMaster] master.ServerManager (ServerManager.java:waitForRegionServers(805)) - Waiting on regionserver count=3; waited=3261ms, expecting min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=3010ms
2022-03-15 12:29:28,594 INFO [master/zookeeper-1:16000:becomeActiveMaster] master.ServerManager (ServerManager.java:waitForRegionServers(825)) - Finished waiting on RegionServer count=3; waited=4516ms, expected min=1 server(s), max=NO_LIMIT server(s), master is running
2022-03-15 12:29:28,597 WARN [master/zookeeper-1:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1344)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1669350563745, server=zookeeper-1,16020,1667491986634}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2022-03-15 12:29:29,598 WARN [master/zookeeper-1:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1344)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1669350563745, server=zookeeper-1,16020,1667491986634}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2022-03-15 12:29:31,599 WARN [master/zookeeper-1:16000:becomeActiveMaster] master.HMaster (HMaster.java:isRegionOnline(1344)) - hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1669350563745, server=zookeeper-1,16020,1667491986634}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
...
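One detail in the log above is worth comparing: the warning records hbase:meta as OPEN on zookeeper-1,16020,1667491986634, while the region server that registered for the same host is zookeeper-1,16020,1669350559278, so the trailing start codes differ. This may indicate that the recorded meta location points at a server instance that no longer exists. The following is a minimal sketch for making that comparison on a log file, assuming the log format shown above; the function names meta_server, live_servers and check_meta_server are hypothetical helpers, not HBase tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: they only parse text in the log format shown above.

# Print the server that hbase:meta is recorded on, taken from the last
# "is NOT online" warning read from stdin.
meta_server() {
  grep 'hbase:meta,,1.1588230740 is NOT online' \
    | sed -n 's/.*server=\([^}]*\)}.*/\1/p' \
    | tail -n 1
}

# Print every region server that registered with the master (read from stdin).
live_servers() {
  sed -n 's/.*RegionServer ephemeral node created, adding \[\([^]]*\)].*/\1/p'
}

# Report whether the server recorded for meta is among the registered servers.
check_meta_server() {
  cms_log="$1"
  cms_meta="$(meta_server < "$cms_log")"
  if [ -z "$cms_meta" ]; then
    echo "no 'NOT online' warning found"
  elif live_servers < "$cms_log" | grep -qxF "$cms_meta"; then
    echo "meta server $cms_meta is registered"
  else
    echo "meta server $cms_meta is NOT among the registered region servers"
  fi
}
```

After sourcing these functions, running `check_meta_server hbase-root-master-host-1.out` against the log shown above should report that the meta server is not among the registered region servers.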
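In the HBase Web UI, the region servers show no abnormalities.
Solution
(This is not really a good fix, because it requires wiping HBase's metadata. If HBase already holds data, other approaches should be considered.)
Some articles online say that deleting the /hbase node in ZooKeeper is enough, but in testing the error was still there after deleting it and restarting HBase. The reason is that the data under the /hbase directory in HDFS is faulty, and on restart HBase restores it from HDFS into ZooKeeper. The /hbase directory in HDFS therefore has to be cleared as well: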
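# Stop the HBase service
> stop-hbase.sh
# Open the ZooKeeper command line
> zkCli.sh
# Delete the /hbase node
[zk: localhost:2181(CONNECTED) 0] rmr /hbase
# Delete the /hbase directory in HDFS
> hdfs dfs -rm -R /hbase
# Start the HBase service
> start-hbase.sh
# Verify the fix: the table is created successfully
> hbase shell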
hbase:002:0> create 'table1', 'cf1'
2022-03-15 12:59:12,870 INFO [main] client.HBaseAdmin (HBaseAdmin.java:postOperationResult(3591)) - Operation: CREATE, Table Name: default:table1, procId: 9 completed
Created table table1
Took 1.2251 seconds
=> Hbase::Table - table1
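The steps above can be collected into a small script. This is only a sketch under stated assumptions, not the procedure exactly as run above: the DRY_RUN switch, the ZK_SERVER, ZK_ZNODE and HDFS_DIR variables and the function names are hypothetical, and instead of deleting the HDFS directory it moves it aside with hdfs dfs -mv so the old files stay available for inspection. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the reset steps above. DRY_RUN, ZK_SERVER, ZK_ZNODE, HDFS_DIR and
# the backup path are assumptions for illustration, not part of the original.
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands
ZK_SERVER="${ZK_SERVER:-localhost:2181}"   # ZooKeeper address used by zkCli.sh
ZK_ZNODE="${ZK_ZNODE:-/hbase}"             # HBase znode in ZooKeeper
HDFS_DIR="${HDFS_DIR:-/hbase}"             # HBase root directory in HDFS

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# WARNING: this discards HBase's metadata, exactly like the manual steps.
reset_hbase_metadata() {
  backup="${HDFS_DIR}.bak.$(date +%Y%m%d%H%M%S)"
  run stop-hbase.sh || return 1
  # rmr matches the ZooKeeper 3.4.x CLI used here; 3.5+ deprecates it
  # in favor of deleteall.
  run zkCli.sh -server "$ZK_SERVER" rmr "$ZK_ZNODE" || return 1
  # Deviation from the steps above: move the directory aside, do not delete it.
  run hdfs dfs -mv "$HDFS_DIR" "$backup" || return 1
  run start-hbase.sh
}
```

After sourcing the file, calling `reset_hbase_metadata` previews the four commands; `DRY_RUN=0 reset_hbase_metadata` actually executes them. Previewing first seems prudent given that the operation is destructive.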