I have been writing about 12.2 sharding database for a long time, and today I started the environment again, started the host, and then started the Listener and Shardcat databases and the Shard node database instance. When checking the status of Shard, the error is found:
(a) Question one: status display warnings:
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok deployed Region1 ONLINE
SH2 primary_shardgroup warnings deployed Region1-
Gdsctl>
In this case, we can use the Recover Shard command. It is wrong to pay attention to the help recover shard-h.
Gdsctl>recover shard-h
Syntax
RECOVER shard-gdspool pool-database db_name [-skip_first|-ignore_first]
[-full]
Purpose
Executes all DDL statements in specified database starting from the one, which
is previusly executed with errors. The command is intended to perform all
Skipped DDL changes after database administrator fixes shard issues.
Usage Notes
Use Skip_first to SKIP the. This is typically required after manual fix
done by the database administrator. For example, if CREATE TABLE statement fails
Because of lack of space, database administrator fixes the issue and re-executes
CREATE TABLE. To avoid ORA-39151 (table exists) in RECOVER SHARD him has to
Specify-skip_first.
Use the Ignore_first to mark the as obsolete. This are required when wrong DDL
Statement is specified and failed on all shards. Thus you need to mark it down
As obsolete.
Keywords and Parameters
Full:full recovery mode.
Gdspool:the GDS Pool (If not specified and there are only one gdspool with access
Granted to user, it'll be used by default
Ignore_first:make the failed DDL statement obsolete.
Shard:the name of the Shard.
Skip_first:skip the failed DDL statement
Examples
RECOVER Shard-database SHD1
Gdsctl>
The correct usage should be recover shard-shard.
We ran this command and found that the GSM did not start:
Gdsctl>recover Shard-shard SH2
GSM-45076:GSM is not running
Gdsctl>
Gdsctl>start GSM
GSM is started successfully
Gdsctl>
After starting, wait a moment (about 10 seconds), the check status will be restored to normal (here, we have not really used the recover shard, in the following question only normal use):
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok deployed Region1 ONLINE
SH2 primary_shardgroup warnings deployed Region1-
Gdsctl>
gdsctl>--wait about 10 seconds.
Gdsctl>
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok deployed Region1 ONLINE
SH2 primary_shardgroup Ok deployed Region1 ONLINE
Gdsctl>
So, after rebooting, there are database,listener, and GSM that need to be started.
(b) Question two: State shows DDL error
Then, in the following experiment, I drop off a duplicate table on the Shardcat and find that Shard state is not normal at this time. The state of Config Shard shows DDL error.
First, operate on the Shardcat, drop off the duplicate table
Sql> Conn app_schema/oracle
Connected.
sql>
sql> drop table products;
drop table Products
*
ERROR in line 1:
Ora-02557:cannot operate On sharded objects if Shard DDL is disabled
sql> ;
sql>
sql>
sql> alter session enable Shard DDL;
Session altered.
Sql> Select COUNT (*) from the products;
COUNT (*)
----------
&nbs p; 1000
sql>
sql> drop table products;
table dropped.
Sql>
finds that the data is not synchronized at this time and can still be queried on shard node:
Sql>!hostname
Sdb2
Sql>
Sql> Conn App_schema/oracle
Connected.
Sql>
Sql> Select COUNT (*) from the products;
COUNT (*)
----------
1000
Sql>
Sql>!hostname
Sdb3
Sql>
Sql> Conn App_schema/oracle
Connected.
Sql>
Sql> Select COUNT (*) from the products;
COUNT (*)
----------
1000
Sql>
At this point, check the Shard state to become DDL error:
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok DDL error Region1 ONLINE
SH2 primary_shardgroup Ok DDL error Region1 ONLINE
Gdsctl>
We can see some information in the GSM Alertlog:
(Note: GSM log location in/u01/ora12c/app/oracle/diag/gsm/sdb1/sharddirector1/trace/alert_shardcat.log)
......
07-nov-2016 01:35:19 * (Connect_data= (Service_name=gds$catalog.oradbcloud) (Cid= (program=oracle) (HOST=SDB3) (USER= ORACLE12C))) * (Address= (PROTOCOL=TCP) (host=192.168.56.23) (port=34247)) * Establish * gds$catalog.oradbcloud * 0
2016-11-07t01:35:22.021482+08:00
07-nov-2016 01:35:22 * service_update * shardcat%11 * 0
07-nov-2016 01:35:22 * service_update * shardcat%1 * 0
2016-11-07t01:35:35.543868+08:00
Gsm-40135:catalog Request: "DDL request" (80). Target-"33". pool---"". Id= "72".
Payload: "A"
Gsm-40135:catalog Request: "DDL request" (80). Target-"34". pool---"". Id= "73".
Payload: "E"
2016-11-07t01:35:36.652686+08:00
Gsm-40148:database task failed:database: "SH1", Status 4, message: "Ora-01031:insufficient privileges
Ora-06512:at "SYS. Execasuser ", line 35
Ora-06512:at "SYS. Dbms_sys_sql ", line 1434
Ora-06512:at "SYS. Execasuser ", line 23
Ora-06512:at "Gsmadmin_internal. Executeddl ", line 68
Ora-06512:at Line 1 \ (ngsmoci_execute\)
"
Gsm-40148:database task failed:database: "Sh2", Status 4, message: "Ora-01031:insufficient privileges
Ora-06512:at "SYS. Execasuser ", line 35
Ora-06512:at "SYS. Dbms_sys_sql ", line 1434
Ora-06512:at "SYS. Execasuser ", line 23
Ora-06512:at "Gsmadmin_internal. Executeddl ", line 68
Ora-06512:at Line 1 \ (ngsmoci_execute\)
"
2016-11-07t01:35:36.700593+08:00
GSM-40157:GSM response id=72 type=80. Status: "Success with Info".
2016-11-07t01:35:37.010475+08:00
Gsm-40135:catalog Request: "RUNTIME DATABASE" (40). Target-"SH2". pool---"". Id= "74".
Payload: "(parameters= (ddlid=33) (autorcv=0))"
Gsm-40135:catalog Request: "RUNTIME DATABASE" (40). Target-"SH1". pool---"". Id= "75".
Payload: "(parameters= (ddlid=33) (autorcv=0))"
GSM-40157:GSM response id=73 type=80. Status: "Success".
2016-11-07t01:35:41.834561+08:00
07-nov-2016 01:35:41 * service_update * SHARDDIRECTOR1 * 0
......
In fact, from this log, we just see that there are ORA-01031 insufficient privileges error, but what is the lack of authority, we still have to go through Config Shard command to further look:
Gdsctl>config Shard-shard SH1
Name:sh1
Shard Group:primary_shardgroup
Status:ok
State:deployed
Region:region1
Connection string:sdb2:1521/sh1:dedicated
SCAN Address:
ONS Remote port:0
Disk Threshold, ms:20
CPU Threshold,%: 75
version:12.2.0.0
Last Failed ddl:alter database link "productsd ...
DDL error:ora-01031:insufficient Privileges
Ora-06512:at "SYS. Execasuser ", line 35
Ora-06512:at "SYS. Dbms_sys_sql ", line 1434
Ora-06512:at "SYS. Execasuser ", line 23
Ora-06512:at "Gsmadmin_internal. Executeddl ", line 68
Ora-06512:at Line 1 \ (ngsmoci_execute\)
Failed DDL id:33
Availability:online
Supported Services
------------------------
Name Preferred Status
---- --------- ------
OLTP_RW_SRVC Yes Enabled
Gdsctl>
We see this last failed DDL, when ALTER DATABASE link appears with insufficient permissions.
So we found the reason for the fix:
--On the SH1:
[Oracle12c@sdb2 ~]$ Sqlplus "/As SYSDBA"
Sql*plus:release 12.2.0.0.2 Beta on Mon Nov 7 01:50:25 2016
Copyright (c) 1982, 2015, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition release 12.2.0.0.2-64bit Beta
With the partitioning, OLAP, Advanced Analytics and real application testing options
sql> grant ALTER DATABASE link to App_schema;
Grant succeeded.
Sql>
--On the SH2
[Oracle12c@sdb3 ~]$ Sqlplus "/As SYSDBA"
Sql*plus:release 12.2.0.0.2 Beta on Mon Nov 7 01:51:13 2016
Copyright (c) 1982, 2015, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition release 12.2.0.0.2-64bit Beta
With the partitioning, OLAP, Advanced Analytics and real application testing options
sql> grant ALTER DATABASE link to App_schema;
Grant succeeded.
Sql>
--On the Shardcat, gsm_env:
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok DDL error Region1 ONLINE
SH2 primary_shardgroup Ok DDL error Region1 ONLINE
Gdsctl>
Gdsctl>recover Shard-shard SH1
Gdsctl>
Gdsctl>recover Shard-shard SH2
Gdsctl>
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok deployed Region1 ONLINE
SH2 primary_shardgroup Ok deployed Region1 ONLINE
Gdsctl>
--On the SH1:
[Oracle12c@sdb2 ~]$ Sqlplus "/As SYSDBA"
Sql*plus:release 12.2.0.0.2 Beta on Mon Nov 7 01:50:25 2016
Copyright (c) 1982, 2015, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition release 12.2.0.0.2-64bit Beta
With the partitioning, OLAP, Advanced Analytics and real application testing options
sql> grant ALTER DATABASE link to App_schema;
Grant succeeded.
Sql>
--On the SH2
[Oracle12c@sdb3 ~]$ Sqlplus "/As SYSDBA"
Sql*plus:release 12.2.0.0.2 Beta on Mon Nov 7 01:51:13 2016
Copyright (c) 1982, 2015, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition release 12.2.0.0.2-64bit Beta
With the partitioning, OLAP, Advanced Analytics and real application testing options
sql> grant ALTER DATABASE link to App_schema;
Grant succeeded.
Sql>
--On the Shardcat, gsm_env:
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok DDL error Region1 ONLINE
SH2 primary_shardgroup Ok DDL error Region1 ONLINE
Gdsctl>
Gdsctl>recover Shard-shard SH1
Gdsctl>
Gdsctl>recover Shard-shard SH2
Gdsctl>
Gdsctl>config Shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
SH1 primary_shardgroup Ok deployed Region1 ONLINE
SH2 primary_shardgroup Ok deployed Region1 ONLINE
Gdsctl>
You can see that, on the basis of fixing the problem, after running recover Shard-shard, you are back to normal.
Note that, if the root cause did not find, do not know is the database link permissions problem, only run recover shard is ineffective.