Problems with online LUN migration through ASM
After the raw devices have been configured on both nodes, make sure the newly configured raw devices show up in the output of ll /dev/raw on both sides.
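For context, on RHEL-style systems of that era the raw bindings usually live in /etc/sysconfig/rawdevices and are applied by the rawdevices service. This is only a sketch: the block device names below are hypothetical placeholders, not this system's actual devices.
# /etc/sysconfig/rawdevices -- must be identical on both nodes
/dev/raw/raw4 /dev/sdc1
/dev/raw/raw5 /dev/sdd1
[root@zxdb01 ~]# service rawdevices restart    # apply the bindings
[root@zxdb01 ~]# ll /dev/raw                   # verify raw4/raw5 exist with the correct oracle ownership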
Next, add the new LUNs and drop the old raw device from node 1 at the command line. The command returns an error:
SQL> Alter diskgroup ZXDG add disk '/dev/raw/raw4','/dev/raw/raw5' drop disk '/dev/raw/raw1' rebalance power 4 nowait;
Alter diskgroup ZXDG add disk '/dev/raw/raw4','/dev/raw/raw5' drop disk '/dev/raw/raw1' rebalance power 4 nowait
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "/DEV/RAW/RAW1" does not exist in diskgroup "ZXDG"
SQL>
Querying v$asm_disk on both nodes shows that the new LUNs are visible on node 1 but not on node 2.
SQL> select disk_number from v$asm_disk;

DISK_NUMBER
-----------
          0
          1
          0
So I tried adding the disks again from node 2:
SQL> Alter diskgroup ZXDG add disk '/dev/raw/raw4','/dev/raw/raw5';
Alter diskgroup ZXDG add disk '/dev/raw/raw4','/dev/raw/raw5'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15031: disk specification '/dev/raw/raw5' matches no disks
ORA-15025: could not open disk '/dev/raw/raw5'
ORA-27041: unable to open file
Linux-x86_64 Error: 6: No such device or address
Additional information: 42
Additional information: 255
Additional information: -750856672
ORA-15031: disk specification '/dev/raw/raw4' matches no disks
ORA-15025: could not open disk '/dev/raw/raw4'
ORA-27041: unable to open file
Linux-x86_64 Error: 6: No such device or address
Additional information: 42
Additional information: 255
Additional information: -750856672
Errors like this are generally caused by wrong raw device ownership or permissions, or by node 2 failing to recognize the new LUNs (the ll /dev/raw listing is highly deceptive here: the raw device nodes can show up even when the node cannot actually read the underlying LUNs).
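A quick sanity check on node 2 might look like the following. This is a sketch, not part of the original session; the node-2 prompt and the oracle:oinstall ownership are assumptions based on a typical 10g raw-device setup.
[root@zxdb02 ~]# ll /dev/raw    # owner/group should match the ASM software owner, e.g. oracle:oinstall, mode 660
[root@zxdb02 ~]# raw -qa        # list raw -> block device bindings; raw4/raw5 must point at the new LUNs
[root@zxdb02 ~]# partprobe      # re-read partition tables so the kernel picks up the new LUNs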
Check that the raw device owner and permissions are correct. If node 2 has failed to recognize the new LUNs, run partprobe on node 2 and then add the disks again:
SQL> Alter diskgroup ZXDG add disk '/dev/raw/raw4','/dev/raw/raw5'
  2  ;
Alter diskgroup ZXDG add disk '/dev/raw/raw4','/dev/raw/raw5'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/raw/raw4' belongs to diskgroup "ZXDG"
ORA-15033: disk '/dev/raw/raw5' belongs to diskgroup "ZXDG"
This time ORA-15032 comes back together with ORA-15033, which says that raw4/raw5 already belong to the diskgroup.
However, querying v$asm_disk again shows that raw4 and raw5 are in an abnormal state: their group_number is 0, mount_status is CLOSED, and no name has been assigned.
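The exact query is not preserved in the original; judging from the column headings below, it was something along these lines:
SQL> select group_number, disk_number, mount_status, header_status,
  2         mode_status, state, path, name
  3    from v$asm_disk;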
GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS MODE_STATUS STATE    PATH           NAME
           1           0 CACHED       MEMBER        ONLINE      DROPPING /dev/raw/raw1  ZXDG_0000
           0           1 CLOSED       MEMBER        ONLINE      NORMAL   /dev/raw/raw4
           0           2 CLOSED       MEMBER        ONLINE      NORMAL   /dev/raw/raw5
raw1 is already in DROPPING status. Because raw4/raw5 were never properly added to the diskgroup, the data on raw1 has nowhere to be rebalanced to; raw1 therefore cannot actually be dropped and stays stuck in DROPPING.
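One way to confirm that no rebalance is actually making progress (not part of the original session, but standard practice) is to check v$asm_operation; with raw4/raw5 unusable, it shows either no active REBAL operation or one whose SOFAR never advances:
SQL> select group_number, operation, state, power, sofar, est_work, est_minutes
  2    from v$asm_operation;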
To be safe, undrop the disk first:
SQL> Alter diskgroup ZXDG undrop disks;
Diskgroup altered.
Because ASM already stamped names onto raw4/raw5 during the failed add operation, they cannot be dropped from the diskgroup in the normal way by path:
SQL> Alter diskgroup ZXDG drop disk '/dev/raw/raw4';
Alter diskgroup ZXDG drop disk '/dev/raw/raw4'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "/DEV/RAW/RAW4" does not exist in diskgroup "ZXDG"
Reading the metadata of all three raw devices with kfed shows, to our relief, that nothing in the metadata is corrupted or inconsistent:
[oracle@zxdb01 ~]$ kfed read /dev/raw/raw1 > /tmp/raw1
[oracle@zxdb01 ~]$ kfed read /dev/raw/raw4 > /tmp/raw4
[oracle@zxdb01 ~]$ kfed read /dev/raw/raw5 > /tmp/raw5
[oracle@zxdb01 ~]$ diff /tmp/raw4 /tmp/raw5
6c6
< kfbh.block.obj:              2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
---
> kfbh.block.obj:              2147483650 ; 0x008: TYPE=0x8 NUMB=0x2
20c20
< kfdhdb.dsknum:                        1 ; 0x024: 0x0001
---
> kfdhdb.dsknum:                        2 ; 0x024: 0x0002
23c23
< kfdhdb.dskname:               ZXDG_0001 ; 0x028: length=10
---
> kfdhdb.dskname:               ZXDG_0002 ; 0x028: length=10
25c25
< kfdhdb.fgname:                ZXDG_0001 ; 0x068: length=10
---
> kfdhdb.fgname:                ZXDG_0002 ; 0x068: length=10
[oracle@zxdb01 ~]$ diff /tmp/raw1 /tmp/raw4
6,7c6,7
< kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
< kfbh.check:                   203544188 ; 0x00c: 0x0c21d67c
---
> kfbh.block.obj:              2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
> kfbh.check:                  3389207210 ; 0x00c: 0xca0332aa
20c20
< kfdhdb.dsknum:                        0 ; 0x024: 0x0000
---
> kfdhdb.dsknum:                        1 ; 0x024: 0x0001
23c23
< kfdhdb.dskname:               ZXDG_0000 ; 0x028: length=10
---
> kfdhdb.dskname:               ZXDG_0001 ; 0x028: length=10
25c25
< kfdhdb.fgname:                ZXDG_0000 ; 0x068: length=10
---
> kfdhdb.fgname:                ZXDG_0001 ; 0x068: length=10
27,30c27,30
< kfdhdb.crestmp.hi:             32971218 ; 0x0a8: HOUR=0x12 DAYS=0xe MNTH=0x6 YEAR=0x7dc
< kfdhdb.crestmp.lo:            449324032 ; 0x0ac: USEC=0x0 MSEC=0x209 SECS=0x2c MINS=0x6
< kfdhdb.mntstmp.hi:             32993540 ; 0x0b0: HOUR=0x4 DAYS=0x8 MNTH=0xc YEAR=0x7dd
< kfdhdb.mntstmp.lo:           3706305536 ; 0x0b4: USEC=0x0 MSEC=0x26f SECS=0xe MINS=0x37
---
> kfdhdb.crestmp.hi:             33000305 ; 0x0a8: HOUR=0x11 DAYS=0x1b MNTH=0x2 YEAR=0x7de
> kfdhdb.crestmp.lo:           2735401984 ; 0x0ac: USEC=0x0 MSEC=0x2bb SECS=0x30 MINS=0x28
> kfdhdb.mntstmp.hi:             33000305 ; 0x0b0: HOUR=0x11 DAYS=0x1b MNTH=0x2 YEAR=0x7de
> kfdhdb.mntstmp.lo:           2735433728 ; 0x0b4: USEC=0x0 MSEC=0x2da SECS=0x30 MINS=0x28
35,36c35,36
< kfdhdb.dsksize:                  204797 ; 0x0c4: 0x00031ffd
< kfdhdb.pmcnt:                         3 ; 0x0c8: 0x00000003
---
> kfdhdb.dsksize:                  102398 ; 0x0c4: 0x00018ffe
> kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002
39c39
< kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
---
> kfdhdb.f1b1locn:                      0 ; 0x0d4: 0x00000000
[oracle@zxdb01 ~]$
[oracle@zxdb01 ~]$ diff /tmp/raw1 /tmp/raw5
6,7c6,7
< kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
< kfbh.check:                   203544188 ; 0x00c: 0x0c21d67c
---
> kfbh.block.obj:              2147483650 ; 0x008: TYPE=0x8 NUMB=0x2
> kfbh.check:                  3389207210 ; 0x00c: 0xca0332aa
20c20
< kfdhdb.dsknum:                        0 ; 0x024: 0x0000
---
> kfdhdb.dsknum:                        2 ; 0x024: 0x0002
23c23
< kfdhdb.dskname:               ZXDG_0000 ; 0x028: length=10
---
> kfdhdb.dskname:               ZXDG_0002 ; 0x028: length=10
25c25
< kfdhdb.fgname:                ZXDG_0000 ; 0x068: length=10
---
> kfdhdb.fgname:                ZXDG_0002 ; 0x068: length=10
27,30c27,30
< kfdhdb.crestmp.hi:             32971218 ; 0x0a8: HOUR=0x12 DAYS=0xe MNTH=0x6 YEAR=0x7dc
< kfdhdb.crestmp.lo:            449324032 ; 0x0ac: USEC=0x0 MSEC=0x209 SECS=0x2c MINS=0x6
< kfdhdb.mntstmp.hi:             32993540 ; 0x0b0: HOUR=0x4 DAYS=0x8 MNTH=0xc YEAR=0x7dd
< kfdhdb.mntstmp.lo:           3706305536 ; 0x0b4: USEC=0x0 MSEC=0x26f SECS=0xe MINS=0x37
---
> kfdhdb.crestmp.hi:             33000305 ; 0x0a8: HOUR=0x11 DAYS=0x1b MNTH=0x2 YEAR=0x7de
> kfdhdb.crestmp.lo:           2735401984 ; 0x0ac: USEC=0x0 MSEC=0x2bb SECS=0x30 MINS=0x28
> kfdhdb.mntstmp.hi:             33000305 ; 0x0b0: HOUR=0x11 DAYS=0x1b MNTH=0x2 YEAR=0x7de
> kfdhdb.mntstmp.lo:           2735433728 ; 0x0b4: USEC=0x0 MSEC=0x2da SECS=0x30 MINS=0x28
35,36c35,36
< kfdhdb.dsksize:                  204797 ; 0x0c4: 0x00031ffd
< kfdhdb.pmcnt:                         3 ; 0x0c8: 0x00000003
---
> kfdhdb.dsksize:                  102398 ; 0x0c4: 0x00018ffe
> kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002
39c39
< kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
---
> kfdhdb.f1b1locn:                      0 ; 0x0d4: 0x00000000
[oracle@zxdb01 ~]$
In this case, you can use ADD DISK ... FORCE to force the disks back into the diskgroup:
SQL> Alter diskgroup ZXDG add disk '/dev/raw/raw4' name ZXDG_01 force;
Diskgroup altered.
SQL> Alter diskgroup ZXDG add disk '/dev/raw/raw5' name ZXDG_02 force;
Diskgroup altered.
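To double-check that the forced add re-stamped the intended disk names into the headers, you could re-read them with kfed (a sketch, not taken from the original session):
[oracle@zxdb01 ~]$ kfed read /dev/raw/raw4 | grep dskname    # should now show ZXDG_01
[oracle@zxdb01 ~]$ kfed read /dev/raw/raw5 | grep dskname    # should now show ZXDG_02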
Checking the status again shows that both disks were added successfully:
NAME      GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS MODE_STATUS STATE  PATH
ZXDG_0000            1           0 CACHED       MEMBER        ONLINE      NORMAL /dev/raw/raw1
ZXDG_02              1           2 CACHED       MEMBER        ONLINE      NORMAL /dev/raw/raw5
ZXDG_01              1           1 CACHED       MEMBER        ONLINE      NORMAL /dev/raw/raw4
Now raw1 can be dropped safely, with rebalance power set to 10:
SQL> Alter diskgroup ZXDG drop disk ZXDG_0000 rebalance power 10 nowait;
Diskgroup altered.
The rebalance progress can be watched through v$asm_disk.
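The monitoring query itself is not shown in the original; one along these lines would produce the output below (free_mb on the dropping disk climbs as extents move off it):
SQL> select name, free_mb from v$asm_disk where group_number = 1;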
NAME      FREE_MB
ZXDG_0000  155446
ZXDG_02     55532
ZXDG_01     54982
The rebalance speed is quite fast.
Fortunately, the metadata was not damaged at any point during this incident. Had it been damaged, it would have had to be repaired first, which is a very tedious process.
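As an aside, on newer releases kfed itself can restore a damaged disk header from the backup copy that ASM keeps on the disk; the availability of this subcommand (roughly 11.1.0.7 onwards) is stated from memory, so verify it on your version before relying on it:
[oracle@zxdb01 ~]$ kfed repair /dev/raw/raw4    # rewrite a corrupt disk header from its backup block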
Of course, if a generous downtime window is available, you can also simply rebuild the diskgroup to get past this problem.
Those used to running Windows servers will probably be itching to restart ASM at this point. But if the metadata is damaged, never restart ASM (and above all never restart the ASM instances on all nodes at the same time): the diskgroup may then fail to mount at all, adding unnecessary trouble to the recovery.