A couple of ASM disk headers were accidentally wiped by the UNIX sysadmin team.
After the disk headers were removed, the database kept running for some time, as the data inside the disks was intact. When the scheduled RMAN backup kicked off, the database came down.
The OCR_VOTE disk group was not affected and remained mounted. All RAC processes except the database were running normally, since the OCR and voting disks were available.
Disk status was showing as 'CANDIDATE' just after the DB went down:
SQL> select GROUP_NUMBER,DISK_NUMBER,NAME,TOTAL_MB,FREE_MB,HEADER_STATUS from v$asm_disk;

GROUP_NUMBER DISK_NUMBER NAME                             TOTAL_MB    FREE_MB HEADER_STATU
------------ ----------- ------------------------------ ---------- ---------- ------------
           1           0 ARC_LOG_0000                       102400      56199 CANDIDATE
           2           5 DB_DATA_0005                       102400      22441 CANDIDATE
           2           4 DB_DATA_0004                       102400      22412 CANDIDATE
           2           3 DB_DATA_0003                       102400      22431 CANDIDATE
           2           2 DB_DATA_0002                       102400      22427 CANDIDATE
           2           1 DB_DATA_0001                       102400      22433 CANDIDATE
           2           0 DB_DATA_0000                       102400      22425 CANDIDATE
           3           0 OCR_VOTE_0000                       51199      50803 MEMBER

8 rows selected.
As shown above, the disk status was 'CANDIDATE' just after the database went down, but after some time the ASM process dismounted the disk groups and took all the disks offline. The ASM alert log showed the messages below:
WARNING: Disk 0 (DB_DATA_0000) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 1 (DB_DATA_0001) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 2 (DB_DATA_0002) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 3 (DB_DATA_0003) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 4 (DB_DATA_0004) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 5 (DB_DATA_0005) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: cache deleting context for group DB_DATA 2/0xbf00da29
GMON dismounting group 2 at 16 for pid 40, osid 17308
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
SUCCESS: diskgroup DB_DATA was dismounted
SUCCESS: alter diskgroup DB_DATA dismount force /* ASM SERVER */
ERROR: PST-initiated MANDATORY DISMOUNT of group DB_DATA
After the above messages appeared in the alert log, ASMLib was no longer able to locate the data disks on the server:
[root@dbserver ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
[root@dbserver ~]# /etc/init.d/oracleasm listdisks
OCR_VOTE01
Note that database backups cannot restore a corrupted ASM disk header: RMAN backs up the Oracle data files, not the physical ASM disk itself with its header. A precaution for the future is sketched below.
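As a safeguard, a raw copy of each disk's header block can be kept outside ASM at the OS level. A minimal sketch using dd (the 4 KB header block size, the /backup directory, and the device names are assumptions for illustration, not part of the original incident):

# Save the first 4 KB (the ASM disk header block) of a data disk to a file
$ dd if=/dev/mapper/mpath17 of=/backup/DB_DATA01.hdr bs=4096 count=1

# If ever needed, the saved block could be written back the same way
# (this overwrites the live header; use with extreme care)
$ dd if=/backup/DB_DATA01.hdr of=/dev/mapper/mpath17 bs=4096 count=1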
The steps below helped us correct the situation:
Step 1: Finding the ASM to Physical Disk Mapping
Fortunately, the ASM disk to physical disk mapping information was available to us. It had been collected with the commands below.
A) Query ASMLib for the device behind the ASM disk:

$ /etc/init.d/oracleasm querydisk -d DB_DATA01
Disk "DB_DATA01" is a valid ASM disk on device /dev/sdl[8,176]

Here /dev/sdl is the actual disk path.

B) Run 'multipath -ll' and look for '/dev/sdl':

mpath17 (360060e8006d8e7000000d8e700000000) dm-7 HP,OPEN-V
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 4:0:0:10 sdaf 65:240 [active][ready]
 \_ 3:0:0:10 sdl  8:176  [active][ready]
Here:
>> 360060e8006d8e7000000d8e700000000 is the WWID, which should be the same on all RAC nodes when checked with 'multipath -ll'.
>> /dev/mapper/mpath17 is the multipath disk device corresponding to DB_DATA01 on this node.
Similar information was available for the other data disks too; the same mapping can be pulled for all disks in one pass, as sketched below.
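A small loop over the same 'querydisk' command gathers the backing device for every ASMLib disk at once (a sketch; the disk names are the ones used in this article):

# As root: print the backing device for each ASMLib disk name
for disk in ARC_LOG01 DB_DATA01 DB_DATA02 DB_DATA03 DB_DATA04 DB_DATA05 DB_DATA06; do
    /etc/init.d/oracleasm querydisk -d "$disk"
done

# Then match the reported [major,minor] pairs against the multipath maps
multipath -ll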
Step 2: Running the 'kfed' Oracle utility
A backup copy of the ASM disk header is always saved inside the disk itself, so this kind of corrupted/missing disk header issue can be taken care of by running kfed with the 'repair' command. The damaged header can first be inspected with 'kfed read', as sketched below.
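A wiped header typically shows block type KFBTYP_INVALID; after a successful 'kfed repair', the same command should report KFBTYP_DISKHEAD. A sketch against one of the multipath devices mapped in Step 1 (exact output varies by version):

$ cd /u01/app/11.2.0/grid/bin
$ ./kfed read /dev/mapper/mpath17 | grep type
kfbh.type:                                0 ; 0x002: KFBTYP_INVALID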
As the ASM disks were no longer visible under the ASMLib directory '/dev/oracleasm/disks', we could not run 'kfed repair' against the ASM disk names directly.
We therefore switched to the root user and ran 'kfed repair' on the corresponding physical multipath devices.
As 'kfed' is an Oracle utility, you have to go to the GRID_HOME bin directory to run it:
$ cd /u01/app/11.2.0/grid/bin
$ ./kfed repair /dev/mapper/mpath17
$ ./kfed repair /dev/mapper/mpath18
$ ./kfed repair /dev/mapper/mpath19
$ ./kfed repair /dev/mapper/mpath20
$ ./kfed repair /dev/mapper/mpath21
$ ./kfed repair /dev/mapper/mpath22
$ ./kfed repair /dev/mapper/mpath23
After running the above commands, you may have to change the ownership of the disks under /dev/oracleasm/disks back to the Grid user, for example:
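A minimal sketch of that ownership fix, assuming the Grid infrastructure owner is 'grid' and the group is 'asmadmin' (substitute your own environment's user and group):

$ ls -l /dev/oracleasm/disks
$ chown grid:asmadmin /dev/oracleasm/disks/*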
On each node, execute the following commands as the root user (this scans the multipath devices and locates all ASM disks):
$ /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
$ /etc/init.d/oracleasm listdisks
ARC_LOG01
DB_DATA01
DB_DATA02
DB_DATA03
DB_DATA04
DB_DATA05
DB_DATA06
OCR_VOTE01
The above command now showed all the required ASM disks. If the repaired disk groups do not mount automatically after the rescan, they can be mounted manually, as sketched below.
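A sketch of the manual mount from the ASM instance, using the disk group names from this article:

$ sqlplus / as sysasm
SQL> alter diskgroup DB_DATA mount;
SQL> alter diskgroup ARC_LOG mount;
SQL> select name, state from v$asm_diskgroup;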
Step 3: Start the database and verify
$ srvctl start database -d <SID>
$ crsctl status resource -t
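As a final verification, the disk header status can be checked again with the same query used earlier; after a successful repair, the data disks should report MEMBER instead of CANDIDATE (expected outcome, not captured output from this incident):

SQL> select GROUP_NUMBER, DISK_NUMBER, NAME, HEADER_STATUS from v$asm_disk;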