
Real Application Clusters (RAC) FAQs

Some good Real Application Clusters (RAC) interview questions:

Q 1.  How do you set up SCAN in 11gR2 and how does it work?

SCAN is a single name defined in DNS that resolves to three IP addresses using a round-robin algorithm. The IP addresses are on the same subnet as the cluster's default public network.

How to setup SCAN:

For a successful Grid Infrastructure installation, you need to configure your DNS beforehand so that the SCAN name resolves correctly. Oracle requires at least one IP address (and recommends three) to be configured for the SCAN name.

You have two options to define the SCAN name:

1) Define it in your DNS (Domain Name Service)
2) Use GNS (Grid Naming Service)
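
For a quick check once DNS is set up, the SCAN name should resolve to all of its IP addresses (the SCAN name and addresses below are hypothetical):

 $ nslookup rac-scan.example.com
 Name:    rac-scan.example.com
 Address: 192.168.10.21
 Name:    rac-scan.example.com
 Address: 192.168.10.22
 Name:    rac-scan.example.com
 Address: 192.168.10.23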

How SCAN works:

1) The client sends a connection request to the SCAN name, which DNS resolves to one of the SCAN IP addresses, returned in round-robin fashion. [If GNS is used for SCAN IP management, DNS delegates the request to GNS, which in turn returns a SCAN IP address, again in round-robin fashion.]

2) The TNS request is then forwarded to the SCAN listener running on that SCAN IP. Remember that the remote_listener parameter is already set to point to the SCAN (or a SCAN tnsnames.ora entry), while local_listener uses the VIP listener entry.

3) The SCAN listener in turn forwards the request to the least loaded local listener. The remote listeners (pointing at the SCAN listeners) do the load balancing, while the local listeners take care of spawning the new server process and connecting it to the database.

4) The local listener then services the client request.

PMON registers with the SCAN listeners as defined by the remote_listener parameter, and with the node listener according to the local_listener setting. Based on the load information PMON provides, the SCAN listener chooses the least loaded node when forwarding a request received from a client.
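
For reference, a client connect descriptor that uses the SCAN needs only the single SCAN name rather than every node VIP (the names below are hypothetical):

 ORADB =
   (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
     (CONNECT_DATA =
       (SERVER = DEDICATED)
       (SERVICE_NAME = oradb.example.com)
     )
   )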


Q 2.  What are benefits of SCAN?

A) NO NEED TO RECONFIGURE CLIENTS: SCAN makes it possible to add or remove nodes from the cluster without reconfiguring clients, because the SCAN is associated with the cluster as a whole rather than with a particular node. Before SCAN, if the cluster changed, the client TNSNAMES.ORA files (or other connect strings such as Easy Connect strings) had to change.

B) LOCATION INDEPENDENCE: A connection made through the SCAN can be serviced by any node running the requested service. This provides location independence for the databases, so client configuration does not depend on which nodes are running a particular database.

C) LOAD BALANCING: Round-robin resolution at the DNS level provides connection-request load balancing across the SCAN listeners floating in the cluster.

New features of SCAN in database 12c:

1. SCAN and Oracle Clusterware managed VIPs now support IPv6 based IP addresses

2. SCAN is by default restricted to only accept service registration from nodes in the cluster

3. SCAN supports multiple subnets in the cluster (one SCAN per subnet)


Q 3. Describe basic steps to do RAC to NON-RAC cloning

1) Create an initialization parameter file (pfile) from the current database
2) Remove the instance parameters concerning cluster and ASM (cluster_database, cluster_database_instances, instance_number, thread, etc.)
3) Start the target database in NOMOUNT mode
4) Back up the source RAC database and copy the backup pieces to the target server
5) Duplicate the database with the RMAN "duplicate database" command:

 $rman auxiliary /
 RMAN> duplicate database to "TEST" backup location '/BACKUPS/TEST' nofilenamecheck;

6) Post-clone steps: remove thread 2 and its redo logs, drop the second undo tablespace, change passwords, etc. A sketch of the cleanup SQL is shown below.
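
A minimal sketch of those post-clone steps, assuming thread 2 owned redo log groups 3 and 4 and the second undo tablespace is UNDOTBS2 (group numbers and names are illustrative):

 SQL> ALTER DATABASE DISABLE THREAD 2;
 SQL> ALTER DATABASE DROP LOGFILE GROUP 3;
 SQL> ALTER DATABASE DROP LOGFILE GROUP 4;
 SQL> DROP TABLESPACE undotbs2 INCLUDING CONTENTS AND DATAFILES;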


Q 4.  Describe basic steps to do NON-RAC to RAC cloning/conversion

Migrating from a release older than 10g? First upgrade your existing single-instance database, test the upgraded database, and then convert it to RAC.

This assumes you already have the servers ready, with the OS user/group setup done.

1) Install and configure Grid Infrastructure (Clusterware+ ASM)
2) Add storage to Automatic Storage Management (ASM)
3) Install Oracle Database Software
4) Duplicate single instance Non-ASM database to ASM using RMAN

 > Back up the source non-RAC database and copy the backup pieces to target RAC NODE 1.
> Create a password file and init.ora file for RAC NODE 1 (don't add cluster parameters yet)
> Start the auxiliary database in NOMOUNT mode on RAC NODE 1
> Duplicate the database to the RAC server:

 $rman auxiliary /
 RMAN> duplicate database to "TEST" backup location '/BACKUPS/TEST' nofilenamecheck;

5)  Manually Convert single-instance to RAC.

 > Create redo thread 2 and enable it (see the SQL sketch after these steps)
> Add a second undo tablespace for the second instance
> Add the cluster-related parameters now:

*.cluster_database_instances=2
*.cluster_database=true
*.remote_listener='LISTENERS_ORADB'
ORADB1.instance_number=1
ORADB2.instance_number=2
ORADB1.thread=1
ORADB2.thread=2
ORADB1.undo_tablespace='UNDOTBS1'
ORADB2.undo_tablespace='UNDOTBS2'

> Copy the updated init.ora file to RAC NODE2 and rename the files as per instance name.
> Update the environment and start the database on both nodes
> Register the RAC instances with CRS

$ srvctl add database -d ORADB -o /home/oracle/product/v10204
$ srvctl add instance -d ORADB -i ORADB1 -n orarac1
$ srvctl add instance -d ORADB -i ORADB2 -n orarac2

 > Create the spfile on ASM shared storage
> Run the Cluster Verification Utility (cluvfy) to validate the setup
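
A minimal sketch of the SQL for creating and enabling redo thread 2 and the second undo tablespace, as referenced above (diskgroup name, group numbers and sizes are illustrative):

 SQL> ALTER DATABASE ADD LOGFILE THREAD 2
        GROUP 4 ('+DATA') SIZE 100M,
        GROUP 5 ('+DATA') SIZE 100M;
 SQL> ALTER DATABASE ENABLE PUBLIC THREAD 2;
 SQL> CREATE UNDO TABLESPACE undotbs2 DATAFILE '+DATA' SIZE 500M AUTOEXTEND ON;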


Q 5.  How do you start/stop RAC services ?

Check out this post
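
In brief, the commonly used commands look like the following (database, instance and node names are hypothetical; crsctl commands run as root):

 $ srvctl stop database -d ORADB              # stop all instances of the database
 $ srvctl start database -d ORADB
 $ srvctl stop instance -d ORADB -i ORADB1    # stop/start a single instance
 $ srvctl start instance -d ORADB -i ORADB1
 # crsctl stop crs                            # stop the full clusterware stack on one node
 # crsctl start crs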


Q 6.  What is node eviction?  What causes Node eviction?

NODE EVICTION:

The Oracle Clusterware is designed to perform a node eviction by removing one or more nodes from the
cluster if some critical problem is detected. A critical problem could be a node not responding via a
network heartbeat, a node not responding via a disk heartbeat, a hung or severely degraded machine,
or a hung ocssd.bin process. The purpose of this node eviction is to maintain the overall health of the
cluster by removing bad members.

Starting in 11.2.0.2 RAC (or if you are on Exadata), a node eviction may not actually reboot the machine.
This is called a rebootless restart. In this case we restart most of the clusterware stack to see if that fixes the
unhealthy node.

CAUSES OF NODE EVICTION:

a) Network failure or latency between nodes. It would take 30 consecutive missed checkins (by default – determined by the CSS misscount) to cause a node eviction.
b) Problems writing to or reading from the CSS voting disk. If the node cannot perform a disk heartbeat to the majority of its voting files, then the node will be evicted.

c) A member kill escalation. For example, database LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out it could escalate to a node kill.

d) An unexpected failure or hang of the OCSSD process; this can be caused by any of the above issues or something else.

e) An Oracle bug.

f) High load on the database server: In my experience, high system load accounts for a large majority (roughly 70-80%) of node evictions. A common scenario is that heavy load exhausts the RAM and swap space of a database node, the system stops responding, and the node finally reboots.

g) Database or ASM instance hang: A hung database or ASM instance can also cause a node reboot. In such cases the hung instance is terminated, which can result in either a clusterware restart or a node eviction.
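
The CSS timeouts mentioned above can be checked with crsctl, for example:

 $ crsctl get css misscount      # network heartbeat timeout (default 30 seconds)
 $ crsctl get css disktimeout    # voting disk I/O timeout
 $ crsctl get css reboottime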


Q 7. What happens to data block/resources of the evicted node?

Those are recreated from PI (Past Image) by the surviving instances.

Below is how it works:

– At Node eviction, instance Failure is detected by Cluster Manager and GCS
– Reconfiguration of GES resources (enqueues); global resource directory is frozen
– Reconfiguration of GCS resources; involves redistribution among surviving instances
– One of the surviving instances becomes the “recovering instance”
– SMON process of recovering instance starts first pass of redo log read of the failed instance’s redo log thread
– SMON finds BWR (block written records) in the redo and removes them as their PI is already written to disk
– SMON prepares recovery set of the blocks modified by the failed instance but not written to disk
– Entries in the recovery list are sorted by first dirty SCN
– SMON informs each block’s master node to take ownership of the block for recovery
– Second pass of log read begins.
– Redo is applied to the data files.
– Global Resource Directory is unfrozen


Q 8. How do you troubleshoot Node Eviction?

Look for error messages in the log files below:

Clusterware alert log in <GRID_HOME>/log/<node_name>/

The cssdagent log(s) in <GRID_HOME>/log/<node_name>/agent/ohasd/oracssdagent_root/
The cssdmonitor log(s) in <GRID_HOME>/log/<node_name>/agent/ohasd/oracssdmonitor_root/
The ocssd log(s) in <GRID_HOME>/log/<node_name>/cssd/
The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
IPD/OS or OS Watcher data

‘opatch lsinventory -detail’ output for the GRID home

Messages file locations:

Linux: /var/log/messages
Sun: /var/adm/messages
HP-UX: /var/adm/syslog/syslog.log
IBM: /bin/errpt -a > messages.out
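
As a quick first pass, the clusterware alert log and the OS messages file can be scanned for eviction-related keywords (the paths are illustrative placeholders):

 $ grep -iE 'evict|reboot|fatal' <GRID_HOME>/log/<node_name>/alert<node_name>.log
 $ grep -iE 'evict|restart|panic' /var/log/messages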


Q 9. What is split brain situation in RAC?

A ‘split brain’ situation occurs when the instances in a RAC cluster fail to ping/connect to each other over the private interconnect, while the servers are all physically up and the database instance on each server is also running. These individual nodes are running fine and could conceptually accept user connections and work independently. Because of the lack of communication, each instance assumes that the instance it cannot reach is down and that it needs to do something about the situation. The problem is that if these instances are left running, the same block might be read and updated in each of them, causing a data integrity issue: blocks changed in one instance are not locked against, and could be overwritten by, another instance. Oracle has implemented an efficient check for the split brain syndrome.

In such a scenario, if any node becomes inactive, or if other nodes are unable to ping/connect to a node in the RAC, then the node that first detects the inaccessible node evicts it from the RAC group. For example, with 4 nodes in a cluster, if node 3 becomes unavailable and node 1 finds it unresponsive, node 1 will evict node 3, leaving only nodes 1, 2 and 4 in the RAC group to continue functioning.


 Q 10. What is voting disk?

The voting disk records node membership information. The CSSD process on every node makes entries in the voting disk to ascertain the membership of that node.
While marking their own presence, all the nodes also register in the voting disk information about their ability to communicate with the other nodes. This is called the network heartbeat.
Healthy nodes continuously exchange network and disk heartbeats. A break in the heartbeat indicates a possible error scenario. If a node's disk block is not updated within a short timeout period, that node is considered unhealthy and may be rebooted to protect the database information.

During reconfiguration (a node joining or leaving), CSSD monitors all nodes and determines whether each node has a disk heartbeat, including nodes with no network heartbeat. If no disk heartbeat is detected, the node is declared dead.

So, Voting disks contain static and dynamic data.

Static data : Info about nodes in the cluster
Dynamic data : Disk heartbeat logging

It maintains and consists of important details about the cluster nodes membership, such as

– which node is part of the cluster,
– who (node) is joining the cluster, and
– who (node) is leaving the cluster.
The voting disk is not striped but stored as a whole on an ASM disk.

With external redundancy, one copy of the voting file is stored on one disk in the diskgroup. If we store the voting disk on a diskgroup with normal redundancy, we should be able to tolerate the loss of one disk, i.e. even if we lose one disk, we should still have a sufficient number of voting disks for clusterware to continue. If the diskgroup has 2 disks (the minimum required for normal redundancy), we can store 2 copies of the voting disk on it.

If we lose one disk, only one copy of the voting disk is left and clusterware will not be able to continue, because to continue, clusterware must be able to access more than half the number of voting disks: more than 2 x 1/2 = 1, i.e. at least 2.
Hence, to be able to tolerate the loss of one disk, we should have 3 copies of the voting disk on a diskgroup with normal redundancy. So a normal-redundancy diskgroup holding the voting disk should have a minimum of 3 disks.
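
The current voting disk locations and their state can be listed with:

 $ crsctl query css votedisk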


Q 11. How do you back up voting disk?

In previous versions of Oracle Clusterware you needed to back up the voting disks with the dd command. Starting with Oracle Clusterware 11g Release 2, you no longer need to back up the voting disks: they are automatically backed up as part of the OCR backup.

In releases where manual backup applies, you may need to back up the voting disk file manually every time:

– you add or remove a node from the cluster or
– immediately after you configure or upgrade a cluster.
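
A sketch of the old and new approaches (device, file and diskgroup names are hypothetical; crsctl runs as root):

 $ dd if=/dev/raw/raw1 of=/backup/votedisk.bak bs=4k     # pre-11.2 manual backup with dd
 # crsctl replace votedisk +CRSDG                        # 11.2 onwards: voting files are managed/relocated via crsctl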


Q 12.  What are the key RAC background processes and what do they do?

Oracle RAC instances are composed of following background processes:

ACMS — Atomic Control file to Memory Service (ACMS)
GTX0-j — Global Transaction Process
LMON — Global Enqueue Service Monitor
LMD — Global Enqueue Service Daemon
LMS — Global Cache Service Process
LCK0 — Instance Enqueue Process
DIAG — Diagnosability Daemon
RMSn — Oracle RAC Management Processes (RMSn)
RSMN — Remote Slave Monitor
DBRM — Database Resource Manager (from 11g R2)
PING — Response Time Agent (from 11g R2)
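
To see which background processes are currently running in an instance, a query along these lines can be used:

 SQL> SELECT name, description
      FROM   v$bgprocess
      WHERE  paddr <> '00'
      ORDER  BY name;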


Q 13.  What are the RAC specific wait events?

The use of a global buffer cache in RAC makes it imperative to monitor inter-instance communication via cluster-specific wait events such as gc cr request and gc buffer busy.

a) gc cr request wait event

The gc cr request wait event records the time it takes to retrieve a data block from a remote cache. High wait times for this event are usually caused by a slow or saturated interconnect, or by inefficient queries. Poorly tuned queries increase the number of data blocks requested by an Oracle session, and the more blocks requested, the more often a block must be read from a remote instance via the interconnect.

b) gc buffer busy acquire and gc buffer busy release wait event

In Oracle 11g you will see the gc buffer busy acquire wait event when the global cache open request originated from the local instance, and gc buffer busy release when the open request originated from a remote instance. These wait events are very similar to the buffer busy wait events in a single-instance database and are often the result of hot blocks or inefficient queries.

c) GC Current Block 2-Way/3-Way

For a current block requested in read mode, a KJUSERPR (protected read) lock is requested. Excessive waits for gc current block are related either to an inefficient execution plan leading to numerous block visits, or to application affinity not being in play.

d) GC CR Block Congested/GC Current Block Congested

If the LMS process does not process a request within 1 ms, it marks the response to that block with a congestion wait event. Root causes: LMS is starved of CPU scheduling, or is short of resources such as memory (paging).
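
These events can be reviewed instance-wide with a query such as the following (the 'gc%' filter picks up the Global Cache wait events):

 SQL> SELECT inst_id, event, total_waits,
             ROUND(time_waited_micro/1e6, 2) AS seconds_waited
      FROM   gv$system_event
      WHERE  event LIKE 'gc%'
      ORDER  BY time_waited_micro DESC;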


Q 14. How can you measure the RAC interconnect bandwidth?

From OS/network tools: MeasureWare, MRTG, etc.
From the database: Statspack/AWR reports, and the average time to receive a data block derived from v$sysstat statistics.
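
For example, the average time to receive a CR block over the interconnect is commonly derived from two statistics ('gc cr block receive time' is in centiseconds, so multiplying by 10 gives milliseconds):

 SQL> SELECT b.inst_id,
             ROUND(10 * a.value / NULLIF(b.value, 0), 2) AS avg_cr_block_receive_ms
      FROM   gv$sysstat a, gv$sysstat b
      WHERE  a.name = 'gc cr block receive time'
      AND    b.name = 'gc cr blocks received'
      AND    a.inst_id = b.inst_id;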


Q 15.  How is SCAN listener different from VIP listener in RAC? How many scan listeners are required for a 10 node RAC? Explain in detail.

The SCAN was introduced in Oracle RAC 11gR2. It uses IP addresses that are not assigned to any fixed interface; Oracle Clusterware is in charge of them and directs requests to the appropriate servers in the cluster. The main purpose of the SCAN is ease of management and connection: for instance, you can add new nodes to the cluster without changing your client tnsnames.ora, because Oracle automatically distributes requests based on the SCAN IPs, which sit in front of the underlying VIPs. SCAN listeners act as the bridge between clients and the underlying VIP-dependent local listeners.

VIPs are the IPs that CRS maintains for each node. A VIP is a virtual IP for a specific server node in the cluster. Should that server node fail, the VIP is transferred to another server node in order to still provide network connectivity for clients using the address of the failed node. In other words, the VIP provides high availability: despite a server node failing, network communication to its address is still answered by another node via the failed node's VIP.

On a 10-node cluster there would be 10 virtual IP addresses with 10 virtual hostnames, which means clients would need to know and use all 10 VIPs in order to make load-balanced, highly available, or TAF connections. SCAN replaces this on the client side by providing a Single Client Access Name to use, as opposed to 10 VIPs. Oracle recommends 3 SCAN IPs/listeners to serve a multi-node RAC system: you can have a 10-node system but you still need only 3 SCAN listeners. The SCAN listeners can run on any nodes of the cluster, and having three is plenty.
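
The SCAN and SCAN listener configuration, and which nodes the SCAN listeners are currently running on, can be checked with srvctl:

 $ srvctl config scan
 $ srvctl config scan_listener
 $ srvctl status scan_listener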


Q 16.  Difference between RAC and Non-RAC database. Difference in terms of SGA/shared pool?


Q 17. What is OCR File?

OCR contains information about all Oracle resources in the cluster.

The purpose of the Oracle Cluster Registry (OCR) is to hold cluster and database configuration information for RAC and Cluster Ready Services (CRS), such as the cluster node list, the cluster-database-instance-to-node mapping, and CRS application resource profiles.

Oracle recommends that you configure:

At least three OCR locations, if OCR is configured on non-mirrored or non-redundant storage. Oracle strongly recommends that you mirror OCR if the underlying storage is not RAID. Mirroring can help prevent OCR from becoming a single point of failure.

At least two OCR locations if OCR is configured on an Oracle ASM disk group. You should configure OCR in two independent disk groups. Typically this is the work area and the recovery area.
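
The integrity and location of the OCR can be checked with the ocrcheck utility from the Grid home; its output includes the OCR version, space usage, the configured OCR locations, and the result of an integrity check:

 $ ocrcheck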


Q 18. What is OLR?

The Oracle Local Registry (OLR) is a registry similar to the OCR, located on each node in a cluster, but it contains information specific to that node.
It contains manageability information about Oracle Clusterware, including dependencies between various services, which Oracle High Availability Services uses. The OLR is stored on local storage on each node in the cluster.
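
The OLR can be inspected in the same way as the OCR, using the -local flag:

 $ ocrcheck -local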


Q 19. How will you take an OCR backup?

There are two methods of copying OCR content for use in recovery. The first method uses the automatically generated physical OCR backup copies, and the second uses manually created logical OCR export files.

Because of the importance of the OCR information, the ocrconfig tool should be used to make regular copies of the automatically generated backup files, as sketched below.
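
A minimal sketch of the relevant ocrconfig commands (the export file name is hypothetical; run as root from the Grid home):

 # ocrconfig -showbackup                       # list the automatically generated backups
 # ocrconfig -manualbackup                     # take an on-demand physical backup
 # ocrconfig -export /backup/ocr_export.dmp    # create a logical export of the OCR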

 

Brijesh Gogia
