Background
One of the more interesting features that slipped into the Delphix virtualization engine a few years ago is
Automatic VDB Restart On Target Server After Reboot, explained in the documentation
HERE.
The documentation for this feature starts with this paragraph...
The Delphix Engine now automatically detects whether a target server has been rebooted, and proactively restarts any VDB on that server that was previously up and running. This is independent of data platform. It is done as if you realized a target server was restarted and issued a start command from the Delphix Engine. This feature is compatible with Jet Stream ordering dependencies and is limited to non-clustered VDBs.
OK, that seems like a nice feature to have. To many people, it may appear almost useless, since database servers are engineered to fail very seldom, if at all.
However, this viewpoint does not take into account the common operating procedure of rebooting Windows servers periodically. Organizations which adhere to this antiquated habit reboot servers on a rolling schedule during low-usage periods (i.e. weekends), sometimes without prior notification. Even with prior notification, system administrators may not restart all services in a timely manner, if at all.
So following either a planned reboot or an unexpected failure, this feature helpfully restarts failed VDBs for all platforms, whether UNIX/Linux or Windows, and for all data platforms, whether SQL Server, Oracle, SAP ASE, SAP HANA, DB2, or PostgreSQL.
But immediately following this paragraph in the documentation is the dreaded fine print...
Note: It does not work for Oracle RAC VDBs, Oracle 12c PDB/CDB or MSSQL cluster VDBs
Translated, this means it does not work for clusters, and that it does not work for Oracle vCDBs or vPDBs. Now, just speculating, but I believe that there is a good chance that the caveat about vCDBs and vPDBs might soon be retracted as Delphix virtualization expands further into the Oracle multi-tenant product feature set.
But I have not seen or heard any indication that VDB auto-restart is planned for Oracle RAC VDBs or SQL Server clustered VDBs. This makes sense, because each vendor (Oracle and Microsoft) touts its clustering solution as a high-availability solution, so why should Delphix have to worry about cluster nodes being restarted? For those who consider an unexpected reboot of their database servers a remote possibility, an unexpected reboot of a database cluster seems an even more remote possibility.
However, in my role within Delphix Field Services, I did indeed have a customer who requested that
Automatic VDB Restart After Reboot be available for Oracle RAC clusters, so I thought about it, and this is what resulted...
Requirements for Automatic VDB Restart After Reboot on one (or more) Oracle RAC nodes
If one or more nodes in an Oracle RAC cluster is rebooted, then the Delphix VDB database instance on the affected nodes will be terminated when the OS is halted. Upon reboot, there is nothing to automatically restart the VDB after the Oracle clusterware on the affected nodes has been restarted.
The tricky part is timing: the VDB database instance should be restarted only after the Oracle clusterware services (CRS) on the affected node have been restarted. Attempting to start an Oracle RAC database instance before CRS is fully operational will hang or fail.
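For reference, whether CRS is fully operational on a node can be verified from the shell before attempting any startup. Here is a minimal sketch, assuming a hypothetical Grid Infrastructure home path (which will differ in your installation) and relying on the output of the standard "crsctl check crs" command...
#!/bin/bash
# Minimal sketch: wait for Oracle Clusterware (CRS) to report itself online
# before attempting to start anything that depends on it.
# GRID_HOME is a placeholder; adjust it for your installation.
GRID_HOME=/u01/app/19.0.0/grid
for _i in $(seq 1 60)
do
    # "crsctl check crs" reports the status of the clusterware daemons;
    # the exact message wording can vary slightly between releases
    if ${GRID_HOME}/bin/crsctl check crs 2>/dev/null | grep -qi "Cluster Ready Services is online"
    then
        echo "`date` [INFO] CRS is online"
        break
    fi
    sleep 10
done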
I knew the solution had to be as simple as (or simpler than) advising the customer to manually restart the VDB instances, so we certainly could not have a complex
Rube Goldberg machine.
As it turns out, Oracle provides for high-availability management of other applications by the Oracle clusterware itself, as documented
HERE. I had used this mechanism back in 2007 to automate the restart of a product called Teleran iSight within an Oracle10g RAC cluster, but that was a long time ago, and I seemed to retain a faint whiff of Rube Goldberg in my memory of that exercise.
Custom application management by Oracle Clusterware
As it turns out, I was incorrect. It was not complicated. Twelve years had passed, and I had completely forgotten that I had solved a similar problem, and I had also completely forgotten that I had created a one-hour presentation for Oracle users groups on the topic, which I found while googling. It is always a little disconcerting yet satisfying to find your own prior words as the best reference answering your own current question.
I reviewed my presentation from so long ago, compared it to the current Oracle documentation, and found that the components and steps to deploy this solution had changed very little between Oracle10g and Oracle18c. In fact, it was even simpler.
Oracle clusterware provides the vital framework for monitoring and managing highly-available applications, because it was built to make Oracle database highly available. To make a long story short, there are just four actions to be performed by the clusterware...
- check whether the application instance is "up" or "down" on a cluster node
- start a "down" application instance on a cluster node
- stop an "up" application instance on a cluster node
- clean up (or forcibly stop) an application instance on a cluster node and leave it in a state ready to be started
The Oracle cluster framework leaves an opening for a custom-built program capable of performing any of these actions on the application instance when called with the appropriate command-line parameter value. So the framework consists entirely of supported Oracle clusterware code, but one small piece -- the action script -- contains the logic to interact with the custom application.
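In other words, the action script is essentially a dispatcher over those four verbs. A bare-bones sketch of its overall shape (with the real logic omitted; the complete script appears later in this post) looks like this...
#!/bin/bash
# Bare-bones shape of a clusterware action script: one branch per verb
case "$1" in
    start)  # bring the application instance up on this node
            ;;
    stop)   # bring the application instance down on this node
            ;;
    check)  # exit 0 if the application is up on this node, non-zero if not
            ;;
    clean)  # forcibly stop and leave the application ready to be restarted
            ;;
    *)      echo "Usage: $0 {start|stop|check|clean}"
            exit 1
            ;;
esac
exit 0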
Once the action script is written and tested, then it can be registered into an Oracle RAC cluster environment as a new resource using the Oracle clusterware
crsctl utility.
Here is the key design concept: I do not want to monitor an Oracle database instance on a RAC cluster node. DBAs may have reasons to start and stop instances of a Delphix RAC VDB using SQL*Plus, Oracle Enterprise Manager, or the Oracle srvctl utility. If my solution attempted to restart a VDB instance every time it went down, then it might become more of a problem than a solution.
Instead, I wanted to monitor the availability of the NFS file-system mount-point in which the Delphix VDB database instance resides. This is something that local control of database instances would not normally impact. The only reason that the NFS mount-point exists is to support the VDB, and the only valid reason for it to be missing is that the Delphix administrator performed a stop VDB, disable VDB, or delete VDB action. Other than that, the only reason that the NFS mount-point might be unmounted is that the server had been rebooted unexpectedly.
Detailed explanation of the solution
So, the "action script" for the Oracle clusterware application needed to monitor whether the NFS mount-point for the VDB was "up" (mounted) or "down" (unmounted).
Here is the Oracle clusterware "action script" for a Delphix RAC VDB...
1 #!/bin/bash
2 #================================================================================
3 # File: vdb_action.sh
4 # Type: bash-shell script
5 # Date: 26-July 2019
6 # Author: Delphix Field Services
7 # Ownership: This script is owned and maintained by the user, not by Delphix
8 #
9 # Licensed under the Apache License, Version 2.0 (the "License");
10 # you may not use this file except in compliance with the License.
11 # You may obtain a copy of the License at
12 #
13 # http://www.apache.org/licenses/LICENSE-2.0
14 #
15 # Unless required by applicable law or agreed to in writing, software
16 # distributed under the License is distributed on an "AS IS" BASIS,
17 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18 # See the License for the specific language governing permissions and
19 # limitations under the License.
20 #
21 # Copyright (c) 2019 by Delphix. All rights reserved.
22 #
23 # Description:
24 #
25 # This is an "action script" for an Oracle Clusterware customized resource
26 # to perform actions initiated by Oracle Clusterware (CRS) for a "Generic
27 # Resource".
28 #
29 # The recommended syntax for creating the customized generic resource is:
30 #
31 # crsctl add resource <resource-name> \
32 # -type generic_application \
33 # -attr "SERVER_POOLS=Generic,\
34 # START_PROGRAM='<full-path-to-this-script> start <db-name>',\
35 # STOP_PROGRAM='<full-path-to-this-script> stop <db-name>',\
36 # CHECK_PROGRAMS='<full-path-to-this-script> check <db-name>',\
37 # CLEAN_PROGRAM='<full-path-to-this-script> clean <db-name>',\
38 # CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
39 # AUTO_START=restore,\
40 # SCRIPT_TIMEOUT=75"
41 #
42 # where...
43 #
44 # <resource-name> CRS resource name (no spaces)
45 # <full-path-to-this-script> full path to this shell script on
46 # all RAC cluster nodes
47 # <db-name> value of DB_UNIQUE_NAME initialization parameter
48 #
49 # Note:
50 #
51 # CRSCTL commands can be executed from any one of the nodes in the RAC cluster
52 # while connected to the shell as the Oracle clusterware owner OS account
53 # (i.e. "grid", "oracle").
54 #
55 # Standard Oracle documentation:
56 #
57 # Entitled "Making Applications Highly Available Using Oracle Clusterware"
58 # for version 12.2 can be found at...
59 #
60 # https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cwadd/making-applications-highly-available-using-oracle-clusterware.html#GUID-13400D83-3FF4-482E-B7A7-41C864A491DC
61 #
62 # Calling syntax:
63 #
64 # vdb_action.sh action-type db-name
65 #
66 # Calling parameters:
67 #
68 # action-type one of the following: start, stop, check, clean
69 # db-name value of DB_UNIQUE_NAME initialization parameter
70 #
71 # Modifications:
72 # TGorman 26jul19 written
73 # TGorman 19aug19 added db-name parameter
74 #================================================================================
75 #
76 #--------------------------------------------------------------------------------
77 # Verify that there are two command-line parameters...
78 #--------------------------------------------------------------------------------
79 if (( $# != 2 ))
80 then
81 echo "`date` [ERROR] Usage: \"$0 action-type db-name\"; aborting..."
82 exit 1
83 fi
84 #
85 #--------------------------------------------------------------------------------
86 # Please customize the following variables for local preferences...
87 #--------------------------------------------------------------------------------
88 _dbName=$2
89 _dlpxEngine="dlpxengine.xyz.com"
90 _baseMntPath="/delphix/provision"
91 _logDir="/tmp"
92 _dlpxUser="mgmt_`hostname -s`"
93 #
94 #--------------------------------------------------------------------------------
95 # Please do not change the following script variables unless prepared to make
96 # changes elsewhere to the script...
97 #--------------------------------------------------------------------------------
98 _hostName="`hostname -s`"
99 _fileLabel="delphix_monitor"
100 _monitorVdbDir=${_baseMntPath}/${_dbName}/datafile
101 _monitorVdbFile=${_monitorVdbDir}/${_fileLabel}_${_dbName}_${_hostName}.txt
102 _logFile=${_logDir}/${_fileLabel}_${1}_${_dbName}_${_hostName}_$$.txt
103 typeset -i _loopSleep=3
104 typeset -i _loopLimit=100
105 #
106 #--------------------------------------------------------------------------------
107 # Clean out previous log files...
108 #--------------------------------------------------------------------------------
109 rm -f ${_logDir}/${_fileLabel}_${1}_${_dbName}_${_hostName}_*.txt
110 #
111 #--------------------------------------------------------------------------------
112 # perform START, STOP/CLEAN, and CHECK operations...
113 #--------------------------------------------------------------------------------
114 case "$1" in
115 #------------------------------------------------------------------------
116 # START operation will first connect to the Delphix virtualization engine
117 # and attempt to START the VDB...
118 #------------------------------------------------------------------------
119 start) ssh ${_dlpxUser}@${_dlpxEngine} << __EOF__ 2>&1 >> ${_logFile}
120 /source select "${_dbName}"; start; commit; exit
121 __EOF__
122 #
123 #--------------------------------------------------------
124 # if the output from the Delphix CLI command includes the
125 # phrase "Error:", then it might or might not be an actual
126 # error...
127 #--------------------------------------------------------
128 if grep "Error:" ${_logFile} > /dev/null 2>&1
129 then
130 #
131 #------------------------------------------------
132 # if the output from the Delphix CLI command also
133 # includes the verbiage "only one job can be active",
134 # then it is not really an error, just a concurrency
135 # problem with multiple nodes sending a START at the
136 # same time...
137 #------------------------------------------------
138 if grep -i "only one job can be active for an object" ${_logFile} > /dev/null 2>&1
139 then
140 typeset -i _loopCnt=0
141 while (( ${_loopCnt} < ${_loopLimit} ))
142 do
143 if [ -d ${_monitorVdbDir} ]
144 then
145 break
146 else
147 sleep ${_loopSleep}
148 fi
149 typeset -i _loopCnt=${_loopCnt}+1
150 done
151 if (( ${_loopCnt} >= ${_loopLimit} ))
152 then
153 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - mount failure" | tee -a ${_logFile}
154 exit 1
155 fi
156 else
157 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - CLI failure" | tee -a ${_logFile}
158 exit 1
159 fi
160 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - duplicate, success" | tee -a ${_logFile}
161 else
162 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - success" | tee -a ${_logFile}
163 fi
164 #
165 #----------------------------------------------------------
166 # If the START operation via the Delphix CLI succeeded, then
167 # put a timestamp into the "monitor file" within the NFS
168 # mountpoint in which the VDB resides...
169 #----------------------------------------------------------
170 if date > ${_monitorVdbFile} 2>> ${_logFile}
171 then
172 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
173 exit 0
174 else
175 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - date failure" | tee -a ${_logFile}
176 exit 1
177 fi
178 ;;
179 #------------------------------------------------------------------------
180 # STOP and CLEAN are the same actions...
181 #
182 # If the "monitor file" within the NFS mountpoint in which the VDB resides
183 # is accessible, then remove it and then contact the Delphix virtualization
184 # engine in order to STOP the VDB...
185 #------------------------------------------------------------------------
186 stop|clean) if rm -f ${_monitorVdbFile} 2>&1 >> ${_logFile}
187 then
188 #
189 #------------------------------------------------
190 # if the "rm -f" command succeeded, then the NFS
191 # mount-point is still mounted and the VDB instance
192 # needs to be stopped...
193 #------------------------------------------------
194 ssh ${_dlpxUser}@${_dlpxEngine} << __EOF__ 2>&1 >> ${_logFile}
195 /source select "${_dbName}"; stop; commit; exit
196 __EOF__
197 if grep "Error:" ${_logFile} > /dev/null 2>&1
198 then
199 if grep -i "only one job can be active for an object" ${_logFile} > /dev/null 2>&1
200 then
201 typeset -i _loopCnt=0
202 while (( ${_loopCnt} < ${_loopLimit} ))
203 do
204 if [ -d ${_monitorVdbDir} ]
205 then
206 break
207 else
208 sleep ${_loopSleep}
209 fi
210 typeset -i _loopCnt=${_loopCnt}+1
211 done
212 if (( ${_loopCnt} >= ${_loopLimit} ))
213 then
214 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - mount-point failure" | tee -a ${_logFile}
215 exit 1
216 fi
217 else
218 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - CLI failure" | tee -a ${_logFile}
219 exit 1
220 fi
221 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - duplicate, success" | tee -a ${_logFile}
222 else
223 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - success" | tee -a ${_logFile}
224 fi
225 else
226 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - rm failure" | tee -a ${_logFile}
227 echo "\"rm -f ${_monitorVdbFile}\" failed" >> ${_logFile}
228 exit 1
229 fi
230 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
231 exit 0
232 ;;
233 #------------------------------------------------------------------------
234 # CHECK operation involves simply checking for the accessibility of the
235 # "monitor" file residing within the NFS mount-point of the VDB...
236 #------------------------------------------------------------------------
237 check) if [ -e ${_monitorVdbFile} ]
238 then
239 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
240 exit 0
241 else
242 echo "`date` [ERROR] \"$0 $1 $2\" failure on \"${_hostName}\"" | tee -a ${_logFile}
243 exit 1
244 fi
245 ;;
246 #------------------------------------------------------------------------
247 # invalid command-line parameter passed to the script...
248 #------------------------------------------------------------------------
249 *) echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - invalid action requested" | tee -a ${_logFile}
250 exit 1
251 ;;
252 esac
HERE is a clean, non-numbered download copy of the shell-script.
At 252 lines, it seems lengthy, but as you can see, well over half of the lines are comments. So here is a quick summary of the parts of the script...
- lines 1-75
- header comments and explanatory description, including disclaimers, call syntax, and Oracle doc references
- this is all worth reading...
- lines 76-84
- validation of command-line parameters
- lines 85-93
- script variables which should be customized to match local directory paths and naming conventions
- lines 94-110
- script variables which should only be changed if you REALLY know what you're doing :)
- lines 111-178
- script logic pertaining to the START operation for the RAC VDB
- lines 179-232
- script logic pertaining to the STOP/CLEAN operations (which are functionally equivalent)
- lines 233-252
- script logic pertaining to the CHECK operation
This just gives you a "table of contents" for the script. It is pretty well commented, so for the fine details, I encourage you to read the comments thoroughly.
The first deployment step is to store this "action script" in the same directory pathname on all nodes in the RAC cluster.
Then, the last deployment step is to register the script within the Oracle clusterware by adding a clusterware resource using the Oracle CRSCTL utility...
crsctl add resource <resource-name> \
-type generic_application \
-attr "SERVER_POOLS=Generic,\
START_PROGRAM='<full-path-to-action-script> start <db-unq-name>',\
STOP_PROGRAM='<full-path-to-action-script> stop <db-unq-name>',\
CHECK_PROGRAMS='<full-path-to-action-script> check <db-unq-name>',\
CLEAN_PROGRAM='<full-path-to-action-script> clean <db-unq-name>',\
CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
AUTO_START=restore,\
SCRIPT_TIMEOUT=75"
where...
- <resource-name> CRS resource name (no spaces)
- <full-path-to-action-script> full path to the action script on all RAC cluster nodes
- <db-unq-name> value of the DB_UNIQUE_NAME initialization parameter
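For example, plugging hypothetical values into the template above (a resource named "vdb_vdborcl", the action script stored at "/home/oracle/scripts/vdb_action.sh" on every node, and a DB_UNIQUE_NAME of "vdborcl", none of which are taken from a real environment) would yield...
crsctl add resource vdb_vdborcl \
    -type generic_application \
    -attr "SERVER_POOLS=Generic,\
    START_PROGRAM='/home/oracle/scripts/vdb_action.sh start vdborcl',\
    STOP_PROGRAM='/home/oracle/scripts/vdb_action.sh stop vdborcl',\
    CHECK_PROGRAMS='/home/oracle/scripts/vdb_action.sh check vdborcl',\
    CLEAN_PROGRAM='/home/oracle/scripts/vdb_action.sh clean vdborcl',\
    CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
    AUTO_START=restore,\
    SCRIPT_TIMEOUT=75"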
Some comments about this CRSCTL ADD RESOURCE command...
- If it hasn't already been made clear, this is the single command which creates the monitoring resource within the Oracle Clusterware ecosystem
- Oracle clusterware commands are available only to the OS account which is the "owner" of Oracle clusterware
- in some installations, there is only one OS account (i.e. "oracle") which is the "owner" of all Oracle software: clusterware, RDBMS, etc
- in other installations, there is a separation of duties between the OS account (i.e. "grid") which is the "owner" of Oracle clusterware versus the OS account(s) (i.e. "oracle", etc) which are the owner(s) of the Oracle RDBMS software installation(s)
- The START_PROGRAM, STOP_PROGRAM, CHECK_PROGRAMS, and CLEAN_PROGRAM attributes identify the calls to be made to the custom-built "action script"
- The SERVER_POOLS and CARDINALITY attributes are short-hand for indicating that all nodes in the RAC cluster use this resource
- The inclusion of the SCRIPT_TIMEOUT attribute prevents mysterious and inconsistent failures of START operations
- while VDB startup generally occurs in less than the default SCRIPT_TIMEOUT of 60 seconds, it is easy to breach that time for a variety of reasons in a variety of clusters and situations
- be sure to set this value far higher than necessary to ensure that everything works
- then possibly lower the timeout guardedly (if at all) once you better understand how long a START normally takes, as shown in the example below
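The timeout can also be adjusted later, without dropping and recreating the resource, using the standard CRSCTL attribute-modification syntax (the resource name below is the same placeholder used earlier)...
crsctl modify resource vdb_vdborcl -attr "SCRIPT_TIMEOUT=120"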
Additional CRSCTL utility commands allow ongoing management of the clusterware resource (a brief usage example follows this list)...
- crsctl delete resource <resource-name> [ -f ]
- optional "-f" switch is "force"
- crsctl status resource <resource-name>
- crsctl start resource <resource-name>
- crsctl stop resource <resource-name>
- crsctl clean resource <resource-name>
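As a quick usage example, the state of the resource (again using the placeholder name "vdb_vdborcl") can be queried from any node while connected as the clusterware owner...
crsctl status resource vdb_vdborcl
crsctl status resource -t
The second command displays a tabular summary of all registered resources, which is handy for verifying the new resource on every node at a glance.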
Dealing with the race condition
If you read through the shell script, then you may have noticed some checking for the phrase "
only one job can be active for an object" in the output returned from the Delphix CLI commands to start or stop the RAC VDB.
This is because the clusterware resource is likely being executed independently on more than one RAC cluster node at the same time. For example, if all of the RAC cluster nodes had been restarted, then as each node comes back up and the Oracle clusterware begins checking the status of its resources, each action script may detect that the RAC VDB is stopped at roughly the same time. As a result, an eight-node RAC cluster would submit eight separate START VDB commands to the Delphix virtualization engine via the Delphix CLI at approximately the same time. In such a scenario, only the first job submitted will actually execute on the Delphix engine. The other seven jobs will all fail with the error message stating that "only one job can be active for an object". So the action script has logic to treat that response as a successful operation, but only if the NFS mount-point eventually becomes visible, which happens about halfway into the START VDB operation (the script polls for the mount-point every 3 seconds, up to 100 times, or roughly 5 minutes).
This "race condition" is one that, with some clever programming, I could probably have avoided. However, I chose to accept the race condition and keep the logic in the action script simpler. The upshot is that Delphix administrators will notice some failed START VDB or STOP VDB operations around the time that the same operation succeeded. This should be a clue that the VDB was stopped or started by Oracle clusterware.
To this day, I still don't know whether working around the race condition in this manner was the right decision. I leave it to you to decide.
A cautionary note, obvious in hindsight...
If you create this clusterware resource, you must coordinate its usage with the Delphix administrator's management of the RAC VDB. Obviously, the clusterware resource should first be stopped (i.e. "crsctl stop resource <resource-name>") before the Delphix administrator performs operations like VDB refresh, VDB rewind, VDB stop, VDB disable, and VDB drop. Likewise, confusion is bound to result should the Oracle clusterware administrator decide to begin issuing CRSCTL START|STOP|CLEAN RESOURCE commands without coordinating with the Delphix administrator.
As there is likely a separation of duties between the role of Delphix administrator and Oracle clusterware administrator, such coordination might easily be overlooked.
Please be cognizant of these possibilities before deciding to adopt this solution.
Using privilege escalation (e.g. "sudo", "pbrun", "suexec", etc) to allow the Delphix OS user for the RAC environment (e.g. "delphix_os") to execute the CRSCTL command makes it possible to create Delphix hooks that automatically coordinate activities initiated by the Delphix administrator, such as the following (a sketch of one such hook appears after this list)...
- Configure clone
- sudo crsctl add resource <resource-name> ...
- Pre-Refresh, Pre-Rewind, Pre-Stop
- sudo crsctl stop resource <resource-name>
- Post-Refresh, Post-Rewind, Post-Stop
- sudo crsctl start resource <resource-name>
- Pre-Start
- sudo crsctl status resource <resource-name>
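For illustration, the body of a Pre-Refresh (or Pre-Rewind or Pre-Stop) hook might look something like the sketch below; the Grid Infrastructure home path and the resource name are placeholders, and the exact "sudo" configuration depends on what InfoSec permits...
#!/bin/bash
# Hypothetical hook body, run as the Delphix OS user (e.g. "delphix_os")
# on the target host: stop the clusterware resource so that CRS does not
# attempt to restart the VDB while Delphix is operating on it.
# GRID_HOME and RESOURCE are placeholders; adjust for your environment.
GRID_HOME=/u01/app/19.0.0/grid
RESOURCE=vdb_vdborcl
sudo ${GRID_HOME}/bin/crsctl stop resource ${RESOURCE} || exit 1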
The customer for whom I wrote this RAC VDB autorestart functionality has not yet obtained approval from their InfoSec team for the "sudo" privileges needed to make these hooks possible.
Is this all really necessary?
My other concern remains: was this really necessary? In other words, do unexpected server reboots happen often enough in a RAC cluster to warrant this automation?
I am of the group for whom the answer is "no". If unexpected server restarts are happening, then you should determine the root cause and fix it, instead of instrumenting other applications to work around it. But then again, if a customer's environment involves procedures which, correctly or incorrectly, reboot servers unexpectedly, as is the case with many Windows farms, then this functionality makes sense.
As a result, it is probably useful to pay more attention to VDB autorestart for Windows failover clusters or Always On availability groups.
But it is worth testing this in a real-life application, and I can assure you that testing is on-going. I will let you know what I learn, when I learn it.
------------------------------
Tim Gorman
Senior Technical Manager
Delphix
------------------------------