Background
One of the more interesting features that slipped into the Delphix virtualization engine a few years ago is
Automatic VDB Restart On Target Server After Reboot, explained in the documentation
HERE.
The documentation for this feature starts with this paragraph...
The Delphix Engine now automatically detects whether a target server has been rebooted, and proactively restarts any VDB on that server that was previously up and running. This is independent of data platform. It is done as if you realized a target server was restarted and issued a start command from the Delphix Engine. This feature is compatible with Jet Stream ordering dependencies and is limited to non-clustered VDBs.
OK, that seems like a nice feature to have. To many people, it may appear almost useless, since database servers are engineered to fail very seldom, if at all.
However, this viewpoint does not take into account the common operating procedure of rebooting Windows servers periodically. Organizations which adhere to this antiquated habit reboot servers on a rolling schedule during low-usage periods (i.e. weekends), sometimes without prior notification. Even with prior notification, system administrators may not restart all services in a timely manner, if at all.
So following either a planned reboot or an unexpected failure, this feature helpfully restarts failed VDBs for all platforms, whether UNIX/Linux or Windows, and for all data platforms, whether SQL Server, Oracle, SAP ASE, SAP HANA, DB2, or PostgreSQL.
But immediately following this paragraph in the documentation is the dreaded fine print...
Note: It does not work for Oracle RAC VDBs, Oracle 12c PDB/CDB or MSSQL cluster VDBs
Translated, this means it does not work for clusters, and that it does not work for Oracle vCDBs or vPDBs. Now, just speculating, but I believe that there is a good chance that the caveat about vCDBs and vPDBs might soon be retracted as Delphix virtualization expands further into the Oracle multi-tenant product feature set.
But I have not seen or heard any indication that VDB auto-restart is planned for Oracle RAC VDBs or SQL Server clustered VDBs. This makes sense, because each vendor (Oracle and Microsoft) touts its clustering solution as a high-availability solution, so why should Delphix have to worry about cluster nodes being restarted? For those who consider an unexpected reboot of their database servers a remote possibility, an unexpected reboot of a database cluster seems an even more remote possibility.
However, in my role within Delphix Field Services, I did indeed have a customer who requested that
Automatic VDB Restart After Reboot be available for Oracle RAC clusters, so I thought about it, and this is what resulted...
Requirements for Automatic VDB Restart After Reboot on one (or more) Oracle RAC nodes
If one or more nodes in an Oracle RAC cluster is rebooted, then the Delphix VDB database instance on the affected nodes will be terminated when the OS is halted. Upon reboot, there is nothing to automatically restart the VDB after the Oracle clusterware on the affected nodes has been restarted.
The tricky part is timing: the VDB database instance should be restarted only after the Oracle clusterware services (CRS) on the affected node have been restarted. Attempting to start an Oracle RAC database instance before CRS is fully operational will hang or fail.
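For reference, whether CRS is fully operational on a node can be verified from the shell before attempting any startup. Here is a minimal sketch, assuming a hypothetical Grid Infrastructure home path (which will differ in your installation) and relying on the output of the standard "crsctl check crs" command...
#!/bin/bash
# Minimal sketch: wait for Oracle Clusterware (CRS) to report itself online
# before attempting to start anything that depends on it.
# GRID_HOME is a placeholder; adjust it for your installation.
GRID_HOME=/u01/app/19.0.0/grid
for _i in $(seq 1 60)
do
    # "crsctl check crs" reports the status of the clusterware daemons;
    # the exact message wording can vary slightly between releases
    if ${GRID_HOME}/bin/crsctl check crs 2>/dev/null | grep -qi "Cluster Ready Services is online"
    then
        echo "`date` [INFO] CRS is online"
        break
    fi
    sleep 10
done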
I knew the solution had to be as simple as (or simpler than) advising the customer to manually restart the VDB instances, so we certainly could not have a complex
Rube Goldberg machine.
As it turns out, Oracle provides for high-availability management of other applications by the Oracle clusterware itself, as documented
HERE. I had used this mechanism back in 2007 to automate the restart of a product called Teleran iSight within an Oracle10g RAC cluster, but that was a long time ago, and I seemed to retain a faint whiff of Rube Goldberg in my memory of that exercise.
Custom application management by Oracle Clusterware
As it turns out, I was incorrect. It was not complicated. Twelve years had passed, and I had completely forgotten that I had solved a similar problem, and I had also completely forgotten that I had created a one-hour presentation for Oracle users groups on the topic, which I found while googling. It is always a little disconcerting yet satisfying to find your own prior words as the best reference answering your own current question.
I reviewed my presentation from so long ago, compared it to the current Oracle documentation, and found that the components and steps to deploy this solution had changed very little between Oracle10g and Oracle18c. In fact, it was even simpler.
Oracle clusterware provides the vital framework for monitoring and managing highly-available applications, because it was built to make Oracle database highly available. To make a long story short, there are just four actions to be performed by the clusterware...
- check whether the application instance is "up" or "down" on a cluster node
- start a "down" application instance on a cluster node
- stop an "up" application instance on a cluster node
- clean up (or forcibly stop) an application instance on a cluster node and leave it in a state ready to be started
The Oracle cluster framework leaves an opening for a custom-built program capable of performing any of these actions on the application instance when called with the appropriate command-line parameter value. So the framework consists entirely of supported Oracle clusterware code, but one small piece -- the action script -- contains the logic to interact with the custom application.
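In other words, the action script is essentially a dispatcher over those four verbs. A bare-bones sketch of its overall shape (with the real logic omitted; the complete script appears later in this post) looks like this...
#!/bin/bash
# Bare-bones shape of a clusterware action script: one branch per verb
case "$1" in
    start)  # bring the application instance up on this node
            ;;
    stop)   # bring the application instance down on this node
            ;;
    check)  # exit 0 if the application is up on this node, non-zero if not
            ;;
    clean)  # forcibly stop and leave the application ready to be restarted
            ;;
    *)      echo "Usage: $0 {start|stop|check|clean}"
            exit 1
            ;;
esac
exit 0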
Once the action script is written and tested, then it can be registered into an Oracle RAC cluster environment as a new resource using the Oracle clusterware
crsctl utility.
Here is the key design concept: I do not want to monitor an Oracle database instance on a RAC cluster node. DBAs may have reasons to start and stop instances of a Delphix RAC VDB using SQL*Plus, Oracle Enterprise Manager, or the Oracle srvctl utility. If my solution attempted to restart a VDB instance every time it went down, then it might become more of a problem than a solution.
Instead, I wanted to monitor the availability of the NFS file-system mount-point in which the Delphix VDB database instance resides. This is something that local control of database instances would not normally impact. The only reason that the NFS mount-point exists is to support the VDB, and the only valid reason for it to be missing is that the Delphix administrator performed a stop VDB, disable VDB, or delete VDB action. Other than that, the only reason that the NFS mount-point might be unmounted is that the server had been rebooted unexpectedly.
Detailed explanation of the solution
So, the "action script" for the Oracle clusterware application needed to monitor whether the NFS mount-point for the VDB was "up" (mounted) or "down" (unmounted).
Here is the Oracle clusterware "action script" for a Delphix RAC VDB...
1 #!/bin/bash
2 #================================================================================
3 # File: vdb_action.sh
4 # Type: bash-shell script
5 # Date: 26-July 2019
6 # Author: Delphix Field Services
7 # Ownership: This script is owned and maintained by the user, not by Delphix
8 #
9 # Licensed under the Apache License, Version 2.0 (the "License");
10 # you may not use this file except in compliance with the License.
11 # You may obtain a copy of the License at
12 #
13 # http://www.apache.org/licenses/LICENSE-2.0
14 #
15 # Unless required by applicable law or agreed to in writing, software
16 # distributed under the License is distributed on an "AS IS" BASIS,
17 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18 # See the License for the specific language governing permissions and
19 # limitations under the License.
20 #
21 # Copyright (c) 2019 by Delphix. All rights reserved.
22 #
23 # Description:
24 #
25 # This is an "action script" for an Oracle Clusterware customized resource
26 # to perform actions initiated by Oracle Clusterware (CRS) for a "Generic
27 # Resource".
28 #
29 # The recommended syntax for creating the customized generic resource is:
30 #
31 # crsctl add resource <resource-name> \
32 # -type generic_application \
33 # -attr "SERVER_POOLS=Generic,\
34 # START_PROGRAM='<full-path-to-this-script> start <db-name>',\
35 # STOP_PROGRAM='<full-path-to-this-script> stop <db-name>',\
36 # CHECK_PROGRAMS='<full-path-to-this-script> check <db-name>',\
37 # CLEAN_PROGRAM='<full-path-to-this-script> clean <db-name>',\
38 # CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
39 # AUTO_START=restore,\
40 # SCRIPT_TIMEOUT=75"
41 #
42 # where...
43 #
44 # <resource-name> CRS resource name (no spaces)
45 # <full-path-to-this-script> full path to this shell script on
46 # all RAC cluster nodes
47 # <db-name> value of DB_UNIQUE_NAME initialization parameter
48 #
49 # Note:
50 #
51 # CRSCTL commands can be executed from any one of the nodes in the RAC cluster
52 # while connected to the shell as the Oracle clusterware owner OS account
53 # (i.e. "grid", "oracle").
54 #
55 # Standard Oracle documentation:
56 #
57 # Entitled "Making Applications Highly Available Using Oracle Clusterware"
58 # for version 12.2 can be found at...
59 #
60 # https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cwadd/making-applications-highly-available-using-oracle-clusterware.html#GUID-13400D83-3FF4-482E-B7A7-41C864A491DC
61 #
62 # Calling syntax:
63 #
64 # vdb_action.sh action-type db-name
65 #
66 # Calling parameters:
67 #
68 # action-type one of the following: start, stop, check, clean
69 # db-name value of DB_UNIQUE_NAME initialization parameter
70 #
71 # Modifications:
72 # TGorman 26jul19 written
73 # TGorman 19aug19 added db-name parameter
74 #================================================================================
75 #
76 #--------------------------------------------------------------------------------
77 # Verify that there are two command-line parameters...
78 #--------------------------------------------------------------------------------
79 if (( $# != 2 ))
80 then
81 echo "`date` [ERROR] Usage: \"$0 action-type db-name\"; aborting..."
82 exit 1
83 fi
84 #
85 #--------------------------------------------------------------------------------
86 # Please customize the following variables for local preferences...
87 #--------------------------------------------------------------------------------
88 _dbName=$2
89 _dlpxEngine="dlpxengine.xyz.com"
90 _baseMntPath="/delphix/provision"
91 _logDir="/tmp"
92 _dlpxUser="mgmt_`hostname -s`"
93 #
94 #--------------------------------------------------------------------------------
95 # Please do not change the following script variables unless prepared to make
96 # changes elsewhere to the script...
97 #--------------------------------------------------------------------------------
98 _hostName="`hostname -s`"
99 _fileLabel="delphix_monitor"
100 _monitorVdbDir=${_baseMntPath}/${_dbName}/datafile
101 _monitorVdbFile=${_monitorVdbDir}/${_fileLabel}_${_dbName}_${_hostName}.txt
102 _logFile=${_logDir}/${_fileLabel}_${1}_${_dbName}_${_hostName}_$$.txt
103 typeset -i _loopSleep=3
104 typeset -i _loopLimit=100
105 #
106 #--------------------------------------------------------------------------------
107 # Clean out previous log files...
108 #--------------------------------------------------------------------------------
109 rm -f ${_logDir}/${_fileLabel}_${1}_${_dbName}_${_hostName}_*.txt
110 #
111 #--------------------------------------------------------------------------------
112 # perform START, STOP/CLEAN, and CHECK operations...
113 #--------------------------------------------------------------------------------
114 case "$1" in
115 #------------------------------------------------------------------------
116 # START operation will first connect to the Delphix virtualization engine
117 # and attempt to START the VDB...
118 #------------------------------------------------------------------------
119 start) ssh ${_dlpxUser}@${_dlpxEngine} << __EOF__ 2>&1 >> ${_logFile}
120 /source select "${_dbName}"; start; commit; exit
121 __EOF__
122 #
123 #--------------------------------------------------------
124 # if the output from the Delphix CLI command includes the
125 # phrase "Error:", then it might or might not be an actual
126 # error...
127 #--------------------------------------------------------
128 if grep "Error:" ${_logFile} > /dev/null 2>&1
129 then
130 #
131 #------------------------------------------------
132 # if the output from the Delphix CLI command also
133 # includes the verbiage "only one job can be active",
134 # then it is not really an error, just a concurrency
135 # problem with multiple nodes sending a START at the
136 # same time...
137 #------------------------------------------------
138 if grep -i "only one job can be active for an object" ${_logFile} > /dev/null 2>&1
139 then
140 typeset -i _loopCnt=0
141 while (( ${_loopCnt} < ${_loopLimit} ))
142 do
143 if [ -d ${_monitorVdbDir} ]
144 then
145 break
146 else
147 sleep ${_loopSleep}
148 fi
149 typeset -i _loopCnt=${_loopCnt}+1
150 done
151 if (( ${_loopCnt} >= ${_loopLimit} ))
152 then
153 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - mount failure" | tee -a ${_logFile}
154 exit 1
155 fi
156 else
157 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - CLI failure" | tee -a ${_logFile}
158 exit 1
159 fi
160 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - duplicate, success" | tee -a ${_logFile}
161 else
162 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - success" | tee -a ${_logFile}
163 fi
164 #
165 #----------------------------------------------------------
166 # If the START operation via the Delphix CLI succeeded, then
167 # put a timestamp into the "monitor file" within the NFS
168 # mountpoint in which the VDB resides...
169 #----------------------------------------------------------
170 if date > ${_monitorVdbFile} 2>> ${_logFile}
171 then
172 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
173 exit 0
174 else
175 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - date failure" | tee -a ${_logFile}
176 exit 1
177 fi
178 ;;
179 #------------------------------------------------------------------------
180 # STOP and CLEAN are the same actions...
181 #
182 # If the "monitor file" within the NFS mountpoint in which the VDB resides
183 # is accessible, then remove it and then contact the Delphix virtualization
184 # engine in order to STOP the VDB...
185 #------------------------------------------------------------------------
186 stop|clean) if rm -f ${_monitorVdbFile} 2>&1 >> ${_logFile}
187 then
188 #
189 #------------------------------------------------
190 # if the "rm -f" command succeeded, then the NFS
191 # mount-point is still mounted and the VDB instance
192 # needs to be stopped...
193 #------------------------------------------------
194 ssh ${_dlpxUser}@${_dlpxEngine} << __EOF__ 2>&1 >> ${_logFile}
195 /source select "${_dbName}"; stop; commit; exit
196 __EOF__
197 if grep "Error:" ${_logFile} > /dev/null 2>&1
198 then
199 if grep -i "only one job can be active for an object" ${_logFile} > /dev/null 2>&1
200 then
201 typeset -i _loopCnt=0
202 while (( ${_loopCnt} < ${_loopLimit} ))
203 do
204 if [ -d ${_monitorVdbDir} ]
205 then
206 break
207 else
208 sleep ${_loopSleep}
209 fi
210 typeset -i _loopCnt=${_loopCnt}+1
211 done
212 if (( ${_loopCnt} >= ${_loopLimit} ))
213 then
214 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - mount-point failure" | tee -a ${_logFile}
215 exit 1
216 fi
217 else
218 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - CLI failure" | tee -a ${_logFile}
219 exit 1
220 fi
221 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - duplicate, success" | tee -a ${_logFile}
222 else
223 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - success" | tee -a ${_logFile}
224 fi
225 else
226 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - rm failure" | tee -a ${_logFile}
227 echo "\"rm -f ${_monitorVdbFile}\" failed" >> ${_logFile}
228 exit 1
229 fi
230 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
231 exit 0
232 ;;
233 #------------------------------------------------------------------------
234 # CHECK operation involves simply checking for the accessibility of the
235 # "monitor" file residing within the NFS mount-point of the VDB...
236 #------------------------------------------------------------------------
237 check) if [ -e ${_monitorVdbFile} ]
238 then
239 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
240 exit 0
241 else
242 echo "`date` [ERROR] \"$0 $1 $2\" failure on \"${_hostName}\"" | tee -a ${_logFile}
243 exit 1
244 fi
245 ;;
246 #------------------------------------------------------------------------
247 # invalid command-line parameter passed to the script...
248 #------------------------------------------------------------------------
249 *) echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - invalid action requested" | tee -a ${_logFile}
250 exit 1
251 ;;
252 esac
HERE is a clean, non-numbered download copy of the shell-script.
At 252 lines, it seems lengthy, but as you can see, well over half of the lines are comments. So here is a quick summary of the parts of the script...
- lines 1-75
- header comments and explanatory description, including disclaimers, call syntax, and Oracle doc references
- this is all worth reading...
- lines 76-84
- validation of command-line parameters
- lines 85-93
- script variables which should be customized to match local directory paths and naming conventions
- lines 94-110
- script variables which should only be changed if you REALLY know what you're doing :)
- lines 111-178
- script logic pertaining to the START operation for the RAC VDB
- lines 179-232
- script logic pertaining to the STOP/CLEAN operations (which are functionally equivalent)
- lines 233-252
- script logic pertaining to the CHECK operation
This just gives you a "table of contents" for the script. It is pretty well commented, so for the fine details, I encourage you to read the comments thoroughly.
The first deployment step is to store this "action script" in the same directory pathname on all nodes in the RAC cluster.
Then, the last deployment step is to register the script within the Oracle clusterware by adding a clusterware resource using the Oracle CRSCTL utility...
crsctl add resource <resource-name> \
-type generic_application \
-attr "SERVER_POOLS=Generic,\
START_PROGRAM='<full-path-to-action-script> start <db-unq-name>',\
STOP_PROGRAM='<full-path-to-action-script> stop <db-unq-name>',\
CHECK_PROGRAMS='<full-path-to-action-script> check <db-unq-name>',\
CLEAN_PROGRAM='<full-path-to-action-script> clean <db-unq-name>',\
CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
AUTO_START=restore,\
SCRIPT_TIMEOUT=75"
where...
- <resource-name> CRS resource name (no spaces)
- <full-path-to-action-script> full path to the action script on all RAC cluster nodes
- <db-unq-name> value of the DB_UNIQUE_NAME initialization parameter
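For example, plugging hypothetical values into the template above (a resource named "vdb_vdborcl", the action script stored at "/home/oracle/scripts/vdb_action.sh" on every node, and a DB_UNIQUE_NAME of "vdborcl", none of which are taken from a real environment) would yield...
crsctl add resource vdb_vdborcl \
    -type generic_application \
    -attr "SERVER_POOLS=Generic,\
    START_PROGRAM='/home/oracle/scripts/vdb_action.sh start vdborcl',\
    STOP_PROGRAM='/home/oracle/scripts/vdb_action.sh stop vdborcl',\
    CHECK_PROGRAMS='/home/oracle/scripts/vdb_action.sh check vdborcl',\
    CLEAN_PROGRAM='/home/oracle/scripts/vdb_action.sh clean vdborcl',\
    CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
    AUTO_START=restore,\
    SCRIPT_TIMEOUT=75"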
Some comments about this CRSCTL ADD RESOURCE command...
- If it hasn't already been made clear, this is the single command which creates the monitoring resource within the Oracle Clusterware ecosystem
- Oracle clusterware commands are available only to the OS account which is the "owner" of Oracle clusterware
- in some installations, there is only one OS account (i.e. "oracle") which is the "owner" of all Oracle software: clusterware, RDBMS, etc
- in other installations, there is a separation of duties between the OS account (i.e. "grid") which is the "owner" of Oracle clusterware versus the OS account(s) (i.e. "oracle", etc) which are the owner(s) of the Oracle RDBMS software installation(s)
- The START_PROGRAM, STOP_PROGRAM, CHECK_PROGRAMS, and CLEAN_PROGRAM attributes identify the calls to be made to the custom-built "action script"
- The SERVER_POOLS and CARDINALITY attributes are short-hand for indicating that all nodes in the RAC cluster use this resource
- The inclusion of the SCRIPT_TIMEOUT attribute prevents mysterious and inconsistent failures of START operations
- while VDB startup generally occurs in less than the default SCRIPT_TIMEOUT of 60 seconds, it is easy to breach that time for a variety of reasons in a variety of clusters and situations
- be sure to set this value far higher than necessary to ensure that everything works
- then possibly lower the timeout guardedly (if at all) once you better understand how long a START normally takes, as shown in the example below
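The timeout can also be adjusted later, without dropping and recreating the resource, using the standard CRSCTL attribute-modification syntax (the resource name below is the same placeholder used earlier)...
crsctl modify resource vdb_vdborcl -attr "SCRIPT_TIMEOUT=120"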
Additional CRSCTL utility commands allow ongoing management of the clusterware resource (a brief usage example follows this list)...
- crsctl delete resource <resource-name> [ -f ]
- optional "-f" switch is "force"
- crsctl status resource <resource-name>
- crsctl start resource <resource-name>
- crsctl stop resource <resource-name>
- crsctl clean resource <resource-name>
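As a quick usage example, the state of the resource (again using the placeholder name "vdb_vdborcl") can be queried from any node while connected as the clusterware owner...
crsctl status resource vdb_vdborcl
crsctl status resource -t
The second command displays a tabular summary of all registered resources, which is handy for verifying the new resource on every node at a glance.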
Dealing with the race condition
If you read through the shell script, then you may have noticed some checking for the phrase "
only one job can be active for an object" in the output returned from the Delphix CLI commands to start or stop the RAC VDB.
This is because the clusterware resource is likely being executed independently on more than one RAC cluster node at the same time. For example, if all of the RAC cluster nodes had been restarted, then as each node comes back up and the Oracle clusterware begins checking the status of its resources, each action script may detect that the RAC VDB is stopped at roughly the same time. As a result, an eight-node RAC cluster would submit eight separate START VDB commands to the Delphix virtualization engine via the Delphix CLI at approximately the same time. In such a scenario, only the first job submitted will actually execute on the Delphix engine. The other seven jobs will all fail with the error message stating that "only one job can be active for an object". So the action script has logic to treat that response as a successful operation, but only if the NFS mount-point eventually becomes visible, which happens about halfway into the START VDB operation (the script polls for the mount-point every 3 seconds, up to 100 times, or roughly 5 minutes).
This "race condition" is one that, with some clever programming, I could probably have avoided. However, I chose to accept the race condition and keep the logic in the action script simpler. The upshot is that Delphix administrators will notice some failed START VDB or STOP VDB operations around the time that the same operation succeeded. This should be a clue that the VDB was stopped or started by Oracle clusterware.
To this day, I still don't know whether working around the race condition in this manner was the right decision. I leave it to you to decide.
A cautionary note, obvious in hindsight...
If you create this clusterware resource, you must coordinate its usage with the Delphix administrator's management of the RAC VDB. Obviously, the clusterware resource should first be stopped (i.e. "crsctl stop resource <resource-name>") before the Delphix administrator performs operations like VDB refresh, VDB rewind, VDB stop, VDB disable, and VDB drop. Likewise, confusion is bound to result should the Oracle clusterware administrator decide to begin issuing CRSCTL START|STOP|CLEAN RESOURCE commands without coordinating with the Delphix administrator.
As there is likely a separation of duties between the role of Delphix administrator and Oracle clusterware administrator, such coordination might easily be overlooked.
Please be cognizant of these possibilities before deciding to adopt this solution.
Using privilege escalation (e.g. "sudo", "pbrun", "suexec", etc) to allow the Delphix OS user for the RAC environment (e.g. "delphix_os") to execute the CRSCTL command makes it possible to create Delphix hooks that automatically coordinate activities initiated by the Delphix administrator, such as the following (a sketch of one such hook appears after this list)...
- Configure clone
- sudo crsctl add resource <resource-name> ...
- Pre-Refresh, Pre-Rewind, Pre-Stop
- sudo crsctl stop resource <resource-name>
- Post-Refresh, Post-Rewind, Post-Stop
- sudo crsctl start resource <resource-name>
- Pre-Start
- sudo crsctl status resource <resource-name>
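For illustration, the body of a Pre-Refresh (or Pre-Rewind or Pre-Stop) hook might look something like the sketch below; the Grid Infrastructure home path and the resource name are placeholders, and the exact "sudo" configuration depends on what InfoSec permits...
#!/bin/bash
# Hypothetical hook body, run as the Delphix OS user (e.g. "delphix_os")
# on the target host: stop the clusterware resource so that CRS does not
# attempt to restart the VDB while Delphix is operating on it.
# GRID_HOME and RESOURCE are placeholders; adjust for your environment.
GRID_HOME=/u01/app/19.0.0/grid
RESOURCE=vdb_vdborcl
sudo ${GRID_HOME}/bin/crsctl stop resource ${RESOURCE} || exit 1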
The customer for whom I wrote this RAC VDB autorestart functionality has not yet obtained approval from their InfoSec team for the "sudo" privileges needed to make these hooks possible.
Is this all really necessary?
My other concern remains: was this really necessary? In other words, do unexpected server reboots happen often enough in a RAC cluster to warrant this automation?
I am of the group for whom the answer is "no". If unexpected server restarts are happening, then you should determine the root cause and fix it, instead of instrumenting other applications to work around it. But then again, if a customer's environment involves procedures which, correctly or incorrectly, reboot servers unexpectedly, as is the case with many Windows farms, then this functionality makes sense.
As a result, it is probably useful to pay more attention to VDB autorestart for Windows failover clusters or Always On availability groups.
But it is worth testing this in a real-life application, and I can assure you that testing is on-going. I will let you know what I learn, when I learn it.
------------------------------
Tim Gorman
Senior Technical Manager
Delphix
------------------------------