Delphix Products

Auto-Restart for Oracle RAC VDBs

    Posted 10-13-2019 09:41:00 AM


    One of the more interesting features that slipped into the Delphix virtualization engine a few years ago is Automatic VDB Restart On Target Server After Reboot, explained in the documentation HERE.

    The documentation for this feature starts with this paragraph...
    The Delphix Engine now automatically detects whether a target server has been rebooted, and proactively restarts any VDB on that server that was previously up and running. This is independent of data platform. It is done as if you realized a target server was restarted and issued a start command from the Delphix Engine. This feature is compatible with Jet Stream ordering dependencies and is limited to non-clustered VDBs.
    OK, that seems like a nice feature to have.  For many people, it may seem like an almost useless feature, since database servers are engineered to fail very seldom, if at all.

    However, what this viewpoint does not take into account is the common operating procedure of rebooting Windows servers periodically.  Organizations which continue to adhere to this antiquated habit will reboot servers on a rolling schedule during low-usage periods (e.g. weekends), sometimes without prior notification.  Even with prior notification, system administrators may not always restart all services in a timely manner, if at all.

    So following either a planned reboot or an unexpected failure, this feature helpfully restarts failed VDBs on all operating systems, whether UNIX/Linux or Windows, and for all data platforms, whether SQL Server, Oracle, SAP ASE, SAP HANA, DB2, or PostgreSQL.

    But immediately following this paragraph in the documentation is the dreaded fine print...
    Note: It does not work for Oracle RAC VDBs, Oracle 12c PDB/CDB or MSSQL cluster VDBs
    Translated, this means it does not work for clusters, and that it does not work for Oracle vCDBs or vPDBs.  Now, just speculating, but I believe that there is a good chance that the caveat about vCDBs and vPDBs might soon be retracted as Delphix virtualization expands further into the Oracle multi-tenant product feature set.

    But I have not seen or heard indication that a VDB auto-restart is planned for Oracle RAC VDBs or SQL Server clustered VDBs.  This makes sense, because each vendor (Oracle and Microsoft) touts their clustering solutions as high-availability solutions, so why should Delphix have to worry about cluster nodes being restarted?  For those who found the probability of unexpected reboot of their database servers a remote possibility, then the probability of unexpected reboot of database clusters seems an even more remote possibility.

    However, in my role within Delphix Field Services, I did indeed have a customer who requested that Automatic VDB Restart After Reboot be available for Oracle RAC clusters, so I thought about it, and this is what resulted...

    Requirements for Automatic VDB Restart After Reboot on one (or more) Oracle RAC nodes

    If one or more nodes in an Oracle RAC cluster is rebooted, then the Delphix VDB database instance on the affected nodes will be terminated when the OS is halted.  Upon reboot, there is nothing to automatically restart the VDB after the Oracle clusterware on the affected nodes has been restarted.

    The central issue is timing: the restart of a VDB database instance must be attempted only after the Oracle clusterware services (CRS) on the node have been restarted.  Attempting to start an Oracle RAC database instance before CRS is fully operational will hang or fail.

    I knew the solution had to be as simple as (or simpler than) advising the customer to manually restart the VDB instances, so it certainly could not be a complex Rube Goldberg machine.

    As it turns out, Oracle provides for high-availability management of other applications by the Oracle clusterware itself, as documented HERE.  I had used this mechanism back in 2007 to automate the restart of a product called Teleran iSight within an Oracle10g RAC cluster, but that was a long time ago, and I seemed to retain a faint whiff of Rube Goldberg in my memory of that exercise.

    Custom application management by Oracle Clusterware

    As it turns out, I was incorrect.  It was not complicated.  As twelve years had passed, I had completely forgotten that I had solved a similar problem, and I had completely forgotten that I had also created a one-hour presentation for Oracle users groups on the topic, which I found as I was googling.  It is always a little disconcerting yet satisfying to find your own prior words as the best reference answering your own current question.

    I reviewed my presentation from so long ago, compared it to the current Oracle documentation, and found that the components and steps to deploy this solution had changed very little between Oracle10g and Oracle18c.  In fact, it was even simpler.

    Oracle clusterware provides the vital framework for monitoring and managing highly-available applications, because it was built to make Oracle database highly available.  To make a long story short, there are just four actions to be performed by the clusterware...

    1. check whether the application instance is "up" or "down" on a cluster node
    2. start a "down" application instance on a cluster node
    3. stop an "up" application instance on a cluster node
    4. clean or stop an application instance on a cluster and leave it in a state ready to be started

    The Oracle cluster framework leaves an opening for a custom-built program capable of performing any of these actions on the application instance when called with the appropriate command-line parameter value.  So the framework consists entirely of supported Oracle clusterware code, but one small piece -- the action script -- contains the logic to interact with the custom application.
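    To make the shape of such an action script concrete, here is a minimal sketch of the four entry points.  The marker file under /tmp and the function name are illustrative assumptions only; the full action script shown later in this post replaces them with Delphix-specific logic against the VDB's NFS mount-point.

```shell
#!/bin/bash
# Minimal sketch of the four Clusterware actions, wrapped in a function.
# The marker file and function name are illustrative assumptions only.
_markerFile=/tmp/app_is_up
app_action() {
        case "$1" in
                start)          touch ${_markerFile}
                                echo "started" ;;
                stop|clean)     rm -f ${_markerFile}
                                echo "stopped" ;;
                check)          # exit status 0 means "up", non-zero means "down"
                                [ -e ${_markerFile} ] ;;
                *)              echo "usage: app_action {start|stop|check|clean}" >&2
                                return 1 ;;
        esac
}
```

    Oracle clusterware decides when to invoke each action; the script's only job is to perform the requested action and report honestly via its exit status.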

    Once the action script is written and tested, then it can be registered into an Oracle RAC cluster environment as a new resource using the Oracle clusterware crsctl utility.

    Here is the key design concept:  I do not want to monitor an Oracle database instance on a RAC cluster node.  DBAs may have reasons to start and stop instances of a Delphix RAC VDB using either SQL*Plus, Oracle Enterprise Manager, or the Oracle srvctl utility.  If my solution attempted to restart a VDB instance every time it went down, then my solution may turn into more of a problem than a solution.

    Instead, I wanted to monitor the availability of the NFS file-system mount-point in which the Delphix VDB database instance resides.  This was something that local control of database instances would not normally impact.  The only reason that the NFS mount-point exists is to support the VDB.  The only valid reason that the NFS mount-point might be missing is if the Delphix administrator performed a stop VDB, disable VDB, or delete VDB action.  Other than that, the only reason that the NFS mount-point might be unmounted is if the server had been rebooted unexpectedly.

    Detailed explanation of the solution

    So, the "action script" for the Oracle clusterware application needed to monitor whether the NFS mount-point for the VDB was "up" (mounted) or "down" (unmounted).

    Here is the Oracle clusterware "action script" for a Delphix RAC VDB...

      1 #!/bin/bash
      2 #================================================================================
      3 # File:
      4 # Type:         bash-shell script
      5 # Date:         26-July 2019
      6 # Author:       Delphix Field Services
      7 # Ownership:    This script is owned and maintained by the user, not by Delphix
      8 #
      9 # Licensed under the Apache License, Version 2.0 (the "License");
     10 # you may not use this file except in compliance with the License.
     11 # You may obtain a copy of the License at
     12 #
     13 #
     14 #
     15 # Unless required by applicable law or agreed to in writing, software
     16 # distributed under the License is distributed on an "AS IS" BASIS,
     17 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     18 # See the License for the specific language governing permissions and
     19 # limitations under the License.
     20 #
     21 # Copyright (c) 2019 by Delphix. All rights reserved.
     22 #
     23 # Description:
     24 #
     25 #       This is an "action script" for an Oracle Clusterware customized resource
     26 #       to perform actions initiated by Oracle Clusterware (CRS) for a "Generic
     27 #       Resource".
     28 #
     29 #       The recommended syntax for creating the customized generic resource is:
     30 #
     31 #               crsctl add resource <resource-name> \
     32 #                       -type generic_application \
     33 #                       -attr "SERVER_POOLS=Generic,\
     34 #                               START_PROGRAM='<full-path-to-this-script> start <db-name>',\
     35 #                               STOP_PROGRAM='<full-path-to-this-script> stop <db-name>',\
     36 #                               CHECK_PROGRAMS='<full-path-to-this-script> check <db-name>',\
     37 #                               CLEAN_PROGRAM='<full-path-to-this-script> clean <db-name>',\
     38 #                               CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
     39 #                               AUTO_START=restore,\
     40 #                               SCRIPT_TIMEOUT=75"
     41 #
     42 #       where...
     43 #
     44 #               <resource-name>                 CRS resource name (no spaces)
     45 #               <full-path-to-this-script>      full path to this shell script on
     46 #                                               all RAC cluster nodes
     47 #               <db-name>                       value of DB_UNIQUE_NAME initialization parameter
     48 #
     49 # Note:
     50 #
     51 #       CRSCTL commands can be executed from any one of the nodes in the RAC cluster
     52 #       while connected to the shell as the Oracle clusterware owner OS account
     53 #       (i.e. "grid", "oracle").
     54 #
     55 # Standard Oracle documentation:
     56 #
     57 #       Entitled "Making Applications Highly Available Using Oracle Clusterware"
     58 #       for version 12.2 can be found at...
     59 #
     60 #         l#GUID-13400D83-3FF4-482E-B7A7-41C864A491DC
     61 #
     62 # Calling syntax:
     63 #
     64 # action-type db-name
     65 #
     66 # Calling parameters:
     67 #
     68 #       action-type     one of the following:  start, stop, check, clean
     69 #       db-name         value of DB_UNIQUE_NAME initialization parameter
     70 #
     71 # Modifications:
     72 #       TGorman 26jul19 written
     73 #       TGorman 19aug19 added db-name parameter
     74 #================================================================================
     75 #
     76 #--------------------------------------------------------------------------------
     77 # Verify that there are two command-line parameters...
     78 #--------------------------------------------------------------------------------
     79 if (( $# != 2 ))
     80 then
     81         echo "`date` [ERROR] Usage: \"$0 action-type db-name\"; aborting..."
     82         exit 1
     83 fi
     84 #
     85 #--------------------------------------------------------------------------------
     86 # Please customize the following variables for local preferences...
     87 #--------------------------------------------------------------------------------
     88 _dbName=$2
     89 _dlpxEngine=""
     90 _baseMntPath="/delphix/provision"
     91 _logDir="/tmp"
     92 _dlpxUser="mgmt_`hostname -s`"
     93 #
     94 #--------------------------------------------------------------------------------
     95 # Please do not change the following script variables unless prepared to make
     96 # changes elsewhere to the script...
     97 #--------------------------------------------------------------------------------
     98 _hostName="`hostname -s`"
     99 _fileLabel="delphix_monitor"
    100 _monitorVdbDir=${_baseMntPath}/${_dbName}/datafile
    101 _monitorVdbFile=${_monitorVdbDir}/${_fileLabel}_${_dbName}_${_hostName}.txt
    102 _logFile=${_logDir}/${_fileLabel}_${1}_${_dbName}_${_hostName}_$$.txt
    103 typeset -i _loopSleep=3
    104 typeset -i _loopLimit=100
    105 #
    106 #--------------------------------------------------------------------------------
    107 # Clean out previous log files...
    108 #--------------------------------------------------------------------------------
    109 rm -f ${_logDir}/${_fileLabel}_${1}_${_dbName}_${_hostName}_*.txt
    110 #
    111 #--------------------------------------------------------------------------------
    112 # perform START, STOP/CLEAN, and CHECK operations...
    113 #--------------------------------------------------------------------------------
    114 case "$1" in
    115         #------------------------------------------------------------------------
    116         # START operation will first connect to the Delphix virtualization engine
    117         # and attempt to START the VDB...
    118         #------------------------------------------------------------------------
    119         start)          ssh ${_dlpxUser}@${_dlpxEngine} << __EOF__ >> ${_logFile} 2>&1
    120 /source select "${_dbName}"; start; commit; exit
    121 __EOF__
    122                         #
    123                         #--------------------------------------------------------
    124                         # if the output from the Delphix CLI command includes the
    125                         # phrase "Error:", then it is only possibly an actual
    126                         # error...
    127                         #--------------------------------------------------------
    128                         if grep "Error:" ${_logFile} > /dev/null 2>&1
    129                         then
    130                                 #
    131                                 #------------------------------------------------
    132                                 # if the output from the Delphix CLI command also
    133                                 # includes the verbiage "only one job can be active",
    134                                 # then it is not really an error, just a concurrency
    135                                 # problem with multiple nodes sending a START at the
    136                                 # same time...
    137                                 #------------------------------------------------
    138                                 if grep -i "only one job can be active for an object" ${_logFile} > /dev/null 2>&1
    139                                 then
    140                                         typeset -i _loopCnt=0
    141                                         while (( ${_loopCnt} < ${_loopLimit} ))
    142                                         do
    143                                                 if [ -d ${_monitorVdbDir} ]
    144                                                 then
    145                                                         break
    146                                                 else
    147                                                         sleep ${_loopSleep}
    148                                                 fi
    149                                                 typeset -i _loopCnt=${_loopCnt}+1
    150                                         done
    151                                         if (( ${_loopCnt} >= ${_loopLimit} ))
    152                                         then
    153                                                 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - mount failure" | tee -a ${_logFile}
    154                                                 exit 1
    155                                         fi
    156                                 else
    157                                         echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - CLI failure" | tee -a ${_logFile}
    158                                         exit 1
    159                                 fi
    160                                 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - duplicate, success" | tee -a ${_logFile}
    161                         else
    162                                 echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - success" | tee -a ${_logFile}
    163                         fi
    164                         #
    165                         #----------------------------------------------------------
    166                         # If the START operation via the Delphix CLI succeeded, then
    167                         # put a timestamp into the "monitor file" within the NFS
    168                         # mountpoint in which the VDB resides...
    169                         #----------------------------------------------------------
    170                         if date > ${_monitorVdbFile} 2>> ${_logFile}
    171                         then
    172                                 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
    173                                 exit 0
    174                         else
    175                                 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - date failure" | tee -a ${_logFile}
    176                                 exit 1
    177                         fi
    178                         ;;
    179         #------------------------------------------------------------------------
    180         # STOP and CLEAN are the same actions...
    181         #
    182         # If the "monitor file" within the NFS mountpoint in which the VDB resides
    183         # is accessible, then remove it and then contact the Delphix virtualization
    184         # engine in order to STOP the VDB...
    185         #------------------------------------------------------------------------
    186         stop|clean)     if rm -f ${_monitorVdbFile} >> ${_logFile} 2>&1
    187                         then
    188                                 #
    189                                 #------------------------------------------------
    190                                 # if the "rm -f" command succeeded, then the NFS
    191                                 # mount-point is still mounted and the VDB instance
    192                                 # needs to be stopped...
    193                                 #------------------------------------------------
    194                                 ssh ${_dlpxUser}@${_dlpxEngine} << __EOF__ >> ${_logFile} 2>&1
    195 /source select "${_dbName}"; stop; commit; exit
    196 __EOF__
    197                                 if grep "Error:" ${_logFile} > /dev/null 2>&1
    198                                 then
    199                                         if grep -i "only one job can be active for an object" ${_logFile} > /dev/null 2>&1
    200                                         then
    201                                                 typeset -i _loopCnt=0
    202                                                 while (( ${_loopCnt} < ${_loopLimit} ))
    203                                                 do
    204                                                         if [ -d ${_monitorVdbDir} ]
    205                                                         then
    206                                                                 break
    207                                                         else
    208                                                                 sleep ${_loopSleep}
    209                                                         fi
    210                                                         typeset -i _loopCnt=${_loopCnt}+1
    211                                                 done
    212                                                 if (( ${_loopCnt} >= ${_loopLimit} ))
    213                                                 then
    214                                                         echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - mount-point failure" | tee -a ${_logFile}
    215                                                         exit 1
    216                                                 fi
    217                                         else
    218                                                 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - CLI failure" | tee -a ${_logFile}
    219                                                 exit 1
    220                                         fi
    221                                         echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - duplicate, success" | tee -a ${_logFile}
    222                                 else
    223                                         echo "`date` [INFO] \"$0 $1 $2\" on \"${_hostName}\" - success" | tee -a ${_logFile}
    224                                 fi
    225                         else
    226                                 echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - rm failure" | tee -a ${_logFile}
    227                                 echo "\"rm -f ${_monitorVdbFile}\" failed" >> ${_logFile}
    228                                 exit 1
    229                         fi
    230                         echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
    231                         exit 0
    232                         ;;
    233         #------------------------------------------------------------------------
    234         # CHECK operation involves simply checking for the accessibility of the
    235         # "monitor" file residing within the NFS mount-point of the VDB...
    236         #------------------------------------------------------------------------
    237         check)          if [ -e ${_monitorVdbFile} ]
    238                         then
    239                                 echo "`date` [INFO] \"$0 $1 $2\" success on \"${_hostName}\"" | tee -a ${_logFile}
    240                                 exit 0
    241                         else
    242                                 echo "`date` [ERROR] \"$0 $1 $2\" failure on \"${_hostName}\"" | tee -a ${_logFile}
    243                                 exit 1
    244                         fi
    245                         ;;
    246         #------------------------------------------------------------------------
    247         # invalid command-line parameter passed to the script...
    248         #------------------------------------------------------------------------
    249         *)              echo "`date` [ERROR] \"$0 $1 $2\" on \"${_hostName}\" - invalid action requested" | tee -a ${_logFile}
    250                         exit 1
    251                         ;;
    252 esac
    HERE is a clean non-numbered download copy of the shell-script.

    At 252 lines, it seems lengthy, but as you can see, well over half of the lines are comments.  So here is a quick summary of the parts of the script...

    • lines 1-75
      • header comments and explanatory description, including disclaimers, call syntax, and Oracle doc references
      • this is all worth reading...
    • lines 76-84
      • validation of command-line parameters
    • lines 85-93
      • script variables which should be reviewed and changed to match the local server directory paths
    • lines 94-110
      • script variables which should only be changed if you REALLY know what you're doing  :)
    • lines 111-178
      • script logic pertaining to the START operation for the RAC VDB
    • lines 179-232
      • script logic pertaining to the STOP/CLEAN operations (which are functionally equivalent)
    • lines 233-252
      • script logic pertaining to the CHECK operation

    This just gives you a "table of contents" for the script.  It is pretty well commented, so for the fine details, I encourage you to read the comments thoroughly.

    The first deployment step is to store this "action script" in the same directory pathname on all nodes in the RAC cluster.

    Then, the last deployment step is to register the script within the Oracle clusterware by adding a clusterware resource using the Oracle CRSCTL utility...

    crsctl add resource <resource-name> \
            -type generic_application \
            -attr "SERVER_POOLS=Generic,\
                    START_PROGRAM='<full-path-to-action-script> start <db-unq-name>',\
                    STOP_PROGRAM='<full-path-to-action-script> stop <db-unq-name>',\
                    CHECK_PROGRAMS='<full-path-to-action-script> check <db-unq-name>',\
                    CLEAN_PROGRAM='<full-path-to-action-script> clean <db-unq-name>',\
                    CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
                    AUTO_START=restore,\
                    SCRIPT_TIMEOUT=75"

    • <resource-name>                 CRS resource name (no spaces)
    • <full-path-to-action-script>    full path to the action script on all RAC cluster nodes
    • <db-unq-name>                   value of DB_UNIQUE_NAME initialization parameter

    Some comments about this CRSCTL ADD RESOURCE command...

    • If it hasn't already been made clear, this is the single command which creates the monitoring resource within the Oracle Clusterware ecosystem
    • Oracle clusterware commands are available only to the OS account which is the "owner" of Oracle clusterware
      • in some installations, there is only one OS account (i.e. "oracle") which is the "owner" of all Oracle software: clusterware, RDBMS, etc
      • in other installations, there is a separation of duties between the OS account (i.e. "grid") which is the "owner" of Oracle clusterware versus the OS account(s) (i.e. "oracle", etc) which are the owner(s) of the Oracle RDBMS software installation(s)
    • The START_PROGRAM, STOP_PROGRAM, CHECK_PROGRAMS, and CLEAN_PROGRAM attributes identify the calls to be made to the custom-built "action script"
    • The SERVER_POOLS and CARDINALITY attributes are short-hand for indicating that all nodes in the RAC cluster use this resource
    • The inclusion of the SCRIPT_TIMEOUT attribute resolves otherwise mysterious and inconsistent failures of START operations
      • while VDB startup generally occurs in less than the default SCRIPT_TIMEOUT of 60 seconds, it is easy to breach that time for a variety of reasons in a variety of clusters and situations
      • be sure to set this value far higher than necessary to ensure that everything works
        • then possibly lower the timeout guardedly (if at all) after better understanding how long it normally takes to START
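    Plugging in concrete values makes the shape of the command easier to see.  The resource name, script path, and database name below are purely hypothetical assumptions for illustration, and SCRIPT_TIMEOUT is set to 120 seconds per the advice above to start generously...

```shell
# Hypothetical values: "delphix_vdborcl", the script path, and the
# DB_UNIQUE_NAME "vdborcl" are illustrative assumptions only.
crsctl add resource delphix_vdborcl \
    -type generic_application \
    -attr "SERVER_POOLS=Generic,\
START_PROGRAM='/home/oracle/scripts/delphix_vdb_action.sh start vdborcl',\
STOP_PROGRAM='/home/oracle/scripts/delphix_vdb_action.sh stop vdborcl',\
CHECK_PROGRAMS='/home/oracle/scripts/delphix_vdb_action.sh check vdborcl',\
CLEAN_PROGRAM='/home/oracle/scripts/delphix_vdb_action.sh clean vdborcl',\
CARDINALITY=%CRS_SERVER_POOL_SIZE%,\
AUTO_START=restore,\
SCRIPT_TIMEOUT=120"
```

    Note that the continuation lines inside the quoted -attr string deliberately carry no leading whitespace, so that no stray spaces end up embedded in the attribute values.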

    Additional CRSCTL utility commands allow management of the clusterware resource...

    • crsctl delete resource <resource-name> [ -f ]
      • optional "-f" switch is "force"
    • crsctl status resource <resource-name>
    • crsctl start resource <resource-name>
    • crsctl stop resource <resource-name>
    • crsctl clean resource <resource-name>

    Dealing with the race condition

    If you read through the shell script, then you may have noticed some checking for the phrase "only one job can be active for an object" in the output returned from the Delphix CLI commands to start or stop the RAC VDB.

    This is because the clusterware resource is likely to be executing independently on more than one RAC cluster node at the same time.  For example, if all of the RAC cluster nodes had been restarted, then as each of them restarts and the Oracle clusterware begins checking the status of resources, one or more action scripts may detect that the RAC VDB is stopped, all at the same time.  As a result, an eight-node RAC cluster would submit eight separate START VDB commands to the Delphix virtualization engine via the Delphix CLI at approximately the same time.  In such a scenario, only the first job submitted will actually execute on the Delphix engine.  The other seven jobs will all fail with the error message stating that "only one job can be active for an object".  So the action script has logic to treat that response as a successful operation, but only if the NFS mount-point eventually becomes visible, which happens roughly halfway through the START VDB operation.
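    Stripped of the logging, that wait-and-see logic boils down to a polling loop like the following sketch.  The one-second interval and small loop limit here are scaled-down assumptions (the real script sleeps 3 seconds for up to 100 iterations) so the example finishes quickly...

```shell
#!/bin/bash
# Sketch of the action script's wait-for-mount polling loop, isolated
# for illustration.  Interval and limit are scaled down from the real
# script's 3-second sleep and 100-iteration limit.
wait_for_dir() {
        typeset _dir=$1
        typeset -i _loopLimit=${2:-5}
        typeset -i _loopCnt=0
        while (( ${_loopCnt} < ${_loopLimit} ))
        do
                if [ -d "${_dir}" ]
                then
                        return 0        # mount-point (directory) is visible
                fi
                sleep 1
                typeset -i _loopCnt=${_loopCnt}+1
        done
        return 1                        # gave up waiting
}
```

    If the directory never appears within the limit, the caller treats the "only one job can be active" response as a genuine failure rather than a benign duplicate.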

    This "race condition" is one that, with some clever programming, I could probably have avoided.  However, I chose to accept the race condition and keep the logic in the action script simpler.  The upshot is that Delphix administrators will notice some failed START VDB or STOP VDB operations around the time that the same operation succeeded.  This should be a clue that the VDB was stopped or started by Oracle clusterware.

    To this day, I still don't know if it was the right decision to work around the race condition in this manner.  I leave it to you to decide.

    A cautionary note, obvious in hindsight...

    If you create this clusterware resource, you must obviously coordinate its usage with the Delphix administrator's management of the RAC VDB.  The clusterware resource should first be stopped (i.e. "crsctl stop resource <resource-name>") before the Delphix administrator performs operations like VDB refresh, VDB rewind, VDB stop, VDB disable, and VDB drop.  Likewise, confusion is bound to result should the Oracle clusterware administrator decide to begin issuing CRSCTL START|STOP|CLEAN RESOURCE commands without coordinating with the Delphix administrator.

    As there is likely a separation of duties between the role of Delphix administrator and Oracle clusterware administrator, such coordination might easily be overlooked.

    Please be cognizant of these possibilities before deciding to adopt this solution.

    Using privilege escalation (a.k.a. "sudo", "pbrun", "suexec", etc) to allow the Delphix OS user for the RAC environment (a.k.a. "delphix_os", etc) to execute the CRSCTL command can allow the creation of Delphix hooks to automatically coordinate activities initiated by the Delphix administrator, such as...

    • Configure clone
      • sudo crsctl add resource <resource-name> ...
    • Pre-Refresh, Pre-Rewind, Pre-Stop
      • sudo crsctl stop resource <resource-name>
    • Post-Refresh, Post-Rewind, Post-Stop
      • sudo crsctl start resource <resource-name>
    • Pre-Start
      • sudo crsctl status resource <resource-name>
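    One way to grant such access is a narrowly-scoped sudoers entry.  The OS user names ("delphix_os", "grid") and the Grid Infrastructure home path below are assumptions that vary by installation; this is a sketch, not a prescribed policy...

```shell
# Hypothetical /etc/sudoers.d/delphix_crsctl entry; user names and the
# Grid home path are assumptions.  Scoping the rule to "resource"
# subcommands is tighter than a blanket wildcard on crsctl.
delphix_os ALL=(grid) NOPASSWD: /u01/app/18.0.0/grid/bin/crsctl * resource *
```

    Naturally, any such rule should be reviewed with the InfoSec team before deployment.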

    The customer for whom I wrote this RAC VDB autorestart functionality has not yet obtained permissions from their InfoSec team to grant the "sudo" privileges to make these hooks possible.

    Is this all really necessary?

    My other concern remains:  was this really necessary?  In other words, do unexpected server reboots happen often enough in a RAC cluster to warrant this automation?

    I am of the group for whom the answer is "no".  If unexpected server restarts are happening, then you determine the root cause and fix it, instead of instrumenting other applications to work around it.  But then again, if a customer's environment involves procedures which correctly or incorrectly perform server reboots unexpectedly, such as the case with many Windows farms, then this functionality would make sense.

    As a result, it is probably useful to pay more attention to VDB autorestart for Windows failover clusters or always-on availability groups.

    But it is worth testing this in a real-life application, and I can assure you that testing is on-going.  I will let you know what I learn, when I learn it.

    Tim Gorman
    Senior Technical Manager

