The previous post in this 3-part series of posts covered an overview of How Delphix In-Place Masking Works. Continuing with the second post, this covers the internal testing that was performed to improve the masking performance ...
Previous post Next in the series
The following test cases are performed individually and observed the improvement for each of the specific tests. All these tests are performed on Delphix VDB’s with In-Place Masking Option.
- Archive Log Mode On vs Archive Log Mode Off
- Different Redo Log file size comparisons (50MB,500MB,1G,2G and 4G, 32GB)
- Synchronous vs Asynchronous Log Writer
- ORA-01555 Snapshot too old
- SGA and Log Buffer Modifications
- Overall Recommendations
The benchmarking environment contains…
- Delphix Virtualization Engine
- Delphix Masking Engine
- Source and target-servers which are setup in AWS EC2 with Cold HDD (sc1) disks at the backend
Multiple iterations of testing are performed with the below baseline Masking configuration...
- 5 Masking Jobs
- 5 Tables in each ruleset
- 1M rows in each table
- 9 columns per table masked (having Secure lookup, Date shift Algorithms)
- TOTAL 225 columns masked during each iteration
1) Archive Log Mode On vs Archive Log Mode Off - 30-35% performance improvement with databases running in NO ARCHIVELOG mode
Having the VDB in NOARCHIVELOG mode has 30-35% performance improvement over running it in ARCHIVELOG mode. With archive log mode ON the VDB’s copy the redo logs into an archive logs generating an additional IO requests. As those logs are not required until the VDB is securely masked, disabling the archive mode will improve the database performance.
2) Minimize redo log switching - 20% performance improvement after increasing online xredo log files from 50M to 1024M
The default VDB log file size configuration in Delphix VDBs is 50MB with 3 log file groups. Having a VDB with smaller logfile size eg: 50MB, we observed contention on the log writer and checkpoint. This can be observed from the Top Wait events in an Oracle AWR report.
Log file switch (checkpoint incomplete) event indicates that Oracle wants to reuse a redo log file, but the current checkpoint position is still in that logfile and the logfile switch is waiting for the checkpoint operation to complete. A checkpoint operation consists of the DBWR process(es) flushing all changed datafile blocks to disk, so the DBWR is not completing its job in time. The remediation might include increasing DB_WRITER_PROCESSES or more frequent checkpointing so that the only checkpoint does not occur on log switch.
The optimal recommendation is to have a minimum of 1024MB as the logfile size with at least 3 logfile groups.
3) Switch from synchronous to asynchronous LGWR write I/O - 15-25% performance improvement over default synchronous LGWR
COMMIT_WRITE = NOWAIT for Oracle11g and below and COMMIT_WAIT = NOWAIT for Oracle12c and above are used to set the Asynchronous LGWR.
ASYNC LGWR is where the Oracle/LGWR process doesn't need to wait until the redo content from log buffer is written to the log files. With NOWAIT option the commit command is asynchronous. The log writer (LGWR) does not wait and write each transaction on commit. It can return before the relevant redo information is written to the online redo log.
4) ORA-01555 Snapshot too old
The ORA-1555 message indicates "rollback records needed by a reader for consistent read are overwritten by other writers". Typically this happens when a masking job starts and as it running, other users change the same data required for the data consistency. Oracle cannot access the original copy of the data from when the masking job query started, and the changes cannot be undone by Oracle as the rollback data present in the undo tablespace is overwritten. This relates to insufficient rollback segments or undo_retentions parameter values that are not large enough.
The recommendations are to have larger UNDO tablespace with autoextend ON with MAXSIZE very large, UNDO_RETENTION set to a value greater than the masking job runtime, and set RETENTION GUARANTEE on the UNDO tablespace.
Steps to eliminate ORA-01555 errors in single-instance non-RAC Oracle databases…
- Create a new UNDO tablespace with an extremely large auto-extension maximum size and the RETENTION GUARANTEE attribute set. Example syntax is...
CREATE BIGFILE UNDO TABLESPACE DELPHIX_MASKING_UNDOTBS
DATAFILE SIZE 4096M
AUTOEXTEND ON NEXT 512M MAXSIZE 1048576M
- Set the initialization parameter UNDO_RETENTION parameter to indicate an extremely long period of time, at least twice as long as the longest-duration masking job (i.e. 86400 seconds = 24 hours, etc)
- ALTER SYSTEM SET UNDO_RETENTION = 86400 SID=’*’;
- The quickest way to switch the database instance from using the old UNDO tablespace to the newly-created DELPHIX_MASKING_UNDOTBS tablespace is to do the following steps…
- ALTER SYSTEM SET UNDO_TABLESPACE = ‘DELPHIX_MASKING_UNDOTBS’ SCOPE=SPFILE SID=’instance-name’;
- SHUTDOWN IMMEDIATE
- To verify that changes performed in previous steps took effect, execute the following from SQL*Plus in the newly-restarted database instance...
- SHOW PARAMETER UNDO_RETENTION
- SHOW PARAMETER UNDO_TABLESPACE
For multi-instance RAC databases, a separate UNDO tablespace will need to be created for each database instance, and the initialization parameters should be set on all database instances.
So, step 1 above would be repeated N times for N instances, with an appropriate sequence number appended to the tablespace name.
Step 2 can be performed as documented above, either for RAC or non-RAC environments, due to the “SID=’*’” clause.
Step 3a would have to be performed N times for N instances, with the appropriate ORACLE_SID or INSTANCE_NAME value for each iteration. Then, steps 3b and 3c might be best performed using the SRVCTL STOP DATABASE and SRVCTL START DATABASE commands.
5) SGA and Log Buffer Modifications
The masking job select process translates to db file sequential read events which refers to a physical read of a single Oracle block from the disk. It is usually caused by reading an index block or accessing a table via a rowid. The name of the event includes the word “sequential”, but the name derives from the manner of execution sequence rather than the manner of I/O. The read requests themselves aren’t sequential, but the calls are made sequentially. These are random-access, single-block read I/O to the datafiles, usually used in indexed scans. For FULL table scans, where large sequential requests for multiple blocks at a time are made, the events are labeled direct path read, also indicating that database I/O buffering is bypassed in these bulk operations.
Increasing the SGA and log buffer sizes didn’t have any impact to the overall masking performance.
6) 40-50% improvement by having ARCHIVE Log OFF, Large Logfile size (1024MB) and ASYNC LGWR
Combining NOARCHIVELOG mode with larger online redo log file size (1024 MB minimum) and asynchronous LGWR options, we saw an improvement of 40-50%.
This concludes the second in this 3-part series of posts. Please click Next in the series below to continue reading the third post...
Previous in the series Next in the series