Delphix Products

 View Only

Part 1 of 3: Delphix Masking Best Practices for Performance on Oracle VDBs

By Krishna Kapa posted 07-26-2019 03:56:51 PM


This is the first post in a 3-part series, with an overview of
How Delphix In-Place Masking Works.  The next post in the series describes Test Cases Performed and summarized Results, and the last post details how to deploy the recommendations...
                                                                                                                                                                                           Next in the series


The intended audience for this blog are Delphix Administrators, Oracle DBAs and Application Architect/Users who use the Delphix Masking Interface for in-place masking on Oracle VDBs 


The blog discusses about the Oracle Database and Delphix Settings that improve the overall Delphix Masking performance. The testing and improvements are measured against rows processed per minute and Job Completion times. 


The improvements/results are observed purely on an environment which was used during the testing.  These results may vary based on many factors like customer’s environment, algorithms used, number of columns masked, data sample set, database/OS versions etc.  

Please don’t consider these as the final improvement numbers. Using the recommendations in the blog may improve performance and those may vary in each environment. 

The recommendations are purely to improve the Delphix in-place masking performance during ETL/batch process with little/no significance on reliability, availability on the masked VDB. The expectation is that the masked VDB can be rewound/refreshed to a previous/latest point from dSource even if there is a catastrophic event to the VDB. 

These recommendations are only for the masked VDB and inline with most ETL jobs and doesn’t apply for a regular VDB’s.  

How Delphix In-Place Masking works

Delphix Masking is an ETL (Extract Transform and Load) process which reads data into memory, performs the masking transformation, and writes the data back to the same/different database.  Delphix masking uses array-processing for both reads (i.e. SELECTs) and writes (i.e. UPDATEs)

  1. SELECT phase retrieves unmasked columns including primary key or ROWID
  2. Masking of column values takes place within Delphix masking engine
  3. UPDATE phase sets masked columns using primary key or ROWID

Each table from the ruleset is masked in separated stream and it will use two Oracle sessions. One session will be used for reading data from database ( SELECT phase ) and the second session will be used to update data inside database ( UPDATE phase ). The data flow inside a masking engine is based on streams and the stream buffer/commit size between different phases of masking process is 10000 rows ( in version 5.3.X ). 


ROWID vs Primary Key

If table has a primary key constraint defined, it will be used during UPDATE phase of masking to identify rows to update. If Primary Key is disabled (by prescript or Delphix Engine setting), index supporting a primary key will be dropped by an Oracle database. This will end up with Primary Key(s) column still used as identifier but without supporting index so table full scan will happen per each UPDATE execution (commit size)

For the fastest possible performance of the masking job, always set a logical key for every table to a ROWID (case sensitive).

Streams vs threads

Number of streams selected in Job setting is driving a number of tables which will be masked in parallel. One stream is masking one table and having at least one update thread.

Number of threads in Job setting is driving a number of concurrent update sessions on single table (stream) using a one reading process (SELECT stream). Data from reading process are distributed between an update threads using a round robin model. Increasing number of update threads to more than 1 will likely create a congestion on the block level for most of RDBMS packages and it’s not recommended without careful tests. 

Commit size

This job setting is defining how many rows are sent to a database for each update command (using a bulk feature)  and committed after update is finished.

Commit size is best if it is left blank (default 10,000).

Parallel processing

Parallel processing of multiple tables is defined by number of streams as described above. In case of large tables, parallel processing of the single table can be introduced by cloning a ruleset and applying filters to copies of ruleset which will select distinct sets of blocks - one set of blocks per ruleset. This approach speed up a masking of the single table without creating a contention on the block level. 

An example filter for Oracle RDBMS looks like this:

  1. Ruleset 1 uses filter expression MOD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID),4) = 0
  2. Ruleset 2 uses filter expression MOD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID),4) = 1
  3. Ruleset 3 uses filter expression  MOD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID),4) = 2
  4. Ruleset 4 uses filter expression  MOD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID),4) = 3


This concludes the first in this 3-part series of posts.  Please click Next in the series below to continue reading the second post...

​​​                                                                                                                                                                                                                                                            Next in the series