Delphix Products

 View Only
  • 1.  Performance experiences

    Posted 08-31-2023 01:50:00 AM

    Hi! Could anyone who is using Delphix Data Masking with several different platforms (e.g. Oracle, DB/2 zSeries, DB/2 iSeries, SQL Server, ...) please give some rough estimations on the average number of rows per second they are achieving during masking on these platforms?



    ------------------------------
    Philipp Guehring
    Test Env user
    UniCredit Group
    ------------------------------


  • 2.  RE: Performance experiences

    Posted 09-01-2023 01:31:00 AM
    Hi Philipp,
     
    The performance is determined by number of factors like the algorithm used (customized, out of box etc), latency etc.
     
    For instance, We are able to achieve performance of 0.35 - 0.5 million rows per second for delphix core algorithm across hetro platform (SQL Server, Oracle, DB2 Z/oS, DB2 LUW, Mainframe (Files) .
     
    Regards,
    Bhupinder Kwatra


    ------------------------------
    Bhupinder Kwatra
    Senior System Engineer
    Delphix Community Members
    ------------------------------



  • 3.  RE: Performance experiences
    Best Answer

    Posted 09-01-2023 03:10:00 AM

    The performance of masking is depending on 3 core factors (ordered in descending priority):

    1) database server performance (depending on Disk IO, CPU, Memory, Indexing, Transaction logging ...)
    2) network performance (bandwidth and especially latency!)
    3) masking performance (depending on amount of masked columns per table, algorithms, protocol used, parallelism used)

    Under optimum conditions I have seen performance reaching 6million rows/minute.
    I would consider everything above 1 million rows/minute as good.
    Everything below 500k rows/minute is not very good and to be investigated.

    The most common reasons for poor performance are:
    - Indexes on masked columns not being dropped
    - DB server lacking resources (CPU, Mem, Disk IO)

    The network latency has a significant impact, most people underestimate this.
    It is relative rare that the masking engine is the cause of performance issues.

    Depending on the platform there are varying options to improve, the best options exist for Oracle by using parallel jobs with SQL optimisation (rowid in combination with blocknumber to avoid contention)

    When masking a table with 300 columns and 150 columns being masked it is understandable that the performance is limited.
    Therefore we prefer to talk about troughput in MB/minute instead rows/min.
    Minimum throughput is considered 500MB/min .. but usually significantly higher.


    Parallel updates are a good option to improve performance for ON THE FLY jobs (don't use this option for IN PLACE jobs!).

    If a database is optimised for READ/INSERT an INPLACE masking job is NOT a good idea.
    This applies as well to platforms like HADOOP etc >> those are designed for SELECT/INSERT and not UPDATE.

    If performance is a critical factor you may choose Delphix Hyperscale orchestration > this approach avoids the SQL (JDBC) interface
    and uses instead DB platform specific BULK unload/load mechanisms that allow much higher performance by bypassing the SQL bottleneck.
    Hyperscale extracts the data, then splits the "files" across a group of Delphix masking engines, and loads the masked files back into the database.



    ------------------------------
    Tino Pironti
    Masking SME
    Technical Manager
    Delphix
    ------------------------------



  • 4.  RE: Performance experiences

    Posted 09-01-2023 08:25:00 AM

    Under optimum conditions I have seen performance reaching 6million rows/minute.
    I would consider everything above 1 million rows/minute as good.
    Everything below 500k rows/minute is not very good and to be investigated.

    Can you add more context to those numbers?  Is that "total" for a database?  Or are those per thread/stream rates?

    The worst I've seen (after optimizations) was 50k rpm per thread.  BUT that was on Itanium2 hardware from ~2010.  Thankfully that has been migrated to modern hardware.  



    ------------------------------
    Kevin Bott
    Sr. Database Architect
    Northwestern Mutual Life Insurance Company
    ------------------------------



  • 5.  RE: Performance experiences

    Posted 09-01-2023 09:00:00 AM

    Those numbers are on table level .. seen on Oracle .. fast environment, perfect network, SSD storage.
    On virtulized enviroments (both masking and DB running on same VMware) I usually see above 2million rows/min.

    The combined performance rows/minute (e.g. using parallel STREAMS) is obviously higher .. streams work well as they never cause contention on DB level.
    Parallel Update threads have a high risk to cause contention (locks/deadlocks) - depending on DB platform.
    Oracle has possibly the best transaction isolation to allow parallel updates however we found out that using parallel jobs instead (on same table) with the above mentioned where clause optimisation scales up at almost 100% - till you reach saturation of resources. So in an optimised environment you may run one job with "many small tables" .. plus lets say 4 jobs on same ruleset (copies) containing the very large tables (each having the where clause optimization) > you can run those 5 jobs in parallel to get the masking done as fast as possible without contention.

    A performance of 50k rows/minute is not good and has usually reasons: for example the update fires triggers or indexes .. or the hardware is very poor .. the VM is over provisioned .. or the masking access the DB via WAN instead LAN / poor latency. Be careful about having SQL customisations like where conditions on columns that are not indexed / joins that cause poor performance. There are plenty of possibilities - but in almost 6 years of doing performance optimistation there were maybe 5 cases where the masking engine caused the poor performance. The majority of cases are related to indexes and performance issues caused by the DB config/resources.



    ------------------------------
    Tino Pironti
    Masking SME
    Technical Manager
    Delphix
    ------------------------------



  • 6.  RE: Performance experiences

    Posted 09-01-2023 03:36:00 AM
    Edited by Tino Pironti 09-01-2023 03:37:03 AM

    You asked for some DB platform specific feedback: Oracle, DB/2 zSeries, DB/2 iSeries, SQL Server

    Oracle: Usually very fast, always use ROWID as LK, good options to use parallel jobs for INPLACE masking:     
    Sample how to edit WHERE condition for 2 parallel jobs: 
    First job:       mod(DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid),2) =0
    Second job:  mod(DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid),2) =1 
    (if table has partitions you should split by partitions instead!)

    DB2 zSeries: usually quite fast using JDBC, bit expensive because of MIPS, JDBC removes the need to deal with VSAM/EBCDIC

    DB2 iSeries:  Slower than zOS, needs Journaling enabled for masked tables, we use DRDA protocol in JCC driver,
    there is a free JT400 JDBC driver (slower than IBM JCC driver)

    MSSQL:  is a fast platform but a bit weak regarding DB contention.
    Change transaction logging to simple, change transaction isolation (this will cause the TEMP db to grow!)

    ALTER DATABASE MyDatabase  SET ALLOW_SNAPSHOT_ISOLATION ON  

    ALTER DATABASE MyDatabase  SET READ_COMMITTED_SNAPSHOT ON

     




    ------------------------------
    Tino Pironti
    Masking SME
    Technical Manager
    Delphix
    ------------------------------



  • 7.  RE: Performance experiences

    Posted 09-01-2023 08:46:00 AM

    In addition to Tino's explanation of the core factors, I have two more DB specific items, both are DB2 LUW examples:

    • Page Size.  We have two tables that are identical in structure.  The only difference is 4k vs 16k page size.  The 16k page size performs ~90% better.  We have not been able to get the 4k table changed in production, so the current performance is accepted. 
    • Locking scheme and configuration.  From what I recall, the lock list would fill up, causing the DB to pause updates so it could escalate from page locks to table level lock.  Not as impactful as IO performance, but it is something to consider in a full performance tuning review.  


    ------------------------------
    Kevin Bott
    Sr. Database Architect
    Northwestern Mutual Life Insurance Company
    ------------------------------



  • 8.  RE: Performance experiences

    Posted 09-03-2023 01:50:00 PM

    Hi,

    Unfortunately not a straight forward answer.  It depends on the type of algorithms used, complexity of the table data structure, number of algorithms assigned on a table and sizing of the masking engine and database server.  We see ranges from 10s of thousands to over 3 million rows per minute.  We have SQL Server, Oracle, Sybase and flat file masking and haven't really noticed the technology having an impact on performance.

    cheers

    Murray



    ------------------------------
    Murray Penno
    Platform Lead
    ANZ Bank New Zealand Limited
    ------------------------------