Delphix Products

  • 1.  In Delphix data masking jobs, what is the difference between "streams" and "threads"?

    Posted 08-17-2016 12:58:00 PM
    When running Delphix data masking jobs, there are fields to specify the number of "streams" and the number of "threads". Both obviously control how much of the job is processed concurrently, but what is the difference? For example, suppose we have a job that updates four tables and we specify 1 stream and 2 threads. How is that processed? How is that different from 2 streams and 1 thread? Or 2 and 2?

    And when we specify how much memory (min and max) is to be used, how is that allocated?  If we specify 8GB, is it 8GB of memory divided amongst the streams and/or threads?  Or is it 8GB per stream, per thread?

    #Masking


  • 2.  RE: In Delphix data masking jobs, what is the difference between "streams" and "threads"?
    Best Answer

    Posted 08-17-2016 01:09:00 PM
    Hi,
    A data masking job runs on a collection of tables with algorithms applied to their columns; this collection is called a rule-set.
    A rule-set is given a number of streams and threads. Streams are the number of tables to process in parallel, while threads are the number of concurrent updates within each individual stream.

    E.g.
    2 streams and 1 thread will process 2 of the tables in parallel, with 1 update thread each.
    1 stream and 2 threads will process 1 table at a time, with 2 update threads.
    2 streams and 2 threads will process 2 of the tables in parallel, with 2 update threads each (4 concurrent updates in total).
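The arithmetic in these examples can be sketched in Python. This is a simplified model, not Delphix code: it assumes each stream masks one table at a time, and `plan_concurrency` is a hypothetical helper name. The `rounds` value counts how many tables each stream handles one after another; it ignores any per-table speedup that extra threads may give.

```python
import math


def plan_concurrency(num_tables: int, streams: int, threads: int) -> dict:
    """Simplified model of a masking job's concurrency (hypothetical, not Delphix code).

    Assumes each stream masks one table at a time.
    """
    if num_tables < 1 or streams < 1 or threads < 1:
        raise ValueError("tables, streams and threads must all be >= 1")
    # Streams beyond the number of tables would have nothing to do.
    active_streams = min(streams, num_tables)
    return {
        # Tables being masked at the same moment.
        "tables_in_parallel": active_streams,
        # Each active stream runs `threads` update threads.
        "concurrent_updates": active_streams * threads,
        # Tables each stream handles sequentially (ignores thread speedup).
        "rounds": math.ceil(num_tables / active_streams),
    }


# The four-table job from the question.
for streams, threads in [(1, 2), (2, 1), (2, 2)]:
    print(streams, threads, plan_concurrency(4, streams, threads))
```

For the four-table job, 1 stream with 2 threads gives 2 concurrent updates spread over 4 sequential tables, 2 streams with 1 thread gives 2 concurrent updates over 2 rounds, and 2 streams with 2 threads gives 4 concurrent updates over 2 rounds.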

    Notes:
     - Not all databases support multi-threading.
     - Multiple streams are usually better than multiple threads.
     - After a point, there are diminishing returns.
     - Too many threads may deadlock the table, especially in MSSQL.
     - Table allocation to streams is currently decided upfront, not dynamically, so one stream may finish faster than another.
     - The longest-running table determines the total time; splitting the job into sub-jobs might help.
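The last two notes can be illustrated with a small simulation. It assumes, for illustration only, that tables are dealt to streams round-robin before the job starts (the actual Delphix allocation rule is not described in this thread), and the per-table durations are made up. `dynamic_job_time` is a contrast case where a free stream grabs the next waiting table; it is not how the answer says masking jobs currently behave.

```python
import heapq


def static_allocation(durations: list[int], streams: int) -> list[list[int]]:
    """Assign tables to streams before the run starts (round-robin is an assumed rule)."""
    buckets: list[list[int]] = [[] for _ in range(streams)]
    for i, minutes in enumerate(durations):
        buckets[i % streams].append(minutes)
    return buckets


def static_job_time(buckets: list[list[int]]) -> int:
    # Each stream masks its own tables one after another; the job ends
    # when the slowest stream ends.
    return max(sum(bucket) for bucket in buckets)


def dynamic_job_time(durations: list[int], streams: int) -> int:
    # Contrast case: whichever stream frees up first takes the next table.
    finish_times = [0] * streams  # min-heap of per-stream finish times
    for minutes in durations:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + minutes)
    return max(finish_times)


durations = [60, 5, 5, 5, 5, 5]  # hypothetical per-table run times, in minutes
buckets = static_allocation(durations, streams=2)
print(buckets)                                 # [[60, 5, 5], [5, 5, 5]]
print(static_job_time(buckets))                # 70
print(dynamic_job_time(durations, streams=2))  # 60
```

With upfront allocation, the second stream sits idle after 15 minutes while the first runs for 70. Even with dynamic allocation, the job cannot finish before its 60-minute table does, which is why splitting such a table into its own sub-job is the practical lever.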

    Memory allocation is dynamic and handled by Java; we do not have explicit control over how much of the 8GB heap goes to which table. Kettle is well optimized to handle this, and splitting the job is the way to control it manually if needed.


  • 3.  RE: In Delphix data masking jobs, what is the difference between "streams" and "threads"?

    Posted 08-17-2016 03:16:00 PM
    Thanks for the great response, Hims!

    Getting more specific on the memory allocation inputs: are the values we enter passed as parameters to the Java process? For example, do "8192" for min and "16384" for max memory get passed to the Java process as "-Xms8192m -Xmx16384m"?

    In other words, is each stream and each thread a separate Java process?  Or is each stream a separate Java process, and each thread a Java thread within the Java process?

    For example, if we specify 8192 for min memory and 16384 for max memory, then specify 4 streams each with 4 threads, will we have 4 Java processes with the "-Xms" and "-Xmx" parameters as specified, or will we have 16 Java processes?

    By the way, these questions are coming from customers for whom I'm presenting a virtualization class...


  • 4.  RE: In Delphix data masking jobs, what is the difference between "streams" and "threads"?

    Posted 08-17-2016 07:14:00 PM
    :)
    Yes. A new JVM is spawned for the job, separate from Tomcat and from any existing jobs, with the specified memory allocation.
    One job is one Java heap, and all streams run inside it. If you want certain tables to run separately, it is advisable to create a separate rule-set and job for them.

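    In your example, the whole heap (8192 MB minimum, up to 16384 MB maximum) is allocated to that one job, so there is a single Java process rather than 4 or 16. Pentaho and Kettle internally divide it among the 4 streams, and as I understand it, they balance the load internally.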

    It is recommended to set the -Xms and -Xmx parameters to the same value, so that the job fails upfront if the full memory allocation is unavailable rather than partway through the run. It also prevents memory thrashing between the various heaps and the system.
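The mapping from the Min/Max Memory fields to JVM options can be sketched in Python. `jvm_heap_flags` is a hypothetical helper: only the `-Xms`/`-Xmx` flags and their size syntax (number followed directly by a unit such as `m`, no space) are standard JVM options; how Delphix actually assembles its command line is not shown in this thread. The warning encodes the equal-values advice given above.

```python
import warnings


def jvm_heap_flags(min_mb: int, max_mb: int) -> list[str]:
    """Map the Min/Max Memory fields (in MB) to standard JVM heap flags.

    Hypothetical helper: it illustrates the -Xms/-Xmx syntax and the advice
    in this thread, not how Delphix builds its command line.
    """
    if min_mb <= 0 or max_mb <= 0:
        raise ValueError("memory values must be positive")
    if min_mb > max_mb:
        raise ValueError("min memory cannot exceed max memory")
    if min_mb != max_mb:
        # The answer recommends equal values so a shortfall shows up at start.
        warnings.warn("min and max memory differ; the thread recommends setting them equal")
    # -Xms is the initial heap size, -Xmx the maximum; "m" means megabytes.
    return [f"-Xms{min_mb}m", f"-Xmx{max_mb}m"]


# One job means one JVM regardless of stream/thread counts (per the answer),
# so 4 streams x 4 threads still share the single heap configured here.
print(jvm_heap_flags(8192, 16384))  # ['-Xms8192m', '-Xmx16384m'] plus a warning
print(jvm_heap_flags(8192, 8192))   # ['-Xms8192m', '-Xmx8192m']
```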

    You are welcome.


  • 5.  RE: In Delphix data masking jobs, what is the difference between "streams" and "threads"?

    Posted 04-20-2017 04:10:00 PM
    This reply was created from a merged topic originally titled "Correct setting for profiling job".

    Hello 

    What are the correct settings for the No. of Streams, Min Memory, and Max Memory fields in the "create profile job" wizard?

    I have specified 1 for No. of Streams and no values for the other fields.

    When I run a data-level profile job with these settings on 13944,750 megabytes of data distributed across 199 tables, the monitor shows 52 tables completed, 147 tables waiting, and 0 tables processing. The job is still running, waiting on these tables, about 5 hours after I started it. I am not able to stop it from the GUI.

    Can someone suggest what kind of waiting this is and how it can be avoided?

    Thank you very much.