Hi,
A data masking job runs on a collection of tables, with masking algorithms applied to specific columns; this collection is called a rule-set.
A rule-set is configured with streams and threads: streams are the number of tables processed in parallel, while threads are the number of concurrent update threads within each individual stream.
For example:
- 2 streams and 1 thread: 2 tables are processed in parallel, with 1 update thread each.
- 1 stream and 2 threads: 1 table is processed at a time, with 2 update threads.
- 2 streams and 2 threads: 2 tables are processed in parallel, with 2 update threads each (4 concurrent updates in total).
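The streams/threads model above can be sketched with nested thread pools. This is only an illustration of the concurrency shape, not the engine's actual code: the table names, batches, and mask_batch() helper are made up, and the real work is handled internally by Kettle.

```python
from concurrent.futures import ThreadPoolExecutor

STREAMS = 2   # tables processed in parallel
THREADS = 2   # concurrent update threads per table

# Hypothetical tables, each split into row batches to be masked.
tables = {
    "customers": ["batch1", "batch2", "batch3", "batch4"],
    "orders":    ["batch1", "batch2"],
    "invoices":  ["batch1", "batch2", "batch3"],
}

def mask_batch(table, batch):
    # Stand-in for applying the masking algorithms and issuing UPDATEs.
    return f"{table}:{batch} masked"

def process_table(table, batches):
    # Each stream gets its own pool of THREADS update workers.
    with ThreadPoolExecutor(max_workers=THREADS) as updates:
        return list(updates.map(lambda b: mask_batch(table, b), batches))

# The outer pool is the streams: STREAMS tables in flight at once,
# so up to STREAMS * THREADS updates run concurrently (4 here).
with ThreadPoolExecutor(max_workers=STREAMS) as streams:
    for table_result in streams.map(lambda kv: process_table(*kv), tables.items()):
        print(table_result)
```

The outer pool bounds how many tables run at once; the inner pools bound the concurrent updates per table, which is why total concurrency is the product of the two settings.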
Notes:
- Not all databases support multi-threading.
- Multiple streams usually perform better than multiple threads.
- Past a certain point there are diminishing returns.
- Too many threads may deadlock a table, especially in MSSQL.
- Table allocation to streams is currently decided upfront, not dynamically, so one stream may finish much earlier than another.
- The longest-running table determines the total run time; splitting the job into sub-jobs might help.
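A back-of-the-envelope sketch of why upfront allocation can hurt. The per-table times and the round-robin scheme are made-up assumptions (the real allocator's ordering may differ); the point is that a fixed assignment can leave one stream badly loaded, and no split can beat the single longest table.

```python
# Hypothetical per-table masking times in minutes (invented for illustration).
table_minutes = {"customers": 60, "orders": 10, "audit_log": 90, "notes": 5}

STREAMS = 2

# Assumed upfront round-robin allocation: fixed before the job starts.
streams = [[] for _ in range(STREAMS)]
for i, (table, mins) in enumerate(table_minutes.items()):
    streams[i % STREAMS].append((table, mins))

# Each stream's wall time is the sum of its tables; the job finishes
# only when the slowest stream does.
stream_times = [sum(m for _, m in s) for s in streams]
total = max(stream_times)

print(stream_times)  # [150, 15] -> one stream idles for most of the job
print(total)         # 150; even a perfect split cannot go below 90 (audit_log)
```

This is also why splitting a job into sub-jobs (or separating the one huge table into its own job) can shorten the overall wall time.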
Memory allocation is dynamic and handled by Java; we have no explicit control over how much of the 8 GB heap goes to which table. Kettle is well optimized to handle this, and splitting the job is the way to manage it manually if needed.
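For context, assuming the engine runs as a standard JVM process, the only knob we have is the total heap via the usual JVM flags; how that heap is shared between tables is decided by the JVM and Kettle at runtime. The environment variable name here is an assumption for illustration; only the -Xms/-Xmx flags themselves are standard.

```shell
# Cap the total heap at 8 GB (start at 2 GB); there is no per-table setting.
# JAVA_OPTS is a common convention, not necessarily what this launcher reads.
export JAVA_OPTS="-Xms2g -Xmx8g"
```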