Mapping algorithm values

Question · Updated 1 month ago

EnterpriseMan · Posted 1 month ago
Hi there, we have multiple tables with a total of 40K-ish unique values, and I have set up a mapping algorithm with 100K unique values, but it still fails with a "not enough mapping values" error.

Robert Patten, Employee
How many unique values are there in all the tables that are using this mapping algorithm?  You'll need to make sure you have enough values to cover them all.  

EnterpriseMan
Hi there,
There are about 38K in total across all tables.
I have 100K unique values in the lookup file.
Regards
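
For reference, the figure that matters is the number of distinct values across every table and column the algorithm is assigned to, combined, rather than per table. A rough way to tally it might look like the sketch below; the sqlite3 connection and the table/column names are only placeholders for whatever source database and schema are actually in use.

```python
# Rough tally of how many distinct values the mapping algorithm must cover
# across every table/column it is assigned to (combined, not per table).
# The sqlite3 connection and the table/column names are placeholders.
import sqlite3

conn = sqlite3.connect("source.db")
columns = [
    ("customers", "customer_name"),
    ("orders", "customer_name"),
    ("archive", "customer_name"),
]

distinct_inputs = set()
for table, column in columns:
    for (value,) in conn.execute(f"SELECT DISTINCT {column} FROM {table}"):
        distinct_inputs.add(value)

print(f"distinct values the algorithm must cover: {len(distinct_inputs)}")
```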

Matt Griffith
Try recreating the algorithm. I had a similar issue on 5.2.5 and recreating the algorithm fixed it. I didn't follow it up with support, so I never found the root cause, but it's worth a try.

EnterpriseMan
Thanks, I will try that as well. In the meantime I added 1 million values, which is far more than the current data set needs.

Matt Griffith
The downside to that is that the startup time of the job will be longer. Ideally you want the number of values in the lookup file to be similar to the number of unique rows being masked.

https://thedatalobby.kuzodata.com/maximum-performance-data-masking/

However, it's probably worth testing it to see whether it fixes your issue and how it performs.

EnterpriseMan
Yes, thanks, I had this in mind, but it was initially a test, and I understand the performance compromise. I look forward to getting back to a new algorithm with fewer values. Moreover, having so many lookup values just for unique mapping is not recommended; segmented mapping should be used instead.
This cannot be the algorithm going forward; I'm just wondering why such an issue came up with the current, ample number of values.
The only thing to add is that I have multiple copies of the same DB that use this algorithm when masked.
Regards 

Mouhssine SAIDI

Hi,

If you are using a lookup file, this means you are not using a segmented mapping algorithm but a secure lookup one instead.

Note that it's more appropriate to use segmented mapping for primary / foreign key columns; it will guarantee unique value generation.

Can you share an example of the values you have in your column so that we can help?

Regards,

Mouhssine


EnterpriseMan
Hi Mouhssine,

I am using a mapping algorithm, not segmented mapping or secure lookup.

Regards

Mouhssine SAIDI

Hi,

Sorry for the misunderstanding.

So the rule is that the number of unique values in the mapping file has to be equal to or greater than the number of unique values in the column in order to guarantee uniqueness.

Can you please confirm whether this is the case with your mapping file?

Regards,

Mouhssine


EnterpriseMan
Thanks, that is true Mouhssine. As I mentioned before, there are 100K unique values in the lookup file against 38K unique values in the DB.
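
One thing worth double-checking at this point is whether the 100K figure is the raw line count or the count of truly unique entries: blank lines and duplicates in the lookup file would lower the usable total. A quick sanity check might look like this (the file path is a placeholder):

```python
# Sanity check on the lookup file itself: blank lines and duplicated entries
# reduce the number of usable mapping values below the raw line count.
# "lookup.txt" is a placeholder for the actual lookup file path.
from collections import Counter

with open("lookup.txt", encoding="utf-8") as f:
    entries = [line.strip() for line in f]

non_blank = [e for e in entries if e]
counts = Counter(non_blank)
duplicates = {value: n for value, n in counts.items() if n > 1}

print(f"total lines:        {len(entries)}")
print(f"non-blank entries:  {len(non_blank)}")
print(f"unique entries:     {len(counts)}")
print(f"duplicated entries: {len(duplicates)}")
```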

Mouhssine SAIDI

Hi,

Just to complete the answer.

I agree with Matt's remarks: the masking job has to load the 100K entries in the mapping file into memory before proceeding with masking, and this raises performance / maintenance questions.

You will have to update the mapping file whenever needed to guarantee it has as many or more values than you have in your columns.

I still think that the easiest and smartest way is to use a segmented mapping algorithm. Is there any particular reason for not using it in your case?

Regards,

Mouhssine  



EnterpriseMan
Hi Mouhssine,

The lookup file provides realistic values, while segmented mapping would produce haphazard, random ones; realistic values are what we need in our case.

The issue is not which algorithm is being used; it's simply why it is asking for more values when it has more than enough, with more than double the unique values available.

I am well aware of the other considerations and performance issues that exist.

Regards.

Mouhssine SAIDI

Hi,

I've got the point. Could you give it a try with the SL algorithm? You will have to generate a file with a number of entries >= the number of entries in the column.

Maybe the hitch is in how the algorithm works, and the idea is to try a simple replacement that still preserves realistic-looking values.

Regards,

Mouhssine


Mouhssine SAIDI
Hi,

Have you defined any ignore characters at the algorithm level? Maybe this influences the results.

https://support.delphix.com/Unpublished_Articles/KBA1328_Mapping_Algorithm_(MA)_Technical_Overview

Regards,

Mouhssine 
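
If an ignore character were configured, a rough way to see whether it changes the effective number of distinct inputs would be to compare the distinct counts before and after stripping it. How the engine actually applies ignore characters is described in the KBA linked above; the sketch below is only an illustration, and both the file name and IGNORE_CHARS are placeholders.

```python
# Compare distinct column values before and after stripping a hypothetical
# ignore character, to see whether the effective number of inputs differs
# from the raw distinct count. Illustration only; see the KBA for the
# engine's actual behaviour.
IGNORE_CHARS = "-"                                        # placeholder

with open("column_values.txt", encoding="utf-8") as f:    # exported column values
    raw = {line.strip() for line in f if line.strip()}

table = str.maketrans("", "", IGNORE_CHARS)
stripped = {value.translate(table) for value in raw}

print(f"distinct raw values:                {len(raw)}")
print(f"distinct values after ignore chars: {len(stripped)}")
```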

EnterpriseMan
Thanks Mouhssine,
There are no ignored chars. The Mapping Algorithm ticks all the boxes in our case.
It's just this anomaly we are facing: not enough lookup values reported while there are plenty.
Regards


Robert Patten, Employee
Is it possible that new values are appearing in the masked columns from one execution to the next? Once a value from the MA lookup file is assigned to a specific input value, it is no longer available for new values that might exist in the next execution.
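
A toy model of this behaviour, purely for illustration: if the DB copies being masked share only part of their data, each run reserves lookup values for the inputs it has not seen before, so consumption can climb well past the ~38K any single copy needs until the 100K pool runs out. The numbers and the degree of overlap below are invented, and the code is not the engine's actual implementation.

```python
# Toy model: a mapping algorithm keeps its input -> masked-value assignments
# between runs, so any input it has not seen before consumes a fresh lookup
# value. Several partially overlapping DB copies can therefore exhaust a pool
# that looks big enough for any single copy. Purely illustrative numbers.
import random

random.seed(0)
lookup_pool = iter(f"masked_{i}" for i in range(100_000))   # 100K lookup entries
assignments = {}                                            # persistent mapping

def mask(value):
    if value not in assignments:
        try:
            assignments[value] = next(lookup_pool)
        except StopIteration:
            raise RuntimeError("not enough mapping values") from None
    return assignments[value]

universe = range(200_000)
for run in range(1, 6):
    inputs = random.sample(universe, 38_000)   # ~38K distinct values per copy
    try:
        for v in inputs:
            mask(v)
    except RuntimeError as err:
        print(f"run {run}: failed -- {err}")
        break
    print(f"run {run}: {len(assignments)} lookup values consumed so far")
```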