Classification of Masking Algorithms based on Uniqueness and Referential Integrity

I'd like to classify the masking algorithms shipped with the Masking Engine on the basis of their "Uniqueness" and "Referential Integrity" properties.
The main concerns when masking data are:

  • Preserving referential integrity for all columns that refer to other columns
  • Preserving uniqueness in columns designed to contain unique values (primary keys, alternate keys)
So when designing the inventory, we need to know which algorithms preserve RI and which preserve uniqueness.


Algorithm               RI               Uniqueness
Secure Lookup           YES               NO
Segment Mapping         YES               YES
Mapping Algorithm       ???               YES
Binary Lookup           ???               ???
Min Max                 ???               ???
Secure Shuffle          ???               ???
Can anyone kindly help me fill in the gaps?
Thank you.
Gianpiero

Gianpiero Piccolo


Posted 6 months ago


Mouhssine SAIDI

Hi Gianpiero,

Based on what I've observed during implementations and POVs, here is my understanding:

Algorithm               RI               Uniqueness
Secure Lookup           YES              NO
Segment Mapping         YES              YES
Mapping Algorithm       YES              YES if the number of inputs provided to the algorithm is greater than or equal to the number of records in the column; otherwise NO
Binary Lookup           YES              NO
Min Max                 YES              NO
Secure Shuffle          YES              YES if the base column contains unique values at the source; otherwise NO
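
To illustrate the Secure Shuffle condition, here is a minimal sketch. It uses a plain seeded shuffle from the Python standard library, which is only an assumption standing in for the real algorithm; `shuffle_column` and `is_unique` are hypothetical helper names. The point is simply that a shuffle reorders existing values, so the output is unique only when the source column already was.

```python
import random


def shuffle_column(values: list[str], seed: int) -> list[str]:
    """Return the column's values in a seeded pseudo-random order."""
    shuffled = list(values)  # copy so the source column is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled


def is_unique(values: list[str]) -> bool:
    """True when no value appears more than once."""
    return len(set(values)) == len(values)


unique_source = ["A1", "B2", "C3", "D4"]
duplicate_source = ["A1", "A1", "C3", "D4"]

# A shuffle only reorders values, so uniqueness carries over from the source.
assert is_unique(shuffle_column(unique_source, seed=7))
assert not is_unique(shuffle_column(duplicate_source, seed=7))
```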

I hope this can be double-checked and confirmed by someone at Delphix.

Regards,

Mouhssine

Gianpiero Piccolo

To choose the algorithm that best fits the user requirements, it is important to have a clear description of each algorithm's behavior.

For example, Secure Lookup should work like this (or in an equivalent way):
  1. It takes the original value and encrypts it with the internal secret key.
  2. It calculates a numeric hash of the encrypted value.
  3. It reduces the numeric hash modulo N, where N is the number of text rows in the SL file (provided by the user).
  4. It returns the value from text row n, where n is the result of the previous step.
With this description I know that if I change the secret key of the masking engine, the numeric hash changes, so the index of the text row changes, and so the output of the SL algorithm changes. I also know that if the secret key does not change, the SL algorithm generates the same output whenever it receives the same input, and this preserves referential integrity.
Finally, because the algorithm combines hash and encryption functions with an unknown key, its behavior is deterministic but unpredictable: it preserves referential integrity, yet the mapping cannot be reproduced without the key. Because distinct inputs can land on the same row (a collision), it cannot preserve uniqueness.
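
The steps I describe above can be sketched as follows. This is only my assumed model, not the engine's actual code: HMAC-SHA256 stands in for the "encrypt, then hash" steps, and `secure_lookup`, `rows` and `key` are hypothetical names.

```python
import hashlib
import hmac


def secure_lookup(value: str, secret_key: bytes, lookup_rows: list[str]) -> str:
    """Map value to one of lookup_rows using a keyed hash (illustrative only)."""
    # Steps 1-2: keyed hash of the original value. HMAC-SHA256 is an assumed
    # stand-in for encrypting with the secret key and then hashing.
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    numeric_hash = int.from_bytes(digest, "big")
    # Step 3: reduce modulo N, the number of rows in the lookup file.
    index = numeric_hash % len(lookup_rows)
    # Step 4: return the row at that index.
    return lookup_rows[index]


rows = ["Alice", "Bob", "Carol"]  # hypothetical lookup file contents
key = b"hypothetical-secret-key"

# Same key and same input give the same output (referential integrity).
assert secure_lookup("Smith", key, rows) == secure_lookup("Smith", key, rows)

# Four distinct inputs mapped onto three rows must collide (no uniqueness).
masked = [secure_lookup(v, key, rows) for v in ["a", "b", "c", "d"]]
assert len(set(masked)) < len(masked)
```

In this model, uniqueness fails by the pigeonhole principle as soon as there are more distinct inputs than lookup rows, and collisions can also occur with fewer inputs.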

How does the Mapping Algorithm work internally?

Thanks.
Gianpiero