  • 1.  Classification of Masking Algorithms based on Uniqueness and Referential Integrity

    Posted 11-16-2017 05:22:00 PM
    I'd like to classify the masking algorithms shipped with the Masking Engine on the basis of their "Uniqueness" and "Referential Integrity" properties.
    The main concerns when masking data are:

    • Preserving referential integrity (RI) for all columns that refer to other columns
    • Preserving uniqueness in columns designed to contain unique values (primary keys, alternate keys)
    So when designing the inventory, we need to know which algorithms preserve RI and which preserve uniqueness.

      Algorithm            RI     Uniqueness
      Secure Lookup        YES    NO
      Segment Mapping      YES    YES
      Mapping Algorithm    ???    YES
      Binary Lookup        ???    ???
      Min Max              ???    ???
      Secure Shuffle       ???    ???

    Can anyone kindly help me?
    Thank you.
    Gianpiero
    #Masking


  • 2.  RE: Classification of Masking Algorithms based on Uniqueness and Referential Integrity
    Best Answer

    Posted 11-16-2017 06:20:00 PM
    Hi Gianpiero,
    Based on what I've looked at during implementations and POVs, here is my thinking:

      Algorithm            RI     Uniqueness
      Secure Lookup        YES    NO
      Segment Mapping      YES    YES
      Mapping Algorithm    YES    YES if the number of inputs provided for secure lookup is greater than or equal to the number of column records; otherwise NO
      Binary Lookup        YES    NO
      Min Max              YES    NO
      Secure Shuffle       YES    YES if the base column contains unique values at the source; otherwise NO

    I hope this can be double-checked and confirmed by Delphix internals.
    Regards,
    Mouhssine
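
    One way to double-check a classification like this on sample data is to test the two properties directly. The sketch below is not part of the Masking Engine; the helper names `preserves_ri` and `preserves_uniqueness` and the three toy masking functions are hypothetical, and it assumes the algorithm can be modeled as a function applied to one value at a time.

```python
import itertools
from typing import Callable, Dict, Iterable


def preserves_ri(mask: Callable[[str], str], values: Iterable[str]) -> bool:
    """True if every repeated input value always receives the same masked value."""
    seen: Dict[str, str] = {}
    for v in values:
        m = mask(v)
        # setdefault stores the first masked value and returns it on later calls.
        if seen.setdefault(v, m) != m:
            return False
    return True


def preserves_uniqueness(mask: Callable[[str], str], values: Iterable[str]) -> bool:
    """True if distinct input values never share a masked value."""
    distinct = set(values)
    return len({mask(v) for v in distinct}) == len(distinct)


# "ab" appears twice, as it would in a foreign key column.
sample = ["ab", "AB", "ba", "ab"]

upper = str.upper                      # deterministic, but "ab" and "AB" collide
reverse = lambda v: v[::-1]            # deterministic and one-to-one
counter = itertools.count()
fresh = lambda v: str(next(counter))   # a new value on every call

print(preserves_ri(upper, sample), preserves_uniqueness(upper, sample))      # True False
print(preserves_ri(reverse, sample), preserves_uniqueness(reverse, sample))  # True True
print(preserves_ri(fresh, sample), preserves_uniqueness(fresh, sample))      # False True
```

    A check like this can only show that a property fails on the sample; passing on one data set does not prove the property holds in general.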


  • 3.  RE: Classification of Masking Algorithms based on Uniqueness and Referential Integrity

    Posted 11-22-2017 05:50:00 PM
    To choose the algorithm that best fits the user's requirements, it is important to have a clear description of each algorithm's behavior.

    For example, the Secure Lookup (SL) algorithm should work in this way (or an equivalent way):
    1. It takes the original value and encrypts it with the internal secret key.
    2. It calculates the numeric hash value of the encrypted original value.
    3. It calculates the numeric hash value modulo N, where N is the number of text rows in the SL file (provided by the user).
    4. It returns the value from text row n, where n is the result of the previous step.
    From this description I know that if I change the secret key of the masking engine, the numeric hash value changes, so the index of the text row changes, and so the output of the SL algorithm changes. I also know that if the secret key does not change, the SL algorithm generates the same output whenever it receives the same input value, which preserves referential integrity.
    Finally, from this description I know that because the algorithm combines hash and encryption functions (with an unknown key), its behavior is deterministic but unpredictable: it preserves referential integrity yet is hard to break. And because the hash can produce collisions, and the modulo step maps every hash onto one of only N rows, I know that it cannot preserve uniqueness.
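
    The four steps described above can be sketched as follows. This is only an illustration of the behavior as I have described it, not the engine's actual implementation: the function name `secure_lookup`, the key, and the lookup rows are made up, and a keyed hash (HMAC-SHA256) is used as a stand-in for "encrypt, then hash".

```python
import hashlib
import hmac
from typing import Sequence


def secure_lookup(value: str, key: bytes, lookup_rows: Sequence[str]) -> str:
    """Pick one of lookup_rows for value, deterministically for a given key."""
    # Steps 1-2 (assumption): HMAC-SHA256 replaces "encrypt with the secret
    # key, then hash"; both give a key-dependent numeric digest.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    # Step 3: reduce the numeric hash modulo N, the number of lookup rows.
    index = int.from_bytes(digest, "big") % len(lookup_rows)
    # Step 4: return the text row at that index.
    return lookup_rows[index]


rows = ["Alice", "Bob", "Carol", "Dave"]   # hypothetical SL file contents
key = b"engine-secret"                     # hypothetical secret key

# Same key and same input give the same output, so RI is preserved.
assert secure_lookup("John", key, rows) == secure_lookup("John", key, rows)

# With more distinct inputs than rows, at least two inputs must share a row
# (pigeonhole principle), so uniqueness cannot be preserved.
inputs = [f"customer-{i}" for i in range(len(rows) + 1)]
masked = [secure_lookup(v, key, rows) for v in inputs]
assert len(set(masked)) < len(set(inputs))
```

    Changing `key` changes the digest, so the selected row will usually change as well, although with only N rows a given input can still land on the same row by chance.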

    How does the Mapping Algorithm work internally?

    Thanks.
    Gianpiero