I want to mask a Unique Identifier column and which is the best algorithm for this?

  • 0
  • 1
  • Question
  • Updated 3 weeks ago
  • Answered
I want to mask a column whose datatype is Unique Identifier (Not Null) but have repeatable values and the data is in the format of "586A3557-2243-4C06-BABE-004F2810BD2C" . I thought of trying SM algo but maximum length of each segment is only 4 in this algo. Any suggestions please!

Kind Regards,
Rajesh
Photo of Rajesh Dadi

Rajesh Dadi

  • 402 Points 250 badge 2x thumb

Posted 4 weeks ago

  • 0
  • 1
Photo of Mayank Ahluwalia

Mayank Ahluwalia

  • 646 Points 500 badge 2x thumb
Hi Rajesh,

The Segmented Mapping would fit perfectly for you as long as the total length of data never exceeds 36 characters.

Looking at the example you gave, you can create 8 segments of length 4 each, alphanumeric and ignore the character '-'.

Regards,
Mayank
Photo of Jayashree Iyer

Jayashree Iyer

  • 150 Points 100 badge 2x thumb
Similar to the above scenario i have data which of data type varbinary - sample date "0x01000000C6934392AC8527F35257EDD0A414ADB9C58B6224B785FB1BD33694D4AE89797B" which would be the best algorithm.
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,846 Points 5k badge 2x thumb
Hi Jayashree,

I think the best one for your case will be the "Binary Lookup Algorithm".

Regards,

Mouhssine
Photo of Jayashree Iyer

Jayashree Iyer

  • 150 Points 100 badge 2x thumb
Thanks Mouhssine. I missed mentioning in my earlier post that the column also has a PK/FK constraint attached. Please suggest.

Thanks,
Jayashree
Photo of Mayank Ahluwalia

Mayank Ahluwalia

  • 646 Points 500 badge 2x thumb
If you need unique values post masking, one solution i can think of is to cast varbinary into varchar...masking with Segmented mapping and then update the masked varchar values by casting them back into varbinary
Photo of Jayashree Iyer

Jayashree Iyer

  • 150 Points 100 badge 2x thumb
Thanks Mayank.

Segmented Mapping can take only upto 9 segments, but the length of the above goes to more than that.
Also once we convert varbinary to varchar, we are facing issue converting it back to varbinary. Can you please suggest if Tokenization or Mapping Algorithm would be a better choice. Your thoughts please.

Thanks,
Jayashree 
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,846 Points 5k badge 2x thumb
Hi Jayashree,

Sorry for the delay.

You still can use Secure lookup or mapping algorithm to mask the PK/FK column, with this requirement.

- nomber of unique values in the lookup/mapping file have to be superior or at less equal to the number of rows in the column to be masked.

- you will have to optimize the job regarding the size of SL file and number of rows to mask.

I prefer the usage of SL instead of mapping algorithm.

Regards,

Mouhssine
(Edited)
Photo of Jayashree Iyer

Jayashree Iyer

  • 150 Points 100 badge 2x thumb
Thanks Mouhssine.

Just to confirm we have about 38643560 rows and we have about 45000 unique rows, so i should be good creating a SL file with about 45000 or 50000 rows, which will make it unique and serve my purpose of FK/PK right?

Sorry I am a beginner, asking so many doubts.

Thanks,
Jayashree


Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,846 Points 5k badge 2x thumb
Hi jayashree,

Don’t mind we are here to help if we can :).

Yes, I confirm your purpose if you create a file with 50000 it could be great this will let you 5000 more values in case of your database evolution.

Regards,

Mouhssine
Photo of Mayank Ahluwalia

Mayank Ahluwalia

  • 646 Points 500 badge 2x thumb
Hi Mouhssine,

My understanding of Secured Lookup Algorithm is that while it is deterministic and can preserve Referential Integrity, it may or may not provide uniqueness.

So, in the above case, even if we create a lookup file with # of values > # of unique values in the column, does the Algorithm implementation guarantee uniqueness ?

Regards,
Mayank
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,846 Points 5k badge 2x thumb
Hi Mayank,

SL calculate a hash value with modelus so if we imagine the column values are unique and the same in SL file with the condition of having # of values > to number of values in the column we can still some how guarantee the uniqueness generated of masked value. (This mimics mapping algorithm)

Mapping algorith guarantees a 1 to 1 mapping and works like SL, so this said SL/MAPPING could answer the need in ower case

Regards,

Mouhssine
Photo of Mayank Ahluwalia

Mayank Ahluwalia

  • 646 Points 500 badge 2x thumb
Thank you for the explanation :)
Photo of Jayashree Iyer

Jayashree Iyer

  • 150 Points 100 badge 2x thumb
Thanks for detailed explanation Mouhssine. But i am facing the below issues.

1) I have a varbinary column varbinary(256). I tried secure Lookup but it is failing with the error
Implicit conversion from nvarchar to varbinary is not allowed, use the convert function.

2) I decided to try using Binary Lookup as mentioned, for which the lookup file has to be of the format xml, jpg or pdf. I created the lookup with a pdf lookup file. But this too is failing with the error. I even tried increasing the column width to varbinary(500) 

Data Truncation.

Data sample: 0x01000000FFFDFB622D17E4F4891B97D2785F79EA6D22B96FA45C9E07D5AFF3A2D25E6497


Please suggest.

Thanks,
Jayashree
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,846 Points 5k badge 2x thumb
Hi Jayashree,

I'm not masterd on SQL Server, but got the same behavior on postgres where i created an implicite cast that converts my PK colum of type uuid to varchar in order to be masked.

You may have to look on (CAST / CONVERT) you colum to allow the masking process.

P.S : what is the content of this colum is it a pic, pdf ....?

Regards,

Mouhssine
Photo of Jayashree Iyer

Jayashree Iyer

  • 150 Points 100 badge 2x thumb
Hi Mouhssine,

It is not pic or pdf. It contains a binary data sample data below. DataType is varbinary(256)

0x01000000FFFDFB622D17E4F4891B97D2785F79EA6D22B96FA45C9E07D5AFF3A2D25E6497

Thanks,
Jayashree
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,846 Points 5k badge 2x thumb
Hi,


OK, so you have to look on (CAST / CONVERT) between varbinary and nvarchar to get the column masked using the SL algorithm.

Regards,

Mouhssine