Lookup File Limits

  • Question
  • Updated 1 year ago
  • Answered
Is there a limit on the number of lines in a lookup file used for Secure Lookup?

Koichi Shibayama

Posted 1 year ago

Gianpiero Piccolo

Hi Koichi,
There shouldn't be a limit on the number of rows in Secure Lookup (SL) files. However, keep in mind that all data read from the SL file stays in RAM while the job is running, so you should increase the maximum memory size in the job definition form. The default value is 1024 MB, which may not be enough. If you increase the maximum memory size, I suggest increasing the minimum size too, so that the JVM garbage collector performs better.
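To get a feel for when the 1024 MB default becomes too small, here is a rough back-of-the-envelope sketch. The function name, the 2 bytes per character and the 64 bytes of per-entry overhead are all assumptions made for illustration, not figures from the product; real JVM memory use depends on the string encoding, the collection type and whatever else the job keeps in memory.

```python
def estimate_lookup_heap_mb(rows: int, avg_chars: int,
                            bytes_per_char: int = 2,
                            overhead_per_entry: int = 64) -> float:
    """Rough estimate, in MiB, of the memory needed to hold every lookup value.

    bytes_per_char and overhead_per_entry are illustrative assumptions only.
    """
    bytes_total = rows * (avg_chars * bytes_per_char + overhead_per_entry)
    return bytes_total / (1024 * 1024)


DEFAULT_MAX_MB = 1024  # default maximum mentioned in this thread

for rows in (10_000, 1_000_000, 5_000_000, 20_000_000):
    mb = estimate_lookup_heap_mb(rows, avg_chars=30)
    note = "exceeds" if mb > DEFAULT_MAX_MB else "fits under"
    print(f"{rows:>12,} rows: ~{mb:,.0f} MiB ({note} the {DEFAULT_MAX_MB} MB default)")
```

The estimate covers only the lookup values themselves, so a result that "fits under" the default still leaves less room for the rest of the job.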

Best Regards
Gianpiero Piccolo

Karsten Stoehr, Employee

There are practical limits, and there have been issues with files containing several million rows in the past.
In any case, it's not recommended to create large lookup lists.

First, you should know that no matter how many entries your lookup list has, even if it has more values than your original data, Secure Lookup will always create duplicate masked values! This is intentional and built into the code.
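One way to see why duplicates appear even with an oversized list is the following sketch. It assumes, purely for illustration, that each input is assigned a lookup entry uniformly at random (for example via a hash); the actual selection method is not described in this thread, and `expected_distinct` is a hypothetical helper.

```python
def expected_distinct(n_inputs: int, n_lookup: int) -> float:
    """Expected number of distinct lookup entries used when n_inputs values
    are each assigned one of n_lookup entries uniformly at random.

    Uniform assignment is an assumption, not the documented algorithm.
    """
    return n_lookup * (1.0 - (1.0 - 1.0 / n_lookup) ** n_inputs)


# 1,000 original values against a lookup list ten times larger.
used = expected_distinct(1_000, 10_000)
print(f"about {used:.0f} distinct masked values for 1,000 inputs")
```

Under that assumption the number of distinct masked values stays below the number of inputs, so some inputs share a masked value even though the list is far larger than the data.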

The protection of the data and its irreversibility stem from mapping multiple, even thousands of, original values to the same masked value. Otherwise the cardinality of an entry could still identify the original data. Take an address as an example. You may have only one customer from a small village. If that village's name were replaced with a masked name but remained the only such entry in your table, you could still identify the customer. By mapping thousands of different villages to the same masked name, it becomes impossible to identify the original village's name.
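The village example can be simulated with a small sketch. The `secure_lookup` function below is a hypothetical stand-in that picks a lookup entry from a keyed SHA-256 hash of the input; it is not the product's implementation, and the key, village names and masked names are made up for the demonstration.

```python
import hashlib
from collections import Counter


def secure_lookup(value: str, lookup: list[str], key: str = "demo-key") -> str:
    """Hypothetical stand-in: choose a lookup entry from a keyed hash of the input."""
    digest = hashlib.sha256((key + value).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(lookup)
    return lookup[index]


villages = [f"Village-{i}" for i in range(10_000)]  # made-up original values
lookup = [f"Masked-{i}" for i in range(100)]        # small lookup list
masked = [secure_lookup(v, lookup) for v in villages]

counts = Counter(masked)
print("distinct masked names:", len(counts))
print("fewest originals behind one masked name:", min(counts.values()))
```

With 10,000 originals and only 100 lookup entries, every masked name stands for many different villages, so a masked name on its own no longer points back to a single original.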

Perhaps counterintuitively, the fewer values your Secure Lookup algorithm uses, the better your data is protected.
Keep in mind the goal is to protect the sensitive data - the goal is _not_ to create the most realistic-looking fake data possible.

You need to find a balance between protection and having enough distinct values to test the application. A lookup list with a few hundred to a few thousand values will usually serve best.

Koichi Shibayama

Sorry for the late reply. Thank you for your reply!