Pinpointing mapping failures when using segmented mapping alogorithm

  • 0
  • 1
  • Question
  • Updated 1 week ago
  • Answered
We have encountered a problem trying to mask a column using segment mapping technique. Masking works fine when the data in the column is in the correct format. However, if the value in the column does not match the expected segment mapping, no masking can occur. We want to be able to identify rows that have this invalid data (in order to go and correct it) but we haven't found a way to do this from the masking job output. Has anybody else had this kind of problem and how have you handled it?
Photo of Christopher Glynn

Christopher Glynn

  • 280 Points 250 badge 2x thumb

Posted 2 weeks ago

  • 0
  • 1
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,516 Points 5k badge 2x thumb
Hi Christopher,

The segment mapping follows a logic and you have to ensure that all fields are normalized and can response to the segment mapping algo you implemented.

The correct and fully functional way is to implement à hooks in premasking job that normalizes data before applying the segment mapping algorithm

But, if you can provide a sample of the filed in correct and reject format plus what you need it to be after masking it would be more relevant for the community and for peoples that are wailing to help.

Regards,

Mouhssine
Photo of Christopher Glynn

Christopher Glynn

  • 280 Points 250 badge 2x thumb
Salut Moussine and thanks for that.

Our client understands that having "clean" data is prerequisite for masking these columns, but is reluctant to commit resources to clean the source data. They do not want the masking job itself to clean the data before masking as that will disguise where they have a problem with incorrectly formatted data in the source database. The hope was that we could at least pin-point where the masking fails, which would give them an indication of exactly where the source data needs to be fixed.

It strikes us as an obvious exception that Delphix Masking  should be capturing information about to include in the masking job log. Unfortunately it does not log exceptions at this level of granularity and so we had wondered if anyone in the community had any workarounds to capture the information from the masking job or its output.

What we have done is to amend the selection criteria for the ruleset to include a binary test that verifies the format of the data and its content. If the test is passed the column is masked in segments as expected. In a separate ruleset, if the same test is failed, we are masking with a default text string.

This stops potential leakage of unmasked values, at the same time providing a fairly straightforward means to pinpoint the source data which actually needs to be fixed. The downside is that it does mean scanning the same table.column two times in order to handle the two different masking outcomes as well as having to run a post masking script to report the rows that have columns masked with the default string.


Cheers,
Chris  
Photo of Mayank Ahluwalia

Mayank Ahluwalia

  • 504 Points 500 badge 2x thumb
Hello Chris,

Will you be able to provide the intent of masking using Segmented technique ? As an example, is the objective to just mask the values and have a unique output for each unique input OR, are there specific rules implemented at the position level within the string in your implementation of Segmented Algorithm ?

Also, what is the maximum length of data that is present in the column ? Is it a number column or a varchar ?

Regards,
Mayank
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,516 Points 5k badge 2x thumb

Bonjour Christopher,

Now i got your need clearly 

I can announce you that delphix still can help you, identify the columns with issue and do it without any pain.

The action plan is the following:

1/ create a masking job without taking any action to normalize the bad columns
2/ run the masking job on the column
>> normally you will end up with the columns responding to masking criteria masked and the bad once will not be masked

Now to identify the bad column
3/ create a certify job and run it on the column, it will let you know which one aren't responding to the masking algorithm criteria.

Regards,

Mouhssine 

Photo of Christopher Glynn

Christopher Glynn

  • 280 Points 250 badge 2x thumb
Hi Mouhssine.

We are not sure the CERTIFY capability has been maintained in 5.3.

Chris
Photo of Mouhssine SAIDI

Mouhssine SAIDI

  • 5,516 Points 5k badge 2x thumb

Hi Chris,

You're right on this.

One idea without knowing the structure of you data, try to mask the column first using the same masked value.

ex AZE23RRT
     AZE00RRX

in this example I’m assuming masking chars of position 4 and 5 with 0 and the 9th chars with x

Later, to identify the columns that are out of the scope you have to supply a sql order that shows the lines that doesn't have 00 in 4 and 5 position and X in the 8th one.


Hope you've got the idea.


Regards,

Mouhssine