Data level profiling (EMAIL)

  • Question
  • Updated 1 year ago
  • Answered

Hi,


I'm trying to set up data-level profiling on the masking engine, but I can't get the job working: it terminates successfully, but the EMAIL columns are not tagged as sensitive.

I'm using this regexp to profile email addresses
\b[[:alnum:]]([-_.]?[[:alnum:]])*@[[:alnum:]]([-.]?[[:alnum:]])*\.([a-z]{2,4})\b

The email addresses to profile are in the medical_records and patient tables of the demo delphix schema.

Regards,

Mouhssine

Mouhssine SAIDI


Posted 1 year ago


Jaclyn Schoof, Community Manager

Hi Mouhssine,
Does "- \b[[a-zA-Z0-9]]([-_.]?[[a-zA-Z0-9]])*@[[a-zA-Z0-9]]([-.]?[[a-zA-Z0-9]])*\.([a-z]{2,4})\b" work?

Mouhssine SAIDI

Hi Jaclyn,

I will give it a try and keep you informed, but I think I already tested it without success.

Please bear with me while I try again; I'll report back.

Mouhssine

Mouhssine SAIDI

Hi Jaclyn,

After testing this new regexp, nothing changed: I still can't profile emails.

[Screenshots: Email reg, Profile, Profiler job, Results (2 tables profiled)]



But no column has been identified as sensitive, even though there is one email column defined per table.



Regards,

Mouhssine

Mouhssine SAIDI

Hi Jaclyn,

I fixed it.

After discussing this issue with Kersten, who gave me some great advice, I found that my profiling job didn't tag the columns because of the sampling algorithm it uses.

To be clearer, the profiler relies on a configuration file that sets some key values (NO_OF_ROWS and PERCENTAGE_REQUIRED=80). This means that it samples NO_OF_ROWS rows (100 by default) from each column being profiled, and at least 80% of them must match the defined regexp for the column to be tagged.
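
For readers who want to see why a small table can fail this check, here is a minimal Python sketch of the threshold rule as described in this post. It is not the masking engine's actual code. The function name `column_is_sensitive` is hypothetical, the email pattern is a simplified stand-in written for Python's `re` module rather than the expression used in this thread, and measuring the percentage against the configured sample size (instead of the rows actually read) is an assumption that fits the behaviour I observed.

```python
import re

NO_OF_ROWS = 100          # sample size named in this thread (default 100)
PERCENTAGE_REQUIRED = 80  # match threshold named in this thread

# Simplified email pattern for illustration only (assumption, not the
# profiler expression quoted in this thread).
EMAIL_RE = re.compile(
    r"\b[A-Za-z0-9]([-_.]?[A-Za-z0-9])*@[A-Za-z0-9]([-.]?[A-Za-z0-9])*\.[a-z]{2,4}\b"
)


def column_is_sensitive(values, pattern=EMAIL_RE,
                        no_of_rows=NO_OF_ROWS,
                        percentage_required=PERCENTAGE_REQUIRED):
    """Hypothetical helper: decide whether a column would be tagged."""
    # Look at no more than the configured number of rows.
    sample = values[:no_of_rows]
    # Count sampled values that contain a match for the pattern.
    matches = sum(
        1 for v in sample if v is not None and pattern.search(str(v))
    )
    # Assumption: the percentage is taken against the configured sample
    # size, so a table with fewer rows can never reach the threshold.
    return matches * 100 >= percentage_required * no_of_rows


emails = [f"user{i}@example.com" for i in range(100)]
print(column_is_sensitive(emails))       # 100 matching rows
print(column_is_sensitive(emails[:10]))  # only 10 rows in the table
```

Under that assumption, a column holding only ten valid addresses is not tagged, while the same column padded out to 100 rows is.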

I finally updated my tables to contain 100 records and used this regexp "\b[[a-zA-Z0-9]]([-_.]?[[a-zA-Z0-9]])*@[[a-zA-Z0-9]]([-.]?[[a-zA-Z0-9]])*\.([a-z]{2,4})\b" and voilà, things work magically.

Regards,

Mouhssine

Jaclyn Schoof, Community Manager

I'm glad it's all worked out!