Delphix Products

  • 1.  search for sensitive data on plain txt files

    Posted 05-25-2017 10:35:00 AM

    Hello,


    Can Delphix profile unstructured .txt files?

    There is documentation about profiling structured files.

    There is also documentation about virtualizing unstructured files.

    Does that have anything to do with profiling unstructured files?

    Luigi



  • 2.  RE: search for sensitive data on plain txt files
    Best Answer

    Posted 06-01-2017 12:32:00 PM
    Hi Luigi,
    Unstructured files are hard to process; however, we can redact them with some creativity.
    The approach is below:
    1. Treat the file as delimited. Find a character that does not exist in the file and use it as the field delimiter.
    2. This ensures that every row is treated as a single field.
    3. Use the Multi PHI checkbox on the field to keep finding sensitive data (the default algorithm to be applied is set in a property file). This will find all the patterns that exist in the file.
    4. Apply the "Free Text Redaction" algorithm, with the values and patterns to be redacted, to that one field.
    5. Execute the masking job.

    The user will have to do a few iterations to get this accurate, though.
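
    To make steps 1, 2 and 4 concrete, here is a minimal Python sketch that runs outside Delphix. It is only an illustration of the idea, not how the masking engine implements it: the candidate delimiter list, the function names, the sample text and the two regular expressions are all assumptions made up for this example.

    ```python
    import re

    # Hypothetical candidate delimiters, tried in order (not a Delphix setting).
    CANDIDATES = ["\x1f", "\x1e", "|", "~", "^", "`"]

    def find_unused_delimiter(text, candidates=CANDIDATES):
        """Return the first candidate character that never occurs in text."""
        for ch in candidates:
            if ch not in text:
                return ch
        raise ValueError("every candidate delimiter occurs in the text")

    def redact_lines(text, patterns, replacement="XXXX"):
        """Treat each line as one field and replace every pattern match in it."""
        compiled = [re.compile(p) for p in patterns]
        out = []
        for line in text.splitlines():
            for rx in compiled:
                line = rx.sub(replacement, line)
            out.append(line)
        return "\n".join(out)

    # Made-up sample text, for illustration only.
    sample = "Call LUIGI at 555-1234 | urgent\nMARIO owes 20~30 euros"

    delim = find_unused_delimiter(sample)
    # With an unused delimiter, every line splits into exactly one field.
    assert all(len(line.split(delim)) == 1 for line in sample.splitlines())

    # Illustrative patterns: a phone-like number and one known name.
    redacted = redact_lines(sample, [r"\b\d{3}-\d{4}\b", r"\bLUIGI\b"])
    print(redacted)
    ```

    Running a check like this on a sample of the file first can shorten the iterations, since it shows which patterns are still getting through before the masking job is executed.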
    --Hims


  • 3.  RE: search for sensitive data on plain txt files

    Posted 06-09-2017 10:26:00 AM

    I got the errors below when I tried to profile a text file delimited by commas, with end of line as the record delimiter.

    miei_nomi (input data file):

    LUIGI,FRANCO
    FRANCESCO,MARIO


    I set the file format by importing a custom format file in which I specified the two fields that make up each row of the input file.

    miei_nomi_ff (format file) contents:

    Testo1
    Testo2
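
    As a sanity check outside Delphix, a short Python sketch like the one below can confirm that every record splits into as many fields as the format file names. The function name is made up for this example, and passing the check does not by itself explain the NullPointerException in the report that follows.

    ```python
    def check_field_counts(data_lines, field_names, delimiter=","):
        """Return (line_number, field_count) for each record whose field
        count differs from the number of names in the format file."""
        expected = len(field_names)
        problems = []
        for number, line in enumerate(data_lines, start=1):
            count = len(line.split(delimiter))
            if count != expected:
                problems.append((number, count))
        return problems

    # Contents of miei_nomi and miei_nomi_ff as posted above.
    data = ["LUIGI,FRANCO", "FRANCESCO,MARIO"]
    fields = ["Testo1", "Testo2"]

    mismatches = check_field_counts(data, fields)
    print(mismatches)
    ```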

    Error Report

    157_317_miei_nomi.txt
    2017/06/09 11:19:45 - Kitchen - Logging is at level : Basic logging
    2017/06/09 11:19:45 - Kitchen - Start of run.
    2017/06/09 11:19:46 - miei_nomi - Loading transformation from XML file [/var/delphix/dmsuite//output/edma/DMSApplicator/rs_miei_nomi/157/KETTLE_PROFILING_XML_157_miei_nomi_317.xml]
    2017/06/09 11:19:46 - miei_nomi - Dispatching started for transformation [KETTLE_PROFILING_XML_157_miei_nomi_317]
    2017/06/09 11:19:47 - Text File Input.0 - fileNamesList = [miei_nomi]
    2017/06/09 11:19:47 - Text File Input.0 - calling openNextfile method to load buffer
    2017/06/09 11:19:47 - Text File Input.0 - FTP connection found!
    2017/06/09 11:19:47 - Text File Input.0 - Children length for given directory: 33
    2017/06/09 11:19:47 - Text File Input.0 - InputStream :org.apache.commons.vfs.provider.DefaultFileContent$FileContentInputStream@5b301d6c InputStream :false
    2017/06/09 11:19:47 - Text File Input.0 - meta.getEndOfLine() :- null
    2017/06/09 11:19:47 - Text File Input.0 - ---------- File read by buffer reader ---------
    2017/06/09 11:19:47 - Text File Input.0 - template step initialized successfully.......!!!
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unexpected error
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : java.lang.NullPointerException
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.core.row.RowMeta.searchValueMeta(RowMeta.java:393)
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.core.row.RowMeta.exists(RowMeta.java:118)
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.core.row.RowMeta.addValueMeta(RowMeta.java:129)
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.steps.addsequence.AddSequenceMeta.getFields(AddSequenceMeta.java:292)
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.steps.addsequence.AddSequence.processRow(AddSequence.java:121)
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    2017/06/09 11:19:47 - Add sequence.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at java.lang.Thread.run(Thread.java:745)
    2017/06/09 11:19:47 - Add sequence.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
    2017/06/09 11:19:47 - miei_nomi - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Errors detected!
    2017/06/09 11:19:47 - miei_nomi - KETTLE_PROFILING_XML_157_miei_nomi_317
    2017/06/09 11:19:47 - miei_nomi - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Errors detected!
    2017/06/09 11:19:47 - miei_nomi - KETTLE_PROFILING_XML_157_miei_nomi_317
    2017/06/09 11:19:47 - Text File Input.0 - Finished processing (I=2, O=2, R=0, W=2, U=0, E=0)

    STOP

    Regards,

       Luigi