Delphix Toolkits (dxToolkit and dxmToolkit)

  • 1.  DXM Toolkit Failing to extract Columns - Unicode Error

    Posted 05-17-2022 10:25:00 AM
    Hi,

    I'm currently using DXM v0.91 to extract the columns from a Masking Engine (Delphix v5.3.9) for a Danish customer, and it's throwing a strange error on just one of the masking engines: "UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position"

    I suspect it might be related to Danish character support tripping up some part of DXM's internal code.

    Here is just a small extract of the error log. Any idea how this issue can be resolved?

    --- Logging error ---
    Traceback (most recent call last):
    File "logging\__init__.py", line 1088, in emit
    File "encodings\cp1252.py", line 19, in encode
    UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 208: character maps to <undefined>
    Call stack:
    File "dxmc.py", line 22, in <module>
    File "click\core.py", line 829, in __call__
    File "click\core.py", line 782, in main
    File "click\core.py", line 1259, in invoke
    File "click\core.py", line 1259, in invoke
    File "click\core.py", line 1066, in invoke
    File "click\core.py", line 610, in invoke
    File "click\decorators.py", line 73, in new_func
    File "click\core.py", line 610, in invoke
    File "dxm\dxm.py", line 1526, in list
    File "dxm\lib\DxColumn\column_worker.py", line 272, in column_list
    File "dxm\lib\DxColumn\column_worker.py", line 810, in column_worker
    File "dxm\lib\DxColumn\DxColumnList.py", line 131, in LoadColumns
    File "dxm\lib\DxTools\DxTools.py", line 38, in paginator
    File "dxm\lib\masking_api\api\file_field_metadata_api.py", line 58, in get_all_file_field_metadata
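    The failing frame in the log is encodings\cp1252.py, which suggests (an inference, not confirmed from the DXM source) that the text is being written through a cp1252-encoded stream. A minimal reproduction in plain Python, using hypothetical sample strings rather than real inventory data, shows that cp1252 can encode Danish letters such as Æ, Ø and Å but has no mapping for U+FEFF, the byte order mark:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text has a mapping in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical column name carrying a leading byte order mark (U+FEFF).
column_with_bom = "\ufeffKundeNavn"
# Hypothetical name with Danish letters only, no BOM.
danish_only = "Ærø_Østergade_Å"

print(can_encode(danish_only, "cp1252"))      # True: Danish letters exist in cp1252
print(can_encode(column_with_bom, "cp1252"))  # False: U+FEFF has no cp1252 mapping
print(can_encode(column_with_bom, "utf-8"))   # True: UTF-8 can encode any character

# Produces the same kind of 'charmap' error seen in the DXM log.
try:
    column_with_bom.encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc)
```

    If this holds for DXM, the Danish characters themselves are not the trigger; a stray BOM inside one of the names would be.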

    ------------------------------
    Tyrone Nel
    Solutions Architect, Presales
    FWD View Limited
    ------------------------------


  • 2.  RE: DXM Toolkit Failing to extract Columns - Unicode Error

    Posted 05-17-2022 11:41:00 AM
    Edited by Michael Torok 05-17-2022 11:42:17 AM
    Hi Tyrone,
    I found a Stack Overflow article (https://stackoverflow.com/questions/54664815/unicodeencodeerror-charmap-codec-cant-encode-character-ufeff-in-position) that identifies \ufeff as the byte order mark (BOM) used by the UTF-8-SIG encoding. Is there a way to convert the exported column data to UTF-8, or to have it encoded in UTF-8? I believe that may be the issue.
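    In Python, "utf-8-sig" is the codec name for UTF-8 with a BOM. A short sketch with hypothetical bytes shows the difference: decoding as plain UTF-8 keeps the BOM as a U+FEFF character at the start of the string, while "utf-8-sig" strips it. The strip_bom helper is hypothetical, for text that has already been decoded:

```python
import codecs

# Hypothetical raw bytes, as written by a tool that saves UTF-8 with a BOM.
raw = codecs.BOM_UTF8 + "Tabel_Ærø".encode("utf-8")

as_utf8 = raw.decode("utf-8")          # plain UTF-8 keeps the BOM as U+FEFF
as_utf8_sig = raw.decode("utf-8-sig")  # utf-8-sig strips a leading BOM

print(repr(as_utf8[0]))            # '\ufeff'
print(as_utf8_sig == "Tabel_Ærø")  # True

# utf-8-sig also decodes input without a BOM unchanged.
print(b"Tabel".decode("utf-8-sig"))  # Tabel


def strip_bom(text: str) -> str:
    """Remove a single leading U+FEFF from an already-decoded string."""
    return text[1:] if text.startswith("\ufeff") else text
```

    Because "utf-8-sig" accepts input with or without a BOM, it is a forgiving choice when reading; whether DXM can use it depends on where the BOM enters its data.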

    Thanks,
    Michael

    ------------------------------
    Michael Torok
    Community Mgmt & Experience, Sr. Director
    Delphix
    ------------------------------



  • 3.  RE: DXM Toolkit Failing to extract Columns - Unicode Error

    Posted 05-18-2022 03:29:00 AM
    Hi Michael,

    Thanks for coming back to me.

    From what I can make out, this error occurs inside the DXM code when it performs the export, so DXM would need to be adapted to handle this type of encoding. The table and column names in the inventory use some Danish characters. The other masking engines at the customer also use Danish characters, and the export works fine on those.

    So I'm not sure why this particular masking engine is tripping up DXM.

    Here is the command I am calling:

    D:\dxtoolkit\dxmc>dxmc column list --engine mymaskingengine1 --format csv --debug >D:\dxtoolkit\dxmc\Inventory\inventory.csv
    --- Logging error ---
    Traceback (most recent call last):
    File "logging\__init__.py", line 1088, in emit
    File "encodings\cp1252.py", line 19, in encode
    UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 208: character maps to <undefined>
    Call stack:

    ------------------------------
    Tyrone Nel
    Solutions Architect, Presales
    FWD View Limited
    ------------------------------



  • 4.  RE: DXM Toolkit Failing to extract Columns - Unicode Error

    Posted 05-18-2022 10:02:00 AM
    Hi Tyrone,
    I'm going to @mention two other people who may be able to help you better here. I think @Anders Karlsson or @Tino Pironti may be better equipped to take this on.

    Tino and Anders,
    Do you know why the dxmToolkit may be having an issue with the \ufeff BOM, or know of any way around it?

    I will keep an eye on the thread.

    Thanks,
    Michael

    ------------------------------
    Michael Torok
    Community Mgmt & Experience, Sr. Director
    Delphix
    ------------------------------



  • 5.  RE: DXM Toolkit Failing to extract Columns - Unicode Error

    Posted 05-18-2022 10:25:00 AM
    DXM is an open-source tool, so it would be good to create a ticket for this type of issue in the DXM GitHub repository so it gets tracked.
    We would need the command that was executed, e.g.:
    D:\dxtoolkit\dxmc>dxmc column list --engine mymaskingengine1 --format csv --debug >D:\dxtoolkit\dxmc\Inventory\inventory.csv
    I believe our tool supports UTF-8, but the scripts calling DXM must obviously be saved in UTF-8 encoding as well.
    This issue seems to be about the BOM, so I wonder whether the CSV file already existed with a BOM character at the beginning?
    Maybe use Notepad++ instead of Notepad for editing; Notepad is known to insert a BOM when saving files as UTF-8.
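    To check whether a CSV file or a calling script actually starts with a BOM, you can compare its first three bytes with the UTF-8 BOM. This is a minimal sketch; has_utf8_bom and remove_utf8_bom are hypothetical helpers, not part of DXM, and the self-check runs on a throwaway file:

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Return True if the file at path begins with the UTF-8 byte order mark."""
    with path.open("rb") as fh:
        return fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def remove_utf8_bom(path: Path) -> bool:
    """Rewrite the file without its leading UTF-8 BOM; return True if one was removed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Self-check on a temporary file; in practice pass the real path instead,
# e.g. Path(r"D:\dxtoolkit\dxmc\Inventory\inventory.csv").
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(codecs.BOM_UTF8 + b"table,column\r\n")
    found_before = has_utf8_bom(sample)  # True: file was written with a BOM
    removed = remove_utf8_bom(sample)    # True: BOM stripped in place
    found_after = has_utf8_bom(sample)   # False: no BOM left
    content_after = sample.read_bytes()  # original content without the BOM
```

    Reading only the first three bytes keeps the check cheap even for large inventory files; the removal helper rewrites the whole file, so back up anything important first.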
    BR .. Tino

    ------------------------------
    Tino Pironti
    Masking SME
    Technical Manager
    Delphix
    ------------------------------



  • 6.  RE: DXM Toolkit Failing to extract Columns - Unicode Error

    Posted 05-19-2022 08:01:00 AM
    Thanks Tino,

    Let me investigate a little more and see whether it's the piping to CSV that is causing the encoding issue.

    Many thanks!
    Tyrone

    ------------------------------
    Tyrone Nel
    Solutions Architect, Presales
    FWD View Limited
    ------------------------------



  • 7.  RE: DXM Toolkit Failing to extract Columns - Unicode Error

    Posted 05-19-2022 08:42:00 AM
    Edited by Tyrone Nel 05-19-2022 09:24:50 AM
    Hi Tino,

    I've done a bit more testing, and it is just one environment on the masking engine that throws this error; it's known to have Danish characters in the table names.

    I've removed the piping/redirection of the output to a CSV file to simplify fault finding and exclude it as a cause.

    1. This command works

    PS D:\dxtoolkit\dxmc> .\dxmc.exe column list --engine mymaskingengine1 --envname "My Target" 


    2. But this command throws an error when I specifically use the "--format csv" parameter for dxmc.exe

    PS D:\dxtoolkit\dxmc> .\dxmc.exe column list --engine mymaskingengine1 --envname "My Target" --format csv
    EXCEPTION: UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 2241: character maps to <undefined>


    So I think there is an issue in dxmc's internal code when the "--format csv" parameter is used.
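    Until the cause is confirmed in DXM, one possible approach is to drop U+FEFF from each cell before it reaches a cp1252 stream. The sketch below is an assumption-driven illustration, not DXM's actual CSV code: clean_cell, write_csv and the sample rows are hypothetical, and it assumes (from the encodings\cp1252.py frame in the traceback) that the CSV output is written through a cp1252-encoded stream, as can happen with stdout on a Windows host whose ANSI code page is 1252:

```python
import csv
import io

BOM = "\ufeff"


def clean_cell(value: object) -> str:
    """Convert a value to str and drop any U+FEFF characters it contains."""
    return str(value).replace(BOM, "")


def write_csv(rows, stream) -> None:
    """Write rows as CSV to a text stream, stripping U+FEFF from every cell."""
    writer = csv.writer(stream)
    for row in rows:
        writer.writerow([clean_cell(cell) for cell in row])


# Hypothetical inventory rows; the table name carries a stray BOM.
rows = [
    ["environment", "table", "column"],
    ["My Target", "\ufeffKUNDE_ADRESSE", "BY_ÆRØ"],
]

# Simulate a cp1252-encoded output stream backed by an in-memory buffer.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="cp1252", newline="")
write_csv(rows, stream)  # without clean_cell this raises UnicodeEncodeError
stream.flush()
stream.detach()  # release the wrapper without closing the underlying buffer

output = buffer.getvalue().decode("cp1252")
```

    Stripping the BOM keeps the Danish letters intact while avoiding the crash. The alternative of writing the CSV as UTF-8 would also preserve U+FEFF, which may or may not be wanted downstream; which fix belongs in DXM is a question for the maintainers on the GitHub issue.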

    Let me know what you think, I'm happy to have a remote session so you can have a closer look.

    I've logged an issue on GitHub as recommended: "19/05/2022 [BUG - EXCEPTION: UnicodeEncodeError CSV format]"


    ------------------------------
    Tyrone Nel
    Solutions Architect, Presales
    FWD View Limited
    ------------------------------