Unstructured files - DSource Snapsync

  • Question
  • Updated 2 years ago
  • Answered
How does Snapsync work with unstructured files? Are the changes captured incrementally, or is there a full capture of the filesystem during each snapshot? For example, if I have a 50 GB filesystem, does it ingest 50 GB each time it does a Snapsync?

Veejan


Posted 2 years ago


Gianpiero Piccolo

Each snapshot runs a complete rsync. Fortunately, Delphix compresses the ingested files, and its block-sharing algorithms (the core of the engine) reuse common, unchanged blocks across snapshots. This makes the Delphix Engine a space- and time-efficient product.

Regards.

Gianpiero

Veejan

Thank you. Just to get further clarity: are you saying the entire 50 GB of files will be ingested again using rsync?

Gianpiero Piccolo

I'm saying that all files will be read again through rsync. However, only the changed blocks will be saved to the engine's storage. Let's run an experiment (I'll do it later): take two snapshots while leaving all files unchanged, and compare the engine's storage usage before and after the second snapshot. I'll post screenshots.
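To illustrate the idea of "read everything, store only what changed", here is a minimal Python sketch of a content-addressed block store. It is a toy model only and not the Delphix implementation: the `BlockStore` class, the 4096-byte block size, SHA-256 hashing, and zlib compression are all assumptions chosen for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class BlockStore:
    """Toy content-addressed store: each unique block is kept once, compressed."""

    def __init__(self):
        self.blocks = {}     # block hash -> compressed block bytes
        self.snapshots = []  # one manifest per snapshot: file name -> list of block hashes

    def snapshot(self, files):
        """Read every file in full, but store only blocks not seen before."""
        manifest = {}
        for name, data in files.items():
            hashes = []
            for offset in range(0, len(data), BLOCK_SIZE):
                block = data[offset:offset + BLOCK_SIZE]
                digest = hashlib.sha256(block).hexdigest()
                if digest not in self.blocks:
                    self.blocks[digest] = zlib.compress(block)
                hashes.append(digest)
            manifest[name] = hashes
        self.snapshots.append(manifest)

    def used_bytes(self):
        """Total compressed bytes held across all snapshots."""
        return sum(len(b) for b in self.blocks.values())


# Two snapshots of unchanged, made-up file contents.
files = {"a.txt": b"hello world\n" * 2000, "b.bin": bytes(range(256)) * 64}
store = BlockStore()
store.snapshot(files)
after_first = store.used_bytes()
store.snapshot(files)
after_second = store.used_bytes()
print(after_first, after_second)
```

In this model the second snapshot still reads every byte of every file, yet the stored size does not grow, which matches the behavior described in the post.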

Veejan

Thanks Gianpiero, that will be good. How intrusive will this Snapsync be to the actual source file system? Let's say my source is 7 TB in size...

Gianpiero Piccolo

I'm attaching two screen captures.
I linked a directory from the source environment. This directory contains compressed and uncompressed files, for a total of 344 MB unvirtualized (du -h on the source).
After the first snapshot (taken when I linked the directory), the used storage on the Delphix Engine was 233 MB (about 2/3 of the unvirtualized space). This is a good feature of the Delphix Engine. It is not new, since you can use some sort of file compression manager to achieve the same result, but it is practical: you get the compression benefit transparently, without having to handle the compression step yourself.
Then, leaving the directory content unchanged, I took a second snapshot, and the storage used on the Delphix Engine was the same (233 MB).
Lastly, I added a 5 MB zip file to the directory and took a third snapshot. The engine now uses a total of 238 MB (the 5 MB zip file cannot be compressed any further) while keeping three snapshots of the directory. Now one can rewind the directory to the first or second snapshot in a very short time compared with the traditional way. Traditionally, you have to copy from the old backup, and the copy time is a function of the size of that backup; with Delphix, the time depends only on changing the block references, that is, re-indexing the blocks of data!
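The numbers from the experiment above can be checked with a little arithmetic. The Python sketch below uses only the figures reported in this post; the "three full copies" baseline is a hypothetical comparison added for illustration, not something that was measured.

```python
# Figures reported in the experiment, in MB.
source_mb = 344            # du -h on the source directory
engine_after_snap1 = 233   # engine usage after the first snapshot
engine_after_snap2 = 233   # after the second snapshot (no changes)
engine_after_snap3 = 238   # after adding a 5 MB zip file
added_zip_mb = 5

stored_ratio = engine_after_snap1 / source_mb
growth_snap2 = engine_after_snap2 - engine_after_snap1
growth_snap3 = engine_after_snap3 - engine_after_snap2

# Hypothetical baseline: three independent, uncompressed full copies.
full_copies_mb = source_mb + source_mb + (source_mb + added_zip_mb)
saved_mb = full_copies_mb - engine_after_snap3

print(f"stored/source ratio after first snapshot: {stored_ratio:.2f}")
print(f"growth from snapshot 1 to 2: {growth_snap2} MB")
print(f"growth from snapshot 2 to 3: {growth_snap3} MB")
print(f"three full copies would need {full_copies_mb} MB, engine uses {engine_after_snap3} MB")
```

So the engine's usage grows only by the size of the new, incompressible data, while three separate full copies would need roughly 1 GB.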

Regarding your last question, the only impact on your source file system is the read activity of the rsync program. I don't know whether rsync keeps track of changed files to avoid reading all files again; hopefully somebody can tell us. In the case of an Oracle DB, for example, it is possible to enable Oracle's BCT (Block Change Tracking) feature so that only changed blocks of data are read ;-)
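As a side note on the rsync point: stock rsync, by default, applies a "quick check" that skips any file whose size and modification time are unchanged, unless options such as --checksum or --ignore-times are given. Whether the Delphix Engine invokes rsync in a way that relies on this is an assumption and is not confirmed in this thread. The Python sketch below only imitates the idea of such a metadata-only check; the `scan` and `changed_files` helpers are hypothetical names, and this is not rsync's actual code.

```python
import os
import tempfile


def scan(root):
    """Map relative path -> (size, mtime_ns) from metadata only; contents are not read."""
    state = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for fn in filenames:
            full = os.path.join(dirpath, fn)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(previous, current):
    """Return files that are new or whose size/mtime differ from the previous scan."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)


with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(root, name), "w") as fh:
            fh.write("original\n")
    before = scan(root)

    # Modify one file (its size changes) and add a new one.
    with open(os.path.join(root, "b.txt"), "w") as fh:
        fh.write("modified contents\n")
    with open(os.path.join(root, "c.zip"), "wb") as fh:
        fh.write(b"\x00" * 5)

    after = scan(root)
    changed = changed_files(before, after)

print(changed)
```

With a check like this, only the files flagged as changed need to be opened and read, so the load on the source scales with the rate of change plus a metadata walk of the tree.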

Regards.
Gianpiero



Adam Bowen, Official Rep

Veejan, Gianpiero is right (as usual). Delphix Snapsync is non-intrusive and should only register in the smallest percentages of CPU. Are these questions just theoretical? Or have you identified unstructured data to capture with Delphix? If so, what is the rate of change? What is the purpose of this set of files? Is the source a production system?