Question with subsetting

  • 0
  • 1
  • Question
  • Updated 19 hours ago
  • Answered
Hi all

We are going to leverage delphix for a transformation project test data management

The existing prod DB size is 4tb and has close to 1000 tables. 2 questions I have

1, Is it possible if each tester can select their own tables( which are affected by their test case alone and manage their own DB)

For example my TC for sprint1 may require only 10 tables from total 1000 list

Can I maintain data only in those 10 tables in snapshot and load it to my DB .In this case what will be the data in other tables , will it be null. should I load in all other tables which have a foreign key relationship with above 10 tables also along with the affected 10 tables.?

2.Can I subset data in a table , for example I select first 5 rows from a table which contains 1000 rows.The intention is to save space of snapshot

Will the related data from other tables( which has a foreign key relationship above table) will be auto pulled? or should i define the relationship in my rule set?
Photo of Suja

Suja

  • 90 Points 75 badge 2x thumb

Posted 1 week ago

  • 0
  • 1
Photo of Tim Gorman

Tim Gorman, Field Services

  • 3,184 Points 3k badge 2x thumb
Official Response
Suja,

1) Is it possible if each tester can select their own tables (which are affected by their test case alone and manage their own DB)?

AnswerNot in the way I think you are thinking.  Delphix virtual databases are provisioned and managed at the database level, so that everyone gets a full virtual copy of the dSource on provision, refresh, branch, or rewind/restore.  However, using "hooks" which are programmatic callouts intended for customization, you can automatically remove any schemas or objects you wish after provisioning, refresh, and rewind/restore.

2) Can I subset data in a table , for example I select first 5 rows from a table which contains 1000 rows?

AnswerAgain, same answer as above.

This may sound disappointing because both of my answers are "No you cannot", but the reality is that the space occupied by a virtual copy is determined by the changes made to it after it is provisioned or refreshed.  The space occupied is not determined by the volume of the source database, or by the number of objects, or the number of rows within tables and indexes.

Long story short:  it is important to let go of your original assumptions, and understand how virtual databases are created.  Your final comment (i.e. "the intention is to save space of snapshot") indicates that you are thinking about virtual databases using assumptions based on previous experience with database technology in general.  Please take a look at our documentation, specifically the data virtualization concepts introduced HERE?

Another long story short:  there are indeed a few reasons to subset virtual databases (VDBs), but saving space is not one of them.  In fact, once you understand how VDBs work, you will realize clearly that the very act of removing objects to create your subset will make the VDB larger, consuming more storage.  This is because the amount of storage consumed by a VDB is determined by the changes made to it.

It is a bit counter-intuitive, but it makes perfect sense once you understand what VDBs are and how they work.

Please let us know what you think?

Thanks!

-Tim