Question with subsetting

  • 0
  • 1
  • Question
  • Updated 18 hours ago
  • Answered
Hi all

We are going to leverage delphix for a transformation project test data management

The existing prod DB size is 4tb and has close to 1000 tables. 2 questions I have

1, Is it possible if each tester can select their own tables( which are affected by their test case alone and manage their own DB)

For example my TC for sprint1 may require only 10 tables from total 1000 list

Can I maintain data only in those 10 tables in snapshot and load it to my DB .In this case what will be the data in other tables , will it be null. should I load in all other tables which have a foreign key relationship with above 10 tables also along with the affected 10 tables.?

2.Can I subset data in a table , for example I select first 5 rows from a table which contains 1000 rows.The intention is to save space of snapshot

Will the related data from other tables( which has a foreign key relationship above table) will be auto pulled? or should i define the relationship in my rule set?
Photo of Suja

Suja

  • 90 Points 75 badge 2x thumb

Posted 1 week ago

  • 0
  • 1
Photo of Tim Gorman

Tim Gorman, Field Services

  • 3,184 Points 3k badge 2x thumb
Official Response
Suja,

1) Is it possible if each tester can select their own tables (which are affected by their test case alone and manage their own DB)?

AnswerNot in the way I think you are thinking.  Delphix virtual databases are provisioned and managed at the database level, so that everyone gets a full virtual copy of the dSource on provision, refresh, branch, or rewind/restore.  However, using "hooks" which are programmatic callouts intended for customization, you can automatically remove any schemas or objects you wish after provisioning, refresh, and rewind/restore.

2) Can I subset data in a table , for example I select first 5 rows from a table which contains 1000 rows?

AnswerAgain, same answer as above.

This may sound disappointing because both of my answers are "No you cannot", but the reality is that the space occupied by a virtual copy is determined by the changes made to it after it is provisioned or refreshed.  The space occupied is not determined by the volume of the source database, or by the number of objects, or the number of rows within tables and indexes.

Long story short:  it is important to let go of your original assumptions, and understand how virtual databases are created.  Your final comment (i.e. "the intention is to save space of snapshot") indicates that you are thinking about virtual databases using assumptions based on previous experience with database technology in general.  Please take a look at our documentation, specifically the data virtualization concepts introduced HERE?

Another long story short:  there are indeed a few reasons to subset virtual databases (VDBs), but saving space is not one of them.  In fact, once you understand how VDBs work, you will realize clearly that the very act of removing objects to create your subset will make the VDB larger, consuming more storage.  This is because the amount of storage consumed by a VDB is determined by the changes made to it.

It is a bit counter-intuitive, but it makes perfect sense once you understand what VDBs are and how they work.

Please let us know what you think?

Thanks!

-Tim
Photo of Suja

Suja

  • 90 Points 75 badge 2x thumb
Thanks Tim

I may require 32 Instances of a 1TB as  snapshots for each iterations because we have 32 sprints planned

The test Data for each sprint needs to be stored as a snapshot and should be loaded to the VDB whenever required.

Is this possible without much data infrastructure overhead
Also where these snapshots are actually stored? is it in Delphix cloud server?
Photo of Tim Gorman

Tim Gorman, Field Services

  • 3,184 Points 3k badge 2x thumb
Suja,

Assuming that the 32 sprints are intended to be run consecutively, as part of a single project, then you'll likely only require one instance of virtual database (VDB) which you might provision once and refresh 31 times?

Or, if you plan all 32 sprints consecutively, to be run consecutively, as part of 32 separate projects, then you'll likely be provisioning 32 VDBs to start with, and then refreshing them N times (where N is the number of sprint iterations per project)?

Either way, following each provision or refresh, you'll be initializing each VDB with test data for the sprint.  You'll use the same mechanism to insert that data as always (i.e. import, script, etc) because a VDB is manipulated like any other database.

From a storage standpoint, immediately following a provision or refresh, a VDB consumes no storage of its own, temporarily sharing all storage with its dSource.  But as changes are made to the VDB, those "deltas" comprise the storage consumed by the VDB.  Thus initializing the test data for the sprint will consume storage within the Delphix virtualization engine on behalf of the VDB, and then the test cases themselves, when making modifications, will consume more storage.

Be aware too that Delphix data virtualization deduplicates and compresses database blocks/pages, so the storage consumed within the engine is generally much less than the "unvirtualized" total reflected by the database.  So while changes or deltas are being stored, they are compressed.

But the main thing to understand is that Delphix VDBs consume storage due to changes made.  A read-only VDB consumes no storage of its own.

Does that make sense?

Please let me know what you think?

Thanks!

-Tim
Photo of Suja

Suja

  • 90 Points 75 badge 2x thumb
Thanks Tim

With respect to our project your first assumption is correct .i.

e ,32 consecutive sprint for a single project

As per you , the Delta which is getting added to the VDB only consume space

Suppose I take a snapshot of the VDB in sprint 1, During sprint 1 testing the VDB may accommodate changes and delta may occupy space.

But after sprint 1 testing, I replaced the changed VDB with VDB snapshot above
over which I can create my Sprint 2 test data and take the snapshot and can continue this process

So that at the end of 32 sprints,I expect to have 32 snapshots with a single VDB instance which is running( which can be provisioned with any of the 32 snapshots saved)

Hope My understanding is correct.

Also ,Please confirm if the dsources are stored in cloud Delphix server and hope no data security threats.
Photo of Tim Gorman

Tim Gorman, Field Services

  • 3,184 Points 3k badge 2x thumb
Suja,

The possibilities are boundless...

  1. You can keep restoring/rewinding back to the initial snapshot (or Self-Service bookmark) in the same VDB used in 32 consecutive sprints, so that every sprint starts off from the same initial point-in-time state of the database.  In this situation, you're likely to end up with only one snapshot or bookmark, having erased the work performed during the first 31 of the 32 sprints, and a VDB consuming the storage incurred by the last sprint.
  2. Or, you can create a snapshot/bookmark at the start of each sprint, so that each sprint starts with the end state of the database from the previous sprint.  In this way, you'll have a single very large VDB consisting of the changes applied in all 32 of the 32 sprints, with 32 snapshots/bookmarks marking the beginning of each sprint.
  3. Or, you can perform a refresh from the source at the start of each sprint.  This is rather like scenario #1, except the starting point of each of the 32 sprints is the current state of the source database, which might itself be changing over time.

Or, you can use the branching feature in Delphix Self-Service to with any of the three previously-mentioned scenarios to maintain completely separate timelines for each of the 32 sprints, with each branch starting from the same initial point-in-time as in scenario #1, or with each branch starting from the end-point of the previous sprint as in scenario #2, or branching off and then performing a refresh from the source as in scenario #3.

There is so much versatility with data virtualization that it is more productive to start all of your requirements, and then employ all of the available features to meet those requirements.

If you're not in contact with someone at Delphix already, please let us know how we can help?

Please let me know what you think?

Thanks!

-Tim