Here's how I would think about it:
The Delphix DevOps Data Platform brings capabilities to Test Data Management that are simply impossible with other solutions, and that begins with thinking differently about Test Data. Testers face constant tension with test data:
To be more specific about data synchronization:
- Feature delivery can take a hard shift right as errors pile up from stale data or as rework enters because new data breaks the test suite. Why is the data out of date? Most companies cannot provision multi-TB test datasets anywhere near as fast as they can build their code. Delphix can deliver fresh, full-size datasets at scale in minutes. For example, a tester can get a freshly masked, up-to-date copy of their test data loaded and ready to go in minutes; a 5 TB copy takes about 5 minutes to provision.
- To solve the pain of provisioning large test datasets, test leaders often turn to subsetting to save storage and improve execution speed. Unfortunately, poorly crafted subsets are rife with mismatches because they fail to maintain referential integrity. They also often result in hard-to-diagnose performance errors that crop up much later in the release cycle. Solving these subset integrity issues often comes at the cost of employing many experts to write (seemingly endless) rulesets to avoid the integrity problems that foul up testing. Unfortunately, it's rare to find any mitigation for the performance bugs that subsetting will miss. Delphix lets you use full-size datasets at the cost of a subset.
- It's worse with federated applications. Testers are often at the mercy of an application owner, a backup schedule, or a resource constraint that forces them to gather their copy of each dataset at a different time. These time differences create consistency problems the tester has to solve because, without strict consistency, the distributed referential integrity problems can suddenly scale up factorially. This leads to solutions with even more complex rulesets and time logic. Compounding federation with subsetting can mean a whole new world of hurt, as subset rules must be made consistent across the federated app. Delphix can maintain referential integrity, distributed referential integrity, and masked distributed referential integrity (that is, 3 different kinds of databases all in sync where masked data matches masked data) OUT OF THE BOX.
- Synthetic data can be essential for generating test data that doesn't exist anywhere else. But when synthetic data is used as a band-aid to make a subset "complete", we reintroduce the drawbacks of subsets. To reach completeness, the synthetic data may need to cover the gap where production data doesn't exist, and someone has to determine integrity across both generated and subset data. Marrying synthetic data and subsets can introduce new and unnecessary complexity. With Delphix, dropping your synthetic data into your "Test Data Trunk" results in everyone being a 5-minute refresh away from having that synthetic data available.
- Protecting your data introduces more speed issues. Those who mask test data typically can't deliver masked data fast or often enough to developers, so they are forced into a tradeoff between risk and speed, and exposure usually trumps speed when that decision is made. As a Gartner analyst quipped: 80% of the problem in masking is the distribution of masked data. Moreover, masking has its own rules that generally differ from subsetting rules. Delphix delivers freshly masked data on demand, via self-service, to testers, developers, and anyone else who needs it.
- Environment availability also prevents us from getting the right data to the right place just in time. Many testers use a limited number of environments, so platforms are overloaded with testing streams, and the resulting sharing and serialization cause delay, rework, and throwaway work. Some testers wait until an environment is ready. Others write new test cases rather than wait, and still others write test cases they know will be thrown away. With Delphix, all that clobber, wasted time, and lost work disappears. Each developer can have their own environment without stuffing the box. For example, one customer took the same hardware that used to host 5 shared dev environments and made 200 individual environments, unsticking their dependency problem and helping them achieve true CI/CD. That customer used to do monthly releases. Now they release 10,000 times per year.
- Compounding this problem, platforms that could be repurposed as test-ready environments are fenced in by context-switching costs. Testers know the high price of a context switch, and the real possibility that switching back will fail, so they simply hold their environment for "testing" rather than risk it. Behaviors driven by the cost of context switching create increased serialization and more subsetting. Ironically, by "optimizing" their part of the product/feature delivery pipeline, testers end up contributing to one of the bottlenecks that prevent that pipeline from moving faster globally. With the Delphix Self-Service interface, developers have a library of different versions of datasets at their fingertips that they can swap in and out at the click of a button in 5 minutes. Testers can share their "before" and "after" dataset snaps with their developers and go immediately back to work on something else because the dataset is already in the library.
- Reproducing defects can also slow down deployment. Consider that quite often developers complain that they can't reproduce the defect that a tester has found. This often leads to a full halt in the testing critical path as the tester must "hold" her environment to let the developer examine it. In some cases, whole datasets are held hostage while triage occurs. As mentioned above, this simply is no longer the case with Delphix.
- These problems are all subsumed into a tester's most basic need: to restart and repeat her test using the right data. Consider, then, that repeating the work to restore an app (or worse, a federated app), synchronize it, subset it, mask it, and distribute it scales up the entire testing burden in proportion to the number of test runs. That's manageable within a single app, but it can quickly grow unwieldy at the scale of a federated app. With Delphix, it works right out of the box.
Masked referential integrity at scale requires temporal and semantic integrity across one or many heterogeneous datasets. Delphix Masking relies on several key capabilities to enable this.
First, Delphix's immutable time flow combined with our Data Library allows you to maintain a temporally consistent view of a set of any N heterogeneous datasets.
Second, our policy-driven, single point of masking rule definition allows Delphix to maintain semantic consistency among a set of any N heterogeneous datasets. (This means that even if Oracle stores a name as First, Last and AWS Aurora stores it as Last, First, both will get masked the correct way to semantically equivalent values.)
Third, our Profiler combined with our one-way transform can identify both explicit and implicit relationships (such as an explicit PII field vs. a field that might contain PII data) and mask them to the same (or semantically equivalent) value. Thus, whether or not you know that a field needs to be masked, you will still maintain referential integrity (RI) and distributed RI.
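To make the idea of semantic consistency concrete, here is a minimal sketch of deterministic, one-way masking applied to the same name stored in two different layouts. It is an illustration only and not how Delphix Masking is implemented: the key, the replacement name pools, and every function name are hypothetical, and a keyed hash (HMAC-SHA256) stands in for whatever one-way transform a real product uses.

```python
import hashlib
import hmac

# Hypothetical secret for this sketch; a real setup would keep it in a key store.
MASKING_KEY = b"example-key-not-for-production"

# Small illustrative pools of replacement values (assumptions, not real rules).
FIRST_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery", "Finley"]
LAST_NAMES = ["Archer", "Brooks", "Carver", "Dalton", "Ellis", "Foster"]


def _pick(pool, value):
    """Map a value to one pool entry using a keyed one-way hash."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(MASKING_KEY, normalized, hashlib.sha256).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


def mask_name(first, last):
    """Single point of rule definition: every layout goes through here."""
    return _pick(FIRST_NAMES, first), _pick(LAST_NAMES, last)


def mask_first_last(value):
    """Mask a name stored as 'First Last'."""
    first, last = value.split(" ", 1)
    m_first, m_last = mask_name(first, last)
    return f"{m_first} {m_last}"


def mask_last_first(value):
    """Mask a name stored as 'Last, First'."""
    last, first = [part.strip() for part in value.split(",", 1)]
    m_first, m_last = mask_name(first, last)
    return f"{m_last}, {m_first}"


# The same person stored two ways masks to the same replacement person.
masked_a = mask_first_last("Ada Lovelace")
masked_b = mask_last_first("Lovelace, Ada")
```

Because both layouts are parsed into the same fields before the shared rule runs, `masked_a` and `masked_b` name the same replacement person in their respective formats, so a join on the masked name still lines up across the two stores.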
And to be more specific about subsets:
A guest dataset ingested into Delphix can be anything - full size, subset, masked, etc. The vast majority of customers subset to save space. But many customers are unaware that, with Delphix, subsets are not needed to achieve the same cost-savings goals, even with our full-size read-write virtual datasets.
Many solutions on the market were not purpose-built for masking like Delphix was. For these other solutions, storage savings drives subsetting, and subsetting drives rule-writing. So, to save storage, customers have to write and maintain a large number of rules to maintain referential integrity within the subset. But since Delphix read/write copies consume very little storage, can be provisioned in minutes, and require little or no labor, you get that storage savings without writing any subsetting rules. For example, as compared with other solutions, the time you spend writing rules with Delphix is about 80% less the first time, and 95% less each subsequent time you change the data model. And buyer beware: companies that were born in the ETL space and are trying to sell masking will often discount their products significantly because they make so much money on the back end writing these rules. Some of our customers tell us it can take 6 months to get their first table masked.
Beyond the cost of writing these rules, subsets have significant disadvantages:
- Subsets are notorious for referential mismatches - Poorly crafted subsets are often rife with mismatches because they fail to maintain referential integrity. Our customers tell us that their rule-writers spend a lot of time chasing down these errors.
- Subsets are often statistically skewed - Rule writers are also notorious for scoping subsets of data to match the test data coverage goals instead of creating statistically representative data. This can cause errors related to the frequency of problems, as well as emergent edge cases, to be missed. This is also a problem that can crop up for Quantitative Developers/AI Practitioners.
- Subsets cause hard-to-diagnose performance errors - Errors related to volume shift right in the release cycle when you use subsets. Unfortunately, it's rare to find any mitigation for the performance bugs that subsetting will miss.
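As a rough illustration of the first point, the sketch below shows how sampling each table independently produces orphaned foreign keys, and how even a simple integrity-preserving subset needs an explicit rule for every relationship. The two tables, their column names, and the sampling rule are all made up for the example; nothing here represents Delphix functionality.

```python
# Hypothetical two-table dataset: each order references a customer by customer_id.
customers = [{"customer_id": i, "region": "EU" if i % 2 else "US"} for i in range(1, 11)]
orders = [{"order_id": 100 + i, "customer_id": (i % 10) + 1} for i in range(30)]


def find_orphans(child_rows, parent_rows, fk, pk):
    """Return child rows whose foreign key has no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


def naive_subset(rows, every_nth):
    """Sample a table on its own, ignoring relationships to other tables."""
    return rows[::every_nth]


def integrity_preserving_subset(parent_rows, child_rows, fk, pk, every_nth):
    """Sample the parent table, then keep only the children that point into it."""
    parents = parent_rows[::every_nth]
    kept_keys = {row[pk] for row in parents}
    children = [row for row in child_rows if row[fk] in kept_keys]
    return parents, children


# Naive approach: each table is sampled independently.
naive_customers = naive_subset(customers, 3)
naive_orders = naive_subset(orders, 2)
naive_orphans = find_orphans(naive_orders, naive_customers, "customer_id", "customer_id")

# Rule-based approach: one explicit rule for this one relationship.
safe_customers, safe_orders = integrity_preserving_subset(
    customers, orders, "customer_id", "customer_id", 3
)
safe_orphans = find_orphans(safe_orders, safe_customers, "customer_id", "customer_id")
```

In this toy dataset `naive_orphans` is non-empty while `safe_orphans` is empty, but the safe version only covers a single parent-child relationship. A real schema needs a rule like this for every relationship, which is where the rule-writing effort described above comes from.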
However, if after all of this you are still sure you need subsets, then rest assured you can use a subsetted data source or VDB just like any other data source ;-)
Let me know if you have more questions.
VP, Global Presales
Sent: 06-14-2021 05:17:27 AM
From: Sreshtha Das
Subject: TDM Implementation
I am trying to reach the Delphix team to understand how Delphix fits our requirements for a TDM implementation. We are mainly focused on Data Synchronization, Data Masking and Data Subsetting.
Can anyone help us get connected to the right person who can help with this? I reached out earlier through Contact Delphix for a demo, and also to the sales team, but did not get a response.
Please help me connect with the right person as soon as possible.
Delphix Community Members