How to generate test data safely and efficiently to promote enterprise innovation

Source: Internet
Author: User
Keywords: security, innovation, test data

Here's a riddle: every organization needs it, and every organization can generate it, but generating it quickly and carelessly can lead to high risk and high cost. What am I talking about? The answer, of course, is test data.

In recent years, IBM has been promoting the idea of building a Smarter Planet. To a large extent, what makes your enterprise look smart in the eyes of its customers and shareholders is its applications: applications that improve operational efficiency, engage buyers, uncover new market opportunities, and accelerate product development. To ensure the quality of these applications (and you must), you have to test them thoroughly before they go into production.

Effective application testing relies on data, and if something is wrong with that data, you are bound to run into trouble. Is the subset of values for a critical field too small in the test data? Then the enterprise will fail to uncover data-handling errors, and application users will eventually hit those errors in production. Are the test database tables and files inconsistent from a referential-integrity standpoint? Corrective action may be required, and the necessary data-repair work can delay test completion and push out the application's target delivery date. Invalid values in test data trigger false alarms about application code errors, wasting a great deal of time on diagnosis, and even on "fixing" nonexistent problems caused by data values that would never occur in the real production environment.

What should we do?

The easy way that gets you into trouble

Recognizing the importance of test data integrity and quality, some organizations have chosen the seemingly easiest route: replicating production data (all of it) directly into the application development and test environments. This approach reduces the problems caused by test data that poorly represents production (though it does not eliminate them entirely, as I will explain later), but it also introduces some major challenges.

It can seriously undermine your data privacy controls.

If things go wrong, your company's name could appear in headlines about a leak of highly sensitive data (credit card information, government identification numbers, account IDs, or system passwords). Such an incident can seriously damage your business's reputation in the eyes of customers and prospects, and if the sensitive data falls into the hands of cybercriminals, many people will suffer financial or other losses.

With such threats in mind, your IT security staff have probably taken a number of steps to lock down data in the production system. But if you copy that data into development and test environments, many more users gain read access to the database tables and files containing sensitive data fields. What if the number of such users increases fivefold? Tenfold? A hundredfold? This matters because data privacy breaches are often caused by the actions of insiders, sometimes unintentional and sometimes deliberate.

It can cost you a lot of money.

The last time I checked, no vendor was giving away disk storage. In the past you might have gasped at "our database takes 1 TB of disk space," yet the same capacity (perhaps two to three times as much) in test and development environments raises no eyebrows today. Data marts can now hold 1 TB or more, and large footprints are routine in cloud environments. Your production database hub may hold dozens of terabytes (or even petabytes) of data and indexes, with individual tables containing billions of rows. From a cost standpoint, is it really reasonable to burn that much disk space in test and development systems just to ensure the integrity and validity of the test data?

And disk space is only part of the cost of this "dump and wish" approach to generating test data. How much production CPU time does it take to unload a huge database? How many cycles are consumed pushing all that data into test tables and files? How long do your programmers sit idle waiting for those long-running bulk load jobs to finish? If you are striving for tight IT governance to control costs, this is definitely not the way to get there.

This approach is extremely inflexible.

One thing you cannot avoid in application development is database design change. People don't change designs for their own sake; the goal is to improve application performance and scalability. Splitting one table into two, adding columns to a table, and changing a column's data type are all very common, and often beneficial, changes.

However, if you populate the test and development databases by straight unload-and-load, the target data structures must match the source structures. What does that mean? It means you must provision a dedicated development-and-test database (along with all its disk space) used solely to receive data unloaded from production. You then need another database reflecting the design currently under development (which may differ from production), and the job of loading it from the production-replica database falls to programmers and the DBAs who support them. Do you really want skilled IT professionals wasting time on that kind of mapping work? And how much time does it add when you are trying to launch a new application?

In short, moving the production database wholesale into test and development environments increases the risk of data privacy breaches, inflates storage and CPU capacity requirements, and seriously hurts flexibility and agility. Other than that, you could call it an excellent method.

There's a better way.

The era of smarter test data generation has arrived, and the IBM InfoSphere Optim Test Data Management solution (hereafter Optim TDM, for brevity) is a representative product of that era. Optim TDM is a comprehensive solution that addresses test data generation requirements across platforms, DBMSs, and file systems, including DB2 for z/OS, IMS, and VSAM on IBM System z servers. For all of the test data generation challenges mentioned above, Optim TDM provides the tools you need to get the job done securely, efficiently, and responsively. Its capabilities include:

Data protection

Optim TDM provides a variety of mechanisms for protecting sensitive production data in test and development environments. Combine Optim TDM with the InfoSphere Optim Data Masking solution and you get even more sophisticated data protection options.

If field A and field B need no protection individually, but the combination of the two must be protected, then random shuffling is an ideal choice: to a hacker, person X's identification number paired with person Y's name may be worthless. Alternatively, data masking may be the right approach, in which case the Optim solution ensures that the generated values still look authentic. Masking can be done randomly, or in a way that protects sensitive data while preserving referential-integrity relationships (a technique sometimes called "repeatable masking").
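Optim's internal masking algorithms are not public, but the idea behind repeatable masking can be sketched in a few lines of Python. In this illustration (the key and field format are my own stand-ins, not anything Optim ships), the masked value is derived deterministically from the original with a keyed HMAC, so the same customer number masks to the same value in every table and foreign-key joins still line up:

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be managed outside source control.
SECRET_KEY = b"rotate-me-somewhere-safe"

def mask_id(value: str, digits: int = 9) -> str:
    """Deterministically mask an identifier: the same input always
    yields the same output, so joins across tables still work."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # Map the digest onto a fixed-width numeric string so the masked
    # value still looks like a plausible ID number.
    return str(int(digest, 16) % 10**digits).zfill(digits)

# Repeatable: the same real ID masks identically wherever it appears.
assert mask_id("123-45-6789") == mask_id("123-45-6789")
```

Because the mapping is keyed, anyone without the secret cannot reverse a masked value back to the original, yet the masked data remains join-consistent across the whole extract.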

In addition, you can use lookup tables and routines to safely supply valid-looking values for names, mailing addresses, and e-mail addresses. Need a custom data transformation routine tailored to your environment's specific requirements? The Optim solution can help with that too.
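As a rough illustration of the lookup-table idea (the tables and names below are invented for the sketch, not Optim's shipped lookup data): hash the original value to pick a replacement from the table, so the substitution looks realistic and stays repeatable:

```python
import hashlib

# Tiny hypothetical lookup tables; a real deployment ships far larger ones.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Erin", "Frank"]
CITIES = ["Springfield", "Riverton", "Lakeside", "Hillview"]

def substitute(value: str, table: list[str]) -> str:
    """Replace a sensitive value with a realistic one from a lookup table.
    Hashing the input makes the substitution repeatable."""
    index = int(hashlib.sha256(value.encode()).hexdigest(), 16) % len(table)
    return table[index]

masked_name = substitute("Johannes Gutenberg", FIRST_NAMES)
# Repeatable: the same real name always maps to the same fake one.
assert masked_name == substitute("Johannes Gutenberg", FIRST_NAMES)
```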

Right-sizing the test data

As noted earlier, dumping a large production database into a test and development environment, perhaps into several of them, is costly. Moreover, many developers want a much smaller set of test data, sometimes just 1% of production. And program testers typically want a different proportion of production data than program developers do.

Meeting these requirements without creating data integrity problems is the hard part. How do you generate a truly useful subset of related data? In other words, if you take 1% of the rows from parent table Y, how do you make sure you also get the corresponding rows from every table that references it? Optim TDM handles this for you: it takes the database's referential-integrity constraints into account and helps you carry the "right" subset of production data into the test and development systems.

Perhaps simple sampling is all you need ("take one row out of every n in this table, stopping after 1 million rows"), or you may want to focus on a particular set of records (for example, customers who live in Ontario). You might even want to hand-pick the production records to bring into the test or development environment, working in a precise point-and-choose mode ("I need this row, that row, and that one"). However you want to obtain a right-sized, consistent (and appropriately masked) subset of production data for your test and development systems, Optim TDM makes the job easier.
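To make the referential-integrity point concrete, here is a minimal SQLite sketch (a hypothetical customers/orders schema of my own, not Optim's mechanism): sample the parent table first, then pull only the child rows that reference the sampled parents, so the extract contains no orphans:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
""")
# 100 customers (odd ids in Ontario) and 5 orders per customer.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, "ON" if i % 2 else "QC") for i in range(1, 101)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 100) + 1, 10.0 * i) for i in range(1, 501)])

# Step 1: choose the parent subset -- here, every 10th-numbered Ontario customer.
conn.execute("""
    CREATE TEMP TABLE subset_customers AS
    SELECT * FROM customers WHERE region = 'ON' AND id % 10 = 1
""")
# Step 2: pull only the child rows that reference the chosen parents,
# keeping the extract referentially consistent.
conn.execute("""
    CREATE TEMP TABLE subset_orders AS
    SELECT o.* FROM orders o
    JOIN subset_customers c ON o.customer_id = c.id
""")
```

Running the two steps in this order is what guarantees consistency: every `customer_id` in `subset_orders` is guaranteed to exist in `subset_customers`.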

Accommodating design differences between source and target databases

Optim TDM provides table-mapping and column-mapping capabilities that give you the flexibility needed to handle data model differences between the production system and the development system. You show Optim TDM how the mapping should work, and it takes care of running it.
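Conceptually, a table/column mapping is just a routing table from source columns to target tables and columns. A toy Python sketch (the table and column names are invented) of a design where one production table was split in two and a column was renamed:

```python
# Hypothetical mapping: the production CUSTOMER table was split into
# CUSTOMER_CORE and CUSTOMER_CONTACT, and FULL_NAME was renamed.
COLUMN_MAP = {
    ("CUSTOMER", "CUST_ID"): ("CUSTOMER_CORE", "ID"),
    ("CUSTOMER", "FULL_NAME"): ("CUSTOMER_CORE", "DISPLAY_NAME"),
    ("CUSTOMER", "PHONE"): ("CUSTOMER_CONTACT", "PHONE"),
}

def map_row(source_table: str, row: dict) -> dict:
    """Route one source row's columns to their target tables and columns."""
    targets: dict = {}
    for col, value in row.items():
        tgt_table, tgt_col = COLUMN_MAP[(source_table, col)]
        targets.setdefault(tgt_table, {})[tgt_col] = value
    return targets

out = map_row("CUSTOMER",
              {"CUST_ID": 7, "FULL_NAME": "A. Smith", "PHONE": "555-0100"})
# out == {"CUSTOMER_CORE": {"ID": 7, "DISPLAY_NAME": "A. Smith"},
#         "CUSTOMER_CONTACT": {"PHONE": "555-0100"}}
```

Once the map is declared, the load job applies it mechanically to every row, which is exactly the drudgery you want taken off your programmers' hands.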

Given all these options and all this flexibility, you might expect administering Optim TDM to be difficult or tedious. Not so. The solution includes a test data management self-service center: a browser-based interface that eases the burden on DBAs and lets testers and developers participate more directly in defining and executing the workflows around test data generation (for example, approving test data refresh requests). DBAs can use the tool to streamline data refresh operations, and the self-service center also boosts productivity and provides metrics on test and development data generation activity.

You may still be shaking your head: "Safely generating a complete, consistent subset of production data for testing and development sounds great, but it all assumes you know what the referential relationships are, and frankly, we don't." First, there is no reason to be discouraged: many sites rely on application-enforced referential integrity, or have data relationships that cross database boundaries (the data in table A may reference the data in file B). Second, there is a solution for this problem too. It is called InfoSphere Discovery, and it is an excellent complement to Optim TDM.

InfoSphere Discovery does what its name suggests: it examines your data stores (inside and outside the database management system) and reports the data relationships it finds. Armed with that information, you can use Optim TDM to give testers and developers referentially intact, consistent data without replicating all of production into the test systems. You will be able to right-size your test data while avoiding the gaps caused by blind spots about data relationships in your production environment; InfoSphere Discovery eliminates those blind spots.
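One simple way such a tool can surface candidate relationships is an inclusion-dependency scan: flag a column as a possible foreign key when every one of its values appears in some other table's column. The following is my own naive sketch of that idea, not InfoSphere Discovery's actual algorithm:

```python
def candidate_foreign_keys(tables: dict[str, dict[str, list]]) -> list:
    """Naive inclusion-dependency scan: (child_table, child_col,
    parent_table, parent_col) is a foreign-key candidate when every
    child value appears among the parent column's values."""
    cols = [(t, c, set(v)) for t, d in tables.items() for c, v in d.items()]
    candidates = []
    for ct, cc, cvals in cols:
        for pt, pc, pvals in cols:
            if (ct, cc) != (pt, pc) and cvals and cvals <= pvals:
                candidates.append((ct, cc, pt, pc))
    return candidates

# Toy data: orders.cust_id values are all drawn from customers.cust_id.
tables = {
    "orders": {"order_id": [1, 2, 3], "cust_id": [10, 10, 20]},
    "customers": {"cust_id": [10, 20, 30]},
}
hits = candidate_foreign_keys(tables)
assert ("orders", "cust_id", "customers", "cust_id") in hits
```

A production-grade tool would of course scan samples, rank candidates by overlap statistics, and look across DBMS boundaries, but the core inference is the same.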

An innovation perspective

Organizational success today is all about "newer" and "faster": quickly identifying potential new customers, quickly bringing new products and services to market, and quickly determining which factors can generate new business from existing customers. It is applications that are expected to deliver these capabilities.

Consider how the IBM InfoSphere Optim Test Data Management solution can make test data generation part of your enterprise innovation engine rather than a source of friction.
