Using production data

Data selection is the first step down the road of data-driven testing. You'll need to select the data that either drives the navigation of your application, represents the data that gets entered into your application, or both. One way to select data for testing is to use production data.

Although you shouldn't rely solely on this type of data, it can be one of the richest sources of scenarios for automated testing, both because the data is representative of real scenarios the application will face and because it will most likely provide a large number of different scenarios. You can load the data straight into the test environment, read it into data files for processing later, or read it in real time and convert it as you use it.
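As a rough illustration of the "read it into data files" option, the sketch below converts an exported set of records into typed test inputs and runs one check per record. Everything in it is assumed for the example: the column names, the sample rows, and the order_total function standing in for the application code under test.

```python
import csv
import io

# Hypothetical export of production order records. In practice this would
# be a file or query result pulled from the production system.
PRODUCTION_EXPORT = """order_id,quantity,unit_price
1001,3,19.99
1002,1,250.00
1003,12,0.75
"""


def load_cases(source):
    """Convert each exported row into typed input for a test."""
    cases = []
    for row in csv.DictReader(source):
        cases.append({
            "order_id": int(row["order_id"]),
            "quantity": int(row["quantity"]),
            "unit_price": float(row["unit_price"]),
        })
    return cases


def order_total(quantity, unit_price):
    """Stand-in for the application code under test (an assumption)."""
    return round(quantity * unit_price, 2)


def run_data_driven(cases):
    """Run the same check once per record; return the ids that fail."""
    failures = []
    for case in cases:
        total = order_total(case["quantity"], case["unit_price"])
        if total < 0:  # a deliberately simple check for the sketch
            failures.append(case["order_id"])
    return failures


cases = load_cases(io.StringIO(PRODUCTION_EXPORT))
failures = run_data_driven(cases)
```

Keeping the loading step separate from the check means the same records can drive other tests later, and the conversion to typed values happens in one place.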

Production data is also an excellent source for parallel testing. If you run the same production data through the system you're developing, you'll quickly learn whether it behaves like the system in production. This technique is especially helpful for finding problems with floating-point values, conversion ratios, and the lengths associated with data types.
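A parallel test can be sketched as a loop that feeds each record to both systems and reports any results that differ by more than a tolerance. The two conversion functions below are hypothetical stand-ins for the production system and the system under development, and the records and the tolerance of 0.005 are made up for the example.

```python
import math


def legacy_convert(amount_cents, rate):
    """Stand-in for the system in production (hypothetical)."""
    return round(amount_cents * rate) / 100


def new_convert(amount_cents, rate):
    """Stand-in for the system under development (hypothetical)."""
    return amount_cents / 100 * rate


def parallel_test(records, old, new, tolerance=0.005):
    """Run every record through both systems and collect the mismatches."""
    mismatches = []
    for record in records:
        expected = old(*record)
        actual = new(*record)
        # Compare with a tolerance so harmless floating-point noise
        # is not reported as a difference between the systems.
        if not math.isclose(expected, actual, abs_tol=tolerance):
            mismatches.append((record, expected, actual))
    return mismatches


# Made-up (amount in cents, conversion rate) pairs standing in for
# values taken from production.
records = [(1999, 1.0), (250, 0.5), (1001, 0.333)]
mismatches = parallel_test(records, legacy_convert, new_convert)
```

The tolerance is a judgment call: too tight and rounding differences flood the report, too loose and a wrong conversion ratio can slip through.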

Using production data comes with some caveats, however. It will most likely not contain many of the special cases you'll want to test for, so it's not a replacement for well-thought-out test scenarios. There may also be legal issues surrounding its use. Check your company's policies on the use of production data, especially if you outsource some of your testing; if no formal policy exists, consult someone in your legal department. Even if you can't use production data directly, odds are you'll be able to change the sensitive values (names, social security numbers, and such) and use the rest of the data.
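Changing the sensitive values can be as simple as overwriting them with generated placeholders while leaving every other field alone. The sketch below assumes records held as dictionaries with "name" and "ssn" fields; the field names, the sample records, and the placeholder formats are all invented for the example.

```python
def mask_records(records):
    """Return copies of the records with names and SSNs replaced."""
    masked = []
    for index, record in enumerate(records, start=1):
        clean = dict(record)  # copy so the source records stay untouched
        if "name" in clean:
            clean["name"] = f"Test User {index:04d}"
        if "ssn" in clean:
            # Placeholder that keeps the 3-2-4 digit layout of the field.
            clean["ssn"] = f"000-00-{index:04d}"
        masked.append(clean)
    return masked


# Made-up records standing in for rows copied from production.
production_rows = [
    {"name": "Pat Example", "ssn": "123-45-6789", "balance": 1050.25},
    {"name": "Sam Sample", "ssn": "987-65-4321", "balance": 12.00},
]
masked_rows = mask_records(production_rows)
```

Keeping the placeholder in the same layout as the original value preserves the field lengths, so length-related problems still show up in testing. Whether this level of masking is enough is a question for the policy or legal review described above.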