Test Data and AI: What Makes Good Test Data?
In this series of articles, we’re going to talk about how to use LLMs to generate synthetic data for QA testing. We’ll start with the basics of test data, move on to generation methods, and finally look at examples of generating test data to validate LLM products. But let’s start at the beginning: in this article, we’ll discuss how to use synthetic test data more generally, what makes test data good or bad, and how test data can inform some traditional QA methodologies.
Synthetic data refers to any data generated outside of production that can be used to execute test cases or mock a production scenario. This includes data produced by LLMs, procedurally generated data, and data created or curated by humans. Of course, production data is incredibly valuable for testing, and when it’s possible to use it, it should be used. Often, though, using it isn’t possible, legal, or scalable. Collecting production data for a new feature or product can also be expensive, since you first need to hire beta testers to produce it. And synthetic data has advantages beyond cost.
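To make “procedural data” a little more concrete, here is a minimal Python sketch that generates fake user records from a seeded random number generator. It is an illustration only, not a method from this series: the `User` fields, the name list, and the `generate_users` helper are hypothetical, chosen just to show the idea.

```python
import random
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical record shape, for illustration only.
    user_id: int
    name: str
    age: int
    email: str


FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
# Reserved example domains, so generated emails never reach a real inbox.
DOMAINS = ["example.com", "example.org"]


def generate_users(count: int, seed: int = 42) -> list[User]:
    """Procedurally build `count` fake users; the same seed yields the same data."""
    rng = random.Random(seed)  # private, seeded generator for reproducible output
    users = []
    for i in range(count):
        name = rng.choice(FIRST_NAMES)
        users.append(
            User(
                user_id=i + 1,
                name=name,
                age=rng.randint(18, 90),  # inclusive on both ends
                email=f"{name.lower()}{i + 1}@{rng.choice(DOMAINS)}",
            )
        )
    return users


if __name__ == "__main__":
    for user in generate_users(3):
        print(user)
```

Seeding the generator is the key design choice here: when a test fails, it can be re-run against exactly the same records, which is much harder to guarantee with production data.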