Specifically, I'm interested in testing a web dashboard/app. So if I use Synth to populate my db, how would I know whether the backend's endpoints are giving me good data? Is there a way to guarantee a specific set of test data each time (so I can precompute what the values should be), or will I need to start each test run by querying the database to see what's in it and work out what the expected results should be?
Also, is there a way to prepare data for import into an existing db? Right now, for some of our testing, we have a single staging instance, and we deconflict concurrent tests by including a randomized 8-character string in all the relevant IDs of the precomputed data we insert during test initialization. For this testing it's not as important that the data is repeatable, but the testers have a few different scenarios they want to test, so I'd need a way to make low-data, medium-data, and high-data test sets where the backing data fit within certain ranges.
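(For concreteness, the deconfliction step looks roughly like the sketch below; the names are illustrative, not our actual schema.)

```bash
# Illustrative only: mint an 8-character run ID and prefix it onto the IDs of the
# records a test run inserts, so concurrent runs on the shared staging instance
# don't collide with each other.
RUN_ID=$(openssl rand -hex 4)       # 4 random bytes -> 8 hex characters
ORDER_ID="${RUN_ID}-order-0001"     # e.g. "3fa9c21b-order-0001"
```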
> Is there a way to guarantee a specific set of test data each time
Absolutely. You can seed the model so that the data you get on every run is completely reproducible.
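As a rough sketch of what a reproducible run could look like, assuming the sample command accepts a seed option (the exact flag is my assumption, so check `synth model sample --help` for the real name):

```bash
# Sketch only: pin the generator's seed so every run produces identical rows.
# The --seed flag is an assumption; consult the CLI help for the actual option.
synth model sample <model-id> --seed 42 --sample-size 1000 --output ./fixtures

# Re-running the same command with the same seed should yield the same CSVs,
# so expected dashboard values can be precomputed once and asserted against.
```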
> For this testing it's not as important that the data is repeatable, but the testers have a few different scenarios they want to test, so I'd need a way to make low-data, medium-data, and high-data test sets where the backing data fit within certain ranges.
This is a great use case for Synth. With the upcoming Firehose API, you can point it at an existing database and specify how much synthetic data you want to generate and pump into your db.
For now you can either create a database and write the ETL yourself, or run `synth model sample <model-id> --output <some-directory> --sample-size <number-of-rows>` to sample directly from the model into a directory of CSV files and use those to load your database.
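For the low/medium/high scenarios, one possible flow along those lines is sketched below. The Postgres `\copy` step and the CSV layout (one file per table, with a header row) are assumptions on my end, so adjust them to your schema and engine:

```bash
# Sample three differently sized data sets from the same model (sizes are illustrative).
synth model sample <model-id> --output ./low    --sample-size 100
synth model sample <model-id> --output ./medium --sample-size 10000
synth model sample <model-id> --output ./high   --sample-size 1000000

# Load one set into an existing Postgres database. Assumes one CSV per table,
# named after the table, with a header row.
for f in ./medium/*.csv; do
  table=$(basename "$f" .csv)
  psql "$STAGING_DATABASE_URL" -c "\copy $table FROM '$f' WITH (FORMAT csv, HEADER true)"
done
```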
Feel free to get in touch if you would like to learn more :)