How do you source your data in pre-production environments?
For faster development and better security, we want a fully separate pre-production cloud environment.
Ideally, it would hold an exact copy of the data in our production warehouse or lake. The problem is that copying production data poses a security risk and is not cost-effective.
How are you dealing with this issue? Do you have an anonymization pipeline from prod into staging? Do you generate fakes?
It seems complicated for us, since our dbt project is very large and some raw sources can't be easily reproduced. This is especially true of SaaS sources, which rarely have a non-prod equivalent, unlike an operational database, which exists in pre-production by default.
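
To make the anonymization option concrete, here is a minimal sketch of a pseudonymization step that could sit between prod and staging. It is only an illustration under stated assumptions: the column names, the `SECRET_KEY` value, and the helper names `pseudonymize` and `anonymize_row` are all hypothetical, and a real pipeline would more likely run inside the warehouse than row by row in Python.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secret manager.
SECRET_KEY = b"staging-pseudonymization-key"

# Hypothetical set of columns treated as PII.
PII_COLUMNS = {"email", "full_name"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash so that equal inputs always map to equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; keep more characters to reduce collisions


def anonymize_row(row: dict) -> dict:
    """Copy a row, replacing non-null PII values with pseudonyms."""
    return {
        col: pseudonymize(str(val)) if col in PII_COLUMNS and val is not None else val
        for col, val in row.items()
    }
```

Because the hash is keyed and deterministic, the same email gets the same token in every table, so joins in downstream dbt models should still line up. It does not hide distributions or row counts, so it may not be enough on its own for sensitive data.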