Would be great if Dagster employees could blog a little more about good pipeline / Data Engineering practice:
• How partitions help you achieve idempotency, and as a result more robust pipelines that are easier to debug and refactor (e.g. for performance improvements where the expected data outputs stay identical), plus parallel execution for backfills.
• How software-defined assets increase the transparency of your pipelines, so that data consumers can debug / understand problems and their downstream consequences.
• How extracting and loading to S3, with a separate sensor that runs the load to the target datasource once the data becomes available, can lead to more efficient resource use (the extraction step, if it is mostly just running a slow DB query, doesn't use much RAM but can still block other processes if the two pipelines are kept together).
• The value of type annotations and mypy for catching bugs early (don't think Prefect has that?)
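A minimal sketch of the partition / idempotency point in the first bullet, in plain Python rather than Dagster's own API. The `warehouse` dict, `extract`, `materialize_partition`, and `backfill` are hypothetical names made up for illustration. The idea is that each partition overwrites its own slice of output, so rerunning a partition (or a whole backfill) leaves the data unchanged, and independent partitions can run in parallel:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical in-memory "warehouse": one entry per daily partition.
warehouse: Dict[str, List[dict]] = {}


def extract(day: date) -> List[dict]:
    # Stand-in for a source query filtered to a single day.
    return [{"day": day.isoformat(), "value": day.day * 10}]


def materialize_partition(day: date) -> None:
    # Overwrite the whole partition rather than appending, so a rerun
    # replaces the previous output instead of duplicating rows.
    warehouse[day.isoformat()] = extract(day)


def backfill(start: date, days: int) -> None:
    # Partitions do not depend on each other, so they can run concurrently.
    targets = [start + timedelta(days=i) for i in range(days)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(materialize_partition, targets))


backfill(date(2022, 6, 1), 7)
first_run = dict(warehouse)
backfill(date(2022, 6, 1), 7)  # rerun the same backfill
```

After the second `backfill` call, `warehouse` should hold the same seven partitions with the same contents as `first_run`, which is the property that makes reruns and refactors safe to compare.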
These things were not obvious to me when I started, and I'm not even sure they're all correct now. I'd write more about it myself, but I'm not sure that what I'd be writing would be correct. One of the things Dagster has helped me most with is how to structure work so that it is maintainable.
Edit: + More thought leadership on how to test pipelines. You've done a lot for unit testing in data engineering, but Data Engineers also tend to run Prod databases -> Dev data warehouse to check that the whole pipeline works properly, which your YAML configs handle very nicely. Or ways to check that an asset is the same after a refactor (checksum / some form of hashing?)
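One possible way to do the "same after a refactor" check mentioned above, sketched with the standard library only. `asset_checksum` is a hypothetical helper, not a Dagster function, and it assumes the asset's output can be read back as a list of dict rows. It hashes the rows in a way that ignores row order and column order, so only real data changes alter the digest:

```python
import hashlib
import json


def asset_checksum(rows):
    # Serialize each row with sorted keys so column order does not matter,
    # then sort the serialized rows so row order does not matter either.
    encoded = sorted(json.dumps(row, sort_keys=True, default=str) for row in rows)
    digest = hashlib.sha256()
    for line in encoded:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Output before the refactor, after it (reordered only), and with a real change.
before = [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}]
after = [{"total": 3.0, "id": 2}, {"id": 1, "total": 9.5}]
changed = [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.1}]

same_after_refactor = asset_checksum(before) == asset_checksum(after)
detects_change = asset_checksum(before) != asset_checksum(changed)
```

For large tables this would more likely be done inside the warehouse itself, so treat it as an illustration of the idea rather than something that scales.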
06/08/2022, 1:22 PM
blog about it even if you're wrong! i think a big gap right now is that from the outside it's not clear how many people are using dagster in production, and even if what you're writing is not the best practices, it could spur discussion and help others (like me!) out
but i also think some "how we use dagster at dagster" posts would be tremendous
06/08/2022, 5:49 PM
@George Pearse @Stephen Bailey thanks for the feedback!! we do plan to write more about these topics and “how dagster uses dagster” is definitely on the radar too! plz stay tuned