03/04/2021, 2:14 PM
Hi All, I recently built a POC for my company using Dagster and got a thumbs-up for setting it up to cover more data flows. The POC was a bit rushed, so I didn't organise the code very well. I'm wondering if there are best-practice guidelines for greenfield projects? I have questions like:
• preferred deployment methods (VM, Kubernetes, etc.)
• support for GitOps-style deployment
• preferred environments (Linux, Windows, although I assume Linux 🙂)
• best practices for storing SQL, as most of our data transformations are done in SQL (Redshift)
• preference for specifying config: yaml vs the Python API
Thanks in advance!
👍 3


03/04/2021, 8:26 PM
Hi Deveshi 👋 our recommended deployment method is kubernetes with our “out-of-the-box” helm chart (however, many dagster users do deploy on ec2 / vm directly — so that is definitely possible)
“support for gitOps style deployment” <- not totally sure what this means, but you can deploy all dagster components via the helm chart in ci/cd
linux! but we do run tests on windows too
“preference for specifying config: yaml Vs python API” <- yaml is often used for the dagster instance and workspace. internally, we use python apis for everything else to make it easier to compose/modify/read/test configs (although this is a bit of personal/team preference)
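To illustrate the compose/modify/test point about python configs, here's a minimal sketch of building run config as plain Python instead of duplicating yaml per environment. The resource name, hosts, and helper function are hypothetical, not from this thread; only the `resources`/`config` key layout mirrors dagster's run-config shape.

```python
# Sketch: composing Dagster-style run config in plain Python.
# The "redshift" resource, hosts, and database names here are
# hypothetical; only the resources/config nesting follows Dagster's
# run_config layout.

def redshift_run_config(host: str, database: str) -> dict:
    """Build the config block for a hypothetical redshift resource."""
    return {
        "resources": {
            "redshift": {
                "config": {"host": host, "database": database, "port": 5439}
            }
        }
    }

# Because it's just Python, environments share one builder and only
# the differing values change -- the equivalent yaml would be two
# near-identical files:
dev_config = redshift_run_config("dev-cluster.example.com", "analytics")
prod_config = redshift_run_config("prod-cluster.example.com", "analytics")
```

Being ordinary functions and dicts, these builders can also be unit-tested, which is part of the "easier to compose/modify/read/test" preference mentioned above.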
“best practices to store SQL” <- i’m not as sure about this one. i’ve seen this either in-line in the solid, or in-line in a method of the redshift resource. many users use dbt to version their sql
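One lightweight version of the "in-line in the solid / resource" option is to keep the SQL in its own module as parameterised strings and have the solid only pass a finished statement to the resource. Everything in this sketch (table names, the query, the `execute_query` call on the resource) is an illustrative assumption, not an API confirmed in this thread:

```python
# Sketch: SQL kept in its own module, separate from orchestration code.
# The statement, table names, and the redshift resource method shown in
# the comment below are all hypothetical.

DAILY_REVENUE_SQL = """
CREATE TABLE {target_table} AS
SELECT order_date, SUM(amount) AS revenue
FROM {source_table}
GROUP BY order_date
"""

def render_daily_revenue(source_table: str, target_table: str) -> str:
    """Fill in table names so the solid deals only with a finished statement."""
    return DAILY_REVENUE_SQL.format(
        source_table=source_table,
        target_table=target_table,
    )

# Inside a solid you'd then do something like:
#   context.resources.redshift.execute_query(
#       render_daily_revenue("raw.orders", "analytics.daily_revenue")
#   )
```

Keeping the strings in one module makes the SQL diffable and testable on its own; teams that outgrow this pattern typically move to dbt, as noted above.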


03/04/2021, 8:38 PM
Thanks @cat 🙂 This is very helpful