Curious if anyone in the community has regretted the way the dagster #dagster-feedback

Jon Simpson

08/24/2023, 4:28 PM

Curious if anyone in the community has regretted the way they structured their Software Defined Assets, and how they adjusted them over time. I’m evaluating Dagster and I feel like there’s a good amount of room to go wild if you don’t define standards beforehand. But I’m not sure what the standards would be.

👀 2

Zach

08/24/2023, 5:34 PM

I definitely regret not spending more time up front thinking about how to name assets and jobs. Run history is tied to the keys and names of each asset / job, so if you change them, you lose your run and materialization history. The cloud team has offered to re-link history after changing names, but I haven't yet spent the time to figure out what our long-term naming strategy should be, and I don't want to have to ask them to re-link run history multiple times. I would also feel bad asking them to re-link 50 or 60 jobs and assets. Really wish there was a tool or something to do this on my own. If you host it yourself you can probably dig into the database to update job names / asset keys, but that also sounds a bit dangerous. Similarly, I think coming up with a prefix strategy for assets early on has huge benefits, and it's another thing I slept on while getting my team off the ground. Now we have no prefix strategy, and adding or renaming prefixes drops run history in the same way.
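For anyone wondering what a prefix strategy could look like in practice, here is a minimal sketch in plain Python. The three-part `[layer, domain, name]` layout, the layer names, and the helper `asset_key_path` are all hypothetical, not anything from this thread or from Dagster itself. The idea is just to settle the convention in one place and reject keys that break it before they ever collect run history.

```python
import re
from typing import List

# Hypothetical convention: every asset key is [layer, domain, name],
# e.g. ["raw", "billing", "invoices"]. The layer names are assumptions.
ALLOWED_LAYERS = ("raw", "staging", "marts")
_SEGMENT = re.compile(r"[a-z][a-z0-9_]*")


def asset_key_path(layer: str, domain: str, name: str) -> List[str]:
    """Build an asset key path, rejecting names that break the convention."""
    if layer not in ALLOWED_LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {ALLOWED_LAYERS}")
    for segment in (domain, name):
        # fullmatch so that trailing characters cannot slip through
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"segment {segment!r} must be lower snake_case")
    return [layer, domain, name]


path = asset_key_path("raw", "billing", "invoices")
prefix, name = path[:-1], path[-1]
```

In a Dagster project, the `prefix` part is the kind of list you could pass as `key_prefix` on the `@asset` decorator, with `name` as the asset name. Because history follows the key, the point is to agree on the layout before the first materialization rather than after.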

Matt Clarke

08/24/2023, 8:50 PM

There’s a risk of overcorrecting. We have a workflow where we have some core logic we want to deploy, with various configs, on multiple data sets. The plan was to have a customer deployment, dependent on a “base” repo which defines asset factories, dependent on a core library. Great in principle, but the number of branches you need open to validate that some new feature in core works for everyone is mental. The main issue is that when you create something as an asset, it isn’t directly usable without some knowledge of Dagster and runtime contexts. This is a bit of a sharp edge which has prevented our pure analysts from writing things which are oven-ready for Dagster.
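One way to soften that sharp edge, sketched here in plain Python under assumptions of my own, is to keep the core logic as ordinary functions that analysts can call directly and let a factory bind the per-customer config. The names `filter_large_orders` and `make_customer_step`, and the order-filtering logic, are hypothetical stand-ins, not the actual setup described above. The Dagster asset wrapper is deliberately left out.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def filter_large_orders(rows: List[Row], min_amount: float) -> List[Row]:
    """Core logic: plain Python, callable without any Dagster context."""
    return [row for row in rows if row["amount"] >= min_amount]


def make_customer_step(customer: str, min_amount: float) -> Callable[[List[Row]], List[Row]]:
    """Factory: bind one customer's config to the shared core logic.

    A thin asset wrapper in the deployment repo would call the returned
    function; that wrapper is omitted from this sketch.
    """

    def step(rows: List[Row]) -> List[Row]:
        return filter_large_orders(rows, min_amount)

    # Give each generated step a stable, customer-specific name.
    step.__name__ = f"{customer}_large_orders"
    return step


# Analysts can exercise the logic with plain data, no runtime context needed.
acme_step = make_customer_step("acme", min_amount=10.0)
result = acme_step([{"amount": 5.0}, {"amount": 50.0}])
```

The trade-off is one more layer of indirection, but the part analysts write and test stays free of orchestration concerns.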

Zach

08/24/2023, 9:45 PM

Yeah I can't recommend mono-repos for Dagster deployments enough, at least if you plan to share code across projects. Coordinating changes to code that is used in multiple repos is a huge pain. Going mono-repo and implementing a build tool like Pants saves a ton of headaches.

Matt Clarke

08/25/2023, 11:48 AM

I think there's a middle ground to be found. In our case a feature might get modified, and in order to validate that we're happy with it we'd need to create branch deployments for each customer, ZCC (zero-copy clone) the prod Snowflake db, run that modified query, and check we're happy. Having core development not directly tied to our deployments is nice, but I think the first and second layers could be one, though it requires some upskilling of our team to achieve it.
