
Matt Callaway

05/25/2021, 8:31 PM
Hello. Newbie here trying to get started. While I do find the documentation quite good, I’m finding it frustrating to wrap my mind around Dagster and get beyond pretty basic hello-world-style work. I’m looking for examples of real-world implementations of things like “Write this file to a filesystem as an Asset and then use it in the next step. Only create the file if it doesn’t already exist” (asset materialization with IO managers?), “use X database in development and Y database in production” (modes?), “how about a scatter/gather pattern”, and “how about common implementations of CWL bioinformatics operations”. I’ve found this https://github.com/sspaeti-com/awesome-dagster but it’s pretty sparse. I’ve spent a good part of today just scanning through dagster-support here. Can anyone recommend any such examples?

sandy

05/25/2021, 9:54 PM
Hey Matt. Thanks for the feedback. I don't have a concrete recommendation for you, but if it would be helpful, we could find some time to chat directly and I could help talk you through what it would look like to accomplish some of these. I agree that this would be valuable content for us to include.
👍 2

Raghu

05/26/2021, 4:46 AM
Appreciate it. I’m new here too and would be happy to curate some examples to help people like me who don’t have prior experience with Airflow but want to start here with Dagster! Count me in if you are looking for examples!

Matt Callaway

05/26/2021, 12:50 PM
@sandy It would be amazing for you to help us get started! I’m also at an org with Airflow experience, where we’ve added custom improvements on top of Airflow to deal with its flaws. One flaw that we’re never going to work around is the inability to run workflows locally. I want to run a thing on my MacBook, then have the identical thing run in AWS (or GCP). This is the first “win” that Dagster could offer. To that end, I’ve got a simple workflow I’m running on my Mac. I now need to figure out “modes” such that I can represent “environments”, i.e. “dev” (local) vs. “test” (AWS account 1) vs. “prod” (AWS account 2). The differences in environment must account for IO/storage (local filesystem on a Mac vs. S3 bucket in the cloud). My simple workflow has a step that touches a file to create an Asset to illustrate this. I think this will amount to the creation of a dagster.yaml and figuring out the steps to go through dev -> test -> prod. Any illustrative examples of this would be greatly appreciated.
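As a rough illustration of the request above, here is a minimal plain-Python sketch of the pattern: one "touch a file" step, with the storage target picked by environment name. It does not use Dagster's API, and every name in it (`StorageConfig`, `ENVIRONMENTS`, `select_storage`, `touch_if_missing`, the bucket names) is hypothetical. In Dagster the per-environment part would presumably be expressed through modes and resources rather than a dictionary.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical per-environment storage settings (not a Dagster type)."""
    kind: str      # "local" or "s3"
    location: str  # directory path for "local", bucket name for "s3"


# Environment names follow the message above; locations are placeholders.
ENVIRONMENTS = {
    "dev": StorageConfig("local", os.path.join(tempfile.gettempdir(), "workflow-dev")),
    "test": StorageConfig("s3", "example-test-bucket"),
    "prod": StorageConfig("s3", "example-prod-bucket"),
}


def select_storage(env: str) -> StorageConfig:
    """Return the storage settings for the named environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return ENVIRONMENTS[env]


def touch_if_missing(directory: Path, name: str) -> bool:
    """Create an empty file unless it already exists; return True if created."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / name
    if target.exists():
        return False
    target.touch()
    return True


if __name__ == "__main__":
    storage = select_storage("dev")
    if storage.kind == "local":
        created = touch_if_missing(Path(storage.location), "asset.txt")
        print("created" if created else "already present")
```

The same step body stays unchanged across environments; only the value returned by `select_storage` differs, which is the property the dev -> test -> prod progression relies on. The S3 branch is left out here because it would need real credentials and a client library.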
I wonder if it might be useful to submit github issues related to the creation of a “Dagster Cookbook”. I think spelling out use cases like this would be of tremendous value to you all.
Further, the bioinformatics community has been trying to push for WDL and CWL as standards, as I’m sure you’re aware. Implementing CWL examples in Dagster would also be very helpful.

sandy

05/26/2021, 9:30 PM
we generally track these kinds of issues as "Content Gaps": https://github.com/dagster-io/dagster/issues?q=is%3Aissue+is%3Aopen+%22Content+Gap%22. I filed a couple based on what you listed above, but adding more would definitely be helpful. here's a link for filing them: https://github.com/dagster-io/dagster/issues/new?assignees=&labels=documentation&template=report-missing-documentation.md&title=%5BContent+Gap%5D
@Raghu we are definitely looking for examples. If you have thoughts on what's missing or would like to contribute an example, both would be very helpful
👍 1