# ask-community
v
Hey all, I just found out about Dagster a few days ago and have been testing it locally, pretty exciting stuff. I'd like to bring this into my team and phase out Airflow in the near- to mid-term future. One of the main ways we're running Airflow is by generating dynamic DAGs with the same 3-4 tasks based on a provided config file (including source info, schedules, etc.), since a lot of our extraction jobs follow the same pattern. We have Airflow set up so that whenever a new config file is pushed, it automatically generates a new DAG for it. Is there a way to get the same functionality in Dagster? I found this on StackOverflow, but as far as I understood it would require us to use a configurable job and configure it manually. Maybe I'm missing something, or it's just not obvious to me since I'm very unfamiliar with the general Dagster workflow. Can anyone point me in a general direction?
z
It kinda depends on what is dynamic about the DAGs you're trying to model, but a configurable job isn't generally how you create jobs with dynamic ops. With the Dynamic Mapping & Collect and Conditional Branching features, you set up a graph that changes dynamically based on its inputs, so the same graph can have a different structure each run (say you need to perform QC on N files: an op that yields each file as a DynamicOutput, linked to your QC op, will produce a DAG with N QC ops). This way you don't have to generate DAGs from DAGs based on some config (it honestly blew my mind when I realized Airflow didn't have a way to dynamically generate DAGs out of the box; it's one of the things that made me choose Dagster). It's possible, though, that these dynamic graph features don't fit your use case, depending on how your DAGs get restructured from those 3-4 tasks.
s
Hey @Vinnie - one approach would be the graph YAML approach shown under the "Graph DSL" example in our docs: https://docs.dagster.io/concepts/ops-jobs-graphs/jobs-graphs#graph-dsl
v
@sandy @Zach Thanks for the tips! I think the Graph DSL is more or less what I was looking for. For a little more context: we're using an ETL "framework" I built that's a collection of Python scripts that are dynamically activated depending on the configs passed in a YAML file; things like "source, schedule, sink, etc." are some of the information. Since 99.9% of the jobs we're running through this framework are the exact same (run framework, copy data to Snowflake, do QC) and we always know the response type beforehand (framework returns remote file path, etc.), I had Airflow generate DAGs for each of the configs. The entire ETL script is a Lambda/ECS Task in a single Airflow task. I was looking for a way to say "here's all the configs in this folder, iterate through them and build identical jobs for each". Looks like the Graph DSL fits the bill perfectly! 🙂 I'm now just wondering if it still makes sense to run the entire ETL in a single task or not. Dagster seems to have a slightly different philosophy there.
s
> I'm now just wondering if it still makes sense to run the entire ETL in a single task or not. Dagster seems to have a slightly different philosophy there.
There's no single right answer on this one. The main advantages of splitting into multiple ops are:
• If you hit a failure, you can re-execute from the middle instead of going back to the beginning
• If ops don't depend on each other, they can execute in parallel
So it comes down to whether those advantages matter in your situation.