Hi, I’m new to Dagster, and have a couple of simple questions that I can’t seem to figure out by reading the documentation.
Let’s say there is a task (C) that depends on tasks (A) and (B), where A and B are independent of each other. I could write a Python script in which solids A and B are run from C, and they all run successfully.
1. How can I instruct Dagster to run A and B concurrently?
2. If I need to re-run A and C, but not B, how do I do that without changing the script?
02/28/2022, 1:03 AM
Hey! I used Luigi some years ago. On your item #1, by default Dagster will run applicable ops concurrently. On your item #2, there are a few different ways, but in the Dagit web UI this is easily accomplished. You literally just select/highlight task A, and in the dropdown there is an option to re-run just task A, or to run task A and any downstream task(s). That's it. Done.
02/28/2022, 1:34 AM
Cool. I will take your word for it and keep digging. 🙂 I once tried Prefect and got bitten pretty bad before I decided not to pursue it.
03/03/2022, 2:48 AM
By design, Dagster is a data orchestrator, not a task runner. Dagster is trying to keep track of the state of the data assets being orchestrated.
Whilst what @Daniel Kim says is true and will work, I think doing this goes against the way Dagster is designed / intended to be used.
Taking a step back: what is the higher-level problem you are trying to solve?
Maybe there is a more “Dagster-y” way to solve it?
03/03/2022, 2:51 AM
So following the example I gave at the top,
Task A is client grouping (i.e. client id -> client group);
more specifically, loading the client group mapping is task A.
Task B is loading transaction data, which includes the client id.
Task C, which depends on A & B, calculates commissions etc. and produces a summary report showing results by client group.
Let’s say the client mapping changed after Task C successfully completed. We would want to reload the mapping (A) and rerun task C, but not B.
Hope it makes sense.
03/03/2022, 7:55 PM
1. Concurrency is determined by your Executor settings. You'll probably want to start by using the MultiprocessExecutor - https://docs.dagster.io/deployment/executors - and telling it, via its max_concurrent setting, how many ops it is allowed to run in parallel (when the DAG allows)