Eric Loreaux 02/10/2023, 12:17 AM
Would love some guidance, as I'm just starting to build my own mental model for dagster design
["*metric_1*", "*metric_2*", ..., "*metric_n*"]
sean 02/10/2023, 1:57 PM
Ops instead of assets
Hard to say without more details. In general we recommend assets for any kind of data transformation where you aren’t doing something explicitly imperative (e.g. sending an email or other notification). Assets can have a configurable physical location via IO manager, and it’s unclear to me what’s meant by “configure any input data”. If that data has a consistent form, assets are likely the way to go.
Configuring pipeline runs through op selection syntax
Also hard to provide much guidance here without knowing more, but my instinct is that you should consider having multiple jobs rather than a mega-job that you subset through op selection to achieve different objectives. You can reuse the same op across jobs.
Eric Loreaux 02/11/2023, 7:43 AM
Ops instead of assets
Based on the way assets are presented in the documentation, using an asset for my type of data just feels wrong. In the documentation, assets are presented as a collection of data that always has the same interpretation but may need refreshing from time to time, e.g. all stars from a github repo or the recorded temperature for the past week. But in my case, the data definition is always different: it could be data coming from product logs, or it could be fake data, or external data. It could even have different features available. From this perspective, it seems awkward to consider this asset as going "stale," as each time the job is run the content should be completely different.
Configuring pipeline runs through op selection syntax
This is an interesting proposal! So basically, each metric can be defined as a job, which specifies the chain of ops that get from input data to metric value. Some follow up questions:
1. is there a way to combine jobs into a single job to launch at runtime? It would be nice if user provides list of metric names and this kicks off all associated jobs
2. even though an op definition is shared across jobs, can its execution be shared? For example, if metric1 job looks like op1 -> op2 -> op3 -> metric1, and metric2 job looks like op1 -> op2 -> metric2, can I launch these in a way where op1 -> op2 only needs to execute once?
Thanks again for taking time to respond, sorry for all the questions
sean 02/13/2023, 9:22 PM
"is there a way to combine jobs into a single job to launch at runtime? It would be nice if user provides list of metric names and this kicks off all associated jobs"
I don’t think so. If you want to launch all of this at the same time, then putting it in the same job (and possibly subsetting with op selection) is the way to go.
"even though an op definition is shared across jobs, can its execution be shared?"
No. As above, in that case they should be in the same job.
"sorry for all the questions"
No prob at all, that’s why we have this channel