I must confess I’m a bit confused by some of the relationships between ops/graphs/jobs. Can you nest ops inside each other? Can you nest graphs inside each other? Jobs inside each other? If I want to define something where one person's “pipeline” is just another person’s “op” how hard is that to do?
08/08/2022, 11:29 PM
Hi Josh. Let me take a stab at clarifying these relationships:
• Ops cannot be nested inside each other. Jobs also cannot be nested inside each other.
• Graphs can contain ops and other graphs. Under the hood, Dagster flattens a graph, including any graphs nested inside it, into a single-level mapping of ops.
• The core of a job is a graph, which defines the job's computation. A job is created by binding that graph to execution-specific configuration (e.g. run config, executor, resources).
• Graphs are meant to be an organizational tool. Some use cases:
◦ You have a complex computation you want to reuse in many places. You can define each step of the computation as an op and combine the ops into a graph, then reuse that graph wherever you need this computation.
◦ You want to reuse the same computation across many jobs, but with different configuration or resources. In this case, you can create one graph and bind it to each set of configuration and resources, producing one job per set.
If you want to create a piece of computation that is reused by multiple people or tasks, it sounds like you want to create a graph that others can reuse and connect to other computations (via ops or other graphs).