Gemini Trek
02/06/2024, 9:36 PM

Scout
02/06/2024, 9:36 PM

Gemini Trek
02/06/2024, 9:39 PM

```python
from dagster import Config, In, Nothing, RunConfig, graph, job, op

class OpsConfig(Config):
    env: str = None

test_env = RunConfig(
    ops={
        "pipeline_a": OpsConfig(env="test"),
        "pipeline_b": OpsConfig(env="test"),
    }
)

@op(ins={"start": In(Nothing)})
def pipeline_a(config: OpsConfig):
    print(f"pipeline_a in {config.env}")

@op(ins={"start": In(Nothing)})
def pipeline_b(config: OpsConfig):
    print(f"pipeline_b in {config.env}")

@job(name="Testing_Scenario", config=test_env)
def testing_scenario():
    a = pipeline_a()
    b = pipeline_b()

test_env_graph = RunConfig(
    ops={
        "test_graph": {
            "ops": {
                "pipeline_a": OpsConfig(env="test"),
                "pipeline_b": OpsConfig(env="test"),
            }
        }
    }
)

@graph
def test_graph():
    a = pipeline_a()
    b = pipeline_b()

@job(name="Testing_Scenario_Graph", config=test_env_graph)
def testing_scenario_graph():
    test_graph()
```

When I select a specific pipeline in the job "Testing_Scenario", the launchpad does not update itself, and the "Remove extra config" button does not work either. It only works with the job "Testing_Scenario_Graph", which was built using a graph.

Scout
02/06/2024, 9:40 PM
You have two jobs, `testing_scenario` and `testing_scenario_graph`, with the latter constructed from a graph `test_graph`. Both jobs are configured using `RunConfig`, but they are structured differently. For `testing_scenario`, the `RunConfig` is applied directly to the ops within the job:

```python
test_env = RunConfig(
    ops={
        "pipeline_a": OpsConfig(env="test"),
        "pipeline_b": OpsConfig(env="test"),
    }
)
```

For `testing_scenario_graph`, the `RunConfig` is nested within the graph name:

```python
test_env_graph = RunConfig(
    ops={
        "test_graph": {
            "ops": {
                "pipeline_a": OpsConfig(env="test"),
                "pipeline_b": OpsConfig(env="test"),
            }
        }
    }
)
```
The behavior you're experiencing with the "Remove extra config" button not being enabled for `testing_scenario` could be related to how the configuration is applied to the job versus the graph. In Dagster, the configuration for a job or graph is expected to match the structure of the job or graph itself. If there is a mismatch, the UI may not recognize which parts of the configuration are "extra" and should be removed. Here are a few steps you can take to troubleshoot the issue:

1. Ensure that the configuration schema for your ops (`pipeline_a` and `pipeline_b`) matches the structure of the `RunConfig` you are providing. If there is a discrepancy, the UI may not be able to determine what counts as extra configuration.
2. Check whether nesting the configuration under the graph name in `test_env_graph` is necessary and correct. The `RunConfig` for a graph should typically not need to specify the graph name as a key, unless the graph itself is nested within another graph.
3. Verify that the `RunConfig` is interpreted correctly by Dagster when you launch the job. You can do this by running the job in a test environment and checking the logs to confirm the configuration is applied as expected.
4. Review the Dagster documentation and any relevant release notes for changes to the configuration system that might affect the behavior of the "Remove extra config" button.
5. If you suspect a bug in the UI, consider reaching out to the Dagster community or filing an issue on the Dagster GitHub repository with details of your configuration and the behavior you're observing.

Remember that Dagster's configuration system is explicit: the structure of the configuration must align with the structure of the job or graph for the system to interpret it correctly. If the structure is off, it can lead to unexpected behavior in the UI.

Gemini Trek
02/07/2024, 12:02 AM

Scout
02/07/2024, 12:02 AM
Below is an example with two ops, `op_a` and `op_b`, where `op_b` depends on `op_a`. Additionally, I've included a `RunConfig` with dummy variables for each op.
```python
from dagster import In, Nothing, Out, RunConfig, job, op

# Define op_a, which takes no inputs and emits a Nothing output:
# it runs only for its side effects, so no data is passed downstream.
@op(out=Out(Nothing))
def op_a(context):
    context.log.info("Running op_a")
    # Perform some action here

# Define op_b, which depends on op_a having completed.
# Nothing inputs are not passed to the function body, so "start"
# does not appear in the signature.
@op(ins={"start": In(Nothing)})
def op_b(context):
    context.log.info("Running op_b after op_a")
    # Perform some action here

# Define the job, wiring op_b to run after op_a
@job
def my_job():
    result = op_a()
    op_b(start=result)

# Define the RunConfig with dummy variables for each op.
# Note: these values are only accepted if the ops declare matching config schemas.
my_run_config = RunConfig(
    ops={
        "op_a": {"config": {"dummy_variable_a": "value_a"}},
        "op_b": {"config": {"dummy_variable_b": "value_b"}},
    }
)

# Now you can execute the job with the run config
if __name__ == "__main__":
    result = my_job.execute_in_process(run_config=my_run_config)
```
In this example, `op_a` is an op that does some work and emits an output of type `Nothing`, a special type in Dagster used to indicate that an op is run only for its side effects and does not produce meaningful output. The `op_b` op declares an input of type `Nothing`, satisfied by `op_a`, indicating that `op_b` should run after `op_a`; because the input carries no data, it is not passed to the function body.

The `RunConfig` is used to provide configuration for each op. In this case, we have dummy variables `dummy_variable_a` and `dummy_variable_b` for `op_a` and `op_b`, respectively. These are just placeholders and would be replaced with actual configuration relevant to your ops.
Please note that the `RunConfig` class and its usage in this example are illustrative and may not directly correspond to the actual API provided by Dagster. The actual way to provide runtime configuration for a job execution may differ based on the version of Dagster you are using. Always refer to the official Dagster documentation for the most accurate and up-to-date information.

Gemini Trek
02/07/2024, 12:07 AM

Scout
02/07/2024, 12:07 AM
Below is an example that defines the same dependency with a graph: two ops, `op_a` and `op_b`, where `op_b` depends on `op_a`. Additionally, I've included a `RunConfig` with dummy variables for each op.
```python
from dagster import RunConfig, graph, op

# Define op_a, which does not depend on any inputs and produces an output
@op
def op_a(context):
    context.log.info("Running op_a")
    # Perform some action here
    return "result_from_op_a"

# Define op_b, which depends on the output of op_a
@op
def op_b(context, input_from_a):
    context.log.info(f"Running op_b with input: {input_from_a}")
    # Perform some action here

# Define the graph that includes op_a and op_b
@graph
def my_graph():
    result = op_a()
    op_b(input_from_a=result)

# Convert the graph to a job
my_job = my_graph.to_job()

# Define the RunConfig with dummy variables for each op.
# Note: these values are only accepted if the ops declare matching config schemas.
my_run_config = RunConfig(
    ops={
        "op_a": {"config": {"dummy_variable_a": "value_a"}},
        "op_b": {"config": {"dummy_variable_b": "value_b"}},
    }
)

# Now you can execute the job with the run config
if __name__ == "__main__":
    result = my_job.execute_in_process(run_config=my_run_config)
```
In this example, `op_a` is an op that does some work and produces an output, which is then passed as an input to `op_b`. The `my_graph` function uses the `@graph` decorator to define the dependency between `op_a` and `op_b`.

The `my_job` variable is created by calling the `to_job()` method on the `my_graph` graph, which converts the graph into a job that can be executed.

The `RunConfig` is used to provide configuration for each op within the graph. In this case, we have dummy variables `dummy_variable_a` and `dummy_variable_b` for `op_a` and `op_b`, respectively. These are placeholders and would be replaced with actual configuration relevant to your ops.
Please note that the `RunConfig` class and its usage in this example are illustrative and may not directly correspond to the actual API provided by Dagster. The actual way to provide runtime configuration for a job execution may differ based on the version of Dagster you are using. Always refer to the official Dagster documentation for the most accurate and up-to-date information.

Gemini Trek
02/07/2024, 12:24 AM

Scout
02/07/2024, 12:24 AM
Running a Job with Ops Directly: When you define ops directly within a job, the job body wires the ops together itself. Here's an example:

```python
from dagster import job, op

@op
def op_a():
    # logic for op_a
    pass

@op
def op_b(a_result):
    # logic for op_b
    pass

@job
def my_job():
    op_b(op_a())
```
Running a Job with a Single Graph: When you run a job with a single graph, you encapsulate the ops within a graph abstraction. This allows you to reuse the graph in multiple jobs or even nest it within other graphs. It also provides a clear separation between the definition of your computation (the graph) and the orchestration of that computation (the job). Here's an example of a job with a single graph:

```python
from dagster import graph, job, op

@op
def op_a():
    # logic for op_a
    pass

@op
def op_b(a_result):
    # logic for op_b
    pass

@graph
def my_graph():
    op_b(op_a())

@job
def my_job():
    my_graph()
```
Key Differences:

1. Reusability: Graphs can be reused across multiple jobs or nested within other graphs. This is not possible when you define ops directly within a job.
2. Abstraction: Graphs provide a level of abstraction that separates the definition of the computation from its orchestration. This can lead to cleaner code, especially for complex workflows.
3. Testing: Graphs can be tested independently of the job, which can simplify testing workflows.
4. Flexibility: When using graphs, you can create multiple jobs from the same graph with different configurations or resources, providing more flexibility in how you execute the same computation under different conditions.
5. UI Representation: In the Dagster UI (Dagit), graphs can provide a more organized and visually clear representation of the computation, which can be helpful for understanding and monitoring workflows.

In summary, while you can achieve the same end result with both approaches, using graphs provides more flexibility, reusability, and clarity, especially as the complexity of your workflows increases. The documentation suggests using graphs when you anticipate the need to reuse computation patterns or when you want to maintain a clear separation between the definition of your computation and its orchestration.

Gemini Trek
02/07/2024, 12:26 AM

Scout
02/07/2024, 12:26 AM