Binoy Shah
03/11/2024, 7:38 PM
Scout
03/11/2024, 7:38 PM
To use `RunConfigData` for tweaking the execution or materialization of your assets via GraphQL, you would typically use the `launchRun` mutation in the GraphQL API. `RunConfigData` is the type used when passing an arbitrary object as run config in the GraphQL mutation. This config must conform to the config schema of the job you are executing. Here's a step-by-step guide on how to use `RunConfigData` with your assets:
1. Define the `launchRun` mutation with the necessary arguments. The required arguments include `selector`, which contains the repository location name, repository name, and job name, and `runConfigData`, which is the run config for the job execution.
2. Provide the `RunConfigData` as part of the `executionParams` in the mutation. This data should match the config schema defined for the assets or jobs you are running.
3. Execute the mutation with the appropriate parameters to launch a run.
Here is an example of how the GraphQL mutation might look:
```graphql
mutation LaunchRunMutation(
  $repositoryLocationName: String!
  $repositoryName: String!
  $jobName: String!
  $runConfigData: RunConfigData!
) {
  launchRun(
    executionParams: {
      selector: {
        repositoryLocationName: $repositoryLocationName
        repositoryName: $repositoryName
        jobName: $jobName
      }
      runConfigData: $runConfigData
    }
  ) {
    __typename
    ... on LaunchRunSuccess {
      run {
        runId
      }
    }
    ... on RunConfigValidationInvalid {
      errors {
        message
        reason
      }
    }
    ... on PythonError {
      message
    }
  }
}
```
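As a rough sketch of step 3, the mutation could be packaged into an HTTP request from Python using only the standard library. The endpoint URL `http://localhost:3000/graphql`, the helper names `build_launch_run_payload` and `build_request`, and the location, repository, and job names are assumptions for illustration, not values from this thread.

```python
import json
import urllib.request

# The same mutation as above, kept as a string so it can be sent over HTTP.
LAUNCH_RUN_MUTATION = """
mutation LaunchRunMutation(
  $repositoryLocationName: String!
  $repositoryName: String!
  $jobName: String!
  $runConfigData: RunConfigData!
) {
  launchRun(
    executionParams: {
      selector: {
        repositoryLocationName: $repositoryLocationName
        repositoryName: $repositoryName
        jobName: $jobName
      }
      runConfigData: $runConfigData
    }
  ) {
    __typename
    ... on LaunchRunSuccess { run { runId } }
    ... on RunConfigValidationInvalid { errors { message reason } }
    ... on PythonError { message }
  }
}
"""


def build_launch_run_payload(repository_location_name, repository_name, job_name, run_config):
    """Hypothetical helper: bundle the mutation and its variables into one JSON-ready dict."""
    return {
        "query": LAUNCH_RUN_MUTATION,
        "variables": {
            "repositoryLocationName": repository_location_name,
            "repositoryName": repository_name,
            "jobName": job_name,
            "runConfigData": run_config,
        },
    }


def build_request(url, payload):
    """Hypothetical helper: wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder names; replace them with the ones from your own deployment.
payload = build_launch_run_payload(
    "my_location",
    "my_repository",
    "my_job",
    {"ops": {"my_asset": {"config": {"param": "value"}}}},
)
request = build_request("http://localhost:3000/graphql", payload)  # assumed URL

# To actually launch the run, send the request and inspect __typename in the response:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the variables before anything reaches the server.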
In the GraphQL mutation above, supply the actual values for the variables `$repositoryLocationName`, `$repositoryName`, `$jobName`, and `$runConfigData` for your specific use case. Please note that `RunConfigData` is any-typed in the GraphQL type system, but it must adhere to the config schema for the job or assets you are running. If the provided config does not match the schema, the mutation will return a `RunConfigValidationInvalid` response.
If you need to provide configuration for assets that are not part of a job, you can use the Dagit UI's launchpad to supply the configuration when materializing the assets. When you click "Materialize" in Dagit, and at least one selected asset has a defined config schema, you'll be presented with a modal Launchpad interface that allows you to provide config values before launching the run.
For more detailed information on how to structure your `RunConfigData` and how to use the GraphQL API with Dagster, you should refer to the Dagster documentation on GraphQL and the specific schema definitions for your assets.
Binoy Shah
03/11/2024, 7:40 PM
How does `RunConfigData` get passed into the asset?
Scout
03/11/2024, 7:40 PM
The `RunConfigData` values you pass in when launching a run will be available in the `context` under `context.op_config` for the asset. Here's an example of how you might define an asset with a config schema and access the run config within the asset's compute function:
```python
from dagster import asset, AssetExecutionContext


@asset(config_schema={"param": str})
def my_asset(context: AssetExecutionContext):
    # Access the run config value for 'param'
    param_value = context.op_config["param"]
    # Use the config value in your asset's logic
    # ...
```
When you execute this asset, you would provide the `RunConfigData` that includes the `param` key with the desired value. For example:
```python
from dagster import materialize

result = materialize(
    [my_asset],
    run_config={"ops": {"my_asset": {"config": {"param": "value"}}}},
)
```
In this example, the `run_config` dictionary is structured to match the asset's config schema, with the `param` key set to `"value"`. When the asset is executed, the `param` value is accessed within the asset's compute function via `context.op_config["param"]`.
Remember that the `run_config` dictionary must match the structure expected by the asset's config schema. If you have nested config schemas or multiple assets with different config schemas, you would structure the `run_config` accordingly, ensuring that each asset's config is provided under the asset's key within the `ops` dictionary.
Binoy Shah
03/11/2024, 8:26 PM
Scout
03/11/2024, 8:26 PM
Instead of using `config_schema`, you can define a class that inherits from `Config`. This class will have attributes that represent the configuration options for your asset. You can then use this class directly in your asset function by specifying it as a parameter. Here's an example based on the documentation:
```python
from dagster import asset, Config


# Define a config class using the new Pythonic way
class MyAssetConfig(Config):
    param: str


# Use the config class as a parameter in your asset
@asset
def my_asset(config: MyAssetConfig):
    # Access the run config value for 'param'
    param_value = config.param
    # Use the config value in your asset's logic
    # ...
```
When you execute this asset, you would provide the `RunConfigData` in the same way as before, but now you're using the class-based config schema:
```python
from dagster import materialize

result = materialize(
    [my_asset],
    run_config={"ops": {"my_asset": {"config": {"param": "value"}}}},
)
```
The `run_config` dictionary is structured to match the asset's config schema, with the `param` key set to `"value"`. When the asset is executed, the `param` value is accessed within the asset's compute function via `config.param`.
This approach provides a more Pythonic and type-safe way to define and access configuration for assets. It eliminates the need to use dictionaries and string keys to access config values, which can help prevent errors and improve code readability.
Binoy Shah
03/11/2024, 8:28 PM
Scout
03/11/2024, 8:28 PM
The `Config` object and `RunConfigData` are related in the sense that they both represent configuration data for assets or jobs in Dagster, but they serve different purposes and are used in different contexts.
1. Config Object: The `Config` object is a Python class that inherits from Dagster's `Config` base class. It is used to define the configuration schema for an asset or resource in a Pythonic way, using type annotations. This class-based approach allows you to define configuration options as class attributes, providing type safety and better integration with code editors and IDEs for autocompletion and type checking.
2. RunConfigData: `RunConfigData` is a term used in the context of Dagster's GraphQL API to refer to the configuration data that you pass when launching a run. It is a dictionary that conforms to the configuration schema defined for the job or asset you are executing. When you use the GraphQL API to launch a run, you provide this configuration data as a JSON object, which is then interpreted by Dagster according to the schema of the job or asset.
To make `Config` objects and `RunConfigData` compatible, you need to ensure that the dictionary you pass as `RunConfigData` matches the schema defined by the `Config` class. When you execute a job or asset, Dagster internally converts the `RunConfigData` dictionary into an instance of the corresponding `Config` class, which is then passed to the asset or resource function.
Here's an example to illustrate the relationship:
```python
from dagster import asset, materialize, Config


# Define a config class using the new Pythonic way
class MyAssetConfig(Config):
    param: str


# Use the config class as a parameter in your asset
@asset
def my_asset(config: MyAssetConfig):
    # Access the run config value for 'param'
    param_value = config.param
    # Use the config value in your asset's logic
    # ...


# Define the run config data as a dictionary
run_config_data = {
    "ops": {
        "my_asset": {
            "config": {
                "param": "value"
            }
        }
    }
}

# Execute the asset with the run config data
result = materialize(
    [my_asset],
    run_config=run_config_data,
)
```
In this example, the `run_config_data` dictionary is structured to match the schema defined by `MyAssetConfig`. When you call `materialize`, Dagster takes the `run_config_data` dictionary, validates it against the `MyAssetConfig` schema, and creates an instance of `MyAssetConfig` with the provided values. This instance is then passed to the `my_asset` function as the `config` parameter.
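To make the validate-then-instantiate idea concrete, here is a conceptual stand-in written with only the standard library. It is not Dagster's implementation: `MyAssetConfigSketch` and `config_from_run_config` are hypothetical names, a plain dataclass takes the place of a `Config` subclass, and the type check only handles simple types such as `str`.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MyAssetConfigSketch:
    # Hypothetical stand-in for MyAssetConfig; this is NOT a Dagster Config subclass.
    param: str


def config_from_run_config(run_config, op_name, config_cls):
    """Pull one op's config out of run_config, check it, and build config_cls from it."""
    raw = run_config.get("ops", {}).get(op_name, {}).get("config", {})
    expected = {f.name: f.type for f in fields(config_cls)}

    # Reject configs whose keys do not line up with the declared fields.
    missing = sorted(set(expected) - set(raw))
    unknown = sorted(set(raw) - set(expected))
    if missing or unknown:
        raise ValueError(f"missing keys: {missing}, unexpected keys: {unknown}")

    # Simplified type check: only works for plain classes such as str or int.
    for name, expected_type in expected.items():
        if not isinstance(raw[name], expected_type):
            raise TypeError(f"{name!r} should be of type {expected_type.__name__}")

    return config_cls(**raw)


run_config_data = {"ops": {"my_asset": {"config": {"param": "value"}}}}
config = config_from_run_config(run_config_data, "my_asset", MyAssetConfigSketch)
print(config.param)
```

The point of the sketch is the direction of the conversion: the raw dictionary comes in keyed by `ops` and the asset name, and the asset function only ever sees the typed object.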
It's important to note that while `Config` objects provide a Pythonic way to define and interact with configuration schemas in code, `RunConfigData` is a more general term that refers to the raw configura
Scout
03/11/2024, 8:28 PM
Binoy Shah
03/11/2024, 8:30 PM
Scout
03/11/2024, 8:30 PM
The `Config` data object does not inherently preserve state between different or consecutive asset materializations in Dagster. Each time an asset is materialized, the configuration for that run is provided explicitly. The configuration data is used for that specific run and does not persist to subsequent runs unless it is provided again.
When you materialize an asset, you typically provide the configuration for that particular run. This configuration is used to set up the environment, parameters, and any other necessary settings for the asset computation. Once the run is complete, that configuration is not automatically reused, so if you want the same configuration for subsequent runs, you need to provide it each time you materialize the asset.
If you need to reuse configuration across runs, you have a few options:
1. Default Config Values: Define default values in your asset's config schema. This way, if you don't provide specific values when materializing the asset, the defaults will be used.
2. Config Files: Store your configuration in a YAML or JSON file and load it whenever you materialize your assets. This allows you to maintain a consistent configuration across runs without having to redefine it each time.
3. Config Management Tools: Use configuration management tools or environment variables to manage and inject configuration for your assets. This can help you maintain consistency across different environments (e.g., development, staging, production).
4. Dagster Instance: Use the Dagster instance to store and retrieve run configurations. For example, you can query the instance for the last run configuration of a particular asset and use that configuration for a new run.
Remember that the configuration is specific to each run and is not meant to maintain state across runs. It is up to you to manage and provide the configuration for each asset materialization according to your needs.
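As a minimal sketch of the config-file option, the snippet below reads per-asset values from a JSON file, layers them over defaults held in code, and wraps the result in the `{"ops": {...}}` shape used earlier in this thread. The file layout (asset name mapped to config values), the file name `run_config.json`, and the helper `load_run_config` are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_run_config(path, defaults=None):
    """Hypothetical helper: merge file values over defaults and build a run_config dict."""
    defaults = defaults or {}
    from_file = json.loads(Path(path).read_text(encoding="utf-8"))

    ops = {}
    for asset_name in sorted(set(defaults) | set(from_file)):
        # Values from the file win over the in-code defaults.
        merged = {**defaults.get(asset_name, {}), **from_file.get(asset_name, {})}
        ops[asset_name] = {"config": merged}
    return {"ops": ops}


# Write a small example file to a temporary directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "run_config.json"
    config_path.write_text(
        json.dumps({"my_asset": {"param": "from_file"}}), encoding="utf-8"
    )
    run_config = load_run_config(
        config_path, defaults={"my_asset": {"param": "default"}}
    )

print(run_config)
```

The resulting dictionary could then be passed as `run_config=` to `materialize`, so every run that loads the same file starts from the same configuration.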