#ask-ai

Giacomo Colombo

03/11/2024, 5:13 PM
How to run a dagster job defined with "define_asset_job" explicitly with execute_job method?

Scout

03/11/2024, 5:13 PM
To run a Dagster job that has been defined with `define_asset_job` using the `execute_job` method, you would typically follow these steps:
1. Define your assets using the `@asset` decorator.
2. Create a job using `define_asset_job`, specifying the assets that the job should include.
3. Use the `execute_job` method to run the job, providing any necessary configuration and specifying the instance of `DagsterInstance` to use.

Here's a step-by-step example based on the documentation:
```python
from dagster import asset, define_asset_job, execute_job, DagsterInstance

# Define your assets
@asset
def my_asset():
    # Your asset computation logic
    return "result"

# Create a job with the defined assets
my_job = define_asset_job(
    name="my_job",
    selection=["my_asset"]  # Specify the assets included in the job
)

# Execute the job
result = execute_job(
    my_job,
    instance=DagsterInstance.get()  # Use the default Dagster instance or provide your own
)

# You can inspect the result object for details about the execution
```
Please note that the `execute_job` method is used to execute the job in-process, which means that the job will run within the same Python process as the script that calls `execute_job`. This can be useful for testing or running jobs in a simple environment. If you need to run the job with a specific configuration, you can pass a `run_config` dictionary to the `execute_job` method. If your job requires resources, you can also provide them in the `execute_job` call. Keep in mind that the exact usage of `execute_job` may vary depending on the version of Dagster you are using and the specific requirements of your job. Always refer to the latest documentation for the most accurate and up-to-date guidance.

Giacomo Colombo

03/12/2024, 10:33 AM
@Scout no, it does not work; it wants my_job to be defined with a job decorator in order to get a job object

Scout

03/12/2024, 10:33 AM
I apologize for the confusion. If you have a job defined with `define_asset_job` and you want to execute it programmatically, you should ensure that the job is reconstructable if it needs to cross process boundaries or if you're executing it in a different system. To execute a job defined with `define_asset_job`, you would typically make it reconstructable and then use the `execute_job` function. Here's how you can do it:
1. Define your assets using the `@asset` decorator.
2. Create a job using `define_asset_job`.
3. Make the job reconstructable by wrapping it in a function and using the `reconstructable` decorator.
4. Use the `execute_job` function from the `dagster` API to run the job.

Here's an example based on the documentation:
```python
from dagster import asset, define_asset_job, reconstructable, execute_pipeline
from dagster.core.execution import execute_pipeline_iterator
from dagster.core.instance import DagsterInstance

# Define your assets
@asset
def my_asset():
    # Your asset computation logic
    return "result"

# Define a function that returns your job
def define_my_job():
    return define_asset_job(
        name="my_job",
        selection=["my_asset"]  # Specify the assets included in the job
    )

# Make the job reconstructable
reconstructable_job = reconstructable(define_my_job)

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_pipeline(
    reconstructable_job,
    instance=instance
)

# You can inspect the result object for details about the execution
```
Please note that `execute_pipeline` is used here instead of `execute_job`, as `execute_job` expects a `JobDefinition` object, which is typically obtained from a job decorated with `@job`. Since `define_asset_job` does not return a `JobDefinition` directly, you need to use `execute_pipeline` with a reconstructable job. Remember to check the documentation for the version of Dagster you are using, as APIs and best practices may evolve over time.

Giacomo Colombo

03/12/2024, 11:03 AM
@Scout no, it cannot work! Unresolved reference 'execute_pipeline_iterator' and 'execute_pipeline'; I cannot import them.

Scout

03/12/2024, 11:03 AM
I apologize for the confusion earlier. It seems there was a mistake in the provided information. The `execute_pipeline` and `execute_pipeline_iterator` functions are not the correct functions to use for executing jobs in Dagster. To execute a job defined with `define_asset_job`, you should use the `execute_job` function from the `dagster` API. However, as you mentioned, `execute_job` expects a `JobDefinition` object, which is typically obtained from a job decorated with `@job`. Since `define_asset_job` returns a `JobDefinition`, you should be able to execute it directly with `execute_job`. Here's how you can do it:
```python
from dagster import asset, define_asset_job, execute_job
from dagster.core.instance import DagsterInstance

# Define your assets
@asset
def my_asset():
    # Your asset computation logic
    return "result"

# Create a job with the defined assets
my_job = define_asset_job(
    name="my_job",
    selection=["my_asset"]  # Specify the assets included in the job
)

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=my_job,
    instance=instance
)

# You can inspect the result object for details about the execution
```
Please ensure that you have the correct imports and that you are using the appropriate version of Dagster that supports these functions. If you are still encountering issues, it may be helpful to consult the latest Dagster documentation or seek support from the Dagster community for version-specific guidance.

Giacomo Colombo

03/12/2024, 11:12 AM
@Scout no, it cannot work! The problem is exactly that the job object results in an UnresolvedAssetJobDefinition type instead of a real Dagster job! The error in fact is: Loadable attributes must be either a JobDefinition, GraphDefinition, or RepositoryDefinition. Got UnresolvedAssetJobDefinition. I tried to add the job decorator to "my_job" but that is wrong too! I get the error: @job 'define_my_job' returned problematic value of type <class 'dagster._core.definitions.unresolved_asset_job_definition.UnresolvedAssetJobDefinition'>. Expected return value from invoked node or dict mapping output name to return values from invoked nodes

Scout

03/12/2024, 11:12 AM
I apologize for the confusion. The error message indicates that you have an `UnresolvedAssetJobDefinition`, which needs to be resolved to a `JobDefinition` before it can be executed. This typically happens because the asset job is not yet bound to the assets it targets within a repository context. To resolve an `UnresolvedAssetJobDefinition`, you need to have it within a repository. The repository provides the necessary context to resolve the job definition. Here's how you can define a repository and retrieve a resolved job definition from it:
```python
from dagster import asset, define_asset_job, repository, execute_job
from dagster.core.instance import DagsterInstance

# Define your assets
@asset
def my_asset():
    # Your asset computation logic
    return "result"

# Create an unresolved asset job
unresolved_asset_job = define_asset_job(
    name="my_job",
    selection=["my_asset"]  # Specify the assets included in the job
)

# Define a repository that includes the job
@repository
def my_repo():
    return [unresolved_asset_job, my_asset]

# Retrieve the resolved job definition from the repository
resolved_job = my_repo.get_job("my_job")

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=resolved_job,
    instance=instance
)

# You can inspect the result object for details about the execution
```
In this example, we define a repository that includes both the unresolved asset job and the asset itself. We then use the `get_job` method on the repository to retrieve the resolved job definition, which can be executed with `execute_job`. Please ensure that you are using the correct version of Dagster and that the `get_job` method is available in your version. If you continue to encounter issues, it may be helpful to consult the latest Dagster documentation or seek support from the Dagster community.

Giacomo Colombo

03/12/2024, 11:31 AM
@Scout ok, now it works. The problem is that one of the assets needs a resource that I usually pass in the Definitions, but in this case how do I pass the resource? I cannot find a valid resource-related attribute on the repository decorator, on "define_asset_job", or on execute_job. I cannot pass the resource to the repository, since it does not accept objects of type dagster._core.definitions.resource_definition.ResourceDefinition in the repository list. My code is:
```python
# Define a function that returns your job
unresolved_asset_job = define_asset_job(
        name="job_sample_2",
        selection=[
            "my_asset_1",
            "my_asset_2"
        ]
    )

# Define a repository that includes the job
@repository
def my_repo():
    return [unresolved_asset_job, my_asset_1, my_asset_2]

# Retrieve the resolved job definition from the repository
resolved_job = my_repo.get_job("job_sample_2")

reconstructable_job = reconstructable(resolved_job)

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=reconstructable_job,
    instance=instance,
)
```
Namely, the error is: resource with key 'custom_resource_1' required by op 'my_asset_1' was not provided. Please provide a ResourceDefinition to key 'custom_dataset_destination_resource', or change the required key to one of the following keys which points to an ResourceDefinition: ['io_manager']

Scout

03/12/2024, 11:31 AM
To provide resources to your assets within a job, you need to define the resources and then pass them to the job using the `resource_defs` argument. However, since you're using `define_asset_job`, which doesn't directly accept `resource_defs`, you'll need to use a different approach. One way to do this is to define a new job that wraps your assets and includes the necessary resources. Here's how you can do it:
```python
from dagster import asset, define_asset_job, repository, job, resource, execute_job
from dagster.core.instance import DagsterInstance

# Define your assets
@asset(required_resource_keys={"custom_resource_1"})
def my_asset_1(context):
    # Your asset computation logic that uses context.resources.custom_resource_1
    return "result1"

@asset
def my_asset_2():
    # Your asset computation logic
    return "result2"

# Define your custom resource
@resource
def custom_resource_1(_):
    # Your resource creation logic
    return "my_custom_resource"

# Create an unresolved asset job
unresolved_asset_job = define_asset_job(
    name="job_sample_2",
    selection=["my_asset_1", "my_asset_2"]
)

# Define a new job that wraps the assets and includes the necessary resources
@job(resource_defs={"custom_resource_1": custom_resource_1})
def my_wrapped_job():
    my_asset_1()
    my_asset_2()

# Define a repository that includes the job
@repository
def my_repo():
    return [my_wrapped_job, my_asset_1, my_asset_2]

# Retrieve the resolved job definition from the repository
resolved_job = my_repo.get_job("my_wrapped_job")

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=resolved_job,
    instance=instance,
)

# You can inspect the result object for details about the execution
```
In this example, we define a `custom_resource_1` resource and then create a new job `my_wrapped_job` that wraps the assets and specifies the `resource_defs`. We then include this wrapped job in the repository and retrieve it for execution. Please note that the `reconstructable` function is not necessary in this context, as we are directly executing the job within the same Python process. The `reconstructable` function is used when you need to reconstruct the job across process boundaries, such as when using the Dagster CLI or when executing in different systems. Make sure to replace the placeholder resource creation logic with your actual resource definition. This approach allows you to provide the necessary resources to your assets when executing the job.

Giacomo Colombo

03/12/2024, 11:42 AM
@Scout no!!! the error is still : resource with key 'custom_resource_1' required by op 'my_asset_1' was not provided. Please provide a ResourceDefinition to key 'custom_dataset_destination_resource', or change the required key to one of the following keys which points to an ResourceDefinition: ['io_manager']

Scout

03/12/2024, 11:42 AM
I apologize for the confusion. Let's address the issue with the correct approach. If you have an asset that requires a resource, you should define that resource and then ensure it's provided to the asset when the job is executed. Here's how you can define the resource and ensure it's available to the asset within a job:
```python
from dagster import asset, resource, repository, define_asset_job, build_assets_job, execute_job
from dagster.core.instance import DagsterInstance

# Define your custom resource
@resource
def custom_resource_1(_):
    # Your resource creation logic
    return "my_custom_resource"

# Define your assets
@asset(required_resource_keys={"custom_resource_1"})
def my_asset_1(context):
    # Your asset computation logic that uses context.resources.custom_resource_1
    return "result1"

@asset
def my_asset_2():
    # Your asset computation logic
    return "result2"

# Define a job that includes your assets and the required resources
my_job = build_assets_job(
    "my_job_with_resources",
    assets=[my_asset_1, my_asset_2],
    resource_defs={"custom_resource_1": custom_resource_1}
)

# Define a repository that includes the job
@repository
def my_repo():
    return [my_job]

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=my_job,
    instance=instance,
)

# You can inspect the result object for details about the execution
```
In this example, we define a `custom_resource_1` resource and then use `build_assets_job` to create a job that includes both the assets and the resource definitions. The `build_assets_job` function is a way to construct a job from a list of assets while also specifying resource definitions. When you execute the job, the `custom_resource_1` resource will be available to the `my_asset_1` asset, which requires it. This should resolve the error you're encountering. Please ensure that the `build_assets_job` function is available in your version of Dagster, as APIs can change over time. If you continue to encounter issues, it may be helpful to consult the latest Dagster documentation or seek support from the Dagster community.

Giacomo Colombo

03/12/2024, 11:51 AM
@Scout Nope! cannot import build_assets_job from dagster!

Scout

03/12/2024, 11:51 AM
I apologize for the confusion. If `build_assets_job` is not available in your version of Dagster, let's try a different approach using the `@job` decorator to define a job that includes your assets and the necessary resources. Here's how you can define a job with resources:
```python
from dagster import asset, job, resource, repository, execute_job
from dagster.core.instance import DagsterInstance

# Define your custom resource
@resource
def custom_resource_1(_):
    # Your resource creation logic
    return "my_custom_resource"

# Define your assets
@asset(required_resource_keys={"custom_resource_1"})
def my_asset_1(context):
    # Your asset computation logic that uses context.resources.custom_resource_1
    return "result1"

@asset
def my_asset_2():
    # Your asset computation logic
    return "result2"

# Define a job that includes your assets and the required resources
@job(resource_defs={"custom_resource_1": custom_resource_1})
def my_job_with_resources():
    my_asset_1()
    my_asset_2()

# Define a repository that includes the job
@repository
def my_repo():
    return [my_job_with_resources]

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=my_job_with_resources,
    instance=instance,
)

# You can inspect the result object for details about the execution
```
In this example, we use the `@job` decorator to define a job that includes the assets and specifies the `resource_defs`. This job is then included in the repository and executed with `execute_job`. Please ensure that you are using the correct version of Dagster and that the `@job` decorator is available in your version. If you continue to encounter issues, it may be helpful to consult the latest Dagster documentation or seek support from the Dagster community.

Giacomo Colombo

03/12/2024, 11:55 AM
@Scout what if I need to add more resources to my job in the resource_defs instead of only one?

Scout

03/12/2024, 11:55 AM
If you need to add multiple resources to your job, you can include them all in the `resource_defs` dictionary when defining your job with the `@job` decorator. Each resource is specified by a key that corresponds to the `required_resource_keys` of your assets. Here's an example of how to define a job with multiple resources:
```python
from dagster import asset, job, resource, repository, execute_job
from dagster.core.instance import DagsterInstance

# Define your custom resources
@resource
def custom_resource_1(_):
    # Your resource creation logic for custom_resource_1
    return "my_custom_resource_1"

@resource
def custom_resource_2(_):
    # Your resource creation logic for custom_resource_2
    return "my_custom_resource_2"

# Define your assets
@asset(required_resource_keys={"custom_resource_1"})
def my_asset_1(context):
    # Your asset computation logic that uses context.resources.custom_resource_1
    return "result1"

@asset(required_resource_keys={"custom_resource_2"})
def my_asset_2(context):
    # Your asset computation logic that uses context.resources.custom_resource_2
    return "result2"

# Define a job that includes your assets and the required resources
@job(resource_defs={
    "custom_resource_1": custom_resource_1,
    "custom_resource_2": custom_resource_2
})
def my_job_with_resources():
    my_asset_1()
    my_asset_2()

# Define a repository that includes the job
@repository
def my_repo():
    return [my_job_with_resources]

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=my_job_with_resources,
    instance=instance,
)

# You can inspect the result object for details about the execution
```
In this example, `my_asset_1` requires `custom_resource_1`, and `my_asset_2` requires `custom_resource_2`. Both resources are included in the `resource_defs` when defining `my_job_with_resources`. This way, when the job is executed, both assets have access to their required resources. Remember to replace the placeholder resource creation logic with your actual resource definitions. This approach allows you to provide the necessary resources to your assets when executing the job. If you have additional resources that are not directly required by the assets but are needed for the job, you can also include them in the `resource_defs` dictionary.

Giacomo Colombo

03/12/2024, 12:08 PM
@Scout now I get this error: Error 1: Missing required config entry "ops" at the root. Sample config for missing entry: {'ops': {'my_asset_2': {'inputs': {'my_asset_1': '<selector>'}}}}

Scout

03/12/2024, 12:08 PM
The error message you're encountering suggests that there is a configuration issue related to the dependencies between your assets. In Dagster, when you define assets that depend on each other, you need to specify how the outputs of one asset are passed as inputs to another. However, with software-defined assets, Dagster should handle these dependencies automatically based on the asset dependencies you define in your code. If you're seeing this error, it could be due to a few reasons:
1. Asset Dependencies: Ensure that your assets are correctly defined with dependencies if one asset depends on the output of another. Dagster should infer these dependencies automatically when using the `@asset` decorator.
2. Job Configuration: When you execute a job, Dagster expects a certain structure in the run configuration. If you're manually providing a run configuration, ensure it's correctly structured. However, if your assets are defined correctly, you typically don't need to provide this part of the configuration manually.
3. Dagster Version: There might be a version-specific issue or a bug. Ensure you're using a version of Dagster where the behavior you expect is supported.

Here's a simplified example of how you might define dependent assets:
```python
from dagster import asset, repository, job, execute_job
from dagster.core.instance import DagsterInstance

# Define your assets
@asset
def my_asset_1():
    # Your asset computation logic
    return "result1"

@asset
def my_asset_2(my_asset_1):
    # Your asset computation logic that depends on my_asset_1
    return f"result2 with input {my_asset_1}"

# Define a job that includes your assets
@job
def my_job():
    my_asset_2(my_asset_1())

# Define a repository that includes the job
@repository
def my_repo():
    return [my_job]

# Execute the job
instance = DagsterInstance.get()  # Use the default Dagster instance or provide your own
result = execute_job(
    job=my_job,
    instance=instance,
)

# You can inspect the result object for details about the execution
```
In this example, `my_asset_2` depends on `my_asset_1`, and this dependency is expressed by passing `my_asset_1` as an input to `my_asset_2` within the job definition. Dagster should automatically handle the passing of the output from `my_asset_1` to `my_asset_2`. If you continue to encounter issues, please ensure that your assets are defined correctly with the necessary dependencies, and that you're using a compatible version of Dagster. If the problem persists, it may be helpful to consult the latest Dagster documentation or seek support from the Dagster community.

Giacomo Colombo

03/12/2024, 1:00 PM
@Scout I am noticing that I cannot see the Dagit UI when using execute_job! The localhost webserver does not work. How do I execute the job from the command line instead of via Python code? Maybe in that case it is possible to see the UI and then launch the command to execute the job from another terminal?

Scout

03/12/2024, 1:00 PM
To execute a Dagster job from the command line and view the results in the Dagit UI, you can use the Dagster CLI. Here's how you can do it:
1. Start Dagit: First, you need to start the Dagit webserver. Navigate to the directory containing your Dagster repository and run the following command:
```bash
dagster dev
```
This will start the Dagit webserver, and you should be able to access the Dagit UI by going to http://localhost:3000 in your web browser.
2. Execute the Job: To execute a job from the command line, you can use the `dagster job execute` command. You'll need to specify the file that contains your job definition and the repository. For example, if your job is defined in a file named `my_repo.py`, you would run:
```bash
dagster job execute -f my_repo.py -j my_job
```
Here, `-f` specifies the file containing the repository, and `-j` specifies the name of the job you want to execute.

Make sure that the `DAGSTER_HOME` environment variable is set to a directory where Dagster can write instance data, and that it has the same value in both terminals. This is where run logs, event logs, and other metadata are stored. If `DAGSTER_HOME` is not set, `dagster dev` typically falls back to temporary storage that is discarded when it exits, so runs launched from another process will not appear in the UI. With a shared `DAGSTER_HOME`, a run executed via the CLI is recorded, and you should be able to see it in the Dagit UI under the "Runs" tab. If you encounter any issues with the CLI commands, please ensure that you have the latest version of Dagster installed and that your repository and job definitions are correctly set up. If the problem persists, you may want to consult the Dagster documentation or seek support from the Dagster community for further assistance.

Giacomo Colombo

03/12/2024, 1:11 PM
@Scout When I have resources involved, I still cannot run the job with "execute job" ! Here is my code. Where is the problem??
```python
@job(resource_defs={
    "custom_resource_1": custom_resource_1,
    "custom_resource_2": custom_resource_2,
    "custom_resource_3": custom_resource_3}
)
def unresolved_asset_job():
    my_asset_2(my_asset_1())


@repository
def my_repo():
    return [unresolved_asset_job, my_asset_1, my_asset_2]

resolved_job = my_repo.get_job("unresolved_asset_job")
reconstructable_job = reconstructable(resolved_job)

instance = DagsterInstance.get()
result = execute_job(
    job=reconstructable_job,
    instance=instance,
)
```
The error is : UserWarning: Error loading repository location italydagster. core.errors.DagsterInvalidDefinitionError resource with key 'custom_resource_1' required by op 'my_asset_1' was not provided. Please provide a ResourceDefinition to key 'custom_dataset_destination_resource', or change the required key to one of the following keys which points to an ResourceDefinition: ['io_manager']

Scout

03/12/2024, 1:11 PM
The error message you're seeing indicates that the resource with key `'custom_resource_1'` required by the asset `'my_asset_1'` was not provided when attempting to execute the job. This suggests that there might be a mismatch between the resource keys expected by your assets and the resource definitions provided in the `resource_defs` of your job. Here are a few things to check and correct in your code:
1. Resource Definitions: Ensure that the resource definitions for `custom_resource_1`, `custom_resource_2`, and `custom_resource_3` are correctly defined and imported into the file where you're defining the job.
2. Required Resource Keys: Verify that the `required_resource_keys` for each asset match the keys used in the `resource_defs` of the job. For example, if `my_asset_1` requires `custom_resource_1`, the `required_resource_keys` should be set accordingly:
```python
@asset(required_resource_keys={"custom_resource_1"})
def my_asset_1(context):
    # Your asset computation logic that uses context.resources.custom_resource_1
    return "result1"
```
3. Job Definition: When defining the job, make sure that all required resources are included in the `resource_defs`:
```python
@job(resource_defs={
    "custom_resource_1": custom_resource_1,
    "custom_resource_2": custom_resource_2,
    "custom_resource_3": custom_resource_3
})
def my_job():
    my_asset_2(my_asset_1())
```
4. Repository Definition: In your repository definition, you only need to include the job, not the individual assets, since the job already encapsulates the assets:
```python
@repository
def my_repo():
    return [my_job]
```
5. Execution: When executing the job, you don't need to make it reconstructable if you're executing it in the same process. You can directly execute the resolved job:
```python
instance = DagsterInstance.get()

result = execute_job(
    job=resolved_job,
    instance=instance,
)
```
Make sure that the `execute_job` function is being used correctly according to your version of Dagster. If you're using an older version of Dagster, the API might be different, and you may need to use `resolved_job.execute_in_process()` instead. If you've checked all the above and the error persists, please ensure that your Dagster environment is correctly set up, including the `DAGSTER_HOME` environment variable. If the problem still occurs, you may want to consult the Dagster documentation or seek support from the Dagster community for further assistance.
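To make check 2 concrete, here is a small plain-Python sketch (no Dagster imports) of the comparison involved: the keys each asset requires against the keys the job provides. `find_missing_resource_keys` is a hypothetical helper, not a Dagster API, and the assignment of keys to assets below is an assumption for illustration.

```python
def find_missing_resource_keys(required_by_asset, provided_keys):
    """Return {asset_name: sorted missing keys} for assets whose needs are unmet."""
    provided = set(provided_keys)
    missing = {}
    for asset_name, required in required_by_asset.items():
        gap = sorted(set(required) - provided)
        if gap:
            missing[asset_name] = gap
    return missing


# Assumed requirements; in real code these mirror required_resource_keys
required_by_asset = {
    "my_asset_1": {"custom_resource_1"},
    "my_asset_2": {"custom_resource_2", "custom_resource_3"},
}
# Keys passed in resource_defs; custom_resource_3 is deliberately left out
provided_keys = ["custom_resource_1", "custom_resource_2"]

missing = find_missing_resource_keys(required_by_asset, provided_keys)
print(missing)
```

An empty result means every required key has a matching entry; any non-empty entry corresponds to the kind of "was not provided" error shown above.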

Giacomo Colombo

03/12/2024, 1:30 PM
@Scout i still get an error! dagster._core.errors.DagsterInvalidConfigError: Error in config for job unresolved_asset_job Error 1: Missing required config entry "resources" at the root. Sample config for missing entry: {'resources': {'custom_resource_1': {'config': {'one_dataset_destination': '...'}}, 'custom_resource_2': {'config': {'one_dataset_type': '...'}}, 'custom_resource_3': {'config': {'curr_year': 0}}}}

Scout

03/12/2024, 1:30 PM
The error message you're encountering indicates that the job `unresolved_asset_job` is missing the required configuration for the resources at the root level. This means that when you're executing the job, Dagster expects configuration for the resources `custom_resource_1`, `custom_resource_2`, and `custom_resource_3`, but it's not being provided. When you define resources that require configuration, you need to supply that configuration when you execute the job. This can be done in a few different ways, depending on how you want to execute the job. If you want to execute the job from the command line and view the results in the Dagit UI, you can do the following:
1. Start Dagit: Run the Dagit webserver by navigating to the directory containing your Dagster repository and running the command `dagit`. This will allow you to access the Dagit UI at http://localhost:3000.
2. Execute the Job with Configuration: Use the Dagster CLI to execute the job with the necessary configuration. You can provide the configuration as a YAML file or as JSON. For example, if you have a YAML file named `config.yaml` with the following content:
```yaml
resources:
  custom_resource_1:
    config:
      one_dataset_destination: "value_for_dataset_destination"
  custom_resource_2:
    config:
      one_dataset_type: "value_for_dataset_type"
  custom_resource_3:
    config:
      curr_year: 2023
```
You can execute the job using the following command:
```bash
dagster job execute -f my_repo.py -j unresolved_asset_job -c config.yaml
```
This command specifies the file containing the repository (`-f my_repo.py`), the name of the job (`-j unresolved_asset_job`), and the configuration file (`-c config.yaml`). Make sure that the configuration keys and values in `config.yaml` match the expected configuration schema for your resources. The placeholders `value_for_dataset_destination`, `value_for_dataset_type`, and `2023` should be replaced with the actual configuration values required by your resources. By executing the job this way, the run will be recorded, and you should be able to see it in the Dagit UI under the "Runs" tab. If you continue to encounter issues, please ensure that your resources are correctly defined with the necessary configuration schema, and that you're using a compatible version of Dagster. If the problem persists, you may want to consult the Dagster documentation or seek support from the Dagster community for further assistance.
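The nested shape in the error's sample config (`resources`, then the resource key, then `config`) can also be produced from plain per-resource settings in Python. The sketch below is only an illustration of that shape: `build_resource_run_config` is a hypothetical helper rather than a Dagster function, and the values are the same placeholders as in the YAML above.

```python
import json


def build_resource_run_config(resource_settings):
    """Wrap {resource_key: {field: value}} in the resources/config nesting."""
    return {
        "resources": {
            key: {"config": dict(values)}
            for key, values in resource_settings.items()
        }
    }


# Placeholder values copied from the YAML example
resource_settings = {
    "custom_resource_1": {"one_dataset_destination": "value_for_dataset_destination"},
    "custom_resource_2": {"one_dataset_type": "value_for_dataset_type"},
    "custom_resource_3": {"curr_year": 2023},
}

run_config = build_resource_run_config(resource_settings)

# Pretty-print the nested dictionary to compare it with the sample config
print(json.dumps(run_config, indent=2))
```

A dictionary of this shape is what the `run_config` argument mentioned earlier in the thread is expected to carry.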

Giacomo Colombo

03/12/2024, 1:42 PM
@Scout no, I don't like to use a YAML file if not strictly necessary. I don't want to execute the job from the command line. The problem is that normally (without forcing a job execution) I pass the resources in my Dagster Definitions in this way:
```python
whole_defs = Definitions(
    assets=all_assets,
    jobs=[....],
    resources = {
            "custom_resource_1": {"curr_year": 2023},
            "custom_resource_2": {"one_dataset_type": 'businesses'},
            "custom_resource_3": {"one_dataset_destination": 'managed'}
     }
)
```
How do I replicate this behaviour so that the resources are used correctly? Remember that the error is: dagster._core.errors.DagsterInvalidConfigError: Error in config for job unresolved_asset_job Error 1: Missing required config entry "resources" at the root. Sample config for missing entry: {'resources': {'custom_resource_1': {'config': {'one_dataset_destination': '...'}}, 'custom_resource_2': {'config': {'one_dataset_type': '...'}}, 'custom_resource_3': {'config': {'curr_year': 0}}}}
s

Scout

03/12/2024, 1:42 PM
If you want to execute a job programmatically without using a YAML file for configuration and you want to replicate the behavior of the
Definitions
object where you specify resources, you can use the
execute_in_process
method on the job definition. This method allows you to pass the configuration directly as a Python dictionary. Here's how you can do it:
Copy code
from dagster import DagsterInstance, job

# execute_in_process is a method on the job, so it is not imported from dagster.
# custom_resource_1..3 and my_asset_1/my_asset_2 are assumed to be imported from your own modules.

# Assuming you have a job definition like this:
@job(resource_defs={
    "custom_resource_1": custom_resource_1,
    "custom_resource_2": custom_resource_2,
    "custom_resource_3": custom_resource_3,
})
def unresolved_asset_job():
    my_asset_2(my_asset_1())

# Define the configuration for your resources as a Python dictionary
run_config = {
    "resources": {
        "custom_resource_1": {"config": {"curr_year": 2023}},
        "custom_resource_2": {"config": {"one_dataset_type": "businesses"}},
        "custom_resource_3": {"config": {"one_dataset_destination": "managed"}},
    }
}

# Execute the job in process with the provided configuration
result = unresolved_asset_job.execute_in_process(
    run_config=run_config,
    instance=DagsterInstance.get(),
)

# You can inspect the result object for details about the execution
In this example,
execute_in_process
is called on the
unresolved_asset_job
job definition, and the
run_config
dictionary is passed to provide the necessary configuration for the resources. The
DagsterInstance.get()
method is used to get the default Dagster instance, but you can also provide a specific instance if needed. This approach allows you to execute the job within the same Python process and pass the configuration directly without the need for a YAML file. The execution result is captured in the
result
object, which you can inspect to see the details of the execution. Please ensure that the
custom_resource_1
,
custom_resource_2
, and
custom_resource_3
resource definitions are correctly defined and imported in your script. Also, make sure that the configuration keys and values match the expected configuration schema for your resources.
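The only difference between the dictionary passed to Definitions earlier in the thread and the run_config shown above is the extra "resources" and "config" nesting. A minimal sketch of a helper that adds that nesting is below; build_run_config is a hypothetical name used only for illustration and is not part of Dagster.

```python
from typing import Any, Dict


def build_run_config(resource_settings: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap plain per-resource settings in the run_config layout.

    Hypothetical helper (not a Dagster API): each resource's settings are
    placed under a "config" key, and all resources under "resources".
    """
    return {
        "resources": {
            name: {"config": dict(settings)}
            for name, settings in resource_settings.items()
        }
    }


# Same values as the Definitions snippet earlier in the thread
resource_settings = {
    "custom_resource_1": {"curr_year": 2023},
    "custom_resource_2": {"one_dataset_type": "businesses"},
    "custom_resource_3": {"one_dataset_destination": "managed"},
}

run_config = build_run_config(resource_settings)
```

The resulting run_config can then be passed to execute_in_process exactly as in the example above.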
g

Giacomo Colombo

03/12/2024, 1:51 PM
@Scout OK!!! Now it works!!! I love you!! I usually don't like to use a config.yaml in a huge project. In a case like this it seems better to define a specific run_config dict in the .py file. Am I right to follow this idea and avoid the YAML? When, in general, is it discouraged to use a YAML file?
s

Scout

03/12/2024, 1:51 PM
I'm glad to hear that it's working for you now! Regarding the use of a
run_config
dictionary in a Python file versus a YAML configuration file, it really depends on your specific use case and preferences. Both methods are valid, and each has its own advantages. Python
run_config
dictionary:
- Version Control: Easier to track changes in version control systems like Git.
- Dynamic Configuration: Allows for dynamic generation of configuration, which can be useful if your configuration needs to change based on certain conditions at runtime.
- Code Reuse: Enables the reuse of Python variables and functions within your configuration, which can reduce duplication.
- Error Checking: Benefits from Python's syntax checking and can be checked by linters or IDEs for potential errors before runtime.
YAML configuration file:
- Separation of Concerns: Keeps configuration separate from code, which can make the codebase cleaner and more modular.
- Non-technical Users: Can be more accessible for non-developers or less technical team members who may need to modify the configuration.
- Standardization: YAML is a widely used standard for configuration files and can be easier to understand for those familiar with it.
- Tooling: Some tools and systems expect configuration to be provided in YAML or another declarative format.
It's generally discouraged to use a YAML file when:
- The configuration needs to be generated dynamically based on complex logic that is cumbersome to express in YAML.
- The configuration is closely tied to the code and changes frequently with the code.
- You want to leverage Python's error checking and linting capabilities for your configuration.
- You prefer to keep everything in Python for consistency and ease of use.
Ultimately, the choice between a Python
run_config
dictionary and a YAML file should be based on what works best for your team, your project's requirements, and your deployment practices. It's also possible to use a combination of both, where some base configuration is defined in YAML and overridden or extended by a Python
run_config
dictionary when needed.
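As a rough illustration of the combined approach, the sketch below merges a base configuration with Python overrides. The base dictionary stands in for whatever you would load from a YAML file, and merge_config is a hypothetical helper, not a Dagster function.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from override replace those in base.

    Nested dictionaries are merged key by key; any other value is replaced.
    Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for a dictionary loaded from a YAML file (assumption for this sketch)
base_config = {
    "resources": {
        "custom_resource_1": {"config": {"curr_year": 2023}},
        "custom_resource_2": {"config": {"one_dataset_type": "businesses"}},
    }
}

# Python-side override for a single value
override = {"resources": {"custom_resource_1": {"config": {"curr_year": 2024}}}}

run_config = merge_config(base_config, override)
```

Only curr_year changes in the merged result; the other entries are carried over from the base configuration.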
g

Giacomo Colombo

03/12/2024, 4:51 PM
@Scout can I put all the code above in another module in a main?
s

Scout

03/12/2024, 4:51 PM
Yes, you can organize your Dagster code by placing the job and asset definitions in one module and then invoking them from a
main
function in another module. This is a common practice to separate the definition of your data pipelines from the execution logic. Here's an example of how you might structure your code: assets.py (or whatever you name your module with asset definitions)
Copy code
from dagster import asset, resource

# Define your custom resources
@resource
def custom_resource_1(_):
    # Your resource creation logic for custom_resource_1
    return "my_custom_resource_1"

# Define your assets
@asset(required_resource_keys={"custom_resource_1"})
def my_asset_1(context):
    # Your asset computation logic that uses context.resources.custom_resource_1
    return "result1"

# ... other assets and resources ...
jobs.py (or whatever you name your module with job definitions)
Copy code
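from dagster import job

# Absolute import so the modules work when main.py is run directly as a script;
# use a package-relative import instead if these files live inside a package.
from assets import my_asset_1, custom_resource_1

# Define a job that includes your assets and the required resources
@job(resource_defs={"custom_resource_1": custom_resource_1})
def my_job_with_resources():
    my_asset_1()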
main.py (the entry point for executing your job)
Copy code
from dagster import DagsterInstance

# Absolute import so that "python main.py" works from this directory
from jobs import my_job_with_resources

def main():
    # Define the configuration for your resources as a Python dictionary
    run_config = {
        "resources": {
            "custom_resource_1": {
                "config": {
                    "curr_year": 2023
                }
            },
            # ... other resources ...
        }
    }

    # Execute the job in process with the provided configuration
    # (execute_in_process is a method on the job, not an import from dagster)
    result = my_job_with_resources.execute_in_process(
        run_config=run_config,
        instance=DagsterInstance.get(),
    )

    # You can inspect the result object for details about the execution
    print(result.success)

if __name__ == "__main__":
    main()
In this setup, you have separated the definitions of your assets and jobs from the execution logic. The
main.py
file serves as the entry point to run your job, and it can be executed directly with a Python interpreter. Make sure to adjust the imports based on your actual module and file names. If you're using a package structure, you might need to adjust the import statements to reflect the correct paths to your modules. This structure helps keep your code organized and makes it easier to maintain and understand. It also allows you to potentially reuse the
assets.py
and
jobs.py
modules in other contexts or entry points if needed.
g

Giacomo Colombo

03/12/2024, 4:56 PM
@Scout OK, but how do I run it now? Like a normal Python file? I don't have to launch "dagster dev", right?
s

Scout

03/12/2024, 4:56 PM
Correct, you do not need to launch "dagster dev" to run the job programmatically from a Python script. You can execute your
main.py
script just like any normal Python file. Here's how you can do it: 1. Open a terminal or command prompt. 2. Navigate to the directory containing your
main.py
file. 3. Run the script using the Python interpreter:
Copy code
python main.py
This command will execute the
main
function in your
main.py
file, which in turn will execute the job with the provided configuration. Remember to ensure that your Python environment has all the necessary dependencies installed, including Dagster and any other libraries your assets and resources require. By running the Python script directly, you're invoking the Dagster job execution within the same process as your script, and no Dagster UI (Dagit) is started for this execution. Because the example passes instance=DagsterInstance.get(), which requires the DAGSTER_HOME environment variable to be set, the run is recorded in that instance and should be visible under "Runs" if you later open the UI against the same instance. This approach is useful for testing, automation, and production environments where you may not need the UI.
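Since main.py is an ordinary script, values such as curr_year could also come from command-line arguments instead of being hard-coded. The sketch below uses only the standard library; the --curr-year flag and the parse_args and make_run_config helpers are hypothetical names chosen for this example.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line options; argv=None means read from sys.argv."""
    parser = argparse.ArgumentParser(description="Run the job for a chosen year.")
    # Hypothetical flag for this sketch; 2023 mirrors the value used in the thread
    parser.add_argument("--curr-year", type=int, default=2023)
    return parser.parse_args(argv)


def make_run_config(curr_year: int) -> Dict[str, Any]:
    """Build the run_config for custom_resource_1 from the parsed year."""
    return {"resources": {"custom_resource_1": {"config": {"curr_year": curr_year}}}}


# Equivalent to running: python main.py --curr-year 2024
args = parse_args(["--curr-year", "2024"])
run_config = make_run_config(args.curr_year)
```

Inside main() you would call parse_args() with no arguments and pass the resulting run_config to execute_in_process as before.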