# ask-community
a
As a result of upgrading an instance of Dagster from 11 -> 13, we are no longer able to kick off a run using
dagster.cli.pipeline.execute_launch_command
. We are moving the codebase to execute via
JobDefinition.execute_in_process
. One blocker we are seeing is that we cannot pass a
run_id
into
execute_in_process
. Is there some other way for us to configure the
run_id
, or an equivalent to
execute_launch_command
that we should now be using to kick off a job? I notice that the run_id is grabbed from the return value of
create_run_for_pipeline
, in
/dagster/core/execution/execute_in_process.py:50
. Thanks in advance!
j
cc @chris
c
Hey Andy. What problems are you having with kicking off a run using
execute_launch_command
? I don't believe getting rid of that CLI was intentional
If you are migrating to the graph-job-op APIs, then there is the
dagster job
cli, which should have all the same functionality, but arguments are targeted in terms of the new APIs: https://docs.dagster.io/_apidocs/cli#dagster-job
a
@chris Thanks for the response! We are currently using
execute_launch_command
from a python script entrypoint. It is essentially a REST endpoint which will kick off a Dagster run. This endpoint receives a run-id which is generated upstream by another service to group together all assets before and including the job run. Is there a programmatic (from the python side of things) way using the new API to launch a job with run_id?
@chris To further explain, we may still be able to use
execute_launch_command
but I think there are some problems surrounding usage of mode definitions, which we've converted to mode-specific jobs in our repo.
c
Ah I see, you're using the actual python function execute_launch_command
a
Yea, that's correct.
c
You should still be able to use that function; underlying a job is a mode called
default
.
a
So is there no call to deprecate
dagster.cli.pipeline.*
? I perhaps mistakenly assumed we should move away from
"pipeline"
based stuff and embrace jobs/graph/`JobDefinition`
c
Since there's only the one mode though, you probably don't even need to specify that. Is there a particular error you are seeing?
a
I believe there was. Let me confer with my lead engineer and I'll update.
c
We do encourage moving away from
pipeline
based stuff
๐Ÿ‘ 1
execute_launch_command
isn't pipeline specific, that's just an unfortunately un-updated module path
a
I think it's a question of the config we pass to it. In our current release, we specify a pipeline and a mode. I was having some trouble determining how to specify a job from the repo instead of a pipeline, but perhaps it's as simple as
pipeline -> job
in the config argument.
# build run_config dictionary
run_config = json.dumps({
    'loggers': {
        'console': {},
        'webservice': {
            'config': {
                'endpoint': settings.LOGGING_ENDPOINT,
            },
        },
    },
})
# launch pipeline with run configuration
result = execute_launch_command(instance, {
    'mode': settings.PIPELINE_MODE,
    'repository': 'main_repository',
    'run_id': queue_id,
    'location': 'jobs.main_job',
    'pipeline': 'main_pipeline',
    'config_json': run_config,
})
c
kinda confusing, but you can leave the key as
pipeline
and change the value to the job's name
🎉 1
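Applied to the snippet earlier in the thread, only the value under the legacy `pipeline` key changes. A minimal sketch, assuming the converted job is named `main_job` (the real job name may differ), with hypothetical stand-ins for `settings.LOGGING_ENDPOINT` and `queue_id`:

```python
import json

# Hypothetical values standing in for settings.LOGGING_ENDPOINT / queue_id
logging_endpoint = "https://logs.example.com/ingest"
queue_id = "run-id-from-upstream-service"

run_config = json.dumps({
    'loggers': {
        'console': {},
        'webservice': {
            'config': {
                'endpoint': logging_endpoint,
            },
        },
    },
})

# Same argument dict as before: the key stays 'pipeline', the value
# becomes the job's name. 'mode' can likely be dropped, since a job
# carries only the single 'default' mode.
launch_args = {
    'repository': 'main_repository',
    'run_id': queue_id,
    'location': 'jobs.main_job',
    'pipeline': 'main_job',  # job name, under the legacy key
    'config_json': run_config,
}
```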
a
Ok, great. I'm waiting to hear back about specific errors surrounding this issue so will update if that information comes along.
c
Aside from being able to provide a run_id value, is there any reason you're using
execute_launch_command
? A lot of the pain here is a result of this being an internal API; I don't think it's exposed in our API docs anywhere
a
It has been some time since we moved to launching like this, but if memory serves correctly, the
run_id
injection was part of it, and the other part seems to have been that there was no good way for us to kick off a run programmatically in the background/asynchronously.
I may be able to re-label assets along the way with the outer scope run-id, and would definitely be open to moving toward a more standard approach to programmatically kicking off a job.
c
If you're using a non-default run launcher, then I'd recommend calling the actual CLI from a python script. Since
execute_launch_command
isn't an external facing API, there's no guarantees of support for it
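One way to follow that recommendation while keeping a Python entrypoint is to shell out to the `dagster job` CLI. A sketch under stated assumptions: the job and module names are hypothetical, and the flag names should be verified against `dagster job launch --help` in the installed version:

```python
import json
import subprocess  # used only if you uncomment the launch line below

def build_job_launch_command(job_name, module_name, run_config):
    """Assemble a `dagster job launch` invocation as an argv list.

    Flag names are assumptions; double-check them against the
    installed CLI's --help output.
    """
    return [
        "dagster", "job", "launch",
        "--job", job_name,
        "--module-name", module_name,
        "--config-json", json.dumps(run_config),
    ]

cmd = build_job_launch_command(
    job_name="main_job",          # hypothetical job name
    module_name="jobs.main_job",  # hypothetical module location
    run_config={"loggers": {"console": {}}},
)
# subprocess.run(cmd, check=True)  # actually launch via the CLI
```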
If you aren't using any special run launching / non default executor, then
execute_in_process
should be fine for your use case. I've created https://github.com/dagster-io/dagster/issues/7284, I think this should be pretty easily resolvable
🎉 1
👍 1
a
Absolutely agreed and understood. It has been a pain point since our first delivery of this project; I'll see what it would take to bring the project in line with existing standards. Is it possible that custom run launchers, or a more programmatic approach to launching runs, would be a feature in the future?
That would be awesome @chris. I'd love to stop leveraging private api internals.
c
We currently do support custom run launchers, but only via the CLI, dagit, and graphql. A first-class python entrypoint for launching a run in another process is on the radar, but nothing concrete at the moment
a
Very cool. I think being able to provide run-id as input to the
execute_in_process
method is a huge help.
Thanks much for your attention on this!
c
no worries! Hopefully we can get this out for the upcoming release.
a
That would rock. We are currently moving from 11 -> 13 -> 14, so maybe we can just step from 13 to latest when this hits.
c
yea 13 -> 14 shouldn't be too much additional pain I think
a
It seems pretty reasonable so far. The 11 -> 13 branch has been a challenge.
c
@Andy H just to follow up,
run_id
on `execute_in_process` made it into the release coming out tomorrow.
big dag eyes 1
a
@chris I will follow up with my team, thanks for adding this, it's a lifesaver!
๐Ÿ‘ 1