# ask-community
a
As a result of upgrading an instance of Dagster from 11 -> 13, we are no longer able to kick off a run using
dagster.cli.pipeline.execute_launch_command
. We are moving the codebase to execute via
JobDefinition.execute_in_process
. One blocker we are seeing is that we cannot pass a
run_id
into
execute_in_process
. Is there some other way for us to configure the
run_id
, or an equivalent to
execute_launch_command
that we should now be using to kick off a job? I notice that the run_id is grabbed from the return value of
create_run_for_pipeline
, in
/dagster/core/execution/execute_in_process.py:50
. Thanks in advance!
j
cc @chris
c
Hey Andy. What problems are you having with kicking off a run using
execute_launch_command
? I don't believe getting rid of that CLI was intentional
If you are migrating to the graph-job-op APIs, then there is the
dagster job
cli, which should have all the same functionality, but arguments are targeted in terms of the new APIs: https://docs.dagster.io/_apidocs/cli#dagster-job
a
@chris Thanks for the response! We are currently using
execute_launch_command
from a python script entrypoint. It is essentially a REST endpoint which will kick off a Dagster run. This endpoint receives a run-id which is generated upstream by another service to group together all assets before and including the job run. Is there a programmatic (from the python side of things) way using the new API to launch a job with run_id?
@chris To further explain, we may still be able to use
execute_launch_command
but I think there are some problems surrounding usage of mode definitions, which we've converted to mode-specific jobs in our repo.
c
Ah I see, you're using the actual python function execute_launch_command
a
Yea, that's correct.
c
You should still be able to use that function; underlying a job is a mode called
default
.
a
So is there no call to deprecate
dagster.cli.pipeline.*
? I perhaps mistakenly assumed we should move away from
"pipeline"
based stuff and embrace jobs/graph/`JobDefinition`
c
Since there's only the one mode though, you probably don't even need to specify that. Is there a particular error you are seeing?
a
I believe there was. Let me confer with my lead engineer and I'll update.
c
We do encourage moving away from
pipeline
based stuff
๐Ÿ‘ 1
execute_launch_command
isn't pipeline specific, that's just an unfortunately un-updated module path
a
I think it's a question of the config we pass to it. In our current release, we specify a pipeline and a mode. I was having some trouble determining how to specify a job from the repo instead of a pipeline, but perhaps it's as simple as
pipeline -> job
in the config argument.
# build run_config dictionary
run_config = json.dumps({
    'loggers': {
        'console': {},
        'webservice': {
            'config': {
                'endpoint': settings.LOGGING_ENDPOINT,
            },
        },
    },
})
# launch pipeline with run configuration
result = execute_launch_command(instance, {
    'mode': settings.PIPELINE_MODE,
    'repository': 'main_repository',
    'run_id': queue_id,
    'location': 'jobs.main_job',
    'pipeline': 'main_pipeline',
    'config_json': run_config,
})
c
kinda confusing, but you can leave the key as
pipeline
and change the value to the job's name
🎉 1
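Applied to the snippet earlier in the thread, only the value under the legacy `pipeline` key changes. A minimal sketch, assuming the converted job is named `main_job` (the real job name may differ), with hypothetical stand-ins for `settings.LOGGING_ENDPOINT` and `queue_id`:

```python
import json

# Hypothetical values standing in for settings.LOGGING_ENDPOINT / queue_id
logging_endpoint = "https://logs.example.com/ingest"
queue_id = "run-id-from-upstream-service"

run_config = json.dumps({
    'loggers': {
        'console': {},
        'webservice': {
            'config': {
                'endpoint': logging_endpoint,
            },
        },
    },
})

# Same argument dict as before: the key stays 'pipeline', the value
# becomes the job's name. 'mode' can likely be dropped, since a job
# carries only the single 'default' mode.
launch_args = {
    'repository': 'main_repository',
    'run_id': queue_id,
    'location': 'jobs.main_job',
    'pipeline': 'main_job',  # job name, under the legacy key
    'config_json': run_config,
}
```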
a
Ok, great. I'm waiting to hear back about specific errors surrounding this issue so will update if that information comes along.
c
Aside from being able to provide a run_id value, is there any reason you're using
execute_launch_command
? A lot of the pain here is a result of this being an internal API; I don't think it's exposed in our API docs anywhere
a
It has been some time since we moved to launching like this, but if memory serves correctly, the
run_id
injection was part of it, and the other part seems to have been that there was no good way for us to kick off a run programmatically in the background/asynchronously.
I may be able to re-label assets along the way with the outer scope run-id, and would definitely be open to moving toward a more standard approach to programmatically kicking off a job.
c
If you're using a non-default run launcher, then I'd recommend calling the actual CLI from a python script. Since
execute_launch_command
isn't an external facing API, there's no guarantees of support for it
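One way to follow that recommendation while keeping a Python entrypoint is to shell out to the `dagster job` CLI. A sketch under stated assumptions: the job and module names are hypothetical, and the flag names should be verified against `dagster job launch --help` in the installed version:

```python
import json
import subprocess  # used only if you uncomment the launch line below

def build_job_launch_command(job_name, module_name, run_config):
    """Assemble a `dagster job launch` invocation as an argv list.

    Flag names are assumptions; double-check them against the
    installed CLI's --help output.
    """
    return [
        "dagster", "job", "launch",
        "--job", job_name,
        "--module-name", module_name,
        "--config-json", json.dumps(run_config),
    ]

cmd = build_job_launch_command(
    job_name="main_job",          # hypothetical job name
    module_name="jobs.main_job",  # hypothetical module location
    run_config={"loggers": {"console": {}}},
)
# subprocess.run(cmd, check=True)  # actually launch via the CLI
```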
If you aren't using any special run launching / non default executor, then
execute_in_process
should be fine for your use case. I've created https://github.com/dagster-io/dagster/issues/7284, I think this should be pretty easily resolvable
🎉 1
👍 1
a
Absolutely agreed and understood. It has been a pain point since our first delivery of this project; I'll see what it would take to bring the project in line with existing standards. Is it possible that custom run launchers, or a more programmatic approach to launching runs, would be a feature in the future?
That would be awesome @chris. I'd love to stop leveraging private api internals.
c
We currently do support custom run launchers, but only via the CLI, dagit, and graphql. A first-class python entrypoint for launching a run in another process is on the radar, but nothing concrete at the moment
a
Very cool. I think being able to provide run-id as input to the
execute_in_process
method is a huge help.
Thanks much for your attention on this!
c
no worries! Hopefully we can get this out for the upcoming release.
a
That would rock. We are currently moving from 11 -> 13 -> 14, so maybe we can just step from 13 to latest when this hits.
c
yea 13 -> 14 shouldn't be too much additional pain I think
a
It seems pretty reasonable so far. The 11 -> 13 branch has been a challenge.
c
@Andy H just to follow up,
run_id
on `execute_in_process` made it into the release coming out tomorrow.
big dag eyes 1
a
@chris I will follow up with my team, thanks for adding this, it's a lifesaver!
๐Ÿ‘ 1