d
What's the best way of launching a pipeline through code in a non-blocking way? Both `execute_pipeline` and `execute_pipeline_iterator` seem to always block, and I tried copying how the CLI does it (https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/cli/pipeline.py#L405-L413) but both my code and the CLI are throwing this:
$ DAGSTER_HOME=tmp/dagster dagster pipeline launch my_pipeline -y pipelines/repository.yaml -e pipelines/my_pipeline_config.yaml 
Traceback (most recent call last):
  File "/home/danny/.pyenv/versions/project-3.7.7/bin/dagster", line 8, in <module>
    sys.exit(main())
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/dagster/cli/__init__.py", line 38, in main
    cli(obj={})  # pylint:disable=E1123
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/dagster/core/telemetry.py", line 231, in wrap
    result = f(*args, **kwargs)
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/dagster/cli/pipeline.py", line 417, in pipeline_launch_command
    return instance.launch_run(pipeline_run.run_id)
  File "/home/danny/.pyenv/versions/3.7.7/envs/project-3.7.7/lib/python3.7/site-packages/dagster/core/instance/__init__.py", line 788, in launch_run
    return self._run_launcher.launch_run(self, run)
AttributeError: 'NoneType' object has no attribute 'launch_run'
In case it matters: I haven't configured a dagster.yaml file anywhere. If it's a `run_launcher` config that I'm missing, what should it be if I'm running my code or the CLI on the same machine where dagit is running on localhost?
$ dagster --version
dagster, version 0.7.12
a
What's the context? A `run_launcher` is probably the right answer, but we likely don't have one that satisfies your constraints yet (we're working on it at the moment).
m
@sashank ^^ if we had `launch_run` this would be straightforward
d
Thanks. @alex, will adding a `run_launcher` config to dagster.yaml fix the exception? If so, even if it doesn't give me the non-blocking behaviour I want, what should it be for targeting a local dagit instance? This doesn't work:
$ cat dagster.yaml
run_launcher:
  module: dagster.core.launcher
  class: RunLauncher
RE working on a way to launch via code in a non-blocking way: is there already a GitHub issue where I can track this work?
a
> what should it be for targeting a local dagit instance?
That's a piece we don't really have right now, but are working on. Can you talk more about what exactly you are doing? Depending on your setup, you may be able to just use the `threading` or `multiprocessing` libraries in Python to move your `execute_pipeline` call off the main thread and make it non-blocking.
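A minimal sketch of the threading approach suggested above. `run_sibling_pipeline` here is a hypothetical stand-in for a function that wraps dagster's `execute_pipeline` call; only the stdlib `threading` usage is shown for real.

```python
# Sketch: fire off a pipeline run without blocking the caller.
# `run_sibling_pipeline` is a hypothetical placeholder; in real code it
# would call dagster's execute_pipeline(...) with your pipeline and config.
import threading

results = []

def run_sibling_pipeline():
    # e.g. execute_pipeline(sibling_pipeline, environment_dict=config)
    results.append("sibling pipeline finished")

t = threading.Thread(target=run_sibling_pipeline)
t.start()  # returns immediately; the calling solid can keep going
# ... do other work here ...
t.join()   # optionally wait at the end if you need the result
```

`multiprocessing.Process` has the same `start()`/`join()` interface if you want the run in a separate interpreter rather than a thread of the current one.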
d
I'm trying to get a solid to launch a sibling pipeline, which should run in parallel to the current one, and this solid should immediately cede control back to its own pipeline so it can continue
@alex threading/multiprocessing is a good workaround for the non-blocking issue, thanks. RE targeting local dagit: should I look into setting up and targeting a local GraphQL endpoint via a `dagster_graphql.launcher.RemoteDagitRunLauncher` config for the CLI?
a
That could work in the interim. That dagit would have to stay up, and it can't be the same one you initiated the initial launch from, due to a weird Flask bug with making a request to yourself.
It's worth trying for now.
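For reference, a dagster.yaml sketch for pointing a `run_launcher` at a running dagit. The module and class names come from the discussion above; the `config`/`address` keys are an assumption based on the 0.7.x-era setup, so verify them against the deploy docs for your version:

```yaml
# dagster.yaml (in $DAGSTER_HOME) - hypothetical sketch; verify the
# config keys against your dagster version before relying on it
run_launcher:
  module: dagster_graphql.launcher
  class: RemoteDagitRunLauncher
  config:
    address: http://localhost:3000  # wherever your dagit webserver listens
```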
d
My dagit would be up permanently.
> it cant be the same one you initiated the initial launch from
@alex just to clarify: do you mean that if my dagit is executing a solid, and that solid is the one that starts a new thread/process to `execute_pipeline()` the sibling pipeline via the localhost GraphQL endpoint, I'll hit this Flask bug?
a
I think the only real issue is hitting the `Launch Run` button in dagit if you are pointing at yourself, but keep an eye out for weird timeouts trying to reach the dagit server.
d
@alex ok, thanks so much for the help! Really solid project, I'm very much enjoying my current deep dive into it 👍 👍
a
And just to be clear: I would consider this particular approach risky and likely difficult to migrate forward, since we are actively changing how these systems work.
But I am curious if it works.
d
@alex I just noticed the tutorial shows clearly that a local CLI can execute a pipeline in a local dagit server: https://docs.dagster.io/docs/tutorial/basics#executing-our-first-pipeline. So I'm a bit confused. I guess I was looking at the wrong command (`pipeline launch` instead of `pipeline execute`)? Why would I need GraphQL at all if I can just call `execute_pipeline()` in my code via threading/multiprocessing and get local non-blocking execution?
a
> a local cli can execute a pipeline in a local dagit server
That CLI doesn't go through dagit - the CLI command runs `execute_pipeline` in-process.
d
Ahh. Gotcha 🙂
a
dagit is a webserver, so when you click `Start Execution` in the web tool, a request is sent to the server, which creates a subprocess and effectively runs `execute_pipeline` in there.
d
Right, makes sense
a
A `RunLauncher` is used to transfer responsibility of execution to some other process/system.
So the difference between threading/multiprocessing/`subprocess.Popen` vs using `launch_run` is who owns the process doing the execution, and all the related complexity of that process's life cycle.
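To make that ownership distinction concrete, here is a stdlib-only sketch of the "caller owns the process" side of the trade-off. The spawned command is a placeholder standing in for something like a `dagster pipeline execute ...` invocation:

```python
# Sketch: hand execution to a child process that WE own.
# The command below is a placeholder workload; in real code it might be
# ["dagster", "pipeline", "execute", ...] with your repository config.
import subprocess
import sys

child = subprocess.Popen(
    [sys.executable, "-c", "print('pipeline ran')"]
)
# The caller owns this process: crashes, hangs, and cleanup are our
# problem. A launch_run-style hand-off transfers that responsibility
# to another system instead.
returncode = child.wait()
```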
d
Yep. So GraphQL seems like the way to go here. It lets me run a central dagit webserver that'll accept requests and won't care if they come from a local or remote CLI.
a
Ya, the `RemoteDagitRunLauncher` is a way to say "there is dagit running at `$WEB_ADDRESS`, go tell it to do this pipeline execution".
You'll need to make sure that you're using the same `DagsterInstance` across the board: https://docs.dagster.io/docs/deploying/instance
d
> we are actively changing how these systems work
Does "change" here mean you're tweaking the API, or do you mean you're potentially phasing out GraphQL and/or planning some other major architectural change to everything you've just described above?
I don't mind tweaking my code to call new APIs, but I'd rather not bake in an entire architecture that is orthogonal to where the project is going...
a
Well, there's currently nothing besides `pipeline_name` used to communicate that we are in fact talking about the same pipeline when we do hand-offs, so a lot of the changes are introducing the structure necessary to improve that situation.
We've discussed removing `RemoteDagitRunLauncher`, but `RunLauncher` is meant to be a user-extendable class, so you could always implement your own variant. We are also hoping to write better ones - i.e. a background daemon responsible for processes instead of a web server - so it might just be a replacement situation.
`dagit` and `dagster-graphql` are not going anywhere, but a lot is in flux, so I just wanted to give you fair warning.
d
Got it, thanks
w
@Danny I had a similar use case and am launching pipelines with a GraphQL request like this: https://gist.github.com/w-bonelli/ac73d9430236b1947a4a202ad1143392