b
I'm having trouble executing a pipeline using the multi-process executor. it looks like i need to wrap my pipeline in `reconstructable`, but as soon as I do that I can't include it in a repository. the trimmed backtrace is this, although i think there's a bug there, and i think the actual root is that the `repository` decorator doesn't accept `ReconstructablePipeline` objects:
File "/home/ben/repos/dataplatform-poc/pipelines/dataplatform/repository.py", line 6, in <module>
    @repository
  File "/home/ben/.pyenv/versions/3.7.5/envs/dataplatform-poc/lib/python3.7/site-packages/dagster/core/definitions/decorators/repository.py", line 225, in repository
    return _Repository()(name)
  File "/home/ben/.pyenv/versions/3.7.5/envs/dataplatform-poc/lib/python3.7/site-packages/dagster/core/definitions/decorators/repository.py", line 44, in __call__
    bad_definitions.append(i, type(definition))
TypeError: append() takes exactly one argument (2 given)
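For reference, a minimal sketch of the pattern in question, with made-up solid/pipeline names, assuming the Dagster APIs of this era: a repository should return plain pipeline definitions, and `reconstructable` belongs outside it.

```python
from dagster import pipeline, reconstructable, repository, solid

@solid
def do_something(_):
    return 1

@pipeline
def my_pipeline():
    do_something()

# Fine: the repository holds the plain PipelineDefinition.
@repository
def my_repo():
    return [my_pipeline]

# The variant that produces the traceback above -- @repository
# does not accept ReconstructablePipeline objects:
#
# @repository
# def broken_repo():
#     return [reconstructable(my_pipeline)]
```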
a
how are you executing your pipeline? you should only need to wrap it in `reconstructable` at the call site
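A sketch of what that call site looks like, reusing the hypothetical `my_pipeline` from above; the `storage` and `instance` bits are what multiprocess execution required on Dagster versions of this era:

```python
from dagster import DagsterInstance, execute_pipeline, reconstructable

result = execute_pipeline(
    reconstructable(my_pipeline),  # wrap at the call site only
    run_config={
        "execution": {"multiprocess": {}},
        # versions of this era need persistent intermediate storage
        # for multiprocess execution
        "storage": {"filesystem": {}},
    },
    instance=DagsterInstance.get(),  # and a persistent DagsterInstance
)
assert result.success
```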
b
i'm using dagster-databricks; everything works fine with the in-process executor, but that only allows a single step at a time
a
i mean, are you calling `execute_pipeline` in a python script, or using the cli, or using dagit?
b
ah right, using dagit
a
Interesting, i wouldn’t expect it to be an issue via dagit
b
ah i should clarify. dagit doesn't seem to have a problem running the pipelines; it's just when the steps get shipped to databricks for execution that the problem appears
a
aaahhh ok ok - this is likely just some mix-up from the architectural changes that were happening as that PR was being worked on
b
the databricks step launcher uses `run_step_from_ref(step_run_ref)` on a pickled `step_run_ref` file to run the pipeline, which seems to be the bit that fails 🙂
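Roughly, the remote half of that looks like the sketch below; the DBFS path is hypothetical, and the import location is an assumption based on where `run_step_from_ref` lived at the time:

```python
import pickle

from dagster.core.execution.plan.external_step import run_step_from_ref

# On the Databricks cluster: load the StepRunRef that the step launcher
# pickled and uploaded, then execute just that one step.
with open("/dbfs/tmp/step_run_ref.pkl", "rb") as handle:  # hypothetical path
    step_run_ref = pickle.load(handle)

# run_step_from_ref rebuilds the step context from the ref and runs the
# step, yielding its Dagster events.
events = list(run_step_from_ref(step_run_ref))
```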
a
we pull the definition out from the reconstructable pipeline we have and pass that down, but we should be passing `step_run_ref.recon_pipeline` in to `create_execution_plan`
a classic "migration plus flexible APIs make it easy to mess up" situation
cc @sandy
i'll send out a fix here
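A minimal sketch of the diagnosed fix (the helper name is hypothetical and this isn't the actual patch; field names assume a Dagster version of this era): hand `create_execution_plan` the `ReconstructablePipeline` itself instead of unwrapping it to a plain definition first.

```python
from dagster.core.execution.api import create_execution_plan

def execution_plan_for_step_run_ref(step_run_ref):
    # Before (buggy): the definition was pulled out of the reconstructable
    # pipeline, losing the information needed to rebuild the pipeline in a
    # remote process:
    #   create_execution_plan(step_run_ref.recon_pipeline.get_definition(), ...)
    # After (the fix): pass the reconstructable pipeline straight through.
    return create_execution_plan(
        step_run_ref.recon_pipeline,
        run_config=step_run_ref.run_config,
        mode=step_run_ref.pipeline_run.mode,
    )
```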
b
nice spot, thanks @alex!
@alex want me to create an issue on github btw?
a
for review
(it took longer since I tried to clean up all the `create_execution_plan` callsites, which turned out to be too much)
Out of curiosity, why are you trying to use the multiprocess executor? It will only be executing one step at a time, right? So it's not for parallelism.
b
hmm, it was for parallelism yeah; when using the in-process executor dagit only launches one step at a time, but with the multiprocess executor dagit can launch several (even though each databricks launcher invocation still runs just a single step)
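In run-config terms (expressed here as a python dict; the `max_concurrent` value is just an example), that parallelism knob looks like:

```python
# The multiprocess executor lets dagit have several steps in flight at
# once, each of which the databricks step launcher ships out individually.
run_config = {
    "execution": {
        "multiprocess": {
            "config": {"max_concurrent": 4},  # arbitrary example value
        }
    },
}
```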
a
oooh ok, I believe I understand the issue now