
Dimka Filippov

12/02/2021, 2:29 PM
Hi everyone... I need some help understanding a problem I'm having when executing my pipeline through Dagit only (running it directly with python or via the dagster CLI works as expected). Sorry, I can't share the code, only screenshots from a remote VM... I have the following config, a job (graph), and a set of ops, almost all of which use an input resource (a class wrapping functions from a package for our DataLake)... except one: aggregate_files doesn't use it; it takes 2 strings with local paths, uses pandas to aggregate the CSVs into one, and outputs a string with the local path of the result... paths like "prp/Bluechip/..." come from the DataLake, and only paths like "PoC/sandbox/..." are local... the Dagit failure on the file-aggregation step is also shown in a screenshot... and I can't figure out where it comes from... the step is configured for local manipulations only... so why am I getting this "No such file or directory" error with a DataLake path (which definitely exists, because it is also used by previous steps to pull files from the DL)???
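(Roughly, the aggregate_files op being described might look something like the sketch below — the op signature, the pandas calls, and the output path are illustrative assumptions, not the actual code from the screenshots:)

```python
import pandas as pd
from dagster import op


@op
def aggregate_files(context, file1: str, file2: str) -> str:
    # Purely local manipulation: read the two CSVs, concatenate them,
    # and write the result into the local sandbox.
    out_path = "PoC/sandbox/aggregated.csv"  # assumed local output path
    combined = pd.concat([pd.read_csv(file1), pd.read_csv(file2)])
    combined.to_csv(out_path, index=False)
    return out_path
```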

Keshav

12/02/2021, 2:46 PM
Hi @Dimka Filippov. 🙂 You mentioned that aggregate_files is configured only for local manipulations, but did you by mistake pass DataLake paths, rather than local path strings, to file1 and file2 of the aggregate_files op while using them for local operations? The path below, from the error in Screenshot_20211202_162728.png and used in the aggregate_files op, looks like a DataLake path and not a local one.
prp/Bluechip/WLB_CFP/2021/11/11/dbo

Dimka Filippov

12/02/2021, 3:06 PM
I can show you, a few secs please
with the pulling ops I'm returning strings with local file paths... you can see them in the config
the weird thing is that this works fine when executing from the CLI with python or dagster... the files are pulled, successfully aggregated, and then uploaded back

Keshav

12/02/2021, 3:20 PM
This is indeed weird. Is the op failing every time with the Dagit UI, or often but not always?

Dimka Filippov

12/02/2021, 3:21 PM
with Dagit, all attempts have failed

Keshav

12/02/2021, 3:22 PM
Can you share a screenshot of the aggregate_files op?

Dimka Filippov

12/02/2021, 3:23 PM
it is one of the first 6 screenshots

https://dagster.slack.com/files/U02P4P41126/F02P3QZ4NNA/screenshot_20211202_161050.png


Keshav

12/02/2021, 3:36 PM
Does it fail both on a complete re-execution and when re-executing from the failed op?

Dimka Filippov

12/02/2021, 3:42 PM
I'll try from the failed op now... previously, full re-executions failed too
yes, re-execution from the last failure fails too =(
with the same error

Keshav

12/02/2021, 3:48 PM
Can you try reloading the workspace from the Dagit UI and then executing the job again?

Dimka Filippov

12/02/2021, 3:49 PM
yes, I'll try now
and I'm getting the same error on the aggregation step

Keshav

12/02/2021, 4:26 PM
This is a really primitive way to debug this, but can you put
context.log.info(f"{op name} started")
context.log.info(f"{op name} ended")
at the start and end of the ops? That way we can check the logs in the Dagit UI to see whether it has even entered the op to compute. We need to know where it is even getting that path..
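(A minimal sketch of that logging suggestion — the op and input names are placeholders:)

```python
from dagster import op


@op
def aggregate_files(context, file1: str, file2: str) -> str:
    # Log on entry so the Dagit run logs show whether the op was reached
    # and exactly which paths it received from the upstream steps.
    context.log.info(f"aggregate_files started: file1={file1}, file2={file2}")
    out_path = "PoC/sandbox/aggregated.csv"  # placeholder for the real work
    # Log just before returning, since nothing after the return would run.
    context.log.info(f"aggregate_files ended: returning {out_path}")
    return out_path
```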

Dimka Filippov

12/02/2021, 4:28 PM
ok, I'll try
but I can only put the "ended" log right before the return, not after it
I will also try to print some of the outputs to see them
ok, you have already helped me a lot 😃
it was my own mistake
I thought I had copied the whole config into Dagit exactly as it is written in the YAML, but I found that the "local" and "remote" keys were swapped
I'm very sorry that I wasted your time
but 1 more question remains: how do I make Dagit use the already written config.yaml instead of manually pasting its contents into the UI???
that is exactly the bottleneck where the trouble occurred in my case, and it would be great to eliminate it in the future

Keshav

12/03/2021, 4:47 AM
Hi. You can try putting the config in the dagster.yaml file and see if that works. :-)

Dimka Filippov

12/03/2021, 12:15 PM
thanks for the idea, but it's not working... I tried re-launching Dagit after clearing the UI config field, and also simply reloading the workspace... the config is not filled in automatically from the dagster.yaml file placed in the root of the project... so I'm still searching for this feature

Keshav

12/03/2021, 12:49 PM

Dimka Filippov

12/03/2021, 12:56 PM
yes, I've already seen it, and I'm using "--config" for dagster CLI execution, but there is no similar option for Dagit to point at the config file... that article also mentions that I can use Dagit's Launchpad to supply a config in YAML format... what I need is to set up Dagit to automatically pick up the config.yaml or dagster.yaml file into the Launchpad, so that I don't have to manually put the config into the UI
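(For reference, the CLI run described above is roughly of this shape — the repo file and job names here are assumptions, and Dagit has no equivalent flag:)

```
dagster job execute -f my_repo.py -j my_job --config config.yaml
```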

Keshav

12/03/2021, 1:23 PM
Another thing you can try is to pass the config to the Job (Pipeline for legacy) at runtime: parse the YAML and pass it as a dict. Check out the API docs for both the latest version and 0.12.15 for an example and how-to. Latest - https://docs.dagster.io/_apidocs/jobs 0.12.15 - https://docs.dagster.io/_apidocs/pipeline
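(A minimal sketch of that approach with the 0.13-style job API — the graph, op, and file names below are placeholders, not from the thread:)

```python
import yaml
from dagster import graph, op


@op(config_schema={"path": str})
def pull_file(context) -> str:
    return context.op_config["path"]


@graph
def my_graph():
    pull_file()


# Parse the same YAML that would otherwise be pasted into the Launchpad and
# bind it to the job as its default run config.
with open("config.yaml") as f:
    run_config = yaml.safe_load(f)

my_job = my_graph.to_job(config=run_config)
```

With the config bound as the job's default this way, nothing has to be entered in the Launchpad before launching a run.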

Dimka Filippov

12/03/2021, 2:21 PM
oh, nice, thank you... now I can bind the config I already have as a dict to the job, and I don't even need a separate config file anymore 😃
and Dagit executes fine... it just shows an empty config field in the Launchpad... that may surprise some of our people 😁

Keshav

12/07/2021, 4:13 AM
Glad you got it working 😄