Brian Pohl

10/13/2022, 3:57 PM
here is my current config, for easy copy/pasting and also searchability:
ops:
  execute_mi_sales_zip_30_lag_est_snowpark:
    config: {'inputs': {'models_ran': True}}
  run_metrics_models:
    config: {'inputs': {'lag_365_est': True, 'lag_90_est': True}}

yuhan

10/13/2022, 9:18 PM
hey Brian, converting the dicts to yaml format should work:
ops:
  execute_mi_sales_zip_30_lag_est_snowpark:
    config:
      inputs:
        models_ran: True
  run_metrics_models:
    config:
      inputs:
        lag_365_est: True
        lag_90_est: True

rex

10/13/2022, 9:20 PM
Hi Brian, I responded to your email, but we can continue this in the thread:
I would suggest a different course of action here. In Dagster, there is a native abstraction to execute subsets of the job. So rather than editing the configuration of your ops to determine whether they should be executed, you can configure the job subset.
https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#executing-job-subsets
Re-execution from failure is a common pattern. Again, Dagster has this re-execution mode built in:
https://docs.dagster.io/guides/dagster/re-execution#re-execution-in-dagit

yuhan

10/13/2022, 9:22 PM
ah, I misread your code. you were trying to pass input values via the `config` field, which is why you're seeing the config error. this should work:
ops:
  execute_mi_sales_zip_30_lag_est_snowpark:
    inputs:
      models_ran: True
  run_metrics_models:
    inputs:
      lag_365_est: True
      lag_90_est: True
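
To make the difference between the two shapes concrete, here is a minimal Python sketch that rewrites a run config dict so that anything nested under `config: inputs:` moves up to op-level `inputs:`. The helper name `hoist_inputs` is hypothetical (it is not part of Dagster), and the sketch only manipulates plain dicts; it does not validate the result against a job.

```python
import copy


def hoist_inputs(run_config):
    """Return a copy of run_config with each op's config.inputs moved to op-level inputs.

    Hypothetical helper for illustration only; not a Dagster API.
    """
    fixed = copy.deepcopy(run_config)
    for op_entry in fixed.get("ops", {}).values():
        op_config = op_entry.get("config")
        if isinstance(op_config, dict) and "inputs" in op_config:
            # Input values belong next to "config", not inside it.
            op_entry["inputs"] = op_config.pop("inputs")
            if not op_config:
                # Drop the now-empty "config" key.
                del op_entry["config"]
    return fixed


# The shape from the first message in this thread (inputs nested under config).
original = {
    "ops": {
        "execute_mi_sales_zip_30_lag_est_snowpark": {
            "config": {"inputs": {"models_ran": True}}
        },
        "run_metrics_models": {
            "config": {"inputs": {"lag_365_est": True, "lag_90_est": True}}
        },
    }
}

fixed = hoist_inputs(original)
```

After the call, `fixed` has the same layout as the YAML just above, and `original` is left untouched.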

Brian Pohl

10/13/2022, 9:24 PM
thanks @rex! I am using the selection syntax to pick a subset. I also tried to "re-run from failed", but because i've only just set up my Dagster instance, i haven't yet configured it to store logs in S3. so when i try to re-run, it can't find the file that would say if the previous steps succeeded or failed. this is why i'm trying to artificially feed in the status of the prior steps.

yuhan

10/13/2022, 9:33 PM
what does your op selection look like? the errors you got indicate that all inputs of `run_metrics_models` are satisfied, i.e. its upstream ops `execute_mi_sales_..` were all selected to launch. so there's no need to provide the input values via the Launchpad; instead, it will load the inputs from the `execute_mi_sales_..` outputs.
I looked at your graph once again, and i'd recommend modeling your ops using a Nothing dependency, which is close to a traditional task dependency. with that, you don't need to pass a boolean `ran` indicator to signal that the upstream has run.

Brian Pohl

10/13/2022, 9:39 PM
@yuhan hmmm i can give that a shot, but i'd prefer to not have to re-run all of this if i can. it took 15 hours to run the first few steps and i was hoping it'd be pretty easy to re-run the last steps. if i am already re-running it, i can re-run it as is and it will pass. but thanks for the tip! i will try this out for future jobs for sure.

yuhan

10/15/2022, 12:29 AM
i'm trying to repro it from my end, but i'm able to see the `inputs` field. when i selected just one of the upstream ops, i was able to see the config error, which prompted me to add the `inputs` field. here's the code i tried with:
from dagster import job, op


@op
def upstream_1():
    return True


@op
def upstream_2():
    return True


@op
def upstream_3():
    return True


@op
def run_metrics_models(in_1, in_2, in3):
    return True


@op
def create_sql_export(in_):
    return True


@job
def my_job():
    create_sql_export(run_metrics_models(upstream_1(), upstream_2(), upstream_3()))
could you check to see if the op selection is correctly specified?

Brian Pohl

11/02/2022, 8:19 PM
hi @yuhan, sorry that I went silent for so long. my projects got re-prioritized, but i'm back on this. ultimately i was able to fix my issue the first time around by configuring the IO manager and re-running the entire job. but now i'm back again, wanting to avoid re-running this 18-hour job. i can't rely on the IO manager because i've modified this job enough that i can't use the outputs of previous runs. but i'd really only like to run the last few steps, because those are the ones i've modified. i've got new steps, `dynamo_import_0`, `dynamo_import_1`, and `dynamo_import_2`, which are all downstream of `create_sql_export`. I am trying to run these 4 steps, and if I put nothing in the config, I do get an error saying i need to put something at `ops:create_sql_export:inputs` (first pic). however, if i actually do that, i get an error about how `inputs` is not expected (second pic)
`inputs` is both unexpected and required 😅
and sorry i cropped it out of the screenshot, but my op selection syntax is `"create_sql_export"++`
could this be because i am using an old version of Dagster in the Dockerfile for the workers? i ask because it looks like somebody had this problem a year ago. again as a reminder I am using Dagster Hybrid, so the Dagit UI is not managed by my team. but we are not using the newest version of the `dagster` Python library because I hit some dependency issues. I'm currently using 1.0.2 in the Dockerfile. here's my full list of Python requirements:
agate==1.6.3
dagster==1.0.2
dagster-cloud==1.0.2
dagster_aws==0.16.2
dagster_dbt==0.16.2
dagster_k8s==0.16.2
dagster_snowflake==0.16.2
dagster_pandas==0.16.2
dbt-core==1.2.0
dbt-snowflake==1.2.0
numpy==1.23.3
pandas==1.4.4
pandasql==0.7.3
snowflake-connector-python==2.7.12
snowflake-connector-python[pandas]==2.7.12
snowflake-snowpark-python==0.7.0
statsmodels==0.13.2
@yuhan @rex any other ideas here?

daniel

11/10/2022, 6:39 PM
Hey Brian - do you have a link to a run in cloud that you could post here or DM?
or to the job, rather, since the issue is in the Launchpad

Brian Pohl

11/10/2022, 6:45 PM
sure thing, i'll DM it to you
for anyone reading this thread, the issue is a bug and was logged here: https://github.com/dagster-io/dagster/issues/10471 there is already a PR out to fix it. thanks everyone for your help!