dagster-support
  • m

    Mehdi OUAGHLANI

    04/19/2022, 2:21 PM
    Hello all, I'm trying to deploy Dagster for the first time and I'm struggling to set up the connection to my Postgres DB (on GCP). I set up a Dockerfile and docker-compose, and when I try to run it on my machine I get the following message:
    ❤️ 1
    j
    • 2
    • 3
  • g

    Grigoriy Sterin

    04/19/2022, 2:25 PM
    I'm seeing a strange bug (?) when running a Dagster job on ECS using ECSRunLauncher. I have a set of software-defined assets. I start an asset materialization job from Dagit. Then an exception is thrown inside the asset function. The exception is legit - there's an error in the input data, so I see the step failure in the log in Dagit: `dagster.core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op...`. And here goes the most interesting part: the underlying ECS jobs keep running and the Dagster job also keeps running forever (Dagit shows its status as `STARTED`), but nothing actually happens in this job, since it has already failed (but somehow Dagster doesn't know about it). Has anyone seen something like this before? Dagster version: `0.14.7`
    j
    d
    a
    • 4
    • 37
  • a

    Alec Ryan

    04/19/2022, 2:49 PM
    Is there a best practice for defining parents for an s3 stage in snowflake?
    j
    o
    • 3
    • 17
  • g

    Grigoriy Sterin

    04/19/2022, 3:38 PM
    And a related question: I have several thousand queued runs which I know will hang in the same way. Is there a way to cancel/terminate them all without clicking through pages in Dagit and terminating them in batches of 25?
    j
    l
    • 3
    • 6
  • m

    Makas Tzavellas

    04/19/2022, 4:30 PM
    Hello all! Apologies in advance if this is stated somewhere in the docs (I couldn't find it). I need help scheduling a job that has a dependency on another job. For example, I have Job A that runs at 12AM daily, and Job B that runs at 2AM daily. However, Job B should only run if Job A completed successfully, since it depends on data from Job A. Could someone let me know how I could configure this in Dagster? Thanks in advance! cc @Kyle Downey
    k
    j
    +2
    • 5
    • 7
  • c

    Chris Evans

    04/19/2022, 5:02 PM
    Hey guys, Running into some issues on 0.14.9 after upgrading from 0.14.6 when testing a dynamic output job w/ an inner graph locally (running dagit as a python process; no other infra). The same job works when deployed on K8s on 0.14.9. The job works locally on 0.14.6. Any thoughts?
    dagster.check.CheckError: Invariant failed.
      File "/Users/chrisevans/Repositories/data-platform/dags/bi/.venv/lib/python3.9/site-packages/dagster/core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
        yield
      File "/Users/chrisevans/Repositories/data-platform/dags/bi/.venv/lib/python3.9/site-packages/dagster/core/execution/plan/inputs.py", line 607, in _load_input_with_input_manager
        value = input_manager.load_input(context)
      File "/Users/chrisevans/Repositories/data-platform/dags/bi/.venv/lib/python3.9/site-packages/dagster/core/storage/fs_io_manager.py", line 152, in load_input
        context.add_input_metadata({"path": MetadataValue.path(os.path.abspath(filepath))})
      File "/Users/chrisevans/Repositories/data-platform/dags/bi/.venv/lib/python3.9/site-packages/dagster/core/execution/context/input.py", line 325, in add_input_metadata
        if self.asset_key:
      File "/Users/chrisevans/Repositories/data-platform/dags/bi/.venv/lib/python3.9/site-packages/dagster/core/execution/context/input.py", line 216, in asset_key
        check.invariant(len(matching_input_defs) == 1)
      File "/Users/chrisevans/Repositories/data-platform/dags/bi/.venv/lib/python3.9/site-packages/dagster/check/__init__.py", line 1167, in invariant
        raise CheckError("Invariant failed.")
    j
    o
    • 3
    • 15
  • d

    Dan Mahoney

    04/19/2022, 6:52 PM
    I’m a newbie to dagster… so I’m running dagster 0.14.5 and am seeing this error: Sensor openlineage_sensor was started from a location repos.py that can no longer be found in the workspace, or has metadata that has changed since the sensor was started. You can turn off this sensor in the Dagit UI from the Status tab. Here is repos.py:
    from dagster import repository
    from openlineage.dagster.sensor import openlineage_sensor
    from hello_cereal import hello_cereal_job

    @repository
    def hello_cereal_repository():
        openlineage_sensor_def = openlineage_sensor(
            minimum_interval_seconds=60,
            record_filter_limit=60,
        )
        return [hello_cereal_job, openlineage_sensor_def]
    Thoughts?
    d
    • 2
    • 33
  • h

    Huib Keemink

    04/19/2022, 7:41 PM
    Hi! We’ve been running Dagster for a few months now, mostly to kick off dbt. Last week we introduced a new job that needs to run every 15 minutes, and I’m seeing a lot of failures: for some reason the job cannot start and is killed by run monitoring after the timeout. There’s really nothing in the logs, and there should be enough capacity on the k8s cluster. What can I do to debug this? Any pointers?
    j
    • 2
    • 12
  • g

    geoHeil

    04/19/2022, 7:54 PM
    Why is `@asset(compute_kind="my_kind", ...)` only displayed in the job view of Dagit and not in the asset graph? Is this a bug? I would like to see these tags show up in both views.
    j
    d
    • 3
    • 2
  • m

    Ming Fang

    04/20/2022, 1:01 AM
    I'm integrating our dbt project with Dagster's SDA using `load_assets_from_dbt_project`. Shouldn't dbt seeds be treated as Dagster assets too?
    j
    o
    • 3
    • 2
  • j

    Josh Taylor

    04/20/2022, 1:24 AM
    Has anyone noticed that ops seem to take a bit longer to start (~5s) compared to solids which were instant? https://github.com/dagster-io/dagster/discussions/7338
    j
    • 2
    • 1
  • b

    Bryan Chavez

    04/20/2022, 4:50 AM
    Is there a way to stop a job mid execution (if a certain condition within one op is met) even if there are other ops branching out in the dag?
    a
    j
    • 3
    • 5
  • s

    Sara

    04/20/2022, 7:13 AM
    Good afternoon, is there any way to control synchronization? I run several functions within a graph, and the problem is that it does not wait for one to finish before starting the next. Any solution? Thanks! Example:
    @graph
    def do_some_task():
        function1()  # can be an external function, op or another graph
        function2()  # can be an external function, op or another graph
    * Dagster starts function2 before function1 has finished.
    s
    j
    • 3
    • 8
  • g

    geoHeil

    04/20/2022, 9:57 AM
    I have a multi-model IO manager for duckdb (parquet files) which can work via pandas and pyspark. For some ingest processes I know for sure that I only want to use pandas (and not the pyspark resource). Currently, I must list pyspark under the `required_resource_keys`. As a result, pyspark must be instantiated first (and this costs some overhead). Is there any way (without making a second copy of the multimodal IO manager without the pyspark resource) to get rid of this delay? When trying to pass `"pyspark": None,` Dagster complains. So far, it looks to me like I need to specify a dummy resource instead. Is this the right way to go?
    z
    j
    +3
    • 6
    • 17
  • g

    geoHeil

    04/20/2022, 10:29 AM
    I think I might have found a bug in dagit. When trying to stop a running sensor in dagit I get: TypeError: stop_sensor() takes 3 positional arguments but 4 were given.
    j
    d
    • 3
    • 6
  • e

    Emanuele Domingo

    04/20/2022, 11:02 AM
    Hi everyone, I use the GraphQL API to launch Dagster runs from an API. I was wondering if it is possible to attach tags to the run. I wasn't able to find any reference in the docs! Thanks.
    :daggy-success: 1
    • 1
    • 1
  • a

    Alec Ryan

    04/20/2022, 11:49 AM
    Are there any examples of how to partition a job built from an asset group?
    j
    c
    • 3
    • 8
  • g

    geoHeil

    04/20/2022, 12:32 PM
    How can I specify the output type of an asset? Currently, I output a pandas dataframe, but `context.dagster_type.typing_type` is of type `any`, as the asset seems not to be directly consumed by a downstream asset.
    j
    c
    • 3
    • 2
  • d

    Dan Mahoney

    04/20/2022, 3:17 PM
    Thanks, @daniel for your help yesterday. I’m trying to integrate the openlineage-dagster plugin into my local dagster, which @daniel helped me achieve. All seems good on the dagster side, but when I run a job, I see the following in the daemon’s shell:
    Sensor openlineage_sensor skipped: Last cursor: {"last_storage_id": 9, "running_pipelines": {"97e2efdf-9499-4ffd-8528-d7fea5b9362c": {"running_steps": {}, "repository_name": "hello_cereal_repository"}}}
    and no OpenLineage event is generated. I’m not sure if this is dagster itself or the plugin. I’ve attached my repos.py, hello_cereal.py, serial_job.py and workspace.yaml files. Any thoughts??
    repos.py, workspace.yaml, serial_job.py, hello_cereal.py
    j
    • 2
    • 2
  • s

    Simon

    04/20/2022, 4:03 PM
    https://docs.dagster.io/concepts/assets/asset-materializations#linking-assets-to-an-output-definition- sounds pretty good and interesting and I'd like to try it, but how would this work for a partitioned asset? `Output` doesn't have a partition argument (https://docs.dagster.io/_apidocs/ops#dagster.Output). `Out` does (https://docs.dagster.io/_apidocs/ops#dagster.Out), but that seems like I'd be hardcoding it to a fixed partition (I don't really understand the use case for this, to be honest). Is there something else I need to do to link an op to a partitioned asset?
    j
    c
    • 3
    • 9
  • m

    Mark Fickett

    04/20/2022, 4:46 PM
    Is it possible to extend the `@op` decorator to introduce some common setup code? I don't think a resource can do it since I want access to the op's `context` (and a single-process executor wouldn't call it for each op). I tried making a wrapping decorator but I get errors from Dagster's inspection of the argument lists:
    from dagster import op

    def my_op(*op_args, **op_kwargs):
        """Dagster @op that opens a custom context manager based on some shared state."""
        def wrapper(func):
            @op(*op_args, **op_kwargs)
            def raw_op(context, *args, **kwargs):  # my wrapper that does a little extra work
                with my_custom_setup(context):
                    # the original function that would normally be decorated with @op
                    return func(context, *args, **kwargs)
            return raw_op
        return wrapper
    a
    • 2
    • 6
  • j

    jasono

    04/20/2022, 5:38 PM
    How do I pass an argument to dagstermill op from another op? Code example in the thread.
    j
    • 2
    • 6
  • a

    Aaron Bailey

    04/20/2022, 7:03 PM
    Hello all - just starting my journey with dagster as a converted Airflow'er. Having an issue as soon as I start dagit.
    Operation name: InstanceSensorsQuery
    Message: (sqlite3.OperationalError) near "(": syntax error
    [SQL: SELECT subquery.id, subquery.selector_id, subquery.tick_body FROM (SELECT job_ticks.id AS id, job_ticks.selector_id AS selector_id, job_ticks.tick_body AS tick_body, rank() OVER (PARTITION BY job_ticks.selector_id ORDER BY job_ticks.timestamp DESC) AS rank FROM job_ticks WHERE job_ticks.selector_id IN (?)) AS subquery WHERE subquery.rank <= ? ORDER BY subquery.rank ASC]
    [parameters: ('03064b2df6239201293d653e3d171fd86ca400fa', 1)]
    (Background on this error at: https://sqlalche.me/e/14/e3q8)
    Path: ["repositoriesOrError","nodes",0,"sensors",0,"sensorState","ticks"]
    Locations: [{"line":152,"column":3}]
    This results in errors on the schedules, sensors, and backfills pages, which do not load.
    j
    p
    • 3
    • 15
  • a

    Alec Ryan

    04/20/2022, 7:15 PM
    Is it possible to pass a `DailyPartitionsDefinition` to `load_assets_from_dbt_project`?
    j
    c
    o
    • 4
    • 5
  • l

    Liezl Puzon

    04/20/2022, 7:40 PM
    My jobs (which are supposed to spawn every 5 minutes and take ~1-3 mins to complete) are stuck in “Starting” as far back as 3 days ago, even though I have an autoscaling GKE cluster. I’m also seeing an extremely slow load and then an error when trying to go to this sensor page https://dagit.nautilusapp.xyz/workspace/example_repo@custom-example-user-code17/sensors/spawn_train_workers and this page https://dagit.nautilusapp.xyz/workspace/example_repo@custom-example-user-code17/sensors Do I need to beef up or repair my cluster? I’m still getting a bunch of completed runs even now, so I know my sensor is functioning.
    ✅ 1
    a
    • 2
    • 7
  • c

    Charlie Bini

    04/20/2022, 8:06 PM
    Is there something like `Permissive` that I can use in a `config_schema`, except that only the defined keys are allowed and none of them are mandatory?
    ➕ 1
    j
    d
    • 3
    • 11
  • b

    Bennett Norman

    04/20/2022, 8:11 PM
    Hello! How can I kick off a partition backfill and specify config for resources?
    ❤️ 1
    j
    p
    • 3
    • 4
  • f

    Francis Addae

    04/20/2022, 8:16 PM
    Does anyone know how to use python assertion statements with dagster return variables? Just trying to validate a sql query before I push dbt to run it using dagster
    @job
    def my_pipeline():
        e = checkETL.check_sql_query()
        #assert e == True
        #run dbt Model
    
    if __name__ == "__main__":
        result = my_pipeline.execute_in_process()
        # assert result.output_value ==True
    j
    o
    • 3
    • 10
  • m

    Mark Fickett

    04/20/2022, 8:18 PM
    In events, I see a `step_key` like `my_subgraph.opname`, but the docs say it's deprecated. What's the right thing to use for a similar value? And does `context.solid_def.name` match that value? Including some notion of graph hierarchy in the key is useful.
    a
    • 2
    • 3
  • l

    Liezl Puzon

    04/20/2022, 10:00 PM
    Can I get a sanity check on my postgres config for my external server? Replaced X.X.X.X with my public IP for cloud SQL. But I keep getting this error in the dagster-dagit Service logs
    sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "dagster-postgresql" to address: Name or service not known
    d
    • 2
    • 38
l

Liezl Puzon

04/20/2022, 10:00 PM
Can I get a sanity check on my postgres config for my external server? Replaced X.X.X.X with my public IP for cloud SQL. But I keep getting this error in the dagster-dagit Service logs
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "dagster-postgresql" to address: Name or service not known
Still having a problem with this config. It seems like it should be straightforward, but the simple config values don’t work. Is there a recommended way to use secrets to set this up? I’m not sure what this part means:
global:
  postgresqlSecretName: "dagster-postgresql-secret"
d

daniel

04/21/2022, 1:40 AM
Hi Liezl - after deploying the helm chart there should be a configmap called something like dagster-instance (something ending in -instance). Is it possible to paste the dagster.yaml value inside that configmap here? That will help show exactly what it's using to try to connect to postgres
that host name error is confusing and almost seems like it's somehow still using the host or config from back when you were creating the postgres DB in your cluster? (From a previous post)
l

Liezl Puzon

04/21/2022, 1:46 AM
gotcha, yea I just recently switched back to my original config then ran
helm upgrade --install dagster dagster/dagster --version 0.14.3 -f values/production.yaml
and seemed like it’s fine when I go back to postgres DB in cluster
am I supposed to do something more than just run
helm upgrade --install dagster dagster/dagster --version 0.14.3 -f values/production.yaml
after I update the
postgresql:
field to point to external?
d

daniel

04/21/2022, 1:47 AM
One thing to check - when you re-deployed with enabled:False, did the dagit service restart?
the error is almost like it kept going, expecting the in-cluster DB to still be there
l

Liezl Puzon

04/21/2022, 1:48 AM
ooo let me see if I can figure out how to see that by clicking around
so interestingly the configmap seems like the update timestamp is old
d

daniel

04/21/2022, 1:50 AM
if you scroll down to run_storage in the dagster.yaml data it should say what the postgres hostname is. I'd expect that to be updated as soon as you upgraded the helm chart
and then in theory the dagit deployment is set up to restart whenever that configmap changes
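A minimal sketch of that check, for anyone following along: given the text of the dagster.yaml stored in the instance configmap, list which Postgres hostname each storage section points at, so a leftover in-cluster name such as dagster-postgresql stands out. The function name `storage_hostnames` and the line-scan approach are illustrative assumptions, not part of Dagster or the Helm chart. It uses only the Python standard library and assumes the usual layout (top-level section keys at column 0, `hostname:` nested below).

```python
import re
from typing import Dict, Optional

# Top-level dagster.yaml sections that carry a postgres_db block.
STORAGE_SECTIONS = ("run_storage", "event_log_storage", "schedule_storage")


def storage_hostnames(dagster_yaml: str) -> Dict[str, str]:
    """Map each storage section to the hostname it points at.

    Deliberately simple line scan (no YAML parser): a key at column 0 opens
    a new top-level section, and the first `hostname:` line seen inside a
    storage section is recorded.
    """
    hostnames: Dict[str, str] = {}
    section: Optional[str] = None
    for line in dagster_yaml.splitlines():
        top = re.match(r"^([A-Za-z_]+):", line)
        if top:
            section = top.group(1)
            continue
        host = re.match(r"^\s+hostname:\s*(.+?)\s*$", line)
        if host and section in STORAGE_SECTIONS and section not in hostnames:
            hostnames[section] = host.group(1).strip("\"'")
    return hostnames


if __name__ == "__main__":
    # Hypothetical file name: the dagster.yaml value copied out of the configmap.
    with open("dagster.yaml") as f:
        for name, hostname in storage_hostnames(f.read()).items():
            print(f"{name}: {hostname}")
```

If every section prints the external address, the configmap itself is fine and the stale hostname has to be coming from a pod that has not picked up the new config yet.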
l

Liezl Puzon

04/21/2022, 1:51 AM
yeah looks good
let me fix the ip address
but essentially it'll show me the same thing but with my db address
that's what I expect
scheduler:      
  module: dagster.core.scheduler
  class: DagsterDaemonScheduler

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      username: postgres
      password:
        env: DAGSTER_PG_PASSWORD
      hostname: "X.X.X.X"
      db_name:  postgres
      port: 5432
      params: 
        dbname: postgres
        password: postgres
        port: 5432
        user: postgres

run_launcher:      
  module: dagster_k8s
  class: K8sRunLauncher
  config:
    load_incluster_config: true
    job_namespace: dagster
    image_pull_policy: Always
    service_account_name: dagster
    dagster_home:
      env: DAGSTER_HOME
    instance_config_map:
      env: DAGSTER_K8S_INSTANCE_CONFIG_MAP
    postgres_password_secret:
      env: DAGSTER_K8S_PG_PASSWORD_SECRET
    env_config_maps:
      - env: DAGSTER_K8S_PIPELINE_RUN_ENV_CONFIGMAP

run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      username: postgres
      password:
        env: DAGSTER_PG_PASSWORD
      hostname: "X.X.X.X"
      db_name:  postgres
      port: 5432
      params: 
        dbname: postgres
        password: postgres
        port: 5432
        user: postgres

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      username: postgres
      password:
        env: DAGSTER_PG_PASSWORD
      hostname: "X.X.X.X"
      db_name:  postgres
      port: 5432
      params: 
        dbname: postgres
        password: postgres
        port: 5432
        user: postgres
compute_logs:      
  module: dagster.core.storage.noop_compute_log_manager
  class: NoOpComputeLogManager
d

daniel

04/21/2022, 1:53 AM
got it - but back when you had enabled:true, the hostname in there was "dagster-postgresql" right?
l

Liezl Puzon

04/21/2022, 1:53 AM
ok i see the ip changed
d

daniel

04/21/2022, 1:53 AM
so the mystery is why the dagster-dagit service is giving you errors referencing an old hostname
l

Liezl Puzon

04/21/2022, 1:53 AM
helm upgrading
and checking that now
yep that’s right
d

daniel

04/21/2022, 1:55 AM
ok - so the configmap is updating fine, did you check if the dagit pod is restarting? It should be, but if it wasn't, that could explain it still referencing a hostname that no longer exists
l

Liezl Puzon

04/21/2022, 1:55 AM
let me check
dagster-dagit
and see if its updating
it does sound like its something like that
does this look like a problem?
d

daniel

04/21/2022, 1:57 AM
potentially, yeah - does it give any more info why it would be stuck in PodInitializing?
it could be that it got stuck in this interim 'updating' state and the error you're seeing is from the old pod that will spin down as soon as the new one is ready
l

Liezl Puzon

04/21/2022, 1:57 AM
not sure how to force it to restart
but gotcha, sounds like a cluster thing
GKE blocking the restart or something?
d

daniel

04/21/2022, 1:58 AM
one thing you could do just to rule out weird upgrade issues is do a full helm uninstall first (we should be able to handle upgrades, but that would let us start from a place where it's working at least)
and yeah, maybe if your cluster is right at its resource limit there are no resources available for the new pod to spin up or something?
l

Liezl Puzon

04/21/2022, 2:01 AM
ok so that screenshot was from when I re-deployed my original “postgres in GKE” config, let me see if it still gets stuck with the external SQL config
seems like there’s space
d

daniel

04/21/2022, 2:03 AM
yeah that was just a shot in the dark - another possibility is that there's some other issue with the new pod connecting to the external postgres (there's a pg_isready call that it makes)
which then keeps the old pod from spinning down, which would explain the confusing error about the old hostname
l

Liezl Puzon

04/21/2022, 2:21 AM
gotcha, ok I went back and tried to actually issue a query and realized I didn’t whitelist any IPs to connect to cloud SQL. once I fixed that it worked
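For anyone hitting the same symptom, a quick reachability test run from a pod in the cluster can surface a missing IP allowlist entry before Dagit is redeployed. The sketch below is only a plain TCP connect using Python's standard library. `can_reach` is a hypothetical helper name, and it checks the network path only, not credentials or anything else that pg_isready may verify.

```python
import socket


def can_reach(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # "X.X.X.X" is the same placeholder used in this thread for the Cloud SQL public IP.
    print(can_reach("X.X.X.X"))
```

A False result here points at networking (firewall, authorized networks, wrong address) rather than at the Dagster config.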
d

daniel

04/21/2022, 2:23 AM
Ah great!