Powered by Linen
dagster-support
  • l

    Leo Qin

    09/20/2022, 11:19 PM
    hi all - any tips/inspiration for how to reconcile dbt tags vs dagster asset groups when it comes to asset selection? Each asset can only belong to one group whereas a dbt model may have many tags.
  • b

    Brian Seo

    09/20/2022, 11:55 PM
    If I wanted to add tags only to backfills launched from dagit by default, where would I need to add the tags in the job definition code?
  • d

    David Ankin

    09/21/2022, 12:26 AM
    does dagster have startup hooks/init hooks? for running database migrations and such? (i guess that would just go in the repository function?)
  • n

    Navneet Sajwan

    09/21/2022, 6:29 AM
    Hi everyone, Is there a limit to the pipeline duration in dagster?
  • g

    Gintvilė Bergerytė

    09/21/2022, 7:12 AM
    Hello 👋, I am quite new to dagster and was wondering whether it is possible to pass values to k8s_job_op dynamically: https://docs.dagster.io/_apidocs/libraries/dagster-k8s#ops For example, my op would look something like this:
    make_magic = k8s_job_op.configured(
        {
            "image": "my_image",
            "args": ["python3", "script.py"],
            "env_vars": [
                "ENV_VAR_1",
                "ENV_VAR_2"
            ],
            "image_pull_secrets": [{"name": "secret1"}],
        },
        name="make_magic",
    )
    and I would like to reuse this op by somehow passing the image, args and env_vars values. Is it possible to achieve that? Maybe k8s_job_op could somehow access resources?
  • h

    Harry James

    09/21/2022, 10:36 AM
    Hi, when I use materialize() results are stored in DagsterInstance.temp_storage. Can this behaviour be changed such that results are stored in local_artifact_storage? [SOLVED]
  • m

    MarvinK

    09/21/2022, 2:19 PM
    Hi, question about the DockerRunLauncher. My dagit, dagster, postgresdb and user code server are configured in my docker-compose with a network "docker_network". Communication works fine. My setup follows the docker-deploy example: https://github.com/dagster-io/dagster/tree/master/examples/deploy_docker Now I want to use the DockerRunLauncher, but I encounter an error. If I don't specify a network in the dagster.yaml and start a job, the container is created and starts, but throws errors because it can't connect to my postgres container (it can't resolve the hostname because there is no network). If I specify a network in the dagster.yaml, to ensure that the hostname of my postgres can be resolved, I get the following error:
    docker.errors.NotFound: 404 Client Error for <http+docker://localhost/v1.41/containers/7d273d70cfca87e7f2e292ff71a4277e64eaeb7f62a963594b2b1bcdb0335912/start>: Not Found ("network docker_network not found")
    Why does the networking not work? This is my dagster.yaml
    run_launcher:
      module: dagster_docker
      class: DockerRunLauncher
      config:
        env_vars:
          - DAGSTER_POSTGRES_USER
          - DAGSTER_POSTGRES_PASSWORD
          - DAGSTER_POSTGRES_DB
        network: docker_network
        container_kwargs:
          volumes: # Make docker client accessible to any launched containers as well
            - /var/run/docker.sock:/var/run/docker.sock
  • r

    Romain

    09/21/2022, 4:04 PM
    Hello! As mentioned elsewhere, we are encountering an issue while trying to update an ECS cluster with Dagster running on it. The daemon task seems to start, then stop, then restart again in a loop. Sensors and Scheduler are dead. Backfill and Run queue are fine. We can launch individual runs (they are started by the daemon in between restarts). We are running Dagster 1.0.5. Hints:
    • We push the code using a CI tool that updates containers one after the other. Apparently, the daemon container runs just fine up until Dagit is updated. Then sensors and schedules seem to fail. So it might be that the problem occurs between Dagit and the daemon, not within the daemon container by itself.
    • A rollback to a previous commit runs ok. But adding new work seems to cause the issue. The new work adds three new repositories (10 in total) and ~50 jobs (150 in total). Then we have a sensor and a schedule for each job.
    Is it possible that there are too many sensors and schedules for the daemon to handle? It seems to be ok locally though. The daemon has 2 vCPU and 4 GB RAM. I can gladly provide more details if necessary.
  • j

    Jeffery

    09/21/2022, 4:09 PM
    Hi team, is there a “finally” concept for ops, such that a “cleanup” op runs regardless of whether or not I raise a failure? Any bit of help is greatly appreciated!
  • c

    Charlie Bini

    09/21/2022, 4:18 PM
    ok here's a twist: context.job_def.graph.name returns sync_all, which is the name of the parent graph. these are all subgraphs
  • b

    Balázs Dukai

    09/21/2022, 4:41 PM
    Hello! I'm curious what would be an "idiomatic" way to generate test data with my pipeline, besides the production run? So, basically I would like to run the pipeline on a subset of the source data in order to generate a test set. This test set would be re-generated on each production run, to make sure that we always have a small, up-to-date and complete set for developing the pipeline locally. Should I use a partition for this subset? Or define a separate job with a configuration for the subset? Or a multi-asset? I'm a bit lost here 🙂 Any suggestions?
  • y

    Yang

    09/21/2022, 6:48 PM
    Hi! I have another question...I don't have a very good understanding of loggers. I'm locally testing a job in a jupyter notebook, and right now it prints the logs into the notebook. How do I send those to a file? Thanks!
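With plain Python logging, sending records to a file is a matter of attaching a FileHandler to a logger. The sketch below uses only the standard library; the logger name and the log path are hypothetical, and whether a Dagster job's own records reach this logger depends on how logging is configured, which is an assumption not covered here.

```python
import logging
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.mkdtemp(), "job.log")

logger = logging.getLogger("notebook_job")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep these records out of the root logger's handlers

# Attach a handler that writes formatted records to the file.
file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(file_handler)

logger.info("step finished")
file_handler.flush()

# Read the file back to confirm the record landed there.
with open(log_path) as f:
    contents = f.read()
```

Setting propagate to False is what stops the same record from also being printed by handlers higher up, such as the ones a notebook installs on the root logger.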
  • d

    Dusty Shapiro

    09/21/2022, 7:23 PM
    👋 Potentially terrible question. I’m looking to deploy Dagster via Helm chart, and I’m curious about the method and format of the user/local code that gets deployed alongside the application. Based on the docs, I update the Dagster template values to include my user deployment, which seems to imply that my DAG code should be built as a Docker image separate from the application. Therefore, is the correct method to do this:
    1. Define a new Dockerfile that simply adds the local DAG code to the image
    2. Build, tag, and push to ECR
    3. Reference that image/tag in the dagster-user-deployments config?
  • a

    Antoine Valette

    09/21/2022, 10:31 PM
    Hello all! I'm running Dagster on Kubernetes to train ML models, sometimes using GPUs. I have a graph where some ops never require a GPU and some ops sometimes do. Is there a way from the run config to configure (maybe using tags?) k8s resource requests and limits at the op level? The goal would be to use GPUs only on ops requiring them, when they're required at all.
  • l

    Leo Qin

    09/22/2022, 1:36 AM
    hello, i'm trying a thing where i'm setting io manager configurations for each asset using with_resources, but when I try to materialize the asset from dagit i get an error that says:
    __ASSET_JOB cannot be executed with the provided config. Please fix the following errors: Missing required config entry "ops" at the root
    If i shift-click and have it scaffold the job, i can tell that it knows all the configs needed, but the ones I set during asset definition aren't being respected. Any ideas what's going on?
  • p

    peay

    09/22/2022, 11:35 AM
    Hello! I've started using software-defined assets. For assets that need specific resources, I've figured out I can create a job out of them, and specify resources in the job. However, is there a way to specify "default resources" for when using the "Materialize" buttons in Dagit directly on the assets? Although I can use a job, the ability to directly materialize from the asset graph sounds great, but it fails if you do not provide resource definitions. Thanks!
  • g

    Grigoriy Sterin

    09/22/2022, 3:06 PM
    Hello. When trying to define a graph input as an Optional[Dict[str, str]] like this:
    @graph
    def my_graph(my_dict: Optional[Dict[str, str]]):
        pass
    I'm getting the following error:
    dagster/core/workspace/context.py:554: UserWarning: Error loading repository location repo.py:TypeError: Optional[t] requires a single type. Got <dagster.core.types.python_dict._TypedPythonDict object at 0x7fbb4a0662b0>.
    Is there any way around this? Thank you.
  • c

    Carter

    09/22/2022, 3:45 PM
    Hey guys, had a quick question regarding the run_retries dagster instance option from https://docs.dagster.io/deployment/run-retries - I was wondering if there was a way to configure retry_strategy on an instance level rather than having to specify it on a per-job basis.
  • m

    Marc Keeling

    09/22/2022, 3:53 PM
    I was wondering what the difference between HookContext and OpExecutionContext is? It seems like there is some overlap there, but wanted to see what the experts say.
  • r

    Ryan Riopelle

    09/22/2022, 4:47 PM
    Are there any examples of how to mock-patch a resource or an asset so that fake data is returned? The docs give testing examples that contain mock objects, but I have not figured out how or where to set the mock's return value.
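One generic way to approach this, sketched with only the standard library's unittest.mock and no Dagster imports: hand the code under test a Mock in place of the real resource, and set return_value on the method it calls. The names row_count, warehouse and fetch_rows are hypothetical and are not Dagster APIs.

```python
from unittest import mock


def row_count(warehouse) -> int:
    """Hypothetical business logic that depends on a resource-like object."""
    rows = warehouse.fetch_rows("SELECT * FROM events")
    return len(rows)


# Stand-in for the real resource; fetch_rows is a hypothetical method name.
fake_warehouse = mock.Mock()
fake_warehouse.fetch_rows.return_value = [{"id": 1}, {"id": 2}]

count = row_count(fake_warehouse)

# The mock records how it was called, so the query can be checked as well.
fake_warehouse.fetch_rows.assert_called_once_with("SELECT * FROM events")
```

The return value is set on the attribute of the mock that the code actually calls (here fetch_rows), not on the mock itself. How the fake object gets supplied to an op or asset in place of the real resource depends on the Dagster version in use.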
  • s

    Simon Vadée

    09/22/2022, 5:10 PM
    Hey folks! I’m having trouble with the materialization of an SDA using an IOManager: I have an SDA that takes a very large table as input (bigger than my memory, I mean) and I want to stream the output of my SDA to another table in my database. I’m trying to use generators to handle streaming like this (simplified version):
    from typing import Iterator

    import pandas as pd
    from dagster import AssetIn, IOManager, asset


    class PandasIOManager(IOManager):
        def __init__(self, con_string: str):
            self._con = con_string

        def handle_output(self, context, obj: Iterator[pd.DataFrame]):
            # obj is an iterator of chunks, so write each chunk in turn.
            # `table` is left undefined, as in the original simplified version.
            for chunk in obj:
                chunk.to_sql(table, con=self._con, if_exists="append")

        def load_input(self, context) -> Iterator[pd.DataFrame]:
            """Load the contents of a table as an iterator of pandas DataFrames."""
            return pd.read_sql(f"SELECT * FROM {table}", con=self._con, chunksize=10)

    @asset(
        ins={"upstream_asset": AssetIn(input_manager_key="pandas_df_manager")},
        io_manager_key="pandas_df_manager",
    )
    def downstream_asset(upstream_asset):
        for chunk in upstream_asset:
            # do stuff
            yield pd.DataFrame(result)
    I get
    dagster._core.errors.DagsterInvariantViolationError: Compute function for op "downstream_asset" yielded a value of type <class 'pandas.core.frame.DataFrame'> rather than an instance of Output, AssetMaterialization, or ExpectationResult. Values yielded by ops must be wrapped in one of these types. If your op has a single output and yields no other events, you may want to use `return` instead of `yield` in the body of your op compute function. If you are already using `return`, and you expected to return a value of type <class 'pandas.core.frame.DataFrame'>, you may be inadvertently returning a generator rather than the value you expected.
    and tried wrapping my output without success 🙃 I did some research across issues and messages from this Slack, but everyone is using @op and DynamicOutput, which I can’t do since I’m using SDAs produced by the dagster-dbt integration 🤯. I found some people saying that SDAs were not meant to be dynamic (outputs must be known when building the graph or something...) but then I don’t know how to do it! Maybe I should use a different dagster API to achieve what I’m trying to do? Help me out!
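The streaming idea itself, independent of Dagster and pandas, can be sketched with the standard library's sqlite3 module and plain generators. This does not address the Dagster error above; it only illustrates reading a table in chunks, transforming lazily, and appending each chunk to a target table. The table names, the doubling transform, and the helper functions are all hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, float]


def read_chunks(con: sqlite3.Connection, chunk_size: int) -> Iterator[List[Row]]:
    """Yield the source table a few rows at a time instead of loading it whole."""
    cursor = con.execute("SELECT id, value FROM source ORDER BY id")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


def transform(chunks: Iterator[List[Row]]) -> Iterator[List[Row]]:
    """Lazily transform each chunk; nothing runs until the writer iterates."""
    for rows in chunks:
        yield [(row_id, value * 2) for row_id, value in rows]


def write_chunks(con: sqlite3.Connection, chunks: Iterator[List[Row]]) -> None:
    """Append each chunk to the target table as it arrives."""
    for rows in chunks:
        con.executemany("INSERT INTO target (id, value) VALUES (?, ?)", rows)
    con.commit()


# Small in-memory database standing in for the real source and target tables.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (id INTEGER, value REAL)")
con.execute("CREATE TABLE target (id INTEGER, value REAL)")
con.executemany("INSERT INTO source VALUES (?, ?)", [(i, float(i)) for i in range(25)])

write_chunks(con, transform(read_chunks(con, chunk_size=10)))
total = con.execute("SELECT COUNT(*), SUM(value) FROM target").fetchone()
```

Only one chunk is held in memory at a time, because each stage pulls the next chunk from the previous one on demand.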
  • m

    Matthew Karas

    09/22/2022, 6:48 PM
    I'm sure this has already been asked before but is there a way to store schedules in a database rather than the codebase? If I need to create several schedules - it can get pretty messy.
  • z

    Zach

    09/22/2022, 8:44 PM
    Hey all. Not sure if this is a bug on your end regarding the package, or Poetry, but I figured I would note in here that Poetry and PDM are having a particularly hard time handling dependency conflicts between dagster and dagster-postgres. Does this have to do with dagster-postgres 1.0.5 being yanked on PyPI? I have 0 experience with publishing packages, so not sure what can go wrong there. Here is the terminal output for what used to work without a hitch when specifying package dependencies:
    wh git:(master) ✗ poetry add dagster dagster-postgres
    Using version ^1.0.10 for dagster
    Using version ^1.0.5 for dagster-postgres
    
    Updating dependencies
    Resolving dependencies... (0.0s)
    
      SolverProblemError
    
      Because no versions of dagster-postgres match >1.0.5,<2.0.0
       and dagster-postgres (1.0.5) depends on dagster (1.0.5), dagster-postgres (>=1.0.5,<2.0.0) requires dagster (1.0.5).
      So, because wh depends on both dagster (^1.0.10) and dagster-postgres (^1.0.5), version solving failed.
    UPDATE: Updating Poetry to the latest version fixes this. Feel free to delete if this is too irrelevant. 🙂
  • j

    Julian Mudd

    09/22/2022, 9:24 PM
    Hey everyone, having an issue with the dagster and dagit CLIs. I'm on Mac M1 and just ran this fresh install from the docs:
    pipenv install dagster dagit requests
    Upon trying to run any command with either the dagster or dagit CLIs, I get this error:
    ImportError: dlopen(/Users/papi/.local/share/virtualenvs/dagster-oVuOtnKT/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace (__ZN6google8protobuf15FieldDescriptor12TypeOnceInitEPKS1_)
    I saw an earlier message from a few months ago about an issue with the grpcio library on Mac M1s. I tried installing with the --no-binary flag but it didn't change anything. Would love some help here!
  • d

    Dave Muse

    09/22/2022, 9:38 PM
    Hello, I am new to Dagster. I am following along with this tutorial and running into issues when attempting to materialize assets. I receive several errors, including STEP_FAILURE ("STEP_FAILURE - Execution of step "cereal_ratings_csv" failed") and RUN_FAILURE errors ("RUN_FAILURE - Execution of run for "__ASSET_JOB" failed. Steps failed: ['cereal_ratings_csv']"). I'm also seeing a FileNotFoundError ("[Errno 2] No such file or directory: 'cereal.ratings.csv.zip'"). My code is exactly as shown in the tutorial. Have attempted to troubleshoot, including providing full path name for location of zip file (which is being downloaded and placed into my working directory correctly) to no avail. No issues with previous tutorial steps.
  • d

    daniel

    09/22/2022, 9:55 PM
    Hey Ismael - the start argument to that function is actually a datetime object, not a string. You'll want to turn it into a string within that function (by calling str(start), or start.strftime for a particular date format).
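The two conversions mentioned above look like this in plain Python. The function name and the date format are hypothetical, chosen only for illustration; the original function under discussion is not shown in this thread.

```python
from datetime import datetime


def format_start(start: datetime) -> str:
    """Hypothetical helper: render the datetime in one particular date format."""
    return start.strftime("%Y-%m-%d")


start = datetime(2022, 9, 22, 21, 55)

as_default = str(start)         # default rendering: "2022-09-22 21:55:00"
as_date = format_start(start)   # explicit format: "2022-09-22"
```

str(start) is convenient when any readable form will do, while strftime is the safer choice when the string has to match a format that something downstream expects.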
  • w

    William

    09/23/2022, 2:58 AM
    This 30-min auto job run failure happens a lot
  • s

    Sanidhya Singh

    09/23/2022, 3:17 AM
    you need to run Dagster Daemon https://docs.dagster.io/deployment/dagster-daemon

Stephen Bailey

09/23/2022, 1:32 PM
For local testing of sensors, I recommend spinning up all the services using a docker-compose script, like here.