Powered by Linen
dagster-feedback
  • s

    Simon

    05/20/2022, 9:50 PM
    FYI: The docs and reality seem to disagree 😛 From https://docs.dagster.io/_apidocs/ops#dagster.op
    name (Optional[str]) – Name of op. Must be unique within any
    GraphDefinition
    using the op.
    but if I have two Ops with the same
    name="abc"
    argument in different Jobs but in the same Repository I get the following error:
    .../.venv/lib/python3.7/site-packages/dagster/core/workspace/context.py:563: UserWarning: Error loading repository location repository.py:dagster.core.errors.DagsterInvalidDefinitionError: Conflicting definitions found in repository with name 'abc'. Op/Graph/Solid definition names must be unique within a repository. OpDefinition is defined in job 'test1' and in job 'test2'.
    For reference Dagster versions, I can try with the new release at a later point if that makes a difference
    $ pip freeze | grep dag 
    dagit==0.14.9
    dagster==0.14.9
    dagster-graphql==0.14.9
    :dagster-bot-resolve-to-issue: 1
    e
    d
    • 3
    • 3
  • l

    Liezl Puzon

    05/22/2022, 7:58 PM
    just re-surfacing this feature request. bulk kill tools based on filters would be awesome! (maybe with some extra protection like “type confirm if you actually want to kill all these runs”)
    :dagster-bot-resolve-to-issue: 1
    s
    j
    • 3
    • 8
  • g

    George Pearse

    05/23/2022, 12:04 PM
    Would be nice to have a way to specify and store why you've cancelled a run (just copying from Spinnaker deployment pipelines)
    :dagster-bot-resolve-to-issue: 1
    a
    s
    d
    • 4
    • 4
  • g

    George Pearse

    05/24/2022, 2:57 PM
    When's dagster going to hit version 1.X.Y ? API design feels like it's hit a stable point. Suspect there are some Engineers who have hesitated due to the signalling of the 0.X.Y
    ➕ 3
    :dagster-bot-resolve: 1
    🤔 1
    s
    y
    • 3
    • 2
  • s

    Son Giang

    05/25/2022, 4:07 AM
    It’s minor, but are there any plans to support deleting backfill runs?
    :dagster-bot-resolve-to-issue: 1
    s
    p
    d
    • 4
    • 5
  • s

    Son Giang

    05/27/2022, 3:32 AM
    Hi there, is there any plan to support
    after_cursor
    and
    before_cursor
    in
    RunsFilter
    like with
    EventRecordsFilter
    ?
    s
    p
    • 3
    • 4
  • b

    Binoy Shah

    05/27/2022, 4:47 PM
    I am seeing issues with Docs, https://github.com/dagster-io/dagster/tree/master/examples/user_in_loop The Readme says go to https://docs.dagster.io/examples/user_in_loop for details, but there’s 404 on the link
    :dagster-bot-resolve: 1
    y
    • 2
    • 3
  • h

    Hendy Irawan

    05/29/2022, 4:10 PM
    I opened PR for dagster Helm chart -> https://github.com/dagster-io/dagster/pull/8112 Dagster also needs to detail more on the Dockerfile needed for User Code Deployment, as current doc is very vague. The build-arg
    BASE_IMAGE
    is not being explained. cc @Shaun McAvinney
    :dagster-bot-resolve: 1
    ❤️ 2
    r
    • 2
    • 1
  • s

    Sanidhya Singh

    05/30/2022, 5:07 AM
    Dagit’s left nav now groups jobs by repository in 0.14.17. It would be great if we could give custom names to the repo tabs from
    workspace.yaml
    :dagster-bot-resolve: 1
    r
    • 2
    • 9
  • l

    Liezl Puzon

    05/31/2022, 2:43 PM
    we should be able to terminate runs directly on the run page
    👍 1
    :dagster-bot-resolve: 1
    👍🏽 1
    d
    r
    • 3
    • 2
  • m

    Mark Fickett

    06/01/2022, 1:13 PM
    When adding a Dagster Cloud location, if I pass
    --module-name
    the CLI times out waiting for the update and the UI just says "Loading" for hours; I don't see an error from the agent. The code location docs say to use
    --package-name
    and don't mention
    --module-name
    . Would it make sense to remove the
    --module-name
    option from the
    dagster-cloud
    CLI entirely? It would be nice to either get an error message, or not have the option to do it the wrong way.
    d
    • 2
    • 26
  • a

    Alec Ryan

    06/02/2022, 12:08 PM
    Not sure if this is the right channel, but it would be good to see how other users are doing blue/green deploys, CI/CD, data ops with dagster. Not sure if that lives in the docs currently
    ➕ 1
    s
    b
    • 3
    • 3
  • m

    Mark Fickett

    06/02/2022, 12:50 PM
    I would love to use the multiprocess executor's start_method: forkserver (https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor) but have hit crashes with it in the past. Is there a tracking bug for improving stability there? Is https://github.com/dagster-io/dagster/issues/4041 the right issue to watch? Are the issues with certain libraries something Dagster could potentially solve, or is that up to the underlying Python multiprocess library?
    o
    a
    • 3
    • 6
  • z

    Zach

    06/02/2022, 11:17 PM
    Interestingly if you have two jobs with the same name in two different code locations Dagster seems to have trouble differentiating between their runs - a run in a job from one code location will show up in the runs history for a job with the same name in a different code location. The status page is a good place to see this - concordance_dnax was launched when there was one user code location, then I deployed user code location from a different branch (with all the same jobs) and now it shows two concordance_dnax jobs that point to the same run.
    ➕ 1
    r
    • 2
    • 1
  • c

    Charlie Bini

    06/03/2022, 8:31 PM
    small nitpick: could you make the search bar persistent on the docs?
    e
    • 2
    • 2
  • c

    Charlie Bini

    06/03/2022, 9:16 PM
    also not sure if this has been brought up yet, but Launchpad doesn't render config YAML properly: list indentations are missing
    r
    • 2
    • 2
  • b

    Binoy Shah

    06/06/2022, 6:12 PM
    A minor UI alignment issue, browser is at 120% zoom
    :ty-spinny: 1
    j
    d
    d
    • 4
    • 3
  • k

    Kayvan Shah

    06/07/2022, 7:09 AM
    Add zoom in, zoom out, and auto-scaling for the window pane used to view the different DAG charts (waterfall, timeline, etc.). Currently it is tough to view the entire chart in one go.
    j
    d
    • 3
    • 4
  • s

    Stephen Bailey

    06/07/2022, 1:04 PM
    this is probably one of those things that has been thought about a lot, and there's no good solution, but i would love for there to be a more concise way to specify config parameters. Configuring a parameter for the simplest possible op requires four levels of nested config, and while I am used to it now, it still feels heavy to me at times. Current
    from dagster import job, op

    @op(config_schema={"message": str})
    def print_something(context):
        print(context.op_config["message"])
    
    # the key under "ops" must match the op name
    default_config = {
        "ops": {
            "print_something": {
                "config": {
                    "message": "foo"
                }
            }
        }
    }
    
    @job(config=default_config)
    def print_message_job():
        print_something()
    what i was expecting when i first started was something where op config was a sort of "primary" config, and everything else was secondary, something like:
    @op(config_schema={"message": str})
    def print_something(context):
        print(context.op_config["message"])
    
    default_config = {
        "print_something.message": "foo"
    }
    
    # wished-for API: op_config is not an existing @job argument
    @job(op_config=default_config)
    def print_message_job():
        print_something()
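A minimal sketch of how the flat form wished for above could be expanded into the nested run config that Dagster expects today. The helper name `expand_op_config` and the "op_name.key" convention are assumptions made for illustration; neither is part of Dagster.

```python
from typing import Any, Dict


def expand_op_config(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"op_name.key": value} into {"ops": {op_name: {"config": {key: value}}}}.

    Hypothetical helper: the dotted-key convention is an assumption, not a Dagster feature.
    """
    ops: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        # split on the first dot only, so config keys may themselves contain dots
        op_name, _, config_key = dotted_key.partition(".")
        if not op_name or not config_key:
            raise ValueError(f"expected 'op_name.key', got {dotted_key!r}")
        ops.setdefault(op_name, {"config": {}})["config"][config_key] = value
    return {"ops": ops}


run_config = expand_op_config({"print_something.message": "foo"})
```

The resulting dict has the same shape as the nested `default_config` in the first snippet, so it could be passed wherever that one is used.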
    ➕ 1
    z
    s
    • 3
    • 6
  • b

    Binoy Shah

    06/07/2022, 1:33 PM
    Does anybody have experience with the age and scalability of Dagster after running a couple of 100Ks of jobs? How do DB maintenance and job/Dagit scaling work? Is any job pruning recommended/facilitated? It would be nice if there were a command that does it:
    dagster prune --keep-last-days 10
    dagster prune --keep-last-jobs 50000
    Any thoughts on growth and maintenance of dagster? I have read horror stories about scaling Airflow jobs and wanted to hear from veteran dagster users on the same. Moderators, please let me know if this discussion topic is appropriate for this channel.
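A rough sketch of the retention rule that the two proposed commands imply. The `dagster prune` command is only the poster's suggestion, and `RunInfo` and `select_runs_to_prune` are hypothetical names. The code only decides which run IDs would be pruned, and leaves the actual deletion from run storage out.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class RunInfo(NamedTuple):
    """Hypothetical stand-in for a stored run record."""
    run_id: str
    finished_at: datetime


def select_runs_to_prune(
    runs: List[RunInfo],
    now: datetime,
    keep_last_days: Optional[int] = None,
    keep_last_runs: Optional[int] = None,
) -> List[str]:
    """Return IDs of runs that fall outside either retention limit."""
    newest_first = sorted(runs, key=lambda r: r.finished_at, reverse=True)
    to_prune = []
    for position, run in enumerate(newest_first):
        too_old = (
            keep_last_days is not None
            and run.finished_at < now - timedelta(days=keep_last_days)
        )
        over_count = keep_last_runs is not None and position >= keep_last_runs
        if too_old or over_count:
            to_prune.append(run.run_id)
    return to_prune


runs = [
    RunInfo("r1", datetime(2022, 6, 6)),
    RunInfo("r2", datetime(2022, 5, 1)),
    RunInfo("r3", datetime(2022, 6, 5)),
]
old_runs = select_runs_to_prune(runs, now=datetime(2022, 6, 7), keep_last_days=10)
```

Keeping the selection separate from the deletion makes it easy to do a dry run and review the list before anything is removed.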
    💡 2
    j
    • 2
    • 1
  • d

    Donny Winston

    06/07/2022, 7:17 PM
    I couldn't figure out how to successfully associate a job runrequest with a run status sensor, where a run gets requested and is picked up by the daemon to run.
    ➕ 1
    s
    • 2
    • 3
  • g

    George Pearse

    06/08/2022, 12:13 PM
    Would be great if Dagster employees could blog a little more about good pipeline / data engineering practice. Things like:
    • How partitions help you achieve idempotency and, as a result, more robust pipelines that are easier to debug and refactor (for performance improvements with identical expected data outputs), and how they let you parallelise execution for backfills.
    • How software-defined assets enable you to increase the transparency of your pipelines so that data consumers can debug / understand problems and their downstream consequences.
    • How extracting and loading to S3, with a separate sensor that runs the load to the target datasource when the data becomes available, can lead to more efficient resource use (the extraction step, if it is mostly just running a slow DB query, doesn't use much RAM but can still block other processes if the two pipelines are kept together).
    • The value of type annotation and mypy for catching bugs early (don't think prefect has that?)
    These things were not obvious to me when I started, and I'm not even sure they're all correct/true now. I'd write more about it, but I'm not sure that what I'd be writing would be correct. One of the things that dagster has helped me most with is how to structure work so that it is maintainable.
    Edit: + More thought leadership on how to test pipelines. You've done a lot for unit testing in data engineering, but data engineers also tend to run Prod databases -> Dev data warehouse to check that the whole pipeline works properly, which your YAML configs handle very nicely. Or ways to check that an asset is the same after a refactor (checksum / some form of hashing?)
    ❤️ 9
    s
    y
    • 3
    • 3
  • g

    George Pearse

    06/09/2022, 1:48 PM
    Would be nice if the description for a job here, also displayed when you clicked overview.
    👍 1
    j
    • 2
    • 2
  • b

    Binoy Shah

    06/09/2022, 3:22 PM
    Discussion Topic: Versioning of DAGs/pipelines and display in Dagit.
    Hi, I wanted to bring up a discussion on versioning of DAGs/pipelines. This is not about git versioning, but rather about treating a single pipeline [DAG] as a versioned artifact, and displaying the version in the UI so that whoever executes the pipeline can visually inspect which version is running.
    Deployment Modes:
    1. Kubernetes User Deployments: One pipeline [DAG] per docker image (deployment)
    2. Kubernetes User Deployments: Many pipelines [DAGs] per docker image (deployment)
    When working with docker images, the version is managed and deployed, but the versioning stays limited to CI/CD and is not exposed to end consumers/analysts. There is still simplicity due to the 1:1 mapping between docker image release and pipeline release, but when working with multiple DAGs/pipelines within a single docker image [user-deployment], version traceability is mostly lost without a rigid tracking process.
    Is there a conventional way of displaying which DAG / job version is currently running/deployed? Can we tag a Job definition and display the tags for the job in the UI?
    :dagster-bot-responded-by-community: 1
    z
    g
    • 3
    • 3
  • s

    Stephen Bailey

    06/10/2022, 5:16 PM
    I want to make an
    @whatnot_job
    decorator that merges default config into job definitions. Has anyone done this before? I am having some difficulty getting the args to pass correctly. What I want is something like this:
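    from dagster import job

    def whatnot_job(func, **kwargs):
        # merge default tags with passed in tags
        # (dict.update() returns None, so build a new dict instead)
        default_tags = {"foo": "bar"}
        kwargs["tags"] = {**kwargs.get("tags", {}), **default_tags}
    
        # name the job after the wrapped function unless a name was passed in
        kwargs.setdefault("name", func.__name__)

        # create a job by passing in updated kwargs to @job decorator
        @job(**kwargs)
        def my_job():
            func()
    
        return my_job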
    anyone have ideas on how to do this? also, not sure if #dagster-feedback is the right place for these "best practices" type of questions?
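One way to get the arguments passing correctly is a decorator factory, sketched below under some assumptions. `with_default_tags` and `merge_tags` are hypothetical names, and `fake_job` is a stand-in that only mimics the keyword-argument call shape of `@job(...)` so the example runs without Dagster installed. Note that `dict.update()` returns `None`, so the merged tags have to be built as a new dict.

```python
from typing import Any, Callable, Dict, Optional

DEFAULT_TAGS: Dict[str, str] = {"foo": "bar"}


def merge_tags(passed: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Combine defaults with passed-in tags; passed-in tags win on conflict."""
    merged = dict(DEFAULT_TAGS)
    merged.update(passed or {})
    return merged


def with_default_tags(job_decorator: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a decorator factory (such as dagster's @job) so default tags are merged in."""
    def factory(**kwargs: Any) -> Any:
        kwargs["tags"] = merge_tags(kwargs.get("tags"))
        return job_decorator(**kwargs)
    return factory


def fake_job(**kwargs: Any) -> Callable[[Callable], Callable]:
    """Stand-in for dagster.job: records the kwargs it was called with."""
    def decorate(fn: Callable) -> Callable:
        fn.job_kwargs = kwargs
        return fn
    return decorate


# with Dagster installed this would presumably be with_default_tags(job)
whatnot_job = with_default_tags(fake_job)


@whatnot_job(tags={"team": "data"})
def my_job():
    pass
```

Here explicitly passed tags override the defaults. Swapping the order inside `merge_tags` would make the defaults win instead. This sketch only supports the called form `@whatnot_job(...)`, not the bare `@whatnot_job`.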
    m
    j
    d
    • 4
    • 8
  • s

    Simon

    06/13/2022, 1:32 PM
    FYI the intro tutorial gives a 404 and Advanced tutorials goes to Run Configuration
    j
    • 2
    • 4
  • v

    Vinnie

    06/13/2022, 2:42 PM
    Hey folks, I have a base job that follows a set of operations. With the way we initially developed our in-house ETL library, we’re passing configs to this library and executing a series of steps, which led really nicely to having a single job in dagster and only defining the configs for each run on the schedules. It would be great to see the
    run_config
    passed to the job when inspecting the schedule. Currently we’d have to go into one of the runs and check the tags and config, and if the schedule never ran, we don’t have a way to review the config (or I’m missing it). Is it something you’ve thought about in the past?
    c
    d
    • 3
    • 7
  • m

    Mark Fickett

    06/14/2022, 4:49 PM
    In a straight-line portion of a DAG, could Dagster collapse multiple ops into one execution step? I'm thinking about some cases in my graphs where things are separate ops mostly just for cleanliness (and because ops are so nice and easy to declare and wire together), but it incurs some overhead. Would it be what most people want / easy to implement in the framework / a worthwhile performance improvement if adjacent ops merged at execution time?
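To make "straight-line portion" concrete, here is an illustrative sketch of how such chains could be found in a dependency graph. It is not how Dagster plans execution. The function name and the adjacency-dict input format are assumptions, and the graph is assumed to be acyclic. An op is merged into its parent only when it has exactly one upstream op and that op has exactly one downstream op.

```python
from typing import Dict, List


def collapse_linear_chains(downstream: Dict[str, List[str]]) -> List[List[str]]:
    """Group the nodes of a DAG into maximal straight-line chains."""
    nodes = set(downstream) | {n for outs in downstream.values() for n in outs}
    upstream: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, outs in downstream.items():
        for dst in outs:
            upstream[dst].append(src)

    def merges_into_parent(node: str) -> bool:
        # single parent, and that parent feeds nothing else
        ups = upstream[node]
        return len(ups) == 1 and len(downstream.get(ups[0], [])) == 1

    chains = []
    for head in sorted(nodes):
        if merges_into_parent(head):
            continue  # this node belongs to a chain started further upstream
        chain = [head]
        while True:
            outs = downstream.get(chain[-1], [])
            if len(outs) == 1 and merges_into_parent(outs[0]):
                chain.append(outs[0])
            else:
                break
        chains.append(chain)
    return chains


# a -> b -> c, then c fans out to d and e, which both feed f
graph = {"a": ["b"], "b": ["c"], "c": ["d", "e"], "d": ["f"], "e": ["f"]}
chains = collapse_linear_chains(graph)
```

In this example only `a`, `b` and `c` form a chain that could run as one step. The fan-out and fan-in nodes stay separate.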
    c
    s
    +2
    • 5
    • 6
  • j

    Jakub Zgrzebnicki

    06/15/2022, 5:52 AM
    Hi Guys, I'm just going through documentation and in https://docs.dagster.io/concepts/ops-jobs-graphs/op-hooks#patterns It seems that notif_all_prod and notif_all_dev should be switched. Am I correct or I don't get the point?
    r
    • 2
    • 4
  • j

    Jordan Wolinsky

    06/16/2022, 2:46 PM
    That guide on when to use SDAs vs ops & graphs is super useful! thank you for that 😄 :daggy-love: :partydagster:
    :daggy-love: 1
    👍 3
    s
    s
    • 3
    • 5
s

Sanidhya Singh

06/16/2022, 2:47 PM
Could you link it?
j

Jordan Wolinsky

06/16/2022, 2:48 PM
https://docs.dagster.io/guides/dagster/enriching-with-software-defined-assets
👍 1
from #announcements
👍 1
s

sandy

06/16/2022, 3:02 PM
Thanks Jordan! Let me know if there are any places where it could be clearer
j

Jordan Wolinsky

06/16/2022, 3:03 PM
Will do! The chart with the scenario, yes or no, and explanation was def the most helpful