dagster-feedback

    Ramnath Vaidyanathan

    12/05/2021, 7:13 PM
    I just started using Dagster, and am trying to grok the core ideas against Airflow as my backdrop. One question I have is around the concept of multiple DAGs in Airflow and being able to connect tasks across DAGs. Is there a similar concept in Dagster, where I can define multiple jobs and have ops in one job depend on an op in another?
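    One common pattern for this today is a run-status sensor that kicks off a second job when the first one finishes. A minimal sketch, using the sensor API from more recent Dagster releases (job and op names here are illustrative):
    from dagster import DagsterRunStatus, RunRequest, job, op, run_status_sensor

    @op
    def extract():
        ...

    @job
    def upstream_job():
        extract()

    @op
    def load():
        ...

    @job
    def downstream_job():
        load()

    # Launch downstream_job whenever upstream_job succeeds, which
    # approximates Airflow's cross-DAG dependencies.
    @run_status_sensor(
        run_status=DagsterRunStatus.SUCCESS,
        monitored_jobs=[upstream_job],
        request_job=downstream_job,
    )
    def upstream_succeeded(context):
        return RunRequest(run_key=context.dagster_run.run_id)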

    Qwame

    12/07/2021, 5:14 PM
    How can I pass default values in the pipeline and not at runtime? I am using resources. I have defined resources:
    resources:
      environs:
         config:
           year: 2021
    How can I tell Dagster that when launching from the dagit UI, if no year config is passed, use 2020? In my Python file, I have the job defined as:
    @job(resource_defs={"environs": make_values_resource(year=2020)})
    def pipeline_job():
       job_a(job_b)
    The idea is to pass the default value in the job definition, but I am getting errors. However, when I do this, I don't get any errors:
    @job(resource_defs={"environs": make_values_resource(year=int)})
    def pipeline_job():
       job_a(job_b)
    Any help?
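    For reference, one way to bake in a default today is a Field with default_value on a resource's config schema. A minimal sketch, assuming the legacy @resource API (it is unclear whether make_values_resource accepts Field objects directly):
    from dagster import Field, job, op, resource

    # The default lives in the schema, so launching from dagit with no
    # config still resolves year to 2020.
    @resource(config_schema={"year": Field(int, default_value=2020)})
    def environs(init_context):
        return init_context.resource_config

    @op(required_resource_keys={"environs"})
    def job_a(context):
        context.log.info(str(context.resources.environs["year"]))

    @job(resource_defs={"environs": environs})
    def pipeline_job():
        job_a()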

    bitsofinfo

    12/08/2021, 9:00 PM
    Hi all - so I've gone through all the current Airflow, Prefect, and now Dagster documentation as the first part of my evaluation for a large impl, and honestly... so far from what I'm seeing, Dagster is simply next level. The facilities around treating workflows as an application with proper typing, inputs/outputs, the IOManager abstraction, assets, events/metadata emitting, etc.... wow. This has me excited. That said, I'm going to have to start doing some prototyping on all 3 of these products.... one question I can see coming from above is: "well, Dagster hasn't been around as long as the others, smaller community, less mature, will this be around in 3 years or dead?" etc. (in particular vs. Prefect). Do you guys have any articles or comparisons of Dagster vs. Prefect? I can glean the differences just reading the docs, but it would be good to get your perspective. Also interested in your cloud offering/maturity/pricing etc.
    :next-level-daggy: 6

    DK

    12/20/2021, 3:27 PM
    Looking through the migration guide, I'm still uneasy about where configs for "ops" are defined. Why not define config for the op where the op is defined?
    @op(config_schema={"param": str},
        config={"param": "some_value"})
    def do_something(_):
        ...
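    For what it's worth, a default can already live next to the op via default_value in the config schema; a minimal sketch:
    from dagster import Field, op

    # Runs with "some_value" unless run config overrides it.
    @op(config_schema={"param": Field(str, default_value="some_value")})
    def do_something(context):
        context.log.info(context.op_config["param"])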

    medihack

    12/20/2021, 6:08 PM
    As I already posted in the support channel (but I guess it fits better here), the @root_input_manager and @io_manager decorators IMHO should work the same way, especially as RootInputManager and IOManager behave so similarly. At the moment @root_input_manager, in contrast to @io_manager, creates a new RootInputManager itself. That's why you can't simply write
    class DatabaseManager(RootInputManager, IOManager):
        def handle_output(self, context, obj):
            ...

        def load_input(self, context):
            ...

    @io_manager(required_resource_keys={"database_client"})
    def database_io_manager():
        return DatabaseManager()

    @root_input_manager(required_resource_keys={"database_client"})
    def database_root_manager():
        return DatabaseManager()
    but instead have to write
    @root_input_manager(required_resource_keys={"database_client"})
    def database_root_manager(context: InputContext):
        manager = DatabaseManager()
        return manager.load_input(context)
    I know it's not a big deal, but I just want to leave some feedback while @root_input_manager is still experimental.

    Jeremy Fisher

    12/20/2021, 8:16 PM
    Just wanted to randomly thank the devs for their awesome work. The support available on this slack is phenomenal and the product is great!
    🙏🏼 1
    ➕ 3
    :big-dag-eyes: 2
    :daggy-love: 5
    🙏 11
    :elementl: 4
    🙏🏻 1

    Andrea Giardini

    12/22/2021, 2:06 PM
    Hey folks 👋 Can I get some feedback on https://github.com/dagster-io/dagster/pull/5567? I'm willing to fix the conflicts if the community is willing to merge it, but so far I've received no comments.

    Colin Sullivan

    01/02/2022, 5:42 PM
    Hello, I'm trying to put together a POC to show my team how we could migrate existing code bases to Dagster, which looks so promising for us. The trouble I'm having is the context argument for ops. It feels odd to use the context object in the body of an op function to access a value, especially since the body of the job function doesn't pass the context argument explicitly. I'd rather express those values as parameters to the op function and use the body of the job function to pass them explicitly. This would make testing more straightforward by avoiding the need to mock a context object, and it would simplify migrating existing projects. Similarly, I feel like I'd prefer to write a schema for an entire job, rather than on a per-op basis, and then have the job function take a context or run_config argument that defines the necessary values to pass to each op in the body. Am I missing the magic that explains why I ought to resist the urge to structure a pipeline like this? Is it possible to access the run_config in the body of the job function? And can I define the schema for an entire job?
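    On the job-wide schema part of this, a config mapping comes close: one outer schema on the job, fanned out to per-op config. A minimal sketch (op and field names are illustrative):
    from dagster import config_mapping, job, op

    @op(config_schema={"url": str})
    def fetch(context):
        context.log.info(context.op_config["url"])

    # A single job-level schema; Dagster maps it onto each op's config.
    @config_mapping(config_schema={"url": str})
    def job_config(val):
        return {"ops": {"fetch": {"config": {"url": val["url"]}}}}

    @job(config=job_config)
    def my_job():
        fetch()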

    James Miller

    01/04/2022, 2:12 PM
    Operation name: InstanceWarningQuery Message: Cannot query field "hasInfo" on type "Instance". Path: Locations: [{"line":14,"column":3}]. Hello, new to Dagster. I was asked by my dept to use dagster/dagit 0.12.9. In a conda env I would always get import errors for the dagster tutorial scripts (i.e. 'from dagster could not import job, op'). I switched to the latest version of Dagster, 0.13.12, and was able to run the example code. Just to experiment, I switched back to 0.12.9 and received the above error. FYI, thanks!

    Chris Chan

    01/04/2022, 7:00 PM
    I think at the last community meeting it was mentioned that some guidance was coming on how to organize code / files in Dagster projects (referred to as “recommended project structure”) - did I misunderstand that, or is it still forthcoming?

    Alex Service

    01/04/2022, 9:56 PM
    Related to the discussion of recommended structure, I think it would make sense for dagster new-project to utilize pyproject.toml (from PEP 518). I may look into whether I can open-source the project setup I have as an example of how that could be used in conjunction with env/package managers like poetry. (I'd be happy to have a chat with any of the folks at Elementl if you're interested in follow-up conversations 🙂)
    👍 9

    Nick Dellosa

    01/11/2022, 2:29 PM
    1. Have you guys ever thought of adding a start and stop time to sensors, so you could have sensors that only run at a certain time of day, i.e. a sensor that runs between 5-8 PM Mon-Fri or something like that? (See the sketch below.)
    2. Has any thought been given to making resources, or maybe a special kind of resource, available to sensors? For example, the S3 file sensor shown in the documentation is pretty limited and feels like a workaround to just letting them use resources, IMO.
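    Until a first-class window exists, a sensor can skip itself outside the window. A minimal sketch (job and op names are illustrative, times are server-local):
    from datetime import datetime, time

    from dagster import RunRequest, SkipReason, job, op, sensor

    @op
    def work():
        ...

    @job
    def evening_job():
        work()

    # Evaluate as usual, but only request runs Mon-Fri between 5 and 8 PM.
    @sensor(job=evening_job)
    def evening_sensor(context):
        now = datetime.now()
        in_window = now.weekday() < 5 and time(17, 0) <= now.time() <= time(20, 0)
        if not in_window:
            return SkipReason("outside the Mon-Fri 5-8 PM window")
        return RunRequest(run_key=now.strftime("%Y-%m-%d-%H"))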

    Daniel Suissa

    01/13/2022, 2:43 PM
    A rundown of Dagster's job composition process would be super helpful 🙂 Both to be able to wrap it and tailor it to our systems, but also to consider becoming contributors.

    Anatoly Laskaris

    01/17/2022, 9:26 AM
    I'm evaluating Dagster for use at the company I work for. We are using Nomad (https://www.nomadproject.io/) to run workloads, and I was wondering if it would be possible to execute Dagster jobs on Nomad. I can see there are official integrations with Kubernetes and Docker. Are there any plans for a Nomad integration?

    Matthias Queitsch

    01/18/2022, 9:19 AM
    Hi, we are using AssetMaterialization more and more in our team. Since we are also thinking about introducing a data discovery tool / data metadata store (like Lyft's Amundsen or LinkedIn's DataHub), I was wondering if it is possible to get the data out of the asset catalog and ingest it into these systems?
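    A hedged sketch of one export path: walk the asset keys on the instance and read each one's latest materialization, then hand that to Amundsen's or DataHub's own ingestion clients. Method names are as in recent Dagster releases, and the metadata attribute has been renamed across versions, so treat this as an assumption to verify:
    from dagster import DagsterInstance

    instance = DagsterInstance.get()
    for key in instance.all_asset_keys():
        event = instance.get_latest_materialization_event(key)
        if event and event.asset_materialization:
            mat = event.asset_materialization
            # Push key + metadata to an external catalog ingestion script.
            print(key.to_string(), mat.metadata)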
    ➕ 3

    VxD

    01/20/2022, 12:15 AM
    Hi Dagster team! We would like a way to retry all exceptions by default with a broad retry policy, except in a few specific instances where the code has identified it cannot continue. This used to be possible thanks to the Failure exception, but it got "fixed" to abide by the retry policy. Could we please get an Abort special exception that aborts the current pipeline?
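    A sketch of the shape being asked for, assuming an allow_retries flag on Failure (present in later releases; if unavailable, this is exactly the gap the requested Abort exception would fill). Helper names are hypothetical:
    from dagster import Failure, RetryPolicy, op

    # Broad policy: retry everything up to 3 times...
    @op(retry_policy=RetryPolicy(max_retries=3))
    def flaky(context):
        if cannot_continue():  # hypothetical check
            # ...except raises that explicitly opt out of retries
            # (allow_retries is an assumption to verify per version).
            raise Failure("unrecoverable, aborting", allow_retries=False)
        do_work()  # hypothetical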

    Huib Keemink

    01/21/2022, 12:15 PM
    Hi! It would be great if service discovery for the Dagster instances (the gRPC servers that host user code) could be done with Kubernetes annotations somehow. This would allow an infrastructure team to (automatically) deploy a full Dagster stack for a team (or set of teams) without having to accept a PR every time a new workspace (project) is needed by the teams.
    ➕ 2

    Huib Keemink

    01/21/2022, 12:17 PM
    To me, having this split between user code and the scheduling tooling itself is one of the biggest appeals of Dagster compared to Airflow. With Airflow you pretty much always end up with self-managed instances in the teams themselves, because it's so easy to break the whole deployment if you're not careful.

    Huib Keemink

    01/21/2022, 12:19 PM
    Allowing for a centralized instance that teams can (decentrally) use for their projects would be amazing, and it would allow a setup similar to Prometheus, where no change is needed to configure new endpoints.

    VxD

    01/26/2022, 1:04 AM
    Hi there! Is there a way to configure a default execution config that would be used for all pipelines unless specified? It would be great to be able to trigger new pipelines from the launchpad in dagit without having to copy-paste the execution config (which, for the Celery broker, contains the Redis connection string, including its password, in cleartext).
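    One partial answer today: a run-config dict passed to @job becomes that job's default and pre-fills the launchpad. A minimal sketch (the celery/redis values and config shape are illustrative, not the exact executor schema):
    from dagster import job, op

    @op
    def step():
        ...

    # The dict below shows up pre-filled in dagit's launchpad, so the
    # broker string does not have to be pasted for every run.
    @job(config={"execution": {"config": {"broker": "redis://:secret@redis:6379/0"}}})
    def my_job():
        step()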

    Natalie Novitsky

    01/31/2022, 8:08 PM
    Hi! I'm on a team that uses both Dagster and Datadog, and recently worked on integrating Dagster with Datadog APM tracing to enable better visibility into op execution speeds (including calls to third party libraries). To make this work, I had to build out a custom multiprocess executor that inherits from your MultiprocessExecutor class but also handles tracer config, span initialization, and trace context inheritance across processes. We are mildly concerned about issues coming up if the Executor evolves -- are there any anticipated changes to Executor method definitions coming up? Additionally, this could be avoided if there was a way to share one instance of a resource across multiple processes when using the MultiprocessExecutor. Is this a possibility or something that's been considered?

    Zach

    02/02/2022, 5:51 PM
    Recently my team has been using Dagster to orchestrate some large analytics runs, and we've noticed that when a job run has thousands to tens of thousands of steps, UI performance takes a pretty hard hit when viewing the run. UI performance continues to suffer after navigating away from the run (I assume step state is still being retrieved in the background). The performance is fine if the user avoids loading any runs with thousands of steps. This is using a remote Postgres storage backend on RDS, so I understand there's probably some unavoidable latency retrieving state for that many steps; just thought I'd mention it in case it's useful.

    George Pearse

    02/03/2022, 1:39 PM
    Any integrations between dagster and a data catalogue (like Amundsen)?
    👀 1
    👏 1

    Alessandro Marrella

    02/09/2022, 12:27 PM
    The Asset API looks amazing 👏 and I love the dagit view! Somewhat related to my question in dagster-support, I wonder in general what is going to be best practice in asset jobs when configuration or the ability to swap out inputs is involved. Example use cases:
    • a generic machine learning job that accepts "arbitrary" inputs that are configurable directly from dagit and a config specifying which algorithm to use, etc. How would you create that with assets? (See the sketch below.)
    • a re-run of a production pipeline that needs to use a previous snapshot of the data (in a regular job I would just use the op selection and manually add the input config)
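    A sketch of the configurable-algorithm case, assuming @asset accepts config_schema the way @op does (train() and all names here are hypothetical):
    from dagster import asset

    @asset(config_schema={"algorithm": str})
    def model(context, training_data):
        # The algorithm is chosen from the dagit launchpad at run time.
        return train(context.op_config["algorithm"], training_data)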

    Alex Service

    02/17/2022, 7:13 PM
    A little while back, I mentioned how I was using poetry & a multi-stage dockerfile for my pipeline project structure. I’m working on a blog post around it now 🙂
    :nice: 5

    Hugo Saavedra

    02/17/2022, 10:44 PM
    The changelog page seems to be 404ing currently: https://docs.dagster.io/changelog (sorry if this isn't the right channel to point this out!)

    Mike

    02/18/2022, 12:27 AM
    The code examples on https://docs.dagster.io/concepts/assets/software-defined-assets don't really show the business value of this new feature. A new way of calculating [1, 2, 3] + [4]: why do I care?

    Mike

    02/18/2022, 12:30 AM
    There is a whole lot of "what" and "how", but not a clear description/example of "why".

    Mike

    02/18/2022, 12:38 AM
    The companion blog post https://dagster.io/blog/rebundling-the-data-platform isn't much clearer on the why of software-defined assets either; maybe it makes sense if you are already familiar with Airflow, Airbyte, and dbt, but to me it reads mostly as inside baseball.

    Alex Service

    02/18/2022, 5:42 PM
    How about a dagster inspection & debugging tool called “consarnit” 😛