announcements
  • s

    Sterling Paramore

    06/06/2019, 4:18 PM
    Was thinking about the “intermediates” question asked earlier this week. Any thoughts on how, or whether, it could support intermediates for ELT-style workflows? Would the intermediate artifacts have to be stored in the database, or downloaded to S3? Seems like a really hairy problem.
  • d

    dwall

    06/06/2019, 4:39 PM
    would be cool to set one of these up for a fake dagster deployment: https://github.blog/2019-06-06-generate-new-repositories-with-repository-templates/
    👍 1
  • s

    schrockn

    06/06/2019, 7:11 PM
    Update (07/03): We just pushed out 0.5.0! Feel free to poke around. We’re very excited about this new core. We are a US-based team so we are taking a few days off, but feel free to play around. You’ll be hearing a lot more from us next week. 🙏🏻🙏🏻🙏🏻
    Hey everyone, we wanted to give you an update on the current state of the project as of 6/6/19. Right now we are on version `0.4.3.post4`. We are working towards a new release, `0.5.0`, planned for July 3rd. This release will include many new features, bug fixes, and improvements. Given that the release will include some breaking changes in the API as well as some fundamental new concepts, we want to be fully transparent so that there are no surprises. Here is a (not exhaustive) list of changes that will be released:
    • Substantial improvements to the documentation, including topic-based guides, such as testability, integration with Airflow, Dagster for data science, and others.
    • A new fundamental unit of composition, similar to Airflow subdags, but far more first-class. Solids can themselves be graphs of solids, which can then compose/recurse to arbitrary subgraphs, along with Dagit support for navigating that.
    • A new, more literate Python DSL for building a dependency graph, as a layer above the dictionary-based approach currently provided.
    • Optional Python 3 type annotations for building the signatures of solids, to reduce verbosity.
    • Support for execution on Dask and local multiprocessing in addition to local single process and Airflow.
    • Fan-in dependencies, which allow a single input to depend on many outputs.
    • [breaking] We are replacing the notion of context definitions with mode definitions, which we believe will make this part of the system far more approachable and opt-in.
    • [breaking] Renaming some definition inputs for greater consistency and understandability.
    We are really excited for these new concepts, and we believe they will provide a stable, compelling base of features for the medium term. This will still be an evolving system, but we will be very deliberate and strive for backwards compatibility whenever possible. Thanks so much for using and experimenting with Dagster!
    🤛 2
    🎉 11
    😍 4
    🙌 4
    🤜 2
    :dagster: 8
  • f

    fred

    06/18/2019, 9:47 AM
    Sup guys! Even though you're probably hard at work on 0.5, I thought I'd ask for some quick technical help with Dagit. Check the error message in the snippet. Any idea what's going on?
    Untitled.txt
  • u

    user

    07/04/2019, 3:19 AM
    alangenfeld just published a new version: 0.5.0.
    :dagster: 6
    🇺🇸 4
    🙌 3
  • j

    jack

    07/08/2019, 1:43 PM
    Hey folks, I had a play recently with dagster and really like what you're doing - it definitely fills a gap in the python ecosystem! I have a question regarding your plan for expectations though. Currently it's quite hard to tell if an expectation fails (i.e. it logs at level INFO, and in dagit you have to click on the little arrow on the `<solid>.compute` box on the right of the runs window before you can see that one of the expectations failed for that solid). Do you have any plans to make, say, an aggregated report of failed expectations? Even more than this, we have a computation graph where if an expectation fails then we don't want to propagate the results of that solid downstream because they might just be bogus. The risk here is that we end up automatically showing results to the user which are incorrect, and we'd obviously prefer to avoid this! What is your recommendation for this situation? Are expectations the right way to go? Or should we just be raising `Failure` instead? One thing I tried was to apply a decorator to the method before the `@solid` decorator, which "listens" for `ExpectationResult` events and does a `raise Failure` if it sees one which failed. Have you thought about putting something which signals to the runner to raise on failed expectations in the `ExpectationResult` class?
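A minimal sketch of the decorator approach described in this message, assuming the compute function is a generator that yields its events. `ExpectationResult` and `Failure` below are local stand-ins so the snippet runs without dagster installed; their fields are assumptions, not dagster's actual signatures, and `raise_on_failed_expectation` and `checked_rows` are hypothetical names.

```python
import functools
from collections import namedtuple

# Stand-in for dagster's ExpectationResult (fields are an assumption).
ExpectationResult = namedtuple("ExpectationResult", "success message")


class Failure(Exception):
    """Stand-in for dagster's Failure exception."""


def raise_on_failed_expectation(compute_fn):
    """Wrap a generator-style compute function and stop on a failed expectation."""

    @functools.wraps(compute_fn)
    def wrapper(*args, **kwargs):
        for event in compute_fn(*args, **kwargs):
            # Pass the event through first so the failed check is still reported.
            yield event
            if isinstance(event, ExpectationResult) and not event.success:
                raise Failure(event.message)

    return wrapper


@raise_on_failed_expectation
def checked_rows(rows):
    # Hypothetical compute body: one expectation, then the output value.
    yield ExpectationResult(success=len(rows) > 0, message="rows must be non-empty")
    yield rows
```

With this wrapper, nothing after a failed expectation is yielded, so the output never reaches a downstream consumer. In a real pipeline the wrapper would sit underneath the `@solid` decorator, as the message describes.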
  • a

    alex

    07/08/2019, 3:23 PM
    Glad you like it!
    > Do you have any plans to make, say, an aggregated report of failed expectations?
    We are currently working on rendering the dagster events in a structured and much richer way in the log viewer. This combined with better filtering options should approximate this. Good feedback and something we'll keep in mind.
    > What is your recommendation for this situation?
    Great question. Expectations are something we are super excited about and are actively iterating on quite a bit. I think if you always want these checks to cause the computation to cease, then raising `Failure` might be the right choice. `Failure` can report the same metadata that `ExpectationResult` can. If you want to be able to vary whether the computations continue or fail, then dagster having a way to toggle whether `ExpectationResult` failures cause the step to fail makes a lot of sense (we would need to add this). In the short term I advise writing a little helper function that can do either, i.e. `yield expectation_check(check_result, metadata)`, so you only have one place to change as you continue experimenting.
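One way the suggested helper could look, as a sketch only. The name `expectation_check` and its two arguments come from the message above; everything else is assumed. The `ExpectationResult` and `Failure` classes are local stand-ins so the snippet runs without dagster, and the `FAIL_ON_FAILED_EXPECTATION` switch and `fail_hard` parameter are hypothetical.

```python
from collections import namedtuple

# Stand-ins for dagster's classes; the real constructor arguments may differ.
ExpectationResult = namedtuple("ExpectationResult", "success metadata")


class Failure(Exception):
    """Stand-in for dagster's Failure; carries the same metadata as the check."""

    def __init__(self, description, metadata=None):
        super().__init__(description)
        self.metadata = metadata


# Hypothetical module-level switch: the single place to change while experimenting.
FAIL_ON_FAILED_EXPECTATION = True


def expectation_check(check_result, metadata, fail_hard=None):
    """Return an ExpectationResult to yield, or raise Failure for a failed check."""
    if fail_hard is None:
        fail_hard = FAIL_ON_FAILED_EXPECTATION
    if not check_result and fail_hard:
        raise Failure("expectation failed", metadata)
    return ExpectationResult(success=bool(check_result), metadata=metadata)


def compute_rows(rows):
    # Hypothetical compute body using the helper exactly as suggested above.
    yield expectation_check(len(rows) > 0, {"row_count": len(rows)})
    yield rows
```

Flipping `FAIL_ON_FAILED_EXPECTATION` to `False` turns every failed check back into a reported but non-fatal result without touching the call sites.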
  • u

    user

    07/08/2019, 11:14 PM
    Max Gasner just published a new version: 0.5.1.
    🎉 4
  • t

    Taylor

    07/10/2019, 4:04 PM
    dbt has a new release out with an RPC server that could open up better dagster integration than shelling out to dbt in a `Solid`... https://blog.fishtownanalytics.com/dbt-v0-14-0-better-serving-our-users-bf7cdbbcd5d2 https://docs.getdbt.com/docs/rpc
  • j

    Jay Sen

    07/10/2019, 6:11 PM
    Hi Guys,
  • j

    Jay Sen

    07/10/2019, 6:12 PM
    I am Jay. I just came across this platform yesterday and I love the idea, especially because I was trying to do the same on top of Airflow.
    👍 1
  • j

    Jay Sen

    07/10/2019, 6:12 PM
    Thanks for sharing this!
  • j

    Jay Sen

    07/10/2019, 8:32 PM
    btw, I am facing an issue running the hello world example. Which channel would be the right one to report that?
  • u

    user

    07/11/2019, 8:53 PM
    Max Gasner just published a new version: 0.5.2.
    :dagster: 3
  • m

    max

    07/11/2019, 8:55 PM
    Changes_in_0_5_2
  • p

    Pushkar

    07/11/2019, 10:21 PM
    Hey guys, great work on Dagster. This looks like a complete solution with a lot of potential. I was exploring the tool more and wanted to know if you have any working example of Dagstermill. Even a simple example will work.
  • a

    alex

    07/11/2019, 10:26 PM
    the airline demo in the examples folder uses dagstermill
  • a

    alex

    07/11/2019, 10:27 PM
    https://github.com/dagster-io/dagster/blob/master/examples/dagster_examples/airline_demo/solids.py
  • a

    alex

    07/11/2019, 10:27 PM
    you can check it out locally by running `dagit` in the `examples` directory
  • m

    max

    07/11/2019, 10:30 PM
    Hi @Pushkar! There are also a bunch of (toy) examples in the dagstermill package itself
  • m

    max

    07/11/2019, 10:30 PM
    if you go to `dagster/python_modules/dagstermill/dagstermill/examples` and run dagit, you should be able to execute the pipelines from there
  • v

    villebro

    07/12/2019, 6:03 PM
    Just heard about this a few days ago and feel this is very promising. I'm still very much a novice and need to digest the docs more, but one thing I find unclear is how Dagster addresses unit/integration testing of complex transformations that need to be scaled horizontally due to size, e.g. with Spark, when run in production. Are the unit tests in practice integration tests, where one defines a very limited set of inputs (say 10-100), runs them through the full cluster and then asserts that those produce the expected outputs? If inputs and outputs get serialized and deserialized, I can see potential for serious resource consumption. Has there been any thought given to integrating this with e.g. Apache Arrow to deal with these types of issues? Again, I apologize if I'm asking questions with obvious answers.
    ❤️ 1
  • m

    max

    07/12/2019, 6:06 PM
    @villebro there are a couple of approaches here, but typically we would suggest unit testing your spark solids against a local cluster, with a small test data set. you might also want to run integration tests against a prod or staging cluster, again with small test data, but probably less frequently.
  • m

    max

    07/12/2019, 6:06 PM
    there are a couple of facilities in dagster that should make it easier to run tests like this -- our goal is that the solid code (the business logic of the spark job) shouldn't have to change in test
  • m

    max

    07/12/2019, 6:08 PM
    one is the resources system, which is a swappable way to expose external services to solid logic -- for instance, you could have a resource that gave access to a prod database in prod, and to a sqlite database in dev
  • m

    max

    07/12/2019, 6:08 PM
    https://dagster.readthedocs.io/en/0.5.2.post0/sections/learn/tutorial/resources.html
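To illustrate the idea of a swappable resource in plain Python (this is not dagster's resource API; see the tutorial linked above for that), here is a rough sketch. The names `make_dev_resources` and `count_events` and the `events` table are hypothetical. The business logic only sees `resources.db`, so a production setup could pass in a connection to the real database instead of the in-memory sqlite one.

```python
import sqlite3
from types import SimpleNamespace


def make_dev_resources():
    """Build dev/test resources: an in-memory sqlite database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?)", [(1,), (2,), (3,)])
    return SimpleNamespace(db=conn)


def count_events(resources):
    """Business logic: unchanged whichever database `resources.db` points at."""
    (count,) = resources.db.execute("SELECT COUNT(*) FROM events").fetchone()
    return count
```

A test can then call `count_events(make_dev_resources())` without touching any production service.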
  • m

    max

    07/12/2019, 6:09 PM
    the other is the config system, which lets you parametrize solids at execution time: https://dagster.readthedocs.io/en/0.5.2.post0/sections/learn/tutorial/config.html
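As a plain-Python illustration of parametrizing at execution time (not dagster's config API; see the tutorial linked above for that), the sketch below keeps the logic fixed and varies only a config dictionary. The nested `solids` / `config` layout is modeled on dagster's environment config but should be treated as an assumption, and `load_events`, `solid_config`, and the `limit` key are hypothetical.

```python
# Per-run settings, keyed by solid name; a test run can pass a small limit.
test_environment = {"solids": {"load_events": {"config": {"limit": 2}}}}


def solid_config(environment, solid_name):
    """Look up the config section for one solid."""
    return environment["solids"][solid_name]["config"]


def load_events(config, rows):
    """Business logic: the same code in test and prod, only `config` differs."""
    return rows[: config["limit"]]
```

A test would call `load_events(solid_config(test_environment, "load_events"), rows)` with a small `limit`, while a production run would supply a different environment dictionary.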
  • v

    villebro

    07/12/2019, 6:29 PM
    Thanks for the clarification @max, very helpful. I think the examples will be very important in getting the point across to new users. I tried running the examples from master today but got errors. Was it just me doing something wrong, or are they still WIP or currently being refactored?
  • m

    max

    07/12/2019, 6:31 PM
    @villebro the examples on master should work, but be sure that you've also installed dagster, dagster_graphql, and dagit from master (not from pypi) -- there are unreleased API changes on master
  • m

    max

    07/12/2019, 6:32 PM
    from the root of the repository, you should be able to run `make install_dev_python_modules`