announcements
  • u

    user

    09/10/2020, 11:40 PM
    catherinewu just published a new version: 0.9.6.
  • c

    cat

    09/10/2020, 11:55 PM
    0_9_6_Changelog
    🎉 3
  • k

    Kevin

    09/11/2020, 4:55 PM
    general question: is there any appetite/future plans to be able to have solids defined within .ipynb files? so one .ipynb could contain multiple solids rather than the entire notebook being defined as a single solid?
    s
    • 2
    • 3
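    For context, today each notebook maps to exactly one solid via dagstermill.define_dagstermill_solid, with results crossing the boundary through dagstermill.yield_result inside the notebook. A minimal sketch of that current one-notebook-per-solid pattern (notebook paths and names are illustrative, not from this thread):

    import dagstermill as dm
    from dagster import InputDefinition, OutputDefinition, pipeline

    # Each notebook becomes a single solid; the notebook hands back its result
    # with dm.yield_result(...) at the end of its execution.
    clean_data = dm.define_dagstermill_solid(
        "clean_data",
        notebook_path="notebooks/clean_data.ipynb",
        output_defs=[OutputDefinition(dict)],
    )

    train_model = dm.define_dagstermill_solid(
        "train_model",
        notebook_path="notebooks/train_model.ipynb",
        input_defs=[InputDefinition("cleaned", dict)],
    )

    @pipeline
    def notebook_pipeline():
        train_model(clean_data())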
  • s

    sashank

    09/11/2020, 5:00 PM
    Hey all, we’re opening up a new place on GitHub for our community to interact and ask questions. Check it out here: https://github.com/dagster-io/dagster/discussions/ Discussions is a great place to:
    • Ask questions about Dagster
    • Request features from our team
    • Share ideas and projects you are working on
    We’ve historically used Slack as the place to ask questions, but the new default place to ask questions is on Discussions. A lot of the questions asked here in #general are very helpful to the wider community, but get lost in Slack’s history. We hope Discussions becomes a general knowledge base for Dagster in the near future. This Slack will still be very actively used for making announcements, organizing community meetings, having real-time conversations, and providing private help to individuals and teams. We encourage everyone to still use this workspace as needed.
    :docster: 6
  • s

    sashank

    09/11/2020, 5:02 PM
    You can take a look at our first two proposals from the team here:
    • https://github.com/dagster-io/dagster/discussions/2904
    • https://github.com/dagster-io/dagster/discussions/2902
    We’d love to hear your feedback on these ideas.
  • s

    schrockn

    09/11/2020, 5:02 PM
    set the channel topic: https://github.com/dagster-io/dagster/discussions/ for questions please 🙏🏻🙏🏻🙏🏻
    :thumbsup_all: 1
  • s

    schrockn

    09/11/2020, 5:03 PM
    Yeah just wanted to call out https://github.com/dagster-io/dagster/discussions/2902 specifically. A bit of a doozy that will affect everyone, so looking for feedback. tl;dr: merge @pipeline and @composite_solid into a single abstraction: @graph
    y
    a
    h
    • 4
    • 4
  • t

    Tobias Macey

    09/11/2020, 8:12 PM
    So, I have a rough understanding of how to approach this, but given my repository at github.com/mitodl/ol-data-pipelines, and the edx pipeline defined there, I want to set up a preset definition where I can drop some YAML data on disk with my config management system to populate the secrets for two different instances of that pipeline targeting different source systems. Does someone have a quick code snippet for how to do that? If it's more than 5 minutes to do, then don't bother as I can figure it out and I'm just being lazy.
    m
    • 2
    • 4
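    A rough sketch of one way to do what's being asked, assuming the secrets are dropped as YAML files on disk and that PresetDefinition.from_files accepts a config_files list in this version (check the API docs for the exact keyword; file paths, preset names, and the pipeline name are illustrative):

    from dagster import PresetDefinition, pipeline

    # One preset per target edX instance; each YAML file is written to disk by
    # the config management system and holds that instance's secrets/config.
    residential = PresetDefinition.from_files(
        "residential",
        config_files=["/etc/dagster/presets/edx_residential.yaml"],
        mode="default",
    )
    mitx = PresetDefinition.from_files(
        "mitx",
        config_files=["/etc/dagster/presets/edx_mitx.yaml"],
        mode="default",
    )

    @pipeline(preset_defs=[residential, mitx])
    def edx_course_pipeline():
        ...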
  • c

    cat

    09/11/2020, 11:46 PM
    0_9_7
    👍 1
    :partydagster: 1
  • u

    user

    09/11/2020, 11:48 PM
    catherinewu just published a new version: 0.9.7.
  • w

    Wali

    09/12/2020, 2:00 AM
    Hey folks, I'm new to dagster. I believe dagster can show a table- or node-level data lineage graph. Is it possible to show column-level lineage using dagster?
    s
    • 2
    • 2
  • s

    Slackbot

    09/14/2020, 2:36 PM
    This message was deleted.
    s
    • 2
    • 2
  • s

    Sachit Shivam

    09/14/2020, 2:50 PM
    Hey folks! Is it possible to version your pipelines? Also, is there a way to integrate with GitHub so that pipeline code gets updated when new commits are pushed upstream?
    s
    • 2
    • 6
  • t

    Tobias Macey

    09/14/2020, 10:54 PM
    One thing I just noticed is that, in a deployed instance there's a link in Dagit which displays the full contents of the instance file, which includes the database credentials. Is there any work being done to restrict access to that, or elide secret values?
    a
    • 2
    • 2
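    One way to keep credentials out of the rendered instance file, assuming this version supports sourcing string config fields from the environment with the env: form (a hedged dagster.yaml sketch; the variable name is illustrative):

    run_storage:
      module: dagster_postgres.run_storage
      class: PostgresRunStorage
      config:
        postgres_db:
          username: dagster
          hostname: pg.internal
          db_name: dagster
          port: 5432
          password:
            env: DAGSTER_PG_PASSWORD  # resolved from the environment, not stored in the file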
  • s

    Sachit Shivam

    09/15/2020, 11:11 AM
    Hi folks, could someone please guide me on how to set Dagster up so that dagit / dagster, RabbitMQ, and Celery workers are all running remotely (i.e. on separate EC2 instances)? I have dagit / dagster running on one instance, RabbitMQ on another, and multiple instances for Celery workers (I need to be able to add more based on workload scaling). I can't seem to figure out how to provide the correct settings / configuration to dagster. Also, will some special setup be required for Celery? What app name will have to be passed to it via the CLI?
    m
    • 2
    • 10
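    A hedged sketch of the pieces usually involved when splitting this across instances: the pipeline opts into the Celery executor, and the run config points every run at the shared RabbitMQ broker (the hostname is illustrative). The workers on the other EC2 instances are typically started with the dagster-celery CLI rather than a plain Celery app, so no custom app name should be needed, but check the dagster-celery docs for your version.

    from dagster import ModeDefinition, default_executors, pipeline, solid
    from dagster_celery import celery_executor

    # Add the Celery executor alongside the defaults so runs can opt into it.
    celery_mode = ModeDefinition(executor_defs=default_executors + [celery_executor])

    @solid
    def do_work(_):
        return 1

    @pipeline(mode_defs=[celery_mode])
    def remote_pipeline():
        do_work()

    # Run config selecting the Celery executor and the shared RabbitMQ broker.
    run_config = {
        "execution": {
            "celery": {
                "config": {
                    "broker": "pyamqp://guest@rabbitmq.internal:5672//",
                }
            }
        }
    }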
  • u

    user

    09/15/2020, 10:48 PM
    yuhan just published a new version: 0.9.8.
  • y

    yuhan

    09/15/2020, 11:00 PM
    0_9_8
    🔥 5
  • r

    Richard Fisher

    09/16/2020, 8:14 AM
    Hi all, I’m struggling a little with importing custom modules in my example_pipeline.py file. I’m using import src.load_and_store in example_pipeline.py, which is in the same directory as src. This works correctly when running the pipeline through the Python API (execute_pipeline under __name__ == "__main__" in example_pipeline.py) and when running it through dagit with a workspace.yaml file:
    load_from:
      - python_file: example_pipeline.py
    However, when I try to run dagster pipeline execute -f example_pipeline.py (from the same directory) I receive the error ModuleNotFoundError: No module named 'src'. I’m sure this is a fairly basic import error, but could someone please provide a solution/best practice for this?
    a
    d
    • 3
    • 4
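    One low-tech workaround, assuming src/ sits next to example_pipeline.py: make the file self-locating so the import resolves no matter which tool loads it (a sketch, not necessarily the recommended long-term layout):

    # top of example_pipeline.py
    import os
    import sys

    # Make the directory containing this file (and therefore src/) importable,
    # regardless of the working directory the CLI or dagit happens to use.
    sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

    import src.load_and_store  # noqa: E402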
  • r

    Rizky Eko Putra

    09/16/2020, 8:23 AM
    Hi all, I am trying to install dagster and dagit on a GCP VM. When I run dagit -f hello_world.py on it, it says Serving on <http://127.0.0.1:3000>. How can I access that localhost:3000 via the GCP VM's external IP? Is there any other config I should set?
  • b

    Ben Sully

    09/16/2020, 8:44 AM
    @Rizky Eko Putra you could either forward port 3000 via SSH (ssh -L localhost:3000:localhost:3000 <GCP IP>, I think, I forget the exact magic commands), or pass -h 0.0.0.0 to your dagit command (which makes it accessible to everyone if your GCP VM is open)
    r
    • 2
    • 1
  • s

    Sergii Ivakhno

    09/16/2020, 9:21 AM
    Hello all! I am new to dagster (but am trying to become sufficiently proficient with it) and we are trying to use it at the company to create a data workflow pipeline infrastructure. We already use MLflow but find the two tools largely complementary. When running my first test pipeline I get an error:
    Error 1: Missing required field "storage" at the root. Available Fields: "['execution', 'intermediate_storage', 'loggers', 'resources', 'solids', 'storage']".
    I haven't seen the storage option in any docs so far. Could you kindly point me to a resource on how to set this up? We are launching via the Python API rather than the CLI, so I presume it will be some kind of @storage decorator? Thanks in advance!
    j
    a
    • 3
    • 3
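    For what it's worth, the field named in that error lives in the run config rather than in a decorator; a minimal sketch via the Python API (the pipeline here is illustrative):

    from dagster import execute_pipeline, pipeline, solid

    @solid
    def say_hello(_):
        return "hello"

    @pipeline
    def my_first_pipeline():
        say_hello()

    # "storage" / "intermediate_storage" is configured at the root of run_config;
    # the filesystem variant persists values passed between solids to local disk.
    result = execute_pipeline(
        my_first_pipeline,
        run_config={"intermediate_storage": {"filesystem": {}}},
    )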
  • g

    Georg Bauerfeind

    09/17/2020, 3:34 PM
    Hi everyone, my goal is to use dagster for automating data jobs in my company. The problem is that when deploying it to my server and starting dagit, the web interface becomes available to anyone here. Since not everyone should have access to the web UI, how can I restrict access to it? Also, there is no obvious web server like nginx or Apache present in the dagster project, so I don't know what to look for. Has anyone had similar issues, or an answer to this?
    t
    a
    t
    • 4
    • 4
  • u

    user

    09/17/2020, 9:28 PM
    Bob Chen just published a new version: 0.9.9.
    :celebrate: 6
  • b

    bob

    09/17/2020, 9:43 PM
    0_9_9
  • m

    Manas Jain

    09/18/2020, 8:04 AM
    Hi Team, I need to pass a dictionary throughout the solids of a pipeline to capture certain parameters in each solid... is there a better way of doing this than using output definitions to output it from one solid and then passing it as input to the next?
    s
    a
    • 3
    • 8
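    One alternative worth weighing, sketched under the assumption that the dictionary mostly carries shared run parameters: expose it as a resource so every solid can reach it without threading it through inputs and outputs (names are illustrative; mutating it from multiple solids only behaves as shared state with the in-process executor):

    from dagster import ModeDefinition, pipeline, resource, solid

    @resource
    def run_params(_):
        # Shared dictionary available to every solid that declares the resource key.
        return {"run_date": "2020-09-18"}

    @solid(required_resource_keys={"params"})
    def extract(context):
        context.resources.params["extracted_rows"] = 42
        return 42

    @solid(required_resource_keys={"params"})
    def report(context, extracted):
        context.log.info("captured params: {}".format(context.resources.params))
        return extracted

    @pipeline(mode_defs=[ModeDefinition(resource_defs={"params": run_params})])
    def capture_pipeline():
        report(extract())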
  • s

    Sergii Ivakhno

    09/18/2020, 9:06 AM
    Hello all - a quick question regarding presets that I could not fully clarify from the docs. What is the difference (or interplay) between run_config defined in presets (https://docs.dagster.io/tutorial/advanced_pipelines#pipeline-config-presets) and execute_pipeline? When I specify intermediate_storage, it works when put in run_config within execute_pipeline, but not when I create a PresetDefinition object that is then passed to preset_defs. Thanks for advice in advance!
    s
    • 2
    • 1
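    A hedged sketch of how the two relate: the run_config inside a PresetDefinition is the same dictionary execute_pipeline otherwise takes directly, and execute_pipeline can select a preset by name (assuming the preset keyword exists in this version); if intermediate_storage works one way but not the other, a mismatched mode name on the preset is a common culprit.

    from dagster import PresetDefinition, execute_pipeline, pipeline, solid

    @solid
    def emit(_):
        return 1

    local_preset = PresetDefinition(
        "local",
        run_config={"intermediate_storage": {"filesystem": {}}},
        mode="default",
    )

    @pipeline(preset_defs=[local_preset])
    def preset_pipeline():
        emit()

    # These two invocations are intended to be equivalent:
    execute_pipeline(preset_pipeline, preset="local")
    execute_pipeline(
        preset_pipeline,
        run_config={"intermediate_storage": {"filesystem": {}}},
    )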
  • j

    Jaakko Kangasharju

    09/18/2020, 12:26 PM
    Hi everyone. I'm trying to use a repository on a gRPC server following the instructions at https://docs.dagster.io/overview/repositories-workspaces/workspaces . I have one docker image for dagit, and another one for the repository, with workspace.yaml and the repository image entrypoint like on that web page. Dagit sees the repository and lists all my pipelines and schedules, but when I try to execute a pipeline, the dagit UI just keeps spinning. There are no logs from dagit, but the repository logs the following:
    Process SpawnProcess-1:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
        self.run()
      File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.7/site-packages/dagster/grpc/impl.py", line 184, in start_run_in_subprocess
        run_event_handler=lambda x: None,
      File "/usr/local/lib/python3.7/site-packages/dagster/grpc/impl.py", line 133, in _run_in_subprocess
        EngineEventData.in_process(pid, marker_end="cli_api_subprocess_init"),
      File "/usr/local/lib/python3.7/site-packages/dagster/core/instance/__init__.py", line 943, in report_engine_event
        check.inst_param(pipeline_run, "pipeline_run", PipelineRun)
      File "/usr/local/lib/python3.7/site-packages/dagster/check/__init__.py", line 183, in inst_param
        obj, ttype, param_name, additional_message=additional_message
      File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", line 446, in raise_with_traceback
        raise exc.with_traceback(traceback)
    dagster.check.ParameterCheckError: Param "pipeline_run" is not a PipelineRun. Got None which is type <class 'NoneType'>.
    I can't find anything about this error message. Has anyone seen this? Any ideas what I could do to fix it?
    d
    • 2
    • 6
  • k

    Kevin

    09/18/2020, 6:39 PM
    Question about dagstermill and passing results -- I've been following the examples provided, but I seem to have run into an issue when passing results from one dagstermill solid to another. This is the typical error received:
    dagster.core.errors.DagsterInvalidDefinitionError: In @pipeline LDA_pipeline, received invalid type <class 'str'> for input "base_path" (passed by keyword) in solid invocation "ETL". Must pass the output from previous solid invocations or inputs to the composition function as inputs when invoking solids during composition.
    I'm just trying to create some pipelines with dagstermill. Attached is my .py which defines my solids -- each of the notebooks does execute dagstermill.yield_result(...)
    dagstermill_nb_pipeline.py
    m
    • 2
    • 58
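    For reference, the usual ways around that specific error, sketched with illustrative names (the etl solid below stands in for the dagstermill one from the attached file): raw Python values can't be passed inside @pipeline composition, so either wrap the constant in a small solid, or leave the input unconnected and supply it through run config under solids: ETL: inputs:.

    from dagster import pipeline, solid

    @solid
    def etl(_, base_path: str):
        # Stand-in for the dagstermill ETL solid from the thread.
        return base_path

    @solid
    def base_path_source(_) -> str:
        # Wraps the constant so it can participate in composition.
        return "/data/lda"

    @pipeline
    def lda_pipeline():
        etl(base_path=base_path_source())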
  • n

    Nikhil Raut

    09/19/2020, 12:01 PM
    Documentation for setting up repositories as Python modules / multi-location repositories is not quite clear, and I think there is scope for improvement.
    👍 2
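    For anyone with the same question, a hedged workspace.yaml sketch for loading a repository from an installed module rather than a file (the module name is illustrative; check the workspaces docs for the exact form in your version):

    load_from:
      # The module must be importable in the environment running dagit,
      # e.g. installed into the same virtualenv with pip install -e .
      - python_module: my_company_dagster_repo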
  • n

    Nikhil Raut

    09/21/2020, 6:33 AM
    Hi, your own ‘airline-demo’ example does not work as a Python module installed in a virtualenv. PFA: the ‘airline-demo’ module exists in the remote venv, but dagit says it does not exist.
    s
    • 2
    • 3
  • s

    Sergii Ivakhno

    09/21/2020, 11:31 AM
    I am not sure if this is a similar cause, but I have a problem dynamically importing dagster in a virtualenv. Essentially I have a module dagster.py that contains the call importlib.import_module(pipeline), where pipeline is a path to a dagster pipeline, i.e. blah/pipeline.py. When I run poetry run python dagster.py --path blah/pipeline.py I get an error:
    import dagster_pandas as dagster_pd
      File "/Users/apple/Library/Caches/pypoetry/virtualenvs/pwmf-qUT5bdGW-py3.7/lib/python3.7/site-packages/dagster_pandas/__init__.py", line 1, in <module>
        from dagster.core.utils import check_dagster_package_version
    ModuleNotFoundError: No module named 'dagster.core'; 'dagster' is not a package
    This does not happen when I run importlib.import_module(pipeline) from the poetry shell. Other third-party packages also seem to import dynamically with no problems. Any suggestions would be most appreciated - thanks in advance!
    My fault - my dagster.py script masked the dagster package during import 😱