dagster-support

    Paolo Deregibus

    08/31/2021, 10:43 AM
    Hi guys, I am running the dagster daemon on a network behind a proxy; both the http_proxy and https_proxy environment variables are set. When I start the daemon everything runs as expected (I have a sensor that fires when needed), but the command line logs an error every second: E0831 12:37:58.808000000 151232 src/core/ext/filters/client_channel/http_proxy.cc:82] 'https' scheme not supported in proxy URI. Is there a way to avoid this error, or at least to stop it from being printed? Thanks for your help.
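A possible workaround, sketched under assumptions rather than taken from the Dagster docs. The daemon talks gRPC through grpcio, whose C core appears to look for a proxy in grpc_proxy first, then https_proxy, then http_proxy. It accepts only http:// proxy URIs and treats an empty grpc_proxy as "no proxy". If that holds for your grpcio version, giving only the daemon process an empty grpc_proxy should silence the message. It should also leave https_proxy in place for other HTTP clients. Because gRPC is already rejecting the https:// proxy and connecting directly, this should not change which connections go through the proxy. The helper name daemon_env is hypothetical.

```python
import os
from typing import Mapping, Optional


def daemon_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of an environment in which gRPC is told not to use a proxy.

    Assumption: gRPC C-core reads grpc_proxy before https_proxy/http_proxy and
    treats an empty grpc_proxy as "no proxy". https_proxy and http_proxy are
    left untouched so other HTTP clients (requests, urllib) still see them.
    """
    env = dict(os.environ if base is None else base)
    env["grpc_proxy"] = ""
    return env


# Hypothetical usage when launching the daemon from Python
# (the shell equivalent would be: grpc_proxy= dagster-daemon run):
# import subprocess
# subprocess.run(["dagster-daemon", "run"], env=daemon_env(), check=True)
```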

    Lucas Ávila

    08/31/2021, 12:35 PM
    Hi team, what would be the best way to add a timeout to a solid? I'm using the default RunLauncher, the QueuedRunCoordinator, and the default IO Manager. I'm kinda lost on how the solid is executed (a thread, a process, etc.) and IDK whether I can use Unix signals or something similar...
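One generic approach, as a minimal sketch. It assumes the solid body runs in the main thread of a Unix process: signal.alarm is Unix-only, and Python only lets the main thread install signal handlers. Whether that assumption holds depends on the executor. The multiprocess executor runs each step in its own process, while the in-process executor shares the run worker's process. The time_limit helper is hypothetical, not a Dagster API.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit(seconds: int):
    """Raise TimeoutError if the wrapped block runs longer than `seconds`.

    Unix-only and main-thread-only, because it relies on SIGALRM.
    """

    def _on_alarm(signum, frame):
        raise TimeoutError(f"block exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # deliver SIGALRM after `seconds`
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        # restore whatever handler was installed before
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)


def slow_work():
    time.sleep(5)  # stands in for the solid's real work


# Inside a solid body you would wrap the slow part, e.g.:
# with time_limit(600):
#     slow_work()
```

The alarm interrupts blocking calls like time.sleep, but code stuck inside a C extension that never returns to the interpreter may not be interrupted until it does.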

    Utkarsh

    08/31/2021, 1:30 PM
    Hey team, any plans to release a PySpark step launcher for Kubernetes?

    Sundara Moorthy

    08/31/2021, 3:58 PM
    Hi Team, is there any way to submit Spark jobs to a remote cluster using dagster_shell.create_shell_command_solid? @sandy @alex

    chrispc

    08/31/2021, 4:18 PM
    Hi team, I am trying to generate an asset with a sample of data. I am using EventMetadata.json, but the data has timestamp values and I am getting either
    TypeError: Object of type Timestamp is not JSON serializable
    or
    TypeError: Object of type date is not JSON serializable.
    I know that json.dump() has a default parameter (default is a function applied to objects that aren't serializable; when it is str, everything json doesn't know how to encode is simply converted to a string), but I can't use it here. If I serialize the data with json.dump beforehand, it shows up ugly in the asset catalog.
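A common workaround, sketched here rather than taken from the Dagster docs: round-trip the sample through json.dumps(..., default=str) and then json.loads. EventMetadata.json then still receives a dict, not one long string, but every date or Timestamp has already become a string. to_json_safe is a hypothetical helper name.

```python
import datetime as dt
import json
from typing import Any


def to_json_safe(obj: Any) -> Any:
    """Return a copy of `obj` containing only JSON-native types.

    Values json cannot encode (date, datetime, pandas Timestamp, ...) are
    converted with str(); dicts and lists keep their structure.
    """
    return json.loads(json.dumps(obj, default=str))


sample = {
    "rows": [
        {"id": 1, "created": dt.date(2021, 8, 31)},
        {"id": 2, "created": dt.datetime(2021, 8, 31, 16, 18)},
    ]
}
safe = to_json_safe(sample)  # a plain dict, ready to pass to EventMetadata.json
```

For a pandas DataFrame, df.head().to_json(orient="records", date_format="iso") followed by json.loads is an alternative that produces ISO-8601 date strings.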

    Jake Smart

    08/31/2021, 10:50 PM
    Is there any way to pass either the whole context object or the context logger into a resource?

    chrispc

    08/31/2021, 11:01 PM
    Hi team, I found a funny bug in Dagit.

    chrispc

    08/31/2021, 11:52 PM
    Hi team. I am trying to display sample data in the asset catalog, but if the data has a large number of columns, the web page goes completely blank (white). Any advice is welcome and appreciated. I would also like to know whether it is possible to download this sample data as a CSV file from the GUI. Thanks in advance.

    Sandeep Aggarwal

    09/01/2021, 4:06 AM
    Re-posting this to see what others think about it and to understand how people are handling dependency resolution between heterogeneous types (apart from the fan-in approach).

    Julian Ganguli

    09/01/2021, 1:17 PM
    Hi, I'm currently in the research stage for a PoC that uses Dagster to kick off dbt Cloud jobs after Fivetran syncs to Snowflake complete. I've come across documentation for doing this with Airflow, but not much for Dagster. Has anyone done this yet? Any pointers or reference docs for this particular use case would be :awesome: !

    Peter Bower

    09/02/2021, 12:38 AM
    Hi there, am I right in understanding that SQL Server can be used for I/O (e.g. in a custom IO manager using a connection string), but that Dagster's own logs and storage have to use one of the documented options (Postgres, MySQL, etc.)? Or can I configure Dagster to send its logs to SQL Server? Thanks

    Jay Sharma

    09/02/2021, 4:09 AM
    Hi all, is there a way to run multiple pipelines in parallel from the Dagit UI? For example, I have about 10 pipelines that I want to run. Through the Dagit UI I can launch pipelines one at a time; is there a way to launch them all at once? Thanks.

    Peter Bower

    09/02/2021, 4:13 AM
    Hi, sorry, another question: in the example at https://docs.dagster.io/deployment/guides/docker and https://github.com/dagster-io/dagster/tree/0.12.8/examples/deploy_docker, does the Postgres container's database data persist after a restart?

    Nilesh Pandey

    09/02/2021, 5:23 AM
    Hi, I have a use case where I want to run a PySpark solid as a Databricks job. I went through the Dagster docs and came across databricks_pyspark_step_launcher (link: https://docs.dagster.io/_apidocs/libraries/dagster-databricks#dagster_databricks.databricks_pyspark_step_launcher). According to the documentation, this launcher should satisfy my use case, but I didn't find any example of how to use it. I did find one for the EMR step launcher; I just need an example for the Databricks one.

    Nilesh Pandey

    09/02/2021, 10:33 AM
    Hi, we are using Dagster to orchestrate pipelines, and our Dagster instance is deployed on a Kubernetes cluster. When 10 people open the Dagit UI and execute pipelines simultaneously, we get the error shown in the image:

    sourabh upadhye

    09/02/2021, 10:48 AM
    Hi Dagster Team, we are trying to execute a query via graphqlclient in Python, which requires us to format the string without double quotes around the keys in the query. For example:
    queryArgs = f"""{{
                selector:{{
                    repositoryName: "{repo_name}"
                    repositoryLocationName: "{repo_loc}"
                    pipelineName:"{pipeline_name}"}}
                runConfigData:{run_config}
                mode:"{mode_def}"
                }}"""
    pipeline_execute = f"""mutation {{
                            launchPipelineExecution(executionParams:{queryArgs})
                            {{
                                __typename
                                ... on PipelineConfigValidationInvalid{{
                                    errors{{
                                        message
                                    }}
                                }}
    
                            }}
                            }}"""
    print(pipeline_execute)
    resp = json.loads(client.execute(pipeline_execute))

    sourabh upadhye

    09/02/2021, 10:48 AM
    run_config = f"""{{
        num: 10,
        dict: {{hello:"Whats up"}},
        string: "Hello"}}"""
    Basically, can we allow double quotes around the keys? Currently, if the keys have double quotes, the query fails. In our use case we execute pipelines programmatically, and we receive the run_config over the network. After deserializing it, the dictionary keys are wrapped in double quotes, which Dagster's GraphQL server rejects. Since run configs can be nested, we would prefer the GraphQL server to accept run_config keys in double quotes rather than stripping the quotes throughout the nested config object ourselves.
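One way around the quoting problem, sketched under assumptions: instead of interpolating run_config into the query text, declare it as a GraphQL variable and send the whole request as JSON. JSON keys are always double-quoted, so the deserialized dict can be passed through unchanged, however deeply nested it is. The mutation reuses the fields from the query above. The variable type name ExecutionParams, the endpoint URL http://localhost:3000/graphql, and the helper names build_launch_request and send are assumptions to check against your Dagit version's GraphQL schema.

```python
import json
import urllib.request

LAUNCH_MUTATION = """
mutation LaunchPipeline($executionParams: ExecutionParams!) {
  launchPipelineExecution(executionParams: $executionParams) {
    __typename
    ... on PipelineConfigValidationInvalid {
      errors { message }
    }
  }
}
"""


def build_launch_request(repo_name, repo_loc, pipeline_name, run_config, mode):
    """Return the JSON request body for a launchPipelineExecution call.

    run_config is an ordinary (possibly nested) dict; json.dumps encodes it,
    so no manual quoting of keys is needed.
    """
    variables = {
        "executionParams": {
            "selector": {
                "repositoryName": repo_name,
                "repositoryLocationName": repo_loc,
                "pipelineName": pipeline_name,
            },
            "runConfigData": run_config,
            "mode": mode,
        }
    }
    return json.dumps({"query": LAUNCH_MUTATION, "variables": variables}).encode()


def send(body, url="http://localhost:3000/graphql"):  # assumed Dagit address
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

If you stay with graphqlclient, its execute(query, variables) method should accept the same variables dict in place of the hand-built string.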

    Matthias Queitsch

    09/02/2021, 10:59 AM
    Hi, I have some questions about schedules and modes. If I deploy a pipeline with schedules, the schedules are off by default. How can I configure them to be on by default? My second question is about modes. I know how to configure them and how to choose a mode in the Dagit frontend. We have two instances of Dagster running in different AWS accounts (staging and prod), and I want to run scheduled pipelines in both accounts. Can I tell the scheduler to use a specific mode? I wrote a resource class that can figure out the account ID.

    Nathan Saccon

    09/02/2021, 12:52 PM
    Quick question: Is there a way to remove an existing run key from a sensor?

    Alejandro A

    09/03/2021, 1:16 AM
    from dagster import InputDefinition, pipeline, repository, solid
    
    
    @solid
    def hello():
        return 1
    
    
    @solid(
        input_defs=[
            InputDefinition("name", str),
            InputDefinition("age", int),
        ]
    )
    def hello2(hello_: int, name: str, age: int):
        print(f"Hello {name}, {age+hello_}")
    
    
    @pipeline
    def pipe_2():
        x = hello()
        y = hello2(x, "Marcus", 20)
    
    
    @repository
    def deploy_docker_repository():
        return [pipe_2, my_schedule]

    Alejandro A

    09/03/2021, 1:18 AM
    dagster.core.errors.DagsterInvalidDefinitionError: In @pipeline pipe_2, received invalid type <class 'str'> for input "name" (at position 1) in solid invocation "hello2". Must pass the output from previous solid invocations or inputs to the composition function as inputs when invoking solids during composition.

    Drew Sonne

    09/03/2021, 11:46 AM
    I'm trying to solve a GRPC issue. I'm getting the following error:
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1630669409.194800714","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3008,"referenced_errors":[{"created":"@1630669409.194797596","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}" >
    I'm trying to debug it myself by running
    grpcurl -plaintext <host>:4000 list
    and I get the following response:
    Failed to list services: server does not support the reflection API
    To help me debug this, which symbols does the gRPC endpoint expose that I could use to test that it is functioning correctly?
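"failed to connect to all addresses" usually means the client never established a connection at all. A useful first check is therefore whether anything accepts TCP connections at that host and port. The sketch below does only that, using the standard library, and does not prove the Dagster gRPC server is healthy. If grpcio is installed, grpc.channel_ready_future(grpc.insecure_channel(target)).result(timeout=...) is a further check that a gRPC channel can connect without needing the reflection API. port_is_open is a hypothetical helper.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, DNS failure and timeout
        return False


# Hypothetical usage against the server from the question:
# port_is_open("my-grpc-host", 4000)
```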

    Remi Gabillet

    09/03/2021, 2:30 PM
    👋 everyone. I'm very new to Dagster and I'm trying to convert an existing script into a Dagster pipeline for the first time. This is not working:
    • I have a solid returning a dict
    • I have a downstream solid attempting to iterate through the dict
    • => this error:
    Error loading repository location glue.py:dagster.core.errors.DagsterInvariantViolationError: Attempted to iterate over an InvokedSolidOutputHandle. This object represents the output "result" from the solid "load_partitions". Consider yielding multiple Outputs if you seek to pass different parts of this output to different solids.

    Darren Ng

    09/03/2021, 2:35 PM
    Very new to Dagster here. I've been trying to read a row of params from a Google Sheet, use them to query my company's database, and then send some metrics back to be written to the same sheet in a different cell.
    from typing import List
    
    from dagster import Output, OutputDefinition, pipeline, solid
    
    @solid(
        description="Read from google sheet for data pipeline.",
        output_defs=[
            OutputDefinition(name="row_data"),
            OutputDefinition(name="gspread_object")
            ]
        )
    def read_from_gsheet() -> List:
        print("Instantiate gspread sheet object and read a row...")
        row = sheet.row_values(2)
        yield Output(row, output_name="row_data")
        yield Output(sheet, output_name="gspread_object")
    
    @solid(description="Write to cells in Gsheet")
    def write_to_gsheet(sheet, cell: str, value: int):
        sheet.update(cell, value)
    
    @solid(description="Read from google sheet and query data from NLP")
    def query_NLP_data(context, row):
        received_row = row
        print("Sending request to query NLP graph and return metrics")
        some_bol_metrics = 1000
        context.log.info(f"Returned NLP metrics: {some_bol_metrics}")
        return some_bol_metrics
    
    @pipeline
    def gather_pipeline():
        row, sheet = read_from_gsheet()
        some_bol_metrics = query_NLP_data(row)
        write_to_gsheet(sheet, 'H2', some_bol_metrics)
    In this example I'm reading a row from a gspread sheet object and sending it to another solid, which should query the company database and return some metrics (here 1000). That result should then be passed to another solid, which writes it into cell 'H2' of the same sheet object. But I'm getting:
    /home/dazza/anaconda3/envs/versedai/lib/python3.8/site-packages/dagster/core/definitions/composition.py:65: UserWarning: While in @pipeline context 'gather_pipeline', received an uninvoked solid 'write_to_gsheet'.
      warnings.warn(warning_message.strip())
    /home/dazza/anaconda3/envs/versedai/lib/python3.8/site-packages/dagster/core/workspace/context.py:504: UserWarning: Error loading repository location gather_pipeline.py:dagster.core.errors.DagsterInvalidDefinitionError: In @pipeline gather_pipeline, received invalid type <class 'str'> for input "cell" (at position 1) in solid invocation "write_to_gsheet". Must pass the output from previous solid invocations or inputs to the composition function as inputs when invoking solids during composition.
    Any help or suggestions will be appreciated.

    doom leika

    09/03/2021, 2:46 PM
    How can I run multiple gRPC repository server instances that serve the same code?

    doom leika

    09/03/2021, 2:47 PM
    That is, I want the same repository to run on many different machines.

    Brien Gleason

    09/03/2021, 3:32 PM
    Hi team - I have a solid that depends on the execution of multiple upstream solids, each of which returns Nothing. I am having trouble invoking this solid in the pipeline, despite defining an InputDefinition for each upstream dependency. I get this error:
    dagster.core.errors.DagsterInvalidDefinitionError: In @pipeline sample_example_pipeline, could not resolve input based on position at index 0 for invocation sample_example_table. Use keyword args instead, available inputs are: ['input_start1', 'input_start2']
    Has anyone defined a solid with multiple Nothing inputs and could provide an example?

    Vlad Dumitrascu

    09/03/2021, 10:24 PM
    Hi Dagster ninjas! I have a question: I recently deleted my dagster and dagit modules and pip installed dagster and dagit again. Last time, dagit.exe and the other executables were put in the bin folder of my --target= directories (where I installed the modules). This time, there's no dagit.exe executable. Where the hell do I find this thing? It's not in the documentation. I figured it out before, but I forgot where it was. The documentation only says to run 'dagit', but it's clearly not on my system path, so the shell can't find it. Where is it by default? This should be explained better in the docs, because it's a deal-breaker...
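A small sketch for locating the executable, assuming a standard pip install. Console scripts such as dagit are placed in the interpreter's scripts directory (bin on Linux/macOS, Scripts on Windows), which sysconfig reports. With --target they ended up in a bin folder inside the target directory, as described above. Packages installed with --target are also only importable when that directory is on PYTHONPATH. find_console_script is a hypothetical helper.

```python
import os
import shutil
import sysconfig
from pathlib import Path
from typing import Iterable, Optional


def find_console_script(name: str, target_dirs: Iterable[str] = ()) -> Optional[str]:
    """Best-effort search for a pip-installed console script such as 'dagit'."""
    # 1. Already on PATH? (on Windows, shutil.which also tries PATHEXT suffixes)
    found = shutil.which(name)
    if found:
        return found

    # 2. The current interpreter's scripts dir, then the per-user one.
    candidates = [Path(sysconfig.get_path("scripts"))]
    try:
        candidates.append(Path(sysconfig.get_path("scripts", f"{os.name}_user")))
    except KeyError:
        pass  # no user install scheme with that name on this platform

    # 3. bin/ or Scripts/ under any `pip install --target=...` directories.
    for target in target_dirs:
        candidates += [Path(target) / "bin", Path(target) / "Scripts"]

    for folder in candidates:
        for filename in (name, name + ".exe"):
            path = folder / filename
            if path.is_file():
                return str(path)
    return None


# Hypothetical usage: find_console_script("dagit", ["C:/my/target/dir"])
```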

    doom leika

    09/03/2021, 11:59 PM
    How do I manage sensor keys? Sometimes I would like to change sensor keys in bulk because my upstream sensor code has switched to a different key scheme.
johann

09/08/2021, 6:52 PM
Hi @doom leika, this isn’t really supported at the moment. A GitHub issue would be great

doom leika

09/09/2021, 2:43 PM
bummer, thanks for the info