dagster-support
  • x

    Xavier BALESI

    08/03/2022, 1:58 PM
    Hi all, has anyone used assets and IO managers with chunks? For example, reading a huge file chunk by chunk, processing each chunk, and writing them one by one with something like coroutines or generators through an IO manager?
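    One way this could look, as an untested sketch: an IO manager whose handle_output consumes a generator of chunks and whose load_input hands a lazy generator back to the downstream op. The /tmp path, CSV format, and class names are purely illustrative.
    from dagster import IOManager, io_manager
    
    class ChunkedFileIOManager(IOManager):
        def handle_output(self, context, obj):
            # obj is assumed to be a generator/iterator of text chunks
            with open(f"/tmp/{context.name}.csv", "w") as f:
                for chunk in obj:
                    f.write(chunk)
    
        def load_input(self, context):
            # return a generator so the downstream op can also work chunk by chunk
            def chunks():
                with open(f"/tmp/{context.upstream_output.name}.csv") as f:
                    for line in f:
                        yield line
            return chunks()
    
    @io_manager
    def chunked_io_manager(_):
        return ChunkedFileIOManager()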
    :dagster-bot-resolve: 1
    o
    • 2
    • 4
  • m

    Matt Menzenski

    08/03/2022, 2:28 PM
    I was expecting to see
    dagster new-project
    in the CLI reference but it’s not there. Is there another place in the documentation to find it? Specifically I was hoping to see whether I can pass a custom destination path (to run
    dagster new-project .
    or similar)
    :dagster-bot-resolve: 1
    o
    y
    • 3
    • 7
  • s

    Scott Hood

    08/03/2022, 2:33 PM
    Is it possible, from the Launchpad, to select ops to skip during a manually triggered run?
    :dagster-bot-resolve: 1
    z
    • 2
    • 4
  • d

    Daniel Mosesson

    08/03/2022, 4:09 PM
    For writing Dagster tests, is there a good way to do this without importing the whole package (or otherwise taking a long time)? What we are seeing is that, in order for Dagster to "understand" the package, all the repos in the package have to be loaded at package import. This also means that when we want to run a test, the whole package has to be imported, which is slow in our case. Are we missing something that can either make the package import fast, or avoid importing the whole package for a test that only exercises a single job?
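    A hedged sketch of one workaround: import the job from its own module rather than through the repository package, and run it in-process, so the test only pays for that module's import. The module path my_package.jobs.my_job is illustrative.
    from my_package.jobs.my_job import my_job  # hypothetical module path
    
    def test_my_job():
        # execute the single job without loading the repository
        result = my_job.execute_in_process()
        assert result.success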
    o
    b
    • 3
    • 11
  • a

    Alan

    08/03/2022, 5:06 PM
    Hi team, a coworker and I are running Dagster locally for scheduling purposes. Both point to the same Postgres instance, which is configured for storage. I see the following error: dagster.daemon.SensorDaemon ERROR another SENSOR daemon is still sending heartbeats. Is there a way to tell Dagster to only run a scheduled job once, or to check whether another daemon has submitted a heartbeat and not show that error?
    :dagster-bot-resolve: 1
    r
    • 2
    • 6
  • a

    Anoop Sharma

    08/03/2022, 5:19 PM
    How can I set up the executor config for all the jobs in all the repositories? I want to set the max concurrent op limit for all jobs to 2. I have come across this yaml config:
    execution:
      config:
        multiprocess:
          max_concurrent: 2
    but where do I put this config? Does it go in dagster.yaml? PS: I have tried putting it in dagster.yaml, but it doesn't seem to be working.
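    For reference, one place this kind of limit can live is in code rather than dagster.yaml; a hedged sketch using a pre-configured multiprocess executor shared across jobs (names are illustrative):
    from dagster import job, multiprocess_executor
    
    # executor with a concurrency cap, reusable across jobs
    limited_executor = multiprocess_executor.configured({"max_concurrent": 2})
    
    @job(executor_def=limited_executor)
    def my_job():
        ...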
    :dagster-bot-resolve: 1
    o
    • 2
    • 2
  • a

    Amit Arie

    08/03/2022, 6:02 PM
    Hi all 🙂 Is there a way to extract the duration of specific op, via
    StepExecutionContext
    or
    HookContext
    ?
    :dagster-bot-resolve: 1
    m
    o
    • 3
    • 5
  • m

    Matt Menzenski

    08/03/2022, 6:45 PM
    Can a job get the Run ID of its own in-progress run?
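    If it helps, the op execution context does carry the run ID; a minimal sketch:
    from dagster import op
    
    @op
    def report_run_id(context):
        # context.run_id is the ID of the in-progress run
        context.log.info(f"current run: {context.run_id}")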
    :dagster-bot-resolve: 1
    o
    • 2
    • 2
  • c

    Chris Hansen

    08/03/2022, 8:42 PM
    hey folks, i’m starting a new repo with
    dagster-io/dagster-cloud-branch-deployments-quickstart
    and i’m running into issues in the
    Deploy to Dagster Cloud
    step.
    │ /tmp/dagster-cloud-cli/dagster_cloud_cli/gql.py:578 in                       │
    │ create_or_update_branch_deployment                                           │
    │                                                                              │
    │   575 │                                                                      │
    │   576 │   name = result.get("data", ***).get("createOrUpdateBranchDeployment" │
    │   577 │   if name is None:                                                   │
    │ ❱ 578 │   │   raise Exception(f"Unable to create or update branch deployment │
    │   579 │                                                                      │
    │   580 │   return cast(str, name)                                             │
    │   581
    what is the
    name
    field that i’m missing here?
  • u

    Ugochukwu Onyeka

    08/03/2022, 9:41 PM
    Good day all. Could anyone please point me to a good book on Dagster, or help me with a link to download the Dagster docs as a single PDF document?
    o
    • 2
    • 1
  • s

    Saul Burgos

    08/03/2022, 10:13 PM
    If I follow the code in the screenshot, everything works, but if I move the part marked in red to an op in another file, it gives me this error: _"dagster.core.errors.DagsterInvalidInvocationError: Compute function of solid 'airbyte_sync' has context argument, but no context was provided when invoking."_ Is it not possible to move only "airbyte_sync_op", or am I doing something wrong?
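    For comparison, the documented shape for airbyte_sync_op is to configure it and then invoke it inside a job or graph body rather than calling it like a plain function; a hedged sketch (the connection ID, host, and port are placeholders):
    from dagster import job
    from dagster_airbyte import airbyte_resource, airbyte_sync_op
    
    # bind the connection to sync; the configured op can live in any file
    sync_example = airbyte_sync_op.configured(
        {"connection_id": "<your-connection-id>"}, name="airbyte_sync"
    )
    
    @job(
        resource_defs={
            "airbyte": airbyte_resource.configured({"host": "localhost", "port": "8000"})
        }
    )
    def airbyte_job():
        sync_example()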
    o
    • 2
    • 2
  • s

    Saul Burgos

    08/03/2022, 10:17 PM
  • s

    Sterling Paramore

    08/03/2022, 10:59 PM
    So we’ve got these little “New Run” tabs in the Launchpad. I can add new ones, but they are all “New Run”. Can they be renamed? I don’t see any obvious way to do that.
    :dagster-bot-resolve: 1
    d
    d
    • 3
    • 4
  • v

    Vinnie

    08/04/2022, 9:36 AM
    Hi all, trying to deploy dagster behind a multi-purpose load balancer that forwards to different services depending on the path (e.g. example.com/dagster, example.com/app2) makes it inaccessible; dagit logs a
    Warning: Invalid HTTP request
    whenever I try to open the website. Am I missing some configuration, or is it just not supported? Deploying it under dagster.example.com works fine. And in case it's not supported, are you planning to support it in the future?
    a
    t
    • 3
    • 5
  • s

    Stefan Samba

    08/04/2022, 9:37 AM
    Hi all, I’m still new to Dagster and I have two questions about running things serially. Q1: From the docs I can see that for
    @op
    elements it’s easy to run serially (link). In my specific case I’m looking to run .ipynb files serially. For example:
    import dagstermill as dm
    from dagster import job
    
    download_data = dm.define_dagstermill_op(
        "download_data",
        notebook_path=("download_data.ipynb"),
        output_notebook_name="download_data_output",
    )
    
    prepare_data = dm.define_dagstermill_op(
        "prepare_data",
        notebook_path=("prepare_data.ipynb"),
        output_notebook_name="prepare_data_output",
    )
    
    
    @job(
        resource_defs={
            "output_notebook_io_manager": dm.local_output_notebook_io_manager,
        }
    )
    def dagster_main():
        download_data()
        prepare_data(download_data)
    This is a small example for illustration purposes. The last line will not work because prepare_data can’t take any arguments; it’s only meant to show that prepare_data depends on download_data. Would it be possible to make this serial in some way? Q2: For ops I can see it’s possible to get a dataflow going from one step to another. Would that dataflow be possible when working with .ipynb files? I can imagine this is a challenge, since an .ipynb file doesn’t return a value. Any ideas here?
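    One possible way to express the ordering, as an untested sketch: give the second notebook op a Nothing input so it only starts after the first finishes. This assumes your dagstermill version accepts the ins argument on define_dagstermill_op.
    import dagstermill as dm
    from dagster import In, Nothing, job
    
    download_data = dm.define_dagstermill_op(
        "download_data",
        notebook_path="download_data.ipynb",
        output_notebook_name="download_data_output",
    )
    
    prepare_data = dm.define_dagstermill_op(
        "prepare_data",
        notebook_path="prepare_data.ipynb",
        output_notebook_name="prepare_data_output",
        ins={"start": In(Nothing)},  # ordering-only dependency, no data is passed
    )
    
    @job(
        resource_defs={
            "output_notebook_io_manager": dm.local_output_notebook_io_manager,
        }
    )
    def dagster_main():
        # prepare_data waits for download_data to finish before it runs
        prepare_data(start=download_data())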
    o
    c
    • 3
    • 8
  • i

    Indra

    08/04/2022, 10:07 AM
    Hi, does dagster have a "warning" job status?
    :dagster-bot-resolve: 1
    o
    • 2
    • 4
  • b

    Ben

    08/04/2022, 12:55 PM
    Hi all,
  • b

    Ben

    08/04/2022, 12:57 PM
    I would like to set a job name with space characters, but there is a regex check in core/definitions/utils.py. Is there any technical reason to exclude whitespace? I just modified it and did not notice any problems. Thanks in advance for your reply!
    :dagster-bot-resolve: 1
    o
    • 2
    • 3
  • t

    Tom Reilly

    08/04/2022, 3:03 PM
    If I have multiple software defined assets configured as a job using
    define_asset_job()
    is there a way to stop execution mid job and have the job still marked as a successful run? For example, if the asset job looked like
    get_new_files --> process_new_files
    and the
    get_new_files
    asset doesn't find new files I'd like the job to finish as successful without initiating any downstream assets (in this case
    process_new_files
    ).
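    For plain ops, the documented pattern for this is an optional output that is only yielded when there is work to do, which skips the downstream step while the run still succeeds; whether the same applies cleanly to define_asset_job is the open question here. A hedged op-level sketch:
    from dagster import Out, Output, job, op
    
    @op(out=Out(is_required=False))
    def get_new_files():
        new_files = []  # replace with your file-listing logic
        if new_files:
            yield Output(new_files)  # downstream only runs when this is yielded
    
    @op
    def process_new_files(files):
        ...
    
    @job
    def file_job():
        process_new_files(get_new_files())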
    :dagster-bot-resolve-to-issue: 1
    o
    • 2
    • 5
  • m

    Martin O'Leary

    08/04/2022, 6:09 PM
    Hey again folks 👋 I have 2 hopefully easy-to-answer questions. Can someone point me in the direction of documentation on: • how I can add metadata to track on dbt assets (e.g. associate a table location with the asset instead of a path) - I would like to record some high-level stats on the table and see them with the asset materialization • how I can have the dbt JSON logging either a) go back to text format or b) pretty-print in dagit? I think this change happened when I updated yesterday to 0.15.8, but I'm not certain 😕
    :dagster-bot-resolve: 1
    o
    • 2
    • 3
  • c

    Chris Hansen

    08/04/2022, 7:12 PM
    does anyone have a simple example of a job that runs a bigquery query? i imagine
    bq_op_for_queries
    is what i want to use, but the docs lost me at
    Expects a BQ client to be provisioned in resources as context.resources.bigquery.
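    A hedged sketch of what that provisioning might look like with dagster-gcp's bigquery_resource (the query is illustrative, and credentials/project are assumed to come from the environment):
    from dagster import job
    from dagster_gcp import bigquery_resource, bq_op_for_queries
    
    # build an op that runs the listed queries against BigQuery
    run_query = bq_op_for_queries(["SELECT 1 AS one"])
    
    @job(resource_defs={"bigquery": bigquery_resource})
    def bq_job():
        run_query()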
    :dagster-bot-resolve: 1
    o
    • 2
    • 59
  • m

    Maksym Domariev

    08/04/2022, 10:45 PM
    I think I'm facing a bug. The current sample returns this error:
    context.py:555: UserWarning: Error loading repository location hello_flow:dagster._core.errors.DagsterInvalidDefinitionError: @graph 'test_graph' returned problematic value of type <class 'dagster._core.definitions.composition.DynamicFanIn'>. Expected return value from invoked solid or dict mapping output name to return values from invoked solid
    If I remove the return from the graph, I get this:
    dagster._check.CheckError: Invariant failed. Description: All leaf nodes within graph 'test_graph' must generate outputs which are mapped to outputs of the graph, and produce assets. The following leaf node(s) are non-asset producing ops: {'load_something'}. This behavior is not currently supported because these ops are not required for the creation of the associated asset(s).
    for obvious reasons. Not sure how to fix it. I'm calling collect as the manual describes,
    Untitled.txt
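    For reference, the documented map/collect shape passes the collected list into another op rather than returning the DynamicFanIn from the graph itself; a hedged sketch (op names are illustrative):
    from dagster import DynamicOut, DynamicOutput, graph, op
    
    @op(out=DynamicOut())
    def produce_items():
        for i in range(3):
            yield DynamicOutput(i, mapping_key=str(i))
    
    @op
    def process(item):
        return item * 2
    
    @op
    def merge(items):
        # the collected results are consumed by an op before leaving the graph
        return sum(items)
    
    @graph
    def test_graph():
        return merge(produce_items().map(process).collect())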
    o
    • 2
    • 11
  • s

    Sean Lindo

    08/04/2022, 11:21 PM
    I've had some success using ops and jobs to fetch a file from S3, run some transformations, and then save it to a database. I'm attempting to upgrade this example by creating a job that uses the concepts described in Software-Defined Assets, and I’m having trouble getting something to materialize successfully. I'm trying to define a source asset that should load data from an IO Manager that I define. Following that, I'd like to run a number of transformation steps and also some steps that don't produce an asset (such as posting a message to slack). In the attached gist, I get a GraphQL error when materializing “raw_users”. So I have a few questions.. • How do I pass a context to this custom IO Manager so the source asset materializes successfully? • How can I mix in non-asset-producing “steps” into this job? Referenced gist: https://gist.github.com/seanlindo/a096af16dc34df4f4e6f31be3c2c5bae
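    A possible shape for the first question, as an untested sketch: give the SourceAsset an io_manager_key pointing at your custom IO manager, and let downstream assets receive it as an input. The raw_users name and the S3 details are placeholders mirroring the gist.
    from dagster import IOManager, SourceAsset, asset, io_manager, with_resources
    
    class S3CsvIOManager(IOManager):
        def handle_output(self, context, obj):
            # write obj to S3, keyed by context.asset_key
            ...
    
        def load_input(self, context):
            # fetch the object for context.asset_key from S3 and return it
            ...
    
    @io_manager
    def s3_csv_io_manager(_):
        return S3CsvIOManager()
    
    raw_users = SourceAsset(key="raw_users", io_manager_key="s3_csv_io_manager")
    
    @asset
    def cleaned_users(raw_users):
        return raw_users  # transformation steps go here
    
    assets = with_resources(
        [raw_users, cleaned_users],
        resource_defs={"s3_csv_io_manager": s3_csv_io_manager},
    )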
    o
    • 2
    • 27
  • v

    VxD

    08/05/2022, 1:00 AM
    Hi Dagster team! We have a run status sensor set up to perform operations on graph success, running every 10s with
    minimum_interval_seconds=10
    . We noticed that each time the sensor runs, it only processes one succeeded graph, even if 50 have succeeded over the past 10s. This is heavily problematic because the sensor requires 10 minutes to process 60 completed jobs, which doesn't scale when we need to handle hundreds. Is there a way we can get the sensor to process more than one pipeline in one go?
    r
    p
    +2
    • 5
    • 41
  • m

    Maksym Domariev

    08/05/2022, 1:40 AM
    hey, did you delete the hackernews example from github? curious what the replacement is
    r
    y
    • 3
    • 2
  • b

    Bolin Zhu

    08/05/2022, 2:51 AM
    Hello! I would like to ask: is there a way to access op inputs in a failure hook? Please refer to this unanswered discussion (https://github.com/dagster-io/dagster/discussions/6275) for more details 🙏
    o
    • 2
    • 2
  • g

    geoHeil

    08/05/2022, 6:50 AM
    In https://docs.dagster.io/guides/dagster/transitioning-data-pipelines-from-development-to-production you mention resources in a dictionary keyed by deployment (local, staging, prod). Is it better to structure the job that way, or to have one repository per environment?
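    For context, the guide's pattern keeps a single repository and switches the resource dict by deployment name; a hedged sketch (the env var name, bucket, and asset are illustrative):
    import os
    from dagster import asset, fs_io_manager, repository, with_resources
    from dagster_aws.s3 import s3_pickle_io_manager, s3_resource
    
    @asset
    def my_asset():
        return 1
    
    resource_defs_by_env = {
        "local": {"io_manager": fs_io_manager},
        "prod": {
            "io_manager": s3_pickle_io_manager.configured({"s3_bucket": "my-bucket"}),
            "s3": s3_resource,
        },
    }
    
    deployment = os.getenv("DAGSTER_DEPLOYMENT", "local")
    
    @repository
    def repo():
        # one repo, resources chosen per deployment
        return with_resources([my_asset], resource_defs_by_env[deployment])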
    o
    • 2
    • 28
  • t

    Taqi

    08/05/2022, 6:53 AM
    Hi everyone, I am new to Dagster. I want to set up a Slack notification on failed jobs. I see a few documents but am not able to figure out where to add that piece of code in the Dagster dashboard; can someone please help?
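    For what it's worth, the code in those docs lives in your repository definition (the Python code Dagster loads), not in dagit itself; a hedged sketch with dagster-slack (the channel name and token env var are placeholders):
    import os
    from dagster import repository
    from dagster_slack import make_slack_on_run_failure_sensor
    
    slack_on_run_failure = make_slack_on_run_failure_sensor(
        channel="#dagster-alerts",
        slack_token=os.environ["SLACK_BOT_TOKEN"],
    )
    
    @repository
    def my_repo():
        # include the sensor alongside your jobs and schedules
        return [slack_on_run_failure]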
    o
    • 2
    • 4
  • k

    Kautsar Aqsa

    08/05/2022, 8:36 AM
    Hello everyone, I tried to run a Logstash job with dagster using
    shell_op()
    , but unfortunately I got this error
    dagster._core.errors.DagsterInvalidInvocationError: Compute function of op 'shell_op' has context argument, but no context was provided when invoking.
    and I still don't understand what dagster means by context here. Here is my .py code:
    @op
    def Load_To_Elastic(context, table):
        """Loading jobs from warehouse to elasticsearch"""
        bash_command = "/Users/kautsaraqsa/ELK/logstash-7.15.2/bin/logstash -f /Users/kautsaraqsa/code/AIHSP/badung-case.conf"
        context.log.info("Indexing table:" + table)
        shell_op(context, bash_command)
    @graph
    def Pipeline():
        """Load table to warehouse then transfer it to Elasticsearch with Logstash"""
        table = Ingest_From_Synchro()
        Load_To_Elastic(table)
    Hope anyone can help me with this, thank you!
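    One hedged way around the error: rather than invoking shell_op inside another op's body, build the command string in an op and wire it to shell_op at the graph level (the paths mirror the question and Ingest_From_Synchro is the existing op from the snippet above; shell_op's own run config may still need to be supplied at launch):
    from dagster import graph, op
    from dagster_shell import shell_op
    
    @op
    def build_logstash_command(context, table):
        context.log.info("Indexing table:" + table)
        return (
            "/Users/kautsaraqsa/ELK/logstash-7.15.2/bin/logstash "
            "-f /Users/kautsaraqsa/code/AIHSP/badung-case.conf"
        )
    
    @graph
    def pipeline():
        # shell_op receives the command string as its input instead of being
        # called from inside another op
        shell_op(build_logstash_command(Ingest_From_Synchro()))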
    z
    • 2
    • 3
  • x

    Xavier BALESI

    08/05/2022, 10:36 AM
    Hi everyone, I have unexpected behaviour with assets and io_manager. An example of the code:
    @asset(name="upstream")
    def upstream(_: OpExecutionContext):
        return "upstream"
    
    @asset(name="downstream", ins={"upstream": AssetIn("upstream")})
    def downstream(_: OpExecutionContext, upstream):
        return upstream + "downstream"
    
    assets_with_resources = with_resources([upstream, downstream], resource_defs={...})
    
    materialize_all = define_asset_job(name="materialize_all",config=config_from_files([...]))
    
    @repository
    def repo():
        return [assets_with_resources, materialize_all]
    
    # Note: in my custom io manager I load_input and handle_output a file named respectively from  context.upstream_output.name and context.op_def.name
    in dagit, from the job view, when I launch upstream alone and then downstream alone, everything is right:
    Loaded input "upstream" using input manager "my_io_manager"
    but when I launch the whole materialize_all job, it fails after:
    Loaded input "upstream" using input manager "my_io_manager", from output "result" of step "upstream"
    In the working case the output of upstream is named 'upstream', but in the breaking case it's named 'result'. I expect the same behaviour whether I launch the 2 assets separately or via materialize_all.
    o
    • 2
    • 2
o

owen

08/05/2022, 5:21 PM
hi @Xavier BALESI! We should definitely make this clearer, but the path that you load/store assets to/from should be purely a function of the asset key (and whatever configuration on the IOManager). Using
context.upstream_output.name
and
context.op_def.name
can cause issues like what you're seeing, as these are not guaranteed to stay the same between different executions
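A minimal sketch of what owen describes, keying the storage path purely off the asset key (the /tmp location and pickle format are illustrative):
import os
import pickle

from dagster import IOManager, io_manager

class AssetKeyFileIOManager(IOManager):
    def _path(self, asset_key):
        # the path is a pure function of the asset key, stable across runs
        return os.path.join("/tmp/assets", *asset_key.path)

    def handle_output(self, context, obj):
        path = self._path(context.asset_key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context):
        # if asset_key is unavailable on the input context in your version,
        # context.upstream_output.asset_key is equivalent here
        with open(self._path(context.asset_key), "rb") as f:
            return pickle.load(f)

@io_manager
def my_io_manager(_):
    return AssetKeyFileIOManager()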
x

Xavier BALESI

08/05/2022, 8:02 PM
Hi @owen! Thank you for the explanation 👍