Powered by Linen
dagster-support
  • s

    Sterling Paramore

    07/29/2022, 10:37 PM
    In the Dagster docs page there are instructions on how to create op factories. After I’ve used the factory to create some ops, how do I create a graph from those factory-defined ops?
  • o

    Oren Lederman

    07/29/2022, 11:12 PM
    Somewhat new to Dagster, new to software defined assets. I’m trying to define a simple asset with a parameter and add it to a repository:
    from dagster import asset, repository


    @asset(config_schema={"some_param": str})
    def asset_with_config(context):
        some_param = context.op_config["some_param"]
        return some_param
    
    
    @repository
    def dagster_dags():
        return [
            asset_with_config,
        ]
    Dagit starts just fine. As expected, I’m getting an error when I try to materialize the asset:
    __ASSET_JOB cannot be executed with the provided config. Please fix the following errors:
    
    Missing required config entry "ops" at the root. Sample config for missing entry: {'ops': {'asset_with_config': {'config': {'some_param': '...'}}}}
    What’s the right way to provide configuration for the asset?
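The sample config in the error message already shows the shape the asset job expects. As a minimal sketch that assumes only that shape, the dictionary can be built in plain Python and then pasted into the Dagit launchpad or serialized for a config file. The helper name `asset_run_config` and the value `"hello"` are hypothetical and not part of Dagster.

```python
import json


def asset_run_config(asset_name, config):
    """Build run config in the shape shown by the error message:
    {'ops': {<asset name>: {'config': {...}}}}.

    Hypothetical helper for illustration; it is not a Dagster API.
    """
    return {"ops": {asset_name: {"config": dict(config)}}}


# Config for the asset from the message above, with an example value.
run_config = asset_run_config("asset_with_config", {"some_param": "hello"})
print(json.dumps(run_config, indent=2))
```

The asset name used as the key must match the name in the error message, here `asset_with_config`.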
  • a

    Antonio Bertino

    07/30/2022, 2:21 PM
    Hello guys. Is there a way to retrieve a job's output through a sensor? I want to retrieve some op outputs and use them in a job that will be triggered by the sensor.
  • g

    geoHeil

    07/30/2022, 7:10 PM
    Is there anywhere an example of how to use https://github.com/dagster-io/dagster/tree/6c97c7585f9f36cce7e557a1eb04d51fac9cc0eb/python_modules/libraries/dagster-datahub?
  • k

    Kushagra Rajput

    07/31/2022, 11:01 AM
    hey guys, I have a question about how dagster parses its cron schedules
  • k

    Kushagra Rajput

    07/31/2022, 11:02 AM
    I need a schedule that will never be executed. Spring Boot lets you set the cron schedule to '-' to disable it, and another trick is to use the 31st of February as the date. But dagster's cron interpreter doesn't accept either of these two workarounds.
  • k

    Kushagra Rajput

    07/31/2022, 11:03 AM
    is there a way to set a dagster cron schedule such that it never gets executed?
  • h

    Huib Keemink

    07/31/2022, 11:09 AM
    Is there a way to have dagster ignore an asset-based-job without any assets, instead of raising an error?
  • m

    Mark

    07/31/2022, 11:37 AM
    Hi, I hope you can point me in the right direction: dagit no longer starts when multiple workspaces are configured in `workspace.yaml`. It just hangs immediately and nothing happens. I am using multiple `python_file` workspaces, which worked very well for me before. Now dagit does not start if there is more than one workspace. I can start dagit with a single workspace and all others commented out. There are 4 workspaces and dagit starts with each of them separately, so no individual workspace seems to be the problem, but dagit will only start if there is exactly one workspace. Also, if I start dagit with just one workspace in the file, then uncomment the others and reload the workspaces via the button in the already running dagit instance, that also works fine. I already tried setting the log level to trace, but dagit does not log anything when it hangs on startup with multiple workspaces; it just gets stuck immediately. I am running dagit inside a docker container with version 0.15.8 but also tried 0.15.7 and 0.15.6. I hope you have some idea of what I can try to figure out what the problem is, because I am not aware of any changes I made that might have led to this.
  • e

    Edo

    07/31/2022, 3:30 PM
    Hi, I'm a data analyst and a newbie in ETL and orchestration, or data engineering in general. I need some help here. Currently I'm using Airflow deployed on Google Composer 2 to EL data from Aurora MySQL to BigQuery on a daily basis, ~200K rows. It is quite costly (around $350 a month). My question is: can I use Dagster to perform that job? Will it be cheaper if I deploy Dagster on GCP? And which GCE instance should I use? Thank you so much, I appreciate the help. 🙏
  • m

    Matt Fysh

    07/31/2022, 10:59 PM
    Can dagster be used for streaming pipelines? I have a file-listing sensor, that yields RunRequests for each file found. The job converts the file into 0 or more rows in an asset. For each subsequent file added to the filesystem, the new rows wipe out the existing rows in the assets instead of being appended. How do I get it to append the new rows, for each new file detected?
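The overwrite described above is what a "write the whole output each run" storage step produces. A minimal sketch of append semantics, independent of Dagster and assuming rows are stored in a CSV file, is below. The function names, the file layout, and the sample rows are all hypothetical. Whether this belongs in an IO manager or inside the op itself is a design choice the message does not settle.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode "a" adds to the end of the file instead of truncating it.
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read every stored row back as a list of dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Two simulated runs, one per detected file; the second must not wipe the first.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "rows.csv")
    append_rows(target, [{"file": "a.txt", "value": "1"}], ["file", "value"])
    append_rows(target, [{"file": "b.txt", "value": "2"}], ["file", "value"])
    stored = read_rows(target)

print(len(stored))
```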
  • m

    Mohammad Nazeeruddin

    08/01/2022, 5:34 AM
    Hi Team! We are using dagster v0.12.14 with the user-code-deployment helm chart. We triggered a pipeline and it's not able to complete the run because of the error below. What could be the cause, and how can we handle it? Appreciate the help. Thanks.
    dagster.core.errors.DagsterLaunchFailedError: Tried to start a run on a server after telling it to shut down
    
      File "/usr/local/lib/python3.7/site-packages/dagster/core/instance/__init__.py", line 1386, in submit_run
        SubmitRunContext(run, workspace=workspace)
      File "/usr/local/lib/python3.7/site-packages/dagster/core/run_coordinator/default_run_coordinator.py", line 32, in submit_run
        self._instance.launch_run(pipeline_run.run_id, context.workspace)
      File "/usr/local/lib/python3.7/site-packages/dagster/core/instance/__init__.py", line 1450, in launch_run
        self._run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
      File "/usr/local/lib/python3.7/site-packages/dagster/core/launcher/default_run_launcher.py", line 105, in launch_run
        res.message, serializable_error_info=res.serializable_error_info
  • a

    Ashish Sharma

    08/01/2022, 7:05 AM
    Hello Team, we have been facing an `Exception: timeout expired` error for the past month, but when we rerun the job it completes successfully. I have been trying to find the root cause, but there is no information available online. Can you please check this log and let me know why this issue happens and how to fix it?
    dagster.core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "complete_dq_request":

      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/execute_plan.py", line 230, in dagster_event_sequence_for_step
        for step_event in check.generator(step_events):
      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/execute_step.py", line 353, in core_dagster_event_sequence_for_step
        for user_event in check.generator(
      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/execute_step.py", line 69, in _step_output_error_checked_user_event_sequence
        for user_event in user_event_sequence:
      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/compute.py", line 174, in execute_core_compute
        for step_output in _yield_compute_results(step_context, inputs, compute_fn):
      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/compute.py", line 142, in _yield_compute_results
        for event in iterate_with_context(
      File "/usr/local/lib/python3.8/dist-packages/dagster/utils/__init__.py", line 407, in iterate_with_context
        return
      File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/utils.py", line 73, in solid_execution_error_boundary
        raise error_cls(

    The above exception was caused by the following exception:
    Exception: timeout expired

      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
        yield
      File "/usr/local/lib/python3.8/dist-packages/dagster/utils/__init__.py", line 405, in iterate_with_context
        next_output = next(iterator)
      File "/usr/local/lib/python3.8/dist-packages/dagster/core/execution/plan/compute_generator.py", line 65, in _coerce_solid_compute_fn_to_iterator
        result = fn(context, **kwargs) if context_arg_provided else fn(**kwargs)
      File "/opt/dagster/home/orchestration_manager/ops/data_quality_ops/op_complete_dq_request.py", line 741, in complete_dq_request
        raise Exception(l_error)
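The traceback ends in user code (`raise Exception(l_error)`), and a rerun succeeds, so the failure looks transient. One generic mitigation, which nothing in this thread confirms as the actual fix, is to retry the timing-out call with a growing delay before giving up. A minimal sketch in plain Python follows. `call_with_retries` and `flaky_request` are hypothetical names, and the real call inside `complete_dq_request` is not shown in the log.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt and try again.

    Re-raises the last error once all attempts are used up.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as error:  # narrow this to the real timeout error if known
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Demo: a stand-in call that times out twice and then succeeds.
calls = {"count": 0}
delays = []


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise Exception("timeout expired")
    return "ok"


# Recording the delays instead of sleeping keeps the demo instant.
result = call_with_retries(flaky_request, sleep=delays.append)
print(result, calls["count"], delays)
```

Dagster's own `RetryPolicy` may be a better fit than a hand-rolled loop if the installed version supports it, since it retries the whole op and shows each attempt in the run log.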
  • s

    Sanidhya Singh

    08/01/2022, 7:21 AM
    Hi All! noticed a gap in the dagster-slack documentation examples and opened up an issue! https://github.com/dagster-io/dagster/issues/9124 I have created a branch to submit a PR but I don’t have access to publish it. Can anyone here assist? Happy to contribute the PR!
  • k

    Katrin Grunert

    08/01/2022, 9:05 AM
    Hey all! I am running a job with the `k8s_job_executor`, and each step of the job is spawned as its own pod. I am running dagster version 0.14.20. In my job I set the `job_spec_config` to `{'ttl_seconds_after_finished': 120}`, and the docs say that jobs and their associated pods get deleted after the TTL has expired. After the TTL is exceeded, the k8s job gets deleted, but my `dagster-step-<SOME_ID>` pods are not. Is there some issue with the `k8s_job_executor` such that these pods are not associated with the job, and is there a way I can achieve this?
  • a

    Alessandro Facchin

    08/01/2022, 9:38 AM
    Hi all! I have run into an unexpected error in Dagster 0.15.8, could you please help me with it?
    Operation name: SidebarAssetQuery
    
    Message: 'compute_raw_features_2'
    
    Path: ["assetNodeOrError","configField"]
    
    Locations: [{"line":51,"column":3}]
    It happens when I create two assets from the same graph `compute_raw_features` using the function `AssetsDefinition.from_graph`.
  • k

    Katrin Grunert

    08/01/2022, 9:39 AM
    Hey! I am running a job with the `k8s_job_executor`, and each step of the job is spawned as its own pod. Each pod is named `dagster-step-<SOME_ID>`, and I was wondering if it is possible to replace this generated name with a custom name.
  • s

    Sanidhya Singh

    08/01/2022, 10:41 AM
    Hi All! There seems to be a discrepancy in the behaviour of environment variables in `dagster.yaml`. The following code
    retention:
      schedule:
        purge_after_days: 
          env: DAGSTER_SCHEDULE_PURGE # sets retention policy for schedule ticks of all types
      sensor:
        purge_after_days:
          skipped: 
            env: DAGSTER_SENSOR_SKIPPED_PURGE
          failure: 
            env: DAGSTER_SENSOR_FAILURE_PURGE
          success:
            env: DAGSTER_SENSOR_SUCCESS_PURGE
    throws
    raise DagsterInvalidConfigError(
    dagster.core.errors.DagsterInvalidConfigError: Errors whilst loading dagster instance config at dagster.yaml.
        Error 1: Invalid scalar at path root:retention:sensor:purge_after_days:failure. Value "{'env': 'DAGSTER_SENSOR_FAILURE_PURGE'}" of type "<class 'dict'>" is not valid for expected type "Int".
        Error 2: Invalid scalar at path root:retention:sensor:purge_after_days:skipped. Value "{'env': 'DAGSTER_SENSOR_SKIPPED_PURGE'}" of type "<class 'dict'>" is not valid for expected type "Int".
        Error 3: Invalid scalar at path root:retention:sensor:purge_after_days:success. Value "{'env': 'DAGSTER_SENSOR_SUCCESS_PURGE'}" of type "<class 'dict'>" is not valid for expected type "Int".
    I’m on Dagster 0.15.8
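The errors show that the three sensor `purge_after_days` fields expect a plain `Int` and reject the `env:` mapping. One possible workaround, not confirmed in this thread, is to render `dagster.yaml` from a template before Dagster starts, so those fields receive literal integers. The sketch below uses only the standard library. The environment variable names come from the message above, while `render_dagster_yaml`, the template constant, and the example values are hypothetical.

```python
from string import Template

# Template mirroring the retention block above, with ${VAR} placeholders.
DAGSTER_YAML_TEMPLATE = """\
retention:
  schedule:
    purge_after_days: ${DAGSTER_SCHEDULE_PURGE}
  sensor:
    purge_after_days:
      skipped: ${DAGSTER_SENSOR_SKIPPED_PURGE}
      failure: ${DAGSTER_SENSOR_FAILURE_PURGE}
      success: ${DAGSTER_SENSOR_SUCCESS_PURGE}
"""

REQUIRED_VARS = (
    "DAGSTER_SCHEDULE_PURGE",
    "DAGSTER_SENSOR_SKIPPED_PURGE",
    "DAGSTER_SENSOR_FAILURE_PURGE",
    "DAGSTER_SENSOR_SUCCESS_PURGE",
)


def render_dagster_yaml(template, env):
    """Substitute integer values from env into the template.

    int() fails fast on a non-numeric value; a missing variable raises KeyError.
    """
    values = {name: int(env[name]) for name in REQUIRED_VARS}
    return Template(template).substitute(values)


# Example values only; in practice pass os.environ instead of this dict.
example_env = {
    "DAGSTER_SCHEDULE_PURGE": "30",
    "DAGSTER_SENSOR_SKIPPED_PURGE": "7",
    "DAGSTER_SENSOR_FAILURE_PURGE": "14",
    "DAGSTER_SENSOR_SUCCESS_PURGE": "3",
}
rendered = render_dagster_yaml(DAGSTER_YAML_TEMPLATE, example_env)
print(rendered)
```

Writing `rendered` to `dagster.yaml` would happen in a container entrypoint or similar startup step, which is left out here.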
  • j

    Jakub Zgrzebnicki

    08/01/2022, 12:24 PM
    Hi, I'm creating an asset from a graph and I would like to see the asset's return type in dagit, but graphs only have the GraphOut() output type and it's always Any. Is there a way to set it somehow using a DagsterType?
  • r

    Remco Loof

    08/01/2022, 1:26 PM
    Anyone got an answer to this question?
  • l

    Lucas Gabriel

    08/01/2022, 2:42 PM
    I have Dagster deployed on an AWS EC2 instance, using Docker. Is there a way to tell Dagster to automatically prune the container after the job has executed?
  • t

    Timo

    08/01/2022, 3:18 PM
    Hi, how do I set the `io_manager_key` argument for a graph-backed asset created with `AssetsDefinition.from_graph(...)`?
  • j

    Jack Yin

    08/01/2022, 4:30 PM
    hey all, I’m running into a strange situation where `dagster-pandera` seems to require `dagster-0.15.8`, which then conflicts with other stuff that requires `dagster-0.15.7`, and then everything breaks
  • j

    Jack Yin

    08/01/2022, 4:31 PM
    or perhaps I think the conflict is that `dagster-graphql` requires `dagster-0.15.7` while `dagster-pandera` requires `dagster-0.15.8`
  • j

    Jack Yin

    08/01/2022, 4:33 PM
    concretely, i’m trying to run the bollinger example
  • m

    Matt Fysh

    08/01/2022, 5:49 PM
    can anyone here confirm that dagster is not a suitable platform for streaming and real-time use cases?
  • m

    Marek

    08/01/2022, 6:38 PM
    Hi All! This is my first message in here. As I have been looking at Dagster, I'd be interested in getting some feedback with regard to tools like AWX/Salt/StackStorm. I'm interested in replacing some of the workflows related to infrastructure management (i.e. network device connectivity and operations). How good would Dagster be at these kinds of tasks? 🙂
  • y

    Yang

    08/01/2022, 8:51 PM
    Hello, I have a local agent that launches docker containers. I'd like to set a secret as an env var in container_context in the locations file. How can I do that? I know locations files can't use env vars (${var}), but is there another way to set an env var so that its contents aren't in the locations file? Thanks
  • o

    Oliver

    08/01/2022, 11:29 PM
    Hey all, I have an asset defined like this.
    @asset(
        config_schema={
            "limit": int,
            "cohort_table": str,
            "database": str
        },
        required_resource_keys={'aws'},
        # ins={
        #     "cohort": AssetIn(
        #         input_manager_key="pandas_df_manager",
        #         key=AssetKey(('qld_health', 'qld_health_simple_cohort'))
        #     )   
        # },
        non_argument_deps={
            AssetKey(('qld_health', 'qld_health_simple_cohort'))
        }
    
    )
    def cohort(context,
        # cohort
    ):
        pass
    When setting `non_argument_deps` to the source asset, everything works as expected and the UI shows the correct lineage. However, when trying to supply an argument asset using `ins` and the same AssetKey, I get
    Input asset '["qld_health", "qld_health_simple_cohort"]' for asset '["cohort"]' is not produced by any of the provided asset ops and is not one of the provided sources
    Any ideas what I'm missing here?
  • f

    fahad

    08/01/2022, 11:46 PM
    I think I may be running into an issue with the `PythonObjectDagsterType` and its associated `loader` argument. The `loader` was created using the `@dagster_type_loader` decorator and requests a resource via `required_resource_keys={"file_downloader"}`. So put together I have something like this:
    @dagster_type_loader(..., required_resource_keys={"file_downloader"})
    def type_loader(context, config) -> MyType:
      s3_downloader: S3Downloader = context.resources.file_downloader
      return MyType(path=s3_downloader.download(...))
    
    DagitConfigurableMyType = PythonObjectDagsterType(
        python_type=MyType, loader=type_loader
    )
    Now I can use this as an input to a graph just fine, and configure it via dagit as an input to a graph. The `file_downloader` resource is resolved and brought in as desired:
    @graph(
        ...
        ins={"path": In(DagitConfigurableMyType)},
    )
    def intake_graph(in_path: MyType):
        ...
    However doing this almost works:
    @graph(
        ...
        ins={"path": In(list[DagitConfigurableMyType])},
    )
    def intake_graphs(in_paths: list[MyType]):
        ...
    Except that it complains about not being able to resolve the `file_downloader` resource. I’m assuming this is somehow related to how resources are resolved. If I manually add the resource to the first op in the graph it will actually resolve the resource for me. However, since the `dagster_type_loader` does get respected even when nested within a `list`, it seems like the resource should get resolved as well
o

owen

08/02/2022, 4:16 PM
hi @fahad! thanks for the report, looks like you're right. I filed an issue for this here: https://github.com/dagster-io/dagster/issues/9161, and in the meantime your solution of explicitly adding the resource to the op as well is probably the best workaround
f

fahad

08/02/2022, 4:17 PM
Ah thanks for creating the issue!