dagster #dagster-cloud

Stephen Bailey

04/08/2022, 8:20 PM

am i right in understanding that the `alert_policies` allow us to use tags to configure which Slack channels a job alert gets sent to? https://docs.dagster.cloud/guides/alerts#slack-alert-policies

Charlie Bini

04/08/2022, 10:29 PM

I'm getting `dagster.core.errors.DagsterInvariantViolationError: No jobs, pipelines, graphs, asset collections, or repositories found` for my Docker code location. I have a `repository.py` file in the package folder with a `@repository`-decorated function. Am I forgetting to do something else?

Stephen Bailey

04/11/2022, 2:28 PM

i'm interested in trying out this local development flow described here: https://docs.dagster.cloud/guides/developing#developing-dagster-code-with-dagster-cloud
1. it looks like there's a typo in the "developing without rebuilding an image" section: it should say `DockerUserCodeLauncher` (capital C), not `DockerUsercodeLauncher`.
2. i'm trying to understand the advantage of running things locally here. i am trying to solve the problem of being able to simulate sensors, schedules, etc., but it looks like this workflow only simulates code execution. is that right?

Stephen Bailey

04/11/2022, 2:29 PM

we are having a discussion of how we can write a `docker-compose` file to stand up a full local version of the dagster daemon + dagit, and i would love to not have to do that, but it doesn't look like this quite solves that problem.

Charlie Bini

04/12/2022, 10:05 PM

does `dagster-cloud workspace sync -w xyz.yaml` support variable substitution in the YAML file?
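Whether `workspace sync` expands variables itself is the open question here. A workaround that does not depend on the answer is to render the file before syncing. The sketch below uses Python's `string.Template`, which understands `$VAR` and `${VAR}` placeholders; the template layout and the variable names `IMAGE_REPO`/`IMAGE_TAG` are hypothetical, chosen only for illustration.

```python
import os
import string
from typing import Mapping, Optional


def render_workspace_yaml(template_text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace $VAR / ${VAR} placeholders with values from `env` (default: os.environ).

    Template.substitute raises KeyError for any placeholder without a value,
    so a missing variable fails loudly instead of syncing an empty string.
    A literal dollar sign must be written as $$ in the template.
    """
    mapping = os.environ if env is None else env
    return string.Template(template_text).substitute(mapping)


# Hypothetical locations file; keys and variable names are illustrative only.
TEMPLATE = """\
locations:
  - location_name: my_location
    image: ${IMAGE_REPO}:${IMAGE_TAG}
"""
```

The rendered text would then be written to a file and passed to `dagster-cloud workspace sync -w <rendered file>`.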

Stephen Bailey

04/13/2022, 1:54 PM

if i make an alert policy with an empty tags section, will it send notifications for every run? a la

alert_policies:
  - name: "slack-alert-policy"
    description: "An alert policy to send a Slack notification to sales on job failure or success."
    tags: []
    event_types:
      - "JOB_SUCCESS"
      - "JOB_FAILURE"
    notification_service:
      slack:
        slack_workspace_name: "hooli"
        slack_channel_name: "sales-notifications"
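The answer depends on how Dagster Cloud matches a policy's tags against a run's tags, which this thread does not settle. Under one plausible rule, assumed here and not confirmed by the docs linked above, a policy applies when every tag it lists is present on the run with the same value; an empty list is then vacuously satisfied by every run. The `{"key": ..., "value": ...}` shape for policy tags is also an assumption.

```python
from typing import Mapping, Sequence


def policy_matches(policy_tags: Sequence[Mapping[str, str]], run_tags: Mapping[str, str]) -> bool:
    """Hypothetical matching rule: every policy tag must appear on the run.

    all() over an empty sequence is True, so `tags: []` matches every run
    under this rule.
    """
    return all(run_tags.get(tag["key"]) == tag["value"] for tag in policy_tags)
```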

Nicholas

04/15/2022, 5:16 PM

Hi, I’m new to Dagster and am trying to set up a production deployment for my company. I’m a bit confused as to the difference between Dagster Cloud and regular Dagster. What’s the difference between setting up an ECS Agent (https://docs.dagster.cloud/agents/ecs/setup) as opposed to deploying Dagster to AWS (https://docs.dagster.io/deployment/guides/aws). Is there a recommendation for one over the other? Thanks in advance!

Charlie Bini

04/18/2022, 7:53 PM

just noticed: `container_context` seems to disappear after you add it to the code location YAML. I can confirm it's working, but if I go to modify the YAML, it's no longer there.

Charlie Bini

04/18/2022, 8:22 PM

for the dagster-cloud-cicd-action, what's the CLI equivalent for what it does with the locations.yaml? Is it a simple add/update of only the locations specified or a full sync that will delete anything not listed?

Prratek Ramchandani

04/19/2022, 1:40 AM

just wanted to call out a couple of tiny details when picking a partition for a job in Dagit:
1. i'd love it if they were ordered with the most recent partition first, so i don't have to scroll right to the bottom for the most common use case
2. when i scroll all the way down, the most recent partition still isn't really in view

Evan Arnold

04/22/2022, 6:44 PM

For Slack alerts, what are the available options for `event_types`?

Charlie Bini

04/27/2022, 3:23 PM

helm noob question: if I want to adjust the agent resources, that goes under `dagsterCloudAgent.resources` on the Helm chart, right? How exactly do I edit that and persist that setting across updates?

Charlie Bini

04/28/2022, 7:23 PM

getting this when I try to sync an alert policy:

Error: Invariant failed. Description: Value at path root:alert_policies[0]:event_types[0] not in enum type AlertPolicyEventType got RUN_FAILURE, Value at path root:alert_policies[0]:event_types[1] not in enum type AlertPolicyEventType got STEP_FAILURE, Value at path root:alert_policies[0]:event_types[2] not in enum type AlertPolicyEventType got ALERT_FAILURE

looking HERE, it seems `JOB_FAILURE` and `JOB_SUCCESS` are the only valid event types, but I can't find any docs or code explaining what those are. Would it be possible to open this up to other event types (e.g. `STEP_FAILURE`), or does that have to be done at the job level?
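Until more event types are accepted, a local pre-check can catch the error above before `sync` runs, for example in CI. This sketch hard-codes the two values the error reported as valid at the time (`JOB_SUCCESS`, `JOB_FAILURE`); the real `AlertPolicyEventType` enum may differ. The function name and the policy dict shape, which mirrors the alert policy YAML earlier in this channel, are hypothetical.

```python
from typing import Any, List, Mapping, Sequence

# Values accepted at the time of this thread, per the sync error above.
ALLOWED_EVENT_TYPES = frozenset({"JOB_SUCCESS", "JOB_FAILURE"})


def invalid_event_types(alert_policies: Sequence[Mapping[str, Any]]) -> List[str]:
    """Return one message per event type that is not in ALLOWED_EVENT_TYPES."""
    problems = []
    for i, policy in enumerate(alert_policies):
        for j, event_type in enumerate(policy.get("event_types", [])):
            if event_type not in ALLOWED_EVENT_TYPES:
                problems.append(f"alert_policies[{i}].event_types[{j}]: {event_type}")
    return problems
```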

Charlie Bini

04/29/2022, 7:05 PM

@daniel clarifying question about the new `resources` key: if I define it in the locations YAML, will that affect the code location pod or only the job pods that it launches?

Evan Arnold

05/05/2022, 2:41 PM

Do y'all have any recommendations for a good default way of handling IO for Cloud + ECS? Currently, with the default IOManager (file system), I lose the ability to re-run failed `ops` because the storage is ephemeral.

Will Curatolo

05/12/2022, 3:52 PM

hey all 👋 is there any info available about Dagster Cloud other than here and the intro blog post? My company is evaluating hosted data orchestration tools for our use case, and I'd love to include Dagster Cloud if possible! My team and I have been on the waitlist for a few months now and haven't heard anything

Travis McKinney

05/12/2022, 5:05 PM

Hey @Will Curatolo, just sent you an email. Look forward to chatting!

Stephen Bailey

05/22/2022, 8:00 PM

What is the recommended way to configure `runCoordinator.maxConcurrentRuns` on Dagster Cloud? I see the Helm chart value in the standalone chart but don't see anything matching in the Dagster Cloud one. I'm looking to double or triple the default instance-level concurrency (deployed on EKS).

Stephen Bailey

05/23/2022, 1:00 PM

Would be really cool to be able to configure default tags for runs on an Instance/Code Location/Repository basis from within the Dagster cloud settings file. i currently have a helper function that will attach Datadog-related tags to runs, but I have to add it onto every job run specifically. Would be useful for the cloud Slack alerting as well.
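For reference, the per-job workaround described above amounts to merging a shared set of defaults into each job's tags. A minimal sketch follows; the tag names and values are hypothetical stand-ins for the Datadog tags, which are not listed in the message, and job-specific tags win on conflict.

```python
from typing import Dict, Mapping, Optional

# Hypothetical defaults; substitute the Datadog-related tags you actually attach.
DEFAULT_RUN_TAGS: Dict[str, str] = {"datadog_service": "dagster", "datadog_env": "prod"}


def with_default_tags(
    job_tags: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, str] = DEFAULT_RUN_TAGS,
) -> Dict[str, str]:
    """Return a new dict: the defaults overlaid with job-specific tags."""
    merged = dict(defaults)  # copy so the shared defaults are never mutated
    merged.update(job_tags or {})
    return merged
```

Each job still has to opt in, e.g. `@job(tags=with_default_tags({"team": "analytics"}))`, which is exactly the repetition a deployment- or location-level setting would remove.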

Yevhen Samoilenko

05/30/2022, 2:04 PM

Hi! Sorry for the silly question, but what is the recommended way of passing env variables and secrets to user code images when using the Docker Agent?

Yevhen Samoilenko

05/31/2022, 2:23 PM

Hi! I'm having trouble adding my code to a workspace using the Docker Agent. Here is the error:

Exception: Timed out waiting for server user_code_52fa4f:4000. Most recent connection error: dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
Stack Trace:
  File "/dagster-cloud/dagster_cloud/workspace/user_code_launcher/user_code_launcher.py", line 687, in _wait_for_server
    server_id = sync_get_server_id(client)
  File "/dagster/dagster/api/get_server_id.py", line 15, in sync_get_server_id
    result = check.inst(api_client.get_server_id(), (str, SerializableErrorInfo))
  File "/dagster/dagster/grpc/client.py", line 152, in get_server_id
    res = self._query("GetServerId", api_pb2.Empty, timeout=timeout)
  File "/dagster/dagster/grpc/client.py", line 115, in _query
    raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "DNS resolution failed for user_code_52fa4f:4000: C-ares status is not ARES_SUCCESS qtype=A name=user_code_52fa4f is_balancer=0: Could not contact DNS servers"
    debug_error_string = "{"created":"@1654005039.486441814","description":"DNS resolution failed for user_code_52fa4f:4000: C-ares status is not ARES_SUCCESS qtype=A name=user_code_52fa4f is_balancer=0: Could not contact DNS servers","file":"src/core/lib/transport/error_utils.cc","file_line":165,"grpc_status":14}"
>
Stack Trace:
  File "/dagster/dagster/grpc/client.py", line 112, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
  File "/dagster-cloud/dagster_cloud/workspace/user_code_launcher/user_code_launcher.py", line 548, in _reconcile
    new_updated_endpoint = self._create_new_server_endpoint(
  File "/dagster-cloud/dagster_cloud/workspace/docker/__init__.py", line 197, in _create_new_server_endpoint
    return self._launch(
  File "/dagster-cloud/dagster_cloud/workspace/docker/__init__.py", line 286, in _launch
    server_id = self._wait_for_server(
  File "/dagster-cloud/dagster_cloud/workspace/user_code_launcher/user_code_launcher.py", line 693, in _wait_for_server
    raise Exception(

How can I debug the process and find out the reason it occurs?
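The innermost error is a DNS failure: the agent could not resolve the container name `user_code_52fa4f` at all, so the user code server was never contacted. One way to narrow this down is to run a small check from inside the agent container (e.g. via `docker exec`), separating "does the name resolve?" from "is port 4000 reachable?". Docker's embedded DNS resolves container names only for containers attached to the same user-defined network, so a failure here points at container networking rather than at the user code. The helper names below are hypothetical; only the standard library is used.

```python
import socket


def can_resolve(host: str) -> bool:
    """Return True if `host` resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS errors, refusals and timeouts
        return False
```

For the error above, the calls would be `can_resolve("user_code_52fa4f")` and `can_connect("user_code_52fa4f", 4000)`.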

Zach

05/31/2022, 2:46 PM

A few comments regarding the job runs view with hundreds of ops:
• log loading performance is pretty poor: it takes a few minutes to load the logs, and logs are completely unqueryable until they're all loaded
• all logs are re-fetched every time the job run page is opened, so if you accidentally close the job runs tab you have to go get a cup of coffee while it loads again. Having the logs cached somehow would help mitigate the initial load time
• it would be nice to be able to pan and zoom around the op view. We have a job with hundreds of ops, and the ability to zoom in and out on different parts of the job in particular would be really useful for exploring the generated op graph.

Zach

05/31/2022, 3:44 PM

Not sure if #dagster-support would be better for this, but seeing as it is coming from the dagster-cloud library I'll post it here. I'm seeing this failure on a small percentage of ops in a large fan-out (all fan-out ops are running the same step-launcher and op code, some seem to fail in this way):

dagster_cloud.storage.errors.GraphQLStorageError: Error in GraphQL response: [{'message': 'Internal Server Error (Trace ID: 2565201103886791611)', 'locations': [{'line': 15, 'column': 13}], 'path': ['eventLogs', 'getLogsForRun']}]
  File "/usr/local/lib/python3.9/site-packages/dagster/core/execution/plan/execute_plan.py", line 230, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.9/site-packages/etxdagster/dnax/dnax_step_launcher.py", line 466, in launch_step
    step_run_ref = self._step_context_to_step_run_ref(
  File "/usr/local/lib/python3.9/site-packages/etxdagster/dnax/dnax_step_launcher.py", line 557, in _step_context_to_step_run_ref
    return step_context_to_step_run_ref(
  File "/usr/local/lib/python3.9/site-packages/dagster/core/execution/plan/external_step.py", line 191, in step_context_to_step_run_ref
    upstream_output_events, run_group = _upstream_events_and_runs(step_context)
  File "/usr/local/lib/python3.9/site-packages/dagster/core/execution/plan/external_step.py", line 117, in _upstream_events_and_runs
    step_output_records = step_context.instance.all_logs(
  File "/usr/local/lib/python3.9/site-packages/dagster/utils/__init__.py", line 615, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dagster/core/instance/__init__.py", line 1289, in all_logs
    return self._event_storage.get_logs_for_run(run_id, of_type=of_type)
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/event_logs/storage.py", line 257, in get_logs_for_run
    res = self._execute_query(
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/event_logs/storage.py", line 237, in _execute_query
    res = self._graphql_client.execute(query, variable_values=variables)
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/client.py", line 63, in execute
    return self._execute_retry(query, variable_values)
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/client.py", line 117, in _execute_retry
    raise GraphQLStorageError(f"Error in GraphQL response: {str(result['errors'])}")

Stephen Bailey

05/31/2022, 4:13 PM

are there global permissions for users outside of individual deployments? I have a user I am trying to make a superadmin, and they are seeing the deployment tab grayed out, despite being an admin in the workspace. Is that expected?

Seth Kimmel

06/01/2022, 5:10 PM

Is there a place to quickly see what version of Python Cloud is running? I've been looking at the GUI and the https://github.com/dagster-io/dagster-cloud release history.
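This does not answer what the hosted side runs, but for the images you control (agent or user code), the Python and package versions can be read from inside the container with the standard library. A sketch; the helper name is hypothetical and the package list is only a default.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str] = ("dagster", "dagster-cloud")) -> Dict[str, Optional[str]]:
    """Report this environment's Python version and installed package versions."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this image
    return versions
```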

geoHeil

06/03/2022, 6:05 PM

I am in the process of evaluating Dagster Cloud. I have an existing setup (Docker run launcher via docker-compose) that I am trying to migrate to the managed edition of Dagster Cloud. However, I have some questions regarding https://docs.dagster.cloud/guides/continuous-integration
My setup is similar to https://github.com/geoHeil/dagster-ssh-demo/blob/master/docker-compose.yml, i.e. a docker-compose-based one with the Queued Run launcher. The desired final result is something like:
1. Testing pipeline for each commit:
  - check out the code
  - check compliance; if non-compliant, auto-format and push the formatted version back to GitHub
  - reformatting, linting, type checking
  - unit tests
  - push a preview to Dagit via Dagster Cloud
2. Manual review of the deploy preview in Dagster Cloud:
  - manual review
  - when the MR is merged to master, continue with (3)
3. Main/master pipeline:
  - only runs on the main branch
  - all the tests are run again
  - the version number is incremented, at least via semver, ideally via something like zest.releaser, where it is derived from a semantic changelog (https://github.com/zestsoftware/zest.releaser)
  - the Docker image is pushed to the registry
  - Dagster updates the images and deploys the new version of the code
Ideally, I do not need to build the container twice. Instead, I would build it once (i.e. create my conda environment with all the dependencies only once), then forward/increment the version number and tag the image once all the tests have passed.
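The "at least via semver" bump in step 3 is the piece a CI job could compute before re-tagging the already-built image instead of rebuilding it. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` versions with no pre-release or build metadata; it does not attempt zest.releaser's changelog-driven workflow, and the function name is hypothetical.

```python
import re

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, part: str = "patch") -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version, resetting lower parts."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```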

Prratek Ramchandani

06/03/2022, 9:17 PM

how does one authenticate with the GraphQL API if you want to query it somewhere other than the playground?
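Outside the playground, the request is an ordinary HTTP POST with a user token attached as a header. The sketch below builds such a request with the standard library but does not send it. The header name `Dagster-Cloud-Api-Token` and the URL shape `https://<org>.dagster.cloud/<deployment>/graphql` are assumptions to verify against the Dagster Cloud docs; the helper name is hypothetical.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional


def build_graphql_request(
    url: str, token: str, query: str, variables: Optional[Mapping[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a GraphQL query.

    Assumption: Dagster Cloud reads the user token from the
    Dagster-Cloud-Api-Token header.
    """
    body = json.dumps({"query": query, "variables": dict(variables or {})}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Dagster-Cloud-Api-Token": token,
        },
    )
```

Sending it is then `urllib.request.urlopen(req)`; a query like `{ __typename }` is valid against any GraphQL schema and makes a cheap connectivity check.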

Zach

06/03/2022, 11:08 PM

it would be nice if there were a little more documentation describing the sequence of steps to set up code previews. It doesn't seem particularly complicated (I think it's just the GraphQL `createCodePreview` call, then feed that into `dagster-cloud workspace snapshot ...`, then construct the URL), but the only way I was able to piece that together was to dive into the code for the GitHub action. The provided GitHub action is great, but if you don't use GitHub Actions it's not quite as helpful.

Zach

06/03/2022, 11:12 PM

separately, is there a suggestion for managing user tokens for service users? we try not to use tokens that are tied to individual users for programmatic usage, for instance having a separate program / process interacting with dagster via graphql. should we create a specific service user account for this?

geoHeil

06/04/2022, 6:12 AM

In the OSS version of dagster I am using the QueuedRunLauncher with a Docker executor:

run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - DAGSTER_POSTGRES_USER
      - DAGSTER_POSTGRES_PASSWORD
      - DAGSTER_POSTGRES_DB
      - DAGSTER_POSTGRES_HOSTNAME
      - AWS_ACCESS_KEY_ID
      - AWS_SECRET_ACCESS_KEY

Environment variables can be passed directly over to the container.

NOTICE: I do NOT have to set key=value pairs but rather pass them through.

In dagster cloud:

user_code_launcher:
  module: dagster_cloud.workspace.docker
  class: DockerUserCodeLauncher
  config:
    env_vars:
      - AWS_ACCESS_KEY_ID
      - AWS_SECRET_ACCESS_KEY
      - SLACK_DAGSTER_ETL_BOT_TOKEN

I tried the following and observed multiple problems:
1. the env vars are not passed, and the credentials end up as None:

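from dagster import ResourceDefinition, io_manager
from dagster_aws.s3 import s3_resource

# PROD resources:
resources = {
    "s3_bucket": ResourceDefinition.hardcoded_resource("my_bucket"),
    "s3": s3_resource.configured(
        {"endpoint_url": "http://minio1:9000", "region_name": "eu-central-1"}
    ),
}

# when trying to use it:
@io_manager(required_resource_keys={"s3_bucket", "s3"})
def s3_partitioned_csv_io_manager(init_context):
    bucket_resource = init_context.resources.s3_bucket
    s3 = init_context.resources.s3

    # botocore leaves _credentials as None when no AWS credentials were found
    # in the container's environment, which produces the error below.
    creds = s3._request_signer._credentials
    AWS_ACCESS_KEY_ID_resource = creds.access_key
    AWS_SECRET_ACCESS_KEY_resource = creds.secret_key
    ...  # rest of the IO manager elided in the original message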

fails with:

'NoneType' object has no attribute 'access_key'

2. the Slack notification bot (which should send a message when a pipeline fails) is never triggered. Also, I cannot find the logs I usually see for dagster-daemon in OSS, which might explain why it failed.
3. In OSS I can subselect assets from a job. This does not work in Cloud:

# for a dummy asset of:

import pandas as pd
from dagster import asset


@asset(
    io_manager_key="dummy_io",
    compute_kind="python_ingestion",
)
def flow_dummy_raw_asset() -> pd.DataFrame:
    return pd.DataFrame({"bar": [2, 3, 4, 10]})


@asset(
    io_manager_key="dummy_io",
    compute_kind="python_cleaning",
)
def flow_dummy_normalized_asset(flow_dummy_raw_asset: pd.DataFrame) -> pd.DataFrame:
    flow_dummy_raw_asset["n"] = 1
    return flow_dummy_raw_asset


# error message when subselecting the first
# a similar error message shows up when selecting the other side vice-versa
dagster._check.CheckError: Invariant failed. Description: flow_dummy has no solid named flow_dummy_normalized_asset.
  File "/opt/conda/lib/python3.9/site-packages/dagster/grpc/impl.py", line 92, in core_execute_run
    yield from execute_run_iterator(
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/execution/api.py", line 911, in __iter__
    yield from self.execution_context_manager.prepare_context()
  File "/opt/conda/lib/python3.9/site-packages/dagster/utils/__init__.py", line 470, in generate_setup_events
    obj = next(self.generator)
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/execution/context_creation_pipeline.py", line 322, in orchestration_context_event_generator
    context_creation_data = create_context_creation_data(
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/execution/context_creation_pipeline.py", line 140, in create_context_creation_data
    resource_keys_to_init=get_required_resource_keys_to_init(
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/execution/resources_init.py", line 342, in get_required_resource_keys_to_init
    hook_defs = pipeline_def.get_all_hooks_for_handle(step.solid_handle)
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/definitions/pipeline_definition.py", line 564, in get_all_hooks_for_handle
    solid = self._graph_def.solid_named(name)
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/definitions/graph_definition.py", line 297, in solid_named
    check.invariant(
  File "/opt/conda/lib/python3.9/site-packages/dagster/_check/__init__.py", line 1433, in invariant
    raise CheckError(f"Invariant failed. Description: {desc}")

(1) can be fixed by manually passing `KEY=value` variables. This is inconvenient/inconsistent, but doable. Do you think you could support the pass-through mode in the future?
(2) Slack notifications work then. However, it is still unclear to me where to find the dagster-daemon logs.
(3) I have no explanation here and would really love to get this to work.
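Until pass-through is supported, the `KEY=value` entries for (1) can at least be generated from the same list of bare names the OSS config uses, rather than written by hand. A minimal sketch; the helper name is hypothetical, and how the result is templated into the agent's `env_vars` config is left open. Note that this writes secret values into whatever file the entries end up in.

```python
import os
from typing import List, Mapping, Optional, Sequence, Tuple


def passthrough_env(
    names: Sequence[str], environ: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], List[str]]:
    """Turn bare variable names into KEY=value strings.

    Returns (entries, missing) so unset variables are reported instead of
    silently becoming empty values.
    """
    environ = os.environ if environ is None else environ
    entries, missing = [], []
    for name in names:
        if name in environ:
            entries.append(f"{name}={environ[name]}")
        else:
            missing.append(name)
    return entries, missing
```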