# announcements
a
Hi 🙂 I'm trying to dabble with Dagster to see if it fits my ML team's use cases. We are currently on Airflow, but Dagster seems really promising and way more ergonomic to use, especially for rapid pipeline development! I'm facing an issue trying to run `dagit` locally connected to a remote Kubernetes cluster with Dagster installed. I'm trying to use the `K8sRunLauncher`. Dagit starts fine and I can see my pipeline, but when I trigger it and check the pod logs I see this error:
dagster.check.ParameterCheckError: Param "pipeline_run" is not a PipelineRun. Got None which is type <class 'NoneType'>.
the args of the pod look like this:
api
      execute_run_with_structured_logs
      {"__class__": "ExecuteRunArgs", "instance_ref": null, "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "pipeline", "repository_origin": {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "FileCodePointer", "fn_name": "pipeline", "python_file": "/Users/[redacted]/Projects/ml-platform/dsdk/pipeline.py", "working_directory": null}, "executable_path": "/Users/[redacted]/Projects/ml-platform/dsdk/.venv/bin/python"}}, "pipeline_run_id": "9a037a94-c7cc-4fd0-a737-ab65d46b5ab4"}
The local paths look suspicious, but are they the cause of the error? This is my launcher config, though I'm not sure if it's relevant:
run_launcher:
    module: dagster_k8s.launcher
    class: K8sRunLauncher
    config:
        dagster_home: 
            env: DAGSTER_HOME
        instance_config_map: dagster-instance
        service_account_name: dagster
        postgres_password_secret: dagster-postgresql-secret
        job_image: [REDACTED].dkr.ecr.us-east-1.amazonaws.com/dsdk:dagster
        job_namespace: datascience
        load_incluster_config: false
        image_pull_policy: Always
Can anyone help me? Thanks in advance! And thanks again for this great project 🙂
j
Hi @Alessandro Marrella, thanks for trying Dagit! It looks like the Dagit that you're running locally is using local storage, as you pointed out.
Generally we host Dagit in k8s, but it should be possible to host it locally. Is that a requirement for you?
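To check pod args like the ones above for this kind of local/cluster mismatch, a small script along these lines might help. It is only a sketch: `find_suspicious` and the `LOCAL_PREFIXES` heuristic are made up for illustration and are not part of Dagster, and treating a null `instance_ref` as a warning sign is an assumption based on this thread rather than documented behavior.

```python
import json

# Pod args copied from the message above (the user directory is redacted there too).
POD_ARGS = """
{"__class__": "ExecuteRunArgs", "instance_ref": null,
 "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "pipeline",
  "repository_origin": {"__class__": "RepositoryPythonOrigin",
   "code_pointer": {"__class__": "FileCodePointer", "fn_name": "pipeline",
    "python_file": "/Users/[redacted]/Projects/ml-platform/dsdk/pipeline.py",
    "working_directory": null},
   "executable_path": "/Users/[redacted]/Projects/ml-platform/dsdk/.venv/bin/python"}},
 "pipeline_run_id": "9a037a94-c7cc-4fd0-a737-ab65d46b5ab4"}
"""

# Assumption: paths under these prefixes live on a developer laptop, not in the job image.
LOCAL_PREFIXES = ("/Users/", "/home/")


def find_suspicious(node, path=""):
    """Walk the parsed args and collect values that look laptop-specific."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            # Assumption from this thread: a null instance_ref hints that the pod
            # is not pointed at the storage where the run was recorded.
            if key == "instance_ref" and value is None:
                findings.append(f"{child}: null")
            findings.extend(find_suspicious(value, child))
    elif isinstance(node, str) and node.startswith(LOCAL_PREFIXES):
        findings.append(f"{path}: local path {node}")
    return findings


findings = find_suspicious(json.loads(POD_ARGS))
for finding in findings:
    print(finding)
```

Anything it prints refers to something that only exists on the machine that started Dagit, so the pod on the cluster would not be able to see it.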
a
My requirement is to be able to change pipelines quickly and run them to see if they work without needing to republish containers. I thought local Dagit would be a solution to that, but perhaps I'm looking in the wrong direction. How do you commonly do rapid prototyping of pipelines? (Considering the containers that are run need more resources than the local machine.)
Hosted Dagit seems to work perfectly, so if the only solution is to keep publishing the pipeline container, I'm fine with that. I was just looking to see if there is something faster 🙂 I was imagining the scheduling / pipeline run happening locally and the actual tasks being run on k8s, but I guess that would require publishing the container anyway 🤔
j
Do the pipelines need to execute in a k8s cluster? For the dev loop we usually recommend having a local Dagster deployment.
a
Yeah, we need Kubernetes mostly because of the memory/CPU/GPU requirements. Also, some things like pod IAM roles are only testable live on the cluster.
I guess, though, that the Dagster "type checker" already checks for a lot of mistakes, and for that it's enough to test locally.
j
Gotcha. Currently, pushing images is the only dev loop we have on k8s, though there have been discussions of doing a git-based deployment, and you could also imagine keeping a container open and doing some type of file sync.
The quickest way to get up and running with that is with our Helm chart.
a
Got it, thanks a lot, I will try with that 🙂 So far I'm liking Dagster a lot.
j
Glad to hear it! Let me know if you run into any issues
๐Ÿ™‡โ€โ™‚๏ธ 1