#deployment-ecs

Randy Coburn

12/13/2021, 10:54 AM
Hi all, we have managed to get our ECS task running. Our design is slightly more complicated than what Dagster seems to be designed for: we have a process manager inside the container that collects the secrets we need and exposes them as environment variables under the names we need. We noticed that Dagster passes the command it wants to run as the CMD of the task. We used this to our advantage: we capture the CMD and store it in a file, collect our secrets, and then run the command that was passed in. However, we are getting the following error from Dagster:
Error: Got unexpected extra arguments (ExecuteRunArgs, instance_ref: {__JSON__BLOB__}
The code that we are running is exactly:
eval '/usr/local/bin/python -m dagster api execute_run {__JSON_BLOB__}'
We use eval because it is being run from a bash script. We are not sure what the "extra arguments" are. If you need more information, please ask; any help would be great.

jordan

12/13/2021, 3:10 PM
That error is actually coming from Click, which is what the dagster CLI uses under the hood. Does your Dockerfile also have a CMD or ENTRYPOINT specified? See: https://aws.amazon.com/blogs/opensource/demystifying-entrypoint-cmd-docker/
Also, you might be interested in these changes (available as of Dagster 0.13.10): https://docs.dagster.io/deployment/guides/ecs#secrets-management

Randy Coburn

12/13/2021, 3:14 PM
We do have an ENTRYPOINT specified. It uses a process manager, as explained: https://github.com/morfien101/launch We use it to collect secrets, since we collect many of them and change the names that are presented in Secrets Manager: https://github.com/morfien101/aws-secrets-reader The way we do this is to first check whether a CMD was passed in. We don't normally use one, as we place everything in the ENTRYPOINT of the Dockerfile. If there is one, we write exactly what was given into a file, collect our secrets, and then execute the passed-in CMD.
We collect many secrets that have the same internal names: username, password, database. However, we change the names to be service_a_username, service_b_username.

jordan

12/13/2021, 3:18 PM
Can I see your ENTRYPOINT? I still suspect it has something to do with how ECS is concatenating ENTRYPOINT and CMD.

Randy Coburn

12/13/2021, 3:18 PM
sure, 1 min
In the dockerfile
# CMD allows this to be overridden from run launchers or executors that want
# to run other commands against your repository
ENTRYPOINT ["/bin/bash", "/cmd_run.sh"]
CMD [""]
In the cmd_run.sh
#!/bin/bash
if [ ! -z "$(echo -n $@ | base64)" ]; then
  # When we have a cmd passed in we will use it.
  # We do this by checking if the array of passed in values is more than zero in length
  LAUNCH_WITH_CMD=/launch-with-cmd.yaml
  echo "Got a CMD passed in."
  CMD_SCRIPT="/cmd.sh"
  cat << EOF > $CMD_SCRIPT
#!/bin/bash
echo "Starting CMD"
eval '$@'
EOF
  /launch -f $LAUNCH_WITH_CMD
else
  # Else we will just run the GRPC container.
  /launch -f /launch-grpc-code.yaml 
fi
In the ECS console I see only a CMD passed in. I can also confirm that the container is running that code as expected.
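The heredoc in cmd_run.sh expands $@ into plain text, so any quoting around the original arguments is gone by the time /cmd.sh hands that text to eval. A minimal sketch of one way to write the wrapper so each argument is read back unchanged, assuming bash and its printf %q; write_cmd_script is a hypothetical helper name, not part of the setup described in this thread:

```bash
#!/bin/bash
# Hypothetical helper: write a wrapper script that re-runs the given
# arguments exactly as they were received.
write_cmd_script() {
  local out="$1"
  shift
  {
    echo '#!/bin/bash'
    echo 'echo "Starting CMD"'
    # %q re-quotes each argument so bash reads it back as one word,
    # even if it contains spaces, braces or double quotes.
    printf 'exec'
    printf ' %q' "$@"
    printf '\n'
  } > "$out"
}
```

In cmd_run.sh this would be called as write_cmd_script "$CMD_SCRIPT" "$@" in place of the heredoc, so a JSON blob that arrives as one argument is still one argument when /cmd.sh runs.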

jordan

12/13/2021, 3:27 PM
what’s in the two yaml files?
This answer seems to suggest that whitespace in your JSON blob could also be at play: https://stackoverflow.com/a/50921236 So definitely try that as well. But there's enough indirection happening within the ENTRYPOINT and the CMD that I have a feeling we'll find some subtle inconsistency in how the two are concatenated together.

Randy Coburn

12/13/2021, 3:32 PM
They describe the run for launch.
launch-with-cmd.yaml
processes:
  secret_processes:
    - name: SERVICE_A_RDS_CREDENTIALS
      command: /aws-secret-reader
      arguments:
        - -secret
        - {{ env "SERVICE_A_RDS_SECRET_HINT" }}
        - -upper-case
        - -prepend-with
        - SERVICE_A_RDS_
    - name: SERVICE_B_CREDENTIALS
      command: /aws-secret-reader
      arguments:
        - -secret
        - {{ env "SERVICE_B_SECRET_HINT" }}
        - -upper-case
        - -prepend-with
        - SERVICE_B_
  main_processes:
    - name: CMD
      command: /bin/bash
      arguments:
        - /cmd.sh
launch-grpc-code.yaml
processes:
  secret_processes:
    - name: SERVICE_A_RDS_CREDENTIALS
      command: /aws-secret-reader
      arguments:
        - -secret
        - {{ env "SERVICE_A_RDS_SECRET_HINT" }}
        - -upper-case
        - -prepend-with
        - SERVICE_A_RDS_
    - name: SERVICE_B_CREDENTIALS
      command: /aws-secret-reader
      arguments:
        - -secret
        - {{ env "SERVICE_B_SECRET_HINT" }}
        - -upper-case
        - -prepend-with
        - SERVICE_B_
  main_processes:
  - name: DAG_REPO_GRPC
    command: /usr/local/bin/dagster
    arguments:
    - api
    - grpc
    - -h
    - 0.0.0.0
    - -p
    - 4000
    - -f
    - /opt/dagster/app/repositories/datalake_repositories.py
Are you suggesting that the white space is in the JSON itself?

jordan

12/13/2021, 3:38 PM
It could be - I still think it's more likely that the ENTRYPOINT and CMD are getting concatenated together incorrectly. But it's at least worth ruling out that the JSON is malformed.

Randy Coburn

12/13/2021, 3:46 PM
eval '/usr/local/bin/python -m dagster api execute_run {"__class__": "ExecuteRunArgs", "instance_ref": {"__class__": "InstanceRef", "compute_logs_data": {"__class__": "ConfigurableClassData", "class_name": "LocalComputeLogManager", "config_yaml": "base_dir: /home/service_user/.dagster/storage\n", "module_name": "dagster.core.storage.local_compute_log_manager"}, "custom_instance_class_data": null, "event_storage_data": {"__class__": "ConfigurableClassData", "class_name": "PostgresEventLogStorage", "config_yaml": "postgres_db:\n db_name: dagster\n hostname: dagster.xxx.rds.amazonaws.com\n password: xxx\n port: 5432\n username: xxx\n", "module_name": "dagster_postgres.event_log"}, "local_artifact_storage_data": {"__class__": "ConfigurableClassData", "class_name": "LocalArtifactStorage", "config_yaml": "base_dir: /home/service_user/.dagster\n", "module_name": "dagster.core.storage.root"}, "run_coordinator_data": {"__class__": "ConfigurableClassData", "class_name": "QueuedRunCoordinator", "config_yaml": "{}\n", "module_name": "dagster.core.run_coordinator"}, "run_launcher_data": {"__class__": "ConfigurableClassData", "class_name": "EcsRunLauncher", "config_yaml": "container_name: dlakepipe-pipeline-code\ntask_definition: arn:aws:ecs:eu-west-1:123:task-definition/dlakepipe-pipeline-code:1\n", "module_name": "dagster_aws.ecs"}, "run_storage_data": {"__class__": "ConfigurableClassData", "class_name": "PostgresRunStorage", "config_yaml": "postgres_db:\n db_name: dagster\n hostname: dagster.xxx.rds.amazonaws.com\n password: xxx\n port: 5432\n username: xxx\n", "module_name": "dagster_postgres.run_storage"}, "schedule_storage_data": {"__class__": "ConfigurableClassData", "class_name": "PostgresScheduleStorage", "config_yaml": "postgres_db:\n db_name: dagster\n hostname: dagster.xxx.rds.amazonaws.com\n password: xxx\n port: 5432\n username: xxx\n", "module_name": "dagster_postgres.schedule_storage"}, "scheduler_data": {"__class__": "ConfigurableClassData", "class_name": "DagsterDaemonScheduler", "config_yaml": "{}\n", "module_name": "dagster.core.scheduler"}, "settings": {"telemetry": {"enabled": false}}}, "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "my_job", "repository_origin": {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "FileCodePointer", "fn_name": "deploy_docker_repository", "python_file": "/opt/dagster/app/repo_test.py", "working_directory": "/opt/dagster/app"}, "container_image": null, "executable_path": "/usr/local/bin/python"}}, "pipeline_run_id": "8b3de847-95ba-4b98-97b5-f35733e18260"}'
This is what we have after the capture. I have removed the usernames and passwords, but the JSON is valid. I tried to use eval to stop bash from doing anything crazy with it.

jordan

12/13/2021, 3:54 PM
Yeah, at a glance, that looks correct to me. At this point, I'd recommend checking all the places your ENTRYPOINT and CMD can be set (in the Dockerfile, in the ECS Task Definition, and in the ECS Task Container Overrides) and stepping through the logic described here to work out the full command ECS is running: https://aws.amazon.com/blogs/opensource/demystifying-entrypoint-cmd-docker/

Randy Coburn

12/13/2021, 3:59 PM
Can you point me to the line where the ECS run launcher sets the CMD? I'm pretty sure that we have the ENTRYPOINT and CMD right. That snippet comes from the container itself after it has captured the $@, which should be the CMD.

jordan

12/13/2021, 4:11 PM
There are 3 different places it can source the ENTRYPOINT from and 3 different places it can source the CMD from:
ENTRYPOINT:
• in the Dockerfile
• in the ECS Task Definition's Container Definitions
• in the ECS Task's Overrides
CMD:
• in the Dockerfile
• in the ECS Task Definition's Container Definitions
• in the ECS Task's Overrides
By default, the ECS Run Launcher passes its command as an ECS Task's Override: https://github.com/dagster-io/dagster/blob/c24f76950b8c80de3ed538e091fda9df7ee3e66c/python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py#L154-L157
If your ENTRYPOINT is being set in your Dockerfile, it still might be getting overridden by something in your Task Definition or in your Task. Or things might not be getting concatenated the way you expect them to be (particularly because of the combination of arrays and strings being used here).
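As a rough illustration of the concatenation being discussed: when both ENTRYPOINT and CMD are in exec (array) form, the container's argument list is the ENTRYPOINT items followed by the CMD items. The sketch below mimics that with shell positional parameters. It assumes the command arrives as a list with the JSON blob as a single element, and the blob shown is a shortened, hypothetical stand-in for the real one.

```bash
# ENTRYPOINT from the Dockerfile shown in this thread (exec form).
set -- /bin/bash /cmd_run.sh
# CMD appended as separate list items; the blob is a shortened,
# hypothetical stand-in for the real JSON.
set -- "$@" /usr/local/bin/python -m dagster api execute_run '{"__class__": "ExecuteRunArgs"}'

argc=$#
for last in "$@"; do :; done   # leaves $last holding the final argument
echo "argument count: $argc"
echo "last argument: $last"
```

Under those assumptions the blob is still a single argument at this point, so any splitting would have to happen later, when the arguments are turned back into text and re-parsed.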

Randy Coburn

02/17/2022, 8:23 AM
Hi @jordan, we finally managed to figure this out. We shelved it over the holiday season but picked it up again a few days ago. It was basically multiple different shells parsing the text and stripping the quotes from it, which made it look like a list of arguments rather than a single JSON blob to the dagster process. I wonder if it would be useful for others to see how we do this, as we execute code in the Dagster task containers BEFORE we start the actual task. We do this to collect secrets and set up the environment using a tool called Launch: https://github.com/morfien101/launch
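The quote stripping described here can be reproduced in a few lines. This is a minimal sketch, not the actual scripts from this thread; count_args is a hypothetical helper and the blob is a shortened stand-in for the real JSON.

```bash
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

# Shortened, hypothetical stand-in for the JSON blob.
blob='{"__class__": "ExecuteRunArgs", "instance_ref": {}}'

# Passed as a quoted variable, the blob stays a single argument.
direct=$(count_args "$blob")

# Pasted into a string and re-parsed by eval, the shell splits it on
# whitespace and consumes the double quotes.
via_eval=$(eval "count_args $blob")

echo "direct: $direct"
echo "via eval: $via_eval"
```

The second count is greater than one, which is consistent with Click reporting fragments of the blob as unexpected extra arguments.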