Nadav Ben-Haim
09/25/2022, 6:46 AM
Stephen Bailey
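09/25/2022, 5:55 PM
import datetime
import boto3
from dagster import asset

@asset
def training_job():
    # name the SageMaker job after today's date, e.g. "2022-09-25"
    todays_job = datetime.datetime.now().strftime("%Y-%m-%d")
    client = boto3.client("sagemaker")
    # other_params: the remaining CreateTrainingJob arguments, defined elsewhere
    client.create_training_job(TrainingJobName=todays_job, **other_params)
    # block until SageMaker reports the job as completed or stopped
    client.get_waiter("training_job_completed_or_stopped").wait(TrainingJobName=todays_job)
    return client.describe_training_job(TrainingJobName=todays_job)

@asset
def model_version(training_job: dict):
    client = boto3.client("sagemaker")
    artifact = training_job["ModelArtifacts"]["S3ModelArtifacts"]
    # container_image and role_arn are placeholders, defined elsewhere
    client.create_model(ModelName="my_model", PrimaryContainer={"Image": container_image, "ModelDataUrl": artifact}, ExecutionRoleArn=role_arn)
    return client.describe_model(ModelName="my_model")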
wrt multi-image asset graphs, one approach you could use is doing something like k8s_job_op to have it launched as its own docker image (or some similar approach where the dagster op is actually launching a secondary job somewhere to do the work -- this is how sagemaker works for example). you can also link together assets from two different code locations using SourceAsset, although in the case you mention above it's probably not best practice.
Nadav Ben-Haim
09/26/2022, 9:11 AM
Stephen Bailey
09/26/2022, 2:29 PM
op, so you can't call other jobs unless you hit the graphql api from within the job afaik
# all defined in code location 1
# (illustrative only: @asset does not accept an executor_def argument)
@asset(executor_def=image_1)
def my_first_model():
    model.fit()
    return model

@asset(executor_def=image_2)
def my_second_model():
    model.fit()
    return model
because both of those would be launched from the Code Location 1 image daemon (which could be docker or kubernetes). but you could do something like:
# all defined in code location 1
@asset
def my_first_model():
    docker_run_launcher.run_image(image_1)
    metadata = fetch_metadata()
    return metadata

@asset
def my_second_model():
    docker_run_launcher.run_image(image_2)
    metadata = fetch_metadata()
    return metadata
but, im going to confess im hitting my limits of understanding here
Nadav Ben-Haim
09/26/2022, 2:43 PM
docker run via the API from within an asset
@asset(docker_image = 'hello-world')
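A minimal sketch of the "docker run from within an asset" idea, assuming the process that executes the asset has the docker CLI on its PATH and can reach a Docker daemon. It shells out to the CLI rather than calling the Docker Engine API directly, and `docker_run_command` and `run_image` are hypothetical helper names, not Dagster or Docker APIs.

```python
import subprocess
from typing import List, Sequence


def docker_run_command(image: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argv for `docker run --rm <image> [args...]`."""
    return ["docker", "run", "--rm", image, *args]


def run_image(image: str, args: Sequence[str] = ()) -> str:
    """Run the image to completion and return its stdout.

    Raises subprocess.CalledProcessError if the container exits non-zero,
    which would in turn fail the asset that called it.
    """
    result = subprocess.run(
        docker_run_command(image, args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical use inside an asset (needs dagster and a Docker daemon):
#
#     @asset
#     def my_first_model():
#         logs = run_image("hello-world")
#         return {"image": "hello-world", "log_lines": len(logs.splitlines())}
```

With this shape the asset itself stays in the code location's image and only the heavy work runs in the secondary image; whatever the container produces still has to be fetched separately (the `fetch_metadata()` step in the pseudocode above).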
claire
09/30/2022, 5:26 PM
k8s_job_executor or celery_k8s_job_executor. https://dagster.slack.com/archives/C01U954MEER/p1644335565408589?thread_ts=1644334246.680049&cid=C01U954MEER
@asset you would specify op_tags instead:
@asset(op_tags={...})
with the code snippet in Johann's response
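The linked snippet is not reproduced in this thread, so the following is only a guess at its shape: a sketch that assumes the per-step image is set through the "dagster-k8s/config" tag with a "container_config" section, as read by k8s_job_executor. `image_tags` is a hypothetical helper, and the exact keys should be checked against the dagster-k8s version in use.

```python
from typing import Any, Dict


def image_tags(image: str) -> Dict[str, Any]:
    """Build op tags asking dagster-k8s to run a step in `image`.

    Assumption: the executor reads the image from
    tags["dagster-k8s/config"]["container_config"]["image"].
    """
    return {"dagster-k8s/config": {"container_config": {"image": image}}}


# Hypothetical use (needs dagster and a job run with k8s_job_executor):
#
#     @asset(op_tags=image_tags("my-registry/model-one:latest"))
#     def my_first_model():
#         ...
#
#     @asset(op_tags=image_tags("my-registry/model-two:latest"))
#     def my_second_model():
#         ...
```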