hey all -- i have a situation where i need to back...
# ask-community
s
hey all -- i have a situation where i need to backfill some assets. the logic aligns with this:
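import uuid

from dagster import asset


@asset
def ml_model_training_job():
    # str() so the downstream asset can concatenate the value
    job_name = str(uuid.uuid4())
    return job_name


@asset
def do_something_with_job_data(ml_model_training_job):
    output = "fun_things_with" + ml_model_training_job
    return output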
in our case, the upstream job has run for weeks, and we just added the downstream job. how can i run the downstream asset code against all previous versions of the upstream asset? (I am happy to do this completely offline with a script, i.e. not in dagit) something like
for name in ["foo", "bar", "baz"]:
    materialize(do_something_with_job_data(name))
digging deeper, if i try to materialize just the downstream asset, it gives me an error because the upstream asset doesn't exist. if i try to use a SourceAsset, it throws an error because the file doesn't exist.
if __name__ == "__main__":
    from dagster import asset, materialize, SourceAsset
    import uuid

    @asset
    def ml_model_training_job():
        job_name = str(uuid.uuid4())  # str so it can be concatenated downstream
        return job_name

    @asset
    def do_something_with_job_data(ml_model_training_job):
        output = "fun_things_with" + ml_model_training_job
        return output

    for name in ["foo", "bar", "baz"]:
        spoofed = SourceAsset("ml_model_training_job")
        materialize([spoofed, do_something_with_job_data])
s
Hi Stephen, I understand you have a simplified example so I’m not sure this applies to your actual use case, but in your script, could you not move your ml_model_training_job inside the for-loop and just return name from it? (I may be misunderstanding you-- I’m interpreting foo etc. as job names returned from ml_model_training_job.)
for name in ["foo", "bar", "baz"]:

    @asset
    def ml_model_training_job():
        return name

    materialize([ml_model_training_job, do_something_with_job_data])
In any case we definitely need a better API for “stubbing out” asset values like this.
s
yeah, the issue is that i don't want to actually materialize the model training job -- those have already happened in the past. so basically i need to run do_something_with_job_data while passing in the values from previous runs.
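for a fully offline script, one possible workaround is to pull the downstream logic into a plain function and call it once per historical job name, so nothing upstream gets re-materialized. this is only a sketch under assumptions: fun_things_with and backfill are hypothetical helper names (not Dagster APIs), the job names are the placeholders from above, and no Dagster run or materialization event is recorded by it.

```python
# Hypothetical offline backfill: plain Python, no Dagster run is launched.


def fun_things_with(job_name: str) -> str:
    """Downstream logic, assumed to be factored out of do_something_with_job_data."""
    return "fun_things_with" + job_name


def backfill(job_names):
    """Apply the downstream logic to each previously produced job name."""
    return {name: fun_things_with(name) for name in job_names}


# Placeholder names from the thread; the real ones would come from wherever
# the earlier training runs stored their output.
results = backfill(["foo", "bar", "baz"])
for name, output in results.items():
    print(name, "->", output)
```

the asset body could then delegate to the same helper, so the scheduled path and the backfill share one implementation. the trade-off is that outputs produced this way are not tracked as materializations unless you write them to storage yourself.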
s
I understand you chatted with Sandy this morning about this-- is your issue resolved now?