# dagster-feedback
j
It looks like the newer version of dagster (dagster==1.3.13) will no longer show that an asset is stale. It instead relies on code and data versions. I actually like the old way of showing stale assets. I find it burdensome to have to maintain code versions for every single asset, especially when multiple collaborators are working on a single project, so I don't use that feature. For data versions, it seems like this should be accomplished "under-the-hood" by hashing the data objects without requiring the code version. I understand that there are times when the data is only changing because the code changed, but I think this is a good thing to indicate in case the code change is inadvertently changing the data.
s
Hi Jeff, thanks for the feedback. This change happened several months ago. If you’re not using code versions, then I believe the only real difference is that you should now see a “New data” tag instead of the previous stale status; the calculation of the state is the same.
For data versions, it seems like this should be accomplished “under-the-hood” by hashing the data objects without requiring the code version.
You have the option of doing this by attaching a `DataVersion` to the `Output` returned when materializing an asset, but this isn’t something Dagster can do on its own, since Dagster does not concern itself with how physical assets are represented. See this guide: https://docs.dagster.io/guides/dagster/asset-versioning-and-caching
j
Thanks @sean! I don't see the 'New data' status on the downstream asset. It just shows as Materialized.
s
from dagster import asset

@asset
def foo():
    return 1

@asset
def bar(foo):
    return foo + 1
• `dagit -f` this file
• materialize foo
• materialize bar
• materialize foo again
Then you should see this (sorry, it’s “Upstream data”, not “New data”):
j
@sean That works for me, interesting. Now I am trying to figure out why my example is showing that.
@sean It looks like it's because one of my assets had a `code_version`. If I add code versions to your example assets and materialize `foo`, I don't see `Upstream data`.
s
Yes, that is because when there is a `code_version` on foo, the data version of foo does not change when you materialize it again (because the code version is the same and it has no inputs). Therefore there is no new “Upstream data” for bar no matter how many times you materialize foo. When there is no code version, we assume the code could have changed each time you materialize it, so the data version changes.
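The rule described above can be illustrated with a small toy model. This is only a sketch of the stated behavior, not Dagster's actual implementation: the function name `toy_data_version` and the hashing scheme are assumptions made up for illustration.

```python
import hashlib
import uuid
from typing import Optional, Sequence


def toy_data_version(
    code_version: Optional[str], input_versions: Sequence[str] = ()
) -> str:
    """Toy model (hypothetical, not Dagster's code) of a derived data version."""
    # With no code version, assume the code may have changed on every run,
    # so use a fresh random token instead of a stable one.
    code_token = code_version if code_version is not None else uuid.uuid4().hex
    payload = "|".join([code_token, *input_versions])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same code version and no inputs: the version is stable across runs,
# so a downstream asset sees no new upstream data.
stable_first = toy_data_version("v1")
stable_second = toy_data_version("v1")

# No code version: every run produces a different version.
fresh_first = toy_data_version(None)
fresh_second = toy_data_version(None)
```

In this model, `stable_first` equals `stable_second`, while `fresh_first` and `fresh_second` differ, which mirrors why “Upstream data” only appears on bar when foo has no code version.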
j
What if
foo
reads in data from S3 and that data is new? In that case, the code is the same but the data is updated.
s
Yes, that is possible-- when we have limited information we have to make assumptions. To model this case correctly you should determine a data version using user code inside the compute function and attach it to the output:
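from dagster import DataVersion, Output, asset

@asset
def foo():
    # get_data_from_s3 and hash_data are placeholders for your own helpers
    data = get_data_from_s3()
    data_hash = hash_data(data)  # DataVersion expects a string
    return Output(data, data_version=DataVersion(data_hash))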
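`hash_data` in the snippet above is not defined in this thread. One possible way to write it, assuming the data is either raw bytes or something JSON-serializable, is a content hash from the standard library. This is a sketch under those assumptions, not a Dagster API:

```python
import hashlib
import json
from typing import Any


def hash_data(data: Any) -> str:
    """Return a stable SHA-256 hex digest for bytes or JSON-serializable data."""
    if isinstance(data, bytes):
        raw = data
    else:
        # sort_keys makes the digest independent of dict key order;
        # default=str is a crude fallback for values json cannot encode.
        raw = json.dumps(data, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

Identical content then yields an identical data version, so downstream assets are flagged only when the data read from S3 actually changes. For large objects such as DataFrames, a format-specific hash would likely be a better fit than JSON serialization.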
j
Ok I am following now. Thank you!