Posted this in <#C02LJ7G0LAZ|dagster-cloud> , hopi...
# ask-community
a
Posted this in #dagster-cloud , hoping to get assistance. Thanks! https://dagster.slack.com/archives/C02LJ7G0LAZ/p1684371830418209
k
@Ashwin Kamath I have also been seeing this error with dynamic partitions lately. In my case though, there is nothing tricky about them - they are supposed to be the same dynamic partitions for both assets, and I still get this error on the downstream asset.
a
Thanks for confirming, the strange thing about this is that the stack trace seems to be using the default partition mapping class instead of my custom one. Unclear to me how this code is called in the first place as its all within the cloud agent
c
Hey Ashwin, would you mind sharing your asset defs and the partition mapping def? Does this error happen locally too or just on cloud?
@Kate Ross It's surprising that you're seeing this error. By default, dagster assumes that if both the upstream and downstream asset are partitioned, but not time partitioned, that the partition mapping is the
IdentityPartitionMapping
. So whatever keys are select for the upstream asset are also selected for the downstream asset, and vice versa. Do you happen to have asset defs that I'd be able to repro this with?
k
Hi @claire, yes, it seems to be a bug of some sort (unless I am just missing something - entirely possible!). I can materialize the downstream assets one-by-one, but when I try to materialize a range of them I get an error about missing upstream assets. The partitions are being created by a schedule. I’m not sure how to send the partitions def without sending a lot of code…
c
Is this happening on cloud or locally?
k
Locally
a
My issue occurs on cloud and local, sure can try to create a simple repro
ty spinny 1
Is there a way to override the IdentityPartitionMapping? My upstream and downstream keys are not identical, which may be part of the issue
"2020-01-01-UTC" on the dynamic partition def, "2020-01-01" on the MonthlyPartitionDef
c
Yeah, you can override it by doing something like this:
Copy code
@asset(
    partitions_def=partitions_def,
    ins={"asset1": AssetIn(partition_mapping=<Your partition mapping def here>},
)
def downstream_asset1(asset1):
    return 2
a
I have that set
It works fine normally, only fails in backfill
And if you look through the stack trace, it is not using my custom partition mapping, it never makes it there
c
Got it. Once you send me the repro I will take a look
a
This is the partition mapping, I will try to get you a repro in the next day thanks
@claire got a simple repro here. Steps are the following:
Copy code
################## REPRODUCTION STEPS ##################
#
#  1. Run `dagster dev -f partition_mapping.py`
#  2. Open dashboard in browser.
#  3. Wait for sensor to kick in and materialize all partitions of upstream_asset.
#  4. Trigger a backfill of downstream_asset via "Materialize selected.." on the Asset graph.
#  5. Go to backfill page at http://.../overview/backfills.
#  6. Click failure button to see "Nonexistent partition keys:" error.
#  
#  NOTE: Triggering the backfill from the `downstream_job` page results in a successful backfill.
#        It only fails from the asset page itself.
#
########################################################
Thanks for the help!
Bottom backfill is triggered from asset page and hits the partition mapping bug, top one is triggered from job page and performs as expected
c
Hey Ashwin. Appreciate the detailed repro here, I was able to repro this. Unfortunately writing your own custom partition mapping is unsupported at this time. Subsequently for asset backfills, we ignore partition mappings that aren't built-in as part of dagster, which is why your partition mapping is being ignored here. This kind of issue is a known pain point in our existing system--supporting custom partition mappings requires a function call over gRPC to fetch partition dependencies that would make things like auto-materialization potentially prohibitively slow, which is why we don't support this behavior. But it is annoying that users can't define their own partition mappings like in your case. I would recommend filing an issue for this in the meantime. We have a similar issue here: https://github.com/dagster-io/dagster/issues/13139 One possible alternative would be to implement your own custom subclass of the
TimeWindowPartitionMapping
TimeWindowPartitionsDefinition
, and use that as your upstream asset partitions def instead of a
DynamicPartitionsDefinition
.
a
Thanks for the response, that is unfortunate. I am in need of a "Business Days Partition" and Dynamic seemed to be the easiest approach. If I were to implement a custom subclass of TimeWindowPartitionMapping, would that get recognized during asset backfills? Seems to me it may run into the same issue, or am I not understanding things correctly?
Hmm.. just experimented a bit more, if I remove the "-UTC" from the dynamic partition keys and ensure that there is an upstream key that exactly matches the downstream key, it works then. I think I understand now why
TimeWIndowPartitionMapping
makes sense. The behavior I'm seeing right now is due to the default
IdentityPartitionMapping
chosen here - https://github.com/dagster-io/dagster/blob/98bc51c47d4c6e4e8a6e2f40e232ebf066e8c7e[…]_modules/dagster/dagster/_core/definitions/partition_mapping.py
Do you have recommendations for creating a custom TimeWindowPartitionMapping for business days? This needs to account for certain holidays so a cron schedule is unfortunately not sufficient. Essentially I have a list of dates that would serve as as the partition ranges, and would need to turn that into a TimeWindowPartitionMapping
@claire just to close the loop here - do you have any advice on the above ^
c
Hey Ashwin, yeah, a custom subclass of TimeWindowPartitionMapping should not be recognized during the asset backfills. I had a typo in my message--instead of subclassing
TimeWindowPartitionMapping
, i meant
TimeWindowPartitionsDefinition
. I don't have personal experience doing something like this, but one possible implementation strategy would have an "exclude_days" arg in the constructor.
Though, thinking about it more, I think it seems reasonable to have a built-in partition mapping that maps dynamic partitions to time partitions like yours does. If this mapping is provided in the dagster library, asset backfills will work for you. I imagine that this is also a functionality that others would want. I'd recommend filing an issue to include this partition mapping in Dagster
a
@claire - Hmm following up here, either way works for me.
exclude_days
or dynamic partition mapping. I do need something reasonably quickly so will try to dig deeper into the dagster internals
Thanks for the advice