Hi I have a question regarding partitions I ve hit a bit of dagster #ask-community

Jonny Wray

03/12/2023, 5:28 PM

Hi, I have a question regarding partitions - I've hit a bit of a wall, either with my understanding or with how to achieve what I'm trying to do.
• I have a partitioned asset with two dimensions - the date and a static array of parameters (time window length, but I'm not sure it matters for this). I have an IO manager, used for the input on an asset, that reads values from a DB using the date and window length as parameters. The asset then calculates some statistics from the values in those data frames.
• My current mental model is that an asset is generated for every pair of (date, window length).
• The upstream data the IO manager reads from isn't partitioned and is a dbt-created asset.
• I have a job that runs at 5am on a cron schedule and materializes the dbt asset. Now I'd like it to also materialize the downstream partitioned assets for all time windows and the current date.
I guess my question is really: am I viewing this correctly in terms of the partitioned model, and how do I achieve my goal of the cron job materializing the dbt asset daily, which then materializes the downstream partitioned dependencies, but just for the current day? Thanks in advance.
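For readers following along, the "(date, window length)" mental model above can be sketched in plain Python as a Cartesian product of the two dimensions. This is only an illustration and makes no Dagster calls; the window values, the date range, and the `daily_keys` helper are hypothetical.

```python
from datetime import date, timedelta
from itertools import product


def daily_keys(start: date, end: date) -> list[str]:
    """Return ISO date keys for every day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


# Hypothetical static dimension: window lengths in days.
windows = ["7", "30"]

# Hypothetical date range covering three days.
dates = daily_keys(date(2023, 3, 10), date(2023, 3, 12))

# One partition per (date, window) pair.
partitions = [{"date": d, "window": w} for d, w in product(dates, windows)]
```

Under these assumptions, three dates and two windows give six partitions, so materializing "all time windows for the current date" means targeting the pairs that share one date value.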

claire

03/14/2023, 4:14 PM

Hi Jonny. Is there a reason why the upstream dbt asset isn't partitioned by day, given that you want to rematerialize it every day? In terms of automatically materializing the downstream partitions but just for the current day after the upstream asset is rematerialized, I'd recommend creating your own sensor that tracks when the upstream asset is materialized. Then, the sensor can calculate the last day in your daily partitions def, and kick off a run request for each multipartition with that given day.
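The key computation such a sensor would perform can be sketched in plain Python, as below. It is a rough illustration rather than Dagster code: the function names and the window list are hypothetical. One detail worth keeping in mind is that a Dagster daily partitions definition by default only includes days that have already ended, so at 5am the last partition is yesterday unless the definition is extended (for example with `end_offset=1`). The `end_offset` parameter here mimics that behaviour under that assumption.

```python
from datetime import datetime, timedelta


def last_daily_partition(now: datetime, end_offset: int = 0) -> str:
    """Return the newest daily partition key as of `now`.

    end_offset=0: the newest partition is the last completed day (yesterday).
    end_offset=1: the current, still running day is included as well.
    """
    last_day = now.date() - timedelta(days=1) + timedelta(days=end_offset)
    return last_day.isoformat()


def latest_day_keys(now: datetime, windows: list[str], end_offset: int = 0) -> list[dict]:
    """Build one key per window length, all sharing the newest day."""
    day = last_daily_partition(now, end_offset)
    return [{"date": day, "window": w} for w in windows]


# Hypothetical usage: the 5am run on 2023-03-12 with three window lengths.
keys = latest_day_keys(datetime(2023, 3, 12, 5, 0), ["7", "30", "90"], end_offset=1)
```

Inside an actual sensor, each dictionary could presumably be wrapped in a `MultiPartitionKey` and passed as the `partition_key` of a `RunRequest`, giving one run per window length for that day.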

Jonny Wray

03/14/2023, 4:26 PM

Hi Claire - thanks for the reply. I've simplified my situation a bit, moving to a one-dimensional partition, but I'm still a bit blocked. My upstream dbt asset uses incremental materialization within dbt - the time-based partition key is not used within the dbt job. Would you still recommend adding a daily partition to the dbt asset even if it does not use the values as parameters? I'm sure I had a reason for not adding a partition to the upstream dbt asset (over and above it not needing the key), but I don't remember anymore what it was. I've also just hit another barrier - in my current setup I get an

The input does not correspond to a partitioned asset

when the downstream asset tries to materialize, which I assume adding a partition to the upstream asset would fix. Thanks for the sensor suggestion. I may still go that route if I go back to the multiple partitions, but I'd like to get this simpler situation working first - and get my understanding of how partitions work a bit clearer.

claire

03/14/2023, 4:58 PM

Ah I see. Yep, currently we throw that error when a partitioned asset depends on an unpartitioned asset. Would you mind filing an issue for this? I think ideally all downstream partitions depend on the upstream asset. You could fix this by adding 1 partition to your upstream asset in the meantime.

Jonny Wray

03/14/2023, 4:59 PM

Indeed it does - just been experimenting with that. I'll file an issue for sure.

Jonny Wray

03/14/2023, 5:01 PM

In terms of the logical model in this situation - would you say it still makes sense (issue aside) to add the partition to the upstream asset? It sort of is partitioned by day, but the Dagster partition capabilities aren't actually used when materializing the dbt asset. I can sort of see it both ways.

claire

03/14/2023, 5:48 PM

I think it's reasonable to have it partitioned by day, given that you want the asset to update with new contents each day

Jonny Wray

03/14/2023, 5:59 PM

Thanks. I’ll run with that then.
