# ask-ai
f
What’s the best way to define an asset with 6-hourly partitions when that asset depends on an upstream asset with daily partitions?
s
To define an asset with 6-hourly partitions that depends on an upstream asset with daily partitions, use a `TimeWindowPartitionsDefinition` with a cron schedule that fires every six hours for the downstream asset, and a `DailyPartitionsDefinition` for the upstream asset. `HourlyPartitionsDefinition` creates one partition per hour, not one every six hours, so it doesn't fit this case. Here's a step-by-step approach based on the documentation:

1. Define a `TimeWindowPartitionsDefinition` for the downstream asset, specifying the start date and time, the partition key format (`fmt`), and `cron_schedule="0 */6 * * *"`.
2. Define a `DailyPartitionsDefinition` for the upstream asset, specifying the start date from which the daily partitions should begin.
3. Set up the partition dependencies. According to the documentation, when both the upstream and downstream assets are time-window-partitioned, each partition in the downstream asset depends on all partitions in the upstream asset that intersect its time window. Because every 6-hour window falls inside a single day, each of the four 6-hourly partitions in a day (starting at 00:00, 06:00, 12:00 and 18:00) depends on that day's partition of the upstream asset.

Here's an example of how you might define these assets in code:

```python
from datetime import datetime

from dagster import DailyPartitionsDefinition, TimeWindowPartitionsDefinition, asset

start_date = datetime(2022, 1, 1)

# Upstream: one partition per day (keys like "2022-01-01")
upstream_partitions = DailyPartitionsDefinition(start_date=start_date)

# Downstream: one partition every six hours (00:00, 06:00, 12:00, 18:00)
downstream_partitions = TimeWindowPartitionsDefinition(
    start=start_date,
    fmt="%Y-%m-%d-%H:%M",  # keys like "2022-01-01-06:00"
    cron_schedule="0 */6 * * *",
)

@asset(partitions_def=upstream_partitions)
def upstream_asset():
    # Logic for upstream asset
    ...

@asset(partitions_def=downstream_partitions)
def downstream_asset(upstream_asset):
    # Logic for downstream asset; each 6-hour partition loads the
    # daily upstream partition that contains its time window
    ...
```
In this example, the `downstream_asset` function takes `upstream_asset` as an input, which declares the dependency. How each downstream partition maps to the daily partitions of `upstream_asset` is determined by Dagster's default partition mapping for time-window-partitioned assets (`TimeWindowPartitionMapping`), as described in the documentation. If you need different dependency behavior, for example depending on the previous day's partition instead, you can supply a custom `PartitionMapping` for that input via `ins={"upstream_asset": AssetIn(partition_mapping=...)}` on the `@asset` decorator. For the scenario described here, though, the default behavior already maps each day's downstream partitions to the corresponding daily partition of the upstream asset.
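To make the default "intersecting time windows" rule concrete, here is a plain-Python sketch (standard library only, not Dagster's actual implementation) that takes a 6-hourly partition key and returns the daily partition keys whose windows overlap it. It assumes the downstream keys use `%Y-%m-%d-%H:%M`, the daily keys use `%Y-%m-%d`, and both definitions share a timezone; `upstream_daily_keys` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import List

DOWNSTREAM_FMT = "%Y-%m-%d-%H:%M"  # assumed key format of the 6-hourly partitions
UPSTREAM_FMT = "%Y-%m-%d"          # key format of daily partitions
DOWNSTREAM_WINDOW = timedelta(hours=6)
UPSTREAM_WINDOW = timedelta(days=1)


def upstream_daily_keys(downstream_key: str) -> List[str]:
    """Hypothetical helper: return the daily keys whose half-open windows
    [day, day + 1d) intersect the window [start, start + 6h)."""
    start = datetime.strptime(downstream_key, DOWNSTREAM_FMT)
    end = start + DOWNSTREAM_WINDOW
    # Midnight of the day that contains the start of the 6-hour window
    day = datetime(start.year, start.month, start.day)
    keys = []
    # Walk forward one day at a time while a daily window can still overlap
    while day < end:
        if day + UPSTREAM_WINDOW > start:
            keys.append(day.strftime(UPSTREAM_FMT))
        day += UPSTREAM_WINDOW
    return keys


if __name__ == "__main__":
    for key in ["2022-01-01-00:00", "2022-01-01-06:00", "2022-01-01-18:00"]:
        print(key, "->", upstream_daily_keys(key))
```

Because a 6-hour window never crosses midnight when both definitions are aligned to the same timezone, every downstream key maps to exactly one daily key; the 18:00 window ends exactly at midnight and, with half-open windows, does not touch the next day.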
f
@Scout My partition definition is:
```python
grib_wind_partitions = dagster.TimeWindowPartitionsDefinition(
    start=constants.STARBOARD_DATA_START_DATE,
    fmt="%Y%m%d%H%M%S",
    cron_schedule="0 */6 * * *",  # Six hourly partitions
    end_offset=int(15 * 24 / 6),  # 15 days
)
```
Now I’m getting the error `ValueError: time data '2024-03-13 12:00:00' does not match format '%Y%m%d%H%M%S'`. How can I solve it?
s
The error indicates a mismatch between the datetime format you've specified in your `TimeWindowPartitionsDefinition` and a datetime string that is being parsed with it. When `start` is given as a string, Dagster parses it using `fmt`, so the string in the error is most likely the value of `constants.STARBOARD_DATA_START_DATE`. The format `"%Y%m%d%H%M%S"` expects a datetime string without any separators, but `'2024-03-13 12:00:00'` contains dashes, a space, and colons. To resolve this, the `fmt` parameter and the start value must agree. One option is to change `fmt` to match the separator-style string. Here's how you can adjust the `fmt` parameter:
```python
grib_wind_partitions = dagster.TimeWindowPartitionsDefinition(
    start=constants.STARBOARD_DATA_START_DATE,
    fmt="%Y-%m-%d %H:%M:%S",  # Adjusted format to match the datetime string
    cron_schedule="0 */6 * * *",  # Six-hourly partitions
    end_offset=int(15 * 24 / 6),  # 60 six-hour partitions = 15 days
)
```
With this `fmt`, `constants.STARBOARD_DATA_START_DATE` must be either a string in the format `"%Y-%m-%d %H:%M:%S"` or a `datetime` object. The `start` parameter accepts both, and only a string is parsed with `fmt`, so a `datetime` does not need to be converted. Keep in mind that changing `fmt` also changes the partition keys themselves (for example, `2024-03-13 12:00:00` instead of `20240313120000`), so partitions already materialized under the old keys would no longer line up.

If you want to keep the partition keys in the `"%Y%m%d%H%M%S"` format, leave `fmt` as it is and fix the start value instead: pass it as a `datetime` object, or as a string without separators (for example, `"20240313120000"`). Apply the same format to any other datetime strings in your code that are compared with or converted to partition keys.
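A small standard-library sketch reproduces the error and shows the second option: keeping the compact `fmt` and converting the start value to it. It assumes `STARBOARD_DATA_START_DATE` is either a `datetime` or a string like `'2024-03-13 12:00:00'`; the helper `to_partition_start` is hypothetical and not part of Dagster.

```python
from datetime import datetime
from typing import Union

PARTITION_FMT = "%Y%m%d%H%M%S"    # compact key format from the question
SOURCE_FMT = "%Y-%m-%d %H:%M:%S"  # assumed format of the start constant


def to_partition_start(value: Union[datetime, str]) -> str:
    """Hypothetical helper: render a start value in PARTITION_FMT."""
    if isinstance(value, datetime):
        return value.strftime(PARTITION_FMT)
    # Parse the separator-style string, then re-render it without separators
    return datetime.strptime(value, SOURCE_FMT).strftime(PARTITION_FMT)


if __name__ == "__main__":
    raw = "2024-03-13 12:00:00"
    try:
        # Same parse that fails in the question: separators vs. compact format
        datetime.strptime(raw, PARTITION_FMT)
    except ValueError as exc:
        print("original error:", exc)
    print(to_partition_start(raw))  # 20240313120000
```

The returned string (or the `datetime` itself) can then be passed as `start=` while keeping `fmt="%Y%m%d%H%M%S"`, so existing partition keys stay unchanged.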