Drew You

03/28/2023, 2:06 PM
Is there a good way to handle self-referential assets? In Airflow, I would often do something like:
# ensure table exists in db
last_record = ...  # query db for the latest record of a timeseries
fill_data = ...  # query api for data between last_record and now
# optionally do a very long-running processing task
# append fill_data to database
In Dagster, I'm not sure how to model this. The data is usually time-partitioned, but it might be, e.g., minute-partitioned data that I want to schedule every 5 minutes. Because the Airflow query covers a range, the setup/teardown work is paid once per run, so longer ranges get that optimization for free.
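For reference, the Airflow-style pattern described above could look roughly like the sketch below. It is plain Python with `sqlite3`, not Airflow or Dagster code. The `series` table, the `fetch_range` stand-in for the API call, and the one-record-per-minute assumption are all hypothetical, and the optional long-running processing step is left out.

```python
import sqlite3
from datetime import datetime, timedelta


def fetch_range(start, end):
    """Hypothetical stand-in for the API call: one record per minute in [start, end)."""
    records = []
    t = start
    while t < end:
        records.append((t.isoformat(), 0.0))
        t += timedelta(minutes=1)
    return records


def incremental_fill(conn, now, default_start):
    """Append every record newer than the latest stored one; return how many were added."""
    # ensure table exists in db
    conn.execute("CREATE TABLE IF NOT EXISTS series (ts TEXT PRIMARY KEY, value REAL)")
    # query db for the latest record of the timeseries
    row = conn.execute("SELECT MAX(ts) FROM series").fetchone()
    if row[0] is None:
        start = default_start
    else:
        start = datetime.fromisoformat(row[0]) + timedelta(minutes=1)
    # query the stand-in API for data between the last record and now
    fill_data = fetch_range(start, now)
    # append fill_data to database
    conn.executemany("INSERT INTO series VALUES (?, ?)", fill_data)
    conn.commit()
    return len(fill_data)
```

Each call picks up where the previous one stopped, so a run after a long gap fetches one larger range instead of many small ones.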

Tobias Pankrath

03/28/2023, 2:16 PM
Would scheduling multiple partitions into one job/op be a solution?
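A rough sketch of that idea in plain Python (not the Dagster API): work out which minute partitions are pending and hand them to one run, so setup/teardown happens once per batch. The key format `MINUTE_FMT` and both function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta

MINUTE_FMT = "%Y-%m-%d-%H:%M"  # assumed partition key format, for illustration only


def pending_minute_keys(last_done, now):
    """Return keys for every complete minute after last_done and before now."""
    keys = []
    t = last_done.replace(second=0, microsecond=0) + timedelta(minutes=1)
    end = now.replace(second=0, microsecond=0)
    while t < end:
        keys.append(t.strftime(MINUTE_FMT))
        t += timedelta(minutes=1)
    return keys


def process_batch(keys):
    """Summarize one batch: first key, last key, and number of partitions."""
    if not keys:
        return None
    # In a real job, setup would run once here, followed by the
    # per-partition work for every key, followed by teardown.
    return (keys[0], keys[-1], len(keys))
```

With a 5-minute schedule, each run would then cover roughly five minute partitions rather than launching one run per minute.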

Drew You

03/28/2023, 2:21 PM
oh, interesting. I hadn't thought of that

sean

03/28/2023, 2:46 PM
You can have self-referential dependencies in time-partitioned assets:
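from dagster import AssetIn, DailyPartitionsDefinition, TimeWindowPartitionMapping, asset

@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2020-01-01"),
    ins={
        "a": AssetIn(
            # each daily partition of "a" depends on the previous day's partition of "a"
            partition_mapping=TimeWindowPartitionMapping(start_offset=-1, end_offset=-1)
        )
    },
)
def a(a):
    ...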