# ask-community
a
I’m new to Dagster and building up my first pipeline. The data I’m dealing with is loaded into Postgres regularly and is already grouped into chunks by, let’s say, chunk_id, into a very large table (with an index on chunk_id). I think it makes sense to partition the processing by chunk_id using dynamic partitioning. Is the best practice here, as a first step, to create a “chunk_ids” table that simply holds all the chunk IDs, and then query that from my chunk_ids_sensor? That way I don’t have to scan even the index of the massive table as often. And should that be a separate job, with everything else in a job that uses the partitions? It seems complicated to set up, with the chunk_ids asset dependent on my sensor asset. Or does having a sensor downstream of an asset defeat the point of using a sensor at all? Is there a simpler way to structure this? At the end of the day I just want to be able to efficiently partition by chunk_id.
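Roughly, the setup I’m imagining looks like the sketch below. `fetch_all_chunk_ids` is a hypothetical helper that reads the small chunk_ids lookup table (not the big table), and all the names are illustrative, not something I’ve settled on:
```python
from dagster import (
    AssetExecutionContext,
    Definitions,
    DynamicPartitionsDefinition,
    RunRequest,
    SensorEvaluationContext,
    SensorResult,
    asset,
    define_asset_job,
    sensor,
)

# Dynamic partitions keyed on chunk_id.
chunk_partitions = DynamicPartitionsDefinition(name="chunk_id")


def fetch_all_chunk_ids() -> list[str]:
    # Hypothetical helper: SELECT chunk_id FROM chunk_ids (the small lookup
    # table), e.g. via psycopg or SQLAlchemy. Stubbed out here.
    return []


@asset(partitions_def=chunk_partitions)
def processed_chunk(context: AssetExecutionContext) -> None:
    # One run per chunk; the partition key is the chunk_id to process.
    chunk_id = context.partition_key
    # ... query the big table WHERE chunk_id = %s and do the processing ...


process_chunk_job = define_asset_job(
    "process_chunk_job", selection=[processed_chunk]
)


@sensor(job=process_chunk_job)
def chunk_ids_sensor(context: SensorEvaluationContext) -> SensorResult:
    current = set(fetch_all_chunk_ids())
    known = set(context.instance.get_dynamic_partitions("chunk_id"))
    new_ids = sorted(current - known)
    return SensorResult(
        # Register the new partition keys and kick off a run for each one.
        dynamic_partitions_requests=[chunk_partitions.build_add_request(new_ids)],
        run_requests=[RunRequest(partition_key=cid) for cid in new_ids],
    )


defs = Definitions(
    assets=[processed_chunk],
    jobs=[process_chunk_job],
    sensors=[chunk_ids_sensor],
)
```
As I understand it, the partition-add requests in a SensorResult are applied before the run requests are launched, so requesting runs for brand-new partition keys in the same tick should be fine.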
For now, I’ve just done it by date, and not used a sensor, though this isn’t quite as good for the business logic
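The date-based version I have for now looks roughly like this (dates and names are illustrative):
```python
from dagster import (
    AssetExecutionContext,
    DailyPartitionsDefinition,
    asset,
    build_schedule_from_partitioned_job,
    define_asset_job,
)

daily_partitions = DailyPartitionsDefinition(start_date="2024-01-01")


@asset(partitions_def=daily_partitions)
def processed_day(context: AssetExecutionContext) -> None:
    # Process everything loaded on this date instead of one chunk_id.
    day = context.partition_key  # e.g. "2024-06-01"
    # ... query the big table for rows loaded on `day` ...


process_day_job = define_asset_job(
    "process_day_job",
    selection=[processed_day],
    partitions_def=daily_partitions,
)

# Materialize the latest date partition on a daily cadence, no sensor needed.
process_day_schedule = build_schedule_from_partitioned_job(process_day_job)
```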
s
What will your chunk_ids sensor do? Is it sensing new chunks getting added? Or sensing changes to existing chunks?
a
Definitely the first one, hopefully the second one