Hi all, could you please help me design a pipeline?
I have data arriving as files through an FTP system every 30 minutes (I also need to handle occasional delays).
The data for each window arrive split across multiple files.
I need to process those files and write the data to a database, and then run some further processing to compute new data once it is in the database.
My first thought was to simply build one asset per file, each partitioned on a 30-minute time window.
A more refined idea was to make a single asset representing the FTP data, multi-partitioned by a 30-minute time window and statically by file identifier (there is a fixed number of files per time window).
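Roughly what I have in mind is the sketch below; the dimension names, file identifiers, and start date are placeholders I made up:

```python
from dagster import (
    AssetExecutionContext,
    MultiPartitionsDefinition,
    StaticPartitionsDefinition,
    TimeWindowPartitionsDefinition,
    asset,
)

# 30-minute time dimension, defined via a cron schedule.
thirty_min_windows = TimeWindowPartitionsDefinition(
    cron_schedule="*/30 * * * *",
    start="2024-01-01-00:00",
    fmt="%Y-%m-%d-%H:%M",
)

# Static dimension: the fixed set of file identifiers per window (placeholders).
file_ids = StaticPartitionsDefinition(["file_a", "file_b", "file_c"])

ftp_partitions = MultiPartitionsDefinition(
    {"time": thirty_min_windows, "file": file_ids}
)


@asset(partitions_def=ftp_partitions)
def ftp_data(context: AssetExecutionContext) -> None:
    # One partition = one file for one 30-minute window.
    keys = context.partition_key.keys_by_dimension
    window_start, file_id = keys["time"], keys["file"]
    # Download the file for (window_start, file_id) from FTP and write its
    # rows to the database.
    ...
```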
Then I could build downstream assets to represent the computations, probably using some custom partition mappings to route each file's partitions to the appropriate downstream asset.
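Continuing the sketch, the mapping is the part I'm least sure about. Something like MultiToSingleDimensionPartitionMapping would fan in over the file dimension for each window, but I don't know what the right mapping is to route a single file's partitions to its own downstream asset:

```python
from dagster import (
    AssetDep,
    AssetExecutionContext,
    MultiToSingleDimensionPartitionMapping,
    asset,
)


@asset(
    partitions_def=thirty_min_windows,
    deps=[
        AssetDep(
            ftp_data,
            # Each 30-minute downstream partition depends on the upstream
            # (time, file) partitions for the same window, across all files.
            partition_mapping=MultiToSingleDimensionPartitionMapping(
                partition_dimension_name="time"
            ),
        )
    ],
)
def computed_metrics(context: AssetExecutionContext) -> None:
    # Read this window's raw rows from the database, compute the derived
    # data, and write it back.
    ...
```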
However, using 30-minute assets would make my day-by-day monitoring a little heavy, since I would need to check 48 partitions per day instead of only one.
Daily assets are my main monitoring concern.
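To make that concrete, the daily view I would actually like to check looks roughly like this (again just a sketch, building on the hypothetical computed_metrics asset above):

```python
from dagster import (
    AssetDep,
    AssetExecutionContext,
    DailyPartitionsDefinition,
    TimeWindowPartitionMapping,
    asset,
)

daily_windows = DailyPartitionsDefinition(start_date="2024-01-01")


@asset(
    partitions_def=daily_windows,
    deps=[
        AssetDep(
            computed_metrics,
            # The default time-window mapping makes one daily partition depend
            # on the 48 half-hour partitions that fall inside that day.
            partition_mapping=TimeWindowPartitionMapping(),
        )
    ],
)
def daily_summary(context: AssetExecutionContext) -> None:
    # This daily granularity is what I would prefer to monitor.
    ...
```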
I would also like to discuss how the 30-minute asset approach would scale down to 5-minute windows.
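As far as I can tell, the only change in the sketch would be the cron schedule, but that means 288 partitions per day instead of 48, which is what I'd like opinions on:

```python
# Same placeholders as above, just a tighter schedule.
five_min_windows = TimeWindowPartitionsDefinition(
    cron_schedule="*/5 * * * *",
    start="2024-01-01-00:00",
    fmt="%Y-%m-%d-%H:%M",
)
```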
Also, I'm not sure this is the best place for this discussion.
Would it be better to post it on Dagster GitHub Discussions?