To follow up on my previous questions, I'm wondering about the best way to ingest incoming data. Basically, at irregular intervals, a new file will pop up on an FTP server. The files do not all cover the same time span (sometimes one hour, sometimes two days, and so on).
For now, I have a sensor that kicks off a job to:
• read the new file
• append the data to a database table
• the table is represented as a daily-partitioned asset, because that's how it's processed downstream. The job therefore has to figure out which partitions are affected by the new data and log an AssetMaterialization event for each of those partitions.
From this point on the data can be used by the downstream assets.
Is there a better way to do this?
chris
01/18/2023, 8:00 PM
Your approach sounds reasonable. It’s tricky because you need some mechanism to figure out which partitions are actually being constructed, hence the job. Is there a piece that is causing particular pain?
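For illustration, the "figure out which partitions are affected" step could look roughly like the sketch below. It is only a sketch under assumptions: the function name `affected_daily_partitions` is hypothetical, the timestamps are assumed to already be in the time zone of the partition definition, and the keys are assumed to use the `YYYY-MM-DD` format.

```python
from datetime import datetime, timedelta


def affected_daily_partitions(timestamps):
    """Return the sorted, de-duplicated daily partition keys (YYYY-MM-DD)
    covered by the given row timestamps.

    Hypothetical helper: assumes the timestamps are already expressed in
    the same time zone as the daily partition definition.
    """
    return sorted({ts.date().isoformat() for ts in timestamps})


# Example: a file with hourly rows from 2023-01-17 22:00 to 2023-01-19 01:00.
start = datetime(2023, 1, 17, 22)
rows = [start + timedelta(hours=i) for i in range(28)]

keys = affected_daily_partitions(rows)
print(keys)  # ['2023-01-17', '2023-01-18', '2023-01-19']
```

Inside the job, each returned key could then be used as the `partition` value of one AssetMaterialization event, so a file spanning two days marks every touched day as materialized.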
Clément Masson
01/19/2023, 7:14 AM
No problem in particular, I'm just new to this and still trying to wrap my head around all the definitions and best practices. Thanks for the help!