To follow up on my previous questions, I'm wondering about the best way to ingest incoming data. At irregular intervals, a new file appears on an FTP server, and each file covers a different span of data (sometimes one hour, sometimes two days, ...).
For now, I have a sensor that kicks off a job to:
• read the new file
• append the data to a database table
• log an AssetMaterialization event for each affected partition — the table is modeled as a daily partitioned asset, because that's how it's processed downstream, so the job has to figure out which partitions the new data touches.
From this point on the data can be used by the downstream assets.
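For what it's worth, here is a minimal sketch of the partition-detection step only (the sensor and database append are omitted): given the row timestamps parsed from a new file, it returns the set of daily partition keys they fall into. The function name and the YYYY-MM-DD key format are my own assumptions, not anything Dagster-specific.

```python
from datetime import datetime, timezone

def affected_partitions(timestamps):
    """Map a batch of row timestamps to the daily partition keys
    (YYYY-MM-DD) they fall into, so one materialization event can be
    logged per affected day rather than per row."""
    return sorted({ts.date().isoformat() for ts in timestamps})

# A file spanning a day boundary touches two partitions:
rows = [
    datetime(2024, 5, 1, 23, 30, tzinfo=timezone.utc),
    datetime(2024, 5, 2, 0, 15, tzinfo=timezone.utc),
    datetime(2024, 5, 2, 8, 0, tzinfo=timezone.utc),
]
print(affected_partitions(rows))  # ['2024-05-01', '2024-05-02']
```

The job would then loop over that list and log one event per key, e.g. via `context.log_event(AssetMaterialization(...))` with the partition set accordingly.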
Is there a better way to do this?