conceptual question: in Redshift, you need to create a table before you can COPY a Parquet file from S3 into it. how does this translate into an asset?
is the asset typically created for the inserts AFTER the table was created?
or is the asset supposed to check if the table exists first, if so copy, if not create + copy?
or are you supposed to make an op and an asset? what's the best practice here?
03/14/2023, 8:27 PM
Great question! What's best will depend on your circumstances, but my rule of thumb is the second option you mention: have the asset handle the table check itself. The asset's output is supposed to represent the end state of the asset, and the body is whatever it takes to get there. So yeah, check if the table exists; if it does, copy into it, otherwise create it and then copy.
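Here's a minimal sketch of that create-if-missing-then-copy body, just to make it concrete. Everything named here is an assumption for illustration: the `load_parquet_into_table` helper, the table name, the column DDL, the bucket path, and the IAM role are all hypothetical, and `cursor` is any DB-API style cursor connected to Redshift. In Dagster you would call something like this from inside a function decorated with `@asset`.

```python
from typing import List, Protocol


class Cursor(Protocol):
    """Anything with a DB-API style execute(), e.g. a Redshift cursor."""

    def execute(self, sql: str) -> object: ...


def build_statements(table: str, columns_ddl: str, s3_uri: str, iam_role: str) -> List[str]:
    # CREATE TABLE IF NOT EXISTS makes the "does the table exist?" check
    # part of the statement itself, so no separate lookup is needed.
    create = f"CREATE TABLE IF NOT EXISTS {table} ({columns_ddl});"
    # COPY loads the Parquet file(s) under the S3 prefix into the table.
    copy = f"COPY {table} FROM '{s3_uri}' IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"
    return [create, copy]


def load_parquet_into_table(
    cursor: Cursor, table: str, columns_ddl: str, s3_uri: str, iam_role: str
) -> List[str]:
    # Hypothetical helper: run create-then-copy in order and return what was run.
    statements = build_statements(table, columns_ddl, s3_uri, iam_role)
    for statement in statements:
        cursor.execute(statement)
    return statements
```

One thing to watch: COPY appends rows, so re-materializing the asset against the same files would load them again. If that matters for you, you'd probably want to truncate the table or load through a staging table first.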
That being said, have you written the logic that ties the S3 bucket to the table, i.e. what triggers the load? Depending on how the job is triggered, you might want to add some observability to your buckets with a source asset.