kyle

10/28/2022, 6:12 AM
Hello, I'm new to Dagster and would appreciate it if someone could point me in the right direction. I want to create a new asset (text file 2) from a source asset (text file 1). Text file 1 is stored in AWS S3, and I also want to store text file 2 in AWS S3. The files are named by a unique ID (e.g. id123456.txt), and new files show up in S3 daily. Each ID also appears in a database before its file shows up in S3. I would like to create the new asset (text file 2) from all existing source assets (text file 1) and from any new source assets that show up each day. Could anyone describe how I should be thinking about this? Should I start by using the IDs in the database table to define a partitioned asset?

sandy

11/14/2022, 5:08 PM
Hey Kyle - I noticed that we missed this question. Are you still trying to figure this out? The short answer is that there isn't an easy way in Dagster right now to implement the pattern you're talking about, because we don't yet have great support for "dynamic" partitioned assets. We're aiming to remedy this soon: https://github.com/dagster-io/dagster/issues/7943

kyle

11/16/2022, 9:57 PM
I see. Yes, a dynamic partitioned asset is what I would need. Most of my data is not time-partitioned: I work with bioinformatics data, so it makes sense to watch a bucket for new files. In some of my use cases I get sporadic NGS sequencing data dumps to an AWS S3 bucket. What matters is not the timestamp but the ID of the files being dumped.
Thanks for the response.
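
For readers following along, the "ID rather than timestamp" idea can be sketched in plain Python, independent of any Dagster API. This is only an illustration under stated assumptions: the `raw/` and `derived/` prefixes, the helper names `file_id` and `pending_ids`, and the hard-coded key lists are all hypothetical, and in practice the two lists would come from listing the bucket.

```python
from typing import Iterable, List, Set


def file_id(key: str, suffix: str = ".txt") -> str:
    """Drop any prefix and the suffix from an object key, e.g. 'raw/id123456.txt' -> 'id123456'."""
    name = key.rsplit("/", 1)[-1]
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


def pending_ids(source_keys: Iterable[str], derived_keys: Iterable[str]) -> List[str]:
    """Return IDs that have a source file but no derived file yet, sorted for stable output."""
    have_source: Set[str] = {file_id(k) for k in source_keys}
    have_derived: Set[str] = {file_id(k) for k in derived_keys}
    return sorted(have_source - have_derived)


# Hypothetical listings standing in for the contents of two S3 prefixes.
source = ["raw/id123456.txt", "raw/id123457.txt", "raw/id123458.txt"]
derived = ["derived/id123456.txt"]

todo = pending_ids(source, derived)
print(todo)  # ['id123457', 'id123458']
```

Each ID in `todo` would correspond to one unit of work (one "text file 1" to "text file 2" conversion). Running the same diff on a schedule covers both the backfill of existing files and the files that arrive later.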

sandy

01/05/2023, 11:45 PM
Circling back here, we're now prioritizing first-class support for use cases like the ones you brought up and gathering detailed requirements. Would either of you be open to a short call to talk about some of the details of what you're trying to do?