#ask-community

Axel Bock

02/26/2023, 10:16 AM
Hi all, after reading the docs and FAQ 😉 I still have questions. I can’t find any “recommended” way to do this:
• find a ZIP file in a folder (for now; later S3)
• unpack the ZIP file (it contains several CSV files, each of which represents one or more “invoices”)
• pre-process the CSV files (e.g. split them further into sub-CSVs if necessary)
• create an invoice for each “final” CSV
From what I’ve read, each “final” CSV file should be an asset, which is then processed. So my main trouble lies in the “transformation” from an “incomplete asset” (e.g. a ZIP file) to several more fine-grained assets (e.g. several CSV files), which are then, again, processed further. My initial thought was to have sensors that react to asset creation, but I was unable to connect the components:
```python
import os

from dagster import RunRequest, job, op, sensor


@op
def unpack_zip():
    # ... which zip?
    ...


@job
def handle_zip():
    # no idea where zip_file should come from
    dir = unpack_zip(zip_file)

    for f in os.listdir(dir):
        # now how do I create "smaller" assets (1 per CSV file)?
        # is that the right approach?
        ...


@sensor(job=handle_zip)
def zip_file():
    # should each found ZIP file not be an asset itself?
    # also, what _is_ actually _run_ here?
    yield RunRequest(run_key="1", run_config={"file": "./testdata/1.zip"})
```
So frankly, I’m kinda stuck and would appreciate any help here. Please don’t just link to the docs without further explanation; I have read a lot of them and I really don’t see it… 😞 (By the way, I would be happy to help out with the docs once I understand it.)
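Independent of which Dagster approach is used, the pre-processing step (splitting one CSV into per-invoice sub-CSVs) is plain file handling. Below is a minimal standard-library sketch, assuming each row carries a column that identifies its invoice. The column name `invoice_id` and the function `split_csv_by_invoice` are hypothetical and do not come from the thread.

```python
from __future__ import annotations

import csv
from collections import defaultdict
from pathlib import Path


def split_csv_by_invoice(csv_path: Path, out_dir: Path, key: str = "invoice_id") -> list[Path]:
    """Write one sub-CSV per distinct value of `key` and return their paths.

    Assumption (hypothetical): every row has a column named `key`
    ("invoice_id" by default) that says which invoice it belongs to.
    """
    # Group rows by invoice id, keeping the original column order.
    with csv_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        groups: dict[str, list[dict[str, str]]] = defaultdict(list)
        for row in reader:
            groups[row[key]].append(row)

    # Write each group to "<original stem>_<invoice id>.csv" in out_dir.
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for invoice_id, rows in sorted(groups.items()):
        out_path = out_dir / f"{csv_path.stem}_{invoice_id}.csv"
        with out_path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(out_path)
    return written
```

Each returned path is one “final” CSV, i.e. one unit of work for the invoice step; in the approaches from the reply it would become one dynamic output or one dynamic partition key.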

sandy

02/27/2023, 4:40 PM
Hi Axel, there are two ways to accomplish this:
• with dynamic graphs: https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs#dynamic-graphs
• with software-defined assets, using the newly introduced dynamic asset partitions: https://docs.dagster.io/concepts/partitions-schedules-sensors/partitions#dynamically-partitioned-assets
Eventually, we’d like to essentially merge these two so you can use them together, but we’re not there yet. Here’s where we’re tracking this: https://github.com/dagster-io/dagster/issues/9559