#ask-community

Andrew Smith

08/16/2022, 8:10 PM
We are trying to migrate a series of tables from one database to another using assets. The table names are loaded as a list from a YAML file. We have a SQL connector resource that generates connections to the databases. What would be the best way to use assets to pull the different tables from SQL, store them to disk, and then use another asset to create them on the sink database? I was thinking of using `multi_asset` for pulling them and storing them on disk. However, I'm not sure how I would feed those into another asset without creating a unique asset for each table (not an option, as tables need to be defined in a YAML config file).

sandy

08/16/2022, 10:01 PM
could you read the yaml file when constructing the multi_asset?

Andrew Smith

08/16/2022, 10:46 PM
Right now we load the YAML in the repo and pass it to the job as config. I think the part that confuses me is how best to have a single asset generate a series of files on disk, and then have the downstream asset expect those n files. That is, without defining each table as its own asset. Outside of dagster I would just loop through a list of table names, writing the tables to a temp directory, and then have another function load everything in said directory. However, that doesn't feel dagsterish, and doesn't fit well with the asset-driven design.

sandy

08/16/2022, 10:48 PM
got it - nothing stops you from treating that entire directory / collection of files as a single asset. if you want to be able to materialize and track materializations of individual files within the directory, you could model them as partitions of the asset

Andrew Smith

08/17/2022, 4:20 PM
If I were to use partitions, I would use `dagster.StaticPartitionsDefinition`, correct? There aren't really any examples or documentation on using this with assets. Would I more or less use it in the same fashion as shown with ops?
from dagster import job, op, static_partitioned_config

CONTINENTS = [
    "Africa",
    "Antarctica",
    "Asia",
    "Europe",
    "North America",
    "Oceania",
    "South America",
]


@static_partitioned_config(partition_keys=CONTINENTS)
def continent_config(partition_key: str):
    return {"ops": {"continent_op": {"config": {"continent_name": partition_key}}}}


@op(config_schema={"continent_name": str})
def continent_op(context):
    context.log.info(context.op_config["continent_name"])


@job(config=continent_config)
def continent_job():
    continent_op()

sandy

08/17/2022, 4:21 PM
it would look like this code snippet: https://docs.dagster.io/concepts/partitions-schedules-sensors/partitions#partitioned-assets but you'd replace `DailyPartitionsDefinition(start_date=...)` with `StaticPartitionsDefinition(CONTINENTS)`

Andrew Smith

08/17/2022, 4:22 PM
Thanks, I'll give this a shot