I am brand new to dagster and thinking of how I can use dagster (dagster #announcements)

Mikael Ene

03/02/2020, 4:51 PM

I am brand new to dagster and thinking about how I can use dagster in my current setting. Basically, we are loading a lot of tables from one relational database into another. What I want to do is: define a solid that accepts a database connection and a table_name, input a list of tables to load, and have dagster loop over that list, passing in each table_name and loading the tables in parallel. I have read about Reusable solids in the docs, but have not fully understood how configuration can be used. Does anyone have a similar setup?
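For reference, the behaviour being asked for (one load step, applied to a list of tables, run in parallel) can be sketched without dagster at all. This is a minimal plain-Python sketch, assuming SQLite files stand in for the two relational databases; load_table and load_tables are hypothetical names, and ThreadPoolExecutor plays the role of the parallel loop. It is not a dagster feature or API.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from typing import Dict, List


def load_table(source_path: str, target_path: str, table_name: str) -> int:
    """Copy one table from the source DB into the target DB; return the row count."""
    # Table names cannot be bound as SQL parameters, so only allow plain identifiers.
    if not table_name.isidentifier():
        raise ValueError(f"unexpected table name: {table_name!r}")
    # Each call opens its own connections, because by default a sqlite3
    # connection may only be used by the thread that created it.
    with closing(sqlite3.connect(source_path)) as src, closing(
        sqlite3.connect(target_path, timeout=30)
    ) as dst:
        row = src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        ).fetchone()
        if row is None:
            raise ValueError(f"table {table_name!r} not found in source")
        cursor = src.execute(f'SELECT * FROM "{table_name}"')
        placeholders = ", ".join("?" for _ in cursor.description)
        rows = cursor.fetchall()
        with dst:  # commits the inserted rows when the block exits
            dst.execute(f'DROP TABLE IF EXISTS "{table_name}"')
            dst.execute(row[0])  # reuse the source CREATE TABLE statement
            dst.executemany(
                f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
            )
    return len(rows)


def load_tables(
    source_path: str, target_path: str, table_names: List[str], max_workers: int = 4
) -> Dict[str, int]:
    """Load every table in table_names in parallel; map table name -> rows copied."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = pool.map(
            lambda name: load_table(source_path, target_path, name), table_names
        )
        return dict(zip(table_names, counts))
```

The 30-second timeout lets concurrent writers to the same target file wait for each other instead of failing immediately; a real pair of server databases would not need that workaround.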

abhi

03/02/2020, 5:00 PM

The config system is useful for reusability because it lets you express a config schema for your solids. This lets consumers of a solid understand what they need to configure if they use it in their own pipelines. There are a few ways to express your solution. One way that uses the config system is to pass in your credentials via config and then transform them into a DB connection. The rest of your solid would perform the load logic you described.
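A rough sketch of that pattern, assuming a hypothetical two-key schema (database, table_name): the schema tells a consumer what they must supply, and the validated config is turned into a connection. This is plain Python, not dagster's config API; sqlite3 stands in for the real driver, so a real schema would carry fields such as host, user and password instead of a file path.

```python
import sqlite3

# Hypothetical schema: the keys a consumer of the load step must provide.
LOAD_TABLE_CONFIG_SCHEMA = {"database": str, "table_name": str}


def validate_config(config: dict, schema: dict) -> dict:
    """Check that every schema key is present with the right type, and nothing else."""
    missing = [key for key in schema if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    for key, expected in schema.items():
        if not isinstance(config[key], expected):
            raise TypeError(f"{key} should be {expected.__name__}")
    unknown = set(config) - set(schema)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return config


def connection_from_config(config: dict) -> sqlite3.Connection:
    """Turn validated config into a live connection (sqlite3 stands in for a real driver)."""
    config = validate_config(config, LOAD_TABLE_CONFIG_SCHEMA)
    return sqlite3.connect(config["database"])
```

Keeping validation separate from connection creation means a bad config fails before any connection is attempted.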

alex

03/02/2020, 5:44 PM

you could cross reference this example as well: https://docs.dagster.io/latest/learn/demos/airline_demo https://github.com/dagster-io/dagster/tree/master/examples/dagster_examples/airline_demo

Mikael Ene

03/02/2020, 6:38 PM

🙏 I guess I should express my list of tables as .alias calls in the pipeline, then? That would be the easiest way to run the same function with different settings. So my idea of looping over a list of inputs is not applicable? I experimented with Prefect's map function and was looking for something similar. Thanks for the reply, I’ll give it a go when I have time 😀
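Generating the aliases from the table list, instead of writing each one by hand, gets close to the loop described above. A plain-Python sketch of that idea, using functools.partial as a stand-in for dagster's .alias (load_table, make_aliases and the batch_size setting are all hypothetical):

```python
from functools import partial


def load_table(table_name: str, batch_size: int = 1000) -> str:
    """Hypothetical stand-in for the real load step."""
    return f"loaded {table_name} in batches of {batch_size}"


def make_aliases(table_names, **shared_settings):
    """Create one named, pre-configured copy of load_table per table."""
    return {
        f"load_{name}": partial(load_table, name, **shared_settings)
        for name in table_names
    }


steps = make_aliases(["customers", "orders"], batch_size=500)
results = {alias: step() for alias, step in steps.items()}
```

Each entry in steps is the same function with its own settings baked in, which is the role an aliased solid plays in a pipeline.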

alex

03/02/2020, 7:23 PM

ya we’re not there yet on a map feature - starting to experiment with some approaches

Vincent Goffin

03/03/2020, 1:09 PM

I've been trying the same kind of thing these days. I have a few queries that need to be executed against different databases, with the data sent to another one. All these queries are in separate SQL files, in one folder per db/engine. I have a function that builds a CompositeSolidDefinition object from the contents of that folder, populating everything as needed as it iterates over the SQL files:
1. solid_defs (appending SolidDefinition objects returned by a function)
2. dependencies (building a dictionary of DependencyDefinition objects mapping to empty dicts)
3. input_mappings (I map one input from the composite, the db schema, to each of the solids in there)
4. I also build the related env dict that is returned with the CompositeSolidDefinition, allowing me to change my input values on the fly (mostly table names).
I then use these lists and dicts as inputs to create the CompositeSolidDefinition object.
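The folder-driven approach can be sketched with plain dictionaries, assuming one .sql file per query and file names used as table names. build_query_plan and the dictionary layout are hypothetical: in the setup described, these pieces would be turned into SolidDefinition, DependencyDefinition and input-mapping objects before constructing the CompositeSolidDefinition, and this sketch does not reproduce dagster's environment dict format.

```python
from pathlib import Path


def build_query_plan(sql_dir: str, schema: str) -> dict:
    """Describe one composite step per folder of .sql files (hypothetical layout)."""
    sql_files = sorted(Path(sql_dir).glob("*.sql"))
    step_names = [f"run_{path.stem}" for path in sql_files]
    return {
        # 1. One step per .sql file, carrying its query text.
        "steps": {name: path.read_text() for name, path in zip(step_names, sql_files)},
        # 2. The queries are independent, so each step has an empty dependency dict.
        "dependencies": {name: {} for name in step_names},
        # 3. The composite's single "schema" input is mapped to every inner step,
        #    as (composite input, inner step, inner input) tuples.
        "input_mappings": [("schema", name, "schema") for name in step_names],
        # 4. Run-time values: the schema input plus a per-step table name
        #    derived from the file name, editable before each run.
        "environment": {
            "inputs": {"schema": schema},
            "steps": {
                name: {"config": {"table_name": path.stem}}
                for name, path in zip(step_names, sql_files)
            },
        },
    }
```

Sorting the files keeps step order stable between runs, so the generated names and config line up the same way each time.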