#ask-community

Joris Ganne

12/29/2022, 10:28 AM
Hey everyone, I'm looking to migrate an existing ETL flow to dagster, but I'm not sure how to configure and implement dagster for my specific situation following best practices. At the moment, I have two meta tables. The first contains the dependency logic between the different tables (assets in dagster, I assume) in the form of 'Source' and 'Target' columns. When one source has multiple targets, the table contains multiple rows for that source. The second table links each table to a stored procedure that does the necessary transformation steps. Now I want to implement this logic in dagster. Which concepts should I use for the different components of the ETL logic, and how can I generate the different assets/ops/jobs in dagster based on the meta tables? Thank you in advance!
The method I'm currently experimenting with is to create 'fabrics' (factories) for the different components, as was mentioned in another question
Another way I'm exploring is to use jinja templates to generate all the code based on the meta tables
And what about the GraphDefinition in Dagster?

owen

12/29/2022, 2:47 PM
hi @Joris Ganne ! how often are these meta tables updated, and via what sort of process? in my mind, the first step is to decide where the source of truth for these transformations should live (either the tables themselves, or a code artifact that gets used to update the state of those tables). I'm somewhat biased towards having a version-controlled source of truth (so this would be having some yaml or python format for defining these transformations, then parsing that both to create dagster assets as well as to set the state of those meta tables). If you do have requirements for the source of truth to be the other way around (i.e. source of truth is the tables themselves), then I'd recommend a sort of asset factory pattern. Basically just have a function (excuse the pseudocode) like
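from collections import defaultdict

def get_assets_from_meta_tables():
    # `query` is a placeholder for whatever returns the rows of a table
    meta_table_one = query("select * from tbl1")  # (source, target) rows
    meta_table_two = query("select * from tbl2")  # table / stored procedure rows

    # map each target table to the set of tables it depends on
    upstream_tbls_for_tbl = defaultdict(set)
    for source, target in meta_table_one:
        upstream_tbls_for_tbl[target].add(source)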
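
To make the idea above a bit more concrete, here is a minimal, dagster-free sketch of the first half of such a factory: turning the rows of the two meta tables into one spec per table. The `TableSpec` class, the `build_specs` function, and all table and procedure names are hypothetical and only for illustration; the sample rows stand in for what the real queries would return.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TableSpec:
    """Hypothetical container: what is needed to define one asset per table."""
    table: str
    upstream: Set[str] = field(default_factory=set)
    stored_procedure: Optional[str] = None


def build_specs(dependency_rows, procedure_rows):
    """Combine (source, target) rows and (table, procedure) rows into specs."""
    upstream_for = defaultdict(set)
    tables = set()
    for source, target in dependency_rows:
        # one row per edge, so a source with several targets appears several times
        upstream_for[target].add(source)
        tables.update((source, target))

    procedure_for = dict(procedure_rows)
    tables.update(procedure_for)

    return {
        t: TableSpec(t, set(upstream_for.get(t, set())), procedure_for.get(t))
        for t in sorted(tables)
    }


# Made-up sample rows standing in for the two meta tables.
dependency_rows = [
    ("raw_orders", "stg_orders"),
    ("raw_orders", "stg_customers"),
    ("stg_orders", "fct_sales"),
    ("stg_customers", "fct_sales"),
]
procedure_rows = [
    ("stg_orders", "sp_load_stg_orders"),
    ("stg_customers", "sp_load_stg_customers"),
    ("fct_sales", "sp_load_fct_sales"),
]

specs = build_specs(dependency_rows, procedure_rows)
```

Each spec could then be handed to a small function that wraps dagster's `@asset` decorator, with the asset body calling the stored procedure. How upstream dependencies are declared on the decorator depends on the dagster version, so that part is best checked against the docs for the version in use.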

Joris Ganne

12/29/2022, 3:13 PM
Hi @owen, thank you for your quick and extensive reply. I'm also struggling with what my source of truth should be. I only recently joined the company as a junior, so I need to discuss this further with my team. I think they will prefer to use the meta tables as the source of truth, but I will keep both options open.