# ask-community
a
Hey! I'm new to Dagster. I'm starting on the Dagster Cloud trial and setting up my first project. I've got a process already written that queries an API and pushes to a Postgres table. Ideally that would all be wrapped in Dagster, but that's a whole lot of compute, and not a short-term priority. Instead, I want to start my pipeline with the assumption that the initial table is up to date. All of my downstream assets are going to reference this table and create tables via dbt or pandas. Would best practice be to define an asset that's a simple `select * from` that table?
r
check out the Dagster + dbt tutorial here: https://docs.dagster.io/integrations/dbt and a sample repo structure for mixing Python + dbt assets here: https://github.com/dagster-io/dagster/tree/master/examples/assets_modern_data_stack. it has an example where a Python asset reads in an upstream dbt-generated table as a pandas dataframe
z
For downstream pandas assets, you'd probably want to model the initial table as a SourceAsset with an IO manager attached that reads the table and returns a dataframe. I haven't really used dbt, so I'm not sure what the best practice is there, but I find it likely that the SourceAsset approach will still apply for modeling the initial table
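For illustration, here's a minimal sketch of that pattern. The IO manager class, the connection URL, and the names `raw_api_table`, `cleaned_table`, and `pg_io_manager` are placeholders for this example, not anything from the thread:

```python
import pandas as pd
from dagster import (
    ConfigurableIOManager,
    Definitions,
    InputContext,
    OutputContext,
    SourceAsset,
    asset,
)
from sqlalchemy import create_engine


class PostgresPandasIOManager(ConfigurableIOManager):
    """Maps each asset key to a Postgres table, read/written as a pandas DataFrame."""

    connection_url: str

    def handle_output(self, context: OutputContext, obj: pd.DataFrame) -> None:
        # Write the asset's output dataframe to a table named after the asset.
        table = context.asset_key.path[-1]
        obj.to_sql(table, create_engine(self.connection_url), if_exists="replace", index=False)

    def load_input(self, context: InputContext) -> pd.DataFrame:
        # Read the upstream asset's table back in as a dataframe.
        table = context.asset_key.path[-1]
        return pd.read_sql_table(table, create_engine(self.connection_url))


# The table populated by the existing (non-Dagster) process, modeled as a
# source asset: Dagster never materializes it, only reads it.
raw_api_table = SourceAsset(key="raw_api_table", io_manager_key="pg_io_manager")


@asset(io_manager_key="pg_io_manager")
def cleaned_table(raw_api_table: pd.DataFrame) -> pd.DataFrame:
    # The IO manager loads the source table and hands it in as a dataframe.
    return raw_api_table.dropna()


defs = Definitions(
    assets=[raw_api_table, cleaned_table],
    resources={
        "pg_io_manager": PostgresPandasIOManager(
            connection_url="postgresql://user:pass@localhost:5432/mydb"
        )
    },
)
```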
c
+1 on Zach's comment about modeling the initial table as a source asset. This way, when the downstream assets re-materialize, they can pull in the updated contents from the table.
r
the example I posted uses a DB IO manager (for Postgres or DuckDB, I forget which). the source asset is implicit in the asset function's argument, whose name corresponds to the dbt model name
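As a minimal sketch of that, assuming a dbt model named `orders` has already been loaded as a Dagster asset (e.g. via `load_assets_from_dbt_project`) with a DB IO manager attached; `order_stats` and the column names are hypothetical:

```python
import pandas as pd
from dagster import asset


# Naming the argument after the dbt model ("orders") declares the dependency;
# the DB IO manager loads the table that model produced as a dataframe.
@asset
def order_stats(orders: pd.DataFrame) -> pd.DataFrame:
    return orders.groupby("customer_id").size().reset_index(name="n_orders")
```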