Jacob Marcil
01/11/2023, 1:52 PM
my_dagster_project is a really cool name 😅, but I might want to change it. I just don’t know to what.
sandy
01/11/2023, 6:14 PM
yuhan
01/12/2023, 1:10 AM
Jacob Marcil
01/12/2023, 1:56 PM
assets folder as another folder, what would be in another workspace, and what would be in another github repo.
• So we have a bunch of blockchain data, and all of it is segmented by network.
• We have third-party API calls that we are going to put on a schedule.
• Some of the API calls are blockchain related, some are not.
• We want to orchestrate DBT using DBT Cloud. We have 1 DBT GitHub repo per blockchain and 1 global repo for everything else.
• All data is stored in Snowflake in the end.
• It would be nice if we could store every raw file in S3 when we gather the new data.
The Dagster structure we are following is the project_fully_featured example.
Any help is appreciated 🙂
yuhan
01/13/2023, 12:07 AM
assets folder and you can divide assets into groups, where each group can be its own folder. this structure works well together with load_assets_from_package_module or load_assets_from_modules, which load assets into your definitions based on the folder structure.
• because you’re thinking of dbt cloud, you can use the load_assets_from_dbt_cloud_job api to load the dbt project jobs by specifying a dbt cloud url, so you don’t have to put dbt projects alongside your dagster code.
• as for another workspace or github repo, first of all, we don’t recommend over-abstracting too early, and in most cases, one github repo should be sufficient. the pattern we found useful is to use multiple dagster code locations (previously referred to as dagster repositories) to keep conflicting dependencies separate, where each code location can keep its own package requirements (e.g., setup.py) and deployment specs (e.g., Dockerfile).
◦ if it’s for organizational purposes and you’re using assets, i think asset groups should be sufficient in most cases, i.e. no need to start different code locations or different github repos for that.
◦ for more context, we recently rolled out changes to eliminate unnecessary hierarchies in our top-level apis. you can find an overview in the diagram in this discussion: https://github.com/dagster-io/dagster/discussions/10772. we’re eliminating repositories and workspaces, which aren’t needed for most use cases, so the project structure can be simpler.
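To make the single-repo, one-folder-per-group suggestion concrete, here is a minimal sketch that scaffolds such a layout in a temporary directory and prints the files it creates. The group names (ethereum, polygon, external_apis) are hypothetical placeholders, not names from this thread. The generated top-level __init__.py uses load_assets_from_package_module with a group_name per subpackage and wraps the result in Definitions; treat it as one possible arrangement rather than the recommended one.

```python
from pathlib import Path
import tempfile

# Hypothetical names: the group list below is an illustrative assumption.
# Only the package name my_dagster_project comes from the thread.
PACKAGE = "my_dagster_project"
GROUPS = ["ethereum", "polygon", "external_apis"]


def definitions_source(groups: list[str]) -> str:
    """Return the text of a top-level __init__.py that loads one asset group per subpackage."""
    lines = [
        "from dagster import Definitions, load_assets_from_package_module",
        "",
        f"from .assets import {', '.join(groups)}",
        "",
        "all_assets = [",
    ]
    for group in groups:
        # Each subpackage under assets/ becomes its own asset group.
        lines.append(f"    *load_assets_from_package_module({group}, group_name={group!r}),")
    lines += ["]", "", "defs = Definitions(assets=all_assets)"]
    return "\n".join(lines) + "\n"


def scaffold(root: Path, groups: list[str]) -> list[str]:
    """Create the package tree under root and return the created files as sorted relative paths."""
    assets_dir = root / PACKAGE / "assets"
    for group in groups:
        group_dir = assets_dir / group
        group_dir.mkdir(parents=True, exist_ok=True)
        # Placeholder: real asset modules would sit next to this file.
        (group_dir / "__init__.py").write_text("# asset modules for this group go here\n")
    (assets_dir / "__init__.py").write_text("")
    (root / PACKAGE / "__init__.py").write_text(definitions_source(groups))
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp), GROUPS)
    print("\n".join(created))
```

With this shape, adding a new network means adding one folder under assets and one entry in the group list. A separate code location would only come into play if a group needed conflicting dependencies.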