Hey, is there any documentation on how we should n...
# ask-community
j
Hey, is there any documentation on how we should name our projects? Same question: is there any documentation on how we should group pipelines and define workspaces?
my_dagster_project
is a really cool name 😅, but I might want to change it. I just don’t know to what.
s
Hey Jacob - I don't think we currently have any documentation on this. I know @yuhan was working on putting some guidelines together on recommended project structure.
y
Hi Jacob, we don’t have a “one size fits all” recommendation for naming, but I’m happy to learn about your use cases and provide some suggestions.
j
Ok, thank you very much. What I don’t fully understand now is what should live in the
assets
folder as another folder, what should be in another
workspace
, and what should be in another
github repo
.
• We have a bunch of blockchain data, all of it segmented by network.
• We have third-party API calls that we are going to put on a schedule.
• Some of the API calls are blockchain related, some are not.
• We want to orchestrate DBT using DBT Cloud. We have 1 DBT GitHub repo per blockchain and 1 global repo for the other stuff.
• All data ends up stored in Snowflake.
• It would be nice if we could store every
raw
file in S3 when we gather the new data. The Dagster structure we are following is the
project_fully_featured
example. Any help is appreciated 🙂
y
Here are my suggestions:
• All assets can live inside the
assets
folder, and you can divide assets into groups where each group can be its own folder. This structure works well together with
load_assets_from_package_module
or
load_assets_from_modules
to load assets into your definitions based on the folder structure.
• Because you’re thinking of dbt Cloud, you can use the
load_assets_from_dbt_cloud_job
API to load the dbt project jobs by specifying a dbt Cloud URL, so you don’t have to put dbt projects alongside your Dagster code.
• As for another workspace or GitHub repo: first of all, we don’t recommend over-abstracting too early, and in most cases one GitHub repo should be sufficient. The pattern we have found useful is to use multiple Dagster code locations (previously referred to as Dagster repositories) to keep conflicting dependencies separate, where each code location can keep its own package requirements (e.g., setup.py) and deployment specs (e.g., Dockerfile).
  ◦ If it’s for organizational purposes and you’re using assets, I think asset groups should be sufficient in most cases, i.e. no need to start different code locations or different GitHub repos for that.
  ◦ For more context, we recently rolled out changes to eliminate unnecessary hierarchies in our top-level APIs. You can find an overview in the diagram in this discussion: https://github.com/dagster-io/dagster/discussions/10772 where we’re eliminating repositories and workspaces, which aren’t needed for most use cases, so the project structure can be simpler.