Hey, is there any documentation on how we should n...
# ask-community
j
Hey, is there any documentation on how we should name our projects? Same question: is there any documentation on how we should group pipelines and define workspaces? `my_dagster_project` is a really cool name 😅, but I might want to change it. I just don’t know to what.
s
Hey Jacob - I don't think we currently have any documentation on this. I know @yuhan was working on putting some guidelines together on recommended project structure.
y
Hi Jacob, we don’t have a “one size fits all” recommendation for naming, but I’m happy to learn about your use cases and provide some suggestions.
j
Ok, thank you very much. What I don’t fully understand yet is what should live in the `assets` folder as another folder, what should be in another `workspace`, and what should be in another `github repo`.
• We have a bunch of blockchain data, all of it segmented by network.
• We have third-party API calls that we are going to put on a schedule.
• Some of the API calls are blockchain related, some are not.
• We want to orchestrate dbt using dbt Cloud. We have one dbt GitHub repo per blockchain and one global repo for everything else.
• All data is stored in Snowflake in the end.
• It would be nice if we could store every `raw` file in S3 when we gather new data.
The Dagster structure we are following is the `project_fully_featured` example. Any help is appreciated 🙂
y
Here are my suggestions:
• All assets can live inside the `assets` folder, and you can divide assets into groups where each group can be its own folder. This structure works well with `load_assets_from_package_module` or `load_assets_from_modules`, which load assets into your definitions based on the folder structure.
• Because you’re thinking of dbt Cloud, you can use the `load_assets_from_dbt_cloud_job` API to load the dbt project’s jobs by specifying a dbt Cloud URL, so you don’t have to put dbt projects alongside your Dagster code.
• As for another workspace or GitHub repo: first of all, we don’t recommend over-abstracting too early, and in most cases one GitHub repo should be sufficient. The pattern we’ve found useful is to use multiple Dagster code locations (previously referred to as Dagster repositories) to keep conflicting dependencies separate, where each code location can keep its own package requirements (e.g., setup.py) and deployment specs (e.g., Dockerfile).
  ◦ If it’s for organizational purposes and you’re using assets, I think asset groups should be sufficient in most cases, i.e. no need to start different code locations or different GitHub repos for that.
  ◦ For more context, we recently rolled out changes to eliminate unnecessary hierarchies in our top-level APIs. You can find an overview in the diagram in this discussion: https://github.com/dagster-io/dagster/discussions/10772, where we’re eliminating repositories and workspaces, which aren’t needed for most use cases, so the project structure can be simpler.