# ask-community
a
Hello Team, greetings and I hope you are doing well! I have a use case wherein I want Dagster to modify folders (delete and create them) on a case-by-case basis. The reason for this use case is specifically Software-Defined Assets for DBT. Currently we have our project set up in DBT Cloud, which is orchestrated by Dagster. To orchestrate the DBT project I have been using DbtCliResource, but this requires a manual process of downloading the project from DBT Cloud to the local file system so that Dagster picks up and runs only the latest, reviewed code. Therefore, it is crucial that the local DBT project stays in sync with its cloud counterpart, at least for Software-Defined Assets. To do this, I have been thinking of an automation with the following steps:
1. On launching a Dagster run for DBT, Dagster would first create a "drop location" folder on the local system where it is hosted, for the DBT Cloud project to be stored.
2. It would then clone the project from DBT Cloud into that local drop location.
3. Dagster would now have all the DBT resources it needs to kick off DBT asset management.
Given the use case above, I would like to know how to create and delete folders using Dagster, and it would be great if you could point me to the relevant documentation.
o
hey @Ashley Dsouza! this is a really cool idea for handling the synchronization problem. There's definitely a way to get this setup to work, although it will be a bit roundabout, just because you don't have a ton of access into the internals of the generated dbt assets at the moment. Happy to talk through some options that can work with the current version of Dagster, but I'd also be interested in hearing your thoughts on what an ideal dbt Cloud + asset integration would look like.
off the top of my head there are two basic options here: either separate the synchronization code into a separate process that runs regularly, or treat the cloned dbt project as an asset (which the dbt assets depend on)
for the first one, you could create an op that uses the dbt_cloud_resource (or some other method) to clone the dbt project into a local directory. that would look something like:
```python
import os

from dagster import op

LOCAL_DBT_PROJECT_PATH = "/path/to/local/project"

@op
def clone_dbt_project():
    if not os.path.exists(LOCAL_DBT_PROJECT_PATH):
        os.makedirs(LOCAL_DBT_PROJECT_PATH)
    # ... some code to download the current state of the dbt Cloud project
```
from there, you could put this into a simple job and run it on a schedule, maybe every 5 minutes or so. This would ensure that you always have a reasonably up-to-date local copy.
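just to make that concrete, here's a minimal sketch of the job + schedule wiring using the standard job and ScheduleDefinition APIs (the names and the 5-minute cadence are just placeholders):
```python
from dagster import ScheduleDefinition, job

@job
def clone_dbt_project_job():
    # the op defined above: creates the drop location and clones the project
    clone_dbt_project()

# run the sync every 5 minutes (illustrative cadence; tune to your needs)
clone_dbt_project_schedule = ScheduleDefinition(
    job=clone_dbt_project_job,
    cron_schedule="*/5 * * * *",
)
```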
The other option would involve similar code, but a slightly different mental model. Here, you'd have an asset instead of an op, and treat this cloned project as a persistent data object that your dbt Cloud assets depend on.
```python
from dagster import asset

@asset
def cloned_dbt_project():
    if not os.path.exists(LOCAL_DBT_PROJECT_PATH):
        os.makedirs(LOCAL_DBT_PROJECT_PATH)
    # ... some code to download the current state of the dbt Cloud project
```
from there, you'd need a way to tell Dagster that the dbt Cloud assets depend on this cloned_dbt_project asset. I think you would be able to do that by messing with the sources.yml file in your dbt project (as dagster parses that to figure out how to map dbt sources to dagster assets). It'd definitely be a bit of a hack, but it should work well enough.
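and for completeness, once the project is cloned locally, the dbt assets themselves can be loaded from that path. a rough sketch, assuming the dagster-dbt load_assets_from_dbt_project helper and using the project dir for profiles too (adjust to wherever your profiles.yml actually lives):
```python
from dagster_dbt import load_assets_from_dbt_project

# load software-defined assets from the locally cloned dbt project; any dbt
# source you wire up in sources.yml is what would map back to the
# cloned_dbt_project asset defined above
dbt_assets = load_assets_from_dbt_project(
    project_dir=LOCAL_DBT_PROJECT_PATH,
    profiles_dir=LOCAL_DBT_PROJECT_PATH,  # illustrative; point at your profiles.yml
)
```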
anyway, happy to chat through this more, as we're definitely interested in adding support for dbt Cloud + assets
a
hello @owen, thanks very much for the information. I was able to implement DBT Cloud to DBT CLI code synchronization using an op. I had a few permission issues, primarily due to using git clone to reload my drop location, but after a few tweaks I was able to pull it off.
With regards to the DBT Cloud + assets configuration, I expect the SDA implementation for DBT Cloud to look similar to its DBT CLI implementation, especially because all the artifacts (manifest.json, run_results.json, sources.yaml) are present directly in a DBT Cloud environment. From the looks of it, the DBT Cloud API v2.0 has a few methods to extract these artifacts through appropriate requests. Therefore, any materialization of assets in a DBT Cloud project would be equivalent to firing a request to the DBT Cloud API to run a project/model/lineage. At least that's the direction I would head in for starters: leverage DBT Cloud API features to integrate within Dagster. I hope this helps a bit :)
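to illustrate that direction, a rough sketch of what triggering a DBT Cloud job from an op might look like, using the documented v2 jobs/run endpoint (the account ID, job ID, and token env var are placeholders):
```python
import os

import requests
from dagster import op

DBT_CLOUD_ACCOUNT_ID = 12345  # placeholder account ID
DBT_CLOUD_JOB_ID = 67890      # placeholder job ID

@op
def run_dbt_cloud_job():
    # kick off a dbt Cloud job run via the v2 API; the run's artifacts
    # (manifest.json, run_results.json) can then be fetched from the same API
    resp = requests.post(
        f"https://cloud.getdbt.com/api/v2/accounts/{DBT_CLOUD_ACCOUNT_ID}"
        f"/jobs/{DBT_CLOUD_JOB_ID}/run/",
        headers={"Authorization": f"Token {os.environ['DBT_CLOUD_API_TOKEN']}"},
        json={"cause": "Triggered by Dagster"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["id"]  # run id, useful for polling status later
```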