# ask-community
c
Today is my first day with Dagster, so feel free to tell me to rtfm, but please give me a pointer. 🙂 I'm working my way through the Tutorial and having a hard time getting my head wrapped around whether or not Dagster is designed to support my scenario. Currently I have a workflow like this:
1. CSV file dropped in /original folder
2. Manually run PrepFiles.py
   a. Ensure the file is utf-8 encoded
   b. Create a copy in /Load_Raw folder
3. Manually run QuickLoadToPostgres.py
   a. read csv into dataframe
   b. minimal transformation
   c. enrich with metadata
   d. load into postgres database
      i. successful files written to /Load_Raw-complete
      ii. failed files written to /Load_Raw-errors
4. Manually run dbt run
   a. transforms data into final table for reporting & analysis (This always runs flawlessly! 😉)

What I'm hoping Dagster can do is replace each of the "Manually" statements. I see how I can integrate dbt, though I haven't tried it yet. What I'm not really seeing in the tutorial is an example of using Dagster to run existing Python scripts. Am I thinking about this wrong? Is there a similar example I can learn from?
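For reference, step 2 of the workflow above could look roughly like the following once it is pulled out of the script into an importable function, which makes it easier to call from an orchestrator later. This is only a sketch using the standard library: the names `is_utf8` and `prep_file` are hypothetical, and the `cp1252` fallback encoding is an assumption, since the thread does not say what encoding the non-UTF-8 files use.

```python
import shutil
from pathlib import Path


def is_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def prep_file(src: Path, load_raw_dir: Path, fallback_encoding: str = "cp1252") -> Path:
    """Place a UTF-8 copy of src in load_raw_dir and return the new path."""
    load_raw_dir.mkdir(parents=True, exist_ok=True)
    dest = load_raw_dir / src.name
    if is_utf8(src):
        # Already UTF-8: a plain copy is enough.
        shutil.copy2(src, dest)
    else:
        # Assumption: anything that is not valid UTF-8 uses fallback_encoding.
        text = src.read_bytes().decode(fallback_encoding)
        dest.write_bytes(text.encode("utf-8"))
    return dest
```

The original file in /original is left untouched, matching the "create a copy" step.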
q
This is all possible with Dagster. Your options are as follows:
• Make your workflow a single job by using the Dagster shell integration to run the Python scripts. This makes each script an op within the job, so the job will have four ops, as you outlined.
• Instead of using Dagster to run the Python scripts, make each step inside your scripts a Dagster op. This job will have about 10 or more ops.
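Conceptually, the first option above amounts to invoking each existing script as a subprocess, in order, and stopping at the first failure. The sketch below shows that idea with the standard library only; it is not the Dagster shell integration itself, and `run_script` and `run_pipeline` are hypothetical helper names. Each `run_script` call is the kind of unit that would become one op.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def run_script(script: Path) -> str:
    """Run one Python script with the current interpreter and return its stdout.

    Raises subprocess.CalledProcessError if the script exits with a non-zero code.
    """
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def run_pipeline(scripts: List[Path]) -> List[str]:
    """Run the scripts in order; the first failure stops the remaining steps."""
    return [run_script(script) for script in scripts]
```

With the scripts from the question, a call might look like `run_pipeline([Path("PrepFiles.py"), Path("QuickLoadToPostgres.py")])`, assuming both files sit in the working directory.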
m
Hi! You can put your scripts in the same project as your Dagster directory. After that, you can build packages from your scripts and run them from a Dagster job. If you have more questions, send me a direct message and I will show you an example =)
y
I’d recommend an incremental development pattern (exactly what @Qwame describes):
1) build a single job using the Dagster shell integration to make sure all the “manually” statements are connected using Dagster
2) then migrate the scripts to ops or assets (see the docs on when to use ops vs. assets), since defining computations natively in Dagster gives benefits like lineage tracking, metadata monitoring, etc.