
David Lakomski

05/18/2022, 12:56 PM
Hi everyone! I'd like your thoughts on how to set up Dagster to run jobs on several local machines (I'm a week-old Dagster newbie).

My use case: I have an existing data pipeline that is currently run only manually, step by step. I want to move it to Dagster because, without a proper orchestrator, it's a huge pain to operate and to adapt to other use cases. Since it uses two paid external APIs, I'm pretty cautious about task executions and avoiding duplicates.

Details: The pipeline is seeded with fairly extensive configuration (semantics, URLs, numeric thresholds, etc.). It begins by composing the first API queries, runs them in batches, processes the results (with a changing IO JSON schema), evaluates the data and composes the targets for the second API, and so on, until it produces a custom report. It's currently built on top of a local MongoDB (which was better for prototyping) and runs locally.

Current goal: I need to prototype quite a few things to improve and extend the existing pipeline, and have it run time-consuming tasks (Selenium automation). I want to use some local computers that I own for those tasks (a Raspberry Pi 4, an old 32-bit Linux PC, etc.).

My question: What would be the Dagster way of using those machines to run jobs while still monitoring them from the dagit instance on my main PC? (I'm not so good at Docker and an absolute noob at K8s.)

My ideas:
• Joining dagit instances on multiple PCs, with specific ops making GraphQL calls, maybe through a FastAPI "hub". My main PC would send run-job calls and receive AssetMaterialization events, for example.
• Maybe running Celery workers on those machines and configuring dagit to interact with them for specific jobs (but I'm not sure I understand how that is implemented).
• Something else?
Sometimes when you learn new concepts, the problem or the solution you focus on is the wrong one, so feel free to correct me at any level you see fit 🙂 I may also have missed a doc or example dealing with this kind of use case. Thanks a lot!
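For the first idea above, a minimal sketch of what "sending run-job calls" from the main PC to a dagit on another machine could look like. It assumes each machine runs its own dagit and that the `dagster-graphql` package (which provides `DagsterGraphQLClient`) is installed on the main PC. The hostnames, port, and job names are hypothetical placeholders, not part of the original setup.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical mapping: job name -> (host, port) of the dagit that hosts it.
# These hostnames and job names are placeholders for illustration only.
REMOTE_DAGITS: Dict[str, Tuple[str, int]] = {
    "selenium_scrape_job": ("raspberrypi.local", 3000),
    "report_job": ("old-linux-pc.local", 3000),
}


def connect(job_name: str):
    """Build a GraphQL client for the dagit assumed to host `job_name`."""
    # Imported lazily so this module still loads where dagster-graphql
    # is not installed (for example, when testing with a fake client).
    from dagster_graphql import DagsterGraphQLClient

    host, port = REMOTE_DAGITS[job_name]
    return DagsterGraphQLClient(host, port_number=port)


def launch_remote(
    job_name: str,
    run_config: Optional[Dict[str, Any]] = None,
    client: Any = None,
) -> str:
    """Submit `job_name` to its remote dagit and return the new run ID."""
    if client is None:
        client = connect(job_name)
    # submit_job_execution asks the remote instance to launch the run;
    # it returns the run ID as a string.
    return client.submit_job_execution(job_name, run_config=run_config or {})
```

Calling `launch_remote("selenium_scrape_job")` on the main PC would then ask the Raspberry Pi's dagit to start that job. This only covers the "send" half of the idea: the run would be recorded in the remote machine's Dagster instance, so getting it to show up in the main PC's dagit is a separate problem.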

geoHeil

05/18/2022, 12:58 PM
I think that if you use Dagster Cloud, you could simply start a runner on any compute node you have access to.
❤️ 1

David Lakomski

05/18/2022, 2:19 PM
Great, thanks! I'll try it 🙂
@geoHeil any idea how I could prototype something while I'm on the waiting list? 🙂

geoHeil

05/18/2022, 2:36 PM
I would suggest talking to someone from the Dagster team, like @Shaun McAvinney or others. I guess they can help you.

David Lakomski

05/18/2022, 2:37 PM
OK, great, thanks 🙂
@geoHeil man, your repos are awesome!
❤️ 1
:partydagster: 1