# data-platform-design
d
Hi everyone! I would like your thoughts on how to implement Dagster and run jobs on several local machines (I've only been using Dagster for a week).

My use case: I have an existing data pipeline that has so far only been run manually, step by step. I want to move it to Dagster, because without a proper orchestrator it is a huge pain to operate and to adapt to other use cases. Since it uses two paid external APIs, I'm quite cautious about task executions and about avoiding duplicate calls.

Details: The pipeline is seeded with fairly extensive configuration (semantics, URLs, numeric thresholds, etc.). It starts by composing the first API queries, runs them in batches, processes the results (with a changing JSON I/O schema), then evaluates the data and composes the targets for the second API, and so on, until it produces a custom report. It is currently built on top of a local MongoDB instance (which was convenient for prototyping) and runs locally.

Current goal: I need to prototype quite a few things to improve and extend the existing pipeline and to make it run time-consuming tasks (Selenium automation), so I want to use some local computers I own to run those tasks (a Raspberry Pi 4, an old 32-bit Linux PC, etc.).

My question: What would be the Dagster way of using those machines to run jobs while still monitoring them from the Dagit instance on my main PC? (I'm not very good with Docker and a complete beginner with K8s.)

My ideas:
• Linking Dagit instances on multiple PCs, with specific ops making GraphQL calls, maybe through a FastAPI "hub". My main PC would send run-job calls and receive AssetMaterialization events, for example.
• Running Celery workers on those machines and configuring Dagit to use them for specific jobs (but I'm not sure I understand how that is implemented).
• Something else?
Sometimes when you learn new concepts, the problem or the solution you focus on is the wrong one, so feel free to correct me at whatever level you think necessary 🙂 I may also have missed a doc or an example covering this kind of use case. Thanks a lot!
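Whichever machines end up running the jobs, the duplicate-avoidance concern with the two paid APIs can be handled inside the op code itself: give every request a stable identity (a hash of its canonical JSON) and persist each response before moving on, so a retried or re-executed run never pays twice for the same query. The sketch below is a minimal, hypothetical illustration of that idea; `batched`, `request_key` and `PaidApiCache` are made-up helpers, not Dagster APIs, and the in-memory dict stands in for a persistent store such as the existing MongoDB.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items (the last may be shorter)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def request_key(query: Dict[str, Any]) -> str:
    """Stable identity for a query: SHA-256 of its JSON with sorted keys."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PaidApiCache:
    """Hypothetical guard that calls a paid API at most once per distinct query.

    `store` stands in for persistent storage (for example a MongoDB
    collection keyed by request hash); a dict keeps the sketch self-contained.
    """

    def __init__(
        self,
        call_api: Callable[[Dict[str, Any]], Any],
        store: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.call_api = call_api
        self.store: Dict[str, Any] = {} if store is None else store
        self.paid_calls = 0  # number of real (billable) API calls made

    def fetch(self, query: Dict[str, Any]) -> Any:
        key = request_key(query)
        if key not in self.store:
            # Only reached for queries that have never been answered before.
            self.store[key] = self.call_api(query)
            self.paid_calls += 1
        return self.store[key]

    def fetch_batch(self, queries: List[Dict[str, Any]]) -> List[Any]:
        # Each result is stored as soon as it arrives, so retrying a batch that
        # failed halfway (with the same store) only pays for unanswered queries.
        return [self.fetch(q) for q in queries]


if __name__ == "__main__":
    def fake_api(query: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in for one of the real paid APIs.
        return {"echo": query["url"]}

    cache = PaidApiCache(fake_api)
    queries = [{"url": "a"}, {"url": "b"}, {"url": "a"}]
    results = [r for batch in batched(queries, 2) for r in cache.fetch_batch(batch)]
    print(results, cache.paid_calls)
```

Running it prints the three results with `paid_calls` equal to 2, because the repeated `{"url": "a"}` query is served from the store. If several machines share the work, the store would need to be shared too (for example the MongoDB instance reachable over the network), and this sketch does not guard against two workers racing on the same key; a unique index or atomic upsert would be needed for that. On Python 3.12+, `itertools.batched` could replace the hand-written `batched` (it yields tuples rather than lists).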
g
I think if you use Dagster Cloud, you could simply start a runner on any compute node you have access to.
ā¤ļø 1
d
Great, thanks! I'll try it 🙂
@geoHeil any idea how I could prototype something while I'm on the waiting list? 🙂
g
I would suggest talking to someone from the Dagster team, like @Shaun McAvinney or others; I guess they can help you.
d
OK, great, thanks 🙂
@geoHeil man, your repos are awesome!
ā¤ļø 1
šŸŽ‰ 1