# announcements
s
Hey, I'm currently evaluating dagster as a replacement for airflow. I've listened to https://softwareengineeringdaily.com/2019/11/15/dagster-with-nick-schrock/ and could relate to many of the airflow issues you describe @schrockn. Thanks for your insights, very interesting episode 🙂 So, although we started developing our own k8s native tool, I'd like to give dagster a go since it sounded interesting when you described it and looked even better when I went through the tutorial. Is there any reason why you're using your own k8s launcher in dagster-k8s instead of Dask Kubernetes?
a
> Is there any reason why you’re using your own k8s launcher in dagster-k8s instead of Dask Kubernetes?
We are currently building out `dagster-celery` & `dagster-k8s` together. I expect when we expand `dagster-k8s` to support Dask we will use Dask Kubernetes.
m
and would welcome your thoughts on how we could best integrate with dask k8s -- this is an area where our thinking and design is evolving quickly and we would love insights from other practitioners
s
Thanks for your replies. I’m asking because, according to the documentation, your Dask integration runs all solids individually, while the k8s integration, according to my understanding of the code, runs whole pipelines in a single job. I suppose the former has much better scaling potential, and Dask Kubernetes claims to scale dynamically with your workload. So the combination of Dask Kubernetes with dagster-dask seemed straightforward to me.
That being said, I see the advantage of not introducing yet another system. But then I'd go for an implementation similar to the one used by dagster-dask in that you run one pod per solid rather than one job per pipeline. Also we found jobs hard to observe programmatically, which is why we decided to spawn and manage pods ourselves with the tool we're developing.
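Since the scaling argument above leans on dask-kubernetes' adaptive workers, here is a minimal sketch of that piece using the classic `KubeCluster` API. The image, resource values, and workload are placeholders, and how dagster-dask would hand solids to such a cluster is exactly the open design question in this thread, not something this snippet settles.

```python
# Minimal sketch of the dynamic scaling dask-kubernetes advertises.
# Image and resource values are placeholders.
from dask.distributed import Client
from dask_kubernetes import KubeCluster, make_pod_spec

pod_spec = make_pod_spec(
    image="daskdev/dask:latest",   # placeholder worker image
    memory_limit="4G", memory_request="4G",
    cpu_limit=1, cpu_request=1,
)
cluster = KubeCluster(pod_spec)
cluster.adapt(minimum=0, maximum=10)  # add/remove worker pods with the workload
client = Client(cluster)

# Work submitted through the client is spread across worker pods,
# which dask-kubernetes scales up and down as the queue grows and drains.
futures = client.map(lambda x: x * x, range(100))
print(sum(client.gather(futures)))
```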
m
would love to know more about the issues you had observing jobs
there clearly is an approach where we reach deeper into k8s and write some kind of CRD -- this is the approach that argo has taken, e.g.
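For context on the CRD direction mentioned here: Argo models each run as a `Workflow` custom resource created through the Kubernetes API. The sketch below is a toy illustration of that pattern via the `CustomObjectsApi`, not anything dagster ships; the manifest is a hypothetical hello-world workflow.

```python
# Toy example: submitting an Argo-style Workflow custom resource.
from kubernetes import client, config

config.load_kube_config()
crd_api = client.CustomObjectsApi()

workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "hello-"},
    "spec": {
        "entrypoint": "main",
        "templates": [
            {
                "name": "main",
                "container": {"image": "alpine:3.10", "command": ["echo", "hello"]},
            }
        ],
    },
}

crd_api.create_namespaced_custom_object(
    group="argoproj.io",
    version="v1alpha1",
    namespace="default",
    plural="workflows",
    body=workflow,
)
```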
s
> would love to know more about the issues you had observing jobs
It's hard to figure out the reason why a job is in a certain state. If it's running, it might actually be running, but it might just as well be that the pod spawned by the job is stuck in Pending. Why? Maybe because of an image pull error, maybe because it's not schedulable. Is the image pull error due to a missing/wrong image or due to missing/wrong credentials? Is it not schedulable because the cluster is full? Does it fit on a node at all? Most of that you can find out by reading the pod events, but you won't find that information on the job resource. Therefore, since we have to watch the pods anyway, we'll just manage them ourselves and thereby only need to watch and deal with one k8s resource instead of two.
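To make that diagnosis path concrete, here is a rough sketch (not from the thread) using the official `kubernetes` Python client: the Job status only exposes active/succeeded/failed counters, while the image-pull and scheduling reasons live on the pods it owns and their events. The job name and namespace are placeholders.

```python
# Sketch: why does a Job "look running"? Read the pods it spawned and their events.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
batch = client.BatchV1Api()

namespace = "default"      # placeholder
job_name = "my-pipeline"   # placeholder

# The Job resource itself only reports pod counts.
job = batch.read_namespaced_job(job_name, namespace)
print("job status:", job.status.active, job.status.succeeded, job.status.failed)

# The actual reason (ImagePullBackOff, Unschedulable, ...) is on the pods,
# which carry a `job-name` label pointing back at the Job.
pods = core.list_namespaced_pod(namespace, label_selector=f"job-name={job_name}")
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)
    for cs in pod.status.container_statuses or []:
        if cs.state.waiting:  # e.g. ImagePullBackOff, ErrImagePull
            print("  waiting:", cs.state.waiting.reason, cs.state.waiting.message)
    # Scheduling problems ("0/3 nodes are available ...") surface as pod events.
    events = core.list_namespaced_event(
        namespace,
        field_selector=f"involvedObject.name={pod.metadata.name},involvedObject.kind=Pod",
    )
    for ev in events.items:
        print("  event:", ev.type, ev.reason, ev.message)
```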
m
yep, that makes sense to me