# announcements
Stupid question to start with 😅: How do you normally develop with spark in a Kubernetes environment? We have OpenShift with Kubernetes (spark, jupyter notebooks, kafka etc.) and we have a databricks cluster (another spark, for fast ramp-up time). Would you deploy a dagster Kubernetes pod ( right? But how would you develop if you're not into Vim? I'd like to use VS Code locally, but all our stuff is in Kubernetes. What is best practice? I've now started using dagster in a databricks notebook (just to get the concept of solids and reusable patterns). But of course I can't really visualise the DAG etc., and programming inside a notebook is not really fun.... any hints are much appreciated 🙈
There's a lot to unpack in that question but I'll do my best. You should be able to start by authoring your dagster python code locally and getting a version running that way against some sample data. You may be able to use
for spark and
for jupyter notebooks. Then using the
abstractions - you can figure out how to make that pipeline work in your kubernetes infrastructure in addition to being runnable locally. You will build a docker image containing the pipeline code and potentially use
to deploy it.
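To make the "same pipeline, different environment" idea above concrete, here is a minimal sketch in plain Python. It deliberately does not use Dagster's API: `Storage`, `LocalStorage`, `InMemoryObjectStore`, and `run_pipeline` are hypothetical names invented for illustration, and the in-memory store only stands in for a real object store such as S3. The point is that the pipeline logic depends on a small interface, so it can run locally against sample data and later be wired to the cluster's storage without changes.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Storage(Protocol):
    """Hypothetical interface the pipeline depends on (not a Dagster type)."""

    def read_lines(self, key: str) -> List[str]: ...
    def write_lines(self, key: str, lines: List[str]) -> None: ...


@dataclass
class LocalStorage:
    """Backend for local development: keys map to files under `root`."""

    root: str

    def read_lines(self, key: str) -> List[str]:
        with open(os.path.join(self.root, key), encoding="utf-8") as f:
            return f.read().splitlines()

    def write_lines(self, key: str, lines: List[str]) -> None:
        path = os.path.join(self.root, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))


@dataclass
class InMemoryObjectStore:
    """Stand-in for an object store; a real one would call S3 instead."""

    objects: Dict[str, List[str]] = field(default_factory=dict)

    def read_lines(self, key: str) -> List[str]:
        return list(self.objects[key])

    def write_lines(self, key: str, lines: List[str]) -> None:
        self.objects[key] = list(lines)


def run_pipeline(storage: Storage, in_key: str, out_key: str) -> int:
    """Keep only rows with a non-zero delay; return how many were kept."""
    rows = storage.read_lines(in_key)
    header, body = rows[0], rows[1:]
    kept = [row for row in body if row.split(",")[1] != "0"]
    storage.write_lines(out_key, [header] + kept)
    return len(kept)


# Made-up sample data, small enough to keep next to the code.
sample = ["flight,delay", "AA1,0", "BA2,15", "LH3,7"]

# Local run against files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    local = LocalStorage(tmp)
    local.write_lines("in/flights.csv", sample)
    local_count = run_pipeline(local, "in/flights.csv", "out/delayed.csv")
    local_result = local.read_lines("out/delayed.csv")

# Same pipeline function, different backend.
remote = InMemoryObjectStore()
remote.write_lines("in/flights.csv", sample)
remote_count = run_pipeline(remote, "in/flights.csv", "out/delayed.csv")
```

In Dagster the swap between backends would be expressed through its own abstractions rather than by passing an object by hand, but the separation of logic from environment is the same.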
thanks alex. I will try this. The problem is that all my data is in the object store, and it's hard to create sample data for each file that I need. But I will try a bit more with your hints and get back to you in case it doesn't work at all. Thank you very much for the help!
what object store are you using?
hi nate, I'm using s3. We're also using a gateway like minio or zenko, but to begin with I will use s3 directly. I used your airline example, which is a good start for me. I believe you also used spark there in a local environment. I couldn't get it to run in my environment yet (the databricks extensions didn't install, probably a proxy error on my Windows/WSL installation at work 🙈). I'm trying now on my macbook to see if it works here. But the airline example uses spark in local mode, right?
yep! airline is local spark
a local spark deployment, or does it just mimic one? 🤔 sorry for the beginner questions. But I'm trying to use S3 directly now, not over the
and hopefully deploy it to our kubernetes cluster.