# ask-community
b
When using the Databricks pyspark step launcher, I always see the following log:
```
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
```
Does this mean that the job is being set up locally? Shouldn't it run on Databricks instead?
@owen I am afraid this is a step launcher-related issue again... I guess the problem is that the resources are being redundantly initialized both locally and on Databricks, isn't it?
o
@Bernardo Cortez hm yes the resources will be initialized in both places unfortunately. the local resources (other than the step launcher) will not be used for anything during execution. does this issue cause a failure on your end, or is it more of a performance kind of problem?
b
It's a performance issue: it takes ~500 MB of RAM just to launch a step...
o
yikes yeah that's pretty bad -- is this happening with a custom resource or an imported one?
b
Just importing the pyspark resource from dagster-pyspark. Yep, it's pretty resource-intensive :) Any chance of not initializing locally?
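For context, the setup under discussion presumably looks roughly like this (a sketch using the resource-based API of that era; the op and job names are made up, and the step launcher's Databricks config, such as host, token, and cluster spec, is left to run config rather than shown here):

```
from dagster import job, op
from dagster_databricks import databricks_pyspark_step_launcher
from dagster_pyspark import pyspark_resource


@op(required_resource_keys={"pyspark", "pyspark_step_launcher"})
def count_rows(context):
    # Executes on Databricks via the step launcher, using the
    # SparkSession built by the pyspark resource.
    return context.resources.pyspark.spark_session.range(100).count()


@job(
    resource_defs={
        "pyspark": pyspark_resource,
        "pyspark_step_launcher": databricks_pyspark_step_launcher,
    }
)
def databricks_job():
    count_rows()
```

The local process also builds this `pyspark_resource` (and its SparkSession/JVM) even though only the step launcher is used locally, which is presumably where the ~500 MB goes.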
o
created an issue for it: https://github.com/dagster-io/dagster/issues/7253. although my intuition is that this might be a bit tricky to solve. as a bandaid/hack on your end, you might be able to create a `hacked_pyspark_resource` that is basically a copy/paste of the real one, but reads an environment variable to determine whether it will return None or an actual resource.
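A minimal sketch of that workaround (the env var name `DAGSTER_SKIP_LOCAL_PYSPARK` and the delegation through `resource_fn` are choices of this sketch, not anything prescribed in the thread):

```
import os

from dagster import resource
from dagster_pyspark import pyspark_resource


@resource(config_schema=pyspark_resource.config_schema)
def hacked_pyspark_resource(init_context):
    # Set DAGSTER_SKIP_LOCAL_PYSPARK=1 only in the local process that
    # launches the step; leave it unset on the Databricks cluster so the
    # real SparkSession is still built there.
    if os.getenv("DAGSTER_SKIP_LOCAL_PYSPARK"):
        return None
    # Otherwise defer to the real resource function to build the
    # PySpark resource from the same config.
    return pyspark_resource.resource_fn(init_context)
```

Swap this in for `pyspark_resource` in the job's `resource_defs`; any op that touches `context.resources.pyspark` locally must then tolerate `None`.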
b
Ah, that's a good trick. Let me try that.