Frank Dekervel (11/19/2020, 9:33 PM)

cat (11/19/2020, 10:09 PM)

bob (11/19/2020, 10:35 PM)
`spark_conf`: that's why your `spark.kubernetes.*` variables are being ignored
alternatively, a solution would be to specify user-defined k8s config on the solid or pipeline that uses the `pyspark_resource`. this is done with the `tags` argument of the `@solid` and `@pipeline` decorators. i made a quick example below for specifying the k8s `serviceAccountName`:
```python
@pipeline(
    # ...other parameters,
    tags={
        'dagster-k8s/config': {
            'pod_spec_config': {
                'serviceAccountName': 'my-k8s-service-account',
            },
        },
    },
)
def my_pipeline_that_uses_pyspark():
    pass
```
if you have additional variables that usually go in `spark.kubernetes.*`, you might be able to fit them in `tags['dagster-k8s/config']`
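for instance, a hypothetical sketch of that translation (the `container_config` block, the resource values, and the mapping comments are my own guesses at how a couple of `spark.kubernetes.*` settings might carry over, not tested config from this thread):

```python
# Hypothetical sketch: expressing settings that would normally be set via
# spark.kubernetes.* properties as user-defined k8s config in the
# 'dagster-k8s/config' tag instead. Values are illustrative placeholders.
tags = {
    'dagster-k8s/config': {
        # roughly spark.kubernetes.authenticate.driver.serviceAccountName
        'pod_spec_config': {
            'serviceAccountName': 'my-k8s-service-account',
        },
        # roughly spark.kubernetes.driver.request.cores / .request.memory
        'container_config': {
            'resources': {
                'requests': {'cpu': '500m', 'memory': '1Gi'},
            },
        },
    },
}
```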
another example can be found in the changelog announcement of "user-defined k8s config"

Frank Dekervel (11/19/2020, 10:37 PM)

bob (11/19/2020, 10:40 PM)

Frank Dekervel (11/19/2020, 10:43 PM)

bob (11/19/2020, 10:47 PM)
`CONFIG_TYPES`, that wouldn't be sufficient 😓
`parse_spark_configs.py` fetches the markdown for the Spark config doc website and generates the Dagster config, but the config variables for the Spark kubernetes stuff are on a different page :/
`CONFIG_TYPES` is there for typing non-string config variables. by default, all the Spark config variables will be `str` typed

Frank Dekervel (11/19/2020, 10:49 PM)
bob (11/19/2020, 10:49 PM)
`CONFIG_TYPES` are still `ConfigType.String` 🤔 hmm

Frank Dekervel (11/19/2020, 10:49 PM)

johann (11/19/2020, 10:56 PM)

bob (11/19/2020, 11:01 PM)
`spark_config` ends up being passed to `SparkSession.builder`, which should recognize annotations or whatnot
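to make that hand-off concrete, here is a minimal sketch (assuming dotted keys): a nested `spark_conf` dict is flattened into dotted `spark.x.y` keys, which is the form `SparkSession.builder.config(key, value)` accepts. the `flatten_spark_conf` helper and the example image name are hypothetical, not code from this thread:

```python
# Hypothetical helper: flatten a nested spark_conf dict into the dotted
# "spark.x.y" property names that SparkSession.builder.config() takes.
def flatten_spark_conf(conf, prefix=""):
    """Flatten {"spark": {"some": {"config": "v"}}} into {"spark.some.config": "v"}."""
    flat = {}
    for key, value in conf.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_spark_conf(value, dotted))
        else:
            flat[dotted] = value
    return flat

conf = {"spark": {"kubernetes": {"container": {"image": "my-image:latest"}}}}
print(flatten_spark_conf(conf))
# {'spark.kubernetes.container.image': 'my-image:latest'}
```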
sandy (11/19/2020, 11:01 PM)
`pyspark_resource.configured({"spark_conf": {"spark.some.config": "some_value"}})`
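for reference, the same conf could also be supplied at launch time through run config rather than baked in with `.configured(...)`. a hedged sketch, assuming the resource is registered under the key `pyspark` (that key name is my assumption, not from the thread):

```yaml
# Hypothetical run-config equivalent of the .configured(...) call above
resources:
  pyspark:
    config:
      spark_conf:
        spark.some.config: some_value
```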
Frank Dekervel (11/19/2020, 11:03 PM)

sandy (11/19/2020, 11:15 PM)

Frank Dekervel (11/19/2020, 11:16 PM)

sandy (11/19/2020, 11:30 PM)

Frank Dekervel (11/19/2020, 11:36 PM)

sandy (11/19/2020, 11:46 PM)

Frank Dekervel (11/19/2020, 11:47 PM)

sandy (11/19/2020, 11:48 PM)

Frank Dekervel (11/20/2020, 9:57 AM)