sephi
04/01/2020, 7:48 PM
dagster - ERROR - pyspark_pipeline - 1d12e4e7-15e0-49f8-a394-27dcf29ef9cb - PIPELINE_INIT_FAILURE - Pipeline failure during initialization of pipeline "pyspark_pipeline". This may be due to a failure in initializing a resource or logger.
event_specific_data = {"error": ["py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonAccumulatorV2. Trace:\npy4j.Py4JException: Constructor org.apache.spark.api.python.PythonAccumulatorV2([class java.lang.String, class java.lang.Integer, class java.lang.String]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\n\n\n", [" File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/dagster/core/errors.py\", line 161, in user_code_error_boundary\n yield\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/dagster/core/execution/resources_init.py\", line 138, in single_resource_event_generator\n resource = next(gen)\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/dagster_pyspark/resources.py\", line 66, in pyspark_resource\n pyspark = SystemPySparkResource(init_context.resource_config['spark_conf'])\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/dagster_pyspark/resources.py\", line 30, in __init__\n self._spark_session = spark_session_from_config(spark_conf)\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/dagster_pyspark/resources.py\", line 25, in spark_session_from_config\n return builder.getOrCreate()\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/pyspark/sql/session.py\", line 173, in getOrCreate\n sc = SparkContext.getOrCreate(sparkConf)\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/pyspark/context.py\", line 367, in getOrCreate\n SparkContext(conf=conf or SparkConf())\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/pyspark/context.py\", line 136, in __init__\n conf, jsc, profiler_cls)\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/pyspark/context.py\", line 207, in _do_init\n self._javaAccumulator = self._jvm.PythonAccumulatorV2(host, port, auth_token)\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/py4j/java_gateway.py\", line 1525, in __call__\n answer, self._gateway_client, None, self._fqn)\n", " File \"/opt/anaconda3/envs/dask_pyarrow/lib/python3.7/site-packages/py4j/protocol.py\", line 332, in get_return_value\n format(target_id, \".\", name, value))\n"], "Py4JError", null]}
We tried to play with SPARK_HOME and tried to use findspark in the dagster_pyspark/resources.py file, but had no success.
Any suggestions?
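For reference, a minimal sketch of the findspark approach described above, assuming a Spark distribution installed outside the conda environment; the path is a placeholder, not one from the thread, and the init call has to run before pyspark is first imported:

import findspark

# Placeholder path: findspark.init() wants the root of the Spark distribution.
findspark.init(spark_home="/opt/spark")

# Import pyspark only after findspark has adjusted sys.path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark_home_check").getOrCreate()
print(spark.version)  # should report the version of the Spark install under spark_home
spark.stop()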
max
04/01/2020, 7:55 PM
nate
04/01/2020, 8:35 PM
sephi
04/02/2020, 1:08 PM
dagit
nate
04/02/2020, 4:37 PM
sephi
04/02/2020, 5:54 PM
We tried pyspark 2.3.0 (which also requires python 3.6), but still have the same problems.
When running dagit there is no problem, but when running dagster we get a different error:
java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CONN_INFO_PATH
I'm still trying to understand where you define SPARK_HOME in the conda environment.
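Both the PythonAccumulatorV2 constructor error and the missing _PYSPARK_DRIVER_CONN_INFO_PATH key commonly point to a version mismatch between the pyspark Python package and the Spark installation that SPARK_HOME refers to. A rough sanity check, assuming a standard Spark layout (the code and paths are illustrative, not from the thread):

import os
import subprocess

import pyspark

# Compare the installed pyspark package version with the Spark distribution that
# SPARK_HOME points at; the two should agree.
print("pyspark package version:", pyspark.__version__)
spark_home = os.environ.get("SPARK_HOME", "")
print("SPARK_HOME:", spark_home or "<not set>")

spark_submit = os.path.join(spark_home, "bin", "spark-submit")
if os.path.isfile(spark_submit):
    # The Spark version banner is normally printed to stderr.
    result = subprocess.run(
        [spark_submit, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    print(result.stderr)
else:
    print("spark-submit not found under SPARK_HOME")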
In the composit_solids.py file we call findspark.init(spark_home='/path/to/spark/lib/').
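As a sketch of that placement (illustrative, not the actual composit_solids.py): findspark.init() only takes effect if it runs before the first pyspark import anywhere in the process, and it expects the root of the Spark distribution (the directory containing bin/ and python/) rather than its lib/ subdirectory:

# Top of the pipeline module, before any imports that pull in pyspark.
import findspark

# Placeholder path: point at the Spark distribution root, not .../lib/.
findspark.init(spark_home="/path/to/spark")

import pyspark  # noqa: E402

print(pyspark.__version__)  # sanity check that the intended Spark build is picked up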
max
05/03/2020, 10:21 PM