sephi
04/01/2020, 7:48 PM
Running dagster with pyspark fails, but running dagit with the exact same pipeline succeeds.
The command below succeeds with dagit:
(conda_env) myuser@myserver:~/projects/formatter> dagit -f dagster_pyspark/compsite_solids_types.py -n pyspark_pipeline -p 3001
Loading repository...
Serving on <http://127.0.0.1:3001> in process 73783
2020-04-01 20:45:35 - dagster - DEBUG - pyspark_pipeline - c56443eb-fd20-4d61-863a-7af6a1bfc4a4 - ENGINE_EVENT - Starting initialization of resources [spark].
event_specific_data = {"error": null, "marker_end": null, "marker_start": "resources", "metadata_entries": []}
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2) overrides detected (/opt/cloudera/parcels/SPARK2/lib/spark2/).
WARNING: Running spark-class from user-defined location.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2020-04-01 20:45:45 - dagster - DEBUG - pyspark_pipeline - c56443eb-fd20-4d61-863a-7af6a1bfc4a4 - ENGINE_EVENT - Finished initialization of resources [spark].
event_specific_data = {"error": null, "marker_end": "resources", "marker_start": null, "metadata_entries": [["spark", "Initialized in 10.05s", ["dagster_pyspark.resources", "SystemPySparkResource"]]]}
2020-04-01 20:45:45 - dagster - DEBUG - pyspark_pipeline - c56443eb-fd20-4d61-863a-7af6a1bfc4a4 - PIPELINE_START - Started execution of pipeline "pyspark_pipeline".
But when running the same pipeline with dagster, the pyspark initialization fails:
(conda_env) myuser@myserver:~/projects/formatter> dagster pipeline execute -f dagster_pyspark/compsite_solids_types.py -n pyspark_pipeline -e dagster_pyspark/test_pyspark.yaml
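The contents of dagster_pyspark/test_pyspark.yaml were not shared. For context, a minimal run-config sketch of what such a file typically looked like for dagster-pyspark of that era is below; the resource name "spark" matches the ENGINE_EVENT log above, but every key and value here is an assumption, not the poster's actual file:

```yaml
# Hypothetical test_pyspark.yaml sketch (the poster's real file was not shared).
# Configures the "spark" resource that the engine log reports initializing.
resources:
  spark:
    config:
      spark_conf:
        spark.app.name: pyspark_pipeline   # assumed app name
        spark.master: "local[*]"           # assumed local master; a cluster URL would go here instead
```

When dagit succeeds but the CLI fails with the same pipeline, a common first check is that the CLI is actually picking up the same environment config (the -e file) and the same SPARK_HOME as the dagit process.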