Simon Späti — 03/05/2020, 8:05 AM

nate — 03/05/2020, 10:50 AM

Simon Späti — 03/05/2020, 11:19 AM
spark:
  config:
    spark_conf:
      spark:
        sql:
          extensions: io.delta.sql.DeltaSparkSessionExtension
        delta:
          logStore:
            class: org.apache.spark.sql.delta.storage.S3SingleDriverLogStore
        jars:
          packages: "com.databricks:spark-avro_2.11:3.0.0,com.databricks:spark-redshift_2.11:2.0.1,com.databricks:spark-csv_2.11:1.5.0,org.postgresql:postgresql:42.2.5,org.apache.hadoop:hadoop-aws:2.7.7,org.apache.hadoop:hadoop-common:2.7.7,org.apache.hadoop:hadoop-client:2.7.7,com.amazonaws:aws-java-sdk:1.7.4,io.delta:delta-core_2.11:0.5.0"
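If I'm reading that nesting right, it just flattens to the standard Spark conf keys (spark-defaults.conf style) — the Delta-specific ones are the extension and the S3 log store:

```
spark.sql.extensions        io.delta.sql.DeltaSparkSessionExtension
spark.delta.logStore.class  org.apache.spark.sql.delta.storage.S3SingleDriverLogStore
spark.jars.packages         io.delta:delta-core_2.11:0.5.0,org.apache.hadoop:hadoop-aws:2.7.7,...
```

(package list abbreviated — same coordinates as above)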
and then you can use Delta commands out of the box (at least so far it does what it should 🙂):
data_frame.write \
    .format("delta") \
    .mode(delta_coordinate['config.mode']) \
    .option("mergeSchema", delta_coordinate['config.mergeSchema']) \
    .partitionBy(delta_coordinate['config.partitionBy']) \
    .save(delta_path)
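From the dotted keys, `delta_coordinate` looks like a flat mapping from config paths to values — a minimal sketch of the shape I'd guess (names and values here are hypothetical, just to show the lookup pattern):

```python
# Hypothetical shape of delta_coordinate: a flat dict keyed by dotted
# config paths, mirroring the YAML config (values are illustrative).
delta_coordinate = {
    "config.mode": "append",             # Delta write mode (append/overwrite)
    "config.mergeSchema": "true",        # allow schema evolution on write
    "config.partitionBy": "event_date",  # column to partition the table by
}

# The write call above then just pulls each option out by its dotted key:
write_options = {
    "mode": delta_coordinate["config.mode"],
    "mergeSchema": delta_coordinate["config.mergeSchema"],
    "partitionBy": delta_coordinate["config.partitionBy"],
}
print(write_options["mode"])  # → append
```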
Probably worth an example, or worth highlighting somewhere.