Adrij Shikhar

06/24/2021, 5:47 AM
Hi, I am trying to upload a Spark DataFrame to a GCP bucket using Google's native SDK, but I could not find an efficient way to do it.
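```python
from google.cloud import storage


def upload_to_gcp(context, df):
    """Uploads a Spark DataFrame to the bucket as newline-delimited JSON."""
    bucket_name = context.resources.gcp_bucket["bucket_name"]
    destination_blob_name = context.resources.gcp_bucket["destination_blob_name"]
    # The credentials argument was cut off in the original post; this resource key is hypothetical.
    storage_client = storage.Client.from_service_account_json(
        context.resources.gcp_bucket["service_account_json"]
    )
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    # collect() pulls every row to the driver; join with '\n' so each record sits on its own line.
    json_df = '\n'.join(df.toJSON().collect())

    # upload_from_string is the Blob method for in-memory data (there is no upload_from_blob).
    blob.upload_from_string(json_df, content_type='application/json')

    print("Upload completed====>")
```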
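In the snippet above, `collect()` brings every row back to the driver and builds one large string in memory before the upload even starts. A minimal sketch of a lower-memory variant, assuming `google-cloud-storage` 1.38 or newer (the release that added `Blob.open`), streams rows to the driver one partition at a time with `toLocalIterator()` and writes them to the blob as newline-delimited JSON. Both helper names are hypothetical. This mainly reduces driver memory; the per-row serialization in `toJSON()` still happens, so it may not make the conversion itself faster.

```python
from typing import Iterable, TextIO


def stream_json_lines(json_lines: Iterable[str], out: TextIO) -> int:
    """Write each JSON record on its own line (NDJSON) and return how many were written."""
    count = 0
    for line in json_lines:
        out.write(line + "\n")
        count += 1
    return count


def upload_df_as_ndjson(df, blob) -> int:
    """Hypothetical helper: stream a Spark DataFrame into a GCS blob as NDJSON.

    Assumes `df` is a pyspark.sql.DataFrame and `blob` is a google.cloud.storage.Blob
    from google-cloud-storage >= 1.38, which added Blob.open().
    """
    # toLocalIterator() fetches one partition at a time, so the driver never holds
    # the whole dataset; toJSON() still serializes every row on the executors.
    rows = df.toJSON().toLocalIterator()
    # Blob.open("w") returns a text-mode writer that uploads in chunks as data arrives.
    with blob.open("w") as out:
        return stream_json_lines(rows, out)
```

Inside the original function, `upload_df_as_ndjson(df, blob)` would replace the `collect()` and `upload_from_string` lines.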
I tried the code above. It works, but the conversion to JSON takes a long time. Is there any way in Dagster to upload data to GCP without using the Google SDK?
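One option that avoids the Google SDK entirely is to let Spark write the DataFrame to a `gs://` path itself: each executor writes its own partition in parallel, and nothing is collected on the driver. This assumes the Cloud Storage connector for Hadoop is installed and authenticated on the Spark cluster (not shown here). In the sketch below, `write_df_to_gcs` and `build_gcs_uri` are hypothetical helpers meant to be called from the body of the op/solid that already receives `context`; `df.write.mode(...).json(...)` is the standard PySpark `DataFrameWriter` API.

```python
def build_gcs_uri(bucket_name: str, prefix: str) -> str:
    """Join a bucket name and an object prefix into a gs:// URI."""
    # Accept "my-bucket" or "gs://my-bucket/" and strip stray slashes from both parts.
    if bucket_name.startswith("gs://"):
        bucket_name = bucket_name[len("gs://"):]
    return f"gs://{bucket_name.strip('/')}/{prefix.strip('/')}"


def write_df_to_gcs(context, df, single_file: bool = False) -> str:
    """Hypothetical helper: write a Spark DataFrame to GCS as JSON from the executors.

    Assumes the GCS connector is on the Spark classpath and that
    context.resources.gcp_bucket has the same keys as in the question.
    """
    cfg = context.resources.gcp_bucket
    uri = build_gcs_uri(cfg["bucket_name"], cfg["destination_blob_name"])
    # coalesce(1) yields a single part file but funnels all data through one task.
    out_df = df.coalesce(1) if single_file else df
    # Spark writes a directory of part-*.json files (one per partition) under `uri`.
    out_df.write.mode("overwrite").json(uri)
    return uri
```

Because the output is a directory of part files rather than a single object, downstream jobs can read it back as one dataset with `spark.read.json(uri)`.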