#integration-bigquery
# integration-bigquery

Qwame

05/09/2023, 7:28 PM
I'm using the BigQuery pandas IO manager to load data in partitions. I noticed that when the first partition is an empty DataFrame, the table is created in BQ but the data types are all wrong, which is expected because there's no data to infer the types from. Could we add something to the `handle_output` section of the pandas type handler file that skips table creation when the DataFrame has 0 rows, or some sort of check to ensure that 0-row DataFrames are not passed as output to the IO manager?
🤖 1
I mocked up something like this:
# Load only when there are rows, so BigQuery can infer column types from data.
if obj.shape[0] > 0:  # or table_slice.job_config.schema:
    job = connection.load_table_from_dataframe(
        dataframe=with_lowercase_cols,
        destination=f"{table_slice.schema}.{table_slice.table}",
        project=table_slice.database,
        location=context.resource_config.get("location")
        if context.resource_config
        else None,
        timeout=context.resource_config.get("timeout")
        if context.resource_config
        else None,
        # job_config=table_slice.job_config if table_slice.job_config else None,
    )
    job.result()
elif context.has_partition_key:
    # An empty partition is skipped with a warning instead of creating the table.
    context.log.warning(f"The partition has {obj.shape[0]} rows")
else:
    check.failed(
        f"The object has {obj.shape[0]} rows. Please provide a `job_config` since auto-detection of data types can be wrong without data"
    )
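For what it's worth, the branching in that mock-up could probably be pulled out into a small standalone helper so it can be checked without a BigQuery connection. This is only a rough sketch: `LoadAction` and `decide_load_action` are hypothetical names I made up for illustration, not part of dagster-gcp-pandas, and the `has_explicit_schema` flag assumes a `job_config` with a schema could be supplied somehow, which the current type handler may not support.

```python
from enum import Enum


class LoadAction(Enum):
    """Hypothetical outcomes for an output about to be written to BigQuery."""

    LOAD = "load"  # run the load job
    SKIP = "skip"  # empty partition: log a warning and do nothing
    FAIL = "fail"  # empty unpartitioned output with no schema: raise


def decide_load_action(
    row_count: int,
    has_partition_key: bool,
    has_explicit_schema: bool = False,
) -> LoadAction:
    """Mirror the mock-up's if/elif/else as a pure function.

    `has_explicit_schema` is an assumption: it stands in for the
    commented-out `table_slice.job_config.schema` check.
    """
    if row_count > 0 or has_explicit_schema:
        # Either there is data to infer types from, or types are given.
        return LoadAction.LOAD
    if has_partition_key:
        # Other partitions may carry data later, so skipping is safe.
        return LoadAction.SKIP
    # Nothing to infer types from and no later partition to fix it.
    return LoadAction.FAIL
```

Inside `handle_output` this would be called with something like `decide_load_action(obj.shape[0], context.has_partition_key)`, and the existing load, warning, and `check.failed` branches would hang off the returned value.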

sean

05/09/2023, 10:03 PM
This sounds very reasonable. Would you mind opening this as a GH issue? That’s going to be a better forum.
👍 1

Qwame

05/10/2023, 2:15 PM
Issue created here