# ask-community
Hi! I’ve been trying to build a Pythonic IO Manager for Delta Lake. It can currently write Spark/pandas DataFrames, but it fails to read Spark DataFrames with this error message:
Unknown resource `spark`. Specify `spark` as a required resource on the compute / config function that accessed it.
My spark resource is supplied to my root `Definitions` object, and ops/assets can interact with it directly.
import os

RESOURCES_LOCAL = dict(
    ...,
    spark=PySparkResource(...)  # My custom resource
)

DEPLOYMENTS = {"local", "staging", "prod"}
resources_by_deployment_name = {
    "prod": RESOURCES_PROD,
    "local": RESOURCES_LOCAL,
}
deployment_name = os.environ.get("DAGSTER_DEPLOYMENT", "local")
# Check against the mapping itself, since DEPLOYMENTS also lists
# "staging", which has no entry in resources_by_deployment_name
assert deployment_name in resources_by_deployment_name

defs = Definitions(
    assets=all_assets,
    resources=resources_by_deployment_name[deployment_name],
    schedules=[],
    sensors=all_sensors,
)
Is there a way to set a required resource directly on a `ConfigurableIOManager`? Or is the best practice to wrap it with the `@io_manager` decorator? I tried digging through the docs and this channel, and this is the closest issue I could find.
Hi Ian, I think you might be seeing this error because you haven't specified `spark` as a resource on your IO manager. You can do this by defining your IO manager like:
@io_manager(required_resource_keys={"spark"})
def my_io_manager(context):
    spark = context.resources.spark  # now available on the init context
    return ...  # construct and return your IO manager instance here