Since Dagster ops, assets, etc. just wrap Python code, we should be able to find a way to make this work for you without bringing in another tool like Spark (unless Spark is something you want to use).
Here are some ideas based on my understanding of what you're trying to do:
• In a single asset, read the file in batches, process the data, and return the result. This is basically like putting an @asset decorator on a plain Python function that does the reading and processing.
• If you split the read operation out into an op, where each run of the op reads a single batch of the data (i.e., you would need to run the op multiple times to read all of it), you can build your asset as a graph-backed asset. Your graph would look something like this:
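from dagster import graph

# read_data and combine_data are your own @op-decorated functions
@graph
def data_graph():
    # each read_data() call is a separate invocation of the op
    chunk1 = read_data()
    chunk2 = read_data()
    chunk3 = read_data()
    return combine_data(chunk1, chunk2, chunk3)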
Then you can turn this graph into an asset (a graph-backed asset), as described here:
https://docs.dagster.io/concepts/assets/software-defined-assets#graph-backed-assets
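For the first option, the batching itself needs no Dagster-specific machinery. A minimal sketch, assuming a newline-delimited text file: `iter_batches`, `process_batch` (a hypothetical step that just counts non-blank lines), and `read_and_process` are illustrative names, and the default batch size is arbitrary. An `@asset` function would then simply call `read_and_process` with your file path.

```python
from itertools import islice
from typing import Iterator, List


def iter_batches(path: str, batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to batch_size lines so the whole file is never in memory."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            batch = list(islice(f, batch_size))
            if not batch:
                break
            yield batch


def process_batch(lines: List[str]) -> int:
    """Hypothetical per-batch step: count the non-blank lines."""
    return sum(1 for line in lines if line.strip())


def read_and_process(path: str, batch_size: int = 1000) -> int:
    """Read the file batch by batch and combine the per-batch results."""
    total = 0
    for batch in iter_batches(path, batch_size):
        total += process_batch(batch)
    return total


# Inside Dagster (requires `from dagster import asset`), the asset would just wrap it:
# @asset
# def processed_data() -> int:
#     return read_and_process("data.txt")  # hypothetical path
```

Because `iter_batches` is a generator, only one batch is held in memory at a time; swap `process_batch` for your real logic and the running sum for whatever merge your data needs.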