drogozin
10/26/2022, 8:21 AM
from typing import List

def process_file(file_names: List[str]):
    # mask files
    gdal_merge("-o", "output.tif", file_names)  # the gdal_merge API saves the file behind the scenes and doesn't return anything.
jamie
10/26/2022, 2:12 PM
drogozin
10/27/2022, 9:23 AM
from dagster import AssetIn, asset

@asset(ins={"modis_data": AssetIn()})
def my_asset(modis_data):
    output = process_modis_data(modis_data)
    return output
where output represents the processed files.
So that would be the ideal scenario. Unfortunately, GDAL functions don't return values; they store the processed file as a side effect. For GDAL functions you pass an input_path and an output_path, and after the function completes, the output_path file has been created. So the GDAL library doesn't fit the ideal scenario I described above. It's more like this (and I don't know how to handle it):
@asset(ins={"modis_data": AssetIn()})
def my_asset(modis_data):
    process_modis_data(modis_data, outPath)
    ## NO OUTPUT HERE
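A common workaround is to wrap the side-effect-only call so the wrapper returns the path it just wrote; the asset can then return that path like any other value. Below is a plain-Python sketch under stated assumptions: `returns_output_path` is a hypothetical helper (not part of GDAL or Dagster), and `fake_gdal_translate` is a stand-in that copies bytes instead of calling GDAL.

```python
import functools
import os
import shutil
import tempfile
from typing import Callable


def returns_output_path(func: Callable[[str, str], None]) -> Callable[[str, str], str]:
    """Hypothetical helper: adapt a function that writes `output_path` and
    returns None so that it returns `output_path` instead."""

    @functools.wraps(func)
    def wrapper(input_path: str, output_path: str) -> str:
        func(input_path, output_path)
        # Side-effect-only calls can fail quietly, so confirm the file exists
        # before handing its path to downstream code.
        if not os.path.exists(output_path):
            raise FileNotFoundError(f"{func.__name__} did not create {output_path}")
        return output_path

    return wrapper


@returns_output_path
def fake_gdal_translate(input_path: str, output_path: str) -> None:
    # Stand-in for a GDAL call (assumption): writes the output file, returns nothing.
    shutil.copyfile(input_path, output_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.tif")
        with open(src, "wb") as f:
            f.write(b"raster bytes")
        out = fake_gdal_translate(src, os.path.join(tmp, "out.tif"))
        print(out)  # the path that an asset could now return
```

With a wrapper like this, `my_asset` could end with `return process_modis_data(modis_data, outPath)`, assuming `process_modis_data` is decorated the same way; whatever IO manager is configured then receives the path.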
jamie
10/27/2022, 1:26 PM
Output with metadata of the filepaths, and write a custom IO manager to supply those paths (or load the images) to downstream assets.
For option 2, it would look something like:
from dagster import Output, asset

@asset
def my_asset(modis_data):
    process_data(modis_data, output_path)
    return Output(None, metadata={"filepath": output_path})
Here is our documentation for writing an IO manager, and I'm happy to walk you through that process if you decide to go that route!
Personally, I think option 1 sounds easiest, but I may be missing context/requirements that make option 2 the better option.
drogozin
10/31/2022, 5:24 PM
from dagster import AssetIn, OpExecutionContext, asset

@asset(
    ins={
        "modis_input_paths": AssetIn("modis_asset"),
    },
    io_manager_key="modis_asset_io_manager",
)
def modis_preproc_asset(
    context: OpExecutionContext,
    modis_input_paths,
):
    path = process_logic(modis_input_paths, *params)
    return path
@asset()
def modis_usage(context, modis_preproc_asset):
    context.log.info(modis_preproc_asset)  # this doesn't log the path returned by the function above
I believe I am misusing the IO manager here. I overrode the "load_input" method, and it works as expected.
But in handle_output I do nothing except logging, because the output should just be the path of the output files, and this method doesn't return anything. My next approach will be to override "handle_output" of the IO manager and store the output path somewhere, so the downstream asset can read it.
jamie
10/31/2022, 5:45 PM
The modis_usage asset is using the default IO manager (the filesystem IO manager, unless you specify otherwise in your run config). The FS IO manager stores the output of an asset on your file system, and when that output is loaded as an input to another asset, it reads from that file. So in modis_usage, the FS IO manager is trying to read from a file where it thinks the upstream output was stored. But (based on my understanding) the IO manager for the upstream output (modis_asset_io_manager) only logs info when handle_output is called. Instead, it would need to store the file paths in a file that the downstream asset can find.
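A plain-Python sketch of "store the file paths in a file that the downstream asset can find" (no Dagster import; the class only mirrors the shape of an IO manager's `handle_output`/`load_input`, and the JSON registry file is an assumption): `handle_output` records the returned path under the asset's name instead of just logging it, and `load_input` hands that path to the downstream asset.

```python
import json
import os
import tempfile
from typing import Dict


class PathRegistryIOManager:
    """Toy IO manager that stores file *paths*, not file contents.

    Only mirrors the shape of a Dagster IO manager (handle_output /
    load_input); it takes plain asset names instead of context objects.
    """

    def __init__(self, registry_file: str) -> None:
        self.registry_file = registry_file

    def _read(self) -> Dict[str, str]:
        if not os.path.exists(self.registry_file):
            return {}
        with open(self.registry_file) as f:
            return json.load(f)

    def handle_output(self, asset_name: str, path: str) -> None:
        # Record the path instead of only logging it, so a later step
        # (even in another process) can look it up.
        registry = self._read()
        registry[asset_name] = path
        with open(self.registry_file, "w") as f:
            json.dump(registry, f)

    def load_input(self, upstream_asset_name: str) -> str:
        # Give the downstream asset the path the upstream asset produced.
        registry = self._read()
        if upstream_asset_name not in registry:
            raise KeyError(f"no path recorded for {upstream_asset_name!r}")
        return registry[upstream_asset_name]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manager = PathRegistryIOManager(os.path.join(tmp, "paths.json"))
        manager.handle_output("modis_preproc_asset", "/data/processed/merged.tif")
        print(manager.load_input("modis_preproc_asset"))
```

Because the registry lives on disk rather than in memory, a fresh manager instance in a different step process can still resolve the path.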
You can change how you're specifying the IO manager for the first asset so that it is only responsible for loading the input asset, by using the input_manager_key parameter of AssetIn. This lets you specify an IO manager that is only used to load the corresponding asset. So if you set modis_asset_io_manager as the input_manager_key, then the default IO manager would still be used to store the output of modis_preproc_asset and load it as the input to modis_usage.
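A plain-Python model of that arrangement, as a sketch only (no Dagster import; Dagster resolves this internally, and `PickleFSIOManager` and `load_upstream` are hypothetical names). The default filesystem IO manager is modelled as pickling whatever `modis_preproc_asset` returns, and `io_manager` is used as the key for that default manager. An input that sets an input manager key is loaded by that manager; every other input is loaded by the manager that stored the upstream output.

```python
import os
import pickle
import tempfile
from typing import Any, Callable, Dict, Optional


class PickleFSIOManager:
    """Simplified model of a filesystem IO manager: one pickle file per asset."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = base_dir

    def _path_for(self, asset_name: str) -> str:
        return os.path.join(self.base_dir, f"{asset_name}.pkl")

    def handle_output(self, asset_name: str, obj: Any) -> None:
        # Whatever the asset returns (here, a path string) is pickled to disk.
        os.makedirs(self.base_dir, exist_ok=True)
        with open(self._path_for(asset_name), "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, upstream_asset_name: str) -> Any:
        # Raises FileNotFoundError if nothing was ever stored for the asset,
        # which is what happens when the upstream IO manager only logs.
        with open(self._path_for(upstream_asset_name), "rb") as f:
            return pickle.load(f)


def load_upstream(
    upstream_asset: str,
    upstream_io_manager_key: str,
    loaders: Dict[str, Callable[[str], Any]],
    input_manager_key: Optional[str] = None,
) -> Any:
    """Choose who loads one asset input.

    An input that declares input_manager_key is loaded by that manager;
    otherwise the IO manager that stored the upstream output loads it.
    """
    key = input_manager_key if input_manager_key is not None else upstream_io_manager_key
    return loaders[key](upstream_asset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fs = PickleFSIOManager(os.path.join(tmp, "storage"))
        loaders = {
            # Hypothetical input-only manager that lists raw MODIS tiles.
            "modis_asset_io_manager": lambda name: [f"/raw/{name}/tile1.hdf"],
            # Default manager: reads back whatever was pickled for the asset.
            "io_manager": fs.load_input,
        }
        raw = load_upstream("modis_asset", "io_manager", loaders,
                            input_manager_key="modis_asset_io_manager")
        fs.handle_output("modis_preproc_asset", "/processed/merged.tif")
        print(raw, load_upstream("modis_preproc_asset", "io_manager", loaders))
```

The design point is that the upstream asset's return value, even a bare path string, round-trips through the default manager unchanged, so the custom manager only has to know how to find the raw inputs.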
Let me know if any of that doesn't make sense or doesn't work for you!
drogozin
11/02/2022, 9:42 AM