Oliver
08/29/2022, 6:52 AM
`AssetsDefinition.from_op` in the datascientists role, but this takes away some of the magic and ease of SDAs IMO
I had a look at the example projects, but nothing really seemed to match my use case; the relevant examples tend to use partitioning for scaling, which is a little difficult to implement in my situation since the number of partitions is data-dependent
so a few questions if I may
1. suggestions for project layout that could facilitate this
2. why shouldn't assets be used as ops -- not sure why not, but I thought this was an intended mechanic
3. suggestions on any other workflows that might fit this use case
thanks 🙂
Oliver
08/29/2022, 6:52 AM
from math import ceil
from itertools import starmap

import pandas as pd
from dagster import DynamicOut, DynamicOutput, asset, graph, op

@asset(
    io_manager_key='inferences_io'
)
def inference(model: FlairNerModel, dataset: list[object]):
    predictions = model.predict_many(dataset)
    return predictions
@op(out=DynamicOut())
def batch(context, dataset):
    batch_size = context.op_config['batch_size']
    n_batches = int(ceil(len(dataset) / batch_size))
    get_batch = lambda i: dataset.iloc[i * batch_size:(i + 1) * batch_size]
    batches = map(get_batch, range(n_batches))
    indexed_batches = zip(batches, range(n_batches))
    wrap_batches = lambda data, idx: DynamicOutput(data, mapping_key=str(idx))
    outputs = starmap(wrap_batches, indexed_batches)
    yield from outputs
@op
def collect(results):
    return pd.concat(results, ignore_index=True)
@graph
def batch_inference(model, dataset):
    batch_size = 50
    batcher = batch.configured({'batch_size': batch_size}, f'batch_{batch_size}')
    batches = batcher(dataset)
    inferenced = batches.map(lambda x: inference(model, x))
    return collect(inferenced.collect())
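The slicing arithmetic in the `batch` op can be sanity-checked outside Dagster with plain lists — a minimal sketch, assuming a list input so ordinary slicing stands in for `.iloc`; `split_batches` is a hypothetical helper, not part of the code above:

```python
from math import ceil
from itertools import starmap

def split_batches(dataset, batch_size):
    # Same arithmetic as the batch op: ceil-divide so a ragged tail is covered.
    n_batches = int(ceil(len(dataset) / batch_size))
    get_batch = lambda i: dataset[i * batch_size:(i + 1) * batch_size]
    # Pair each slice with its index, mirroring mapping_key=str(idx).
    indexed = zip(map(get_batch, range(n_batches)), range(n_batches))
    return list(starmap(lambda data, idx: (str(idx), data), indexed))

batches = split_batches(list(range(10)), batch_size=4)
# Three batches of 4 + 4 + 2 elements, keyed "0", "1", "2".
```

Every element lands in exactly one batch, and the string keys are what Dagster later uses to name the mapped dynamic steps.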
jamie
08/29/2022, 4:04 PM
i think your `inference` asset should actually be an op, since you are using it to perform the same task on a variety of inputs. without knowing more about your use case, i think what could potentially be assets are the `dataset` input to your graph and the output of the `batch_inference` graph
sandy
08/29/2022, 5:16 PM
sandy
08/29/2022, 11:53 PM
the way to use `inference` as an asset is:
inferenced = batches.map(lambda x: inference.op(model, x))
Oliver
08/30/2022, 8:33 AM
sandy
08/30/2022, 3:36 PM
Oliver
09/01/2022, 10:00 PM