Tiri Georgiou
06/10/2021, 10:27 AM
import pandas as pd
from dagster import Field, solid


@solid(
    required_resource_keys={"s3"},
    config_schema={
        "file_key": Field(str, is_required=True, description="Path from bucket to file, e.g. data/raw/smmt_raw.csv"),
        "bucket": Field(str, is_required=True, description="Bucket name, e.g. data-team-staging"),
    },
)
def read_from_s3(context) -> pd.DataFrame:
    """read_from_s3 will load a csv from s3.

    Configs:
        file_key (str): Path from bucket name to csv.
        bucket (str): Bucket name located in s3.

    Returns:
        (DataFrame): loaded csv as a pandas DataFrame.
    """
    # Fetch the object through the s3 resource (a boto3 S3 client)
    resp = context.resources.s3.get_object(
        Bucket=context.solid_config["bucket"], Key=context.solid_config["file_key"]
    )
    # Parse the streamed response body into a DataFrame
    df = pd.read_csv(resp["Body"])
    context.log.info(f"Columns of dataframe: {df.columns}")
    return df
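For context, a run config for this solid might look like the sketch below. The values are the examples from the field descriptions above; the helper name `build_read_from_s3_config` is hypothetical and only for illustration, and the sketch assumes the solid keeps the name `read_from_s3` in the pipeline.

```python
def build_read_from_s3_config(bucket: str, file_key: str) -> dict:
    """Build the run config section for the read_from_s3 solid.

    Hypothetical helper: the name and signature are assumptions for
    illustration, not part of Dagster.
    """
    return {
        "solids": {
            "read_from_s3": {
                "config": {
                    "file_key": file_key,
                    "bucket": bucket,
                }
            }
        }
    }


# Example values taken from the config_schema descriptions.
run_config = build_read_from_s3_config(
    bucket="data-team-staging",
    file_key="data/raw/smmt_raw.csv",
)
```

Note that this config carries no credentials: the `s3` resource relies on whatever credentials are resolved in the environment it runs in.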
This solid simply reads in the CSV saved in a bucket, identified by its bucket name and key. It works fine locally. Obviously, locally it's reading from my .aws/credentials, but I suppose the container wouldn't need any of these credentials because it's already got an IAM role? Or is there another config I need to set?
jordan
06/10/2021, 2:07 PM