# ask-community
e
Hello support, I have been using Dagster at PoC level for a while and have been happy with it. However, now that I have started on somewhat more complicated use cases, I am struggling with the documentation. Right now, I can't find in the code or in any documentation how to specify the
"metadata": {"partition_expr": {"<partition_column_name_1>": "anystring", "<partition_column_2>": "anystring"}}
for a multi-partitioned asset stored with the Snowflake I/O manager. It accepts any dictionary, but the SQL it produces is wrong:
SELECT * FROM <tablename> WHERE
None in ('<partitionkey-1>') AND
None in ('<partitionkey-2>')
(Sorry, I never learned when to press Ctrl-Enter versus just Enter in Slack.) I guess I am rather close: I just need to understand what "anystring" should be so that it does not become None.
Or is this a bug? I tried to read the tests, and it seems that it should be a dictionary whose values are the "expression", in my case just the column names. That is, should this work, given that my asset returns a DataFrame with the named columns?
"metadata": {
    "partition_expr": {
        "<partition_column_name_1>": "<partition_column_name_1>",
        "<partition_column_2>": "<partition_column_2>"
    }
}
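One way the None could arise, sketched in plain Python rather than taken from the Dagster source: assume the I/O manager looks up each column expression in partition_expr by the name of the partition dimension, not by the column name. That lookup rule, the function build_where_clause, and the dimension names "entity" and "year" are all assumptions made for illustration; only the column names entity_id and year_id come from this thread.

```python
from typing import Mapping, Optional, Sequence


def build_where_clause(
    dimension_names: Sequence[str],
    partition_expr: Mapping[str, str],
    partition_keys: Mapping[str, str],
) -> str:
    """Build a WHERE clause, looking up each column expression by dimension name.

    Hypothetical helper: this mimics an assumed lookup rule, not Dagster's code.
    """
    conditions = []
    for name in dimension_names:
        # A missing key yields None, which is then rendered literally into the SQL.
        column: Optional[str] = partition_expr.get(name)
        conditions.append(f"{column} in ('{partition_keys[name]}')")
    return "WHERE\n" + " AND\n".join(conditions)


# Keyed by (hypothetical) dimension name: the column names reach the SQL.
good = build_where_clause(
    ["entity", "year"],
    {"entity": "entity_id", "year": "year_id"},
    {"entity": "a", "year": "1"},
)

# Keyed by column name instead: every lookup misses and renders as None.
bad = build_where_clause(
    ["entity", "year"],
    {"entity_id": "entity_id", "year_id": "year_id"},
    {"entity": "a", "year": "1"},
)

print(good)
print(bad)
```

If that assumption holds, the fix would be to key the dictionary by the dimension names used in the multi-partition definition and keep the column names as the values.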
Maybe I am doing this all wrong. If I have a static set of companies but different subdimensions for each, I guess a multi-partition structure doesn't work? That is, the dynamic partition would be different for each company, so it is more of a tree than a matrix. Would I need to manage this myself in a single dynamic partition instead?
t
Hmm, do you mind clarifying what the partitions are? That does look like the right syntax for defining multi-partitions with I/O managers. Is the second partition column a dynamic partition?
e
Yes, the multi-partition is a [StaticPartitionsDefinition, DynamicPartitionsDefinition], where the static partition is a list of entity ids and the dynamic partition is populated by a sensor that retrieves a "year id" from an API. If there were a YearlyPartitionDefinition I would maybe have used it instead. The asset is persisted correctly, but I suspect that a re-run would cause duplicates, because the next time the partition a|1 is run, I would expect the query
SELECT * FROM <tablename> WHERE
entity_id in ('a') AND
year_id in ('1')
but instead I get the query shown above, with None instead of the column names, and I can see in the Snowflake query log that a query failed. (For a while now, failed queries have no longer been visible in the Snowflake log unless you are accountadmin, bummer.) I can follow this up as "community service", but as I mentioned, I think I made a mistake. Each of my entities has a different list of values for the year ids, so I assume I can't use the combinatorial result that the multi-partition would create. For example, if entity a has years [1,2,3], b has [1,2] and c has just [1], this would not produce the correct result, because the multi-partition would create all partition combinations from a|1 => c|3.
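To make the tree-versus-matrix point concrete, here is a small plain-Python sketch using the example from the message above (a has years 1 to 3, b has 1 and 2, c has 1). It compares the full cross product with a flat list of composite keys, which is one way a single dynamic partition could be managed by hand. The "|" separator and the helper names are assumptions for illustration, not a Dagster convention.

```python
from itertools import product
from typing import Dict, List

# Example data from the thread: each entity has its own list of year ids.
years_by_entity: Dict[str, List[str]] = {
    "a": ["1", "2", "3"],
    "b": ["1", "2"],
    "c": ["1"],
}


def composite_keys(tree: Dict[str, List[str]], sep: str = "|") -> List[str]:
    """Flatten the entity -> years tree into one key per valid combination."""
    return [f"{entity}{sep}{year}" for entity, years in tree.items() for year in years]


def cross_product_keys(tree: Dict[str, List[str]], sep: str = "|") -> List[str]:
    """All entity x year combinations, as a two-dimensional matrix would give."""
    all_years = sorted({year for years in tree.values() for year in years})
    return [f"{entity}{sep}{year}" for entity, year in product(tree, all_years)]


valid = composite_keys(years_by_entity)
matrix = cross_product_keys(years_by_entity)
# Combinations that exist in the matrix but not in the real data.
invalid = sorted(set(matrix) - set(valid))

print(valid)
print(invalid)
```

With this data the matrix holds nine keys while only six are real, so three partitions (b|3, c|2, c|3) would never have any rows behind them.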
t
Aaah, okay, just figuring out the best way to help you! If one of your partitions is yearly, you can make your own TimeWindowPartitionsDefinition with a cron schedule.
e
Thanks for the tip. And then I guess I would need a second pointer on how to combine https://stackoverflow.com/questions/73983827/how-can-i-do-an-incremental-load-based-on-record-id-in-dagster with partitions. That is, for the current year, I will need to reload more often than once per partition. The API endpoint behaves more or less identically to the one in the SO question, except that I have a "lastmodified" timestamp instead of an increasing cursor. Would something like this work, or do I need to adapt the lookup of the last materialization to include the partition?
latest_materialization_event = context.instance.get_latest_materialization_events(
    [context.asset_key_for_output()]
).get(context.asset_key_for_output())
if latest_materialization_event:
    materialization = (
        latest_materialization_event.dagster_event.event_specific_data.materialization
    )
    metadata = {entry.label: entry.entry_data for entry in materialization.metadata_entries}
    lastmodified = metadata["lastmodified"].value
else:
    lastmodified = None
There seems to be an implementation here https://github.com/dagster-io/dagster/pull/6047 but I can't figure out how to get from the context to the asset node, and I can't find any docs about any helpers.
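The partition-aware part of that question can be sketched without any Dagster calls: the idea is to filter materialization records by partition key before taking the most recent one, so that each partition keeps its own "lastmodified" cursor. MaterializationRecord and latest_lastmodified are hypothetical stand-ins, and how the records would actually be fetched from the Dagster instance is left open here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MaterializationRecord:
    """Hypothetical stand-in for one materialization event of the asset."""

    partition_key: str
    timestamp: float  # when the materialization happened
    lastmodified: str  # cursor value stored as metadata on that run


def latest_lastmodified(
    records: Iterable[MaterializationRecord], partition_key: str
) -> Optional[str]:
    """Return the cursor from the newest record of one partition, or None."""
    matching = [r for r in records if r.partition_key == partition_key]
    if not matching:
        # No earlier run for this partition: do a full load.
        return None
    return max(matching, key=lambda r: r.timestamp).lastmodified


records = [
    MaterializationRecord("a|1", 100.0, "2023-01-01"),
    MaterializationRecord("a|1", 200.0, "2023-02-01"),
    MaterializationRecord("b|1", 300.0, "2023-03-01"),
]

print(latest_lastmodified(records, "a|1"))
print(latest_lastmodified(records, "c|1"))
```

Without the partition filter, the newest record overall (b|1 here) would supply the cursor for every partition, which is the situation the question is worried about.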