< Scout> update code below to work with dagster logging impo dagster #ask-ai

Join Slack

<@U0667DNC02Y> update code below to work with dag...

# ask-ai

Peter Mburu

02/22/2024, 10:15 AM

@Scout update code below to work with dagster logging import logging logger = logging.getLogger(name)

Scout

02/22/2024, 10:15 AM

To update your code to work with Dagster's logging system, you can replace the standard Python logging with Dagster's

get_dagster_logger()

utility. This will allow your logs to be captured by Dagster and linked to the relevant computations. Here's how you can modify your code:

Copy code

python from dagster import get_dagster_logger  logger = get_dagster_logger()

Now, you can use

logger

just like you would use a standard Python logger, but the messages will be treated identically to messages created using

context.log

in Dagster. This means that the logs will be captured and displayed in the Dagster UI and other places where Dagster manages logs. Here's an example of how you might use the logger in a function:

Copy code

python from dagster import get_dagster_logger  def my_logged_function():     logger = get_dagster_logger()     for i in range(10):         <http://logger.info|logger.info>("Did %d things!", i+1)

By using

get_dagster_logger()

, you ensure that your logs are integrated with Dagster's logging system without needing to modify function signatures to pass around context.

Peter Mburu

02/22/2024, 10:49 AM

@Scout how to use

Copy code

MetadataValue.url

Scout

02/22/2024, 10:49 AM

The

MetadataValue.url

is a static constructor in Dagster that allows you to wrap a URL string as a

UrlMetadataValue

. This can be used as the value type for the metadata parameter for supported events, such as asset materializations, expectation results, and others. When you use

MetadataValue.url

, the URL will be rendered in the Dagster UI as a clickable link, which can be useful for providing quick access to relevant resources. Here's an example of how you might use

MetadataValue.url

within an operation (op):

Copy code

python from dagster import op, MetadataValue  @op def emit_metadata(context):     # Your logic here     context.log_event(         AssetMaterialization(             asset_key="my_dataset",             metadata={                 "dashboard_url": MetadataValue.url("<http://mycoolsite.com/my_dashboard>"),             }         )     )

In this example,

emit_metadata

is an op that logs an

AssetMaterialization

event. The metadata for this event includes a "dashboard_url" key, which is associated with a URL to a dashboard. By using

MetadataValue.url

, the URL is wrapped in a way that Dagster understands it should be rendered as a clickable link in the UI. Remember that the

AssetMaterialization

and

context.log_event

are part of the Dagster's event logging system, which allows you to log structured events that are integrated with the Dagster UI and tooling.

Peter Mburu

02/22/2024, 11:44 AM

@Scout explain how to use monthly partition

Scout

02/22/2024, 11:44 AM

To use monthly partitions in Dagster, you typically define a

MonthlyPartitionsDefinition

for an asset. This partitions definition will create a set of partitions for your asset, where each partition corresponds to a month. You can specify the start date for the first partition and other optional parameters like timezone and date format. Here's a step-by-step guide on how to use monthly partitions: 1. Define Monthly Partitions: Use the

MonthlyPartitionsDefinition

class to define how your asset will be partitioned on a monthly basis. You need to provide at least a

start_date

parameter. 2. Create an Asset: Decorate a function with the

@asset

decorator and pass the

partitions_def

argument to it, assigning the

MonthlyPartitionsDefinition

you created. 3. Access Partition Information: Inside your asset function, you can access the partition information through the

context

parameter, which provides the start and end times for the current partition. Here's an example of how you might define an asset with monthly partitions:

Copy code

python from dagster import asset, MonthlyPartitionsDefinition import pandas as pd  # Define monthly partitions starting from a specific date monthly_partition_def = MonthlyPartitionsDefinition(start_date="2023-01-01")  @asset(partitions_def=monthly_partition_def) def my_monthly_asset(context):     # Access the partition's time window from the context     start_time = context.partition_start     end_time = context.partition_end      # Your logic here, e.g., querying a database for data within the month     data = pd.DataFrame()  # Replace with actual data fetching logic      return data

In this example,

my_monthly_asset

is an asset that will be partitioned monthly, starting from January 1, 2023. Each partition represents a month, and within the asset function, you can use the

context.partition_start

and

context.partition_end

to fetch data that falls within the current month's time window. When you run this asset, Dagster will handle each partition separately, allowing you to materialize or compute the asset for individual months. This is particularly useful for batch processing scenarios where data is naturally segmented into monthly chunks. Keep in mind that the actual implementation of data fetching and processing will depend on your specific use case. The example provided here is a template to illustrate how to set up monthly partitions with Dagster.

Peter Mburu

02/22/2024, 11:52 AM

@Scout with dyanamic start date

Scout

02/22/2024, 11:52 AM

In Dagster, if you need to define monthly partitions with a dynamic start date, you would typically handle this by programmatically determining the start date based on some logic or condition before creating the

MonthlyPartitionsDefinition

. Dagster's partitioning system expects a fixed start date when defining partitions, so to have a "dynamic" start date, you would need to compute that start date dynamically in your code before passing it to the

MonthlyPartitionsDefinition

. Here's an example of how you might do this:

Copy code

python from dagster import asset, MonthlyPartitionsDefinition import datetime  # Function to determine the dynamic start date def get_dynamic_start_date():     # Your logic to determine the start date     # For example, it could be based on the current date, an external event, or a configuration     today = datetime.datetime.today()     # Set the start date to the first day of the current month     start_date = datetime.datetime(today.year, today.month, 1)     return start_date  # Use the dynamic start date when defining monthly partitions dynamic_start_date = get_dynamic_start_date() monthly_partition_def = MonthlyPartitionsDefinition(start_date=dynamic_start_date)  @asset(partitions_def=monthly_partition_def) def my_monthly_asset(context):     # Access the partition's time window from the context     start_time = context.partition_start     end_time = context.partition_end      # Your logic here, e.g., querying a database for data within the month     # ...      return data

In this example, the

get_dynamic_start_date

function is responsible for determining the start date dynamically. You would replace the logic inside this function with whatever conditions or calculations are appropriate for your use case. Once you have the dynamic start date, you pass it to the

MonthlyPartitionsDefinition

when defining your asset. Keep in mind that this approach only sets the start date dynamically at the time of defining the partitions. Once the partitions are defined, they are fixed and will not change unless you redefine them. If you need to adjust the partitions on a regular basis (e.g., every time you run a pipeline), you would need to ensure that the logic for determining the start date is executed as part of your pipeline's setup or execution process.

Peter Mburu

02/22/2024, 12:18 PM

@Scout with multi-assets can i have partion for each ouput

Scout

02/22/2024, 12:18 PM

As of my knowledge cutoff date in early 2023, in Dagster, when you define a

@multi_asset

, you can specify a single

partitions_def

that applies to all assets produced by the multi-asset. However, each individual output within the multi-asset cannot have its own distinct

partitions_def

. Here's an example of how you might define a multi-asset with a common partition definition:

Copy code

python from dagster import multi_asset, StaticPartitionsDefinition, AssetOut  partitions_def = StaticPartitionsDefinition(["partition1", "partition2", "partition3"])  @multi_asset(     outs={         "output1": AssetOut(),         "output2": AssetOut(),     },     partitions_def=partitions_def, ) def my_multi_asset():     # Your logic here     yield AssetMaterialization(asset_key="output1", partition="partition1")     yield Output(value1, "output1")     yield AssetMaterialization(asset_key="output2", partition="partition2")     yield Output(value2, "output2")

In this example, both

output1

and

output2

will share the same partition definition, which includes "partition1", "partition2", and "partition3". If you need different partitioning schemes for each output, you would typically define separate assets with their own

partitions_def

. As of my knowledge cutoff date, Dagster does not support having different partition definitions for each output within a single

@multi_asset

. If this capability has been added in a version of Dagster released after my knowledge cutoff date, you should refer to the latest Dagster documentation or release notes for updated information on this feature.

Peter Mburu

02/22/2024, 1:20 PM

@Scout agster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "remote_imf_data": The above exception was caused by the following exception: AttributeError: 'AssetExecutionContext' object has no attribute 'partition_start' Stack Trace: File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary yield , File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_utils/__init__.py", line 467, in iterate_with_context next_output = next(iterator) , File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/imf_ingestion.py", line 514, in remote_imf_data period_date= context.partition_start

Scout

02/22/2024, 1:20 PM

The error message you're encountering indicates that within the

remote_imf_data

op, you're trying to access an attribute

partition_start

on an

AssetExecutionContext

object, but this attribute does not exist. This typically happens when you're attempting to use partition-specific attributes in a context that is not set up for partitioned execution. In Dagster, partition-specific attributes like

partition_start

and

partition_end

are available when you're executing a partitioned asset or within a partitioned run. If you're trying to access these attributes outside of a partitioned context, you'll encounter an

AttributeError

. To resolve this issue, ensure that: 1. Your asset is correctly configured with a

partitions_def

. 2. You're executing your asset within a partitioned run context. Here's an example of how you might define a partitioned asset:

Copy code

python from dagster import asset, DailyPartitionsDefinition  partitions_def = DailyPartitionsDefinition(start_date="2021-01-01")  @asset(partitions_def=partitions_def) def my_partitioned_asset(context):     # Now you can safely access partition_start and partition_end     start_time = context.partition_start     end_time = context.partition_end      # Your logic here     # ...

When you execute this asset, Dagster will provide a context that includes

partition_start

and

partition_end

attributes. If you're already doing this and still encountering the error, it's possible that there's a mismatch between how you're executing the asset and its partitioned configuration. Make sure that when you execute the asset, you're specifying the partition you want to run. For example, if you're using the Dagster CLI, you would execute a partitioned asset like this:

Copy code

shell dagster asset materialize -a my_partitioned_asset -p partition_key

Replace

partition_key

with the actual key of the partition you want to materialize. If you're executing the asset within a job, ensure that the job itself is configured to run with partitions and that you're specifying the partition during execution. If you continue to face issues, please check the latest Dagster documentation or reach out to the Dagster community for support, as there may have been updates or changes after my last knowledge update.

Peter Mburu

02/22/2024, 1:37 PM

@Scout fix the code below: @multi_asset( outs={ "cpi_remote_data": AssetOut(is_required=False), # "price_remote_data": AssetOut(is_required=False), "gpd_remote_data": AssetOut(is_required=False), "labor_remote_data": AssetOut(is_required=False), }, internal_asset_deps={ "cpi_remote_data": { AssetKey("cpi_metadata"), AssetKey("get_imf_datasource_documents"), AssetKey("get_imf_data_series"), }, "gpd_remote_data": { AssetKey("gpd_labor_metadata"), AssetKey("get_imf_datasource_documents"), AssetKey("get_imf_data_series"), }, "labor_remote_data": { AssetKey("gpd_labor_metadata"), AssetKey("get_imf_datasource_documents"), AssetKey("get_imf_data_series"), }, }, can_subset=True, partitions_def = monthly_partition_def, ) def remote_imf_data( context: AssetExecutionContext, config: IMFConfig, get_imf_datasource_documents, get_imf_data_series, cpi_metadata, gpd_labor_metadata, ) -> Output[pd.DataFrame]: """ Get a list of all data matching metadata fetched in FetchMetadata task. """ gdp_result = fetch_remote_imf_data( config=config, imf_metadata_df=gpd_labor_metadata, datasource_documents=get_imf_datasource_documents, data_series=get_imf_data_series, document_type="Economic Statistics", domain="IFS", frequency="A", period_date= context.asset_partition_key_for_output() ) yield Output(gdp_result.value, metadata=gdp_result.metadata, output_name="gpd_remote_data") labor_result = fetch_remote_imf_data( config=config, imf_metadata_df=gpd_labor_metadata, datasource_documents=get_imf_datasource_documents, data_series=get_imf_data_series, document_type="Labor Statistics", domain="IFS", frequency="A", period_date= context.asset_partition_key_for_output() ) yield Output(labor_result.value, metadata=labor_result.metadata, output_name="labor_remote_data") cpi_result = fetch_remote_imf_data( config=config, imf_metadata_df=cpi_metadata, datasource_documents=get_imf_datasource_documents, data_series=get_imf_data_series, document_type="PriceIndex", domain="CPI", frequency="M", period_date= context.asset_partition_key_for_output() ) yield Output(cpi_result.value, metadata=cpi_result.metadata, output_name="cpi_remote_data")

Scout

02/22/2024, 1:37 PM

To fix the code, there are a few things we need to address: 1. The return type hint

-> Output[pd.DataFrame]

is incorrect for a

@multi_asset

function because it yields multiple outputs. We should remove the return type hint. 2. The

->

symbol seems to be a misinterpretation of the

->

symbol used for type hints. This should be corrected. 3. The

context.asset_partition_key_for_output()

method requires an

output_name

argument when used within a

@multi_asset

function. We need to pass the appropriate output name for each call to this method. 4. The

monthly_partition_def

needs to be defined before it's used in the

@multi_asset

decorator. 5. Ensure that all necessary imports are included, such as

AssetOut

AssetKey

Output

, and any other domain-specific imports like

IMFConfig

and

fetch_remote_imf_data

. Here's the corrected code: ```python from dagster import multi_asset, AssetOut, AssetKey, Output import pandas as pd # Assuming monthly_partition_def is defined somewhere above this code # monthly_partition_def = ... @multi_asset( outs={ "cpi_remote_data": AssetOut(is_required=False), "gpd_remote_data": AssetOut(is_required=False), "labor_remote_data": AssetOut(is_required=False), }, internal_asset_deps={ "cpi_remote_data": { AssetKey("cpi_metadata"), AssetKey("get_imf_datasource_documents"), AssetKey("get_imf_data_series"), }, "gpd_remote_data": { AssetKey("gpd_labor_metadata"), AssetKey("get_imf_datasource_documents"), AssetKey("get_imf_data_series"), }, "labor_remote_data": { AssetKey("gpd_labor_metadata"), AssetKey("get_imf_datasource_documents"), AssetKey("get_imf_data_series"), }, }, can_subset=True, partitions_def=monthly_partition_def, ) def remote_imf_data( context, config, get_imf_datasource_documents, get_imf_data_series, cpi_metadata, gpd_labor_metadata, ): """ Get a list of all data matching metadata fetched in FetchMetadata task. """ gdp_result = fetch_remote_imf_data( config=config, imf_metadata_df=gpd_labor_metadata, datasource_documents=get_imf_datasource_documents, data_series=get_imf_data_series, document_type="Economic Statistics", domain="IFS", frequency="A", period_date=context.asset_partition_key_for_output("gpd_remote_data") ) yield Output(gdp_result.value, metadata=gdp_result.metadata, output_name="gpd_remote_data") labor_result = fetch_remote_imf_data( config=config, imf_metadata_df=gpd_labor_metadata, datasource_documents=get_imf_datasource_documents, d

Scout

02/22/2024, 1:38 PM

ata_series=get_imf_data_series, document_type="Labor Statistics", domain="IFS", frequency="A", period_date=context.asset_partition_key_for_output("labor_remote_data") ) yield Output(labor_result.value, metadata=labor_result.metadata, output_name="labor_remote_data") cpi_result = fetch_remote_imf_data( config=config, imf_metadata_df=cpi_metadata, datasource_documents=get_imf_datasource_documents, data_series=get_imf_data_series, document_type="PriceIndex", domain="CPI", frequency="M", period_date=context.asset_partition_key_for_output("cpi_remote_data") ) yield Output(cpi_result.value, metadata=cpi_result.metadata, output_name="cpi_remote_data") ``` Make sure to replace the placeholder

monthly_partition_def

with the actual definition of your monthly partitions. Also, ensure that all the functions and classes like

IMFConfig

fetch_remote_imf_data

, and any other custom logic are properly defined and imported in your script.

7 Views

Open in Slack

Previous Next