Does anybody know how to cope with this (docker ...
# ask-community
m
Does anybody know how to cope with this (docker executor):
dagster._core.errors.DagsterTypeCheckDidNotPass: Type check failed for step output "result" - expected type "DataFrame". Description: Value of type <class 'NoneType'> failed type check for Dagster type DataFrame, expected value to be of Python type pandas.core.frame.DataFrame.
Stack Trace:
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_plan.py", line 224, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 363, in core_dagster_event_sequence_for_step
    for evt in _type_check_and_store_output(step_context, user_event, input_lineage):
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 417, in _type_check_and_store_output
    for output_event in _type_check_output(step_context, step_output_handle, output, version):
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/execute_step.py", line 283, in _type_check_output
    raise DagsterTypeCheckDidNotPass(
A test for the asset run directly from the container passes, and so does a local test (data is parsed properly from the DB); however, materialization doesn't work at all with the Docker runner.
😪 1
Strange! Changing:
return result
To:
return pd.DataFrame(result)
Fixed the problem... which makes little sense, since result already is a pd.DataFrame...
a
based on
Value of type <class 'NoneType'> failed type check
you are getting None as the value for some reason. The solution you posted above passes the type check by making an empty dataframe from the None value. When you run the container directly - do you have to do any volume mounting or env var passing? Do you have the docker executor configured to do the same?
❤️ 1
m
Alex, I pass the ENVs to the Docker image through docker compose, and as I checked with Portainer, they exist in it and are mapped properly. Pytest on the asset passes:
def test_get_sales_assets():
    assets = [
        get_sales,
        head_sales
    ]
    result = materialize(assets)
    assert result.success
    actual_frame = get_sales()
    assert len(actual_frame) > 50
    assert type(actual_frame) == pd.DataFrame
    data_to_str = actual_frame.to_string(
        columns=["sale_number"], header=False, index=False, index_names=False)
    assert "0801923901" in data_to_str
However... in Dagit itself, I cannot inspect the data. Is there any way to debug the container runner? Should I be able to view the data in the asset or not? The code I use to fetch data looks like this (query aside):
@asset(group_name="rusty_assets")
def get_sales() -> pd.DataFrame:
    conn = SqlServerDomainConnection("srv", "sales", 1433, DOMAIN_NAME, DOMAIN_USR, DOMAIN_PWD)
    result = conn.query("""SELECT
        store, sale_number, sale_value
        FROM dbo.POS_Sales
        WHERE sale_date = DATEADD(DAY, -1, GETDATE())
        AND store = 1""")
    return result
The code behind the connection class builds the engine with:
def __post_init__(self):
    if (self.host_name, self.database, self.host_port, self.domain_name, self.domain_user, self.domain_password) is not None:
        temp_engine = f"mssql+pymssql://{self.domain_name}\\{self.domain_user}:{self.domain_password}@{self.host_name}:{self.host_port}/{self.database}"
        self.engine = create_engine(temp_engine)
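One thing worth flagging in the guard above: a tuple object is never None, so comparing the whole tuple to None always passes, even when every field inside it is None. A minimal sketch of the pitfall and a per-field check that actually catches missing values:

```python
# A tuple is never None, so this style of guard always passes,
# even when every field inside the tuple is None:
fields = (None, None, None)
assert fields is not None  # always True, regardless of contents

# Checking each field individually catches the missing values:
assert not all(v is not None for v in fields)
```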
The query method on the connection is dead simple:
def query(self, database_query: str):
    try:
        if self.resolve_host() == 1:
            query_result = pd.read_sql(database_query, self.engine)
            self.engine.dispose()
            return query_result
        elif self.resolve_host() == 0:
            print(
                "Error while resolving hostname: {}".format(
                    self.host_name))
    except SQLAlchemyError as err:
        print(err)
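This except/print pattern is a likely source of the None: if read_sql raises, or resolve_host() returns 0, the function prints and then falls off the end, implicitly returning None to the caller. A minimal sketch of the pattern and a raising variant (names are hypothetical, not from the posted code):

```python
class QueryError(Exception):
    """Hypothetical error type for the raising variant."""

def query_swallowing(fail: bool):
    # mirrors the posted pattern: the except block prints and falls
    # through, so the caller silently receives None
    try:
        if fail:
            raise RuntimeError("connection refused")
        return [("store", 1)]
    except RuntimeError as err:
        print(err)

def query_raising(fail: bool):
    # re-raising instead surfaces the real failure at the call site,
    # rather than a downstream type-check error on a None value
    try:
        if fail:
            raise RuntimeError("connection refused")
        return [("store", 1)]
    except RuntimeError as err:
        raise QueryError("query failed") from err

print(query_swallowing(True))  # → None
```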
PS: The query returns the data on its own when used in plain Python code, and the query alone works in MSSQL as well.
I think I might have found the reason. First I added (as you pointed out) the ENVs to dagster.yaml for the Docker runner, but it didn't work, since the runner threw an error about a missing ENV in the image. Then I added the ENVs to the environments of both services in docker compose: core and daemon. I removed the conversion to pd.DataFrame (leaving the type annotation on the function itself) and now... I think it works (see the image). In my opinion it would be good to add to the docs that for the Docker runner, the ENVs need to be passed not only to the Docker service from which run containers inherit, but also to the daemon. Best regards, Mike
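The fix described above can be sketched as a docker-compose fragment; service names here are illustrative assumptions, and the point is that the same variables appear under both services:

```yaml
# illustrative docker-compose fragment: the same env vars go to both
# the code/core service and the daemon that launches the runs
services:
  dagster_core:
    environment:
      DOMAIN_NAME: ${DOMAIN_NAME}
      DOMAIN_USR: ${DOMAIN_USR}
      DOMAIN_PWD: ${DOMAIN_PWD}
  dagster_daemon:
    environment:
      DOMAIN_NAME: ${DOMAIN_NAME}
      DOMAIN_USR: ${DOMAIN_USR}
      DOMAIN_PWD: ${DOMAIN_PWD}
```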