# integration-dbt
n
Hi! I'm uploading the manifest from a GCP bucket. It reads it but I get this error:
TypeError: '>' not supported between instances of 'str' and 'int'
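import json

from dagster import Definitions
from dagster_dbt import load_assets_from_dbt_manifest
from google.cloud import storage

# Instantiate a Google Cloud Storage client and specify required bucket and file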
storage_client = storage.Client()
bucket = storage_client.get_bucket('dagster-dbt')
blob = bucket.blob('manifest/manifest.json','r')

# Download the contents of the blob as a string and then parse it using json.loads() method

manifest = json.loads(blob.download_as_string(client=None))


dbt_assets = load_assets_from_dbt_manifest(
    manifest,
    key_prefix=None,
)

defs = Definitions(
    assets=dbt_assets,
)
This is how I'm loading the manifest.
o
hi @Nicolas Luchetti! do you have a full stack trace?
n
Hi @owen Do you mean the complete code of the .py?
o
just the full error message, to see where "TypeError: '>' not supported between instances of 'str' and 'int'" is being thrown from
n
oh, ok!
TypeError: '>' not supported between instances of 'str' and 'int'
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/dagster/_grpc/server.py", line 267, in __init__
    self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/dagster/_grpc/server.py", line 116, in __init__
    loadable_targets = get_loadable_targets(
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/dagster/_grpc/utils.py", line 47, in get_loadable_targets
    else loadable_targets_from_python_module(module_name, working_directory)
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 36, in loadable_targets_from_python_module
    module = load_python_module(
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/dagster/_core/code_pointer.py", line 138, in load_python_module
    return importlib.import_module(module_name)
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/nicolas.luchetti/Repos/datateam-dagster-ochestrator-jobs/datacrowd_orchestrator/datacrowd_orchestrator_dagster_dbt/__init__.py", line 4, in <module>
    from .assets import *
  File "/Users/nicolas.luchetti/Repos/datateam-dagster-ochestrator-jobs/datacrowd_orchestrator/datacrowd_orchestrator_dagster_dbt/assets/__init__.py", line 31, in <module>
    blob = bucket.blob('manifest/manifest.json','r')
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/google/cloud/storage/bucket.py", line 795, in blob
    return Blob(
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/google/cloud/storage/blob.py", line 219, in __init__
    self.chunk_size = chunk_size  # Check that setter accepts value.
  File "/Users/nicolas.luchetti/.pyenv/versions/3.10.4/lib/python3.10/site-packages/google/cloud/storage/blob.py", line 262, in chunk_size
    if value is not None and value > 0 and value % self._CHUNK_SIZE_MULTIPLE != 0:
o
ah I see -- I think this might be an artifact of multiple processes trying to download the same file into the same location at the same time (so you end up with invalid data when trying to read from a file that's being written to)
hm that might not be quite right, as you're not downloading the file locally (you're just reading it as a string). does this happen every time you try to run your code, or just sometimes?
regardless, this seems to be happening in the part of your code that's reading from gcs, rather than anything specific to dbt.
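The last two frames of the traceback hint at where the string comes from: in `bucket.blob('manifest/manifest.json','r')`, the second positional argument appears to be taken as `chunk_size`, so `'r'` reaches the `value > 0` comparison. Below is a minimal sketch that reproduces that comparison without touching GCS. `check_chunk_size` and `reproduce` are hypothetical stand-ins that mirror the line shown in the traceback, and the multiple is an assumed value, not the library's actual code.

```python
CHUNK_SIZE_MULTIPLE = 256 * 1024  # assumed value, for illustration only


def check_chunk_size(value):
    """Hypothetical stand-in for the check on the last traceback line."""
    if value is not None and value > 0 and value % CHUNK_SIZE_MULTIPLE != 0:
        raise ValueError(f"Chunk size must be a multiple of {CHUNK_SIZE_MULTIPLE}.")
    return value


def reproduce(value):
    """Return the error text raised for `value`, or None if it is accepted."""
    try:
        check_chunk_size(value)
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


# The stray 'r' from bucket.blob('manifest/manifest.json', 'r')
print(reproduce("r"))
# The default when only the blob name is passed
print(reproduce(None))
```

If that reading is right, calling `bucket.blob('manifest/manifest.json')` with only the blob name should avoid this code path, since there is no file mode to pass when creating a blob handle.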
I might recommend separating the process that downloads your manifest.json file from the import path for your code. a simple option would just be a short python script that runs the download code and saves the manifest to a local file. then in your dagster code, you just read from the local file. whenever you want the newest version of your manifest, you can just run the script (if you're using docker, you can run that script in the docker build step)
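A rough sketch of that split, assuming the bucket and blob names from the snippet earlier in the thread. The local path `target/manifest.json` and the helper names (`download_manifest_bytes`, `save_manifest`, `load_local_manifest`, `main`) are hypothetical and only illustrate the idea.

```python
import json
from pathlib import Path

# Assumed local location for the manifest; adjust to your project layout.
LOCAL_MANIFEST = Path("target/manifest.json")


def download_manifest_bytes(bucket_name: str, blob_name: str) -> bytes:
    """Fetch the raw manifest from GCS (needs google-cloud-storage and credentials)."""
    from google.cloud import storage  # imported lazily so the other helpers work offline

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)  # only the blob name, no mode argument
    return blob.download_as_bytes()


def save_manifest(raw: bytes, path: Path = LOCAL_MANIFEST) -> None:
    """Validate that the payload is JSON, then write it to a local file."""
    manifest = json.loads(raw)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest))


def load_local_manifest(path: Path = LOCAL_MANIFEST) -> dict:
    """What the Dagster module would call at import time instead of hitting GCS."""
    return json.loads(Path(path).read_text())


def main() -> None:
    """Entry point for the standalone download script (e.g. a docker build step)."""
    raw = download_manifest_bytes("dagster-dbt", "manifest/manifest.json")
    save_manifest(raw)
```

The download script would call `main()` whenever a fresh manifest is wanted, while the Dagster code would pass `load_local_manifest()` to `load_assets_from_dbt_manifest`, so importing the module no longer depends on network access.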
n
Thanks Owen! I'll look into what you pointed out.
thanks!
Hi Owen, I was able to do it this way:
# Replace hyphens with underscores in every source's unique_id
for source in manifest["sources"].values():
    source["unique_id"] = source["unique_id"].replace("-", "_")
But when I run local tests, the following happens:
I created the run_results file inside target, but when the run executes, the file gets deleted. It's very weird. Do you know what could be happening?
o
so the run_results.json is expected to be produced automatically during dbt execution (i.e. after dbt successfully completes, there should be a run_results.json file available). when you get that error, it tends to indicate that dbt did not execute successfully -- you might get a better idea of what went wrong in the logs above that message (specifically related to dbt)
it's also a bit unclear to me why that code that modifies the manifest dictionary would be necessary