Kevin Schaich
06/07/2023, 8:53 PM
sandy
06/07/2023, 9:08 PM
Kevin Schaich
06/07/2023, 9:12 PM
DockerRunLauncher, and then using dagster_pyspark to get a spark session
Kevin Schaich
06/07/2023, 9:12 PM
Kevin Schaich
06/07/2023, 9:14 PM
managed_python_loggers, but I haven't been able to capture the stdout / what gets printed in the Docker container
python_logs:
  python_log_level: INFO
  managed_python_loggers:
    - py4j.clientserver
    - py4j.java_gateway
    - py4j.protocol
    - py4j.java_collections
    - py4j.finalizer
    - py4j.signals
    - py4j
    - pyspark
    - pyspark.sql
    - dagster_pyspark
    - dagster-pyspark
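For context on the config above: managed_python_loggers works by capturing records emitted through Python's standard logging module under the listed logger names. A minimal stdlib-only sketch of that capture mechanism (the handler and logger name here are illustrative, not Dagster's actual implementation):

```python
import logging

# Records emitted via logging.getLogger("py4j...") can be intercepted
# by any handler attached to that named logger -- this is the mechanism
# managed_python_loggers relies on.
captured = []

class CaptureHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

logger = logging.getLogger("py4j.clientserver")
logger.setLevel(logging.DEBUG)
logger.addHandler(CaptureHandler())

logger.debug("Answer received: !yv")  # this record lands in `captured`
```

The key limitation: only output that actually flows through Python's logging module can be captured this way; anything a child process writes directly to stdout never reaches these handlers.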
sandy
06/07/2023, 10:25 PM
Kevin Schaich
06/07/2023, 10:28 PM
DEBUG:py4j.clientserver:__ASSET_JOB_0 - 8283175e-b8ca-4e80-abae-4884af8242db - [redacted] - Answer received: !yv
DEBUG:py4j.clientserver:__ASSET_JOB_0 - 8283175e-b8ca-4e80-abae-4884af8242db - [redacted] - Command to send: c
o62
collectToPython
e
Kevin Schaich
06/07/2023, 10:30 PM
2023-06-07 18:29:49 23/06/07 22:29:49 INFO Executor: Running task 26.0 in stage 1.0 (TID 78)
2023-06-07 18:29:49 23/06/07 22:29:49 INFO Executor: Finished task 21.0 in stage 1.0 (TID 73). 1964 bytes result sent to driver
2023-06-07 18:29:49 23/06/07 22:29:49 INFO TaskSetManager: Starting task 27.0 in stage 1.0 (TID 79) (462fb47a4b12, executor driver, partition 27, PROCESS_LOCAL, 7475 bytes)
sandy
06/07/2023, 10:34 PM
managed_python_loggers to only affect the logs that show up when the other icon ("Structured event logs") is selected.
Kevin Schaich
06/07/2023, 10:35 PM
sandy
06/07/2023, 10:36 PM
Kevin Schaich
06/07/2023, 10:38 PM
Kevin Schaich
06/14/2023, 5:41 PM
alex
06/14/2023, 9:46 PM
alex
06/14/2023, 9:50 PM
alex
06/14/2023, 9:56 PM
Kevin Schaich
06/14/2023, 10:13 PM
DockerRunLauncher and a Docker image that includes Spark and all our Python dependencies (I believe this is in line with the recommendation in the docs for k8s/docker deployment).
I believe part of the complexity in using PySpark is that the Spark backend is a Java process regardless of which language you write your transforms in, so both the execution and the logging are handled by Java; PySpark is just a Python wrapper around them. It makes calls to the Java API through py4j and logs through log4j (the latter is a Java library). The few logs I was able to get to show up so far came from declaring py4j as a Dagster-managed logger, but it's not very useful output.
With that said, if these logs are making it to stdout in my Docker container, I feel like we should be able to capture them somehow, and PySpark is pretty widely used among supported Dagster integrations, so hopefully many will benefit if we can figure this out.
alex
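To illustrate the distinction described above: output written by a child process (such as the Spark/log4j JVM) goes to that process's own stdout, not through Python's logging module, so a managed Python logger never sees it; reading the child's stdout pipe, as a compute log manager would, does. A stdlib-only sketch, where the child command stands in for the JVM:

```python
import subprocess
import sys

# The child process here stands in for the Spark/log4j JVM: it writes
# directly to its own stdout. Nothing passes through Python's logging
# module, so a managed Python logger cannot capture it -- but capturing
# the child's stdout pipe can.
result = subprocess.run(
    [sys.executable, "-c", "print('INFO Executor: Finished task 21.0')"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # -> INFO Executor: Finished task 21.0
```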
06/15/2023, 2:39 PM
Fidocia Adityawarman
07/31/2023, 4:10 PM
alex
07/31/2023, 9:31 PM
Edo
08/01/2023, 5:06 AM
Patricia Goldberg
09/12/2023, 12:55 PM
compute_logs:
  module: dagster.core.storage.local_compute_log_manager
  class: LocalComputeLogManager
  config:
    base_dir: /opt/dagster/logs
with /opt/dagster/logs being my mounted volume.
I can see my volume in Docker, and all of the subfolders being created by each run, but I don't see any files.
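One quick way to confirm whether the compute log manager is actually writing anything under base_dir is to walk the directory tree and list every file found. A stdlib sketch (the path is taken from the config above; adjust it to your mount):

```python
import os

def list_compute_logs(base_dir):
    """Return (path, size) for every file found under base_dir."""
    found = []
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            path = os.path.join(root, name)
            found.append((path, os.path.getsize(path)))
    return found

# Path from the instance config above; empty per-run subfolders
# produce no entries here.
for path, size in list_compute_logs("/opt/dagster/logs"):
    print(path, size, "bytes")
```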
Patricia Goldberg
09/12/2023, 3:20 PM