# ask-community
p
This guide on Python Packages has been extremely helpful for me (an R developer who is curious about Dagster) to get up and running quickly. Does a similar guide exist that explains using `pipenv` and/or `conda` environments in more detail? My organization depends heavily on a proprietary Python site package called ArcPy for geospatial analysis. Unfortunately, this package is only available as a conda environment. Are there any best practices that I should be aware of when using conda environments with Dagster? Thanks in advance for the help!
z
It may be worth considering a custom Docker image, but one big thing I'd think about is how the licensing would work. You could use conda, set up the config so it points at the conda environment's Python interpreter, and then use that in a code location for all of your ArcPy-related jobs (and limit concurrency so you don't hit license errors). A side comment, but I've had endless issues with ArcPy and ESRI software across multiple jobs. There are tons of complications that will arise when using it with something like Dagster (licensing, package versioning, etc.). We've ripped it out, and as we port things from our legacy stack to Dagster we're refactoring them to use open-source alternatives (GDAL, GeoPandas, Apache Sedona, etc.). We've often found these run much faster as well, but YMMV.
p
Thank you for the feedback and for sharing this link! Unfortunately, ArcGIS doesn't play well with Docker yet because of licensing issues. As far as I'm aware, the only way to license the arcpy conda environment is by manually installing ArcGIS Pro on the host machine. I'm going to explore cloning the arcpy conda environment into each new Dagster project and try managing package dependencies with pipenv. It sounds like pipenv support was recently introduced with ArcGIS Pro >=2.9. I've also experienced endless issues with ESRI software. I hate that it's closed source, and it's really expensive given how often it breaks and how poor their technical support is. We've already replaced it with the sf library and PostGIS for 90% of our work, but we're still dependent on its Network Analyst Utils and StreetMapPremium dataset. These tools are often mandated by state regulators, and we haven't been able to convince them to let us use pgRouting and OSM yet... Thanks again for the help! If I make any progress with arcpy and pipenv, I'll be sure to document it here.
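A minimal sketch of the cloning step, assuming ArcGIS Pro's default environment is named `arcgispro-py3` and that `conda` is on the PATH of the machine where ArcGIS Pro is installed (the clone still depends on that install for licensing). It builds a `conda create --clone` command targeting a per-project prefix and returns the interpreter path inside the clone, which is the kind of path you could give a Dagster code location (workspace.yaml accepts an `executable_path` for this). The helper names and project paths are hypothetical.

```python
import os
import subprocess
from pathlib import PurePosixPath, PureWindowsPath

# ArcGIS Pro's default conda environment; cloning it keeps arcpy available.
SOURCE_ENV = "arcgispro-py3"


def clone_command(target_prefix: str, source_env: str = SOURCE_ENV) -> list[str]:
    """Build a `conda create --clone` command that copies source_env to target_prefix."""
    return ["conda", "create", "--clone", source_env, "--prefix", target_prefix, "--yes"]


def env_python(prefix: str, windows: bool) -> str:
    """Return the interpreter path inside a conda env; the layout differs by OS."""
    if windows:
        return str(PureWindowsPath(prefix) / "python.exe")
    return str(PurePosixPath(prefix) / "bin" / "python")


def clone_for_project(target_prefix: str) -> str:
    """Clone the ArcPy env for one (hypothetical) project and return its python path."""
    subprocess.run(clone_command(target_prefix), check=True)
    return env_python(target_prefix, windows=os.name == "nt")
```

Giving each project its own prefix keeps one project's extra packages from breaking another's, at the cost of disk space for every clone.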
z
Ahh, interesting. Where are you guys currently running ArcPy? One approach that may not be the most ergonomic, but would at least let you orchestrate with Dagster without worrying too much about ESRI being weird, is to have Dagster send a signal to the server you run it on. For example, you could have a Dagster asset that uses SSH to trigger work on your ESRI server and wait for the results.
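A rough sketch of the SSH idea, assuming OpenSSH with key-based authentication is already set up between the Dagster host and the ArcPy server. The function names, host, and remote command are hypothetical. The call blocks until the remote script exits, which is how an asset calling it would "wait for the results", and a non-zero exit is turned into an exception so the step is marked as failed.

```python
import subprocess
from typing import Callable


def build_ssh_command(host: str, remote_command: str) -> list[str]:
    """Build an ssh invocation; BatchMode=yes fails fast instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def trigger_remote_job(
    host: str,
    remote_command: str,
    timeout_s: float = 3600,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run remote_command on host over SSH, wait for it to exit, and return its stdout."""
    result = runner(
        build_ssh_command(host, remote_command),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # subprocess.run raises TimeoutExpired if this is exceeded
    )
    if result.returncode != 0:
        # Raising here lets the orchestrator mark the step as failed.
        raise RuntimeError(
            f"remote job on {host} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Inside an asset body you might call `trigger_remote_job("arcgis-vm.example.internal", "python run_job.py")` and parse its stdout. The `runner` parameter exists only so the function can be exercised without a real server.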
p
We are currently running ArcPy on an on-prem VM running Windows Server 2019. Another team is having success running ArcPy on an on-prem RHEL 8 VM through Wine. Decoupling our ArcGIS computation from our orchestration server sounds like the best path. Thanks for the suggestion!