Repo to demonstrate the usage of Apache Spark within a Jupyter notebook within ArcGIS Pro

carstenpiepel/spark-esri

Spark ESRI

Project to demonstrate the usage of Apache Spark within a Jupyter notebook within ArcGIS Pro.

Dec 16, 2021 - Added a check for the environment variable SPARK_HOME, which overrides the built-in Spark. For example, download spark-3.1.1-bin-hadoop2.7.tgz, extract it, and set SPARK_HOME to the extracted folder.
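A minimal sketch of how such an override check could look. The helper name resolve_spark_home and its default argument (the built-in Spark location) are hypothetical and not taken from the module's code:

```python
import os
from typing import Mapping, Optional


def resolve_spark_home(default: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the Spark installation folder to use.

    Hypothetical helper: if SPARK_HOME is set and points at an existing
    folder, it overrides the built-in Spark location passed as `default`.
    """
    env = os.environ if environ is None else environ
    override = env.get("SPARK_HOME")
    # Only honor the override when the folder actually exists.
    if override and os.path.isdir(override):
        return override
    return default
```

In this sketch an invalid SPARK_HOME silently falls back to the built-in Spark; the actual module may instead raise an error or ignore the check.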

Oct 30, 2021 - Pro 2.8 relies on the Windows registry to find the active conda environment. The registry entry is HKEY_CURRENT_USER\SOFTWARE\ESRI\ArcGISPro\PythonCondaEnv. Its value is used to set the OS environment variable PYSPARK_PYTHON, which PySpark needs to work correctly in a Pro notebook.

As of this writing, the active conda environment is detected in the following order:

  • the environment variable CONDA_DEFAULT_ENV.
  • the file %LOCALAPPDATA%\ESRI\conda\envs\proenv.txt, for older Pro versions.
  • the registry entry HKEY_CURRENT_USER\SOFTWARE\ESRI\ArcGISPro\PythonCondaEnv.
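The detection order above can be sketched as a chain of fallbacks. The function names are hypothetical, and treating PythonCondaEnv as a value name under the ArcGISPro key is an assumption; the registry lookup is skipped on non-Windows platforms where winreg is unavailable:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

try:
    import winreg  # Windows only
except ImportError:
    winreg = None


def read_registry_env() -> Optional[str]:
    """Read PythonCondaEnv under HKCU\\SOFTWARE\\ESRI\\ArcGISPro (assumed value name)."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"SOFTWARE\ESRI\ArcGISPro") as key:
            value, _ = winreg.QueryValueEx(key, "PythonCondaEnv")
            return str(value)
    except OSError:  # key or value missing
        return None


def detect_conda_env(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper returning the active conda env, or None if not found."""
    env = os.environ if environ is None else environ
    # 1. Environment variable (Pro 2.8.3 and later).
    name = env.get("CONDA_DEFAULT_ENV")
    if name:
        return name
    # 2. proenv.txt, written by older Pro versions.
    local = env.get("LOCALAPPDATA")
    if local:
        proenv = Path(local) / "ESRI" / "conda" / "envs" / "proenv.txt"
        if proenv.is_file():
            text = proenv.read_text().strip()
            if text:
                return text
    # 3. Windows registry.
    return read_registry_env()
```

The returned value would then be used to build PYSPARK_PYTHON, for example by pointing it at python.exe inside the environment folder, assuming the value is a full environment path rather than just an environment name.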

Oct 27, 2021 - Pro 2.8.3 removed the file %LOCALAPPDATA%\ESRI\conda\envs\proenv.txt and no longer relies on it. It now depends on the environment variable CONDA_DEFAULT_ENV to determine the active conda environment.

Sep 16, 2021 - Perform the following as a patch for Pro 2.8.3:

cd c:\
git clone https://github.com/kontext-tech/winutils

Define a system environment variable HADOOP_HOME with the value C:\winutils\hadoop-3.3.0, and add %HADOOP_HOME%\bin to the system PATH variable.
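A small sanity check can confirm the variables took effect in a new Python session. This is a hypothetical helper, and it assumes the cloned hadoop-3.3.0 folder contains bin\winutils.exe:

```python
import os
from typing import List, Mapping, Optional


def check_hadoop_home(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the HADOOP_HOME setup (empty if it looks OK)."""
    env = os.environ if environ is None else environ
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]
    problems = []
    bin_dir = os.path.join(home, "bin")
    if not os.path.isfile(os.path.join(bin_dir, "winutils.exe")):
        problems.append(f"winutils.exe not found in {bin_dir}")
    # Normalize separators and case (case folding only happens on Windows).
    entries = [os.path.normcase(os.path.normpath(p))
               for p in env.get("PATH", "").split(os.pathsep) if p]
    if os.path.normcase(os.path.normpath(bin_dir)) not in entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems
```

Remember that system variable changes only show up in processes started afterwards, so restart Pro before running such a check.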

NOTE: This works in Pro 2.6 ONLY. There is a small "issue" with Pro 2.7 and pyarrow. The folks in Redlands have a fix that will be in 2.8 :-(

Create a new Pro Conda Environment.

Start a Python Command Prompt:

Note: You might need to add proxy settings to .condarc located in C:\Program Files\ArcGIS\Pro\bin\Python.

conda config --set proxy_servers.http http://username:password@host:port
conda config --set proxy_servers.https https://username:password@host:port

This produces a .condarc similar to the following:

ssl_verify: true
proxy_servers:
  http: http://domainname\username:password@host:port
  https: http://domainname\username:password@host:port

Create a new conda environment:

proswap arcgispro-py3
conda remove --yes --all --name spark_esri
conda create --yes --name spark_esri --clone arcgispro-py3
proswap spark_esri

Optional:

pip install fsspec==2021.8.1 boto3==1.18.35 s3fs==0.4.2 pyarrow==1.0.1
conda install --yes -c esri -c conda-forge -c defaults^
    "numba=0.53.*"^
    "pandas=1.2.*"^
    "untangle=1.1.*"^
    "pyodbc=4.0.*"^
    "gcsfs=0.7.*"

Install the Esri Spark module.

Note: You might need to install Git for Windows.

git clone https://github.com/mraad/spark-esri.git
cd spark-esri
python setup.py install

MicroPathing Notebook

Note the use of the range slider on the map to filter the micropaths to a user-defined range of hours of the day.
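Conceptually, the slider keeps only the points whose hour of day falls within the selected range. A plain-Python sketch of that predicate follows. The record layout (path id, timestamp) is hypothetical, and the notebook's own implementation is not shown here. In Spark, the same predicate could be written with pyspark.sql.functions.hour and Column.between.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical record layout: (path_id, timestamp).
Record = Tuple[str, datetime]


def filter_by_hour(records: Iterable[Record], start_hour: int, end_hour: int) -> List[Record]:
    """Keep records whose hour of day lies in [start_hour, end_hour], inclusive."""
    if not (0 <= start_hour <= end_hour <= 23):
        raise ValueError("expected 0 <= start_hour <= end_hour <= 23")
    return [r for r in records if start_hour <= r[1].hour <= end_hour]
```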

The following are the resulting crossing points and gate statistics.

TODO

  • Unify spark_esri and spark_dbconnect python modules.
