
Installing Spark 1.6.1 on a Mac with Scala 2.11

I have recently gone through the process of installing Spark on my Mac for testing and development purposes. I also wanted to make sure I could use the installation not only with Scala, but also with PySpark through a Jupyter notebook.

If you are interested in doing the same, here are the steps I followed. First of all, here are the packages you will need:

  • Python 2.7 or higher
  • Java SE Development Kit
  • Scala and Scala Build Tool
  • Spark 1.6.1 (at the time of writing)
  • Jupyter Notebook

Python

You can choose whichever Python distribution best suits your needs; I find Anaconda fine for my purposes. You can obtain a graphical installer from https://www.continuum.io/downloads. I am using Python 2.7 at the time of writing.
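Whichever distribution you pick, you can verify which Python is on your path with:

> python --version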

Java SE Development Kit

You will need to download Oracle Java SE Development Kit 7 or 8 from the Oracle JDK downloads page. At the time of writing, I am using 1.7.0_80. You can check the version you have by opening a terminal and typing:

java -version

You also have to make sure that the appropriate environment variable is set up. In your ~/.bash_profile, add the following line:

export JAVA_HOME=$(/usr/libexec/java_home)
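To confirm that the variable is picked up, source the file and echo it:

> source ~/.bash_profile
> echo $JAVA_HOME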

Scala and Scala Build Tool

In this case, I found it much easier to use Homebrew to install and manage the Scala language. If you have never used Homebrew, I recommend that you take a look. To install it, type the following in your terminal:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Once you have Homebrew, you can install Scala and the Scala Build Tool as follows:

> brew install scala
> brew install sbt

You may want to set the appropriate environment variables in your ~/.bash_profile:

export SCALA_HOME=/usr/local/opt/scala
export PATH=$PATH:$SCALA_HOME/bin
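Once sourced, you can confirm the toolchain is in place (sbt will take a moment the first time, as it bootstraps itself):

> scala -version
> sbt about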

Spark 1.6.1

Obtain Spark from https://spark.apache.org/downloads.html

Note that for building Spark with Scala 2.11 you will need to download the Spark source code and build it appropriately.


Once you have downloaded the tgz file, extract it into an appropriate location (your home directory, for example) and navigate to the extracted folder (for example, ~/spark-1.6.1).

To build Spark with Scala 2.11 you need to type the following commands:

> ./dev/change-version-to-2.11.sh
> build/sbt clean assembly
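If the build succeeds, the assembly jar should land under assembly/target/scala-2.11/; the exact jar name depends on the Hadoop profile, so treat this path as indicative rather than guaranteed:

> ls assembly/target/scala-2.11/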

This may take a while, so sit tight! When finished, you can check that everything is working by launching either the Scala shell:

> ./bin/spark-shell

or the Python shell:

> ./bin/pyspark
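Either shell starts with a live SparkContext bound to the name sc, so a one-liner makes a decent smoke test. In the PySpark shell, for instance:

sc.parallelize(range(100)).filter(lambda x: x % 2 == 0).count()  # should return 50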

Once again, there are some recommended environment variables for your ~/.bash_profile:

export SPARK_PATH=~/spark-1.6.1
export PYSPARK_DRIVER_PYTHON="jupyter" 
export PYSPARK_DRIVER_PYTHON_OPTS="notebook" 
alias sparknb='$SPARK_PATH/bin/pyspark --master local[2]'

The last line is an alias that will enable us to launch a Jupyter notebook with PySpark. Totally optional!

Jupyter Notebook

If all is working well, you are ready to go. Source your ~/.bash_profile and launch a Jupyter notebook:

> sparknb
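Inside the notebook, the same sc handle is available. A first cell along these lines (a hypothetical example, purely to confirm the kernel is talking to Spark) should run without errors:

rdd = sc.parallelize(range(1000))
rdd.map(lambda x: x * 2).take(5)  # [0, 2, 4, 6, 8]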

Et voilà!