Installing Spark 1.6.1 on a Mac with Scala 2.11

I have recently gone through the process of installing Spark on my Mac for testing and development purposes. I also wanted to make sure I could use the installation not only with Scala, but also with PySpark through a Jupyter notebook.

If you are interested in doing the same, here are the steps I followed. First of all, here are the packages you will need:

  • Python 2.7 or higher
  • Java SE Development Kit
  • Scala and Scala Build Tool
  • Spark 1.6.1 (at the time of writing)
  • Jupyter Notebook

Python

You can choose whichever Python distribution best suits your needs. I find Anaconda to be fine for my purposes. You can obtain a graphical installer from https://www.continuum.io/downloads. I am using Python 2.7 at the time of writing.

Java SE Development Kit

You will need to download Oracle Java SE Development Kit 7 or 8 from the Oracle JDK downloads page. In my case, at the time of writing, I am using 1.7.0_80. You can check the version you have by opening a terminal and typing:
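
    java -version
    # reports e.g. java version "1.7.0_80" in my case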

You also have to make sure that the appropriate environment variable is set up. In your ~/.bash_profile add the following lines:
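
    # let the OS X utility /usr/libexec/java_home locate the installed JDK
    export JAVA_HOME=$(/usr/libexec/java_home)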

Scala and Scala Build Tool

In this case, I found it much easier to use Homebrew to install and manage the Scala language. If you have never used Homebrew, I recommend that you take a look. To install it, paste the one-liner from the Homebrew homepage into your terminal; at the time of writing it looks like this (check https://brew.sh for the current version):
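
    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"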

Once you have Homebrew you can install Scala and the Scala Build Tool as follows:
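
    brew install scala
    brew install sbt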

You may want to set the appropriate environment variables in your ~/.bash_profile. As a sketch, assuming Homebrew's default install prefix (adjust the path if yours differs):
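
    export SCALA_HOME=/usr/local/opt/scala
    export PATH=$SCALA_HOME/bin:$PATH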

Spark 1.6.1

Obtain Spark from https://spark.apache.org/downloads.html

Note that for building Spark with Scala 2.11 you will need to download the Spark source code and build it appropriately.

Once you have downloaded the tgz file, extract it to an appropriate location (your home directory, for example) and navigate to the extracted folder (for example ~/spark-1.6.1 ). Assuming the archive ended up in your home directory:
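
    tar -xzf ~/spark-1.6.1.tgz -C ~
    cd ~/spark-1.6.1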

To build Spark with Scala 2.11, the Spark 1.6 building guide suggests first switching the build to Scala 2.11 and then compiling with the -Dscala-2.11 property (build/mvn will fetch a suitable Maven if you do not have one):
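
    ./dev/change-scala-version.sh 2.11
    build/mvn -Dscala-2.11 -DskipTests clean package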

This may take a while, so sit tight! When finished, you can check that everything is working by launching either the Scala shell:
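
    ./bin/spark-shell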

or the Python shell:
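
    ./bin/pyspark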

Once again there are some environment variables that are recommended. A sketch of what mine look like in ~/.bash_profile, assuming Spark lives in ~/spark-1.6.1 (the alias name snotebook is an arbitrary choice of mine):
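
    export SPARK_HOME=~/spark-1.6.1
    export PATH=$SPARK_HOME/bin:$PATH
    alias snotebook='PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook pyspark'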

The last line is an alias that will enable us to launch a Jupyter notebook with PySpark. Totally optional!

Jupyter Notebook

If all is working well you are ready to go. Source your ~/.bash_profile and launch a Jupyter notebook:
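
    source ~/.bash_profile
    snotebook   # the alias defined above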

Et voilà!