Random thoughts about random subjects… From science to literature and between manga and watercolours, passing by data science and rugby; including film, physics and fiction, programming, pictures and puns.
I am joining forces with Bob Yelland (IBM) again to organise a joint meetup. I say again because a few months back we organised a joint session between the Big Data Developers in London and the Data+Visual meetups. I even gave a talk at that one, on “Data Visualisation: The good, the bad and the ugly”. Unlike the previous event, this time we are physically bringing the attendees together rather than running parallel sessions.
The event is now live and it will take place on the 25th of August. Shall I see you there?
I have recently gone through the process of installing Spark on my Mac for testing and development purposes. I also wanted to make sure I could use the installation not only with Scala, but also with PySpark through a Jupyter notebook.
If you are interested in doing the same, here are the steps I followed. First of all, here are the packages you will need:
Python 2.7 or higher
Java SE Development Kit
Scala and Scala Build Tool
Spark 1.6.1 (at the time of writing)
Jupyter Notebook
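Before going through each item, it can help to see which of these tools are already available on your machine. The following is only a minimal sketch: the function name find_tools is hypothetical, and the command names (python, java, scala, sbt, jupyter) are assumptions about how each package exposes itself on your PATH.

```python
from __future__ import print_function

try:
    from shutil import which  # available in Python 3.3 and later
except ImportError:
    # Fallback for Python 2.7, which lacks shutil.which
    from distutils.spawn import find_executable as which


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


if __name__ == "__main__":
    # Command names are assumptions; adjust them to match your installation.
    tools = find_tools(["python", "java", "scala", "sbt", "jupyter"])
    for name, path in sorted(tools.items()):
        print("{:<8} {}".format(name, path or "not found"))
```

Anything reported as "not found" is either not installed yet or not on your PATH.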
Python
You can choose the Python distribution that best suits your needs. I find Anaconda to be fine for my purposes. You can obtain a graphical installer from https://www.continuum.io/downloads. I am using Python 2.7 at the time of writing.
Java SE Development Kit
You will need to download the Oracle Java SE Development Kit 7 or 8 from the Oracle JDK downloads page. At the time of writing I am using 1.7.0_80. You can check the version you have by opening a terminal and typing
java -version
You also have to make sure that the appropriate environment variable is set up. In your ~/.bash_profile add the following line:
export JAVA_HOME=$(/usr/libexec/java_home)
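After opening a new terminal so that the profile is reloaded, you may want to confirm that the variable is actually visible to Python (and therefore to a notebook started from that terminal). This is a small sketch under that assumption; the helper name java_home_status is hypothetical and not part of any library.

```python
from __future__ import print_function

import os


def java_home_status(environ=None):
    """Describe whether JAVA_HOME is set and points at an existing directory."""
    # Allow a custom mapping to be passed in; default to the real environment.
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return "JAVA_HOME is set to {} but that directory does not exist".format(java_home)
    return "JAVA_HOME is set to {}".format(java_home)


if __name__ == "__main__":
    print(java_home_status())
```

If the message says the variable is not set, check that the export line is in the profile file your shell actually reads.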
Scala and Scala Build Tool
In this case, I found it much easier to use Homebrew to install and manage the Scala language. If you have never used Homebrew, I recommend that you take a look. To install it you have to type the following in your terminal:
Note that for building Spark with Scala 2.11 you will need to download the Spark source code and build it appropriately.
Once you have downloaded the tgz file, unzip it into an appropriate location (your home directory, for example) and navigate to the unzipped folder (for example, ~/spark-1.6.1).
To build Spark with Scala 2.11 you need to type the following commands:
I had the opportunity to attend the Strata+Hadoop World conference in London last week on the 2nd and 3rd of June. It was held in the ExCeL Centre in East London. Given the size of the venue, I had the expectation that it was going to be a massive event… Don’t get me wrong, it was indeed big, but I thought it was going to be even bigger. Colleagues who have attended other editions in San Jose also remarked that this one was on the smaller side.
In any case, I had the opportunity to talk to a lot of very interesting and engaging people, and to hear about the work that large and small companies in the scene are doing. I had the chance to present a demo on the use of Spark in Bluemix and I think it went really well.
Re-blogged from the Morning Download: CIO Journal – WSJ. The Morning Download comes from the editors of CIO Journal and cues up the most important news in business technology every weekday morning.
The Morning Download: Big Push for Big Data
by Michael Hickins
Good morning. Big Data is in danger of becoming a buzzword as meaningless as “cloud” became in 2009, when the National Institute of Standards and Technology had to step in with an official definition. We’re not quite there yet, but the hype is getting so hot and heavy that you could be forgiven for believing there isn’t any there, there.
That hasn’t stopped the venture capital community from pouring millions into the field — including into startups like Hortonworks Inc. and Cloudera Inc., which are in the business of smoothing out the rough edges of the open source Hadoop analytic framework. Traditional technology vendors are just as convinced that there’s fire along with all the hot air — witness Intel Corp.’s undisclosed but significant investment in Cloudera, announced last week.
This week, Hortonworks will introduce an update to its commercial version of Hadoop that is intended to make the technology easier to manage. Hortonworks CEO Rob Bearden tells CIO Journal that he hopes CIOs will come to view Hadoop as the “enterprise data platform where the vast majority of all data lands and is managed.” For all its promise, and a large number of trials in progress, Hadoop is still in actual use by just around 1,000 businesses, according to Gartner Inc. analyst Merv Adrian.