Random thoughts about random subjects… From science to literature and between manga and watercolours, passing by data science and rugby; including film, physics and fiction, programming, pictures and puns.
2. Our brains label someone as an ‘outsider’ or part of ‘our group’ within 170 thousandths of a second. The neuroscience of populism runs deep, but advances in understanding the brain could drive huge progress.
I have recently gone through the process of installing Spark on my Mac for testing and development purposes. I also wanted to make sure I could use the installation not only with Scala, but also with PySpark through a Jupyter notebook.
If you are interested in doing the same, here are the steps I followed. First of all, here are the packages you will need:
Python 2.7 or higher
Java SE Development Kit
Scala and Scala Build Tool
Spark 1.6.1 (at the time of writing)
You can choose the Python distribution that best suits your needs. I find Anaconda to be fine for my purposes. You can obtain a graphical installer from https://www.continuum.io/downloads. I am using Python 2.7 at the time of writing.
Java SE Development Kit
You will need to download the Oracle Java SE Development Kit 7 or 8 from the Oracle JDK downloads page. In my case, at the time of writing, I am using 1.7.0_80. You can check the version you have by opening a terminal and typing
You also have to make sure that the appropriate environment variable is set up. In your
add the following lines:
Scala and Scala Build Tool
In this case, I found it much easier to use Homebrew to install and manage the Scala language. If you have never used Homebrew, I recommend that you take a look. To install it you have to type the following in your terminal:
I recently got a newsletter from the ODI. Nothing unusual there, except for the fact that my name was spelled wrongly. It is clear that their mail merge does not know how to handle accented characters, but I must admit that I quite like this version of my name… I mean it includes the square root of ! How cool is that‽
When the arena is something as pure as a board game, where the rules are entirely known and always exactly the same, the results are remarkable. When the arena is something as messy, unrepeatable and ill-defined as actuality, the business of adaptation and translation is a great deal more difficult.
This is a reblog of an article by Fulvia Montresor, Director, World Economic Forum. See the original here.
The 7 technologies changing your world
Find out how companies are changing their business models and organizational structures in The Digital Transformation of Industries, a live Davos debate taking place at 10.30am on Wednesday 20 January 2016.
From intelligent robots and self-driving cars to gene editing and 3D printing, dramatic technological change is happening at lightning speed all around us.
The Fourth Industrial Revolution is being driven by a staggering range of new technologies that are blurring the boundaries between people, the internet and the physical world. It’s a convergence of the digital, physical and biological spheres.
It’s a transformation in the way we live, work and relate to one another in the coming years, affecting entire industries and economies, and even challenging our notion of what it means to be human.
So what exactly are these technologies, and what do they mean for us?
Computing capabilities, storage and access
Between 1985 and 1989, the Cray-2 was the world’s fastest computer. It was roughly the size of a washing machine. Today, a smart watch has twice its capabilities.
As mobile devices become increasingly sophisticated, experts say it won’t be long before we are all carrying “supercomputers” in our pockets. Meanwhile, the cost of data storage continues to fall, making it possible to keep expanding our digital footprints.
Today, 43% of the world’s population are connected to the internet, mostly in developed countries. The United Nations has set the goal of connecting all the world’s inhabitants to affordable internet by 2020. This will increase access to information, education and global marketplaces, which will empower many people to improve their living conditions and escape poverty.
Imagine a world where everyone is connected by mobile devices with unprecedented processing power and storage capacity
If we can achieve the goal of universal internet access and overcome other barriers such as digital illiteracy, everybody could have access to knowledge, and all the possibilities this brings.
Each time you run a Google search, scan your passport, make an online purchase or tweet, you are leaving a data trail behind that can be analysed and monetized.
Thanks to supercomputers and algorithms, we can make sense of massive amounts of data in real time. Computers are already making decisions based on this information, and in less than 10 years computer processors are expected to reach the processing power of the human brain.
Analysing medical data collated from different populations and demographics enables researchers to understand patterns and connections in diseases and identify which conditions improve the effectiveness of certain treatments and which don’t.
Big data will help to reduce costs and inefficiencies in healthcare systems, improve access and quality of care, and make medicine more personalized and precise.
In the future, we will all have very detailed digital medical profiles … including information that we’d rather keep private.
Digitization is empowering people to look after their own health. Think of apps that track how much you eat, sleep and exercise, and being able to ask a doctor a question by simply tapping it into your smartphone.
In addition, advances in technologies such as CRISPR/Cas9, which, unlike other gene-editing tools, is cheap, quick and easy to use, could also have a transformative effect on health, with the potential to treat genetic defects and eradicate diseases.
The digitization of matter
3D printers will create not only cars, houses and other objects, but also human tissue, bones and custom prosthetics. Patients would not have to die waiting for organ donations if hospitals could bioprint them.
The 3D printing market for healthcare is predicted to reach some $4.04 billion by 2018. According to a survey by the Global Agenda Council on the Future of Software and Society, most people expect that the first 3D printed liver will happen by 2025.
The survey also reveals that most people expect the first 3D printed car will be in production by 2022.
Three-dimensional printing, which brings together computational design, manufacturing, materials engineering and synthetic biology, reduces the gap between makers and users and removes the limitations of mass production.
Consumers can already design personalized products online, and will soon be able to simply press “print” instead of waiting for a delivery.
The internet of things
Within the next decade, it is expected that more than a trillion sensors will be connected to the internet.
If almost everything is connected, it will transform how we do business and help us manage resources more efficiently and sustainably. Connected sensors will be able to share information from their environment and organize themselves to make our lives easier and safer. For example, self-driving vehicles could “communicate” with one another, preventing accidents.
By 2020 around 22% of the world’s cars will be connected to the internet (290 million vehicles), and by 2024, more than half of home internet traffic will be used by appliances and devices.
Home automation is also happening fast. We can control our lights, heating, air conditioning and security systems remotely, but how much longer will it be before sensors are able to detect crumbs under the table and tell our automated vacuum cleaners to tidy up?
The internet of things will create huge amounts of data, raising concerns over who will own it and how it will be stored. And what about the possibility that your home or car could be hacked?
Only a tiny fraction of the world’s GDP (around 0.025%) is currently held on blockchain, the shared database technology where transactions in digital currencies such as Bitcoin are made.
But this could be about to change, as banks, insurers and companies race to work out how they can use the technology to cut costs.
A blockchain is essentially a network of computers that must all approve a transaction before it can be verified and recorded.
Using cryptography to keep transactions secure, the technology provides a decentralized digital ledger that anyone on the network can see.
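To make the idea of a cryptographically linked ledger a little more concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the function names (block_hash, append_block, is_valid) are made up for this example, and it leaves out the network of computers that approve transactions, showing only how hashing makes a shared record tamper-evident.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, transactions, previous_hash):
    """Hash the block contents together with the hash of the previous block."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Add a new block that points at the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transactions, previous_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to the one before."""
    previous_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != previous_hash:
            return False
        expected = block_hash(block["index"], block["transactions"], block["previous_hash"])
        if block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True
```

Because each block stores the hash of the one before it, changing an old transaction changes that block's hash and breaks every later link, which anyone holding a copy of the ledger can detect by calling is_valid.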
Before blockchain, we relied on a trusted institution such as a bank to act as a middleman. Now the blockchain can act as that trusted authority on every type of transaction involving value including money, goods and property.
The uses of blockchain technology are endless. Some expect that in less than 10 years it will be used to collect taxes. It will make it easier for immigrants to send money back to countries where access to financial institutions is limited.
And financial fraud will be significantly reduced, as every transaction will be recorded and distributed on a public ledger, which will be accessible by anyone who has an internet connection.
Technology is getting increasingly personal. Computers are moving from our desks, to our laps, to our pockets and soon they will be integrated into our clothing.
By 2025, 10% of people are expected to be wearing clothes connected to the internet and the first implantable mobile phone is expected to be sold.
Implantable and wearable devices such as sports shirts that provide real-time workout data by measuring sweat output, heart rate and breathing intensity are changing our understanding of what it means to be online and blurring the lines between the physical and digital worlds.
The potential benefits are great, but so are the challenges.
These devices can provide immediate information about our health and about what we see, or help locate missing children. Being able to control devices with our brains would enable disabled people to engage fully with the world. There would be exciting possibilities for learning and new experiences.
But how would it affect our personal privacy, data security and our personal relationships? In the future, will it ever be possible to be offline anymore?
I just received my 2015 annual report from WordPress, where I am told how my year in blogging went. This time I received two reports as I moved URLs some time in the first quarter of the year, so here are some bits and bobs:
That’s 90 countries in all! Most visitors came from the United Kingdom. The United States & Russia were not far behind.
ANNUAL REPORT FOR QUANTUMTUNNEL.WORDPRESS.COM
The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 43,000 times in 2015. If it were a concert at Sydney Opera House, it would take about 16 sold-out performances for that many people to see it.
There were 11 pictures uploaded, taking up a total of 14 MB. That’s about a picture per month.
Last Thursday I attended a Cloudera Breakfast Briefing where Sean Owen was speaking about Spark and the examples were related to building decision trees and random forests. It was a good session in general.
Sean started his talk with an example on the Iris dataset in R, in particular using the “party” library. He then moved on to talk about Spark and MLlib.
For the rest of the talk he used the “Covertype” data set that contains 581,012 data points describing trees using 54 features (elevation, slope, soil type, etc.) predicting forest cover type (spruce, aspen, etc.). A very apt dataset for the construction of random forests, right? I was very pleased to see a new (for me) dataset being used!
Sean went over some bits and pieces about using Spark, highlighting the compactness of the code. He also turned his attention to the tuning of hyper-parameters and its importance.
There are different ways to approach this, but it is always about finding a balance, a trade-off. For a tree we can play with the depth of the tree, the maximum number of bins (i.e. the number of different decision rules to be tried), and the impurity measure (Gini or entropy).
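As a reminder of what the two impurity measures compute, here is a small sketch in plain Python. It is not the code from the talk or from MLlib; the function names gini and entropy are just chosen for this illustration, and the formulas are the standard textbook ones applied to a list of class labels.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )
```

Both measures are zero for a node that contains a single class and largest when the classes are evenly mixed, so a split is judged by how much it lowers the impurity of the resulting child nodes.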
If we don’t know the right values for the hyperparameters, we can try several, particularly if we have enough room on the cluster.
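Trying several values usually means enumerating combinations and keeping the one that scores best on held-out data. The sketch below shows that idea in plain Python; parameter_grid and best_setting are hypothetical helper names, and the score argument stands in for whatever training and validation routine you would actually run (for example, fitting a tree in MLlib and measuring its accuracy).

```python
from itertools import product


def parameter_grid(**options):
    """Yield one dict per combination of the candidate hyperparameter values."""
    names = sorted(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def best_setting(settings, score):
    """Return the setting with the highest score.

    `score` is a caller-supplied function (an assumption of this sketch) that
    trains a model with the given setting and returns a validation metric.
    """
    return max(settings, key=score)


# Candidate values are illustrative only, not recommendations from the talk.
grid = list(
    parameter_grid(
        max_depth=[5, 10, 20],
        max_bins=[32, 64],
        impurity=["gini", "entropy"],
    )
)
```

Here the grid has 3 x 2 x 2 = 12 settings, and each one can be trained independently of the others, which is why spare capacity on the cluster helps.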
Building a random forest: let each tree see only a subset of the data, then combine their predictions. Another approach is to let each tree see only a subset of the features. The latter is a nice idea, as it may be a more reasonable approach for large clusters, where communication among nodes should be kept to a minimum -> good for Spark or Hadoop.
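The two sampling ideas, and the final combination step, can be sketched in a few lines of plain Python. This is only an illustration, not how MLlib implements it: the helper names are made up, and the actual tree training is left out.

```python
import random
from collections import Counter


def bootstrap_rows(n_rows, rng):
    """Pick n_rows row indices with replacement: the data one tree gets to see."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def feature_subset(n_features, k, rng):
    """Pick k distinct feature indices: the features one tree gets to see."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Combine the per-tree class predictions by taking the most common one."""
    return Counter(predictions).most_common(1)[0][0]


# A seeded generator keeps the sampling reproducible between runs.
rng = random.Random(42)
rows_for_tree = bootstrap_rows(10, rng)
features_for_tree = feature_subset(54, 7, rng)
```

Each tree would be trained on its own rows and features, and the forest's answer for a new data point is the majority vote over the trees, e.g. majority_vote(["spruce", "aspen", "spruce"]) gives "spruce".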
Sean finished with some suggestions of things one can try:
Try SVM and LogisticRegression in MLlib
Real-time scoring with Spark Streaming
Use random decision forests for regression
Nonetheless, the best bit of this all was that after asking a couple of questions I managed to get my hands on a “Tofu Scientist” T-Shirt! Result!
Really thrilled to continue seeing the American Museum of Natural History series Shelf Life. I blogged about this series earlier on in the year and they have kept to their word with interesting and unique instalments.
In Episode 6 we get to hear about micropaleontology, the study of fossil specimens that are so tiny you cannot see them with the naked eye. The scientists and researchers tell us about foraminifera, unicellular organisms belonging to the kingdom Protista, which go back about 65 million years. In spite of being unicellular, they make shells! And this is indeed what makes it possible for them to become fossilised.
Interestingly enough, these fossils can be used to tell us something about ancient climate. As Roberto Moncada pointed out to me:
According to our expert in the piece, basically every representational graph you’ve ever seen of climate/temperatures from the Earth’s past is derived from analyzing these tiny little creatures.
The Tiniest Fossils are indeed among the most important for climate research!