Computer Programming Knowledge

I came across the image above in the Slack channel of the University of Hertfordshire Centre for Astrophysics Research. It summarises some of the fundamental knowledge in computer science that was assumed necessary at some point in time: Binar, CPU execution and algorithms.

They refer to 7 algorithms, but actually rather than actual algorithms they are classes:

  1. Sort
  2. Search
  3. Hashing
  4. Dynamic Programming
  5. Binary Exponentiation
  6. String Matching and Parsing
  7. Primality Testing

I like the periodic table shown at the bottom of the graphic. Showing some old friends such as Fortran, C, Basic and Cobol. Some other that are probably not used all that much, and others that have definitely been rising: Javascript, Java, C++, Lisp. It is great to se Python, number 35, listed as Multi-Paradigm!

Enjoy!

Python – Pendulum

Working with dates and times in programming can be a painful test at times. In Python, there are some excellent libraries that help with all the pain, and recently I became aware of Pendulum. It is effectively are replacement for the standard datetime class and it has a number of improvements. Check out the documentation for further information.

Installation of the packages is straightforward with pip:

$ pip install pendulum

For example, some simple manipulations involving time zones:

import pendulum

now = pendulum.now('Europe/Paris')

# Changing timezone
now.in_timezone('America/Toronto')

# Default support for common datetime formats
now.to_iso8601_string()

# Shifting
now.add(days=2)

Duration can be used as a replacement for the standard timedelta class:

dur = pendulum.duration(days=15)

# More properties
dur.weeks
dur.hours

# Handy methods
dur.in_hours()
360
dur.in_words(locale='en_us')
'2 weeks 1 day'

It also supports the definition of a period, i.e. a duration that is aware of the DateTime instances that created it. For example:

dt1 = pendulum.now()
dt2 = dt1.add(days=3)

# A period is the difference between 2 instances
period = dt2 - dt1

period.in_weekdays()
period.in_weekend_days()

# A period is iterable
for dt in period:
    print(dt)


Give it a go, and let me know what you think of it. 

IEEE Language Rankings 2018

Python retains its top spot in the fifth annual IEEE Spectrum top programming language rankings, and also gains a designation as an “embedded language”. Data science language R remains the only domain-specific slot in the top 10 (where it as listed as an “enterprise language”) and drops one place compared to its 2017 ranking to take the #7 spot.

Looking at other data-oriented languages, Matlab as at #11 (up 3 places), SQL is at #24 (down 1), Julia at #32 (down 1) and SAS at #40 (down 3). Click the screenshot below for an interactive version of the chart where you can also explore the top 50 rankings.

Language Rank

The IEEE Spectrum rankings are based on search, social media, and job listing trends, GitHub repositories, and mentions in journal articles. You can find details on the ranking methodology here, and discussion of the trends behind the 2018 rankings at the link below.

IEEE Spectrum: The 2018 Top Programming Languages

JupyterLab is Ready for Users

This is a reblog of this original post.

JupyterLab is Ready for Users

We are proud to announce the beta release series of JupyterLab, the next-generation web-based interface for Project Jupyter.

Project Jupyter Feb 20
tl;dr: JupyterLab is ready for daily use (documentation, try it with Binder)

1*_jDTWlZNUySwrRBgVNqoNw.png&amp;amp;lt;img class=”progressiveMedia-noscript js-progressiveMedia-inner” src=”<a href= “https://cdn-images-1.medium.com/max/1600/1*_jDTWlZNUySwrRBgVNqoNw.png“>https://cdn-images-1.medium.com/max/1600/1*_jDTWlZNUySwrRBgVNqoNw.png</a>“&amp;amp;gt;

JupyterLab is an interactive development environment for working with notebooks, code, and data.

The Evolution of the Jupyter Notebook

Project Jupyter exists to develop open-source software, open standards, and services for interactive and reproducible computing.

Since 2011, the Jupyter Notebook has been our flagship project for creating reproducible computational narratives. The Jupyter Notebook enables users to create and share documents that combine live code with narrative text, mathematical equations, visualizations, interactive controls, and other rich output. It also provides building blocks for interactive computing with data: a file browser, terminals, and a text editor.

The Jupyter Notebook has become ubiquitous with the rapid growth of data science and machine learning and the rising popularity of open-source software in industry and academia:

  • Today there are millions of users of the Jupyter Notebook in many domains, from data science and machine learning to music and education. Our international community comes from almost every country on earth.¹
  • The Jupyter Notebook now supports over 100 programming languages, most of which have been developed by the community.
  • There are over 1.7 million public Jupyter notebooks hosted on GitHub. Authors are publishing Jupyter notebooks in conjunction with scientific research, academic journals, data journalism, educational courses, and books.

At the same time, the community has faced challenges in using various software workflows with the notebook alone, such as running code from text files interactively. The classic Jupyter Notebook, built on web technologies from 2011, is also difficult to customize and extend.

JupyterLab: Ready for Users

JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

1*O20XGvUOTLoFKQ9o20usIA.png&amp;amp;lt;img class=”progressiveMedia-noscript js-progressiveMedia-inner” src=”<a href= “https://cdn-images-1.medium.com/max/1600/1*O20XGvUOTLoFKQ9o20usIA.png“>https://cdn-images-1.medium.com/max/1600/1*O20XGvUOTLoFKQ9o20usIA.png</a>“&amp;amp;gt;

JupyterLab enables you to arrange your work area with notebooks, text files, terminals, and notebook outputs.JupyterLab provides a high level of integration between notebooks, documents, and activities:

  • Drag-and-drop to reorder notebook cells and copy them between notebooks.
  • Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).
  • Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.
  • Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.

JupyterLab has been over three years in the making, with over 11,000 commits and 2,000 releases of npm and Python packages. Over 100 contributors from the broader community have helped build JupyterLab in addition to our core JupyterLab developers.

To get started, see the JupyterLab documentation for installation instructions and a walk-through, or try JupyterLab with Binder. You can also set up JupyterHub to use JupyterLab.

Customize Your JupyterLab Experience

JupyterLab is built on top of an extension system that enables you to customize and enhance JupyterLab by installing additional extensions. In fact, the builtin functionality of JupyterLab itself (notebooks, terminals, file browser, menu system, etc.) is provided by a set of core extensions.

1*OneJZOqKqBZ9oN80kRX7kQ.png&amp;amp;lt;img class=”progressiveMedia-noscript js-progressiveMedia-inner” src=”<a href= “https://cdn-images-1.medium.com/max/1600/1*OneJZOqKqBZ9oN80kRX7kQ.png“>https://cdn-images-1.medium.com/max/1600/1*OneJZOqKqBZ9oN80kRX7kQ.png</a>“&amp;amp;gt;

JupyterLab extensions enable you to work with diverse data formats such as GeoJSON, JSON and CSV.²Among other things, extensions can:

  • Provide new themes, file editors and viewers, or renderers for rich outputs in notebooks;
  • Add menu items, keyboard shortcuts, or advanced settings options;
  • Provide an API for other extensions to use.

Community-developed extensions on GitHub are tagged with the jupyterlab-extension topic, and currently include file viewers (GeoJSON, FASTA, etc.), Google Drive integration, GitHub browsing, and ipywidgets support.

Develop JupyterLab Extensions

While many JupyterLab users will install additional JupyterLab extensions, some of you will want to develop your own. The extension development API is evolving during the beta release series and will stabilize in JupyterLab 1.0. To start developing a JupyterLab extension, see the JupyterLab Extension Developer Guide and the TypeScript or JavaScript extension templates.

JupyterLab itself is co-developed on top of PhosphorJS, a new Javascript library for building extensible, high-performance, desktop-style web applications. We use modern JavaScript technologies such as TypeScript, React, Lerna, Yarn, and webpack. Unit tests, documentation, consistent coding standards, and user experience research help us maintain a high-quality application.

JupyterLab 1.0 and Beyond

We plan to release JupyterLab 1.0 later in 2018. The beta releases leading up to 1.0 will focus on stabilizing the extension development API, user interface improvements, and additional core features. All releases in the beta series will be stable enough for daily usage.

JupyterLab 1.0 will eventually replace the classic Jupyter Notebook. Throughout this transition, the same notebook document format will be supported by both the classic Notebook and JupyterLab.

Get Involved

There are many ways you can participate in the JupyterLab effort. We welcome contributions from all members of the Jupyter community:

  • Use our extension development API to make your own JupyterLab extensions. Please add the jupyterlab-extension topic if your extension is hosted on GitHub. We appreciate feedback as we evolve toward a stable API for JupyterLab 1.0.
  • Contribute to the development, documentation, and design of JupyterLab on GitHub. To get started with development, please see our Contributing Guide and Code of Conduct. We label issues that are ideal for new contributors as “good first issue” or “help wanted”.
  • Connect with us on our GitHub Issues page or on our Gitter Channel. If you find a bug, have questions, or want to provide feedback, please join the conversation.

We are thrilled to see how you use and extend JupyterLab.

Sincerely,

The JupyterLab Team and Project Jupyter

We thank Bloomberg and Anaconda for their support and collaboration in developing JupyterLab. We also thank the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, and the Helmsley Charitable Trust for their support.

[1] Based on the 249 country codes listed under ISO 3166–1, recent Google analytics data from 2018 indicates that jupyter.org has hosted visitors from 213 countries.

[2] Data visualized in this screenshot is licensed CC-BY-NC 3.0. See http://datacanvas.org/public-transportation/ for more details.

10-10 Celebrate Ada Lovelace day!

It’s Ada Lovelace day, celebrating the work of women in mathematics, science, technology and engineering. To join the celebration +Plus Magazine revisits a collection of interviews with female mathematicians produced earlier this year. The interviews accompany the Women of Mathematics photo exhibition, which celebrates female mathematicians from institutions throughout Europe. It was launched in Berlin in the summer of 2016 and is now touring European institutions.

To watch the interviews with the women or read the transcripts, and to see the portraits that featured in the exhibition, click on the links below. For more content by or about female mathematicians click here.

Languages for Data Science

Very often the question about what programming language is best for data science work. The answer may depend on who you ask, there are many options out there and they all have their advantages and disadvantages. Here are some thoughts from Peter Gleeson on this matter:

While there is no correct answer, there are several things to take into consideration. Your success as a data scientist will depend on many points, including:

Specificity

When it comes to advanced data science, you will only get so far reinventing the wheel each time. Learn to master the various packages and modules offered in your chosen language. The extent to which this is possible depends on what domain-specific packages are available to you in the first place!

Generality

A top data scientist will have good all-round programming skills as well as the ability to crunch numbers. Much of the day-to-day work in data science revolves around sourcing and processing raw data or ‘data cleaning’. For this, no amount of fancy machine learning packages are going to help.

Productivity

In the often fast-paced world of commercial data science, there is much to be said for getting the job done quickly. However, this is what enables technical debt to creep in — and only with sensible practices can this be minimized.

Performance

In some cases it is vital to optimize the performance of your code, especially when dealing with large volumes of mission-critical data. Compiled languages are typically much faster than interpreted ones; likewise statically typed languages are considerably more fail-proof than dynamically typed. The obvious trade-off is against productivity.

To some extent, these can be seen as a pair of axes (Generality-Specificity, Performance-Productivity). Each of the languages below fall somewhere on these spectra.

With these core principles in mind, let’s take a look at some of the more popular languages used in data science. What follows is a combination of research and personal experience of myself, friends and colleagues — but it is by no means definitive! In approximately order of popularity, here goes:

R

What you need to know

Released in 1995 as a direct descendant of the older S programming language, R has since gone from strength to strength. Written in C, Fortran and itself, the project is currently supported by the R Foundation for Statistical Computing.

License

Free!

Pros

  • Excellent range of high-quality, domain specific and open source packages. R has a package for almost every quantitative and statistical application imaginable. This includes neural networks, non-linear regression, phylogenetics, advanced plotting and many, many others.
  • The base installation comes with very comprehensive, in-built statistical functions and methods. R also handles matrix algebra particularly well.
  • Data visualization is a key strength with the use of libraries such as ggplot2.

Cons

  • Performance. There’s no two ways about it, R is not a quick language.
  • Domain specificity. R is fantastic for statistics and data science purposes. But less so for general purpose programming.
  • Quirks. R has a few unusual features that might catch out programmers experienced with other languages. For instance: indexing from 1, using multiple assignment operators, unconventional data structures.

Verdict — “brilliant at what it’s designed for”

R is a powerful language that excels at a huge variety of statistical and data visualization applications, and being open source allows for a very active community of contributors. Its recent growth in popularity is a testament to how effective it is at what it does.

Python

What you need to know

Guido van Rossum introduced Python back in 1991. It has since become an extremely popular general purpose language, and is widely used within the data science community. The major versions are currently 3.6 and 2.7.

License

Free!

Pros

  • Python is a very popular, mainstream general purpose programming language. It has an extensive range of purpose-built modules and community support. Many online services provide a Python API.
  • Python is an easy language to learn. The low barrier to entry makes it an ideal first language for those new to programming.
  • Packages such as pandas, scikit-learn and Tensorflow make Python a solid option for advanced machine learning applications.

Cons

  • Type safety: Python is a dynamically typed language, which means you must show due care. Type errors (such as passing a String as an argument to a method which expects an Integer) are to be expected from time-to-time.
  • For specific statistical and data analysis purposes, R’s vast range of packages gives it a slight edge over Python. For general purpose languages, there are faster and safer alternatives to Python.

Verdict — “excellent all-rounder”

Python is a very good choice of language for data science, and not just at entry-level. Much of the data science process revolves around the ETL process (extraction-transformation-loading). This makes Python’s generality ideally suited. Libraries such as Google’s Tensorflow make Python a very exciting language to work in for machine learning.

SQL

What you need to know

SQL (‘Structured Query Language’) defines, manages and queries relational databases. The language appeared by 1974 and has since undergone many implementations, but the core principles remain the same.

License

Varies — some implementations are free, others proprietary

Pros

  • Very efficient at querying, updating and manipulating relational databases.
  • Declarative syntax makes SQL an often very readable language . There’s no ambiguity about what
    SELECT name FROM users WHERE age > 18

    is supposed to do!

  • SQL is very used across a range of applications, making it a very useful language to be familiar with. Modules such as SQLAlchemy make integrating SQL with other languages straightforward.

Cons

  • SQL’s analytical capabilities are rather limited — beyond aggregating and summing, counting and averaging data, your options are limited.
  • For programmers coming from an imperative background, SQL’s declarative syntax can present a learning curve.
  • There are many different implementations of SQL such as PostgreSQL, SQLite, MariaDB . They are all different enough to make inter-operability something of a headache.

Verdict — “timeless and efficient”

SQL is more useful as a data processing language than as an advanced analytical tool. Yet so much of the data science process hinges upon ETL, and SQL’s longevity and efficiency are proof that it is a very useful language for the modern data scientist to know.

Java

What you need to know

Java is an extremely popular, general purpose language which runs on the (JVM) Java Virtual Machine. It’s an abstract computing system that enables seamless portability between platforms. Currently supported by Oracle Corporation.

License

Version 8 — Free! Legacy versions, proprietary.

Pros

  • Ubiquity . Many modern systems and applications are built upon a Java back-end. The ability to integrate data science methods directly into the existing codebase is a powerful one to have.
  • Strongly typed. Java is no-nonsense when it comes to ensuring type safety. For mission-critical big data applications, this is invaluable.
  • Java is a high-performance, general purpose, compiled language . This makes it suitable for writing efficient ETL production code and computationally intensive machine learning algorithms.

Cons

  • For ad-hoc analyses and more dedicated statistical applications, Java’s verbosity makes it an unlikely first choice. Dynamically typed scripting languages such as R and Python lend themselves to much greater productivity.
  • Compared to domain-specific languages like R, there aren’t a great number of libraries available for advanced statistical methods in Java.

Verdict — “a serious contender for data science”

There is a lot to be said for learning Java as a first choice data science language. Many companies will appreciate the ability to seamlessly integrate data science production code directly into their existing codebase, and you will find Java’s performance and and type safety are real advantages. However, you’ll be without the range of stats-specific packages available to other languages. That said, definitely one to consider — especially if you already know one of R and/or Python.

Scala

What you need to know

Developed by Martin Odersky and released in 2004, Scala is a language which runs on the JVM. It is a multi-paradigm language, enabling both object-oriented and functional approaches. Cluster computing framework Apache Spark is written in Scala.

License

Free!

Pros

  • Scala + Spark = High performance cluster computing. Scala is an ideal choice of language for those working with high-volume data sets.
  • Multi-paradigmatic: Scala programmers can have the best of both worlds. Both object-oriented and functional programming paradigms available to them.
  • Scala is compiled to Java bytecode and runs on a JVM. This allows inter-operability with the Java language itself, making Scala a very powerful general purpose language, while also being well-suited for data science.

Cons

  • Scala is not a straightforward language to get up and running with if you’re just starting out. Your best bet is to download sbt and set up an IDE such as Eclipse or IntelliJ with a specific Scala plug-in.
  • The syntax and type system are often described as complex. This makes for a steep learning curve for those coming from dynamic languages such as Python.

Verdict — “perfect, for suitably big data”

When it comes to using cluster computing to work with Big Data, then Scala + Spark are fantastic solutions. If you have experience with Java and other statically typed languages, you’ll appreciate these features of Scala too. Yet if your application doesn’t deal with the volumes of data that justify the added complexity of Scala, you will likely find your productivity being much higher using other languages such as R or Python.

Julia

What you need to know

Released just over 5 years ago, Julia has made an impression in the world of numerical computing. Its profile was raised thanks to early adoption by several major organizationsincluding many in the finance industry.

License

Free!

Pros

  • Julia is a JIT (‘just-in-time’) compiled language, which lets it offer good performance. It also offers the simplicity, dynamic-typing and scripting capabilities of an interpreted language like Python.
  • Julia was purpose-designed for numerical analysis. It is capable of general purpose programming as well.
  • Readability. Many users of the language cite this as a key advantage

Cons

  • Maturity. As a new language, some Julia users have experienced instability when using packages. But the core language itself is reportedly stable enough for production use.
  • Limited packages are another consequence of the language’s youthfulness and small development community. Unlike long-established R and Python, Julia doesn’t have the choice of packages (yet).

Verdict — “one for the future”

The main issue with Julia is one that cannot be blamed for. As a recently developed language, it isn’t as mature or production-ready as its main alternatives Python and R. But, if you are willing to be patient, there’s every reason to pay close attention as the language evolves in the coming years.

MATLAB

What you need to know

MATLAB is an established numerical computing language used throughout academia and industry. It is developed and licensed by MathWorks, a company established in 1984 to commercialize the software.

License

Proprietary — pricing varies depending on your use case

Pros

  • Designed for numerical computing. MATLAB is well-suited for quantitative applications with sophisticated mathematical requirements such as signal processing, Fourier transforms, matrix algebra and image processing.
  • Data Visualization. MATLAB has some great inbuilt plotting capabilities.
  • MATLAB is often taught as part of many undergraduate courses in quantitative subjects such as Physics, Engineering and Applied Mathematics. As a consequence, it is widely used within these fields.

Cons

  • Proprietary licence. Depending on your use-case (academic, personal or enterprise) you may have to fork out for a pricey licence. There are free alternatives available such as Octave. This is something you should give real consideration to.
  • MATLAB isn’t an obvious choice for general-purpose programming.

Veredict — “best for mathematically intensive applications”

MATLAB’s widespread use in a range of quantitative and numerical fields throughout industry and academia makes it a serious option for data science. The clear use-case would be when your application or day-to-day role requires intensive, advanced mathematical functionality; indeed, MATLAB was specifically designed for this.

Other Languages

There are other mainstream languages that may or may not be of interest to data scientists. This section provides a quick overview… with plenty of room for debate of course!

C++

C++ is not a common choice for data science, although it has lightning fast performance and widespread mainstream popularity. The simple reason may be a question of productivity versus performance.

As one Quora user puts it:

“If you’re writing code to do some ad-hoc analysis that will probably only be run one time, would you rather spend 30 minutes writing a program that will run in 10 seconds, or 10 minutes writing a program that will run in 1 minute?”

The dude’s got a point. Yet for serious production-level performance, C++ would be an excellent choice for implementing machine learning algorithms optimized at a low-level.

Verdict — “not for day-to-day work, but if performance is critical…”

Jetpack Issue Solved – HTTP status code was not 200 (500)

I have been having some troubles managing a wordpress site via the WordPress App. The stats and other things worked fine, but I was not able to see my existing posts or pages.

Everytime I tried to synchronise them I would get an error that read something along the lines of:

"transport error – HTTP status code was not 200 (500)”

I did not pay too much attention to it as I thought it was an error with the App and that in new updates it would get sorted… but that did not seem to be the case.

This morning I rolled my sleeves and decided to take a look at things. I ended up updating the PHP version in the site from 5.3 to 5.4 and it seems to have done the trick!!

So, I thought of letting you know. Now, I cannot guarantee that that is the end of the story, but at least my posts are showing in the mobile app.

-j

Raspberry Pi

I am very pleased to have finally received the Raspberry Pi 3 that I ordered the other day. I also got a Sense Hat – an add-on board for Raspberry Pi, made especially for the Astro Pi mission

The Sense HAT has an 8×8 RGB LED matrix, a five-button joystick and includes the following sensors:

  • Gyroscope
  • Accelerometer
  • Magnetometer
  • Temperature
  • Barometric pressure
  • Humidity

There is even a  Python library providing easy access to everything on the board. I can’t wait to start using it with some of the APIs available at Bluemix for example. Any ideas are more than welcome.

20160730_RaspberryPi