Programming First Steps: Java or Python?

I am posting this as an extended response to the question that a medic friend of mine asked me the other day. The main question was whether learning Java or Python was best for getting “a basic understanding of coding (medically related)” and “which one will serve best in the future within a remit of medical apps”.

The short answer is that it depends… and my answer is valid not only for medical applications, but in general. I believe that both are very good, popular programming languages. Learning the basics of programming can be done in any programming language of your choice, including both Python and Java. If the aim is to get to grips with what programming is all about, even getting to learn a bit of Scratch or Blockly could be a good start. I also recommend looking at Swift Playgrounds. They will all let you have a look at the basics of programming and will get you started in an easy way.

In this case the query is more nuanced, and the answer is also “it depends”. For example, are there any other people around you who are already using a particular programming language for their medical (or any other area of knowledge) applications? When doing a PhD, for instance, I recommend you look at what other candidates are using for their work and stick to it. The reason is that if you have any questions there are people around you who may be able to support you. Also, you may end up working with them, and programming in the same language helps. In this case, if there are a number of medics who are already getting their hands dirty with one particular programming language, I’d say go for that. And notice that it does not have to be either Java or Python.

In short you can’t go wrong with picking one of them to start with. Once you start, you may pick up the other with less trepidation.

If you are stuck, I would probably recommend using Python, but then again I may be biased. After all, I have written a couple of books for pythonistas. Ok, ok, I have also written one for Matlab. That is also an excellent language, although not open source. In that case take a look at Octave… but I digress.

In the rest of the post I will look at some of the similarities and differences between Python and Java in the hope this may help you decide.

Learning curve

The learning curves for Java and Python are very different. I believe that Python is a much easier language to get started with. However, once you have picked up the basics in either of them you can contribute to production-level code quite easily. Both languages are object oriented and, depending on your level of knowledge, you may be able to read through a program and figure out what it is doing.

The learning curve for anything depends on what you already know, how interested you are in learning the topic, and the learning environment. For example, if you have already done some type of coding or scripting, even if it is pasting some JavaScript into a web page, you may be familiar with the code structure you will run into with a language like Java. Let us look at a Hello World program in Java:

class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}

What about Python? Well take a look:

print("Hello, world!")

Readability is part of the philosophy of writing Python code and we can see that in the example above. In this way, if you have never programmed before, Python tends to be easier to read.


Syntax

Syntax refers to the rules that we need to follow to write correct “sentences” in the language. Java’s syntax requires a bit more effort than Python’s. Let us take a look. The following program calculates the average of a collection of numbers in Java:

public class Average {
    public static void main(String[] args) {
        int i, total;
        int[] a = {0, 6, 9, 2, 7};
        int n = a.length;
        total = 0;
        for (i = 0; i < n; i++) {
            total += a[i];
        }
        System.out.println("Average :" + total / (float) n);
    }
}

  • Curly braces define the blocks of code.
  • Each statement must end in a semicolon (;)
  • Each time you create a new variable, it must have a type. When we declare the variables i and total we define them as int, and later on we cast the value of n to float to be able to obtain a decimal number.
  • Formatting and spacing are not important. Although the code above looks nice, the programme would run even if all of it were on a single line (don’t do that…).
  • You will also notice how verbose the code is. You will usually end up typing more writing Java code than you would with Python code.

In Python we can calculate the average with something like this:

def average(a):
    avg = sum(a) / len(a)
    return avg

a = [0, 6, 9, 2, 7]
avg = average(a)
print("Average : {0}".format(avg))

  • Line breaks and indentation define blocks of code in Python. There are no extra symbols like semicolons at the end of a line.
  • Python uses a colon to start classes, methods, and loops. You can see that in the definition of average.
  • Whitespace is important to Python. Developers use it to define blocks of code, so the code above could not be written on a single line.
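
To make the comparison with the Java loop a bit more direct, here is a minimal sketch of the same calculation written with an explicit loop in Python. The variable names are my own choice for illustration. Notice how the colon opens each block and the indentation alone tells Python which lines belong to the function and which to the loop.

```python
def average(numbers):
    # The colon above opens the function block; everything indented
    # by four spaces belongs to the function.
    total = 0
    for value in numbers:
        # Indented one level further: this line belongs to the loop.
        total += value
    # Back at function level, so this runs once the loop has finished.
    # "/" always gives a decimal result in Python 3, so no cast is needed.
    return total / len(numbers)


a = [0, 6, 9, 2, 7]
print("Average : {0}".format(average(a)))  # Average : 4.8
```

If you move the return line four spaces to the right it becomes part of the loop and the function returns after the first number, which is a nice way to convince yourself that whitespace really matters.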

Executing code

A big difference between Java and Python is how the two languages execute code. Java is a compiled language: the source code needs to be “translated” (compiled) in a separate step before the machine can run it. Python is an interpreted language: you hand the source code directly to the interpreter, which executes it line by line without a separate compilation step.
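
To make the “line by line” idea concrete, here is a minimal sketch (the function and variable names are my own, purely for illustration). The second step refers to a name that was never defined, yet Python happily runs the first step and only complains when it reaches the faulty line. A Java compiler would reject the equivalent mistake before running anything.

```python
def run_steps():
    """Run three steps in order and record what happened."""
    log = []
    log.append("step 1 ran")
    try:
        # Hypothetical mistake: undefined_value was never defined.
        # Python only notices when execution reaches this line.
        log.append(undefined_value)
    except NameError:
        log.append("step 2 failed at run time")
    log.append("step 3 ran")
    return log


print(run_steps())
# ['step 1 ran', 'step 2 failed at run time', 'step 3 ran']
```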

If you are interested in performance, the distinction above means that Python could be a bit slower than Java, but I think that for the type of programming you may start with this does not matter all that much.

In a nutshell

If you are interested in learning more about programming, either of them should be able to get you started. There are a number of pros and cons to each language, and I would recommend you ask colleagues what they are using for the type of applications you are interested in. All in all, it does not matter which one you choose. My recommendation is to stick with your choice, and in no time you will pick up the nuances and idiosyncrasies of the language.

Natural Language Processing Talk – Newspaper Article

With the lockdown and social distancing rules forcing all of us to adjust our calendars, events and even lesson plans and lectures, I was not surprised to hear of speaking opportunities that otherwise may not arise.

A great example is the reprise of a talk I gave about a year ago while visiting Mexico. It was a great opportunity to talk to Social Science students at the Political Science Faculty of the Universidad Autónoma del Estado de México. The subject was open but had to cover the use of technology and I thought that talking about the use of natural language processing in terms of digital humanities would be a winner. And it was…

In March this year I was approached by the Faculty to re-run the talk, but this time instead of doing it face to face we would use a teleconference room. Not only was I, the speaker, talking from the comfort of my own living room, but all the attendees would also be at home. Furthermore, some of the students might not have access to the live presentation (lack of broadband, equipment, etc.) and recording the session for later use was the best option for them.

I didn’t hesitate in saying yes, and I enjoyed the interaction a lot. Today I learnt that the session was the focus of a small note in a local newspaper. The session was run in Spanish and the note in Portal, the local newspaper, is in Spanish too. I really liked that they picked a line I used in the session to convince the students that technology is not just for the natural sciences:

“Hay que hacer ciencias sociales con técnicas del Siglo XXI… El mundo es de los geeks.”

“We should study social sciences applying techniques of the 21st Century. The world today belongs to us, the geeks.”

The point is that although qualitative and quantitative techniques are widely used in social science, the use of new platforms and even programming languages such as Python opens up opportunities for social scientists too.

The talk is available in the blog the class uses to share their discussions: The Share Knowledge Network – Follow this link for the talk.

The newspaper article by Ximena Barragán can be found here.

Cover Draft for “Advanced Data Science and Analytics with Python”

I have received the latest information about the status of my book “Advanced Data Science and Analytics with Python”. This time I have been reviewing the latest cover drafts for the book.

This is currently my favourite one.

Awaiting the proofreading comments, and I hope to update you about that soon.

Pandas 1.0 is out

If you are interested in #DataScience you have surely heard of #pandas, and you will be pleased to hear that version 1.0 is finally out, with better integration with NumPy and improvements with Numba among others. Take a look!

Advanced Data Science and Analytics with Python – Submitted!

There you go, the first checkpoint is completed: I have officially submitted the completed version of “Advanced Data Science and Analytics with Python”.

The book has been some time in the making (and in the thinking…). It is a follow-up to my previous book, imaginatively called “Data Science and Analytics with Python”. The book covers aspects that were necessarily left out in the previous volume; however, the readers in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns.

Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary to, and a follow-up from, the topics discussed in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as scikit-learn, pandas, NumPy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and macOS applications.

The book can be read independently from the previous volume, and each of the chapters in this volume is sufficiently independent from the others, providing flexibility for the reader. Each of the topics addressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book.

Time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning are comprehensively covered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences. In this case, literally to the users’ fingertips in the form of an iPhone app.

While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:

Furthermore you can see my Author profile here.

Natural Language Processing – Talk

Last October I had the great opportunity to come and give a talk at the Facultad de Ciencias Políticas, UAEM, México. The main audience were students of the qualitative analysis methods course, but there were also people from informatics and systems engineering.

It was an opportunity to showcase some of the advances that natural language processing offers to social scientists interested in analysing discourse, from politics through to social interactions.

The talk covered an introduction and brief history of the field. We went through the different stages of the analysis, from reading the data, obtaining tokens and labelling their part of speech (POS), to looking at syntactic and semantic analysis.
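
To give a flavour of the first of those stages, here is a minimal sketch of reading some text and obtaining tokens using only the Python standard library. This is not the code from the talk: the function names and the sample sentence are my own, and a regular expression is only a rough stand-in for the tokenisers that a dedicated library such as NLTK provides. Part-of-speech tagging needs such a library and is not shown.

```python
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def most_common_tokens(text, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenise(text)).most_common(n)


sample = "The world belongs to the geeks. The geeks use Python."
print(tokenise(sample)[:4])           # ['the', 'world', 'belongs', 'to']
print(most_common_tokens(sample, 2))  # [('the', 3), ('geeks', 2)]
```

Even a simple count like this already says something about a speech or a novel, and it is the starting point for the later stages of the analysis.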

We finished the session with a couple of demos. One looking at speeches of Clinton and Trump during their presidential campaigns; the other one was a simple analysis of a novel in Spanish.

Thanks for the invite.

Adding new conda environment kernel to Jupyter and nteract

I know there are a ton of posts out there covering this very topic. I am writing this post more for my own benefit, so that I have a reliable place to check the commands I need to add a new conda environment to my Jupyter and nteract IDEs.

First to create an environment that contains, say TensorFlow, Pillow, Keras and pandas we need to type the following in the command line:

$ conda create -n tensorflow_env tensorflow pillow keras pandas jupyter ipykernel nb_conda

Now, to add this to the list of available environments in either Jupyter or nteract, we type the following:

$ conda activate tensorflow_env

$ python -m ipykernel install --name tensorflow_env

$ conda deactivate

Et voilà, you should now see the environment in the dropdown menu!

Data Science and Analytics with Python – Social Network Analysis

Using the time wisely during the Bank Holiday weekend. As my dad would say, “resting while making bricks”… Currently reviewing/editing/correcting Chapter 3 of “Advanced Data Science and Analytics with Python”. Yes, that is volume 2 of “Data Science and Analytics with Python”.


Python – Pendulum

Working with dates and times in programming can be a painful test at times. In Python, there are some excellent libraries that help with all the pain, and recently I became aware of Pendulum. It is effectively a replacement for the standard datetime class and it has a number of improvements. Check out the documentation for further information.

Installation of the package is straightforward with pip:

$ pip install pendulum

For example, some simple manipulations involving time zones:

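import pendulum

now = pendulum.now('Europe/Paris')

# Changing timezone (the target zone here is just an illustration)
now.in_timezone('America/Toronto')

# Default support for common datetime formats
now.to_iso8601_string()

# Shifting (the two-day shift is just an illustration)
now.add(days=2)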

Duration can be used as a replacement for the standard timedelta class:

dur = pendulum.duration(days=15)

# More properties
dur.weeks

# Handy methods
dur.in_hours()
dur.in_words()  # '2 weeks 1 day'
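
For comparison, here is a minimal sketch of what the same information looks like with the standard timedelta class. The in_words helper is my own, written only for this illustration and limited to whole days; it is not part of the standard library or of Pendulum.

```python
from datetime import timedelta


def in_words(delta):
    """Describe a whole-day timedelta as weeks and days."""
    weeks, days = divmod(delta.days, 7)
    parts = []
    if weeks:
        parts.append("{0} week{1}".format(weeks, "" if weeks == 1 else "s"))
    if days:
        parts.append("{0} day{1}".format(days, "" if days == 1 else "s"))
    return " ".join(parts) or "0 days"


delta = timedelta(days=15)
print(delta.days)                    # 15
print(delta.total_seconds() / 3600)  # 360.0
print(in_words(delta))               # 2 weeks 1 day
```

It works, but you end up writing the convenience methods yourself, which is exactly the kind of thing Pendulum takes care of.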

It also supports the definition of a period, i.e. a duration that is aware of the DateTime instances that created it. For example:

dt1 = pendulum.now()
dt2 = dt1.add(days=3)

# A period is the difference between 2 instances
period = dt2 - dt1

# A period is iterable
for dt in period:
    print(dt)
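
To appreciate what the iterable period saves you, here is a minimal sketch of a similar loop using only the standard datetime module. The days_between generator is my own helper, and I am assuming steps of one day with both end points included; the dates are fixed, arbitrary values so that the output is reproducible.

```python
from datetime import datetime, timedelta


def days_between(start, end):
    """Yield start, start + 1 day, and so on, up to and including end."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


dt1 = datetime(2020, 1, 1, 9, 0)
dt2 = dt1 + timedelta(days=3)

for dt in days_between(dt1, dt2):
    print(dt.isoformat())
```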

Give it a go, and let me know what you think of it.