Programming First Steps: Java or Python?

I am posting this as an extended response to a question that a medic friend of mine asked me the other day. The main question was whether learning Java or Python was best for getting “a basic understanding of coding (medically related)” and “which one will serve best in the future within a remit of medical apps”.

The short answer is that it depends… and my answer applies not only to medical applications but in general. I believe that both are very good, popular programming languages. Learning the basics of programming can be done in any programming language of your choice, including both Python and Java. If the aim is to get to grips with what programming is all about, even learning a bit of Scratch or Blockly could be a good start. I also recommend looking at Swift Playgrounds. All of them will let you have a look at the basics of programming and will get you started in an easy way.

In this case the query is more nuanced, and the answer is also “it depends”. For example, are there other people around you who are already using a particular programming language for their medical (or any other) applications? When doing a PhD, for instance, I recommend you look at what other candidates are using for their work and stick to it. The reason for this is that if you have any questions, there are people around you who may be able to support you. You may also end up working with them, and programming in the same language helps. In this case, if a number of medics are already getting their hands dirty with one particular programming language, I’d say go for that. And notice that it does not have to be either Java or Python.

In short, you can’t go wrong with picking one of them to start with. Once you start, you may pick up the other with less trepidation.

If you are stuck, I would probably recommend Python, but then again I may be biased. After all, I have written a couple of books for Pythonistas… ok, ok, I have also written one for Matlab. That is also an excellent language, although not open source; in that case take a look at Octave… but I digress.

In the rest of the post I will look at some of the similarities and differences between Python and Java in the hope this may help you decide.

Learning curve

The learning curves for Java and Python are very different. I believe that Python is a much easier language to get started with. However, once you have picked up the basics of either of them, you can contribute to production-level code quite easily. Both languages support object-oriented programming, and depending on your level of knowledge you may be able to read through a program and figure out what it is doing.

The learning curve for anything depends on what you already know, how interested you are in the topic, and the learning environment. For example, if you have already done some type of coding or scripting, even if it is pasting some JavaScript into a web page, you may be familiar with the code structure you will run into with a language like Java. Let us look at a Hello World program in Java:

class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!"); 
    }
}

What about Python? Well, take a look:

print("Hello, world!")

Readability is part of the philosophy of writing Python code, and we can see that in the example above. This is why, if you have never programmed before, Python tends to be easier to read.

Syntax

Syntax refers to the rules that we need to follow to write correct “sentences” in the language. Java’s syntax requires a bit more effort than Python’s. Let us take a look. The following program calculates the average of a collection of numbers in Java:

public class Average {
    public static void main(String[] args) {
        int i, total;
        int[] a = {0, 6, 9, 2, 7};
        int n = a.length;
        total = 0;

        for (i = 0; i < n; i++) {
            total += a[i];
        }
        System.out.println("Average: " + total / (float) n);
    }
}

  • Curly braces define the blocks of code.
  • Each statement must end in a semicolon (;).
  • Each variable must be declared with a type. We declare i and total as int, and later cast n to float so that the division produces a decimal number rather than a truncated integer.
  • Formatting and spacing are not important. Although the code above looks nice, the program would still run even if all of it were on a single line (don’t do that…).
  • You will also notice how verbose the code is. You will usually end up typing more when writing Java code than you would with Python.

In Python we can calculate the average with something like this:

def average(a):
    avg = sum(a) / len(a)
    return avg

a = [0, 6, 9, 2, 7]
avg = average(a)
print("Average: {0}".format(avg))

  • Line breaks and indentation define blocks of code in Python. There are no extra symbols like semicolons at the end of a line.
  • Python uses a colon to introduce the body of classes, functions, and loops. You can see that in the definition of average.
  • Whitespace is important to Python, since it is used to define blocks of code; the lines in the code above could not all be written on a single line.
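The (float) cast in the Java version is there because dividing one int by another truncates the result. Python 3 sidesteps this: / always performs true division and returns a float, while // gives floor division. A small sketch using the same numbers, with the standard library's statistics.mean shown as one more alternative to writing the function yourself:

```python
import statistics

numbers = [0, 6, 9, 2, 7]
total = sum(numbers)  # 24

# True division: always a float in Python 3, no cast needed
true_division = total / len(numbers)
# Floor division: for positive numbers this matches Java's int / int
floor_division = total // len(numbers)
# The standard library can also compute the mean directly
library_mean = statistics.mean(numbers)

print(true_division, floor_division, library_mean)  # 4.8 4 4.8
```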

Executing code

A big difference between Java and Python is how the two languages execute code. Java is a compiled language: the source code is first translated by the compiler (javac) into bytecode, which the Java Virtual Machine then runs. Python is usually described as an interpreted language: you run the source code directly and the interpreter executes it, with no separate compilation step for you to perform.
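As a rough illustration of what happens behind the scenes: the standard Python implementation (CPython) does compile source code to bytecode internally; it simply does so automatically. The standard library's dis module lets you peek at that bytecode. The exact instructions printed vary between Python versions, so that output is not shown here:

```python
import dis


def average(a):
    return sum(a) / len(a)


# Running the def statement already compiled the body to bytecode;
# dis.dis prints those instructions in a human-readable form.
dis.dis(average)

# compile() performs the same source-to-bytecode step explicitly,
# and eval() then runs the resulting code object.
code_obj = compile("sum([0, 6, 9, 2, 7]) / 5", "<example>", "eval")
result = eval(code_obj)
print(result)  # 4.8
```

In Java the equivalent step is explicit: javac turns the .java source into .class bytecode files, which the JVM then runs.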

If you are interested in performance, this difference means that Python code can be somewhat slower than Java, but for the type of programming you may start with, this does not matter all that much.

In a nutshell

If you are interested in learning more about programming, either of them will get you started. Each language has its pros and cons, and I would recommend you ask colleagues what they are using for the type of applications you are interested in. All in all, it does not matter which one you choose. My recommendation is to stick with your choice, and in no time you will pick up the nuances and idiosyncrasies of the language.

Natural Language Processing Talk – Newspaper Article

With the lockdown and social distancing rules forcing all of us to adjust our calendars, events and even lesson plans and lectures, I was not surprised to hear of speaking opportunities that otherwise may not arise.

A great example is the reprise of a talk I gave about a year ago while visiting Mexico. It was a great opportunity to talk to Social Science students at the Political Science Faculty of the Universidad Autónoma del Estado de México. The subject was open but had to cover the use of technology and I thought that talking about the use of natural language processing in terms of digital humanities would be a winner. And it was…

In March this year I was approached by the Faculty to re-run the talk, but this time, instead of doing it face to face, we would use a teleconference room. Not only was I, the speaker, talking from the comfort of my own living room, but all the attendees would also be at home. Furthermore, some of the students might not have access to the live presentation (lack of broadband, equipment, etc.), so recording the session for later viewing was the best option for them.

I didn’t hesitate to say yes, and I enjoyed the interaction a lot. Today I learnt that the session was the focus of a small note in a local newspaper. The session was run in Spanish and the note in Portal, the local newspaper, is in Spanish too. I really liked that they picked a line I used in the session to convince the students that technology is not just for the natural sciences:

“Hay que hacer ciencias sociales con técnicas del Siglo XXI… El mundo es de los geeks.”

“We should study social sciences applying techniques of the 21st Century. The world today belongs to us, the geeks.”

The point is that although qualitative and quantitative techniques are widely used in social science, new platforms and even programming languages such as Python open up opportunities for social scientists too.

The talk is available in the blog the class uses to share their discussions: The Share Knowledge Network – Follow this link for the talk.

The newspaper article by Ximena Barragán can be found here.

Computer Programming Knowledge

I came across the image above in the Slack channel of the University of Hertfordshire Centre for Astrophysics Research. It summarises some of the fundamental computer science knowledge that was considered necessary at some point in time: binary, CPU execution and algorithms.

The graphic refers to 7 algorithms, but rather than individual algorithms these are really classes of algorithms:

  1. Sort
  2. Search
  3. Hashing
  4. Dynamic Programming
  5. Binary Exponentiation
  6. String Matching and Parsing
  7. Primality Testing
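To make one of these classes concrete, here is a sketch of binary exponentiation (also known as square-and-multiply): instead of multiplying the base by itself n times, it walks through the bits of the exponent, squaring at each step, so it needs only about log2(n) multiplications. The function name binary_pow is purely illustrative; in practice Python's built-in pow(base, exp, mod) already covers this case.

```python
from typing import Optional


def binary_pow(base: int, exponent: int, modulus: Optional[int] = None) -> int:
    """Return base ** exponent (or its value mod `modulus`) by repeated squaring."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exponent > 0:
        # If the lowest remaining bit of the exponent is 1, multiply this power in.
        if exponent & 1:
            result *= base
            if modulus is not None:
                result %= modulus
        # Square the base for the next bit and drop the bit just processed.
        base *= base
        if modulus is not None:
            base %= modulus
        exponent >>= 1
    # The final reduction also covers exponent == 0 (e.g. modulus 1 gives 0).
    return result if modulus is None else result % modulus


print(binary_pow(3, 13))        # 1594323
print(binary_pow(2, 10, 1000))  # 24, since 1024 mod 1000 = 24
```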

I like the periodic table shown at the bottom of the graphic. It shows some old friends such as Fortran, C, Basic and Cobol, some others that are probably not used all that much, and others that have definitely been rising: JavaScript, Java, C++, Lisp. It is great to see Python, number 35, listed as Multi-Paradigm!

Enjoy!

Structured Documents in LaTeX

This is a video I made a few years ago to encourage my students to use better tools to write dissertations, theses and reports that include mathematics. The principles stand, although the tools may have moved on since then. I am reposting it at the request of a colleague of mine, Dr Catarina Carvalho, who I hope will still find it useful.

In this video we continue explaining how to use LaTeX. Here we will see how to use a master document in order to build a thesis or dissertation.
We assume that you have already had a look at the tutorial entitled: LaTeX for writing mathematics – An introduction


LaTeX for writing mathematics – An introduction

This is a video I made a few years ago to encourage my students to use better tools to write dissertations, theses and reports that include mathematics. The principles stand, although the tools may have moved on since then. I am reposting it at the request of a colleague of mine, Dr Catarina Carvalho, who I hope will still find it useful.

In this video we explore the LaTeX document preparation system. We start by explaining an example document. We have used Texmaker as an editor given its flexibility and the fact that it is available for different platforms.


Pandas 1.0 is out

If you are interested in #DataScience you have surely heard of #pandas, and you will be pleased to hear that version 1.0 is finally out, with better integration with NumPy and improvements with Numba, among others. Take a look!
— Read on www.anaconda.com/pandas-1-0-is-here/

Natural Language Processing – Talk

Last October I had the great opportunity to come and give a talk at the Facultad de Ciencias Políticas, UAEM, México. The main audience were students of the qualitative analysis methods course, but there were people also from informatics and systems engineering.

It was an opportunity to showcase some of the advances that natural language processing offers to social scientists interested in analysing discourse, from politics through to social interactions.

The talk covered an introduction to the field and a brief history of it. We went through the different stages of the analysis: reading the data, obtaining tokens and labelling their part of speech (POS), and then looking at syntactic and semantic analysis.
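The demos from the talk are not reproduced here, but the first of those stages can be sketched with the standard library alone. The example below assumes a deliberately naive tokenizer (a regular expression that keeps runs of word characters) and then counts word frequencies; the function names and the sample sentence are made up for illustration. Dedicated libraries such as NLTK or spaCy handle punctuation, contractions and part-of-speech tagging far more carefully.

```python
import re
from collections import Counter
from typing import List, Tuple

# \w+ matches runs of letters, digits and underscores; in Python 3 it is
# Unicode-aware, so accented Spanish words such as "código" stay whole.
TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens (naive sketch)."""
    return TOKEN_PATTERN.findall(text.lower())


def top_words(text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample text, loosely based on the quote from the talk
sample = "El mundo es de los geeks. Los geeks escriben código."
print(tokenize(sample))
print(top_words(sample, 2))
```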

We finished the session with a couple of demos: one looked at speeches by Clinton and Trump during their presidential campaigns; the other was a simple analysis of a novel in Spanish.

Thanks for the invite.

Python – Pendulum

Working with dates and times in programming can be a painful task at times. In Python, there are some excellent libraries that help with the pain, and recently I became aware of Pendulum. It is effectively a replacement for the standard datetime class, and it has a number of improvements. Check out the documentation for further information.

Installation of the package is straightforward with pip:

$ pip install pendulum

For example, some simple manipulations involving time zones:

import pendulum

now = pendulum.now('Europe/Paris')

# Changing timezone
now.in_timezone('America/Toronto')

# Default support for common datetime formats
now.to_iso8601_string()

# Shifting
now.add(days=2)

Duration can be used as a replacement for the standard timedelta class:

dur = pendulum.duration(days=15)

# More properties
dur.weeks
dur.hours

# Handy methods
dur.in_hours()  # 360
dur.in_words(locale='en_us')  # '2 weeks 1 day'
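For comparison, here is roughly what the same calculation looks like with the standard timedelta class that Duration replaces. timedelta only stores days, seconds and microseconds, so weeks and hours have to be derived by hand, and there is no equivalent of in_words:

```python
from datetime import timedelta

dur = timedelta(days=15)

# Derived units must be computed manually with timedelta
hours = dur.total_seconds() / 3600   # 360.0
weeks, days = divmod(dur.days, 7)    # (2, 1)

print(f"{hours:.0f} hours, {weeks} weeks {days} day(s)")
```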

It also supports the definition of a period, i.e. a duration that is aware of the DateTime instances that created it (note that Pendulum 3.x renamed Period to Interval). For example:

dt1 = pendulum.now()
dt2 = dt1.add(days=3)

# A period is the difference between 2 instances
period = dt2 - dt1

period.in_weekdays()
period.in_weekend_days()

# A period is iterable
for dt in period:
    print(dt)
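Without Pendulum, counting weekdays and weekend days between two dates takes a small loop over the standard datetime module. The helper count_weekdays below is hypothetical, written only for this example; it counts both end dates inclusively, and Pendulum's own counting rules may differ, so check its documentation if the exact boundaries matter:

```python
from datetime import date, timedelta
from typing import Tuple


def count_weekdays(start: date, end: date) -> Tuple[int, int]:
    """Count (weekdays, weekend days) from start to end, both inclusive."""
    weekdays = weekend = 0
    current = start
    while current <= end:
        # date.weekday(): Monday is 0 ... Sunday is 6
        if current.weekday() < 5:
            weekdays += 1
        else:
            weekend += 1
        current += timedelta(days=1)
    return weekdays, weekend


# 1 January 2024 was a Monday, so this week has 5 weekdays and 2 weekend days
print(count_weekdays(date(2024, 1, 1), date(2024, 1, 7)))  # (5, 2)
```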


Give it a go, and let me know what you think of it. 

File Encoding with the Command Line – Determining and Converting

With the changes that Python 3 has brought to bear in terms of dealing with character encodings, I have previously written some tips that I use in my day-to-day work. It is sometimes useful to determine the character encoding of a file at a much earlier stage. The command line is a perfect tool to help us with these issues.

The basic syntax you need is the following (this is the macOS form; on most Linux systems the equivalent option is lowercase -i, i.e. --mime):

$ file -I filename

Furthermore, you can even use the command line to convert the encoding of a file into another one. The syntax is as follows:

$ iconv -f encoding_source -t encoding_target filename

For instance, if you need to convert an ISO-8859-1 file called input.txt into UTF-8, you can use the following line:

$ iconv -f iso-8859-1 -t utf-8 < input.txt > output.txt
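If you would rather do the conversion from Python, the same ISO-8859-1 to UTF-8 re-encoding can be sketched with pathlib: decode the bytes with the source encoding, then write the text back out with the target one. The function name convert_encoding is hypothetical, and this sketch reads the whole file into memory, which is fine for modest files; the demo uses a temporary directory so it does not touch any real input.txt:

```python
import tempfile
from pathlib import Path


def convert_encoding(src, dst, source_encoding="iso-8859-1", target_encoding="utf-8"):
    """Decode src using source_encoding and write it to dst as target_encoding."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding=target_encoding)


# Self-contained demo inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_bytes("café".encode("iso-8859-1"))  # "é" is the single byte 0xE9
    convert_encoding(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'caf\xc3\xa9': "é" is now two bytes in UTF-8
```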

If you want to check the list of known character encodings that you can handle with this command, simply type:

$ iconv --list

Et voilà!

 

IEEE Language Rankings 2018

Python retains its top spot in the fifth annual IEEE Spectrum top programming language rankings, and also gains a designation as an “embedded language”. Data science language R remains the only domain-specific language in the top 10 (where it is listed as an “enterprise language”) and drops one place compared to its 2017 ranking to take the #7 spot.

Looking at other data-oriented languages, Matlab is at #11 (up 3 places), SQL is at #24 (down 1), Julia at #32 (down 1) and SAS at #40 (down 3). Click the screenshot below for an interactive version of the chart where you can also explore the top 50 rankings.


The IEEE Spectrum rankings are based on search, social media, and job listing trends, GitHub repositories, and mentions in journal articles. You can find details on the ranking methodology here, and discussion of the trends behind the 2018 rankings at the link below.

IEEE Spectrum: The 2018 Top Programming Languages