How to Research a Machine Learning Algorithm – Reblog

How to Research a Machine Learning Algorithm

// A reblog from Machine Learning Mastery

Algorithms are a big part of the field of machine learning.

You need to understand what algorithms are out there, and how to use them effectively.

An easy shortcut to this knowledge is to research an algorithm: to review what is already known about it.

In this post you will discover the importance of researching machine learning algorithms and the 5 different sources that you can use to accelerate your understanding of machine learning algorithms.

Research Machine Learning Algorithms
Photo by Anders Sandberg, some rights reserved

Why Research Machine Learning Algorithms

You need to understand algorithms to master machine learning.

Machine learning algorithms are not like other algorithms that you may be familiar with, such as sorting algorithms.

Not only are machine learning algorithms data-dependent, they are also adaptive. Often the heart of a given machine learning algorithm is an optimization process that is stochastic, meaning it has elements of randomness. This makes machine learning algorithms harder to analyze, and makes it harder to draw firm conclusions about their best and worst performance.
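To make the point about randomness concrete, here is a minimal sketch of a stochastic optimizer. It is not taken from any particular machine learning algorithm: the function name, the target function and the parameter values are all assumptions chosen for illustration. It shows that the same procedure run with different random seeds lands on slightly different answers, which is why a single run says little about best or worst performance.

```python
import random


def stochastic_hill_climb(seed, steps=200, step_size=0.5):
    """Minimize f(x) = (x - 3)^2 with random proposals (illustrative only)."""
    rng = random.Random(seed)          # separate generator so each seed is reproducible
    f = lambda x: (x - 3.0) ** 2       # hypothetical objective with its minimum at x = 3
    x = rng.uniform(-10.0, 10.0)       # random starting point
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        if f(candidate) < f(x):        # keep a proposal only if it improves the objective
            x = candidate
    return x


# Same algorithm, same problem, different seeds.
results = [stochastic_hill_climb(seed) for seed in range(5)]
for seed, x in enumerate(results):
    print(f"seed={seed}  x={x:.4f}  error={(x - 3.0) ** 2:.6f}")
```

Fixing the seed makes a run repeatable, but judging the algorithm fairly means looking at the spread of results across many seeds rather than at any one of them.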

You need to apply, implement or think deeply about algorithms to understand them.

You can describe how an algorithm works as a mathematical recipe, but to understand its behaviors in practice, you must study it in action. You can do this by experimenting on an algorithm, applying it to a lot of problems and distilling out how it behaves and how to expose and exploit these behaviors in the face of different problem types.

Alternatively, the shortcut that you can take is to dive into what other people have understood about the algorithm before you.

You need background on the algorithms which only comes from researching them.

5 Sources To Use When Researching Algorithms

Researching a machine learning algorithm requires a systematic investigation of the algorithm from multiple sources.

This may sound scarier than it actually is. Your goal is to build up your own consistent understanding of different machine learning algorithms. That understanding is personal to you, and building it requires collating interpretations of a given algorithm from multiple sources.

Different sources can be used for different purposes, so you need to pick and choose those sources carefully and purposefully.

Start with a clear idea of why you want to research a given machine learning algorithm, and then pick those sources that can best answer the questions that you have.

There are 5 different sources that you can use in your research of a machine learning algorithm. We will review each in turn.

1. Authoritative Sources

Authoritative sources provide expert interpretations and descriptions of algorithms.

They are useful for getting up to speed on an algorithm fast, as the explanations are often rigorous and somewhat standardized, at least within a given text.

The descriptions can also be dense, often steeped in mathematics and focused on the theoretical side using academic language. In this way, they can be difficult to penetrate without sufficient background.

Examples of authoritative sources include:

Textbooks such as those used in graduate machine learning courses.
Lecture notes and slides, such as those presented during graduate machine learning courses.
Overview papers such as those that make up academic compendiums on a topic.

2. Seminal Sources

Seminal sources are the original expert descriptions of the algorithms.

Seminal sources are good for getting inside the head of the original author or describer of a machine learning algorithm and teasing out the intent of algorithm parameters and processes.

These sources are almost always academic and theoretical and only occasionally include useful usage information.

Examples of seminal sources include:

Conference papers and journal articles.
Technical reports that might precede or supplement the original publications on the method.

3. Leading Edge Sources

Many algorithms are the subject of ongoing research. This may take the form of extensions, deeper investigation or even simple application and comparison of the method to other methods.

I call these sources leading edge because they expose useful new and state-of-the-art information about a machine learning algorithm.

Leading edge sources can be used to get a good idea of what problems related to an algorithm are currently being worked on. These may represent interesting or difficult sub-processes within the algorithm of which you can take note.

Often leading edge sources are dense and technical and will require much work on your part to interpret the intent of the work and extract salient details that help you better understand the algorithm.

Examples of leading edge sources include:

Conference papers and journal articles.
Conference talks such as plenaries and perhaps workshops.

4. Usage Heuristics Sources

Usage heuristics and best practices are probably the key type of information you are interested in when researching a machine learning algorithm for practical and applied purposes.

Usage heuristic sources provide an expert description for how to use a given machine learning algorithm in practice. They are good for practical usage advice such as parameter configurations, suggested data preparation steps and even advice on how to adapt and scale the algorithm for specific classes of problems.

Often details are missing from these sources and must be inferred or sought by contacting the authors directly. Don’t expect to be able to easily reproduce the results from these sources. Instead, focus on extracting heuristics that you can use to guide your own use of the algorithm.

Examples of usage heuristics sources include:

Papers that describe the results from machine learning competitions, like KDD Cup and Kaggle.
“What I did” blog posts and forum posts related to machine learning competitions.
Question and Answer websites such as Cross Validated and other machine learning community sites.
Application conference papers.

5. Implementation Sources

You may be interested in researching an algorithm because you want to implement it. In addition to the other sources listed above, you should consult implementation sources.

These are sources that are prepared by experts or semi-experts that provide implementations of machine learning algorithms as examples, in libraries and tools. The samples may be released under a permissive or open source license for you to learn from.

These sources are good to get ideas on how given machine learning algorithms can be translated into an executable and usable system.

Examples of implementation sources include:

Open source projects such as libraries and tools.
Posts on relevant machine learning blogs.
Technical reports prepared by graduate students or research labs.

Often, implementations in blog posts are provided for tutorial and understanding purposes and may not be written for speed or scalability. Open source algorithm implementations that you find in libraries and tools are often highly optimized and may not be written for readability.

Research is Not Just For Academics

You can research machine learning algorithms. Do not be scared off by the formal academic language and medium of papers and articles.

You do not need to be a PhD researcher or a machine learning algorithm expert.

You can read the papers, books and algorithm implementations just as well as anyone.

Often the problem with a difficult-to-read paper lies with the author and not with you, the reader. It is very hard to write a good technical treatment of an algorithm or piece of research, and good sources are gems when you find them.

Action Steps

In this post you discovered the importance of researching machine learning algorithms and 5 sources that you can use to find the information you need on machine learning algorithms.

The next step is to practice your newfound skills.

Select an algorithm that you want to research.
Consider what you want to know about the algorithm and select the sources that can best answer your questions from the list above.
Systematically research the algorithm. Start with Google Scholar and type in the algorithm name if you are looking for papers. Start with a Google search of GitHub and type in the algorithm name if you are looking for algorithm implementations.
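As a small convenience for the last step, the two searches can be turned into links you can open in a browser. This is only a sketch: the function name `research_urls` is hypothetical, and the URL patterns are assumptions about how Google Scholar and Google search currently accept queries, which may change.

```python
from urllib.parse import urlencode


def research_urls(algorithm_name):
    """Build search links for papers and implementations (assumed URL formats)."""
    # Google Scholar query for papers on the algorithm.
    scholar = "https://scholar.google.com/scholar?" + urlencode({"q": algorithm_name})
    # Google search restricted to GitHub for implementations.
    github = "https://www.google.com/search?" + urlencode(
        {"q": f"site:github.com {algorithm_name}"}
    )
    return {"papers": scholar, "implementations": github}


for purpose, url in research_urls("random forest").items():
    print(f"{purpose}: {url}")
```

Swap in the name of the algorithm you selected in the first step.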

Share what you learn.