Data Science and Machine Learning

Books

Data Science and Analytics with Python, CRC Press, Taylor & Francis Group, ISBN: 9781138043176 (2017)

Essential Matlab and Octave, CRC Press, Taylor & Francis Group, ISBN: 9781482234633 (2014)

Advanced Data Science and Analytics with Python, CRC Press, Taylor & Francis Group, ISBN: 978-0429446610 (2020)

Statistics and Data Visualisation with Python, CRC Press, Taylor & Francis Group, ISBN: 9780367744519 (2022)

From Tokens to Transformers

Rethinking NLP in the Second Edition of Advanced Data Science and Analytics with Python

When I first wrote Advanced Data Science and Analytics with Python, natural language processing (NLP) occupied a niche corner of the data science landscape. Back then, much of the focus in Python revolved around parsing and vectorising text: extracting tokens, counting frequencies, maybe applying a topic model or two. Fast forward a few years, and NLP has become one of the engines driving modern AI, powering everything from search and recommendation to summarisation and chat interfaces.

That shift is at the heart of Chapter 2 in the second edition, where “Speaking Naturally” has been thoroughly reimagined for today’s ecosystem. Instead of stopping at token counts and bag-of-words, this chapter bridges the gap between traditional text processing and the language-rich representations that underlie contemporary AI systems.

From Soup to Semantics

We start where most real text projects begin, with acquisition and cleaning. Python’s Beautiful Soup still plays a starring role for scraping structured text off the web, but the focus now goes beyond parsing tags to extracting meaningful content. Regular expressions, Unicode normalisation and tokenisation are introduced not as academic subjects but as practical tools you’ll reach for every time you ingest text.
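
To make the cleaning step concrete, here is a minimal sketch using only the standard library. It is not taken from the book: the function names, the token pattern and the sample sentence are my own assumptions, and a real project would tune each of them to its data.

```python
import re
import unicodedata

# \w+ matches runs of letters, digits and underscores; the optional group
# keeps an internal straight apostrophe, as in "don't".
# This pattern is an illustrative assumption, not a universal tokeniser.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?")


def normalise(text: str) -> str:
    """Apply NFKC normalisation, casefold, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. the "fi" ligature becomes "fi"
    text = text.casefold()
    return re.sub(r"\s+", " ", text).strip()


def tokenise(text: str) -> list[str]:
    """Split normalised text into word tokens."""
    return TOKEN_PATTERN.findall(normalise(text))


# Made-up sample: \ufb01 is the "fi" ligature, \u00e9 is e-acute and
# \u00a0 is a non-breaking space.
raw = "The  \ufb01rst  caf\u00e9\u00a0opened in 2017"
print(tokenise(raw))
```

Normalising before tokenising means that visually identical strings end up as identical tokens, which matters as soon as you start counting or comparing them.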

Finding Structure in Language

Once you have clean text, the chapter furthers your intuition with topic modelling, an unsupervised way of surfacing latent themes across documents. These techniques remain valuable for exploration, summarisation and even automated labelling in the absence of annotated training data.

Encoding Meaning: Beyond Frequency Counts

The real leap comes with representation learning. Rather than relying on sparse counts, modern NLP encodes text as dense vectors that capture contextual meaning. Word embeddings — and their contextual successors — turn raw text into numbers that machine learning models can reason about. This edition makes that leap accessible, showing how to generate, visualise and use these representations in Python.

Semantic Search with Vector Engines

Building on embeddings, we explore vector similarity search — the backbone of semantic retrieval. Using tools like FAISS, you’ll learn how to retrieve text not based on matching keywords but on meaning, opening the door to advanced search, clustering and recommendation applications.
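
The idea behind similarity search fits in a few lines. The sketch below does an exhaustive cosine-similarity scan in plain Python. It is a stand-in for illustration rather than FAISS itself, and the three-dimensional "embeddings" are invented numbers, not the output of any model. A library such as FAISS is built to do this kind of lookup efficiently over very large collections.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, k=2):
    """Return the k (score, document) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings": the numbers are invented for illustration.
index = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "recovering account access": [0.6, 0.4, 0.2],
    "best pasta recipes": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # stands in for an embedded query about a lost login
results = search(query, index, k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Note that the query shares no keywords with the documents; the ranking comes purely from the geometry of the vectors.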

The NLP landscape has moved faster than almost any other area of AI. Transformers, contextual language models and embedding systems have shifted what’s possible — and what’s practical — for practitioners. This chapter is carefully redesigned to reflect that evolution, giving you the grounding you need to work with text data that isn’t just cleaned and counted, but understood.

More soon. Stay tuned.

Forecasting the Future: Time Series, Prophets, and Cross-Validation

When I wrote about the Jackalope’s return and the second edition of Advanced Data Science and Analytics with Python, I hinted that this wasn’t just a light refresh. It’s a proper evolution. New chapters, new tools, and, perhaps most importantly, a stronger emphasis on how we trust the models we build.

One of the chapters I’ve been spending time with recently dives head-first into forecasting. Not the hand-wavy, crystal-ball-gazing sort (sadly no actual precogs were harmed in the process), but practical, defensible forecasting that you can deploy without fear of your future self cursing your name.

Enter the Prophet

Yes, that Prophet.

Facebook’s (now Meta’s) Prophet framework gets its own dedicated treatment. Not because it’s fashionable, but because it occupies a genuinely interesting space: expressive enough to handle real-world seasonality, trends, and holiday effects, yet accessible enough that you don’t need to disappear into a cave with nothing but state-space equations and a beard.

The chapter walks through:

  • How Prophet decomposes time series into trend, seasonality, and effects you can actually explain to stakeholders
  • When it works beautifully; and when it really, really doesn’t
  • Why it’s often a strong baseline, even if you later graduate to more exotic architectures
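
For reference, the decomposition in the first bullet is usually written as an additive model. This is the form given in the Prophet paper by Taylor and Letham rather than an excerpt from the book, and the symbols follow that paper's notation:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) the periodic seasonality, h(t) the holiday effects, and ε_t the error left unexplained by the other terms. Each term can be plotted on its own, which is what makes the forecast explainable.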

Think of Prophet as the Millennium Falcon of forecasting: not the newest ship in the galaxy, occasionally held together with duct tape, but astonishingly reliable in the right hands.

The Bit Everyone Skips (and Shouldn’t)

Forecasting models are easy to build. Evaluating them properly is where things usually fall apart. So this chapter leans hard into time series cross-validation and forecast evaluation. No random shuffling. No accidental peeking into the future. No Schrödinger’s test set.

We cover:

  • Rolling and expanding windows (and why they matter)
  • Forecast horizons and why “one-step ahead” tells only half the story
  • Metrics that actually align with decision-making, not just leaderboard vanity
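
As a rough sketch of the ideas in this list (my own illustration, not code from the chapter), the generator below produces expanding or rolling train/test splits that never look into the future. A second function scores a deliberately naive "repeat the last value" forecast at each step ahead. The function names, parameters and the toy series are all assumptions.

```python
def time_series_splits(n_obs, initial, horizon, step=1, expanding=True):
    """Yield (train_indices, test_indices) pairs in chronological order.

    expanding=True keeps every observation up to the cutoff in the training
    set; expanding=False keeps only the most recent `initial` observations.
    """
    cutoff = initial
    while cutoff + horizon <= n_obs:
        start = 0 if expanding else cutoff - initial
        yield list(range(start, cutoff)), list(range(cutoff, cutoff + horizon))
        cutoff += step


def mae_by_horizon(series, initial, horizon, step=1, expanding=True):
    """Mean absolute error of a last-value forecast at each step ahead.

    Assumes the series is long enough to produce at least one split.
    """
    errors = [[] for _ in range(horizon)]
    for train, test in time_series_splits(len(series), initial, horizon, step, expanding):
        forecast = series[train[-1]]  # naive baseline: repeat the last observed value
        for h, idx in enumerate(test):
            errors[h].append(abs(series[idx] - forecast))
    return [sum(e) / len(e) for e in errors]


series = [10, 12, 14, 16, 18, 20, 22, 24]  # made-up, steadily rising values

expanding_splits = list(time_series_splits(len(series), initial=4, horizon=2, step=2))
rolling_splits = list(time_series_splits(len(series), initial=4, horizon=2, step=2, expanding=False))
horizon_errors = mae_by_horizon(series, initial=4, horizon=2, step=2)

print(expanding_splits)
print(rolling_splits)
print(horizon_errors)
```

Reporting the error separately for each step ahead is the point of the second bullet: a model can look fine one step out and degrade quickly further along the horizon.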

If you’ve ever had a model that looked flawless in development and then collapsed in production like a soufflé near a subwoofer, this section is for you.

In applied data science, forecasting sits at an awkward crossroads. It’s everywhere — demand planning, operations, finance, healthcare, energy — and yet it’s often treated as a dark art or an afterthought.

This chapter is about demystifying that space. About treating time seriously (literally), respecting causality, and building forecasts you can defend in a meeting without resorting to interpretive dance or “the model felt confident”.

This is just one chapter. Over the coming weeks, I’ll be writing about other additions and revisions in the second edition — from modern modelling techniques to deployment considerations, and a few opinionated takes on where data science education often goes wrong.

If this chapter is about seeing the future, the rest of the book is about making sure you survive it — preferably with clean code, reproducible results, and fewer existential crises.

More soon. 🛸📈

The Book Has Landed – 2nd Edition in Hand!

There are few moments as satisfying in a writer’s life as the thud of a package on the doorstep—especially when that package contains the culmination of years of work, revision, and relentless debugging. I’m thrilled to share that I’ve received the author copies of the second edition of Data Science and Analytics with Python, and I must say: it feels real now.

This new edition has been a labour of love (and no small amount of Pandas wrangling). The first edition found its way into universities, companies, and home offices around the world, and the feedback from readers has been both humbling and inspiring. With this edition, I wanted to keep that practical, hands-on spirit alive while refreshing the content to reflect the evolving landscape of data science.

So, what’s new?

  • Modernised Examples: Data science doesn’t stand still, and neither should code examples. I’ve updated them with newer libraries and better practices, with plenty of Jupyter-friendly walkthroughs.
  • Expanded Topics: New material on Generative AI, plus more on NLP, machine learning workflows, and practical data strategy, in response to what many of you have asked for.
  • Streamlined Explanations: Concepts like dimensionality reduction, pipelines, and deployment are introduced more intuitively, with real-world context in mind.

And yes, the cover still holds up on a coffee table (or a desk strewn with notebooks and cables, if you’re anything like me).

What’s next? I have now turned my attention to reviewing the companion volume, “Advanced Data Science and Analytics with Python”, which will follow the same ethos: approachable, grounded, and deeply practical.

In the meantime, if you get a copy of the new edition, let me know what you think. Tag me, send a photo, or just share your favourite chapter. I’m always curious to see where the book travels and how it helps you shape your own data journey.

Onward!

Advanced Data Science and Analytics - 2nd Edition in the making

Well, it is happening! At the beginning of the year I announced on this blog that the revised second edition of my book "Data Science and Analytics with Python" had been delivered to my publisher and was off to print. It is available for pre-order from May 7th. Check it out here.

https://jrogel.com/the-2nd-edition-of-data-science-and-analytics-with-python-is-off-to-print/

At the time, I was also in discussions with my publisher about revising the companion volume "Advanced Data Science and Analytics with Python", with the idea not only of keeping the two volumes updated in parallel but, more importantly, of providing context for a lot of the recent growth in the area of large language models and generative AI. As such, back in March I announced that the second edition of the second volume had been approved.

https://jrogel.com/advanced-data-science-and-analytics-with-python-2nd-ed-announced/

I have been slowly but surely working on the revisions and, boy, there is a lot to do! I will keep you posted on progress.

“Advanced Data Science and Analytics with Python” - 2nd Ed. Announced

I’d like to let you know that I’ve officially started work on the 2nd edition of my book “Advanced Data Science and Analytics with Python”!

This updated edition will dive deeper into some of the most exciting developments in our field, particularly Large Language Models (LLMs) and Generative AI, alongside advanced principles of analytics, machine learning, and Python-based workflows.

If you’re curious about what the first edition covered, you can check it out here. You can get more info about this and my other books here.

Looking forward to sharing more as the update progresses—suggestions and ideas are always welcome!

The 2nd Edition of Data Science and Analytics with Python is Off to Print

It’s Done! The 2nd Edition of Data Science and Analytics with Python is Off to Print! 📚🐍💻

After months of reviewing, refining, and updating, I’m thrilled to share that the final version of the 2nd Edition of Data Science and Analytics with Python has been officially submitted! 🎉

This edition is packed with fresh insights, updated techniques, and even more hands-on examples to help data enthusiasts, analysts, and engineers make the most of Python’s data ecosystem. From foundational concepts to advanced analytics, I’ve worked hard to ensure this book remains a go-to resource for anyone navigating the world of data science.

Next stop? The printing press! 🚀 Looking forward to seeing it in readers’ hands soon. Stay tuned for release details!

The 2nd Edition of “Data Science and Analytics with Python” is Almost Here!

I’m excited to share a major milestone in my journey as an author and educator—I’ve officially completed proofreading the 2nd Edition of Data Science and Analytics with Python!

This updated edition has been a labour of love, carefully revised to incorporate the latest advancements in Python and data science techniques. From new libraries and tools to modern best practices, I’ve worked to ensure this book serves as a comprehensive resource for anyone looking to harness the power of Python in their data-driven journey.

What’s Next?

While proofreading is now complete, I’m taking a final end-to-end pass to ensure the content flows seamlessly and meets the high standards I’ve set for this book. Once this final revision is done, the manuscript will be submitted to the publisher for production.

When Can You Expect It?

The 2nd Edition is expected to hit the shelves in April 2025. Whether you’re a student, a professional, or simply a data enthusiast, I believe this edition will be an invaluable guide in mastering Python for data science and analytics.

Stay tuned for updates as we get closer to the release date!

Let’s continue exploring, learning, and pushing the boundaries of what’s possible with data science and Python.

Data Science and Analytics with Python - 2ed Proofs

The Journey Continues: Reviewing Proofs for the 2nd Edition of My Book "Data Science and Analytics with Python"

January is shaping up to be an exciting and pivotal month in my writing journey as I dive into the proofs for the 2nd edition of Data Science and Analytics with Python. It’s a moment of both pride and meticulous focus—seeing months of effort take the final shape before reaching readers later this year.

Why a 2nd Edition?

The world of data science is evolving at an astonishing pace. Since the first edition was published, we’ve seen new tools, libraries, and techniques redefine what’s possible. The 2nd edition captures these advancements, with updates ranging from expanded coverage of machine learning techniques to improved discussions, including coverage of Generative AI. Whether you’re just starting your data science journey or looking to refine your skills, this edition is designed to resonate with today’s data-driven professionals.

The Proofing Process

Reviewing proofs is both rewarding and nerve-wracking. It’s the final opportunity to fine-tune details, spot errors, and ensure clarity in every explanation and example. As someone who values precision in technical writing, I approach this phase with a blend of excitement and diligence. By the end of the month, the proofs should be finalised, marking another significant milestone on the path to publication.

A Glimpse Ahead: Advanced Data Science

Once the proofs for Data Science and Analytics with Python are complete, my focus will shift to the companion volume, Advanced Data Science and Analytics with Python. The 2nd edition of this book promises to take readers even deeper, exploring cutting-edge techniques and real-world applications. I’m particularly excited about updating sections on deep learning, reinforcement learning, and scaling data science pipelines—topics that are more relevant than ever.

What’s Next?

Both books represent my commitment to making data science accessible, practical, and inspiring. They’ve always been about more than just the code—they’re about fostering a mindset of exploration, creativity, and problem-solving.

As I work through the next stages, I look forward to sharing updates, sneak peeks, and reflections. Whether you’re an existing reader or considering delving into these books for the first time, I hope they’ll equip you to navigate the dynamic landscape of data science with confidence and curiosity.

Stay tuned for more updates, and thank you for being part of this journey!

Statistics and Data Visualisation with Python - Translation to Chinese

I am ecstatic to hear that my book "Statistics and Data Visualisation with Python" will be translated into Chinese with an estimated timeline of 24 months. Thanks to Taylor & Francis Group and in particular to Randi Slack for their support.

For more information about the book look at the page here, and for my other books check the following links.

Statistics and Data Visualisation with Python - Video

In order to be on the same footing with my other books, I have prepared a video introducing “Statistics and Data Visualisation with Python”.

https://vimeo.com/810525861

The book provides an introduction to statistical concepts and is intended to serve as a bridge in statistics for graduates and business practitioners interested in using their skills in the area of data science and analytics as well as statistical analysis in general.

Statistics and Data Visualisation with Python aims to build statistical knowledge from the ground up by enabling the reader to understand the ideas behind inferential statistics and begin to formulate hypotheses that form the foundations for the applications and algorithms in statistical analysis, business analytics, machine learning, and applied machine learning.

You can get the book here.

Learn more about my books here: https://jrogel.com/books/

You can see all the videos in the following Vimeo showcase:

https://vimeo.com/showcase/10283513

The other posts for book videos are here:

Statistics and Data Visualisation with Python - Published

Yay! The wait is over… my “Statistics and Data Visualisation with Python” book is finally a real object in the world. Here it is, in my hands! It looks a bit thicker than my other books.

I hope readers enjoy the Star Trek, Battlestar Galactica, and other sci-fi references. I have been asked about the figure on the cover. Well, it is a figure from one of the chapters! I guess you will have to take a look at the book to find out more. The Jackalope has also featured in my previous books.

Get in touch - would love to hear what you think of the book.

UPDATE

And here is the hardback version!

Stats and Data Viz - Book Cover

I am very excited to see the approved cover of my most recent book "Statistics and Data Visualisation with Python". The use of a Jackalope continues to be a theme, and in this case it is a figure that is used in Chapter 8, where we explore the creation of a number of useful charts with various Python modules.

The book is set to be published by December 14th and it has already been listed on my publisher's website, marked as "forthcoming".

I look forward to the actual publication and to telling you all about the book and its contents.

You can take a look at some info about the book here.

Statistics and Data Visualisation with Python - Final Draft

I am very pleased to announce that the final draft of my book "Statistics and Data Visualisation with Python" has been completed. It has been a pleasure to write and use examples referencing some of my favourite sci-fi characters, from Star Trek to Battlestar Galactica and more.

The book builds from the ground up the basis for statistical analysis underpinning a number of applications and algorithms in business analytics, machine learning and applied machine learning. It starts with the basics of programming in Python as well as data analysis to build a solid background in statistical methods and hypothesis testing useful in a variety of modern applications.

Table of Contents

  1. Data, Stats and Stories - An Introduction
  2. Python Programming Primer
  3. Snakes, Bears & Other Numerical Beasts: NumPy, SciPy and Pandas
  4. The measure of all things - Statistics
  5. Definitely Maybe: Probability and Distributions
  6. Alluring Arguments and Ugly Facts - Statistical Modelling and Hypothesis Testing
  7. Delightful Details - Data Visualisation
  8. Dazzling Data Designs - Creating Charts

I will let you know how the revisions go and I hope it will be available soon!

Data Science and Analytics with Python - Now in Chinese

I received a notification that my "Data Science and Analytics with Python" book is now available in Chinese. Great news and 谢谢!

You can take a look at the English versions (here and here)

The Chinese version is available here.

Using MATLAB for Data Science and Machine Learning

Python and R are not the only programming languages used in data science or machine learning applications. In a recent post in the Domino Data Blog I make the case for the usefulness of MATLAB.

Check the post here.

While you are doing that also check my "Essential Matlab and Octave" book.

Getting Data with Beautiful Soup

Always wanted to get some data from the web in a programmatic way? Well, check out my recent post in the Domino Data Blog where I discuss how to get data with the help of Beautiful Soup.

The aim is to show how we can create a script that grabs the pages we are interested in and obtains the information we are after. In the post I cover how to complete these steps:

  1. Identify the webpage with the information we need
  2. Download the source code
  3. Identify the elements of the page that hold the information we need
  4. Extract and clean the information
  5. Format and save the data for further analysis
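
The steps above can be sketched end to end. To keep the example self-contained and runnable offline, I have swapped in the standard library's `html.parser` for Beautiful Soup and an inline HTML string for the download step. With Beautiful Soup, the parser class would be replaced by a call such as `soup.find_all("td")`. The page content and the class name are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain <th> only, so they are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


# Steps 1-2: a hypothetical page, inlined here instead of being downloaded.
PAGE = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td> Essential Matlab and Octave </td><td>2014</td></tr>
  <tr><td>Data Science and Analytics with Python</td><td> 2017 </td></tr>
</table>
"""

# Step 3: the information lives in the <td> cells of each table row.
parser = CellCollector()
parser.feed(PAGE)

# Step 4: extract and clean (here, just strip stray whitespace).
records = [[cell.strip() for cell in row] for row in parser.rows]

# Step 5: format as CSV; an in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "year"])
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```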

Data Exploration with Pandas Profiler and D-Tale

Are you interested in exploring data using Python? If so, take a look at this blog post of mine, where I talk about using Pandas Profiler and D-Tale to carry out data exploration.

Helpful steps to:

  • Detect erroneous data.
  • Determine how much missing data there is.
  • Understand the structure of the data.
  • Identify important variables in the data.
  • Sense-check the validity of the data.
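
Profiling tools automate checks like these. Done by hand, two of them might look like the sketch below, which uses only the standard library and is not the Pandas Profiler or D-Tale API. The rows and column names are invented, and treating "?" as the missing-value marker is an assumption about how the file encodes gaps.

```python
import csv
import io

# Invented rows with hypothetical column names; "?" as the missing-value
# marker is an assumption, not a property of any particular dataset.
RAW = """age,shape,severity
67,3,1
?,1,0
58,?,1
45,2,0
-5,4,1
"""
MISSING = "?"

rows = list(csv.DictReader(io.StringIO(RAW)))
columns = list(rows[0].keys())

# How much missing data is there, per column?
missing = {col: sum(row[col] == MISSING for row in rows) for col in columns}

# Detect erroneous data: ages that are present but implausible.
bad_ages = [
    int(row["age"])
    for row in rows
    if row["age"] != MISSING and not 0 <= int(row["age"]) <= 120
]

print(missing)
print(bad_ages)
```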

I use the Mammographic Mass Data Set from the UCI Machine Learning Repository. Information about this dataset can be obtained here.

Read the full blog post in the Domino Data Blog here.

Statistics and Data Visualisation with Python - First Chapter Done

As you know, I am writing a new book. This time it is a book about statistics and data visualisation using Python as the main language to analyse data. I was thinking that I was a bit behind with my plan for the book, but I managed to surprise myself by being bang on time completing the first chapter.

This is the introductory chapter where we cover some background on the importance of statistics, a bit of history and the personalities behind some concepts widely used in stats and data visualisation. We then cover some background in formulating questions to be answered with data and how to communicate our results.

On to the next chapter! 🐍📊📖

Advanced Data Science and Analytics with Python - Video

Hello again! This is a video I recorded for my publisher about my book “Advanced Data Science and Analytics with Python”. You can get the book here and more about the book here.

https://vimeo.com/516105510

This companion to "Data Science and Analytics with Python" is the result of arguments with myself about writing something to cover a few of the areas that were not included in that first volume, largely due to space/time constraints. Like the previous book, this one exists thanks to the discussions, stand-ups, brainstorms and eventual implementations of algorithms and data science projects carried out with many colleagues and friends.

As the title suggests, this book continues to use Python as a tool to train, test and implement machine learning models and algorithms. The book is aimed at data scientists who would like to continue developing their skills and apply them in business and academic settings.

The subjects discussed in this book are complementary and a follow-up to the ones covered in Volume 1, "Data Science and Analytics with Python". The intended audience for this book is still composed of data analysts and early-career data scientists with some experience in programming and with a background in statistical modelling. In this case, however, the expectation is that they have already covered some areas of machine learning and data analytics. Although there are some references to the previous book, this volume is written to be read independently.

I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns. The aim is still to focus on showing the concepts and ideas behind popular algorithms and their use.

In summary, each of the topics addressed in "Advanced Data Science and Analytics with Python" tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The material covered spans machine learning and pattern recognition algorithms, including time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning. The book discusses the need to develop data products and addresses the subject of bringing models to their intended audiences: in this case, literally to the users’ fingertips in the form of an iPhone app.

I hope you enjoy it and if you want to know more about my other books, please check the related videos here:

Data Science and Analytics with Python - Video

This is a video I made for my publisher about my book “Data Science and Analytics with Python”. You can get the book here and more about the book here.

https://vimeo.com/512245277

The book provides an introduction to some of the most used algorithms in data science and analytics. This book is the result of very interesting discussions, debates and dialogues with a large number of people at various levels of seniority, working at startups as well as long-established businesses, and in a variety of industries, from science to media to finance.

“Data Science and Analytics with Python” is intended to be a companion to data analysts and budding data scientists that have some working experience with both programming and statistical modelling, but who have not necessarily delved into the wonders of data analytics and machine learning. The book uses Python as a tool to implement and exploit some of the most common algorithms used in data science and data analytics today.

Python is a popular and versatile scripting and object-oriented language. It is easy to use and has a large, active community of developers and enthusiasts, not to mention the richness of [...]. All of this is helped by the versatility of the IPython/Jupyter Notebook.

In the book I address the balance between the knowledge required by a data scientist, such as mathematics and computer science, and the need for a good business background. To tackle the prevailing image of a unicorn data scientist, I am convinced that the use of a new symbol is needed. And a silly one at that! There is an allegory I usually propose to colleagues and those that talk about the data science Unicorn. It seems to me to be a more appropriate one than the existing image: it is still another mythical creature, less common perhaps than the unicorn, but more importantly with some faint fact about its actual existence: a Jackalope. You will have to read the book to find out more!

The main purpose of the book is to present the reader with some of the main concepts used in data science and analytics using tools developed in Python such as Scikit-learn, Pandas, Numpy and others. The book is intended to be a bridge to the data science and analytics world for programmers and developers, as well as graduates in scientific areas such as mathematics, physics, computational biology and engineering, to name a few.

The material covered includes machine learning and pattern recognition, various regression techniques, classification algorithms, decision tree and hierarchical clustering, and dimensionality reduction. This text is not recommended, however, for those just getting started with computer programming.

There are a number of topics that were not covered in this book. If you are interested in more advanced topics, take a look at my book called “Advanced Data Science and Analytics with Python”. There is a follow-up video for that one! Keep an eye out for that!

Related Content: Please take a look at other videos about my books:

Essential MATLAB and Octave - Video

This is a video I made for my publisher about my book “Essential MATLAB and Octave”. You can get the book here and more about the book here.

The book is a primer for programming in Matlab and Octave within the context of numerical simulations for a variety of applications. Matlab and Octave are powerful programming languages widely used by scientists and engineers. They provide excellent capabilities for data analysis, visualisation and more.

https://vimeo.com/509514561

The book started as lecture notes for a course on Computational Physics, later turning into a wider syllabus covering aspects of computational finance, optimisation and even biology and economics.

The aim of the book is to help readers learn and apply programming in Matlab and Octave using straightforward explanations and examples from different areas in mathematics, engineering, finance, and physics.

Essential MATLAB and Octave explains how MATLAB and Octave are powerful tools applicable to a variety of problems. This text provides an introduction that reveals basic structures and syntax, demonstrates the use of functions and procedures, outlines availability in various platforms, and highlights the most important elements for both programs.

The book can be considered as a companion for programmers (new and experienced) that require the use of computers to solve numerical problems.

Code is presented in individual boxes and explanations are added as margin notes. Although both Matlab and Octave share a large number of features, they are not strictly the same. In cases where code is specific to one of the languages the margin notes provide clarity.

This text requires no prior knowledge and it is self-contained, allowing the reader to use the material whenever needed rather than follow a particular order.

Compatible with both languages, the book material incorporates commands and structures that allow the reader to gain a greater awareness of MATLAB and Octave, write their own code, and implement their scripts and programs within a variety of applicable fields.

It is always made clear when particular examples apply only to MATLAB or only to Octave, allowing the book to be used flexibly depending on readers’ requirements.

Statistics and Data Visualisation with Python

Right!!! It is early December and this post has been in the inkwell for a few months now. Earlier in the year I received the comments and suggestions from reviewers and the final approval from the excellent team at CRC Press for my 4th book.

After a few weeks of frank procrastination, and a few more spent structuring my thoughts, I have got a clear head to start writing. So I am pleased to announce that I am officially starting to write “Statistics and Data Visualisation with #Python”.

"Statistics and Data Visualisation with Python" builds from the ground up the basis for statistical analysis underpinning a number of applications and algorithms in business analytics, machine learning and applied machine learning. The book will cover the basics of programming in python as well as data analysis to build a solid background in statistical methods and hypothesis testing useful in a variety of modern applications.

Stay tuned!

"Advanced Data Science and Analytics with Python" - Arrived

I was not expecting this today, but I am very pleased to see that my first physical copies of "Advanced Data Science and Analytics" have arrived. I was working under the assumption that these would not be sent until after lockdowns were lifted, but that was not the case.

I am very happy to see the actual book and hold it in my hands!

I also hear that individual copies have started arriving to their new owners. If you ordered yours, let me know when it arrives. I will post your pictures!

Advanced Data Science and Analytics with Python - Discount

I am reaching out as volume 2 of my data science book will be out for publication in May and my publisher has made it possible for me to offer 20% off. You can order the book here.

This follows from "Data Science and Analytics with Python" and both books are intended for practitioners in data science and data analytics in both academic and business environments.

The new book aims to present the reader with concepts in data science and analytics that were deemed to be more advanced or simply out of scope in the author's first book, and are used in data analytics using tools developed in Python such as scikit-learn, pandas, NumPy, etc. The use of Python is of particular benefit given its recent popularity in the data science community. The book is therefore a reference to be used by seasoned programmers and newcomers alike, and the key benefit is the practical approach presented throughout the book.

More information about the first book can be found here.

Click here to read more...

Advanced Data Science and Analytics with Python - Final Corrections

Well, these are the final corrections for my latest book "Advanced Data Science and Analytics with Python". Next stop publication!

 

via Instagram https://ift.tt/2UrJ4oj

Click here to read more...

Advanced Data Science and Analytics with Python - Proofreading

Super excited to have received the proofread version of Advanced Data Science and Analytics with Python. They all seem to be very straightforward corrections: a few missing commas, some italics here and there and capitalisation bits and bobs.

I hope to be able to finish the corrections before my deadline of March 25th, and then enter the last phase before publication in May 2020.

Click here to read more...

Cover Draft for “Advanced Data Science and Analytics with Python”

I have received the latest information about the status of my book “Advanced Data Science and Analytics with Python”. This time I am reviewing the latest cover drafts for the book.

This is currently my favourite one.

Awaiting the proofreading comments, and I hope to update you about that soon.

Click here to read more...

Pandas 1.0 is out

If you are interested in #DataScience you have surely heard of #pandas, and you will be pleased to hear that version 1.0 is finally out, with better integration with NumPy and improvements with Numba, among others. Take a look!
— Read on www.anaconda.com/pandas-1-0-is-here/

Click here to read more...

Data Science Talk at University of Hertfordshire

It was great to be invited to give the joint Physics, Astronomy and Maths + Computer Science research seminar today at the University of Hertfordshire. I had a good opportunity to catch up with old colleagues and meet new faculty. There were also many students, and they came with many questions.

I was glad to hear they are thinking about offering more data science courses and even a dedicated programme. I would definitely be interested to hear more about that.

Click here to read more...

Advanced Data Science and Analytics with Python - Submitted!

There you go, the first checkpoint is completed: I have officially submitted the completed version of "Advanced Data Science and Analytics with Python".

The book has been some time in the making (and in the thinking...). It is a follow-up to my previous book, imaginatively called "Data Science and Analytics with Python". The book covers aspects that were necessarily left out in the previous volume; however, the readers in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns.

Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary to, and a follow-up from, the topics discussed in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as scikit-learn, pandas, NumPy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and macOS applications.

The book can be read independently from the previous volume, and each of the chapters in this volume is sufficiently independent from the others, providing flexibility for the reader. Each of the topics addressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book.

Time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning are comprehensively covered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences. In this case, literally to the users' fingertips in the form of an iPhone app.

While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:

Furthermore you can see my Author profile here.

Click here to read more...

ODSC Europe 2019

It was a pleasure to come to the opening day of ODSC Europe 2019. This time round I was the first speaker of the first session, and it was very apt as the talk was effectively an introduction to Data Science.

The next 4 days will be very hectic for the attendees, and if the quality is similar to the previous editions we are going to have a great time.

Click here to read more...

Natural Language Processing - Talk

Last October I had the great opportunity to come and give a talk at the Facultad de Ciencias Políticas, UAEM, México. The main audience were students of the qualitative analysis methods course, but there were also people from informatics and systems engineering.

It was an opportunity to showcase some of the advances that natural language processing offers to social scientists interested in analysing discourse, from politics through to social interactions.

The talk covered an introduction to and brief history of the field. We went through the different stages of the analysis, from reading the data, obtaining tokens and labelling their part of speech (POS), to looking at syntactic and semantic analysis.
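As a rough illustration of the first two stages mentioned above (reading text and obtaining tokens), here is a minimal sketch using only the Python standard library. The sample sentence and the `tokenise` helper are invented for this example and are not taken from the talk; POS labelling is left out because it needs a trained tagger from a library such as NLTK.

```python
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    # \w+ matches runs of letters, digits and underscores. With str patterns
    # in Python 3 it is Unicode-aware, so accented Spanish words stay whole.
    return re.findall(r"\w+", text.lower())


# Invented sample sentence, standing in for a line of a novel in Spanish.
sample = "El ingenioso hidalgo leía libros; leía de día y leía de noche."

tokens = tokenise(sample)
counts = Counter(tokens)

# Show the two most frequent tokens and how often each appears.
print(counts.most_common(2))
```

A regular expression is about the simplest tokeniser one can write; a real analysis would normally also deal with stop words, punctuation rules and language-specific quirks.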

We finished the session with a couple of demos. One looking at speeches of Clinton and Trump during their presidential campaigns; the other one was a simple analysis of a novel in Spanish.

Thanks for the invite.

Click here to read more...

"Advanced Data Science And Analytics" is finished!

It has been a few months of writing, testing, re-writing and starting again, and I am pleased to say that the first complete draft of "Advanced Data Science and Analytics with Python" is ready. Last chapter is done and starting revisions now. Yay!

Click here to read more...

Adding new conda environment kernel to Jupyter and nteract

I know there are a ton of posts out there covering this very topic. I am writing this post more for my own benefit, so that I have a reliable place to check the commands I need to add a new conda environment to my Jupyter and nteract IDEs.

First, to create an environment that contains, say, TensorFlow, Pillow, Keras and pandas, we need to type the following in the command line:

$ conda create -n tensorflow_env tensorflow pillow keras pandas jupyter ipykernel nb_conda

Now, to add this to the list of available environments in either Jupyter or nteract, we type the following:

$ conda activate tensorflow_env

$ python -m ipykernel install --name tensorflow_env

$ conda deactivate


Et voilà, you should now see the environment in the dropdown menu!

Click here to read more...

Data Science and Analytics with Python - Social Network Analysis

Using the time wisely during the Bank Holiday weekend. As my dad would say, "resting while making bricks"... Currently reviewing/editing/correcting Chapter 3 of "Advanced Data Science and Analytics with Python". Yes, that is volume 2 of "Data Science and Analytics with Python".

Click here to read more...

Social Network Analysis and Star Wars

On my way back to London and making the most of the time on the train to work on my Data Science and Analytics Vol 2 book. Working with #StarWars data to explain Social Network Analysis #datascience #geek

Click here to read more...

ODSC - Introduction to Data Science – A Practical Viewpoint


Very pleased to have the opportunity to share some thoughts with the keen audience attending ODSC Europe 2018.

My talk is not a technical presentation, as many of the other ones in the conference have been. Instead I wanted to present a workshop-style session that gives us the opportunity to interact with each other, share experiences and learn best practice in data science. The audience in mind is varied, from newcomers to the field to experienced practitioners. You can find a handout of the slides at the link below:

http://jrogel.com/wp-content/uploads/2018/09/JRogel_ODSC_An_Intro_To_DataScience.pdf

Click here to read more...

nteract - a great Notebook experience

I am a supporter of using Jupyter Notebooks for data exploration and code prototyping. It is a great way to start writing code and immediately get interactive feedback. Not only can you document your code there using markdown, but also you can embed images, plots, links and bring your work to life.

Nonetheless, there are some little annoyances that I have, for instance the fact that I need to launch a kernel to open a file and have to do that "the long way" - i.e. I cannot double-click on the file that I am interested in seeing. Some ways to overcome this include looking at GitHub versions of my code, as the notebooks are rendered automatically, or even saving HTML or PDF versions of the notebooks. I am sure some of you may have similar solutions for this.

Last week, while looking for entries on something completely different, I stumbled upon a post that suggested using nteract. It sounded promising and I took a look. It turned out to be related to the Hydrogen package available for Atom, something I have used in the past and loved. nteract was different though, as it offered a desktop version and other goodies such as in-app support for publishing, a terminal-free experience, sticky cells, input and output hiding... Bring it on!

I just started using it, and so far so good. You may want to give it a try, and maybe even contribute to the git repo.

Click here to read more...

Intro to Data Science Talk

Full room and great audience at General Assembly this evening. Lots of thoughtful questions and good discussion.

Click here to read more...

Now... presenting at ODSC Europe

Data science is definitely on everyone’s lips and this time I had the opportunity of showcasing some of my thoughts, practices and interests at the Open Data Science Conference in London.

The event was very well attended by data scientists, engineers and developers at all levels of seniority, as well as business stakeholders. I had the great opportunity to present the landscape that newcomers and seasoned practitioners must be familiar with to be able to make a successful transition into this exciting field.

It was also a great opportunity to showcase “Data Science and Analytics with Python” and to get to meet new people including some that know other members of my family too.

-j

Click here to read more...

Data Science and Analytics with Python - New York Team

Earlier this week I received this picture of the team in New York. As you can see they have recently all received a copy of my "Data Science and Analytics with Python" book.

Thanks guys!

Click here to read more...

Another "Data Science and Analytics with Python" Delivered

Thanks for sharing the picture, Dave Groves.

Click here to read more...

Data Science and Analytics - In the hands of readers!

I’m very pleased to see that my “Data Science and Analytics” book is reaching the hands of readers.

Here’s a picture that my colleague and friend Rob Hickling sent earlier today:

 

Click here to read more...


"Essential Matlab and Octave" in the CERN Document Server

A friend pinged me this screenshot after seeing "Essential MATLAB and Octave" included in the CERN Document Server!

Chuffed!

 

Click here to read more...

Data Science and Analytics with Python - Proofread Manuscript

I have now received comments and corrections for the proofreading of my “Data Science and Analytics with Python” book.

Two weeks and counting to return corrections and comments back to the editor and project manager.

 

Click here to read more...

"Data Science and Analytics with Python" enters production

I am very pleased to tell you about some news I received a couple of weeks ago from my editor: my book "Data Science and Analytics with Python" has been transferred to the production department so that they can begin the publication process!

UPDATE: The book is available here.

The book has been assigned a Project Editor who will handle the proofreading and all other aspects of the production process. This was after clearing the review process I told you about some time ago. The review was lengthy but it was very positive and the comments of the reviewers have definitely improved the manuscript.

As a result of the review, the table of contents has changed a bit since the last update I posted. Here is the revised table:

  1. The Trials and Tribulations of a Data Scientist
  2. Python: For Something Completely Different!
  3. The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
  4. The Relationship Conundrum: Regression
  5. Jackalopes and Hares: Clustering
  6. Unicorns and Horses: Classification
  7. Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
  8. Less is More: Dimensionality Reduction
  9. Kernel Trick Under the Sleeve: Support Vector Machines

Each of the chapters is intended to be sufficiently self-contained. There are some occasions where reference to other sections is needed, and I am confident that it is a good thing for the reader. Chapter 1 is effectively a discussion of what data science and analytics are, paying particular attention to the data exploration process and munging. It also offers my perspective as to what skills and roles are required to get a successful data science function.

Chapter 2 is a quick reminder of some of the most important features of Python. Chapter 3 then moves into the core machine learning concepts that are used in the rest of the book. Chapter 4 covers regression, from ordinary least squares to LASSO and ridge regression. Chapter 5 covers clustering (k-means, for example) and Chapter 6 covers classification algorithms such as logistic regression and Naïve Bayes.
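To give a flavour of the step from ordinary least squares to ridge regression, here is a minimal sketch for a single feature in plain Python. It is not code from the book: the function name `fit_line`, the penalty name `alpha` and the toy data are assumptions made for this illustration. With one feature and an unpenalised intercept, the ridge slope is the least-squares slope with the penalty added to the denominator.

```python
def fit_line(xs, ys, alpha=0.0):
    """Fit y = intercept + slope * x by (penalised) least squares.

    alpha is a hypothetical name for the ridge penalty on the slope;
    alpha = 0 gives ordinary least squares. The intercept is not penalised.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centred cross-product and sum of squares.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # The ridge penalty shrinks the slope towards zero.
    slope = sxy / (sxx + alpha)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

ols = fit_line(xs, ys)                 # recovers intercept 1 and slope 2
ridge = fit_line(xs, ys, alpha=10.0)   # slope shrinks from 2 to 1
print(ols, ridge)
```

LASSO is not shown here: it swaps the squared penalty for an absolute-value one, which can push a coefficient to exactly zero rather than just shrinking it.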

In Chapter 7 we introduce the use of hierarchical clustering, decision trees and talk about ensemble techniques such as bagging and boosting.

Dimensionality reduction techniques such as Principal Component Analysis are discussed in Chapter 8, and Chapter 9 covers the support vector machine algorithm and the all-important kernel trick in applications such as regression and classification.

The book contains 55 figures and 18 tables, plus plenty of bits and pieces of Python code to play with.
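For the curious, the idea behind the kernel trick can be sketched in a few lines of plain Python. This is an illustrative example rather than an excerpt from the book, and the helper names (`poly_kernel`, `feature_map`, `dot`) are made up here. It checks that a degree-2 polynomial kernel evaluated on 2-D points gives the same number as an ordinary dot product in an explicit 3-D feature space, without ever having to build that space.

```python
import math


def poly_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: (x . y) ** 2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Plain dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, 4.0)

k_implicit = poly_kernel(x, y)                      # computed in 2-D
k_explicit = dot(feature_map(x), feature_map(y))    # computed via 3-D

# Both routes agree up to floating-point rounding.
print(k_implicit, k_explicit, math.isclose(k_implicit, k_explicit))
```

A support vector machine only ever needs these pairwise similarities, which is why swapping the dot product for a kernel lets it draw non-linear boundaries at little extra cost.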

I guess I will have to sit and wait for the proofreading to be completed and then start the arduous process of going through the comments and suggestions. As ever I will keep you posted as to how things go.

Ah! By the way, I will start a mailing list to tell people when the book is ready, so if you are interested, please let me know!

Keep in touch!

PS. The table of contents is also now available at CRC Press here.

Click here to read more...

Artificial Intelligence, Revealed

A few weeks ago I was invited by General Assembly to give a short intro to Data Science to a group of interested (and interesting) students. They all had different backgrounds, but they all shared an interest for technology and related subjects.

While I was explaining some of the differences between supervised and unsupervised machine learning, I used my example of an alien life form trying to cluster (and eventually classify) cats and dogs. If you are interested to know more about this, you will probably have to wait for the publication of my "Data Science and Analytics with Python" book... I digress...

So, Ed Shipley - one of the admissions managers at GA London - asked me and the students if we had seen the videos that Facebook had produced to explain machine learning... He was reminded of them as they use an example about a machine distinguishing between dogs and cars... (see what they did there?...). If you haven't seen the videos, here you go:

Intro to AI

Machine Learning

Convolutional Neural Nets

Click here to read more...

First full draft of "Data Science and Analytics with Python"

It has been nearly 12 months in development, almost to the day, and I am very pleased to tell you that the first full draft of my new book entitled "Data Science and Analytics with Python" is ready.

The book is aimed at data enthusiasts and professionals with some knowledge of programming principles, as well as developers and business people interested in learning more about data science and analytics. The proposed table of contents is as follows:

  1. The Trials and Tribulations of a Data Scientist
  2. Firsts Slithers with Python
  3. The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
  4. The Relationship Conundrum: Regression
  5. Jackalopes and Hares, Unicorns and Horses: Clustering and Classification
  6. Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
  7. Dimensionality Reduction and Support Vector Machines

At the moment the book contains 53 figures and 18 tables, plus plenty of bits and pieces of code ready to be tried.

The next step is to start the re-reading, re-drafting and revisions in preparation for the final version and submission to my publisher CRC Press later in the year. I will keep you posted as to how things go.

Keep in touch!

 

Click here to read more...