I Ran a Machine-Learning-Based Chatbot on a Raspberry Pi, and Here’s What Happened

Running a machine learning program on a Raspberry Pi is probably nothing new for you. For me, though, it was a new experience, and it gave me a new understanding of it. I originally ran this experiment about two months ago (early August 2017) and posted it on my Facebook, but I think the story is worth sharing here as well.

Intro

So, I just bought my first Raspberry Pi from a local online marketplace for about $50 (one Raspberry Pi 3 Model B and one official case). It didn’t take me long to start this experiment. From the beginning, I was interested in seeing how far this cute machine could go with machine learning programs running on it, and I’m more likely to use this Raspberry Pi as a low-power headless server than for hardware-hacking projects.

Here’s the hardware specification for Raspberry Pi 3 Model B:

Quad Core 1.2GHz Broadcom BCM2837 64bit CPU
1GB RAM
BCM43438 wireless LAN and Bluetooth Low Energy (BLE) on board
40-pin extended GPIO
4 USB 2 ports
4 Pole stereo output and composite video port
Full size HDMI
CSI camera port for connecting a Raspberry Pi camera
DSI display port for connecting a Raspberry Pi touchscreen display
Micro SD port for loading your operating system and storing data
Upgraded switched Micro USB power source up to 2.5A

I installed a copy of the Raspbian Jessie operating system on a 32GB Micro SD card.

Preparation

For my first project on the Raspberry Pi, I wrote a machine-learning-based text classification program. The code is written in Python and uses the well-known NLTK module. Since I wanted to access the program from a Telegram bot (or from any other external service), I also built an API, and Flask was the module I chose for it. To create the Telegram bot handler, I relied on Python Telegram Bot.
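To make the API part concrete, here is a minimal sketch of what such a Flask endpoint could look like. It is not the original code: the `/classify` route, the JSON field names, and the `classify_text` stub are all hypothetical placeholders for the real NLTK-backed classifier.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_text(text):
    """Hypothetical stand-in for the real classifier; always answers 'tech'."""
    return "tech"


@app.route("/classify", methods=["POST"])
def classify():
    # Expect a JSON body such as {"text": "some news article ..."}.
    payload = request.get_json(silent=True)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "missing 'text' field"}), 400
    return jsonify({"category": classify_text(text)})


# To serve it on the local network, one option is:
#   app.run(host="0.0.0.0", port=5000)
```

With an endpoint like this, the Telegram bot handler (or any other client) only needs to send the article text over HTTP and read back the category.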

The Abstraction

The training data was taken from the BBC dataset collected by D. Greene & P. Cunningham of University College Dublin. This dataset contains 2225 documents (news articles) from 2004–2005, divided into five categories: business, entertainment, politics, sport, and tech. In the end, I used only 1930 of the 2225 documents (386 documents per category) because the categories were imbalanced. For your information, imbalanced training data can skew the results: the classifier tends to assign test documents to the categories that have more training data.
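The balancing step works out to 386 × 5 = 1930 documents. How the 386 documents per category were picked is not stated, so the sketch below assumes simple random downsampling to the size of the smallest category. The `balance_by_category` helper and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_category(documents, seed=0):
    """Downsample every category to the size of the smallest one.

    `documents` is a non-empty list of (text, category) pairs.
    """
    by_category = defaultdict(list)
    for text, category in documents:
        by_category[category].append((text, category))

    smallest = min(len(items) for items in by_category.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    balanced = []
    for category in sorted(by_category):
        balanced.extend(rng.sample(by_category[category], smallest))
    return balanced


# Toy data standing in for the real articles (hypothetical).
docs = [("a", "sport"), ("b", "sport"), ("c", "sport"), ("d", "tech"), ("e", "tech")]
balanced = balance_by_category(docs)
counts = Counter(category for _, category in balanced)
print(counts)
```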

Training The Machine Learning

I used a mid-2013 Macbook Air to write the code (and to compare the performance of the two machines; I know, it’s not fair). After the data was pre-processed, the classifier was trained on each machine using the Naive Bayes classifier algorithm, and of course, there was a huge performance gap between my Macbook Air and the Raspberry Pi.
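For readers who have not used it, training a Naive Bayes classifier with NLTK can look roughly like this. It is a toy sketch, not the original program: the `bag_of_words` feature extractor and the four made-up sentences are assumptions, and the real pre-processing steps are not described in this post.

```python
from nltk.classify import NaiveBayesClassifier


def bag_of_words(text):
    """Turn a text into the {feature_name: value} dict NLTK classifiers expect."""
    return {word: True for word in text.lower().split()}


# Tiny made-up training set; the real one would be the pre-processed articles.
training_docs = [
    ("the team won the football match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("the new phone has a faster chip", "tech"),
    ("the software update fixes a security bug", "tech"),
]
train_set = [(bag_of_words(text), label) for text, label in training_docs]

# Train on (features, label) pairs, then classify an unseen sentence.
classifier = NaiveBayesClassifier.train(train_set)
label = classifier.classify(bag_of_words("a late goal decided the match"))
print(label)
```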

On my Macbook Air, it took only 15 minutes to finish the whole process (pre-processing and training). On my Raspberry Pi, I chose to stop the process at the 40th minute because there was no sign that it would finish anytime soon (it was still on the first of several pre-processing steps).

I ended up using another approach to make this happen. I did the training on the Macbook Air and then put the generated intelligence (the trained classifier) onto the Raspberry Pi, thanks to Pickle! This approach seemed to work.

Pickle is used for serializing and de-serializing a Python object structure. Any object in python can be pickled so that it can be saved on disk. What pickle does is that it “serialises” the object first before writing it to file. — PythonTips.com
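In practice, the train-on-one-machine, load-on-another approach comes down to a `pickle.dump` on the training side and a `pickle.load` on the Raspberry Pi. The sketch below uses a plain dictionary as a stand-in for the trained classifier, and the `classifier.pickle` file name is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for the trained classifier; a picklable Python object.
model = {"labels": ["business", "entertainment", "politics", "sport", "tech"]}

path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")  # hypothetical name

# On the training machine: serialize the object to disk.
with open(path, "wb") as f:
    pickle.dump(model, f)

# On the Raspberry Pi, after copying the file over: load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

Two things to keep in mind: the machine that loads the file must be able to import the same modules the pickled object comes from (NLTK, in this case), and you should only unpickle files you trust, because loading a pickle can execute arbitrary code.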

The Result

I copied the generated pickle file from my Macbook Air to the Raspberry Pi using a simple scp command, and the Raspberry Pi had no problem reading it. Everything ran well. I tried classifying several articles on both the Macbook Air and the Raspberry Pi, and here’s the result:

On average, the Macbook Air took 0–1 seconds to classify an article, and the Raspberry Pi took 2–5 seconds.

Text classification result on Macbook Air
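If you want to reproduce this kind of comparison on your own machines, one simple way is to time the classification call and average over a few runs. How the numbers above were measured is not stated, so this is only an assumed approach; `time_classification` and `dummy_classify` are hypothetical names.

```python
import time


def time_classification(classify, text, repeats=5):
    """Return the average wall-clock seconds `classify(text)` takes over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        classify(text)
    return (time.perf_counter() - start) / repeats


def dummy_classify(text):
    """Hypothetical stand-in for the real classifier."""
    return "tech" if "phone" in text else "sport"


average = time_classification(dummy_classify, "a new phone was announced")
print(f"average: {average:.6f} s per article")
```

Running the same script on both machines with the same articles gives numbers that can be compared directly.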

I think the Raspberry Pi showed a pretty good result (or not?), but I believe we can still improve the performance.

The Telegram bot is the part I love the most. We can access the system (and you can add any other features) from nearly anywhere at any time, as long as both the user and the Raspberry Pi are connected to a working internet connection.

The Telegram bot in action, running on Raspberry Pi

Have you run a machine learning project on a Raspberry Pi? Let the world hear your stories!

This story was imported from my Medium; the featured image in this post was edited using my Spotify Photo Filter, which you can use for free from my Mini Product.