Friday, 6 December, 2019 UTC


Summary

In this article, I will show you one of the many possible ways to add an Image Classifier to your Twilio Autopilot flow. We will build it with Node.js and TensorFlow and show the final result on WhatsApp.
Let's get started.
Prerequisites
Our Image Classifier is based on Transfer Learning with TensorFlow. Transfer Learning is a great way to build on top of the thousands of pre-trained models that are already available, saving you valuable development time and resources. To learn more about it and apply your own logic, check out this TensorFlow image retraining tutorial.
Install TensorFlow by following these relevant steps.
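Besides TensorFlow itself, the scripts below rely on the google_images_download Python package (which provides the googleimagesdownload command) and the jpeginfo utility. A minimal setup sketch, assuming a Python environment with pip on a Debian-based system (adjust the package manager to your platform):

# Install TensorFlow and the image download helper
pip install tensorflow google_images_download

# jpeginfo is used by setupModel.sh to detect corrupt images
sudo apt-get install jpeginfo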
The Main Components
The building blocks for this demo code are the following:
  • A Node.js Express-based app to handle all incoming requests (for example, getting the name of the new set of images you want to train the bot with; e.g. "polar bear")
  • Script for downloading the relevant images from the web, clearing out the invalid ones, and adding them to the overall training sets
  • Script for building the new model (which needs to run for any new training set added so it can be recognized and taken into account in the results)
  • A function to handle the incoming image that the user sends, comparing it against our created training sets
The Folder Structure
Although this is entirely up to your implementation, here is a sample folder structure, sketched after the note below (you will see that when we call the scripts we include the relative paths).
Note: The “webpage” folder simply contains a sample webpage you might want to implement to visualize the process and call the endpoints. The implementation is left to you, although you can simply call the endpoints from any API client, such as Postman.
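Based on the relative paths used throughout this article, a layout like the following works; the app entry file name (index.js here) is just an example:

.
|-- index.js            (the Express app; name it whatever you like)
|-- webpage/            (optional sample webpage that calls the endpoints)
`-- ImageClassifier/
    |-- setupModel.sh
    |-- buildModel.sh
    |-- code/
    |   `-- retrain.py
    `-- trainingSets/
        |-- label_image.py
        `-- polar bear/  (one folder per category, created by setupModel.sh)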
The Express app main endpoints
Below we register two main endpoints to our Express app:
  • /imageclassifier: this is the endpoint to which we send the words of the new category of images we want to include, e.g. "polar bear". We take that category name and call a script named “setupModel.sh” that takes care of downloading the images we need from the web and cleaning out the invalid ones.
  • /buildModel: this endpoint calls our second script called “buildModel.sh” which trains the overall set of categories or training sets we have. Every time we add a new set of images we need to call this endpoint to properly train everything.
const express = require('express');
const shell = require('shelljs');
const app = express();
//Parse URL-encoded bodies so req.body.text is available
app.use(express.urlencoded({ extended: false }));

/*This webhook takes the text that describes what model the business
  wants to build e.g. "national IDs" and calls the script that sets up the models*/
app.post('/imageclassifier', function(req, res) {
    console.log("In the app: " + req.body.text);
    let imageCategory = req.body.text;
    console.log(shell.pwd());
    shell.exec('./ImageClassifier/setupModel.sh "' + imageCategory + '"');
    res.status(200).send("OK");
});

/*Calls the script that builds the models for the folders included in the trainingSets*/
app.post('/buildModel', function(req, res) {
    shell.exec('./ImageClassifier/buildModel.sh');
    res.status(200).send("OK");
});
In this specific implementation we use the npm module shelljs to call the scripts.
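If you skip the sample webpage, you can exercise these endpoints from the command line instead. A quick sketch, assuming the app listens on port 3000 (e.g. via app.listen(3000)):

# Create and download a new image category
curl -X POST -d "text=polar bear" http://localhost:3000/imageclassifier

# Retrain the model over all categories (run this after adding new sets)
curl -X POST http://localhost:3000/buildModel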
The Shell Scripts

setupModel.sh

This script does the following:
  1. Downloads images of the chosen category into the trainingSets folder using the google_images_download Python module. For our example, we choose only “jpg” file types and download only twenty images, but this can be changed to match your needs. The more images you download, the more accurate the results will be.
  2. Enters the newly created folder and performs a cleanup of invalid files.
  3. Truncates file names, because some filenames can be really long and make the training fail in the TensorFlow script.
#!/bin/bash
# Download up to 20 jpg images of the given category into trainingSets
googleimagesdownload -o ./ImageClassifier/trainingSets -f jpg --keywords "$1" --limit 20
cd ./ImageClassifier/trainingSets/"$1"
# Remove invalid/corrupt images flagged by jpeginfo
find . -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | xargs jpeginfo -c | grep -E "WARNING|ERROR" | cut -d " " -f 1 | xargs rm
# Truncate long filenames that can break the TensorFlow retraining script
for FILE in *.jpg ; do mv "${FILE}" "${FILE:0:20}.jpg" ; done
echo "DONE WITH CREATING IMAGE CATEGORY"
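You can also run the script directly. Assuming you invoke it from the project root (the relative paths inside the script expect that), creating a new category looks like this:

# Creates ./ImageClassifier/trainingSets/polar bear/ with up to 20 cleaned images
./ImageClassifier/setupModel.sh "polar bear"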

buildModel.sh

This script builds the whole Image Classifier after you have added all the training sets you need. Every time you add new ones, you need to run this script again. It defaults to 4000 training steps but this can be changed.
IMPORTANT: You need a MINIMUM of two training sets created from the previous step, otherwise the training will fail.
As mentioned above, you can find all the details regarding the script and the overall retrain process on the TensorFlow image retraining tutorial.
#!/bin/bash
echo "STARTED BUILDING. TENSORFLOW VERBOSITY CAN BE CHANGED FROM retrain.py LINE 994"
python -W ignore ./ImageClassifier/code/retrain.py --image_dir=./ImageClassifier/trainingSets
echo "DONE WITH BUILDING"
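By default, retrain.py writes the trained graph and labels to /tmp/output_graph.pb and /tmp/output_labels.txt, which is where the classification step further down picks them up. Once the build finishes, you can sanity-check the model against any local test image from the command line, using the same invocation our image handler uses later:

# Classify a local test image against the freshly trained model
python ImageClassifier/trainingSets/label_image.py \
    --graph=/tmp/output_graph.pb \
    --labels=/tmp/output_labels.txt \
    --input_layer=Placeholder \
    --output_layer=final_result \
    --image=./test.jpg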
The Image Classifier is done, throw it in the Autopilot mix
Ok, so all of the above helped us set up the Image Classifier. But how can we add it to the Autopilot flow, given that there is no direct way to do so in the Twilio Console?
Let's understand the flow we wish to achieve:
  • User sends a message
  • We need to understand if this message is an image or text. This means that when the user sends the message, we first send it to our own webhook to be evaluated
  • If it's text, then we send it to Autopilot directly (from where it will be directed to the relevant Task)
  • If it's an image, then we want to grab the image URL and send it to our handleImage() function (shown below) so it can be processed by our Classifier
The above means that we need to handle different payloads that will come from the user. Here is an example of the different payloads that we get with our final setup:
- For a text message, we get a payload like below:
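An abbreviated sketch of the parsed request body (real payloads carry more fields, and the SIDs and numbers below are placeholders):

{
  "MessageSid": "SMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Body": "Hello there",
  "From": "whatsapp:+15551234567",
  "To": "whatsapp:+15557654321",
  "NumMedia": "0"
}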
- For an image message, we get a payload like below:
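Again abbreviated; note the extra MediaContentType0 and MediaUrl0 fields that only appear when media is attached:

{
  "MessageSid": "MMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Body": "",
  "From": "whatsapp:+15551234567",
  "To": "whatsapp:+15557654321",
  "NumMedia": "1",
  "MediaContentType0": "image/jpeg",
  "MediaUrl0": "https://api.twilio.com/2010-04-01/Accounts/ACXXXX.../Messages/MMXXXX.../Media/MEXXXX..."
}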
It's up to you to set the logic that differentiates between those. A clear distinction is the presence of "MediaContentType0" in the image payload, and we will use that as the differentiator in our case. In its simplest form, your main webhook endpoint for incoming user messages could look like this:
app.post('/webhook', function(req, res) {
    let payload;
    let result;
    /*When a POST is sent to this endpoint, check what the payload looks like.
      Depending on the type, we send the relevant info to the handler methods*/
    payload = req.body;
    if (!payload) return res.sendStatus(404);
    //The payload does not name the channel; here we infer it from the address prefix
    let channel = payload.From && payload.From.indexOf("whatsapp:") === 0 ? "whatsapp" : "sms";
    //If the payload is an image, keep the relevant details
    if (payload.MediaContentType0) {
        if (payload.MediaContentType0.indexOf("image") !== -1) {
            result = {
                "type": "image",
                "value": payload.MediaUrl0,
                "mid": payload.MessageSid,
                "from": payload.From,
                "to": payload.To,
                "channel": channel
            };
        }
    } else {
        result = {
            "type": "text",
            "value": payload.Body,
            "mid": payload.MessageSid,
            "from": payload.From,
            "to": payload.To,
            "channel": channel
        };
    }
    if (result && result.type === "text") {
        console.log("\nFound text message\n1. Value: " + result.value + "\n2. Mid: " + result.mid + "\n3. From: " + result.from + "\n4. To: " + result.to + "\n5. Channel: " + result.channel);
        handler.handleText(result.value, result.mid, result.from, result.to, result.channel);
    } else if (result && result.type === "image") {
        console.log("\nFound image\n1. Value: " + result.value + "\n2. Mid: " + result.mid + "\n3. From: " + result.from + "\n4. To: " + result.to + "\n5. Channel: " + result.channel);
        handler.handleImage(result.value, result.mid, result.from, result.to, result.channel);
    }
    res.status(200).send("OK");
});
So now we have the initial webhook routing set up. Let's see how to handle the incoming payload.

Handle Incoming Text

In the function below we send the incoming text to the relevant Autopilot channel endpoint. We then do the following:
  • Get the response, which will be in TwiML format
  • Parse the TwiML and extract the text
  • Send the text to the relevant channel
Note that this handles only the case where the Autopilot response consists of text. For more complex cases (e.g. images or other data), you need to extend the function accordingly:
//The "request" npm module performs the HTTP call to Autopilot
const request = require('request');

function handleText(text, mid, userId, pageId, channel) {
    //Send user text to the Autopilot channels endpoint
    request({
        url: "https://channels.autopilot.twilio.com/v1/" + process.env.ACCOUNT_SID + "/" + process.env.AUTOPILOT_SID + "/twilio-messaging/" + channel,
        method: "POST",
        headers: {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "text/xml"
        },
        form: {
            Body: text,
            MessageSid: mid,
            To: pageId,
            From: userId
        },
        auth: {
            user: process.env.ACCOUNT_SID,
            pass: process.env.AUTH_TOKEN
        }
    }, function(error, response, body) {
        //Checking if the response contains the xml string that is part of a valid TwiML
        if (!error && body.toString().includes("?xml")) {
            //We are using the "xml2js" npm module to parse the TwiML
            var parseString = require('xml2js').parseString;
            var twiml = body.toString();
            parseString(twiml, function(err, result) {
                let reply = result.Response.Message[0].Body[0];
                console.log("Final reply: " + reply);
                //Send the response back to the relevant channel
                replyToChannel(channel, userId, pageId, reply);
            });
        } else if (error || response.statusCode !== 200) {
            replyToChannel(channel, userId, pageId, "Hmm something went wrong, we are working hard to fix it. Please try again later...");
        }
    });
}
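For reference, a plain text reply from Autopilot comes back as TwiML with the following shape; the result.Response.Message[0].Body[0] lookup above simply walks this structure (the body text is just an example):

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Message>
    <Body>Hello! How can I help you today?</Body>
  </Message>
</Response>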

Handle Incoming Images

Now it’s time to handle the incoming image sent by the end user and produce the result, i.e. determine which category it belongs to. We do this in the following steps:
  1. Get the image URL from the relevant channel (for example WhatsApp)
  2. Download the image locally - here in a compare.jpg file - using the image-downloader npm module
  3. Call the relevant Python script to evaluate the image, keeping only the top match (this can be changed to get the full list of training sets and what the match probability for each one was)
  4. From the top match, extract the category name (i.e. strip the probability) so that we can send the category name to the user
//The "image-downloader" npm module fetches the image; shelljs runs the classifier
const download = require('image-downloader');
const shell = require('shelljs');

function handleImage(url, mid, from, to, channel) {
    const options = {
        url: url,
        dest: './compare.jpg'
    };
    //Download the image locally and store it in compare.jpg
    download.image(options)
        .then(({ filename, image }) => {
            //Call the python script to evaluate which set the image belongs to.
            //We keep only the highest match; remove the 'head' command to see the
            //full output (one line per category, e.g. "polar bear 0.97")
            let output = shell.exec('python ImageClassifier/trainingSets/label_image.py --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image=' + filename + ' | head -n 1').stdout;
            console.log("out: " + output);
            //From the output keep only the category name by dropping the "0.xx" probability token
            let arr = output.split(' ');
            let final = "";
            for (let i = 0; i < arr.length; i++) {
                if (arr[i].indexOf("0.") === -1) {
                    final += arr[i] + " ";
                }
            }
            let reply = "The highest match I found is " + final.trim();
            replyToChannel(channel, from, to, reply);
        })
        .catch((err) => console.error(err));
}

Reply to Channel

Although this is not mandatory, it is useful to have a dedicated reply function once you start adding more channels. In our case we respond to WhatsApp with a function that looks like this:
const client = require('twilio')(process.env.ACCOUNT_SID, process.env.AUTH_TOKEN);

/*This function sends the reply/payload to each respective channel*/
function replyToChannel(channel, userId, pageId, reply) {
    if (channel === "whatsapp") {
        client.messages
            .create({
                from: pageId,
                body: reply,
                to: userId
            })
            .then(message => console.log(message.sid));
    }
}
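For example, replying on WhatsApp would look like the call below; the numbers are placeholders, and note the whatsapp: prefix that the Twilio API expects on both addresses:

replyToChannel(
    "whatsapp",
    "whatsapp:+15551234567",  // the end user (userId, i.e. "to")
    "whatsapp:+15557654321",  // your WhatsApp sender (pageId, i.e. "from")
    "The highest match I found is polar bear"
);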
Setting the Twilio Channel endpoint
We have now set up our Image Classifier and built our handler functions to manage the different payloads.
The very last step is to add our webhook endpoint as the "entry" point on Twilio Console for the channel we are interested in exposing this extra functionality.
We promised we would try this on WhatsApp! So let's go ahead and do it (either on the Sandbox if you want to do testing, or on your main WhatsApp Sender number if you have one provisioned). In this example we use the Sandbox: in the WhatsApp Sandbox settings in the Twilio Console, paste the URL of the /webhook endpoint above into the "WHEN A MESSAGE COMES IN" field.
Testing the Image Classifier across Channels
So now we can send images to our conversation and see what results we get. How images are handled in your channels depends on what you or your business want to do; in this sample app, we decided that any image the user uploads is passed through our Image Classifier.
Send the bot a photo of one of your trained categories and it replies with the highest match it found. That is the end result in WhatsApp.
Conclusion
Twilio Autopilot journeys can be enhanced in many ways based on your business case. In this example, we built our own image classifier and added it in the flow. This is just a starting point, but think about all the amazing things you can achieve!
Evangelos Resvanis is a Solutions Engineer at Twilio. He is currently working on his favorite topics, Autopilot and Twilio Functions, exploring and expanding the possibilities offered by these tools. He can be reached at eresvanis [at] twilio.com.