The Image Classifier is done, throw it in the Autopilot mix
Everything above helped us set up the image classifier. But how do we add it to the Autopilot flow, given that there is no direct way to do so in the Twilio Console?
Let's understand the flow we wish to achieve:
- User sends a message
- We need to understand if this message is an image or text. This means that when the user sends the message, we first send it to our own webhook to be evaluated
- If it's text, then we send it to Autopilot directly (from where it will be directed to the relevant Task)
- If it's an image, then we want to grab the image URL and send it to our handleImage() function so it can be processed by our Classifier
The above means that we need to handle different payloads that will come from the user. Here is an example of the different payloads that we get with our final setup:
- For a text message, we get a payload like below:
- For an image message, we get a payload like below:
It's up to you to set the logic to differentiate between those. A clear distinction is the presence of "MediaContentType0" in the image payload, and we will use that as the differentiator in our case. Your main webhook endpoint for incoming user messages could, in its simplest form, look like this:
// Assumes `app` is an Express app with URL-encoded body parsing enabled,
// and `handler` is the module exposing handleText() and handleImage().
// Assumption: this sample serves a single channel, so it is hard-coded here.
const channel = "whatsapp";

app.post('/webhook', function(req, res) {
  /* When a POST is sent to this endpoint, check what the payload looks like.
     Depending on the type, we send the relevant info to the handler methods */
  const payload = req.body;
  // Return early so we never read from an empty payload or respond twice
  if (!payload) return res.sendStatus(404);

  let result;
  if (payload.MediaContentType0) {
    // If the payload is an image, return the relevant details
    if (payload.MediaContentType0.indexOf("image") !== -1) {
      result = { "type": "image", "value": payload.MediaUrl0, "mid": payload.MessageSid, "from": payload.From, "to": payload.To, "channel": channel };
    }
  } else {
    result = { "type": "text", "value": payload.Body, "mid": payload.MessageSid, "from": payload.From, "to": payload.To, "channel": channel };
  }

  if (!result) {
    // Media that is not an image (e.g. audio) is acknowledged but not processed
    console.log("\nIgnoring unsupported media type: " + payload.MediaContentType0);
  } else if (result.type === "text") {
    console.log("\nFound text message\n1. Value: " + result.value + "\n2. Mid: " + result.mid + "\n3. From: " + result.from + "\n4. To: " + result.to + "\n5. Channel: " + result.channel);
    handler.handleText(result.value, result.mid, result.from, result.to, result.channel);
  } else if (result.type === "image") {
    console.log("\nFound image\n1. Value: " + result.value + "\n2. Mid: " + result.mid + "\n3. From: " + result.from + "\n4. To: " + result.to + "\n5. Channel: " + result.channel);
    handler.handleImage(result.value, result.mid, result.from, result.to, result.channel);
  }
  res.status(200).send("OK");
});
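To exercise the routing decision without a running server, the classification step can be pulled out into a pure function. The sketch below is one possible way to do that, not part of the original app: `classifyPayload` is a hypothetical helper and the sample payload values are made up. Only the field names (`Body`, `MessageSid`, `From`, `To`, `MediaContentType0`, `MediaUrl0`) come from the webhook code above.

```javascript
// Hypothetical helper: decides whether an incoming webhook payload is text or an image.
function classifyPayload(payload) {
  if (!payload || typeof payload !== "object") {
    return null;
  }
  const mediaType = payload.MediaContentType0;
  if (mediaType) {
    // Only image media is meant for the classifier; anything else is flagged as unsupported.
    if (mediaType.indexOf("image") === -1) {
      return { type: "unsupported", value: mediaType };
    }
    return { type: "image", value: payload.MediaUrl0 };
  }
  return { type: "text", value: payload.Body };
}

// Made-up sample payloads, for illustration only.
const textPayload = {
  MessageSid: "SMxxxxxxxx",
  From: "whatsapp:+10000000001",
  To: "whatsapp:+10000000002",
  Body: "Hello",
};
const imagePayload = {
  MessageSid: "MMxxxxxxxx",
  From: "whatsapp:+10000000001",
  To: "whatsapp:+10000000002",
  Body: "",
  MediaContentType0: "image/jpeg",
  MediaUrl0: "https://example.com/media/placeholder.jpg",
};

console.log(classifyPayload(textPayload));
console.log(classifyPayload(imagePayload));
```

Keeping this decision separate from the Express handler makes it easy to add further cases (for example audio or documents) without touching the routing code.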
So now we have the initial webhook routing set up. Let's see how to handle the incoming payload.
Handle Incoming Text
In the function below we send the incoming text to the relevant Autopilot channel endpoint. We then do the following:
- Get the response, which will be in TwiML format
- Parse the TwiML and extract the text
- Send the text to the relevant channel
This handles only the case where the Autopilot response consists of text. For more complex cases (e.g. images or other data), you need to extend the function accordingly:
const request = require('request');
// We are using the "xml2js" npm module to parse the TwiML
const parseString = require('xml2js').parseString;

const ERROR_REPLY = "Hmm something went wrong, we are working hard to fix it. Please try again later...";

function handleText(text, mid, userId, pageId, channel) {
  // Send user text to the Autopilot channels endpoint
  request({
    url: "https://channels.autopilot.twilio.com/v1/" + process.env.ACCOUNT_SID + "/" + process.env.AUTOPILOT_SID + "/twilio-messaging/" + channel,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      "Accept": "text/xml"
    },
    form: { Body: text, MessageSid: mid, To: pageId, From: userId },
    auth: { user: process.env.ACCOUNT_SID, pass: process.env.AUTH_TOKEN }
  }, function(error, response, body) {
    const twiml = body ? body.toString() : "";
    // Bail out on a transport error, a non-200 status, or a body that lacks
    // the xml declaration that is part of a valid TwiML
    if (error || !response || response.statusCode !== 200 || !twiml.includes("?xml")) {
      console.error("Autopilot request failed: " + (error || twiml));
      return replyToChannel(channel, userId, pageId, ERROR_REPLY);
    }
    parseString(twiml, function(err, result) {
      let reply;
      try {
        if (err) throw err;
        reply = result.Response.Message[0].Body[0];
      } catch (parseError) {
        // The TwiML was malformed or did not contain a text message
        console.error("Could not read TwiML: " + parseError.message);
        return replyToChannel(channel, userId, pageId, ERROR_REPLY);
      }
      console.log("Final reply: " + reply);
      // Send the response back to the relevant channel
      replyToChannel(channel, userId, pageId, reply);
    });
  });
}
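For responses that contain several messages or media, the extraction step could be generalized along the following lines. This is a sketch under stated assumptions rather than the article's implementation: `extractReplies` and `textOf` are hypothetical helpers, and the input is assumed to be the object xml2js produces with its default settings, where every child element is an array and an element with attributes exposes its text under `_`.

```javascript
// Hypothetical helper: returns the text of a parsed node, whether it is a plain
// string or an object of the form { _: "text", $: { ...attributes } }.
function textOf(node) {
  return typeof node === "string" ? node : (node && node._) || "";
}

// Hypothetical helper: collects text and media URLs from every <Message> in a
// TwiML document that has already been parsed by xml2js (default options assumed).
function extractReplies(parsed) {
  const messages = (parsed && parsed.Response && parsed.Response.Message) || [];
  return messages.map((message) => {
    // <Message>plain text</Message> parses to a bare string
    if (typeof message === "string") {
      return { text: message.trim(), media: [] };
    }
    const bodies = message.Body || (message._ ? [message._] : []);
    return {
      text: bodies.map(textOf).join("\n").trim(),
      media: (message.Media || []).map(textOf),
    };
  });
}

// Made-up parsed response, for illustration only.
const sampleParsed = {
  Response: {
    Message: [
      { Body: ["Hi there"] },
      { Body: ["Here is a picture"], Media: ["https://example.com/cat.jpg"] },
    ],
  },
};

console.log(extractReplies(sampleParsed));
```

Each returned entry can then be passed to the channel reply function, sending the media URLs alongside the text where the channel supports it.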
Handle Incoming Images
Now it’s time to handle the incoming image sent by the end user and produce the result, i.e. work out which category it belongs to. We do this as follows:
- Get the image URL from the relevant channel (for example WhatsApp)
- Download the image locally (here in a compare.jpg file) using the image-downloader npm module
- Call the relevant Python script to evaluate the image, keeping only the top match (this can be changed to get the full list of training sets and what the match probability for each one was)
- From the top match get the category name (i.e. removing the probability so that we can post the category name to the user)
const download = require('image-downloader');
const shell = require('shelljs');

function handleImage(url, mid, from, to, channel) {
  const options = { url: url, dest: './compare.jpg' };
  // Download the image locally and store it in compare.jpg
  download.image(options)
    .then(({ filename }) => {
      // Call the Python script to evaluate which set the image belongs to.
      // In this example we get only the highest match. Remove the 'head' command to see the full output
      let output = shell.exec('python ImageClassifier/trainingSets/label_image.py --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image=' + filename + ' | head -n 1');
      console.log("out: " + output.stdout);
      // From the output we get the name of the highest match:
      // keep the label words and drop the decimal probability token
      let arr = output.stdout.trim().split(/\s+/);
      let final = arr.filter((token) => !/^\d*\.\d+(e[-+]?\d+)?$/i.test(token)).join(" ");
      let reply;
      if (final) {
        reply = "The highest match I found is " + final;
      } else {
        // The script produced no usable output (e.g. it failed to run)
        reply = "Sorry, I could not classify that image.";
      }
      replyToChannel(channel, from, to, reply);
    })
    .catch((err) => console.error(err));
}
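The parsing step can also be isolated and tested on its own. The sketch below assumes each output line of the Python script has the form "label words 0.98765", i.e. a label followed by a probability, which is what the code above relies on. `parseTopMatch`, `buildReply` and the 0.5 threshold are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: splits one "label words 0.98765" line into label and score.
function parseTopMatch(line) {
  const tokens = String(line).trim().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) {
    return null;
  }
  const score = Number(tokens[tokens.length - 1]);
  // If there is no trailing number, treat the whole line as the label.
  if (tokens.length < 2 || Number.isNaN(score)) {
    return { label: tokens.join(" "), score: null };
  }
  return { label: tokens.slice(0, -1).join(" "), score: score };
}

// Hypothetical helper: turns the top match into a user-facing reply.
// minScore is an arbitrary illustrative threshold, not a recommended value.
function buildReply(line, minScore = 0.5) {
  const match = parseTopMatch(line);
  if (!match) {
    return "I could not classify that image.";
  }
  if (match.score !== null && match.score < minScore) {
    return "I am not sure, but my best guess is " + match.label;
  }
  return "The highest match I found is " + match.label;
}

console.log(buildReply("golden retriever 0.91234"));
console.log(buildReply("daisy 0.31"));
```

Keeping the score around, instead of discarding it, lets the bot hedge its answer when the classifier is not confident.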
Reply to Channel
A dedicated reply function is not mandatory, but it becomes useful once you start adding more channels. In our case we respond to WhatsApp with a function that looks like this:
const client = require('twilio')(process.env.ACCOUNT_SID, process.env.AUTH_TOKEN);

/* This function sends the reply/payload to each respective channel */
function replyToChannel(channel, userId, pageId, reply) {
  if (channel === "whatsapp") {
    client.messages
      .create({ from: pageId, body: reply, to: userId })
      .then(message => console.log(message.sid))
      // Log delivery failures instead of leaving the rejection unhandled
      .catch(err => console.error("Failed to send reply: " + err.message));
  }
}
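If more channels are added later, one option is to replace the if-chain with a small registry that maps a channel name to a send function. The following is a minimal sketch of that idea, not the article's code: `createDispatcher` and its methods are hypothetical, and the sender registered here is a stub that only records messages so the example runs without Twilio credentials.

```javascript
// Hypothetical helper: builds a registry of per-channel send functions.
function createDispatcher() {
  const senders = new Map();
  return {
    // Associate a channel name (e.g. "whatsapp") with a function that delivers a message.
    register(channel, send) {
      senders.set(channel, send);
    },
    // Look up the sender for the channel and hand it the message.
    dispatch(channel, userId, pageId, reply) {
      const send = senders.get(channel);
      if (!send) {
        throw new Error("No sender registered for channel: " + channel);
      }
      return send({ from: pageId, to: userId, body: reply });
    },
  };
}

// Stub sender for illustration. In the real app this could instead be:
//   dispatcher.register("whatsapp", (msg) => client.messages.create(msg));
const outbox = [];
const dispatcher = createDispatcher();
dispatcher.register("whatsapp", (msg) => {
  outbox.push(msg);
  return msg;
});

dispatcher.dispatch("whatsapp", "whatsapp:+10000000001", "whatsapp:+10000000002", "Hello!");
console.log(outbox);
```

With this shape, supporting a new channel means registering one more sender rather than editing the reply function.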
Setting the Twilio Channel endpoint
We have now set up our Image Classifier and built our handler functions to manage the different payloads.
The very last step is to set our webhook endpoint as the "entry" point in the Twilio Console for the channel on which we want to expose this extra functionality.
We promised we would try this on WhatsApp, so let's go ahead and do it (either on the Sandbox, if you want to do testing, or on your main WhatsApp Sender number, if you have one provisioned). In this example we show the Sandbox page:
Testing the Image Classifier across Channels
So now we can send images to our conversation and see what results we get. Note that how images are handled in your channels depends on what you or your business want to do. In this sample app, we decided that if the user uploads an image, we pass it through our Image Classifier.
So below is the end result in WhatsApp.
Conclusion
Twilio Autopilot journeys can be enhanced in many ways based on your business case. In this example, we built our own image classifier and added it to the flow. This is just a starting point, but think about all the amazing things you can achieve!
Evangelos Resvanis is a Solutions Engineer at Twilio. He is currently working on his favorite topics, Autopilot and Twilio Functions, exploring and expanding the possibilities offered by these tools. He can be reached at eresvanis [at] twilio.com.