Transcribe a Voice Message in Node.js with Twilio Functions

April 15, 2021

In this tutorial, you’ll use Twilio Programmable Voice to receive phone calls at your Twilio phone number and transcribe any voice messages left by the caller. You can use this guide as a foundation for building your own voicemail system.

Prerequisites

To get started with this tutorial, you’ll need the following:

Create a new Twilio Functions Service

Through Twilio, you can configure a webhook that will handle any incoming calls to your Twilio phone number.

You’ll need a place to host the code for the webhook. This could be your own server or a locally running application, but for this tutorial you’ll use Twilio Functions. Twilio Functions is a serverless environment that lets you deploy backend services without the overhead of setting up a server.

To set up a new Functions service, visit the Functions section of the Twilio Console. Once there, click the button that says Create Service. A Functions Service is a unique container for related functions, assets, and environments.

Clicking the Create Service button will prompt you to enter a friendly name. For this field, enter “transcription-service”. After entering the friendly name, click the Next button. This will take you to the editor for your new Service.

Towards the top of the page click the blue Add + button and select Add Function. Give the function the name /transcribe-call.

You’ll be able to edit your function in the text editor to the right.

Delete any code automatically populated inside your new function and replace it with the following:

exports.handler = function(context, event, callback) {
  let twiml = new Twilio.twiml.VoiceResponse();
  
  twiml.record({
    transcribeCallback: '/transcription'
  });
  
  return callback(null, twiml);
};

Before I dive into the TwiML <Record> verb, it’s important to mention that recording phone calls or voice messages carries a variety of legal considerations. You must ensure that you’re adhering to local, state, and federal laws when recording anything.

The code above first creates a new variable called twiml that holds a reference to a new TwiML Voice Response object.

TwiML, which stands for Twilio Markup Language, is XML that has special tags defined by Twilio. You can use TwiML to tell Twilio how to handle an incoming phone call or SMS. Instead of writing XML, you can also write TwiML programmatically, which is what you’re doing in this function.

After creating the twiml variable, this code says, “Okay, Twilio, record whatever is said!” <Record> is one of many TwiML verbs. TwiML verbs tell Twilio what actions to take, and these actions can be customized by providing the verb with parameters called attributes.
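To make the relationship between verbs, attributes, and the underlying XML concrete, here is a minimal sketch that renders a single verb by hand. The renderVerb and escapeXml helpers are hypothetical and exist only for illustration; in a real Function the Twilio helper library generates this markup for you, and its exact output (for example, an XML declaration or whitespace) may differ.

```javascript
// Hypothetical helpers: they only illustrate the shape of TwiML markup
// and are not part of the Twilio helper library.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Renders one verb (such as Record) with its attributes inside <Response>.
function renderVerb(verb, attributes = {}) {
  const attrs = Object.entries(attributes)
    .map(([name, value]) => ` ${name}="${escapeXml(value)}"`)
    .join('');
  return `<Response><${verb}${attrs}/></Response>`;
}

const xml = renderVerb('Record', { transcribeCallback: '/transcription' });
console.log(xml);
// prints <Response><Record transcribeCallback="/transcription"/></Response>
```

The verb becomes the tag name and each attribute becomes an XML attribute on that tag, which is the structure Twilio reads when it decides how to handle the call.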

The <Record> verb will create an audio recording of anything the caller says after the call connects, and it can be modified with a number of different attributes. The attributes most relevant for this tutorial are transcribe and transcribeCallback.

transcribe is an optional attribute that, when included and set to true, will tell Twilio to create a speech-to-text transcription of any message left by the caller, with the caveat that the message has to be between 2 and 120 seconds in length. This means that some very short messages and very long messages will not be transcribed, though the actual audio recordings of the message will not be impacted.
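The length window can be expressed as a small check. This is a sketch under the assumption that both limits are exclusive; the text above only says “between 2 and 120 seconds”, so treat the exact boundary behavior as unverified. The function name and constants are hypothetical.

```javascript
// Limits taken from the paragraph above. Whether the boundary values
// themselves qualify is an assumption of this sketch.
const MIN_TRANSCRIBABLE_SECONDS = 2;
const MAX_TRANSCRIBABLE_SECONDS = 120;

// Returns true when a message of the given length falls inside the window.
function isWithinTranscriptionWindow(durationSeconds) {
  const seconds = Number(durationSeconds);
  return seconds > MIN_TRANSCRIBABLE_SECONDS && seconds < MAX_TRANSCRIBABLE_SECONDS;
}

console.log(isWithinTranscriptionWindow(1));   // false: too short
console.log(isWithinTranscriptionWindow(30));  // true
console.log(isWithinTranscriptionWindow(300)); // false: too long
```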

The content of the transcription will be stored by Twilio for you, and can be accessed via the transcription API.
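As a rough illustration of that lookup, the sketch below fetches a stored transcription with the Twilio Node.js helper library. It assumes you already have an initialized REST client (inside a Function, context.getTwilioClient() can provide one) and the SID of the transcription you want. The fetchTranscriptionText name is hypothetical.

```javascript
// Hypothetical helper: looks up a stored transcription by its SID.
// `client` is assumed to be an initialized Twilio REST client from the
// twilio Node.js helper library.
async function fetchTranscriptionText(client, transcriptionSid) {
  const transcription = await client.transcriptions(transcriptionSid).fetch();
  return transcription.transcriptionText;
}
```

Passing the client in as an argument, rather than creating it inside the helper, keeps credentials out of the function and makes it easy to exercise with a stand-in client.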

Alternatively, you can provide a transcription callback to the <Record> verb that will execute when the transcription is finished. In this callback, you can access the contents of the transcription and perform an action on it, like save it to a database or print it to a webpage.

If you use the transcribeCallback attribute, you don’t also need to include the transcribe: true attribute. That is why the code above, on lines 4-6, sets only transcribeCallback.

This brings you to your next step: creating the transcription callback function.

Add the transcription callback function

Hit the Save button in the Functions editor and then add a new function by once again clicking the Add + button towards the top of the editor.

Give this function the name /transcription.

Delete any code that’s prepopulated in your function and replace it with the following:

exports.handler = function(context, event, callback) {
  // Fall back to a placeholder when Twilio reports that the transcription failed
  const transcription = event.TranscriptionStatus === 'failed' ? 'No transcription available' : event.TranscriptionText;
  console.log(transcription);

  // do something with transcription text here
  
  return callback(null);
};

All the data about the transcription is available on the event object.

In the code above, the function checks to see if the TranscriptionStatus is failed. If so, it assigns the string No transcription available to a variable called transcription.

If the transcription was successful, this code assigns the actual content of the transcription, event.TranscriptionText, to the transcription variable. It then logs the value of transcription.
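If you’d like to check this logic without placing a call, one option is to invoke a handler of the same shape locally with made-up events. The sketch below is an assumption-laden stand-in, not the deployed Function: the readTranscription helper, the sample events, and the choice to pass the text to the callback are all introduced here so the result can be observed.

```javascript
// Same decision as the /transcription function, returned instead of only
// logged so it can be checked locally. The helper name is hypothetical.
function readTranscription(event) {
  return event.TranscriptionStatus === 'failed'
    ? 'No transcription available'
    : event.TranscriptionText;
}

// A handler shaped like a Twilio Function. Passing the text to the callback
// is a choice made for this sketch so a local caller can inspect it.
const handler = function (context, event, callback) {
  const transcription = readTranscription(event);
  console.log(transcription);
  return callback(null, transcription);
};

// Made-up events standing in for the requests Twilio would send.
const results = [];
const collect = (err, value) => results.push(value);

handler({}, { TranscriptionStatus: 'completed', TranscriptionText: 'Hi, call me back.' }, collect);
handler({}, { TranscriptionStatus: 'failed' }, collect);
```

After both calls, results holds the transcription text for the first event and the placeholder string for the second.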

Click the Save button, and then hit Deploy All towards the bottom of the editor.

Screenshot of Functions editor showing the two functions and the code inside

Configure the webhook for your Twilio phone number

Leave the Functions editor open and, in a new tab, visit the Twilio phone numbers section of the Console.

Find the phone number you’re using for this tutorial in the list and click on it to open the configuration page for that number.

Scroll down until you see a section titled Voice & Fax.

Make the following adjustments to the information shown in this section:

  • For Accept Incoming, select Voice Calls
  • For Configure With, select Webhooks, TwiML Bins, Functions, Studio, or Proxy
  • For A Call Comes In, select Function
  • For Service, select transcription-service
  • For Environment, select ui
  • For Function Path, select /transcribe-call

Screenshot showing webhook configuration for twilio phone number

After making these changes, click the Save button and then head back to the tab with your Functions editor.

Test your app

Back in your Functions editor, ensure that the Enable Live Logs option is on in the console portion of the editor.

Screenshot showing live logs toggle turned on

Call your Twilio phone number from your personal phone. You’ll hear a beep, after which you can speak into the phone and say a few words. Speak for more than two seconds so that the message is long enough to be transcribed. After leaving your message, hang up the call.

While you do this, keep an eye on the console portion of the Functions editor. It may take a few seconds, but shortly you’ll see the transcription text logged to the console.

Congratulations! Now that you’ve learned how to transcribe recorded voice messages, what will you do next? Let me know on Twitter!

Ashley is a JavaScript Editor for the Twilio blog. To work with her and bring your technical stories to Twilio, find her at @ahl389 on Twitter. If you can’t find her there, she’s probably on a patio somewhere having a cup of coffee (or glass of wine, depending on the time).