Introduction
Validating phone numbers can be a very challenging task. The format of a phone number can vary from one country to another. Heck, it can also vary within the same country! Some countries share the same country code, while some other countries use more than one country code. According to an example from Google's libphonenumber GitHub repository, the USA, Canada, and the Caribbean islands all share the same country code (+1). On the other hand, phone numbers in Kosovo can be called using the Serbian, Slovenian, and Moroccan country codes.
These are only a few of the challenges in identifying or validating phone numbers. At first glance, one could at least validate the country code of a phone number with a RegEx. However, this means that you would have to write a custom RegEx rule for every country in the world, just to validate a country code. On top of that, some mobile phone carriers have their own rules (for example, certain digits of a number may only take values from a specific range). You can see how quickly things get out of hand, making it almost impossible to validate phone number inputs by ourselves.
Luckily, there is a Python library that can help us get through the validation process easily and efficiently. The Python Phonenumbers library is derived from Google’s libphonenumber library, which is also available for other programming languages like C++, Java, and JavaScript.
In this tutorial, we’ll learn how to parse, validate and extract phone numbers, as well as how to extract additional information from the phone number(s) like the carrier, timezone, or geocoder details.
Using the library is very straightforward, and it's typically used like this:
import phonenumbers
from phonenumbers import carrier, timezone, geocoder
my_number = phonenumbers.parse("+447986123456", "GB")
print(phonenumbers.is_valid_number(my_number))
print(carrier.name_for_number(my_number, "en"))
print(timezone.time_zones_for_number(my_number))
print(geocoder.description_for_number(my_number, 'en'))
And here's the output:
True
EE
('Europe/Guernsey', 'Europe/Isle_of_Man', 'Europe/Jersey', 'Europe/London')
United Kingdom
Let’s get started by setting up our environment and installing the library.
Installing phonenumbers
First, let's create and activate our virtual environment:
$ mkdir phonenumbers && cd phonenumbers
$ python3 -m venv venv
$ . venv/bin/activate  # venv\Scripts\activate.bat on Windows
Then we install the Python Phonenumbers library:
$ pip3 install phonenumbers
This tutorial uses version 8.12.19 of the Phonenumbers library.
Now we are ready to start discovering the Phonenumbers library.
Parse Phone Numbers with Python phonenumbers
Whether you get user input from a web form or other sources, like extracting from some text (more on that later in this tutorial), the input phone number will most likely be a string. As a first step, we’ll need to parse it using phonenumbers and turn it into a PhoneNumber instance so that we can use it for validation and other functionalities.
We can parse the phone number using the parse() method:
import phonenumbers
my_string_number = "+40721234567"
my_number = phonenumbers.parse(my_string_number)
The phonenumbers.parse() method takes a phone number string as a required argument. You can also pass the country information in ISO Alpha-2 format as an optional argument. Consider, for example, the following code:
my_number = phonenumbers.parse(my_string_number, "RO")
"RO" stands for Romania in ISO Alpha-2 format. You can check other Alpha-2 and numeric country codes from this website. In this tutorial, for simplicity, I will omit the ISO Alpha-2 country code for most cases and include it only when it's strictly necessary.
The phonenumbers.parse() method already applies some basic built-in validation rules, such as checking the length of the number string, a leading zero, or a + sign. Note that this method raises a NumberParseException when any of the required rules are not fulfilled, so remember to wrap it in a try/except block in your application.
Now that we have parsed our phone number correctly, let's proceed to validation.
Validate Phone Numbers with Python Phonenumbers
Phonenumbers has two methods to check the validity of a phone number. The main differences between them are speed and accuracy.
To elaborate, let's start with is_possible_number():
import phonenumbers
my_string_number = "+40021234567"
my_number = phonenumbers.parse(my_string_number)
print(phonenumbers.is_possible_number(my_number))
And the output would be:
True
Now let's use the same number, but with the is_valid_number() method this time:
import phonenumbers
my_string_number = "+40021234567"
my_number = phonenumbers.parse(my_string_number)
print(phonenumbers.is_valid_number(my_number))
Even though the input was the same, the result would be different:
False
The reason is that the is_possible_number() method makes a quick guess on the phone number's validity by checking the length of the parsed number, while the is_valid_number() method runs a full validation by checking the length, phone number prefix, and region.
When iterating over a large list of phone numbers, phonenumbers.is_possible_number() provides faster results compared to phonenumbers.is_valid_number(). But as we see here, these results are not always accurate: a number can have a plausible length and still be invalid. The method is useful for quickly eliminating phone numbers that do not comply with the expected length, but don't rely on it alone to decide whether a number is valid.
Extract and Format Phone Numbers with Python Phonenumbers
User input is not the only way to get or collect phone numbers. For instance, you may have a spider/crawler that reads certain pages from a website or a document and extracts the phone numbers from the text blocks. It sounds like a challenging problem, but luckily the Phonenumbers library provides just the functionality we need with the PhoneNumberMatcher(text, region) class.
PhoneNumberMatcher takes a text block and a region as arguments; iterating over it returns the matching results as PhoneNumberMatch objects.
Let's use PhoneNumberMatcher with some sample text:
import phonenumbers
text_block = "Our services will cost about 2,200 USD and we will deliver the product by the 10.10.2021. For more information, you can call us at +44 7986 123456 or send an e-mail to [email protected]"
for match in phonenumbers.PhoneNumberMatcher(text_block, "GB"):
    print(match)
This will print the matching phone numbers along with their start and end indices in the string:
PhoneNumberMatch [131,146) +44 7986 123456
You may have noticed that our number is written in the standardized international format and separated by spaces. This may not always be the case in real-life scenarios. You may receive numbers in other formats, such as separated by dashes or written in the national (instead of the international) format.
Let's put the PhoneNumberMatcher class to the test with other phone number formats:
import phonenumbers
text_block = "Our services will cost about 2,200 USD and we will deliver the product by the 10.10.2021. For more information you can call us at +44-7986-123456 or 020 8366 1177 send an e-mail to [email protected]"
for match in phonenumbers.PhoneNumberMatcher(text_block, "GB"):
    print(match)
This would output:
PhoneNumberMatch [130,145) +44-7986-123456
PhoneNumberMatch [149,162) 020 8366 1177
Even though the phone numbers are embedded in the text alongside other numbers (a price and a date) and use different formats, PhoneNumberMatcher correctly returns only the two phone numbers.
Apart from extracting data from text, we might also want to get the digits one by one from the user. Imagine that your app's UI works similarly to modern mobile phones and formats phone numbers as you type. For instance, on your web page, you might want to pass the data to your API with each onkeyup event and use AsYouTypeFormatter() to format the phone number with each incoming digit.
Since the UI part is out of the scope of this article, we'll use a basic example for AsYouTypeFormatter. To simulate on-the-fly formatting, let's jump into the Python interpreter:
>>> import phonenumbers
>>> formatter = phonenumbers.AsYouTypeFormatter("TR")
>>> formatter.input_digit("3")
'3'
>>> formatter.input_digit("9")
'39'
>>> formatter.input_digit("2")
'392'
>>> formatter.input_digit("2")
'392 2'
>>> formatter.input_digit("2")
'392 22'
>>> formatter.input_digit("1")
'392 221'
>>> formatter.input_digit("2")
'392 221 2'
>>> formatter.input_digit("3")
'392 221 23'
>>> formatter.input_digit("4")
'392 221 23 4'
>>> formatter.input_digit("5")
'392 221 23 45'
Not all user input is formatted as it is typed. Some forms have simple text input fields for phone numbers, which means we can't count on the data being entered in a standard format.
The Phonenumbers library has us covered here too, with the format_number() method. This method allows us to format phone numbers into well-known, standardized formats; in this tutorial we'll use three of them: National, International, and E164. The National and International formats are pretty self-explanatory, while E164 is an international phone number format that limits phone numbers to 15 digits and formats them as {+}{country code}{number with area code}. For more information on E164, you can check this Wikipedia page.
Let's start with the national formatting:
import phonenumbers
my_number = phonenumbers.parse("+40721234567")
national_f = phonenumbers.format_number(my_number, phonenumbers.PhoneNumberFormat.NATIONAL)
print(national_f)
This will return a nicely spaced phone number string with the national format:
0721 234 567
Now let's format a national number in the international format:
import phonenumbers
my_number = phonenumbers.parse("0721234567", "RO") # "RO" is ISO Alpha-2 code for Romania
international_f = phonenumbers.format_number(my_number, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
print(international_f)
The above code will return a nicely spaced phone number string:
+40 721 234 567
Notice that we passed "RO" as the second parameter to the parse() method. Since the input is a national number, it has no country code prefix to hint at the country. In such cases, we need to specify the country with its ISO Alpha-2 code to get an accurate result. Omitting both the numeric country code (the + prefix) and the ISO Alpha-2 region code will raise NumberParseException: (0) Missing or invalid default region.
Now let's try the E164 formatting option. We'll pass a national string as the input:
import phonenumbers
my_number = phonenumbers.parse("0721234567", "RO")
e164_f = phonenumbers.format_number(my_number, phonenumbers.PhoneNumberFormat.E164)
print(e164_f)
The output will be very similar to PhoneNumberFormat.INTERNATIONAL, except without the spaces:
+40721234567
This is very useful when you want to pass the number to a background API. It isn't uncommon for APIs to expect phone numbers to be non-spaced strings.
Get Additional Information on Phone Number
A phone number is loaded with data about a user that could be of interest to you. You may want to use different APIs or API endpoints depending on the carrier of the particular phone number since this plays a role in the product cost. You might want to send your promotion notifications depending on your customer's (phone number's) timezone so that you don't send them a message in the middle of the night. Or you might want to get information about the phone number's location so that you can provide relevant information. The Phonenumbers library provides the necessary tools to fulfill these needs.
To start with the location, we will use the description_for_number() method from the geocoder module. This method takes a parsed phone number and a short language code (such as "en") as parameters.
Let's try this with our previous fake number:
import phonenumbers
from phonenumbers import geocoder
my_number = phonenumbers.parse("+447986123456")
print(geocoder.description_for_number(my_number, "en"))
This will print out the origin country of the phone number:
United Kingdom
Short language names are pretty intuitive. Let's try to get output in Russian:
import phonenumbers
from phonenumbers import geocoder
my_number = phonenumbers.parse("+447986123456")
print(geocoder.description_for_number(my_number, "ru"))
And here's the output which says the United Kingdom in Russian:
Соединенное Королевство
You can try it out with other languages of your choice, like "de", "fr", "zh", etc.
As mentioned before, you might want to group your phone numbers by their carriers, since in most cases the carrier will have an impact on the cost. Keep in mind, though, that the Phonenumbers library will probably return most carrier names accurately, but not 100% of them.
Today in most countries it is possible to get your number from one carrier and later on move the same number to a different carrier, leaving the phone number exactly the same. Since Phonenumbers is merely an offline Python library, it is not possible to detect these changes. So it's best to approach the carrier names as a reference, rather than a fact.
We will use the name_for_number() method from the carrier module:
import phonenumbers
from phonenumbers import carrier
my_number = phonenumbers.parse("+40721234567")
print(carrier.name_for_number(my_number, "en"))
This will display the original carrier of the phone number if possible:
Vodafone
Note: As mentioned in the original documentation of Python Phonenumbers, carrier information is available for mobile numbers in some countries, not all.
Another important piece of information about a phone number is its timezone. The time_zones_for_number() method returns a tuple of the timezones that the number belongs to. We'll import it from phonenumbers.timezone:
import phonenumbers
from phonenumbers import timezone
my_number = phonenumbers.parse("+447986123456")
print(timezone.time_zones_for_number(my_number))
This will print the following timezones:
('Europe/Guernsey', 'Europe/Isle_of_Man', 'Europe/Jersey', 'Europe/London')
This concludes our tutorial on Python Phonenumbers.
Conclusion
We learned how to parse phone numbers with the parse() method, extract numbers from text blocks with PhoneNumberMatcher(), get phone numbers digit by digit and format them with AsYouTypeFormatter(), use different validation methods with is_possible_number() and is_valid_number(), format numbers using the NATIONAL, INTERNATIONAL, and E164 formatting methods, and extract additional information from phone numbers using the geocoder, carrier, and timezone modules.
Remember to check out the original GitHub repo of the Phonenumbers library. Also if you have any questions in mind, feel free to comment below.