Friday, 3 August, 2018 UTC


Introduction

Dealing with HTTP requests is not an easy task in any programming language. Python 2 comes with two built-in modules, urllib and urllib2, to handle HTTP-related operations (in Python 3, their functionality was reorganized into the urllib package). The two modules offer different sets of functionality, and many times they need to be used together. The main drawbacks of using urllib are that it is confusing (some methods are available in both urllib and urllib2), the documentation is unclear, and we need to write a lot of code to make even a simple HTTP request.
To make these things simpler, one easy-to-use third-party library, known as Requests, is available and most developers prefer to use it instead of urllib/urllib2. It is an Apache2-licensed HTTP library powered by urllib3.

Installing the Requests Module

Installing this package, like most other Python packages, is pretty straightforward. You can either download the Requests source code from GitHub and install it or use pip:
$ pip install requests
For more information regarding the installation process, refer to the official documentation.
To verify the installation, you can try to import it like below:
import requests  
If you don't receive any errors importing the module, then it was successful.

Making a GET Request

GET is by far the most used HTTP method. We can use a GET request to retrieve data from any destination. Let me start with a simple example first. Suppose we want to fetch GitHub's public events feed and print out the resulting data. Using the Requests module, we can do it like below:
import requests

r = requests.get('https://api.github.com/events')  
print(r.content)  
It will print the response content in an encoded form. If you want to see the actual text result, you can read the .text property of this object. Similarly, the status_code property holds the status code returned for the request:
import requests

r = requests.get('https://api.github.com/events')  
print(r.text)  
print(r.status_code)  
requests will decode the raw content and show you the result. If you want to check what type of encoding is used by requests, you can print out this value by reading .encoding. You can even change the encoding by assigning a new value to it. Now isn't that simple?
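To make the effect of .encoding concrete, here is a small sketch that runs without any network access: we build a Response object by hand (something requests.get() normally does for us) and decode the same raw bytes under two different encodings.

```python
import requests

# Build a Response by hand so the example runs offline; requests.get()
# normally returns an object like this with .content already filled in.
r = requests.models.Response()
r.status_code = 200
r._content = 'café'.encode('utf-8')  # the raw bytes, i.e. r.content

r.encoding = 'utf-8'
print(r.text)   # café

r.encoding = 'latin-1'
print(r.text)   # cafÃ© — same bytes, different decoding
```

This is why checking (and occasionally overriding) .encoding matters: .text is just .content decoded with whatever encoding requests currently believes is correct.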

Reading the Response

The response of an HTTP request can contain many headers that hold different pieces of information.
httpbin is a popular website for testing different HTTP operations. In this article, we will use httpbin/get to analyse the response to a GET request. First of all, we need to look at the response headers and how they are structured. You can use any modern web browser to find them, but for this example, we will use Google's Chrome browser.
  • In Chrome, open the URL http://httpbin.org/get, right click anywhere on the page, and select the "Inspect" option
  • This will open a new window within your browser. Refresh the page and click on the "Network" tab.
  • This "Network" tab will show you all different types of network requests made by the browser. Click on the "get" request in the "Name" column and select the "Headers" tab on the right.
The content of the "Response Headers" is our required element. You can see the key-value pairs holding various information about the resource and request. Let's try to parse these values using the requests library:
import requests

r = requests.get('http://httpbin.org/get')  
print(r.headers['Access-Control-Allow-Credentials'])  
print(r.headers['Access-Control-Allow-Origin'])  
print(r.headers['CONNECTION'])  
print(r.headers['content-length'])  
print(r.headers['Content-Type'])  
print(r.headers['Date'])  
print(r.headers['server'])  
print(r.headers['via'])  
We retrieved the header information using r.headers and we can access each header value using its key. Note that the keys are not case-sensitive.
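Under the hood, r.headers is a requests.structures.CaseInsensitiveDict, which you can also try directly without making a request:

```python
from requests.structures import CaseInsensitiveDict

# The same header can be read back using any capitalization:
headers = CaseInsensitiveDict({'Content-Type': 'application/json'})
print(headers['content-type'])   # application/json
print(headers['CONTENT-TYPE'])   # application/json
```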
Similarly, let's try to access the response body. The above headers show that the response is in JSON format (Content-Type: application/json). The Requests library comes with a built-in JSON parser, so we can call r.json() to parse the body as a JSON object. The value for each key of the response can then be accessed easily, like below:
import requests

r = requests.get('http://httpbin.org/get')

response = r.json()  
print(r.json())  
print(response['args'])  
print(response['headers'])  
print(response['headers']['Accept'])  
print(response['headers']['Accept-Encoding'])  
print(response['headers']['Connection'])  
print(response['headers']['Host'])  
print(response['headers']['User-Agent'])  
print(response['origin'])  
print(response['url'])  
The above code will print the below output:
{'headers': {'Host': 'httpbin.org', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Accept': '*/*', 'User-Agent': 'python-requests/2.9.1'}, 'url': 'http://httpbin.org/get', 'args': {}, 'origin': '103.9.74.222'}
{}
{'Host': 'httpbin.org', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Accept': '*/*', 'User-Agent': 'python-requests/2.9.1'}
*/*
gzip, deflate  
close  
httpbin.org  
python-requests/2.9.1  
103.9.74.222  
http://httpbin.org/get  
The line print(r.json()) printed the full JSON value of the response. We stored the parsed value in the variable response and then printed out the value for each key. Note that, unlike the header keys in the previous example, these keys are case-sensitive.
Similar to JSON and text content, we can use requests to read the response body as bytes for non-text responses using the .content property. gzip- and deflate-encoded transfer content is decoded automatically.

Passing Parameters in GET

In some cases, you'll need to pass parameters along with your GET requests, which take the form of query strings. To do this, we need to pass these values in the params parameter, as shown below:
import requests

payload = {'user_name': 'admin', 'password': 'password'}  
r = requests.get('http://httpbin.org/get', params=payload)

print(r.url)  
print(r.text)  
Here, we are assigning our parameter values to the payload variable, and then to the GET request via params. The above code will return the following output:
http://httpbin.org/get?password=password&user_name=admin  
{"args":{"password":"password","user_name":"admin"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"httpbin.org","User-Agent":"python-requests/2.9.1"},"origin":"103.9.74.222","url":"http://httpbin.org/get?password=password&user_name=admin"}
As you can see, the Requests library automatically turned our dictionary of parameters into a query string and attached it to the URL.
Note that you need to be careful what kind of data you pass via GET requests since the payload is visible in the URL, as you can see in the output above.
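If the same key has to appear more than once in the query string, a list of tuples can be used instead of a dictionary. The sketch below uses requests.Request(...).prepare() to build the URL locally, so no request is actually sent and you can still see the encoding that would be applied:

```python
import requests

# A list of tuples lets 'tag' repeat; spaces are URL-encoded automatically.
payload = [('tag', 'python'), ('tag', 'http'), ('q', 'hello world')]
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()
print(prepared.url)   # e.g. http://httpbin.org/get?tag=python&tag=http&q=hello+world
```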

Making POST Requests

HTTP POST requests are the opposite of GET requests, as they are meant for sending data to a server rather than retrieving it. That said, POST requests also receive data in the response, just like GET requests.
Instead of using the get() method, we need to use the post() method. For passing an argument, we can pass it inside the data parameter:
import requests

payload = {'user_name': 'admin', 'password': 'password'}  
r = requests.post("http://httpbin.org/post", data=payload)  
print(r.url)  
print(r.text)  
Output:
http://httpbin.org/post  
{"args":{},"data":"","files":{},"form":{"password":"password","user_name":"admin"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Content-Length":"33","Content-Type":"application/x-www-form-urlencoded","Host":"httpbin.org","User-Agent":"python-requests/2.9.1"},"json":null,"origin":"103.9.74.222","url":"http://httpbin.org/post"}
The data will be "form-encoded" by default. You can also pass more complicated payloads, like a list of tuples if multiple values share the same key, a string instead of a dictionary, or a multipart-encoded file.
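Again, requests.Request(...).prepare() lets us inspect the body that would be sent, without any network traffic:

```python
import requests

# A list of tuples repeats the 'item' key in the form-encoded body:
form = requests.Request('POST', 'http://httpbin.org/post',
                        data=[('item', 'one'), ('item', 'two')]).prepare()
print(form.body)   # item=one&item=two

# A plain string is passed through as the raw request body:
raw = requests.Request('POST', 'http://httpbin.org/post',
                       data='plain text body').prepare()
print(raw.body)    # plain text body
```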

Sending Files with POST

Sometimes we need to send one or more files simultaneously to the server. For example, if a user is submitting a form and the form includes different form-fields for uploading files, like a user profile picture, a user resume, etc. Requests can handle multiple files in a single request. This can be achieved by putting the files into a list of tuples, like below:
import requests

url = 'http://httpbin.org/post'  
file_list = [  
    ('image', ('image1.jpg', open('image1.jpg', 'rb'), 'image/jpeg')),
    ('image', ('image2.jpg', open('image2.jpg', 'rb'), 'image/jpeg'))
]

r = requests.post(url, files=file_list)  
print(r.text)  
The tuples containing the files' information are in the form (field_name, file_info).
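The file content can also be supplied directly as in-memory bytes, so nothing needs to exist on disk. Preparing such a request locally (no request is sent here) shows the multipart encoding that would be used:

```python
import requests

# (filename, content, content_type) — content given as in-memory bytes:
files = {'image': ('pixel.png', b'fake image bytes', 'image/png')}
req = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()
print(req.headers['Content-Type'])   # multipart/form-data; boundary=...
```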

Other HTTP Request Types

Similar to GET and POST, we can perform other HTTP requests like PUT, DELETE, HEAD, and OPTIONS using the requests library, like below:
import requests

requests.put('url', data={'key': 'value'})  
requests.delete('url')  
requests.head('url')  
requests.options('url')  

Handling Redirections

Redirection in HTTP means forwarding the network request to a different URL. For example, if we make a request to "http://www.github.com", it will redirect to "https://github.com" using a 301 redirect.
import requests

r = requests.get("http://www.github.com")  
print(r.url)  
print(r.history)  
print(r.status_code)  
Output:
https://github.com/  
[<Response [301]>, <Response [301]>]
200  
As you can see, the redirection process is automatically handled by requests, so you don't need to deal with it yourself. The history property contains the list of all response objects created on the way to the final URL. In our example, two Response objects were created with the 301 response code. HTTP 301 and 302 responses are used for permanent and temporary redirection, respectively.
If you don't want the Requests library to automatically follow redirects, then you can disable it by passing the allow_redirects=False parameter along with the request.
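With redirects disabled, the first 301 response is returned as-is. The small helper below is a sketch: it is written defensively (returning None on any request failure) so it also runs without network access.

```python
import requests

def first_hop_status(url):
    """Status code of the very first response, without following redirects."""
    try:
        return requests.get(url, allow_redirects=False, timeout=10).status_code
    except requests.exceptions.RequestException:
        return None  # no network, DNS failure, etc.

# http://www.github.com normally answers 301 instead of being followed to HTTPS:
print(first_hop_status('http://www.github.com'))
```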

Handling Timeouts

Another important configuration is telling our library how to handle timeouts, or requests that take too long to return. We can configure requests to stop waiting for a network request using the timeout parameter. By default, requests will not time out. So, if we don't configure this property, our program may hang indefinitely, which is not the functionality you'd want in a process that keeps a user waiting.
import requests

requests.get('http://www.google.com', timeout=1)  
Here, an exception will be thrown if the server does not respond within 1 second (which is still aggressive for a real-world application). To make this fail more often (for the sake of an example), you can set the timeout limit to a much smaller value, like 0.001.
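In practice you'll want to catch the resulting exception rather than let it crash the program. A small sketch, using httpbin's /delay/3 endpoint (which stalls for 3 seconds) to trip a 1-second timeout; the helper returns a label instead of raising, so it also runs without network access:

```python
import requests

def fetch(url, timeout=1):
    """Return the status code, or a short error label if the request fails."""
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        return 'timeout'
    except requests.exceptions.RequestException:
        return 'error'

# httpbin.org/delay/3 waits 3 seconds before responding, so this times out:
print(fetch('http://httpbin.org/delay/3'))
```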
The timeout can be configured for both the "connect" and "read" operations of the request using a tuple, which allows you to specify both values separately:
import requests

requests.get('http://www.google.com', timeout=(5, 14))  
Here, the "connect" timeout is 5 seconds and the "read" timeout is 14 seconds. This will allow your request to fail much more quickly if it can't connect to the resource, and if it does connect then it will give it more time to download the data.

Cookies and Custom Headers

We have seen previously how to access headers using the headers property. Similarly, we can access cookies from a response using the cookies property.
For example, below we access a cookie with the name cookie_name:
import requests

r = requests.get('http://www.examplesite.com')  
r.cookies['cookie_name']  
We can also send custom cookies to the server by providing a dictionary to the cookies parameter in our GET request.
import requests

custom_cookie = {'cookie_name': 'cookie_value'}  
r = requests.get('http://www.examplesite.com/cookies', cookies=custom_cookie)  
Cookies can also be passed in a RequestsCookieJar object. This allows you to provide cookies scoped to a particular domain and path.
import requests

jar = requests.cookies.RequestsCookieJar()  
jar.set('cookie_one', 'one', domain='httpbin.org', path='/cookies')  
jar.set('cookie_two', 'two', domain='httpbin.org', path='/other')

r = requests.get('https://httpbin.org/cookies', cookies=jar)  
print(r.text)  
Output:
{"cookies":{"cookie_one":"one"}}
Similarly, we can send custom headers by passing a dictionary to the headers parameter of the request.
import requests

custom_header = {'user-agent': 'customUserAgent'}

r = requests.get('https://samplesite.org', headers=custom_header)  

The Session Object

The session object is mainly used to persist certain parameters, like cookies, across different HTTP requests. A session object may reuse a single TCP connection for multiple requests and responses, which results in a performance improvement.
import requests

first_session = requests.Session()  
second_session = requests.Session()

first_session.get('http://httpbin.org/cookies/set/cookieone/111')  
r = first_session.get('http://httpbin.org/cookies')  
print(r.text)

second_session.get('http://httpbin.org/cookies/set/cookietwo/222')  
r = second_session.get('http://httpbin.org/cookies')  
print(r.text)

r = first_session.get('http://httpbin.org/anything')  
print(r.text)  
Output:
{"cookies":{"cookieone":"111"}}

{"cookies":{"cookietwo":"222"}}

{"args":{},"data":"","files":{},"form":{},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Cookie":"cookieone=111","Host":"httpbin.org","User-Agent":"python-requests/2.9.1"},"json":null,"method":"GET","origin":"103.9.74.222","url":"http://httpbin.org/anything"}
The httpbin path /cookies/set/{name}/{value} sets a cookie with the given name and value. Here, we set different cookie values for the first_session and second_session objects. You can see that the same cookie is returned in all subsequent requests made through a specific session.
Similarly, we can use the session object to persist certain parameters for all requests.
import requests

first_session = requests.Session()

first_session.cookies.update({'default_cookie': 'default'})

r = first_session.get('http://httpbin.org/cookies', cookies={'first-cookie': '111'})  
print(r.text)

r = first_session.get('http://httpbin.org/cookies')  
print(r.text)  
Output:
{"cookies":{"default_cookie":"default","first-cookie":"111"}}

{"cookies":{"default_cookie":"default"}}
As you can see, the default_cookie is sent with each request of the session. If we pass an extra cookie with an individual request, it is merged with the default one: "first-cookie": "111" is sent alongside "default_cookie": "default", but only for that one request.
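The same idea works for headers: values set on the session are merged into every request made through it. A minimal sketch (the X-App-Token header name is hypothetical, just for illustration):

```python
import requests

s = requests.Session()
s.headers.update({'X-App-Token': 'abc123'})   # hypothetical custom header

# Every request made through `s` will now carry this header:
print(s.headers['X-App-Token'])
```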

Using Proxies

The proxies argument is used to configure a proxy server to use in your requests.
import requests

http_proxy = "http://10.10.1.10:1080"  
https_proxy = "https://10.10.1.11:3128"

proxy_dict = {  
  "http": http_proxy,
  "https": https_proxy
}

r = requests.get('http://sampleurl.com', proxies=proxy_dict)  
The requests library also supports SOCKS proxies. This is an optional feature and it requires the requests[socks] dependency to be installed before use. Like before, you can install it using pip:
$ pip install requests[socks]
After the installation, you can use it as shown here:
proxies = {  
  'http': 'socks5://user:pass@host:port',
  'https': 'socks5://user:pass@host:port'
}

SSL Handling

We can also use the Requests library to verify the HTTPS certificate of a website by passing verify=True with the request.
import requests

r = requests.get('https://www.github.com', verify=True)  
This will throw an error if there is any problem with the SSL certificate of the site. If you don't want to verify, just pass False instead of True. This parameter is set to True by default.
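Certificate problems surface as requests.exceptions.SSLError, which you can catch explicitly. The helper below is a defensive sketch (it returns None on unrelated failures, so it also runs without network access):

```python
import requests

def ssl_ok(url):
    """True if the certificate verifies, False if not, None on other errors."""
    try:
        requests.get(url, verify=True, timeout=10)
        return True
    except requests.exceptions.SSLError:
        return False
    except requests.exceptions.RequestException:
        return None  # no network, DNS failure, etc.

print(ssl_ok('https://www.github.com'))
```

The verify parameter can also take the path to a CA bundle file instead of a boolean, which is useful for servers signed by a private certificate authority.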

Downloading a File

For downloading a file using requests, we can either stream the contents or download the entire thing at once. The stream flag is used to choose between these two behaviors.
As you probably guessed, if stream is True, then requests will stream the content. If stream is False, all content will be downloaded to memory before being returned to you.
For streaming content, we can iterate over the content chunk by chunk using the iter_content method, or line by line using iter_lines. Either way, it will download the file part by part.
For example:
import requests

r = requests.get('https://cdn.pixabay.com/photo/2018/07/05/02/50/sun-hat-3517443_1280.jpg', stream=True)  
with open("sun-hat.jpg", "wb") as downloaded_file:  
    for chunk in r.iter_content(chunk_size=256):
        if chunk:
            downloaded_file.write(chunk)
The code above will download an image from the Pixabay server and save it in a local file, sun-hat.jpg.
We can also read the raw socket response using the raw property, provided stream=True is set on the request.
import requests

r = requests.get("http://exampleurl.com", stream=True)  
r.raw  
For downloading or streaming content, iter_content() is the preferred way.

Errors and Exceptions

requests throws different types of exceptions and errors if there is ever a network problem. All exceptions are inherited from the requests.exceptions.RequestException class.
Here is a short description of the common errors you may run into:
  • ConnectionError exception is thrown in case of DNS failure, refused connection, or any other connection-related issue.
  • Timeout is raised if a request times out.
  • TooManyRedirects is raised if a request exceeds the maximum number of predefined redirections.
  • HTTPError exception is raised for invalid HTTP responses.
For a more complete list and description of the exceptions you may run into, check out the documentation.
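Because they all share the RequestException base class, a single except clause can catch any requests failure; and raise_for_status() is how you turn a 4xx/5xx status code into an HTTPError. Both can be seen without a network call (the Response here is built by hand just for illustration):

```python
import requests

# Every requests error derives from RequestException:
for exc in (requests.exceptions.ConnectionError,
            requests.exceptions.Timeout,
            requests.exceptions.TooManyRedirects,
            requests.exceptions.HTTPError):
    print(exc.__name__, issubclass(exc, requests.exceptions.RequestException))

# raise_for_status() raises HTTPError for error status codes; we build a
# Response by hand so the example runs offline:
r = requests.models.Response()
r.status_code = 404
try:
    r.raise_for_status()
except requests.exceptions.HTTPError:
    print('HTTPError raised')
```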

Conclusion

In this tutorial I explained many of the features of the requests library and the various ways to use it. You can use the requests library not only for interacting with a REST API, but equally well for scraping data from a website or downloading files from the web.
Modify and try the above examples, and drop a comment below if you have any questions regarding requests.