
Python, known for its readability and versatility, has become a staple in domains ranging from web development to data science and automation. However, its Global Interpreter Lock (GIL) limits how much multithreading can help, particularly for CPU-bound work. For I/O-bound workloads, asyncio comes into play, offering a powerful model for concurrent, non-blocking I/O in a single thread.

The Problem: Blocking I/O and the GIL

In traditional synchronous Python, when a program encounters a blocking I/O operation (e.g., waiting for a network response or reading from a file), it pauses execution until that operation completes. While one task waits, the entire program idles. The GIL further complicates matters by allowing only one thread to hold control of the Python interpreter at any given time, even on multi-core systems.
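
To make this concrete, here is a minimal synchronous sketch, where fetch_data is just an illustrative stand-in for any blocking call:

import time

def fetch_data(delay):
    # Stand-in for a blocking I/O call (network request, file read, etc.)
    time.sleep(delay)
    return f"data after {delay}s"

def main():
    # The calls run one after another; three 1-second waits take about 3 seconds.
    results = [fetch_data(1) for _ in range(3)]
    print(results)

main()

The program spends nearly all of that time idle, which is exactly the waste asyncio is designed to avoid.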

Enter asyncio: Concurrency Done Right

asyncio is a library that provides infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives. It’s particularly well-suited for I/O-bound tasks where the program spends more time waiting for external operations than performing calculations. The article Async Python for Data Scientists (Laszlo Sragner) highlights this, noting that “async programming is designed for precisely the use cases where other machines do the majority of work through API calls.”

Key Concepts

  1. Event Loop: The heart of asyncio is the event loop, which manages and schedules the execution of coroutines. Think of it as a conductor orchestrating the concurrent execution of different tasks. Sragner describes it as follows:

    “Instead of running the commands directly, in async programming, we start an “eventloop” (read: asyncio.run(main)). The code is defined as “coroutines” (read: async def instead of just def), and execution happens by calling a coroutine and then waiting for the result to return (read: await). While we are awaiting, the execution is handed back to the event loop handler, which can start another coroutine; then, we await that as well.”

  2. Coroutines: Defined with async def, coroutines are special functions that can pause their execution at await points and hand control back to the event loop, allowing other coroutines to run. For example:
import asyncio

async def my_coroutine():
    print("Coroutine started")
    await asyncio.sleep(1)  # Simulate an I/O-bound operation
    print("Coroutine finished")

async def main():
    await my_coroutine()

asyncio.run(main())
  3. async and await: The async keyword declares a coroutine, while await is used inside a coroutine to pause execution until an awaitable object (another coroutine, a future, or a task) completes. Note that await can only be used inside an async function.
  4. Tasks: Tasks wrap coroutines so they can be scheduled and run concurrently on the event loop. asyncio.create_task() creates a Task object (see the sketch after this list).
  5. Futures: Futures represent the eventual result of an asynchronous operation that may not be available yet.
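
To illustrate Tasks, here is a minimal sketch (say_after is just an illustrative coroutine). Since a Task is itself a kind of Future, you rarely need to create Futures by hand:

import asyncio

async def say_after(delay, message):
    await asyncio.sleep(delay)
    print(message)

async def main():
    # Wrapping coroutines in Tasks schedules them immediately on the event loop.
    task1 = asyncio.create_task(say_after(1, "hello"))
    task2 = asyncio.create_task(say_after(2, "world"))
    await task1
    await task2  # Both tasks run concurrently: total time is ~2 seconds, not 3

asyncio.run(main())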

Practical Applications and Benefits

  • Web Servers and Clients: asyncio is ideal for building high-performance web servers and clients that can handle numerous concurrent connections without blocking.
  • Data Science and API Interactions: When building RAG-based GenAI pipelines, asyncio offers a great way to increase throughput. Sragner notes that “document processing for RAGs consists of: Getting the data (waiting for the documents to download); Chunking and embedding (waiting for the embedding model to return the vectors); Saving the data into a database and waiting for the transaction to complete; Waiting for summarisation; Waiting for structured extraction.” A rough sketch of such a pipeline follows this list.
  • Concurrent API Calls: asyncio enables you to make multiple API requests concurrently, significantly reducing the overall execution time.
  • Real-time Applications: From chat servers to online games, asyncio facilitates the development of real-time applications that require handling many simultaneous connections.
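
As a rough sketch of such a pipeline (download, embed, and save_to_db are hypothetical placeholders, with asyncio.sleep standing in for real I/O):

import asyncio

async def download(doc_id):
    await asyncio.sleep(1)      # stand-in for waiting on a document download
    return f"contents of {doc_id}"

async def embed(text):
    await asyncio.sleep(0.5)    # stand-in for waiting on the embedding model
    return [0.0, 0.0, 0.0]      # dummy vector

async def save_to_db(doc_id, vector):
    await asyncio.sleep(0.2)    # stand-in for waiting on a database transaction
    return doc_id

async def process_document(doc_id):
    text = await download(doc_id)
    vector = await embed(text)
    return await save_to_db(doc_id, vector)

async def main():
    doc_ids = ["doc-1", "doc-2", "doc-3"]
    # While one document is waiting on I/O, the event loop advances the others.
    results = await asyncio.gather(*(process_document(d) for d in doc_ids))
    print(results)

asyncio.run(main())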

Example: Concurrent API Requests

import asyncio
import aiohttp # For asynchronous HTTP requests

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        "https://www.example.com",
        "https://www.python.org",
        "https://realpython.com"
    ]

    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks) # Run tasks concurrently

    for url, result in zip(urls, results):
        print(f"Content from {url}: {result[:50]}...")  # Print first 50 chars

asyncio.run(main())

This example uses the aiohttp library (an asynchronous HTTP client) to fetch the contents of multiple URLs concurrently. asyncio.gather() is used to run the coroutines concurrently, and the results are collected once all tasks are complete.

Moving from Synchronous to Asynchronous

  1. Identify I/O-Bound Operations: Pinpoint the sections of your code that involve waiting for external resources.
  2. Use Asynchronous Libraries: Replace synchronous libraries (e.g., requests) with their asynchronous counterparts (e.g., aiohttp).
  3. Refactor with async and await: Convert your code into coroutines and use await to handle asynchronous operations.
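
For contrast, here is a sketch of what step 2 looks like before the refactor: a synchronous version of the earlier URL-fetching example using requests. The aiohttp example above is its asynchronous counterpart.

import requests  # Synchronous HTTP client: each call blocks until the response arrives

def fetch_url(url):
    response = requests.get(url)
    return response.text

def main():
    urls = [
        "https://www.example.com",
        "https://www.python.org",
        "https://realpython.com",
    ]
    # The URLs are fetched one after another, so total time is the sum of all requests.
    for url in urls:
        result = fetch_url(url)
        print(f"Content from {url}: {result[:50]}...")

main()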

Conclusion

asyncio is a powerful tool for writing concurrent Python code, particularly for I/O-bound tasks. It allows you to maximise resource utilisation and build highly scalable and responsive applications. Understanding its core concepts and best practices is essential for any Python developer looking to unlock the full potential of concurrent programming. While the initial learning curve might seem steep, the benefits in terms of performance and efficiency are well worth the investment.