vda19999 Oct 15 2023 at 08:52

How sqlalchemy uses greenlet to call an async Python function from a normal function


The Python language has two kinds of functions: normal functions, which you use in most cases, and async functions, which are used to perform network IO asynchronously. The problem with this division is that an async function can only be awaited from another async function. Normal functions, on the other hand, can be called from anywhere; however, if you call a normal function that performs a blocking operation from an async function, it blocks the whole event loop and every coroutine running on it. In practice, these limitations mean that when writing an application with Python's asyncio, you can't use the IO libraries you would use in a synchronous application, and vice versa, unless a library supports both sync and async usage.
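To make the blocking problem concrete, here is a minimal sketch (the worker names are made up for illustration). Two coroutines run concurrently; with a blocking `time.sleep` the first one finishes before the second even starts, while with `await asyncio.sleep` their steps interleave:

```python
import asyncio
import time


async def blocking_worker(name, log):
    for step in range(2):
        log.append(f"{name}{step}")
        # Blocking call: the event loop cannot run anything else meanwhile.
        time.sleep(0.01)


async def cooperative_worker(name, log):
    for step in range(2):
        log.append(f"{name}{step}")
        # Awaiting hands control back to the event loop while we wait.
        await asyncio.sleep(0.01)


async def run_pair(worker):
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


blocking_order = asyncio.run(run_pair(blocking_worker))
cooperative_order = asyncio.run(run_pair(cooperative_worker))
print(blocking_order)     # ['a0', 'a1', 'b0', 'b1']
print(cooperative_order)  # ['a0', 'b0', 'a1', 'b1']
```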

Now the question: suppose you are developing a large and complex library that lets users interact with relational databases, abstracting away (some of) the differences in SQL syntax and behavior between those databases, as well as the differences between their drivers. How do you support both sync and async usage without duplicating the library's code? sqlalchemy is organized so that, regardless of the database and driver you use, you call functions and methods of classes such as Engine and Connection. These do some general, database-independent work, then apply the logic specific to your database, and finally call the functions of your database driver to actually communicate with the database. If you are using Python's asyncio, the database driver exposes async functions and methods, but the driver-independent rest of the library would ideally remain the same. The issue is that you can't call the driver's async functions from the normal functions of the library core.
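A small sketch of what goes wrong, using hypothetical `driver_execute` and `core_execute` functions (neither is a real sqlalchemy or driver API): a normal function that calls an async function does not get the result, only a coroutine object that it has no way to await.

```python
import asyncio
import inspect


async def driver_execute(sql):
    """Hypothetical async driver function."""
    await asyncio.sleep(0)  # stands in for network IO
    return [("row",)]


def core_execute(sql):
    """Hypothetical sync core function: it cannot use 'await' here."""
    return driver_execute(sql)


result = core_execute("SELECT 1")
is_coroutine = inspect.iscoroutine(result)
print(is_coroutine)  # True: we got a coroutine object, not the rows
result.close()       # discard it cleanly to avoid a 'never awaited' warning
```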

There are two solutions for this in pure Python:

1. Duplicate the code of your library: write almost the same code twice, once in normal functions and once in async functions. The obvious issue is that the code base becomes almost twice as large and harder to maintain.
2. Mark all the inner functions that may call IO functions as async. Then, if the user isn't using asyncio but needs to call an inner async function, start an event loop just to run that function. The drawback is that the majority of users, who don't use asyncio, have to pay the overhead of creating and running an event loop.
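The second option could look roughly like the sketch below. The names `_fetch_row_async` and `fetch_row` are invented for illustration; this is not how sqlalchemy is written, only a picture of the approach and of where the overhead comes from.

```python
import asyncio


async def _fetch_row_async(query):
    """Hypothetical inner function: async because it may perform IO."""
    await asyncio.sleep(0)  # stands in for a call to an async driver
    return f"result of {query}"


def fetch_row(query):
    """Sync facade for users who do not use asyncio.

    Every call creates, runs and closes a fresh event loop,
    which is the overhead mentioned above.
    """
    return asyncio.run(_fetch_row_async(query))


row = fetch_row("SELECT 1")
print(row)  # result of SELECT 1
```

Note that `asyncio.run` cannot be called while another event loop is already running in the same thread, so async users would still have to call `_fetch_row_async` directly.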

What is greenlet

greenlet is an alternative implementation of coroutines in Python. asyncio coroutines are scheduled by an event loop, so in that respect they behave like light threads (fibers in C++ Boost terms, virtual threads in Java terms). greenlet coroutines (greenlets), by contrast, have no scheduler: a greenlet only starts (or continues) running when another greenlet explicitly switches to it, whereas asyncio decides by itself when to run each coroutine. An example from the documentation:

>>> from greenlet import greenlet

>>> def test1():
...     print("[gr1] main  -> test1")
...     gr2.switch()
...     print("[gr1] test1 <- test2")
...     return 'test1 done'

>>> def test2():
...     print("[gr2] test1 -> test2")
...     gr1.switch()
...     print("This is never printed.")

>>> gr1 = greenlet(test1)
>>> gr2 = greenlet(test2)
>>> gr1.switch()
[gr1] main  -> test1
[gr2] test1 -> test2
[gr1] test1 <- test2
'test1 done'

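In the example above, we create two greenlets that run the two test functions, and then switch to the first greenlet. The first greenlet prints and switches to the second; the second prints and switches back; the first then prints again and returns a string. The second greenlet is never resumed, so its last print never runs.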
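One more property of `switch()` matters for what follows: it can carry values in both directions. The sketch below (the `child` function is made up for illustration) shows that the arguments of the first `switch()` become the function's arguments, that a later `switch(value)` makes the paused `switch()` call return `value`, and that the function's return value is handed to the parent when the greenlet finishes.

```python
from greenlet import getcurrent, greenlet


def child(first):
    # 'first' comes from the switch() call that started this greenlet.
    # Switching to the parent pauses us here; the value the parent later
    # passes to switch() becomes the return value of this call.
    reply = getcurrent().parent.switch(f"child got {first}")
    return f"child finished with {reply}"


g = greenlet(child)
from_child = g.switch("hello")  # runs child until it switches back
still_alive = not g.dead
final = g.switch("world")       # resumes child, which then returns
print(from_child)  # child got hello
print(final)       # child finished with world
print(g.dead)      # True
```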

How to use greenlet to call an async function from a normal function

When you are using asyncio, sqlalchemy works as follows:

1. You call an async function to perform some operation on the database.
2. The async function calls the corresponding sync function from core sqlalchemy.
3. After doing the required work, core sqlalchemy calls a function from the sqlalchemy wrapper (dialect) over the async database driver.
4. The wrapper needs to call a function of the async driver, but it can't do this directly, because it is a sync function.

With greenlet, this is how sqlalchemy solves the problem:

1. Rather than calling an inner sync function directly, the async function creates a greenlet that will run this function, and switches to this greenlet.
2. When the driver wrapper (dialect) needs to await a coroutine (that is, call an async function), it switches back to the context of the function that created the greenlet, and has that function await the result and switch back again.

It is easier to illustrate this with code (I have copied the parts of sqlalchemy that work with greenlet and, to make things easier, added a few functions of my own for illustration purposes):

import asyncio
import sys
from typing import Callable, Any, Awaitable, TypeVar

from greenlet import getcurrent, greenlet


async def driver_create_connection():
    """this is the driver-provided function for connection creation"""
    print("async executing")
    # simulating some IO
    await asyncio.sleep(1)
    print("async done")


def __create_connection_sync():
    """this is a function for connection creation in the core of the library,
       used by both async and sync users"""
    print("sync starting")
    await_only(driver_create_connection())
    print("sync done")


async def create_database_connection_async():
    """this is a function that library users call"""
    # just call our inner function to execute the regular function for conn creation
    return await greenlet_spawn(__create_connection_sync)


class _AsyncIoGreenlet(greenlet):
    dead: bool

    def __init__(self, fn: Callable[..., Any], driver: greenlet):
        greenlet.__init__(self, fn, driver)
        self.driver = driver


async def greenlet_spawn(
    fn,
    *args,
    _require_await: bool = False,
    **kwargs,
):
    """Runs a sync function ``fn`` in a new greenlet.

    The sync function can then use :func:`await_only` to wait for async
    functions.

    :param fn: The sync callable to call.
    :param \\*args: Positional arguments to pass to the ``fn`` callable.
    :param \\*\\*kwargs: Keyword arguments to pass to the ``fn`` callable.
    """

    context = _AsyncIoGreenlet(fn, getcurrent())
    # runs the function synchronously in the context greenlet. If the
    # execution is interrupted by await_only, context is not dead and
    # result is a coroutine to await. If the context is dead, the function
    # has returned, and its result can be returned.
    try:
        # start executing the function
        result = context.switch(*args, **kwargs)
        while not context.dead:
            switch_occurred = True
            try:
                # wait for a coroutine from await_only and then return its
                # result back to it.
                value = await result
            except BaseException:
                # this allows an exception to be raised within
                # the moderated greenlet so that it can continue
                # its expected flow.
                result = context.throw(*sys.exc_info())
            else:
                result = context.switch(value)
    finally:
        # clean up to avoid cycle resolution by gc
        del context.driver
    return result

_T = TypeVar("_T")


def await_only(awaitable: Awaitable[_T]) -> _T:
    """Awaits an async function in a sync method.

    The sync method must be inside a :func:`greenlet_spawn` context.
    :func:`await_only` calls cannot be nested.

    :param awaitable: The coroutine to call.

    """
    # this is called in the context greenlet while running fn
    current = getcurrent()
    # switch back to the context of greenlet_spawn and hand it
    # a coroutine to await
    return current.driver.switch(awaitable)


if __name__ == "__main__":
    asyncio.run(create_database_connection_async())
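The listing above only prints messages. To see return values and exceptions cross the boundary as well, here is a condensed sketch of the same idea. `BridgeGreenlet`, `wait_for`, `run_sync`, `driver_query` and `core_query` are names I made up for this illustration; they are simplified stand-ins, not sqlalchemy's actual API.

```python
import asyncio
import sys

from greenlet import getcurrent, greenlet


class BridgeGreenlet(greenlet):
    """Simplified stand-in for _AsyncIoGreenlet."""

    def __init__(self, fn, driver):
        greenlet.__init__(self, fn, driver)
        self.driver = driver


def wait_for(awaitable):
    # Hand the awaitable to the async side; we are resumed with its
    # result, or with its exception raised at this point.
    return getcurrent().driver.switch(awaitable)


async def run_sync(fn, *args):
    context = BridgeGreenlet(fn, getcurrent())
    result = context.switch(*args)
    while not context.dead:
        try:
            value = await result
        except BaseException:
            # Re-raise the failure inside the sync function.
            result = context.throw(*sys.exc_info())
        else:
            result = context.switch(value)
    return result


async def driver_query(value):
    """Hypothetical async driver call."""
    await asyncio.sleep(0)
    if value < 0:
        raise ValueError("negative input")
    return value * 2


def core_query(value):
    """Hypothetical sync core function with ordinary error handling."""
    try:
        return wait_for(driver_query(value))
    except ValueError as exc:
        return f"handled: {exc}"


async def main():
    return await run_sync(core_query, 21), await run_sync(core_query, -1)


ok, failed = asyncio.run(main())
print(ok)      # 42
print(failed)  # handled: negative input
```

The sync function uses a plain `try`/`except`, yet the exception it catches was raised inside an awaited coroutine.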

This demonstrates that a sync function can call an async function, provided the sync function itself is (indirectly) called from an async function. This can be useful in library code to avoid repetition when supporting both sync and async usage, but it feels like a workaround and isn't 100% efficient, because every awaited call costs two extra context switches: one out of the greenlet and one back into it.
