kentavr009 Feb 12 at 14:42

Developing a Python Script — Geetest CAPTCHA solver: How to bypass Geetest 4 and any others

Original author: Alex

Introduction or why Geetest CAPTCHA is not like the new Haval?

These days, Chinese products and services have seeped into nearly every niche. Sure, when someone mentions a Chinese development, you might chuckle and be reminded of those 90’s internet gems like “Glasses, do you need ‘em?”—and honestly, not much has changed. Even DeepSeek ended up being neither truly deep nor entirely seek. Yet, there’s something they’ve perfected, which leaves many SEO optimizers weeping salty tears while trying to bypass the Geetest CAPTCHA.

Officially defined, Geetest CAPTCHA is a state-of-the-art protection system employed by various web services to fend off automated requests. At its core lies a dynamic slider puzzle, where users must drag a fragment of an image into the correct slot.

I got curious about this CAPTCHA and decided to delve deeper into its intricacies—exploring the hidden pitfalls one might encounter when attempting to bypass it, and sharing some tips on what to watch out for when crafting your own CAPTCHA solver.

A quick heads-up: I’ll be bypassing Geetest CAPTCHA exclusively through a CAPTCHA solving service (I personally rely on 2captcha).

How Geetest CAPTCHA Works, and Why Bypassing GeeTest Is Not the Same as Bypassing reCAPTCHA

Geetest CAPTCHA is a two-layered defense system—the visual slider puzzle plus an intricate backend analysis:

Dynamic Image Generation:
Every request triggers the server to generate a unique background featuring a “hole” and a matching puzzle piece. This ever-changing setup makes pre-prepared solutions a non-starter.
Interactive Slider:
The user is tasked with dragging the puzzle piece until it perfectly aligns with the missing segment. In the process, the system meticulously logs:
- The final position of the puzzle piece.
- The movement trajectory of the slider.
- The time intervals between each action.

The second layer involves collecting and analyzing behavioral data—not as a separate module but interwoven throughout the CAPTCHA interaction. It tracks how you move your mouse, the nuances of your drag-and-drop, even the slightest cursor tremors that you might not consciously notice.

And finally, there’s server-side validation. Once you’ve finished dragging, your browser sends all the movement data to the server, which then checks it against expected parameters. This multi-layered approach makes it exceedingly difficult for bots to mimic human behavior and complicates any attempts to automate the bypass.
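
To make the logged data more concrete, here is a minimal sketch of what a drag trajectory could look like and how the timing gaps can be derived from it. The (x, y, t) sample format, the two sample drags, and the interval_spread helper are all assumptions made for illustration; GeeTest's real payload format and scoring are not public and are certainly more involved.

```python
from statistics import pstdev

# Hypothetical drag samples: (x_px, y_px, t_ms). Not GeeTest's real format.
human_drag = [(0, 0, 0), (12, 1, 35), (41, 3, 80), (97, 2, 150), (131, 1, 260), (138, 0, 410)]
bot_drag = [(0, 0, 0), (28, 0, 50), (56, 0, 100), (84, 0, 150), (112, 0, 200), (138, 0, 250)]


def intervals(samples):
    """Return the time gaps (ms) between consecutive samples."""
    return [b[2] - a[2] for a, b in zip(samples, samples[1:])]


def interval_spread(samples):
    """Population standard deviation of the time gaps; 0 means perfectly even timing."""
    return pstdev(intervals(samples))


print("human gaps:", intervals(human_drag), "spread:", round(interval_spread(human_drag), 1))
print("bot gaps:  ", intervals(bot_drag), "spread:", interval_spread(bot_drag))
```

Perfectly even gaps along a straight line are the kind of pattern a behavioral check can plausibly flag, which is one reason this article hands the solving over to a service instead of scripting the drag.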

While this technology is characteristic of Geetest version 4 (GeeTest V4), its older sibling, Geetest v3, lacked an “invisible” mode and featured a simpler behavioral analysis. In essence, regardless of whether it’s version 3 or 4, Geetest CAPTCHAs are no walk in the park for automation enthusiasts—and they’re even tougher nuts to crack than reCAPTCHA (thankfully, Geetest hasn’t spread much in Europe).

The Nuances of Geetest CAPTCHA: What’s the Catch, and Why Is Bypassing GeeTest Not an Easy Task?

When dealing with reCAPTCHA, you simply need to locate specific static parameters on the page, send them off to a CAPTCHA solving service, and wait for the answer. The static nature of these parameters makes the process relatively straightforward. Sure, other factors might complicate things, but overall, fetching and transmitting those static values is as plain as day.

With Geetest CAPTCHA, however, things aren’t so black and white. It’s a blend of the reliable static and the ever-shifting dynamic parameters that need to be retrieved afresh every time the CAPTCHA changes.

Let’s break it down with examples from Geetest v3 and v4:

Geetest v3

Static Parameters Required for Proper Recognition:
- websiteURL: The URL of the page hosting the CAPTCHA.
- gt: A server-provided value.
Dynamic Parameter:
- challenge: Generated upon page load (this must be freshly retrieved every time, or else the CAPTCHA will be invalid).
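
Because challenge goes stale quickly, it is usually read from the response of the site's own initialization request right before the task is created. The sketch below assumes that response is JSON with gt and challenge keys; that shape is an assumption, not something every site guarantees, and the parse_v3_init helper is a name invented here. The sample body reuses the demo values that appear in the script later in this article.

```python
import json


def parse_v3_init(json_text):
    """Pull gt and challenge out of an init response; raise if either is missing."""
    data = json.loads(json_text)
    gt = data.get("gt")
    challenge = data.get("challenge")
    if not gt or not challenge:
        raise ValueError("init response lacks gt or challenge")
    return gt, challenge


# Made-up sample body; a real one would come from the site's own init request.
sample = (
    '{"gt": "f3bf6dbdcf7886856696502e1d55e00c", '
    '"challenge": "12345678abc90123d45678ef90123a456b", "success": 1}'
)
gt, challenge = parse_v3_init(sample)
print("gt =", gt)
print("challenge =", challenge)
```

Failing loudly when a key is missing is deliberate: sending an empty challenge to the solving service only wastes a task.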

Geetest v4

Instead of separate gt and challenge parameters, Geetest v4 bundles them into an initParameters object, which must include a captcha_id—the configuration identifier for the site’s CAPTCHA.

Technically, it all seems straightforward—but remember, these parameters aren’t hardcoded in the HTML; they’re generated only once you start interacting with the CAPTCHA. This means that besides finding the necessary parameters, you must also simulate user actions, which might tip off GeeTest’s detectors. Consequently, using proxies often becomes essential.
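
One place captcha_id tends to show up once the widget starts loading is the query string of the requests it makes. As a sketch, assuming you have already captured such a URL (the example.com address, the ID, and the captcha_id_from_url helper below are invented for illustration), the value can be read with the standard library:

```python
from urllib.parse import urlparse, parse_qs


def captcha_id_from_url(url):
    """Return the captcha_id query parameter of a URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("captcha_id")
    return values[0] if values else None


# Illustrative URL with a made-up ID; real requests carry more parameters.
load_url = "https://example.com/load?captcha_id=0123456789abcdef0123456789abcdef&lang=en"
print(captcha_id_from_url(load_url))
```

Parsing the query string avoids the edge cases of a hand-written pattern, at the cost of needing the full URL rather than raw page HTML.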

In short, every additional requirement adds another layer of complexity. I’ll be testing the bypass on a demo page provided by the service; it’s not overly complex there, and everything should work smoothly without proxies—but keep in mind, in real-world scenarios, you might need them.

Preparing to Implement the GeeTest CAPTCHA Solver

After our brief technical deep-dive, it’s time to get down to brass tacks: how do we bypass Geetest CAPTCHA?

What You’ll Need:

Python 3:
Head over to python.org, download the installer for your operating system, and follow the instructions (make sure to check the option to add Python to PATH).
pip Package Manager:
pip usually comes bundled with Python. To verify, open a command prompt (or terminal) and run:

pip --version

Required Python Libraries: requests and selenium:
These libraries are essential:

requests: For sending HTTP requests to the 2Captcha API.
selenium: For controlling Chrome and automating browser interactions.
Install them via:

pip install requests selenium

ChromeDriver:
ChromeDriver is a separate application that allows Selenium to drive Google Chrome.
Steps to Install:

Determine your Chrome version (go to “About Chrome” in your browser).
Download the corresponding version of ChromeDriver from the official site.
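Extract the archive and place the chromedriver executable in a folder that's in your system’s PATH, or specify its location in your Selenium settings. In Selenium 4 the path is passed through a Service object (the older executable_path argument has been removed in recent releases), e.g.,

from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('/path/to/chromedriver'), options=options)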

API Key for the Geetest CAPTCHA Solving Service:
You’ll need this key shortly—stay tuned for where it fits into the script.
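
The script keeps the key in an API_KEY constant. If you would rather not store it in the file, one option is to read it from an environment variable. The variable name TWOCAPTCHA_API_KEY and the load_api_key helper are my own choices for this sketch, not something the service prescribes.

```python
import os


def load_api_key(env_var="TWOCAPTCHA_API_KEY"):
    """Read the API key from the environment; fail early with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# In the script this would replace the hard-coded constant:
# API_KEY = load_api_key()
```

Failing at startup is cheaper than discovering a missing key only after the browser has loaded the page.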

Now, let’s take a look at the complete script. Following that, I’ll walk you through its functionality and what each part does.

#!/usr/bin/env python3
import re
import time
import json
import argparse
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Replace with your actual 2Captcha API key
API_KEY = "INSERT_YOUR_API_KEY"

# 2Captcha API endpoints
CREATE_TASK_URL = "https://api.2captcha.com/createTask"
GET_TASK_RESULT_URL = "https://api.2captcha.com/getTaskResult"

def extract_geetest_v3_params(html):
    """
    Attempt to extract parameters for GeeTest V3 (gt and challenge) from HTML.
    (Used if the parameters are available in the page source)
    """
    gt_match = re.search(r'["\']gt["\']\s*:\s*["\'](.*?)["\']', html)
    challenge_match = re.search(r'["\']challenge["\']\s*:\s*["\'](.*?)["\']', html)
    gt = gt_match.group(1) if gt_match else None
    challenge = challenge_match.group(1) if challenge_match else None
    return gt, challenge

def extract_geetest_v4_params(html):
    """
    Extracts captcha_id for GeeTest V4 from HTML.
    Looks for a string in the form: captcha_id=<32 hexadecimal characters>
    If extra characters are found after captcha_id, they are discarded.
    """
    match = re.search(r'captcha_id=([a-f0-9]{32})', html)
    if match:
        return match.group(1)
    # Fallback: take everything up to the next '&' or quote
    match = re.search(r'captcha_id=([^&"\']+)', html)
    if match:
        captcha_id_raw = match.group(1)
        captcha_id = captcha_id_raw.split("<")[0]
        return captcha_id.strip()
    return None

def get_geetest_v3_params_via_requests(website_url):
    """
    For the GeeTest V3 demo page, return static parameters as specified in the examples
    (PHP, Java, Python). This prevents errors where split() might return the entire HTML.
    Note: no HTTP request is made here; on a real site, challenge must be fetched fresh.
    """
    gt = "f3bf6dbdcf7886856696502e1d55e00c"
    challenge = "12345678abc90123d45678ef90123a456b"
    return gt, challenge

def auto_extract_params(website_url):
    """
    If the URL contains "geetest-v4", work with V4 (using Selenium to extract captcha_id).
    If the URL contains "geetest" (without -v4), assume it is GeeTest V3 and use the hard-coded demo parameters.
    Returns a tuple: (driver, version, gt, challenge_or_captcha_id)
    """
    if "geetest-v4" in website_url:
        options = Options()
        options.add_argument("--disable-gpu")
        options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=options)
        driver.get(website_url)
        time.sleep(3)
        try:
            wait = WebDriverWait(driver, 10)
            element = wait.until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "#embed-captcha .gee-test__placeholder"))
            )
            # Click via JavaScript to start loading the widget
            driver.execute_script("arguments[0].click();", element)
            time.sleep(5)
        except Exception as e:
            print("Error loading V4 widget:", e)
        html = driver.page_source
        captcha_id = extract_geetest_v4_params(html)
        return driver, "4", None, captcha_id
    elif "geetest" in website_url:
        # For the GeeTest V3 demo page, use static parameters
        gt, challenge = get_geetest_v3_params_via_requests(website_url)
        options = Options()
        options.add_argument("--disable-gpu")
        options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=options)
        driver.get(website_url)
        return driver, "3", gt, challenge
    else:
        return None, None, None, None

def create_geetest_v3_task(website_url, gt, challenge, proxyless=True, proxy_details=None):
    """
    Create a task for GeeTest V3 using the 2Captcha API.
    Required parameters: websiteURL, gt, challenge.
    """
    task_type = "GeeTestTaskProxyless" if proxyless else "GeeTestTask"
    task = {
        "type": task_type,
        "websiteURL": website_url,
        "gt": gt,
        "challenge": challenge
    }
    if not proxyless and proxy_details:
        task.update(proxy_details)
    payload = {
        "clientKey": API_KEY,
        "task": task
    }
    # A timeout keeps the script from hanging if the API does not respond
    response = requests.post(CREATE_TASK_URL, json=payload, timeout=30)
    return response.json()

def create_geetest_v4_task(website_url, captcha_id, proxyless=True, proxy_details=None):
    """
    Create a task for GeeTest V4 using the 2Captcha API.
    Required parameters: websiteURL, version (4) and initParameters with captcha_id.
    """
    task_type = "GeeTestTaskProxyless" if proxyless else "GeeTestTask"
    task = {
        "type": task_type,
        "websiteURL": website_url,
        "version": 4,
        "initParameters": {
            "captcha_id": captcha_id
        }
    }
    if not proxyless and proxy_details:
        task.update(proxy_details)
    payload = {
        "clientKey": API_KEY,
        "task": task
    }
    response = requests.post(CREATE_TASK_URL, json=payload, timeout=30)
    return response.json()

def get_task_result(task_id, retry_interval=5, max_retries=20):
    """
    Poll the 2Captcha API until a result is obtained.
    """
    payload = {
        "clientKey": API_KEY,
        "taskId": task_id
    }
    for i in range(max_retries):
        response = requests.post(GET_TASK_RESULT_URL, json=payload, timeout=30)
        result = response.json()
        if result.get("status") == "processing":
            print(f"Captcha not solved yet, waiting... {i+1}")
            time.sleep(retry_interval)
        else:
            return result
    return {"errorId": 1, "errorDescription": "Timeout waiting for solution."}

def main():
    parser = argparse.ArgumentParser(
        description="Solve GeeTest CAPTCHA using 2Captcha API with automatic parameter extraction"
    )
    parser.add_argument("--website-url", required=True, help="URL of the page with the captcha")
    # Optional parameters for using a proxy
    parser.add_argument("--proxy-type", help="Proxy type (http, socks4, socks5)")
    parser.add_argument("--proxy-address", help="Proxy server IP address")
    parser.add_argument("--proxy-port", type=int, help="Proxy server port")
    parser.add_argument("--proxy-login", help="Proxy login (if required)")
    parser.add_argument("--proxy-password", help="Proxy password (if required)")
    args = parser.parse_args()

    proxyless = True
    proxy_details = {}
    if args.proxy_type and args.proxy_address and args.proxy_port:
        proxyless = False
        proxy_details = {
            "proxyType": args.proxy_type,
            "proxyAddress": args.proxy_address,
            "proxyPort": args.proxy_port
        }
        if args.proxy_login:
            proxy_details["proxyLogin"] = args.proxy_login
        if args.proxy_password:
            proxy_details["proxyPassword"] = args.proxy_password

    print("Loading page:", args.website_url)
    driver, version, gt, challenge_or_captcha_id = auto_extract_params(args.website_url)
    if driver is None or version is None:
        print("Failed to load page or extract parameters.")
        return

    print("Detected GeeTest version:", version)
    if version == "3":
        if not gt or not challenge_or_captcha_id:
            print("Failed to extract gt and challenge parameters for GeeTest V3.")
            driver.quit()
            return
        print("Using parameters for GeeTest V3:")
        print("gt =", gt)
        print("challenge =", challenge_or_captcha_id)
        create_response = create_geetest_v3_task(
            website_url=args.website_url,
            gt=gt,
            challenge=challenge_or_captcha_id,
            proxyless=proxyless,
            proxy_details=proxy_details
        )
    elif version == "4":
        captcha_id = challenge_or_captcha_id
        if not captcha_id:
            print("Failed to extract captcha_id for GeeTest V4.")
            driver.quit()
            return
        print("Using captcha_id for GeeTest V4:", captcha_id)
        create_response = create_geetest_v4_task(
            website_url=args.website_url,
            captcha_id=captcha_id,
            proxyless=proxyless,
            proxy_details=proxy_details
        )
    else:
        print("Unknown version:", version)
        driver.quit()
        return

    if create_response.get("errorId") != 0:
        print("Error creating task:", create_response.get("errorDescription"))
        driver.quit()
        return

    task_id = create_response.get("taskId")
    print("Task created. Task ID:", task_id)
    print("Waiting for captcha solution...")
    result = get_task_result(task_id)
    if result.get("errorId") != 0:
        print("Error retrieving result:", result.get("errorDescription"))
        driver.quit()
        return

    solution = result.get("solution")
    print("Captcha solved. Received solution:")
    print(json.dumps(solution, indent=4))

    # Inject the received data into the page
    if version == "3":
        # For GeeTest V3, expected fields: challenge, validate, seccode
        js_script = """
        function setOrUpdateInput(id, value) {
            var input = document.getElementById(id);
            if (!input) {
                input = document.createElement('input');
                input.type = 'hidden';
                input.id = id;
                input.name = id;
                document.getElementById('geetest-demo-form').appendChild(input);
            }
            input.value = value;
        }
        setOrUpdateInput('geetest_challenge', arguments[0]);
        setOrUpdateInput('geetest_validate', arguments[1]);
        setOrUpdateInput('geetest_seccode', arguments[2]);
        document.querySelector('#embed-captcha').innerHTML =
            '<div style="padding:20px; background-color:#e0ffe0; border:2px solid #00a100; font-size:18px; color:#007000; text-align:center;">' +
            'Captcha successfully solved!<br>' +
            'challenge: ' + arguments[0] + '<br>' +
            'validate: ' + arguments[1] + '<br>' +
            'seccode: ' + arguments[2] +
            '</div>';
        """
        challenge_sol = solution.get("challenge")
        validate_sol = solution.get("validate")
        seccode_sol = solution.get("seccode")
        driver.execute_script(js_script, challenge_sol, validate_sol, seccode_sol)
    elif version == "4":
        # For GeeTest V4, only a success banner is shown; no fields are injected
        js_script = """
        document.querySelector('#embed-captcha').innerHTML =
            '<div style="padding:20px; background-color:#e0ffe0; border:2px solid #00a100; font-size:18px; color:#007000; text-align:center;">GeeTest V4 captcha successfully solved!</div>';
        """
        driver.execute_script(js_script)

    print("Solution injected into page. The browser will remain open for 30 seconds for visual verification.")
    time.sleep(30)
    driver.quit()

if __name__ == "__main__":
    main()

Geetest Solver Code Overview

What Does the GeeTest Solver Script Do?

Importing Libraries & Defining Constants:
- Module Imports:
  The script pulls in standard Python modules—re for regex magic, time for delays, json for data formatting, and argparse for command-line arguments. It also uses requests for HTTP communication and Selenium for automating Chrome.
- Constants:
  You’ll find an API key for 2Captcha (API_KEY), and URLs for creating tasks (CREATE_TASK_URL) and fetching results (GET_TASK_RESULT_URL). These constants are pivotal for interacting with the 2Captcha service.
Functions to Extract CAPTCHA Parameters:
- extract_geetest_v3_params(html):
  Scans the HTML for the gt and challenge values needed for GeeTest V3 using regex. If found, it returns these values.
- extract_geetest_v4_params(html):
  Digs through the HTML to extract the captcha_id for GeeTest V4, first attempting to locate a 32-character hexadecimal string, and then resorting to an alternate pattern if needed.
- auto_extract_params:
  This function grabs the URL and determines which version of GeeTest to use:
  - GeeTest V4:
    If the URL contains “geetest-v4”, it:
    - Launches Chrome with GPU disabled and sandbox mode off.
    - Loads the page.
    - Waits for the .gee-test__placeholder element inside #embed-captcha to appear.
    - Clicks the element to kickstart the CAPTCHA load.
    - Waits a few seconds for the widget to appear.
    - Retrieves the page’s HTML and extracts captcha_id via extract_geetest_v4_params.
    - Returns the Selenium driver, version “4”, a placeholder for gt (None), and the extracted captcha_id.
  - GeeTest V3:
    If the URL contains “geetest” (but not “geetest-v4”), it:
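    - Takes the gt and challenge values from get_geetest_v3_params_via_requests. Despite its name, this function makes no HTTP request: it returns values hard-coded for the demo page. On a real site, challenge has to be fetched fresh on every page load.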
    - Initializes Chrome with similar settings.
    - Loads the page.
    - Returns the driver, version “3”, and the values for gt and challenge.
  - If neither condition is met, it returns None for all values.
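
To see how the two patterns from extract_geetest_v4_params behave, here is a small self-contained check against invented snippets. The function name find_captcha_id and the sample strings are made up for the demonstration; the regular expressions are the ones used in the script.

```python
import re


def find_captcha_id(html):
    """Strict 32-hex match first, then a looser fallback, as in the script."""
    strict = re.search(r'captcha_id=([a-f0-9]{32})', html)
    if strict:
        return strict.group(1)
    loose = re.search(r'captcha_id=([^&"\']+)', html)
    if loose:
        return loose.group(1).split("<")[0].strip()
    return None


# Invented page fragments: a clean ID, a non-hex ID, and no ID at all.
samples = [
    '<script src="https://example.com/load?captcha_id=0123456789abcdef0123456789abcdef&x=1">',
    '<a href="/verify?captcha_id=DEMO-ID-42">retry</a>',
    "<p>no identifier here</p>",
]
for html in samples:
    print(find_captcha_id(html))
```

The fallback is what keeps the function useful when an ID is not a lowercase 32-character hex string, at the price of accepting looser input.
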
Creating Tasks to Solve CAPTCHA via 2Captcha API:
- create_geetest_v3_task(website_url, gt, challenge, proxyless=True, proxy_details=None):
  Forms a JSON package with the task type—“GeeTestTaskProxyless” if not using a proxy, or “GeeTestTask” otherwise. The package includes the page URL, gt, and challenge. Optionally, proxy details can be added. It then sends a POST request to the 2captcha API and returns the JSON response.
- create_geetest_v4_task(website_url, captcha_id, proxyless=True, proxy_details=None):
  Works similarly for GeeTest V4 by specifying version 4 and including an initParameters dictionary containing the captcha_id.
- get_task_result:
  This function periodically posts to the 2captcha API (using GET_TASK_RESULT_URL), passing in the task ID and API key. If the status returned is “processing”, it prints a status message and waits (5 seconds by default) before retrying, for up to 20 attempts. Once a final result is returned, or the attempts run out, it passes the outcome back.
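
The polling loop in get_task_result can be tried out without spending API credit by swapping the HTTP call for a stand-in. The sketch below does that; poll_until_ready, the fake replies, and the "ready" status used for the finished task are assumptions made for the demonstration, so check the service's documentation for the exact response shape.

```python
import time


def poll_until_ready(fetch, retry_interval=5, max_retries=20, sleep=time.sleep):
    """Call fetch() until it stops reporting "processing" or retries run out."""
    for _ in range(max_retries):
        result = fetch()
        if result.get("status") != "processing":
            return result
        sleep(retry_interval)
    return {"errorId": 1, "errorDescription": "Timeout waiting for solution."}


# Fake API: two "processing" replies, then a finished task.
replies = iter([
    {"errorId": 0, "status": "processing"},
    {"errorId": 0, "status": "processing"},
    {"errorId": 0, "status": "ready", "solution": {"validate": "demo"}},
])
# The no-op sleep keeps the demonstration instant.
outcome = poll_until_ready(lambda: next(replies), sleep=lambda seconds: None)
print(outcome["status"])
```

Passing the fetch and sleep functions in as arguments is only a testing convenience; the logic is the same retry loop the script uses.
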
The Main Function:
- Command-Line Argument Parsing:
  Uses argparse to capture the required --website-url (the page hosting the CAPTCHA) and optional proxy parameters (--proxy-type, --proxy-address, --proxy-port, --proxy-login, --proxy-password).
- Proxy Setup:
  If proxy parameters are supplied, the script sets proxyless to False and builds a proxy_details dictionary. Without proxies, proxyless remains True.
- Extracting CAPTCHA Parameters:
  It prints a message about loading the page, then calls auto_extract_params to retrieve:
  - The Selenium driver (to control the browser).
  - The CAPTCHA version (“3” or “4”).
  - For V3: the gt and challenge values; for V4: the captcha_id. If these aren’t successfully obtained, an error is reported and the script halts.
- Creating a CAPTCHA Solving Task:
  Depending on the version:
  - For GeeTest V3:
    It verifies that gt and challenge are present, prints them, and calls create_geetest_v3_task.
  - For GeeTest V4:
    It checks for captcha_id, prints its value, and calls create_geetest_v4_task. If the version isn’t recognized, the script exits.
- Handling the 2Captcha Response:
  Should the API return an error (checked via errorId), the script prints an error message, closes the browser, and terminates. If successful, it prints the task ID and starts polling for the CAPTCHA solution using get_task_result.
- Fetching and Displaying the CAPTCHA Solution:
  Once the solution is received, it is printed as a nicely formatted JSON.
- Injecting the Solution into the Page via JavaScript:
  Using driver.execute_script, the script:
  - For GeeTest V3:
    Creates or updates hidden form fields (geetest_challenge, geetest_validate, and geetest_seccode) with the solution values, and updates the content of the #embed-captcha element to show a success message along with the parameters.
  - For GeeTest V4:
    Simply replaces the content of #embed-captcha with a success notification.
- Delay and Cleanup:
  After injecting the solution, the script waits 30 seconds (giving you time to visually verify the result) before closing the browser.
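
main() passes the V3 solution fields straight to the injection script, so a field missing from the response would be written into the form as an empty or null value. A small guard like the following could run before driver.execute_script. The helper name missing_v3_fields is hypothetical, while the three field names come from the script above.

```python
REQUIRED_V3_FIELDS = ("challenge", "validate", "seccode")


def missing_v3_fields(solution):
    """Return the names of required V3 fields that are absent or empty."""
    solution = solution or {}
    return [name for name in REQUIRED_V3_FIELDS if not solution.get(name)]


# Invented solutions: one complete, one with two fields missing.
complete = {"challenge": "abc", "validate": "def", "seccode": "def|jordan"}
partial = {"challenge": "abc"}
print(missing_v3_fields(complete))
print(missing_v3_fields(partial))
```

An empty list means it is safe to inject; anything else is worth printing and stopping on rather than submitting a half-filled form.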

Soon, I plan to test SolveCaptcha too, just to see how well it handles recognition. As written, the script lets you watch the CAPTCHA being solved in real time: the browser stays open and shows the injected result. I’ve even recorded a screen capture to demonstrate its performance.

Conclusion

In this article, I’ve taken a deep dive into the inner workings of Geetest CAPTCHA, showing that—even with modest programming skills (is Python really “programming”?)—it’s possible to bypass this robust security mechanism. However, be prepared to pay close attention to every necessary parameter; you might end up wrestling with an ever-changing Challenge for hours, just as I did.
