Solving the Amazon WAF Captcha fully automatically with almost zero development experience

Original author: Alex Gerasimchuk

A small epigraph: if you write instructions, write them all the way through; otherwise, instructions on how to solve the Amazon captcha will be as clear as mud to a junior developer.

What's it all about? When I needed to solve a captcha from Amazon, the notorious WAF Captcha, I started looking for information from a service that I constantly use when I work with GSA Ranker and some other tools (2captcha - it's a pity Habr bans articles for referral links).

I found instructions there and linked to them above. As you probably understood from the epigraph, I didn't understand a thing, or rather, I understood that I needed to use the API, but that's all...

It was much easier with Selenium

The main issue is the short timeout Amazon gives for a solution. The time to solve the captcha is limited, and if there's no response in time, the captcha refreshes (two of its parameters, iv and context, get updated).

It turns out the captcha freshness timeout is about 30 seconds, and in that time, you need to find the parameters on the page, copy them, paste them into the script code, and run it. After that, 2captcha should solve it and return the correct answer. I tried to do this for a couple of fruitless hours, developing a routine of actions, but alas, searching for and replacing the changing parameters takes at least 12-15 seconds, leaving only 15 to 18 seconds for the captcha to be solved by the service, which in current realities sounds quite fantastical.
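The time budget above is plain arithmetic, and a tiny sketch makes it explicit. The 30-second window and the 12-15 seconds of manual work are the figures from this article; the variable names are only illustrative.

```javascript
// Figures taken from the text above; names are illustrative only.
const captchaLifetimeSec = 30;       // approximate freshness window of the captcha
const manualOverheadSec = [12, 15];  // time to find, copy and paste iv/context by hand

// Whatever remains is all the solving service gets.
const remainingSec = manualOverheadSec.map((t) => captchaLifetimeSec - t);

console.log(remainingSec); // [ 18, 15 ]
```

In other words, every second spent copying parameters by hand is taken directly from the time the service has to answer.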

We need a different approach here: the script itself should find and insert the parameters. But how can someone who's never dealt with anything more complex than Ahrefs in their life write it? That's exactly why I think instructions like the ones mentioned above need to be more detailed than just "Just use the API, are you completely nuts?"

Solution

In the end, a solution was found, and it took me about 3 hours. Let me tell you how a junior developer can solve the Amazon captcha fully automatically.

You'll need ChatGPT and, in my case, a video of the Amazon captcha being solved (you won't need this, as I'll give you the ready-made files).

I got the video from a friend who is a programmer, but he did not allow me to make it public, as it wasn't anonymized.

However, you'll keep nagging me for proof, and then I'll end up on antidepressants because no one believes me, so I reproduced this video at the very end, once I had the final script, and will happily attach it at the end of this text as a demonstration of my endless dedication to the audience!

So, let’s do it step by step.

I took the video, made 3 screenshots, uploaded them to ChatGPT, and asked it to transcribe the code in them into text for me.

The video contained several files but showed only the contents of 2 of them - index.js and inject.js - which is where I started.

I won't bore you with the details of deciphering the screenshots (it was a bit of a hassle to assemble the code from the screenshots into one piece), but in the end I got these two pieces of code for the two files:

// index.js
import { launch } from 'puppeteer'
import { Captcha } from '2captcha-ts'
import { readFileSync } from 'fs'

const solver = new Captcha(process.env.APIKEY)

const target = 'URL of website where you faced CAPTCHA'

const example = async () => {
  const browser = await launch({
    headless: false,
    devtools: true
  })

  const [page] = await browser.pages()

  const preloadFile = readFileSync('./inject.js', 'utf8')
  await page.evaluateOnNewDocument(preloadFile)

  // Here we intercept the console messages to catch the message logged by inject.js script
  page.on('console', async (msg) => {
    const txt = msg.text()
    if (txt.includes('intercepted-params:')) {
      const params = JSON.parse(txt.replace('intercepted-params:', ''))

      const wafParams = {
        pageurl: target,
        sitekey: params.key,
        iv: params.iv,
        context: params.context,
        challenge_script: params.challenge_script,
        captcha_script: params.captcha_script
      }
      console.log(wafParams)

      try {
        console.log('Solving the captcha...')
        const res = await solver.solveRecaptchaV2(wafParams)
        console.log(`Solved the captcha ${res.id}`)
        console.log(res)
        console.log('Using the token...')
        await page.evaluate(token => {
          window.localStorage.inputCaptchaToken(token)
        }, res.data.captcha.voucher)
        console.log(e)
      } catch (e) {
        console.log(e)
      }
    }
  })

  await page.goto(target)
  // Additional code to interact with the page after captcha is solved might be here...
}

example()

And the second file, inject.js

console.clear = () => console.log('Console was cleared')

const i = setInterval(() => {
  if (window.CaptchaScript) {
    clearInterval(i)

    let params = gokProps

    Array.from(document.querySelectorAll('script')).forEach(s => {
      const src = s.getAttribute('src')
      if (src && src.includes('captcha.js')) params.captcha_script = src
      if (src && src.includes('challenge.js')) params.challenge_script = src
    })

    console.log('intercepted-params: ' + JSON.stringify(params))
  }
}, 5)
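For readers following along, the part of inject.js that looks for the two script URLs can be isolated as a small pure function that runs in plain Node.js. This is only a sketch: the function name and the sample URLs are made up for illustration, and the real script reads the src attributes from the page instead of an array.

```javascript
// Pure version of the loop in inject.js: given the src attributes found on a page,
// pick out the two script URLs that are later passed to the solving service.
// The sample URLs below are invented for illustration.
function pickWafScripts(srcList) {
  const found = {};
  for (const src of srcList) {
    if (!src) continue; // inline <script> tags have no src attribute
    if (src.includes('captcha.js')) found.captcha_script = src;
    if (src.includes('challenge.js')) found.challenge_script = src;
  }
  return found;
}

const sample = [
  null,
  'https://example.invalid/abc/challenge.js',
  'https://example.invalid/abc/captcha.js',
  'https://example.invalid/static/app.js',
];

console.log(pickWafScripts(sample));
```

Scripts that match neither name are simply ignored, which is why the unrelated app.js entry does not show up in the result.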

I asked the Chat how to make the code work, and I was advised to use the standard command.

node index.js

But I've been around these internet streets for a long time, and I understand that the code won't just work like that if the necessary libraries aren't installed on the computer. My neuro-consultant helped me out here too.

After examining the code I had given it in the form of screenshots, it recommended installing the following packages:

puppeteer

2captcha-ts

Honestly, I didn't quite understand why playwright was needed here, but who am I to doubt the competence of the Chat.

As such, the installation command is npm install puppeteer 2captcha-ts playwright (though I couldn't install them all together from VS Code, probably because of my clumsy hands, so I installed them one by one from the console).

npm install puppeteer

npm install 2captcha-ts

Next is where it gets interesting, because everything up to this point I know one way or another, but the next block is a real mystery to me, and I just do what my neuro-consultant tells me.

So, for the code to work correctly, a file called .env was needed (nothing before the dot, just the extension). It should contain the following data:

APIKEY=your_2captcha_api_key

It is understood that instead of the "your_2captcha_api_key" parameter, I inserted my own key from the service.

There were 6 more recommendations on what to do to make the code work, but...

So, the first run and the first error occurred.

The error was related to using ES6 import syntax in Node.js. To use it, you either need to specify the module type in package.json or change the file extension to .mjs.

I was too lazy to specify the module type in package.json, so I took the path of least resistance and simply renamed the file extension from .js to .mjs.
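The rule behind that error can be written down as a small function. This is a simplified sketch of the classic behaviour, not Node's actual resolution code: the function name is mine, pkgType stands for the "type" field of the nearest package.json, and newer Node.js releases may additionally try to detect ES module syntax on their own.

```javascript
// Simplified sketch of how Node.js decides whether a file is an ES module
// (and may therefore use `import`). pkgType is the "type" field of the
// nearest package.json and may be undefined.
function isEsModule(filename, pkgType) {
  if (filename.endsWith('.mjs')) return true;  // always an ES module
  if (filename.endsWith('.cjs')) return false; // always CommonJS
  return pkgType === 'module';                 // plain .js follows package.json
}

console.log(isEsModule('index.js'));           // false
console.log(isEsModule('index.mjs'));          // true
console.log(isEsModule('index.js', 'module')); // true
```

Renaming the file changes the first argument; adding "type": "module" to package.json changes the second. Either one gets the import statements accepted.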

The next error was related to the imported package 2captcha-ts, but I don't want to go into detail about it. I'll just say that I changed the code responsible for running this module several times, added logging, and checked the correctness of connecting the .env file.

Still, it didn't work. The script kept throwing errors and stubbornly refused to run. I compared the screenshot and the code extracted from it by the Chat and found a small typo, which turned out to be the root of the problem.

As it turned out, even that didn't help: an error occurred saying "APIKEY is not defined in .env file".

We fixed this by adding a new package:

dotenv

npm install dotenv
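Why a package is needed at all: by default Node.js does not read a .env file on its own, so process.env.APIKEY stays empty until something loads the file. The sketch below shows the general idea of such a loader. It is a deliberately simplified illustration with a hypothetical parseEnv function, not dotenv's actual implementation, and it ignores quoting, multiline values and other details.

```javascript
// Simplified illustration of what a .env loader does. This is NOT dotenv's
// actual implementation; it ignores quoting, multiline values and other details.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith('#')) continue; // skip blank lines and comments
    const eq = line.indexOf('=');
    if (eq === -1) continue;                     // not a KEY=VALUE line
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

const parsed = parseEnv('# 2captcha credentials\nAPIKEY=your_2captcha_api_key\n');
console.log(parsed.APIKEY); // your_2captcha_api_key
```

A real loader then copies these values into process.env, which is where the script looks for the key.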

After that, I encountered a couple more typos, which the Chat tried to solve by changing the code structure, but I just needed to carefully compare the screenshot and the code itself.

Here is the index.js file as it originally looked:

// index.js
import { launch } from 'puppeteer'
import { Captcha } from '2captcha-ts'
import { readFileSync } from 'fs'

const solver = new Captcha(process.env.APIKEY)

const target = 'URL of website where you faced CAPTCHA'

const example = async () => {
  const browser = await launch({
    headless: false,
    devtools: true
  })

  const [page] = await browser.pages()

  const preloadFile = readFileSync('./inject.js', 'utf8')
  await page.evaluateOnNewDocument(preloadFile)

  // Here we intercept the console messages to catch the message logged by inject.js script
  page.on('console', async (msg) => {
    const txt = msg.text()
    if (txt.includes('intercepted-params:')) {
      const params = JSON.parse(txt.replace('intercepted-params:', ''))

      const wafParams = {
        pageurl: target,
        sitekey: params.key,
        iv: params.iv,
        context: params.context,
        challenge_script: params.challenge_script,
        captcha_script: params.captcha_script
      }
      console.log(wafParams)

      try {
        console.log('Solving the captcha...')
        const res = await solver.solveRecaptchaV2(wafParams)
        console.log(`Solved the captcha ${res.id}`)
        console.log(res)
        console.log('Using the token...')
        await page.evaluate(token => {
          window.localStorage.inputCaptchaToken(token)
        }, res.data.captcha.voucher)
        console.log(e)
      } catch (e) {
        console.log(e)
      }
    }
  })

  await page.goto(target)
  // Additional code to interact with the page after captcha is solved might be here...
}

example()

And here is what it turned into, index.mjs:

// index.mjs
import 'dotenv/config';
import { launch } from 'puppeteer'
import Captcha from '2captcha-ts';
import { readFileSync } from 'fs'

if (!process.env.APIKEY) {
    console.error("APIKEY is not defined in .env file");
    process.exit(1); // Terminate execution if key not found
}

const solver = new Captcha.Solver(process.env.APIKEY);

const target = 'URL of website where you faced CAPTCHA'

const example = async () => {
    const browser = await launch({
        headless: false,
        devtools: true
    })

    const [page] = await browser.pages()

    const preloadFile = readFileSync('./inject.js', 'utf8')
    await page.evaluateOnNewDocument(preloadFile)

    // Here we intercept the console messages to catch the message logged by inject.js script
    page.on('console', async (msg) => {
        const txt = msg.text()
        if (txt.includes('intercepted-params:')) {
            const params = JSON.parse(txt.replace('intercepted-params:', ''))

            const wafParams = {
                pageurl: target,
                sitekey: params.key,
                iv: params.iv,
                context: params.context,
                challenge_script: params.challenge_script,
                captcha_script: params.captcha_script
            }
            console.log(wafParams)

            try {
                console.log('Solving the captcha...')
                const res = await solver.amazonWaf(wafParams)
                console.log(`Solved the captcha ${res.id}`)
                console.log(res)
                console.log('Using the token...')
                await page.evaluate(async (token) => {
                    await ChallengeScript.submitCaptcha(token);
                    window.location.reload()
                }, res.data.captcha_voucher);
            } catch (e) {
                console.log(e)
            }
        } else {
            return
        }
    })

    await page.goto(target)
    // Additional code to interact with the page after captcha is solved might be here...
}

example()

In the code above, don't forget to substitute the correct URL where the solution is needed (if you decide to apply it).
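The link between the two files is a plain console message: inject.js logs a line starting with intercepted-params:, and index.mjs turns it into the parameter object for the solver. That conversion can be tried on its own, without a browser. The sketch below is an illustration only: toWafParams is a hypothetical helper, the values are made up, and it checks the prefix with startsWith rather than includes so that unrelated log lines that merely mention the prefix are ignored.

```javascript
const PREFIX = 'intercepted-params:';

// Returns the parameter object for the solver, or null if the console
// message is not the one logged by inject.js.
function toWafParams(consoleText, pageurl) {
  if (!consoleText.startsWith(PREFIX)) return null;
  const params = JSON.parse(consoleText.slice(PREFIX.length));
  return {
    pageurl,
    sitekey: params.key,
    iv: params.iv,
    context: params.context,
    challenge_script: params.challenge_script,
    captcha_script: params.captcha_script,
  };
}

// Made-up values, only to show the shape of the data.
const msg = PREFIX + ' ' + JSON.stringify({ key: 'KEY', iv: 'IV', context: 'CTX' });

console.log(toWafParams(msg, 'https://example.invalid/'));
console.log(toWafParams('Attempt 3: Checking for CaptchaScript...', 'https://example.invalid/')); // null
```

Fields that the page did not provide simply come out as undefined, which makes a missing challenge_script or captcha_script easy to spot in the log.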

From this inject.js:

console.clear = () => console.log('Console was cleared')

const i = setInterval(() => {
  if (window.CaptchaScript) {
    clearInterval(i)

    let params = gokProps

    Array.from(document.querySelectorAll('script')).forEach(s => {
      const src = s.getAttribute('src')
      if (src && src.includes('captcha.js')) params.captcha_script = src
      if (src && src.includes('challenge.js')) params.challenge_script = src
    })

    console.log('intercepted-params: ' + JSON.stringify(params))
  }
}, 5)

Into this inject.js:

// console.clear = () => console.log('Console was cleared')

let counter = 0;
const MAX_ATTEMPTS = 2000; // Approximately 10 seconds at 5 ms intervals
console.log("Let's start searching for CaptchaScript...");

const i = setInterval(() => {
    console.log(`Attempt ${counter}: Checking for CaptchaScript...`);

    if (window.CaptchaScript || counter > MAX_ATTEMPTS) {
        clearInterval(i);
        if (!window.CaptchaScript) {
            console.log('CaptchaScript not found');
        } else {
            console.log('CaptchaScript found');

            let params = gokuProps;
            Array.from(document.querySelectorAll('script')).forEach(s => {
                const src = s.getAttribute('src');
                if (src && src.includes('captcha.js')) params.captcha_script = src;
                if (src && src.includes('challenge.js')) params.challenge_script = src;
            });

            console.log('intercepted-params: ' + JSON.stringify(params));
        }
    }
    counter++;
}, 5);
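The pattern in the final inject.js is "check every few milliseconds, give up after a fixed number of attempts". A generic version of that pattern might look like the sketch below. Both attemptsFor and pollUntil are hypothetical helpers written for this illustration, not part of the script above; the 10-second budget and the 5 ms interval are the ones from the MAX_ATTEMPTS comment.

```javascript
// How many checks fit into a time budget at a given polling interval.
function attemptsFor(timeoutMs, intervalMs) {
  return Math.ceil(timeoutMs / intervalMs);
}

// Generic polling helper (hypothetical, not part of the article's script):
// resolves with the first truthy value returned by check(), or rejects
// once the attempt budget is used up.
function pollUntil(check, intervalMs, timeoutMs) {
  const maxAttempts = attemptsFor(timeoutMs, intervalMs);
  let attempt = 0;
  return new Promise((resolve, reject) => {
    const timer = setInterval(() => {
      const value = check();
      attempt += 1;
      if (value) {
        clearInterval(timer);
        resolve(value);
      } else if (attempt >= maxAttempts) {
        clearInterval(timer);
        reject(new Error(`gave up after ${attempt} attempts`));
      }
    }, intervalMs);
  });
}

console.log(attemptsFor(10000, 5)); // 2000

// Usage: the condition becomes true on the third check.
let calls = 0;
pollUntil(() => (++calls >= 3 ? 'ready' : null), 5, 1000)
  .then((value) => console.log(value, 'after', calls, 'checks'))
  .catch((err) => console.error(err.message));
```

Timers are not perfectly precise, so 2000 attempts at 5 ms is only approximately 10 seconds, which matches the wording of the comment in the script.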

Finally, the script worked. In full. It solved the captcha and continues to solve it every time it runs.

Looks like the future has arrived, buddy. But this is not certain.

What to do with this information? Well, darn it, I really like the advice - just live with it now. But I don't feel like giving advice. Have it your way, maybe someone will find it useful for their projects...
