Immoscout: Bot detection/No captcha necessary #302

@phi1eas

Hi,

I am trying to run flathunter against immobilienscout24 with imagetyperz configured as the captcha solver, and I run into the following issue:

$ pipenv run python3 flathunt.py
[2023/01/25 21:04:20|config.py               |INFO    ]: Using config path /home/max/flathunter/config.yaml
[2023/01/25 21:04:20|chrome_wrapper.py       |INFO    ]: Initializing Chrome WebDriver for crawler...
[2023/01/25 21:04:21|patcher.py              |INFO    ]: patching driver executable /home/max/.local/share/undetected_chromedriver/9418e1b60bf980e1_chromedriver
[2023/01/25 21:04:33|abstract_crawler.py     |INFO    ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/25 21:04:33|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/25 21:04:33|crawl_immobilienscout.py|ERROR   ]: IS24 bot detection has identified our script as a bot - we've been blocked

What strikes me as odd is this: if I do not pass "--headless" as a driver_argument, a Chromium window opens with the immoscout bot detection page loaded. If I copy the URL from that window and open it in a new Chromium tab, I get the same page, but this time with the captcha.

Is this because immoscout24 has classified me as a bot, or is something else going on?

This is my config.yaml:

loop:
    active: yes
    sleeping_time: 600

urls:
  - https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?enteredFrom=one_step_search

filters:

blacklist:
  - Innenstadt

durations:
    - name: John
      destination: Hauptbahnhof, München
      modes: 
          - gm_id: transit
            title: "Öff."
          - gm_id: bicycling
            title: "Rad"
    - name: Jane
      destination: Karlsplatz, München
      modes: 
          - gm_id: transit
            title: "Öff."
          - gm_id: driving
            title: "Auto"

message: |
    {title}
    Zimmer: {rooms}
    Größe: {size}
    Preis: {price}
    Ort: {address}

    {url}

google_maps_api:
    key: YOUR_API_KEY
    url: https://maps.googleapis.com/maps/api/distancematrix/json?origins={origin}&destinations={dest}&mode={mode}&sensor=true&key={key}&arrival_time={arrival}
    enable: False

captcha:
     imagetyperz:
           token: 4B59D2B4CC6B4DE0AFC09D310F77D8CE
#       2captcha:
#             api_key: alskdjaskldjfklj
     driver_arguments:
       - "--no-sandbox"
       - "--disable-gpu"
       - "--remote-debugging-port=9222"
       - "--disable-dev-shm-usage"
       - "window-size=1024,768"

notifiers:
    - telegram
#     - mattermost
#     - apprise

telegram:
  bot_token: (censored)
  notify_with_images: true
  receiver_ids:
      - (censored)
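
For comparison, the headless runs mentioned above would differ only in the driver_arguments list. This is a sketch of that variant, assuming "--headless" is simply added as one more entry under captcha and everything else in the config stays the same:

```yaml
captcha:
     driver_arguments:
       - "--headless"   # assumed extra entry; without it a visible Chromium window opens
       - "--no-sandbox"
       - "--disable-gpu"
       - "--remote-debugging-port=9222"
       - "--disable-dev-shm-usage"
       - "window-size=1024,768"
```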

Thank you so much!
