truanjsimran/dm-product-scraper

DM Product Scraper

DM Product Scraper collects structured product data from DM-drogeriemarkt sites across all supported countries. It extracts clean, well-organized product details including GTIN, brand, pricing, availability, and ratings. It is built for developers, analysts, and automation workflows that need accurate retail product data at scale.

Bitbash Banner

Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for a DM Product Scraper, you've just found your team. Let's chat. 👆👆

Introduction

The DM Product Scraper helps you gather comprehensive product information from DM-drogeriemarkt websites across multiple countries. It streamlines extracting GTINs, prices, categories, brand names, ratings, and detailed metadata from both category URLs and individual product pages. This solution is ideal for data engineers, e-commerce analysts, automation builders, and organizations that need fresh, structured retail product datasets.

Multi-Country Product Intelligence

  • Supports all DM operating countries including Germany, Austria, Croatia, Poland, Romania, Italy, Slovenia, Serbia, and more.
  • Extracts product metadata from both filtered and unfiltered category listings.
  • Provides deep product details when scraping single product URLs.
  • Respects robots.txt rules by automatically skipping disallowed /search URLs.
  • Ensures stable, predictable results with enforced per-URL scraping limits for large categories.
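
The robots.txt behaviour described above can be sketched with Python's standard library. This is a minimal illustration, not the scraper's actual `robots_checker.py`: the rule text below is an assumption based on the README's statement that /search URLs are disallowed, the search URL is a made-up example, and `build_parser` and `filter_allowed` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, for illustration only. The real robots.txt of each DM
# domain should be fetched and may contain different or additional rules.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_allowed(urls, parser, user_agent="*"):
    """Return only the URLs that the parsed rules allow for user_agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ASSUMED_ROBOTS_TXT)
candidates = [
    "https://www.dm.de/search?query=nivea",  # hypothetical search URL
    "https://www.dm.de/nivea-gesichtscreme-in-der-dose-p4005900917171.html",
]
allowed = filter_allowed(candidates, parser)
print(allowed)
```

In a live run, `RobotFileParser.set_url()` followed by `read()` would load the rules from each country domain instead of the hard-coded text.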

Features

| Feature | Description |
| --- | --- |
| Multi-country support | Scrapes product data from every official DM website domain. |
| Category & filtered scraping | Collects product listings from category pages with optional filters. |
| Single product extraction | Retrieves full product metadata including descriptions, pricing details, and breadcrumbs. |
| Robots.txt-compliant logic | Automatically ignores disallowed /search URLs to ensure safe operation. |
| Structured JSON output | Provides clean, well-labeled fields ready for analytics or ingestion pipelines. |
| Large-scale friendly | Supports up to 1,000 results per category URL with no limit on total data collected across multiple URLs. |

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| website | DM country domain where the product was found. |
| gtin | GTIN-13 (EAN) product identifier. |
| dan | Internal DM article number. |
| brand | Official product brand name. |
| title | Product title as displayed on the website. |
| category | Product's primary category or subcategory. |
| image | Main product image URL. |
| priceCurrency | Currency code (EUR, RSD, etc.). |
| price | Current product price as a numeric value. |
| url | Canonical product page URL. |
| ratingValue | Average user rating. |
| ratingCount | Number of user reviews. |
| additionalData | Detailed pricing info, taxes, unit price, categories, accessibility labels, or extra metadata. |
| breadcrumbs | Hierarchical category navigation path. |
| description | Full product description in text format. |
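
Because `gtin` is described as a GTIN-13 (EAN) identifier, downstream consumers may want to verify its check digit before ingesting a record. The sketch below uses the standard GTIN-13 check-digit rule (weights 1 and 3 alternating over the first twelve digits). The function name `gtin13_is_valid` is hypothetical, and zero-padding integer input is an assumption made because a numeric field cannot carry leading zeros.

```python
def gtin13_is_valid(gtin) -> bool:
    """Check the GTIN-13 check digit of a value given as int or str."""
    digits = str(gtin)
    if isinstance(gtin, int):
        # Assumption: a numeric field may have lost leading zeros.
        digits = digits.zfill(13)
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(
        int(d) * (3 if index % 2 else 1)
        for index, d in enumerate(digits[:12])
    )
    expected_check_digit = (10 - total % 10) % 10
    return expected_check_digit == int(digits[12])


# The two GTINs from the example output in this README.
print(gtin13_is_valid(8411061041673))
print(gtin13_is_valid(4005900917171))
```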

Example Output

Two sample records from dm.de:

    {
      "website": "dm.de",
      "gtin": 8411061041673,
      "dan": 1598845,
      "brand": "CAROLINA HERRERA",
      "title": "Good Girl Eau de Parfum, 30 ml",
      "category": "Damen Parfum",
      "image": "https://products.dm-static.com/images/f_auto,q_auto,c_fit,h_320,w_320/v1747524668/assets/pas/images/ec835906-2a35-4d8e-9cf2-4b0f2c798485/carolina-herrera-good-girl-eau-de-parfum",
      "priceCurrency": "EUR",
      "price": 59.95,
      "url": "https://www.dm.de/carolina-herrera-good-girl-eau-de-parfum-p8411061041673.html",
      "ratingValue": 4.9,
      "ratingCount": 10,
      "additionalData": {
        "categories": ["Damen Parfum"]
      }
    }


    {
      "website": "dm.de",
      "gtin": 4005900917171,
      "dan": 1442173,
      "brand": "NIVEA",
      "title": "Gesichtscreme in der Dose, 250 ml",
      "category": "Bodylotion & Hautcreme",
      "image": "https://products.dm-static.com/images/f_auto,q_auto,c_fit,h_440,w_500/v1755095906/assets/pas/images/290af9fc-c152-4bd1-a528-6d11d13ffc47/nivea-gesichtscreme-in-der-dose",
      "priceCurrency": "EUR",
      "price": 3.65,
      "url": "https://www.dm.de/nivea-gesichtscreme-in-der-dose-p4005900917171.html",
      "ratingValue": 4.8229,
      "ratingCount": 638
    }
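
In both sample records the product URL ends in `-p<gtin>.html`. Assuming that pattern holds more widely (it is only observed in these two samples, not documented), a quick consistency check between `url` and `gtin` might look like the sketch below. `gtin_from_url` and `url_matches_gtin` are hypothetical helper names.

```python
import re

# Pattern inferred from the two sample URLs only; other DM domains or
# product types may use a different URL scheme.
GTIN_IN_URL = re.compile(r"-p(\d{8,14})\.html$")


def gtin_from_url(url: str):
    """Return the digits after '-p' in a product URL, or None if absent."""
    match = GTIN_IN_URL.search(url)
    return match.group(1) if match else None


def url_matches_gtin(record: dict) -> bool:
    """True when the GTIN embedded in record['url'] equals record['gtin']."""
    return gtin_from_url(record["url"]) == str(record["gtin"])


# Trimmed copies of the sample records shown above.
samples = [
    {
        "gtin": 8411061041673,
        "url": "https://www.dm.de/carolina-herrera-good-girl-eau-de-parfum-p8411061041673.html",
    },
    {
        "gtin": 4005900917171,
        "url": "https://www.dm.de/nivea-gesichtscreme-in-der-dose-p4005900917171.html",
    },
]
checks = [url_matches_gtin(record) for record in samples]
print(checks)
```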

Directory Structure Tree

    DM Product Scraper/
    ├── src/
    │   ├── main.py
    │   ├── parsers/
    │   │   ├── category_parser.py
    │   │   ├── product_parser.py
    │   │   └── utils_format.py
    │   ├── services/
    │   │   ├── request_client.py
    │   │   └── robots_checker.py
    │   └── config/
    │       └── settings.example.json
    ├── data/
    │   ├── sample_category.json
    │   └── sample_product.json
    ├── requirements.txt
    └── README.md

Use Cases

  • E-commerce analysts use it to compare pricing and availability across DM countries, enabling better market insights.
  • Data engineers automate product catalog updates to power internal dashboards and monitoring systems.
  • Retail intelligence teams collect large-scale product metadata for trend analysis and category performance research.
  • Brand owners track how their products appear across regions, including ratings and pricing variations.
  • Marketing teams gather structured product attributes for content generation, competitive research, and demand planning.

FAQs

Q1: Can I scrape more than 1,000 products from a category?
Yes. The 1,000-item limit applies only per individual category URL. You can collect an unlimited total number of results by supplying multiple subcategory or filtered URLs.

Q2: Why are /search URLs skipped automatically?
These URLs are disallowed in DM's robots rules. The scraper respects these rules to ensure safe, compliant data collection.

Q3: Does country selection affect the data returned?
Yes. Each country store may have different prices, availability, and product assortment. Each record carries the domain it was scraped from in the website field.

Q4: Can this scraper handle filters such as brand or sort order?
Yes. It supports any category URL with applied filters, as long as the URL is allowed by the website's rules.
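
The approach from Q1, splitting a large category into several subcategory or filtered URLs, tends to produce overlapping results. A minimal sketch of merging them follows, assuming results arrive as lists of records keyed by source URL. `merge_results` is a hypothetical helper, the URLs and records in the usage lines are made up, and deduplicating on `(website, gtin)` is a design assumption rather than the scraper's documented behaviour.

```python
from itertools import islice

PER_URL_LIMIT = 1000  # per-category-URL cap stated in this README


def merge_results(results_by_url, per_url_limit=PER_URL_LIMIT):
    """Cap each URL's records, then drop duplicates across URLs."""
    seen = set()
    merged = []
    for records in results_by_url.values():
        for record in islice(records, per_url_limit):
            key = (record["website"], record["gtin"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Hypothetical input: two overlapping subcategory result sets.
results = {
    "https://www.dm.de/example-subcategory-a": [
        {"website": "dm.de", "gtin": 8411061041673},
        {"website": "dm.de", "gtin": 4005900917171},
    ],
    "https://www.dm.de/example-subcategory-b": [
        {"website": "dm.de", "gtin": 4005900917171},
    ],
}
merged = merge_results(results)
print(len(merged))
```

Keying on the domain as well as the GTIN keeps the same product from different country stores as separate records, which matters because prices can differ per country.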


Performance Benchmarks and Results

Primary Metric: Processes 300–500 product items per minute on average, depending on category complexity and domain.

Reliability Metric: Maintains a 98%+ successful extraction rate across all supported DM country domains.

Efficiency Metric: Optimized request batching ensures low overhead and consistent throughput even when processing multiple large URLs.

Quality Metric: Delivers near-complete product metadata with high precision in fields like GTIN, price, and category attribution, averaging over 97% completeness in test runs.
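
The README does not define how completeness is measured. One plausible reading, shown here purely as an assumption, is the share of expected fields that are present and non-empty across a batch of records. `CORE_FIELDS` and `completeness` are hypothetical names, and the choice of fields is illustrative.

```python
# Illustrative field selection; which fields count as "core" is an assumption.
CORE_FIELDS = (
    "website", "gtin", "brand", "title",
    "category", "price", "priceCurrency", "url",
)


def completeness(records, fields=CORE_FIELDS) -> float:
    """Fraction of (record, field) pairs that hold a non-empty value."""
    if not records:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / (len(records) * len(fields))


# Hypothetical batch: the second record is missing its brand.
batch = [
    {"website": "dm.de", "gtin": 8411061041673, "brand": "CAROLINA HERRERA",
     "title": "Good Girl Eau de Parfum, 30 ml", "category": "Damen Parfum",
     "price": 59.95, "priceCurrency": "EUR", "url": "https://www.dm.de/a"},
    {"website": "dm.de", "gtin": 4005900917171, "brand": "",
     "title": "Gesichtscreme in der Dose, 250 ml",
     "category": "Bodylotion & Hautcreme",
     "price": 3.65, "priceCurrency": "EUR", "url": "https://www.dm.de/b"},
]
score = completeness(batch)
print(f"{score:.2%}")
```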

Book a Call Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
