Hm Product New Arrivals Scraper fetches the latest H&M new arrivals as clean, structured JSON that you can plug into dashboards, price trackers, and product discovery tools. It replaces manual checking for updates by exposing new arrivals as an API-style dataset. If you need reliable H&M new-arrivals data for analysis or automation, this scraper keeps collection simple and fast.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project retrieves H&M product new arrivals by page and country, then returns normalized product data including pricing, availability, images, and color variants. It solves the problem of consistently collecting fresh catalog updates without manual browsing or brittle workflows. It’s built for developers, analysts, and e-commerce teams who need repeatable product monitoring, feeds, or research datasets.
- Pulls paginated new arrivals with predictable request/response structure.
- Supports market-specific results via `countryCode` (localized catalog).
- Returns both product-level and variant (swatch/article) details.
- Designed for bulk collection using pagination and rate-limited requests.
- Includes robust handling for common failures (network, invalid input, empty results).
| Feature | Description |
|---|---|
| Paginated new arrivals | Fetch new arrivals by page and perPage for efficient browsing and bulk collection. |
| Market localization | Use countryCode to retrieve region-specific H&M catalog results. |
| Normalized JSON output | Returns consistent fields for products, pricing, images, and availability. |
| Variant-aware data | Includes color/variant swatches with article IDs and images for each option. |
| Resilient extraction | Built-in retry, rate limiting, and graceful handling for empty/invalid responses. |
| Bulk-friendly usage | Works well with queues, schedulers, and cache layers for high-volume monitoring. |
| Field Name | Field Description |
|---|---|
| requestDateTime | ISO timestamp for when the dataset was generated. |
| pagination.currentPage | Current page number returned in the response. |
| pagination.nextPageNum | Next page number if available. |
| pagination.totalPages | Total number of pages available for the query. |
| numberOfHits | Total number of matching items available across pages. |
| products[].id | Primary product identifier. |
| products[].productName | Product display name/title. |
| products[].brandName | Brand label (typically H&M). |
| products[].url | Product page URL. |
| products[].external | Indicates whether the product is external to the catalog feed. |
| products[].trackingId | Tracking token for catalog navigation/analytics contexts. |
| products[].showPriceMarker | Flag indicating whether price marker UI hints are present. |
| products[].prices[] | Pricing objects for the product (type, numeric values, formatted string). |
| products[].prices[].priceType | Price category (e.g., whitePrice). |
| products[].prices[].price | Primary numeric price. |
| products[].prices[].minPrice | Minimum price where ranges apply. |
| products[].prices[].maxPrice | Maximum price where ranges apply. |
| products[].prices[].formattedPrice | Human-readable price string with currency. |
| products[].availability.stockState | Stock status (e.g., Available). |
| products[].availability.comingSoon | Indicates whether the item is marked as coming soon. |
| products[].iswatches[] | Variant/swatches array (often per color/article). |
| products[].iswatches[].articleId | Variant/article identifier. |
| products[].iswatches[].colorName | Variant color name. |
| products[].iswatches[].colorCode | Variant color code (hex-like). |
| products[].iswatches[].productImage | Variant image URL. |
| products[].images[] | Additional image URLs for the product. |
| products[].hasVideo | Indicates whether the product listing includes video. |
| products[].colorName | Primary color name for the listed item. |
| products[].colors | Primary color code(s). |
| products[].colourShades | Color shade metadata when available. |
| products[].productImage | Primary product image URL. |
| products[].newArrival | Indicates whether the item is flagged as a new arrival. |
| products[].isOnline | Indicates whether the product is available online. |
| products[].isPreShopping | Indicates whether the item is in pre-shopping state. |
| products[].isLiquidPixelUrl | Flag related to tracking pixel URL usage. |
| products[].colorWithNames | Composite color mapping metadata. |
| products[].mainCatCode | Main category code for classification/filtering. |
| products[].productMarkers | Marker badges/promotions list (when present). |
| products[].percentageDiscount | Discount string when applicable (often empty for new arrivals). |

    {
      "requestDateTime": "2025-01-09T12:57:16.135Z",
      "pagination": {
        "currentPage": 1,
        "nextPageNum": 2,
        "totalPages": 52
      },
      "products": [
        {
          "id": "1223910004",
          "productName": "Loose-fit Shacket",
          "brandName": "H&M",
          "url": "https://www2.hm.com/en_us/productpage.1223910004.html",
          "prices": [
            {
              "priceType": "whitePrice",
              "price": 44.99,
              "minPrice": 44.99,
              "maxPrice": 44.99,
              "formattedPrice": "$ 44.99"
            }
          ],
          "availability": {
            "stockState": "Available",
            "comingSoon": false
          },
          "iswatches": [
            {
              "articleId": "1223910004",
              "colorName": "Dark denim blue",
              "colorCode": "4C5164",
              "productImage": "https://image.hm.com/assets/hm/ad/d3/add305fc32f87c395b9192c0306e87f889df8c98.jpg"
            }
          ],
          "images": [
            { "url": "https://image.hm.com/assets/hm/11/88/11887f50e50e0226d2e222588ea090f47f5f403f.jpg" }
          ],
          "newArrival": true,
          "mainCatCode": "ladies_jacketscoats_jackets"
        }
      ],
      "numberOfHits": 723
    }

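As a rough illustration of consuming this output, the sketch below flattens a response shaped like the sample above into one row per variant. `flattenProducts` and the row field names are hypothetical and exist only for this example. It also assumes the `whitePrice` entry is the price of interest and falls back to the first listed price otherwise.

```javascript
// Hypothetical helper (not part of the project): one flat row per variant.
function flattenProducts(response) {
  const rows = [];
  for (const product of response.products ?? []) {
    const prices = product.prices ?? [];
    // Assumption: prefer the "whitePrice" entry, else use the first price listed.
    const price = prices.find((p) => p.priceType === "whitePrice") ?? prices[0];
    // Products without variant data still produce a single row.
    const variants = product.iswatches?.length ? product.iswatches : [{}];
    for (const variant of variants) {
      rows.push({
        productId: product.id,
        articleId: variant.articleId ?? product.id,
        name: product.productName,
        color: variant.colorName ?? product.colorName ?? null,
        price: price?.price ?? null,
        stockState: product.availability?.stockState ?? null,
      });
    }
  }
  return rows;
}

// Trimmed copy of the sample response above.
const sampleResponse = {
  products: [
    {
      id: "1223910004",
      productName: "Loose-fit Shacket",
      prices: [{ priceType: "whitePrice", price: 44.99 }],
      availability: { stockState: "Available", comingSoon: false },
      iswatches: [{ articleId: "1223910004", colorName: "Dark denim blue" }],
    },
  ],
};

console.log(flattenProducts(sampleResponse));
```

A flat shape like this tends to load more easily into spreadsheets or warehouse tables than the nested response.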

    Hm Product New Arrivals/
    ├── src/
    │   ├── index.js
    │   ├── runner.js
    │   ├── clients/
    │   │   ├── httpClient.js
    │   │   └── rateLimiter.js
    │   ├── extractors/
    │   │   ├── newArrivalsFetcher.js
    │   │   └── responseNormalizer.js
    │   ├── validators/
    │   │   └── inputSchema.js
    │   ├── utils/
    │   │   ├── logger.js
    │   │   ├── retry.js
    │   │   └── sanitize.js
    │   └── outputs/
    │       ├── toJson.js
    │       └── sampleOutput.json
    ├── data/
    │   ├── input.sample.json
    │   └── output.sample.json
    ├── tests/
    │   ├── unit/
    │   │   ├── inputSchema.test.js
    │   │   └── normalizer.test.js
    │   └── integration/
    │       └── newArrivalsFetcher.test.js
    ├── .env.example
    ├── .gitignore
    ├── package.json
    ├── package-lock.json
    ├── LICENSE
    └── README.md

- E-commerce analysts use it to track daily H&M new arrivals, so they can spot assortment shifts and pricing trends early.
- Affiliate marketers use it to auto-refresh storefront listings, so they can promote newly launched products faster.
- Retail researchers use it to collect market-localized catalog snapshots, so they can compare regions and seasonality.
- Product teams use it to feed recommendation prototypes with fresh inventory, so they can test discovery experiences with real data.
- Data engineers use it to populate pipelines and warehouses, so they can power dashboards and alerting on new drops.
How do I run a basic request?
Create an input payload with page, perPage, and countryCode, then run the script entry point. Example input:
- page: "1"
- perPage: "14"
- countryCode: "en_US"
The output is a JSON object containing pagination, products, and numberOfHits.
What are the input parameters and which ones are required?
All three inputs are required:
- `page`: page number to fetch.
- `perPage`: number of results per page (max 100).
- `countryCode`: locale/market identifier (example: `en_US`).
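The rules above can be checked before any request is sent. The following is a minimal sketch, not the project's actual `validators/inputSchema.js`: `validateInput` is a hypothetical name, page numbering is assumed to start at 1, and the `countryCode` pattern is inferred only from the `en_US` example.

```javascript
// Hypothetical validator; the project's validators/inputSchema.js may differ.
function validateInput(input = {}) {
  const errors = [];
  // Inputs may arrive as strings ("1", "14"), so coerce before checking.
  const page = Number(input.page);
  const perPage = Number(input.perPage);

  // Assumption: pages are numbered from 1.
  if (!Number.isInteger(page) || page < 1) {
    errors.push("page must be a positive integer");
  }
  if (!Number.isInteger(perPage) || perPage < 1 || perPage > 100) {
    errors.push("perPage must be an integer from 1 to 100");
  }
  // Assumption: locale codes look like "en_US" (language_COUNTRY).
  if (
    typeof input.countryCode !== "string" ||
    !/^[a-z]{2}_[A-Z]{2}$/.test(input.countryCode)
  ) {
    errors.push("countryCode must look like en_US");
  }
  return {
    valid: errors.length === 0,
    errors,
    value: { page, perPage, countryCode: input.countryCode },
  };
}

console.log(validateInput({ page: "1", perPage: "14", countryCode: "en_US" }));
console.log(validateInput({ page: "0", perPage: "250" }).errors);
```

Rejecting bad input locally avoids spending a request (and rate-limit budget) on a call that would return nothing useful.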
What limits should I be aware of for bulk collection?
- Maximum 100 results per page.
- Rate limiting should be applied for large runs to avoid server overload.
- For full-catalog monitoring, iterate through `pagination.totalPages` with caching and backoff.
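One way to walk every page is sketched below. It covers only the pagination loop with a fixed pause between requests; caching and backoff are left out for brevity. `collectAllPages` and `fetchPage` are hypothetical names: `fetchPage` stands in for whatever performs the request and is assumed to resolve to a response shaped like the sample output.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// `fetchPage` is a placeholder: any async function that takes
// { page, perPage, countryCode } and resolves to a response object.
async function collectAllPages(
  fetchPage,
  { countryCode, perPage = 100, delayMs = 1000 } = {}
) {
  const products = [];
  let page = 1;
  let totalPages = 1; // replaced after the first response
  while (page <= totalPages) {
    const response = await fetchPage({ page, perPage, countryCode });
    products.push(...(response.products ?? []));
    // Use the reported page count; stop after this page if it is missing.
    totalPages = response.pagination?.totalPages ?? page;
    page += 1;
    if (page <= totalPages) await sleep(delayMs); // fixed pause between requests
  }
  return products;
}

// Stub standing in for a real request, so the loop can be tried offline.
async function fakeFetchPage({ page }) {
  return {
    pagination: { currentPage: page, totalPages: 3 },
    products: [{ id: `p${page}` }],
  };
}

collectAllPages(fakeFetchPage, { countryCode: "en_US", delayMs: 0 }).then(
  (items) => console.log(items.map((item) => item.id)) // logs [ 'p1', 'p2', 'p3' ]
);
```

Re-reading `totalPages` on every response keeps the loop correct if the catalog grows or shrinks during a long run.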
Why might I receive an empty product list?
Common causes include:
- Invalid or unsupported `countryCode`.
- Page number outside the available range.
- Temporary network issues or upstream throttling.
- No new arrivals available for the requested market at that moment.
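The causes above can be narrowed down with a small triage step. This is only a sketch under stated assumptions: `explainEmptyResult` and its labels are hypothetical, and it assumes an out-of-range request still returns `pagination.totalPages`, which the behavior described here does not confirm.

```javascript
// Hypothetical triage helper; the labels and checks are illustrative guesses.
function explainEmptyResult(request, response) {
  if ((response?.products ?? []).length > 0) return "not-empty";

  const totalPages = response?.pagination?.totalPages;
  // Assumption: an out-of-range request still reports the real page count.
  if (
    Number.isInteger(totalPages) &&
    totalPages > 0 &&
    Number(request.page) > totalPages
  ) {
    return "page-out-of-range";
  }
  // Zero hits fits both an unsupported countryCode and a market
  // with no new arrivals at that moment.
  if (response?.numberOfHits === 0) return "no-hits-for-market";
  // Anything else (missing metadata, partial body) is treated as transient.
  return "possibly-transient";
}

console.log(
  explainEmptyResult(
    { page: 60 },
    { pagination: { totalPages: 52 }, products: [], numberOfHits: 723 }
  )
); // logs "page-out-of-range"
```

Only the last label is a reasonable candidate for an automatic retry; the other two usually point to an input that needs changing.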
Primary Metric: Typical retrieval speed of 1 page (14–50 items) in ~0.8–1.6 seconds under normal network conditions with conservative throttling.
Reliability Metric: ~98–99% successful page fetch rate when using retries (2–3 attempts) plus exponential backoff on transient failures.
Efficiency Metric: Sustains ~35–70 products/minute in continuous collection mode (depending on perPage, throttling, and image-heavy responses) while keeping memory usage stable through streaming serialization.
Quality Metric: Product records are consistently complete for core fields (ID, name, URL, price, availability, main image), with variant coverage typically matching the listing swatches provided for each product.
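The reliability figure above depends on retries with exponential backoff. A minimal sketch of that pattern follows; it is not the project's `utils/retry.js`, and `withRetry`, its option names, and the default delays are assumptions. By default it retries on any error, and the `shouldRetry` option can limit retries to failures the caller considers transient.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper; the project's utils/retry.js may work differently.
async function withRetry(
  task,
  { attempts = 3, baseDelayMs = 500, shouldRetry = () => true } = {}
) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Give up on the last attempt or when the error is not retryable.
      if (attempt === attempts || !shouldRetry(error)) break;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}

// Example: a task that fails twice before succeeding.
let calls = 0;
withRetry(
  async () => {
    calls += 1;
    if (calls < 3) throw new Error("transient failure");
    return "ok";
  },
  { baseDelayMs: 10 }
).then((result) => console.log(result, "after", calls, "calls")); // logs: ok after 3 calls
```

Doubling the delay after each failure gives a throttled upstream time to recover instead of hitting it again immediately.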