Why the sitemap may differ across scans

Last updated: June 8, 2026

Introduction

If you've noticed your sitemap showing more or fewer endpoints between scans, this is expected behavior in most cases. This guide explains the common reasons for sitemap variation and what you can do to improve consistency.

Common Reasons for Differences

1. Application Changes

This is the most straightforward reason: if your application changed between scans, the sitemap reflects those changes. This includes:

  • Pages or endpoints added, removed, or modified

  • Navigation structure updates

  • Feature flags toggling visibility of certain sections

2. Data-Dependent Pages

Some pages only appear when specific data exists. For example:

  • A "Track Order" page that only appears when an active order exists

  • Refund pages that only appear after a completed order

  • User-specific dashboards that vary by account state

3. Login Configuration Issues

Most dashboard-style applications sit behind authentication. If login isn't configured correctly, the scanner only sees public pages. Common causes:

  • Expired credentials or tokens

  • Login flow changes since the last recording

  • A user role whose restricted permissions hide certain sections
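Expired tokens are one of the easier causes to rule out. If the credential your scan uses is a JSON Web Token (an assumption; many applications use opaque session cookies instead), its standard `exp` claim records when it expires. The sketch below reads that claim with the Python standard library. `jwt_is_expired` is a hypothetical helper, and it does not verify the token's signature, so treat it only as a quick diagnostic.

```python
import base64
import json
import time
from typing import Optional


def jwt_is_expired(token: str, now: Optional[float] = None, leeway: int = 0) -> bool:
    """Return True if the current time has reached the JWT's `exp` claim.

    Diagnostic only: the payload is decoded, but the signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url, so restore the "=" padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    exp = payload.get("exp")
    if exp is None:
        return False  # no expiry claim: the token alone can't tell us
    current = time.time() if now is None else now
    # `leeway` (seconds) tolerates small clock differences between machines.
    return current >= exp + leeway
```

Calling `jwt_is_expired(token)` on the token from your scan's login settings returns `True` once it has expired, which is a sign the credentials need refreshing.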

4. Different Scan Configurations

Comparing sitemaps from scans with different settings will naturally produce different results. Changes that affect sitemap output include:

  • Subdomain crawl settings

  • URL exclusion rules

  • Custom headers or authentication methods
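Before comparing two sitemaps, it can help to confirm that both scans actually used the same settings. The sketch below assumes you have each scan's settings as a Python dictionary; `config_differences` is a hypothetical helper, and keys such as `crawl_subdomains` in the usage line are illustrative, not Astra field names.

```python
from typing import Any, Dict, Tuple


def config_differences(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (value_in_a, value_in_b)} for every setting that differs.

    A setting present in only one config is reported with None on the other side.
    """
    missing = object()  # sentinel so an absent key never equals a real value
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, missing), b.get(key, missing)
        if va != vb:
            diffs[key] = (None if va is missing else va, None if vb is missing else vb)
    return diffs
```

For example, `config_differences({"crawl_subdomains": True}, {"crawl_subdomains": False})` returns `{"crawl_subdomains": (True, False)}`. An empty result means the two configurations match on every listed setting.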

5. Scan Type Differences

Different scan types crawl at different depths:

  • Web Crawl: passive crawl only; collects visible endpoints

  • Full Scan: crawl plus fuzzing; may surface additional endpoints

  • Lightning Scan: reduced crawl depth; focuses on high-priority areas

  • Emerging Threats: targeted scan; not a full crawl

Always compare sitemaps from the same scan type for meaningful results.
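A minimal sketch of such a comparison, assuming each sitemap is exported as a set of endpoint strings tagged with the scan type that produced it. The `ScanSitemap` type and `compare_sitemaps` helper are hypothetical. The helper refuses to compare across scan types and otherwise reports which endpoints were added and removed.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ScanSitemap:
    scan_type: str              # e.g. "Web Crawl", "Full Scan"
    endpoints: FrozenSet[str]   # e.g. "GET /api/orders"


def compare_sitemaps(old: ScanSitemap, new: ScanSitemap) -> dict:
    """Diff two sitemaps, but only when they come from the same scan type."""
    if old.scan_type != new.scan_type:
        raise ValueError(
            f"scan types differ ({old.scan_type!r} vs {new.scan_type!r}); "
            "the comparison would not be meaningful"
        )
    return {
        "added": sorted(new.endpoints - old.endpoints),
        "removed": sorted(old.endpoints - new.endpoints),
        "unchanged": len(old.endpoints & new.endpoints),
    }
```

Raising an error rather than returning a diff makes a cross-type comparison impossible to misread as a real change in the application.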

6. Environmental Factors

Temporary conditions during the scan can prevent the crawler from reaching certain pages:

  • Server slowdowns or timeouts

  • Firewall or WAF returning unexpected responses

  • Cookie consent banners blocking content

  • CAPTCHA challenges triggered by scanner activity
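When a page drops out of one scan, it helps to check whether the crawler actually reached it. The sketch below assumes a hypothetical crawl log of `(url, status)` pairs, with `None` recorded for a timeout, and flags URLs whose responses commonly signal rate limiting, WAF blocks, or server trouble. The status set is an illustrative heuristic, not Astra's logic; a 403, for instance, can also be a genuine permission error.

```python
from typing import Iterable, List, Optional, Tuple

# Statuses that often indicate a transient or blocking condition rather than a
# page that genuinely no longer exists. Illustrative heuristic only.
SUSPECT_STATUSES = {403, 408, 429, 502, 503, 504}


def pages_to_recheck(log: Iterable[Tuple[str, Optional[int]]]) -> List[str]:
    """Return URLs whose response suggests an environmental failure.

    Each entry is (url, status); status is None when the request timed out.
    A URL is flagged if any of its responses looks suspect.
    """
    return sorted(
        {url for url, status in log if status is None or status in SUSPECT_STATUSES}
    )
```

URLs returned here are good candidates for a follow-up scan once the underlying condition (rate limit, WAF rule, server load) has been addressed.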

How We Ensure Consistency

Astra takes several steps to reduce unnecessary variation:

  • Central Endpoint Inventory — All endpoints discovered across scans are stored cumulatively, not just per scan

  • Authentication Health Checks — Login failures are flagged in your dashboard for quick resolution

  • Automatic Retries — The crawler retries failed attempts using exponential backoff to handle temporary network issues
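Astra does not publish its retry parameters, so the following is only a generic illustration of exponential backoff: each retry may wait up to twice as long as the previous one, up to a cap. The function name, the default values, the set of retried errors, and the added random jitter (a common refinement that keeps many clients from retrying in lockstep) are all assumptions, not Astra's configuration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying transient errors with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Upper bound doubles each attempt (0.5s, 1s, 2s, ...) and is capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time between 0 and the upper bound.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets the helper be exercised without real waits, for example by supplying a function that just records the requested delays.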

Tips to Improve Consistency

  • Keep scan configurations (login, scope, exclusions) identical across runs

  • Use consistent test data so data-dependent pages are always accessible

  • Review credentials and user roles assigned to your scans regularly

  • Use the same scan type when comparing sitemaps

Troubleshooting

Sitemap is significantly smaller than expected

  • Check whether login recording is working correctly by reviewing the scan progress on the Continuous Scan Details page.

  • Confirm that user credentials haven't expired under Target Settings → User Roles.

New endpoints not appearing after deployment

  • Trigger a fresh Web Crawl scan to update the endpoint inventory.

  • Alternatively, manually mark endpoints as changed under API & Web Endpoints.

Sitemap varies significantly between identical scan configurations

Next Steps