Why the sitemap may differ across scans
Last updated: June 8, 2026
Introduction
If you've noticed your sitemap showing more or fewer endpoints between scans, this is expected behavior in most cases. This guide explains the common reasons for sitemap variation and what you can do to improve consistency.
Common Reasons for Differences
1. Application Changes
This is the most straightforward reason: if your application changed between scans, the sitemap will reflect that. This includes:
Pages or endpoints added, removed, or modified
Navigation structure updates
Feature flags toggling visibility of certain sections
2. Data-Dependent Pages
Some pages only appear when specific data exists. For example:
A "Track Order" page that only appears when an active order exists
Refund pages that only appear after a completed order
User-specific dashboards that vary by account state
3. Login Configuration Issues
Most dashboard-style applications sit behind authentication. If login isn't configured correctly, the scanner only sees public pages. Common causes:
Expired credentials or tokens
Login flow changes since the last recording
A user role whose restricted permissions hide certain sections
4. Different Scan Configurations
Comparing sitemaps from scans with different settings will naturally produce different results. Changes that affect sitemap output include:
Subdomain crawl settings
URL exclusion rules
Custom headers or authentication methods
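Before comparing two sitemaps, it can help to confirm that the scans really used the same settings. The following is a minimal sketch, assuming you have written each scan's settings down as a plain Python dictionary. The function name diff_settings and the keys shown (crawl_subdomains, excluded_urls, custom_headers) are hypothetical and do not represent an Astra export format.

```python
def diff_settings(first: dict, second: dict) -> dict:
    """Return {setting: (value_in_first, value_in_second)} for settings that differ.

    A setting that is absent from one side is reported as None on that side.
    """
    changed = {}
    for key in sorted(first.keys() | second.keys()):
        if first.get(key) != second.get(key):
            changed[key] = (first.get(key), second.get(key))
    return changed


# Hypothetical settings for two scans (key names are illustrative only).
scan_a = {"crawl_subdomains": True, "excluded_urls": ["/logout"]}
scan_b = {
    "crawl_subdomains": False,
    "excluded_urls": ["/logout"],
    "custom_headers": {"X-Env": "staging"},
}

for name, (before, after) in diff_settings(scan_a, scan_b).items():
    print(f"{name}: {before!r} -> {after!r}")
```

If the result is empty, the settings you recorded match, and any sitemap difference is more likely to come from one of the other causes in this guide.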
5. Scan Type Differences
Different scan types crawl at different depths.
Always compare sitemaps from the same scan type for meaningful results.
6. Environmental Factors
Temporary conditions during the scan can prevent the crawler from reaching certain pages:
Server slowdowns or timeouts
Firewall or WAF returning unexpected responses
Cookie consent banners blocking content
CAPTCHA challenges triggered by scanner activity
How We Ensure Consistency
Astra takes several steps to reduce unnecessary variation:
Central Endpoint Inventory — All endpoints discovered across scans are stored cumulatively, not just per scan
Authentication Health Checks — Login failures are flagged in your dashboard for quick resolution
Automatic Retries — The crawler retries failed attempts using exponential backoff to handle temporary network issues
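To illustrate what retrying with exponential backoff means in general, here is a minimal sketch of the technique. It is not Astra's crawler code: the function name retry_with_backoff, the attempt limit, and the delay values are all assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,        # assumed limit, for illustration only
    base_delay: float = 0.5,      # assumed first wait, in seconds
    max_delay: float = 8.0,       # assumed upper bound on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, waiting twice as long after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 0.5s, 1s, 2s, ... capped at max_delay, then try again.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The growing wait gives a briefly overloaded server time to recover, which is why a page that timed out once can still appear in the final sitemap.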
Tips to Improve Consistency
Keep scan configurations (login, scope, exclusions) identical across runs
Use consistent test data so data-dependent pages are always accessible
Review credentials and user roles assigned to your scans regularly
Use the same scan type when comparing sitemaps
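When you do compare two sitemaps, a small script can show exactly which endpoints were added or dropped. The sketch below assumes you have each scan's endpoints as a list of URLs. The helper names normalize and compare_sitemaps are hypothetical, and ignoring query strings and trailing slashes is an assumption you may want to change.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to scheme, host, and path so cosmetic differences are ignored."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Assumption: query strings and fragments are dropped for comparison.
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def compare_sitemaps(previous: list[str], current: list[str]) -> dict[str, list[str]]:
    """Group endpoints into those added, removed, or unchanged between two scans."""
    prev = {normalize(u) for u in previous}
    curr = {normalize(u) for u in current}
    return {
        "added": sorted(curr - prev),
        "removed": sorted(prev - curr),
        "unchanged": sorted(prev & curr),
    }


# Example URLs are made up for illustration.
earlier = ["https://shop.example.com/", "https://shop.example.com/refunds"]
later = ["https://shop.example.com", "https://shop.example.com/track-order?id=7"]

for group, urls in compare_sitemaps(earlier, later).items():
    print(group, urls)
```

Endpoints listed under "removed" are good candidates to check against the causes above, such as missing test data or a failed login.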
Troubleshooting
Sitemap is significantly smaller than expected
Check whether the login recording is working correctly by reviewing the scan progress on the Continuous Scan Details page.
Confirm that user credentials haven't expired under Target Settings → User Roles.
New endpoints not appearing after deployment
Trigger a fresh Web Crawl scan to update the endpoint inventory.
Alternatively, manually mark endpoints as changed under API & Web Endpoints.
Sitemap varies significantly between identical scan configurations
Check server health and response times during the scan window.
Review whether a WAF or a service such as Cloudflare is intermittently blocking scanner requests. See How to scan web applications protected with CAPTCHA.