Keeping sitemaps healthy is a mix of monitoring Google Search Console (GSC), diagnosing the specific error codes GSC reports, and validating that search bots can consistently reach your sitemap URLs. Use the following playbook whenever sitemap submissions fail or pages stay stuck in "Discovered – currently not indexed."
## Step-by-Step: Working Through GSC Sitemap Reports
- Open the Sitemaps report in GSC and note the submission date, processed status, and the total number of discovered URLs. Differences between submitted and discovered counts are the first signal of crawl issues.
- Expand the "Details" panel for the sitemap to review each error type. GSC groups errors into Coverage, Indexing, and Fetch categories—start with the one showing the highest affected URL count.
- Use the "Inspect URL" tool on a failing entry. The live test will confirm whether Googlebot can fetch the resource and which HTTP status code it receives. Capture the screenshot and HTML to compare with your origin server.
- Download the "Open report" CSV for large batches. Sort by URL path to see whether failures cluster around specific directories (e.g., localized folders or dynamically generated endpoints).
- Validate fixes with the "Test sitemap" button. After making corrections, rerun the test to confirm Google can fetch and parse the XML before resubmitting it for indexing.
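The CSV export from the steps above can be triaged offline before touching GSC again. A minimal sketch of the directory-clustering check (the single-column CSV layout and any file names are assumptions, not GSC specifics):

```python
import csv
from collections import Counter
from urllib.parse import urlparse

def cluster_by_directory(urls):
    """Count failing URLs per top-level path segment to spot clusters."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        key = "/" + segments[0] + "/" if segments else "/"
        counts[key] += 1
    return counts

# Inline sample; in practice, read rows from the GSC CSV export instead.
sample = [
    "https://example.com/fr/produits/a",
    "https://example.com/fr/produits/b",
    "https://example.com/blog/post-1",
]
for directory, n in cluster_by_directory(sample).most_common():
    print(f"{directory}\t{n}")
```

If one directory dominates the counts, the failure is usually a shared cause (a localization rule, a template, a rewrite) rather than per-page content problems.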
## Interpreting Common Sitemap Error Codes
| Error | Meaning | Action |
|---|---|---|
| Couldn't fetch | GSC could not retrieve the sitemap, usually because of DNS failures, TLS issues, or firewalls blocking Googlebot. | Check DNS propagation, review CDN firewall logs for Googlebot blocks, and ensure TLS certificates cover the sitemap hostname. |
| General HTTP error | Any non-200 response while fetching the sitemap. | Inspect web server logs around the fetch timestamp, reproduce with `curl -I`, and confirm that caching/CDN rules do not rewrite the sitemap path. |
| Invalid XML | The file isn't well-formed XML or exceeds the size/URL limits. | Run `xmllint` locally, ensure the encoding is UTF-8, and split files larger than 50 MB (uncompressed) or 50,000 URLs. |
| Submitted URL not found (404) | URLs listed in the sitemap return 404 during verification. | Audit CMS publishing workflows, restore missing content, or remove obsolete URLs from the sitemap. |
| Blocked by robots.txt | URLs are disallowed for Googlebot. | Align the sitemap and robots.txt policies: either unblock the paths or stop listing them. |
| Redirect error (3xx) | URLs redirect instead of returning canonical 200 responses. | Update the sitemap to list final destination URLs only, and ensure redirects are not chained. |
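The "Invalid XML" checks can be run locally before resubmitting. A minimal sketch using Python's standard library (the 50 MB uncompressed and 50,000-URL caps are the sitemap protocol's documented limits; the sample file content is illustrative):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed limit
MAX_URLS = 50_000             # per-sitemap URL limit

def validate_sitemap(xml_bytes):
    """Return (ok, message) for well-formedness and size/URL limits."""
    if len(xml_bytes) > MAX_BYTES:
        return False, "sitemap exceeds 50 MB uncompressed; split it"
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return False, f"not well-formed XML: {exc}"
    locs = root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc")
    if len(locs) > MAX_URLS:
        return False, f"{len(locs)} URLs; split into multiple sitemaps"
    return True, f"{len(locs)} URLs, well-formed"

sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
</urlset>"""
print(validate_sitemap(sample))
```

This catches the same class of failures as `xmllint`, and is easy to wire into a CI step that runs before the sitemap is deployed.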
## Fixing HTTP Status Issues
- Capture the failing status with cURL. Running `curl -I https://example.com/sitemap.xml` confirms the code, headers, and any intermediate redirects.
- Compare origin vs. CDN behavior. Bypass the CDN by hitting the origin host directly to rule out caching or security appliances introducing 4xx/5xx statuses.
- Check server logs for the timestamp shown in GSC. Look for spikes in 500-series errors, timeouts, or rate limiting.
- Review authentication rules. Ensure Basic Auth, IP allowlists, or signed URLs are not required for the sitemap path.
- Retest after fixes using the GSC "Test sitemap" button and automated monitoring (e.g., a cron `curl` job) to verify consistent 200 responses.
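The recurring check can be a few lines of Python instead of raw curl. A minimal sketch that fetches the sitemap without following redirects, so 3xx responses are flagged rather than silently resolved (the URL is a placeholder; the bucket names are our own shorthand):

```python
import urllib.request
import urllib.error

def classify_status(code):
    """Map an HTTP status code to a triage bucket."""
    if code == 200:
        return "ok"
    if 300 <= code < 400:
        return "redirect"      # a sitemap should not redirect
    if 400 <= code < 500:
        return "client_error"  # 403/404: check auth rules and publishing
    return "server_error"      # 5xx: check origin logs and rate limiting

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface 3xx as an HTTPError instead of following it

def check_sitemap(url, timeout=10):
    """HEAD the sitemap and classify the result."""
    req = urllib.request.Request(url, method="HEAD")
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(req, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as exc:
        return classify_status(exc.code)

# Usage (requires network): check_sitemap("https://example.com/sitemap.xml")
```

Run it from cron and alert on anything other than `"ok"`; intermittent `server_error` results usually point at overload or rate limiting rather than a configuration bug.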
## Verifying Access for Googlebot
- Robots.txt: Confirm the sitemap location is declared in `robots.txt` and that the path is not disallowed elsewhere.
- Firewall/CDN rules: Allow Googlebot IP ranges and user agents. Use server logs to verify repeated hits from `Googlebot` or `Googlebot-Image` when testing.
- Authentication: Remove login requirements for sitemap URLs. If staging environments need protection, host public copies of the sitemap on accessible infrastructure.
- Network reachability: Run `traceroute` or use third-party uptime monitors from multiple regions to ensure the sitemap host responds globally.
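The robots.txt rules can be verified offline with Python's standard `urllib.robotparser` before waiting on a recrawl. A minimal sketch (the rules shown are illustrative; in practice, parse your live file):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content, assumed for this example.
robots_lines = [
    "User-agent: Googlebot",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Disallow: /tmp/",
    "",
    "Sitemap: https://example.com/sitemap.xml",
]

rp = RobotFileParser()
rp.parse(robots_lines)

# Googlebot is blocked from /private/ but may fetch the sitemap path.
print(rp.can_fetch("Googlebot", "https://example.com/private/page"))
print(rp.can_fetch("Googlebot", "https://example.com/sitemap.xml"))
print(rp.site_maps())  # Sitemap declarations found in the file
```

Checking every sitemap URL through `can_fetch("Googlebot", url)` reproduces the "Blocked by robots.txt" verdict locally, without a round trip through GSC.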
## Triage Flow
- Is the sitemap URL reachable (200) from an external network? If no, fix DNS/TLS/firewall issues first.
- Does GSC's "Test sitemap" still fail after reachability is restored? If yes, focus on XML validity and size limits.
- Are individual URLs failing in Coverage reports? Inspect affected URLs with the URL Inspection tool and resolve their HTTP status or robots blocking.
- Do fixes stay in place for 24 hours? Monitor logs and uptime; if intermittent, introduce alerting and rate limits to prevent overload.
- Resubmit the sitemap and monitor discovery counts for 48–72 hours to confirm recovery.
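The flow above can be sketched as a pure decision function, useful for wiring the individual checks into one automated report (the bucket names and argument names are our own, not GSC terminology):

```python
def triage(reachable, xml_valid, urls_healthy, stable_24h):
    """Return the area to focus on, in the order the flow checks them."""
    if not reachable:
        return "transport: fix DNS/TLS/firewall first"
    if not xml_valid:
        return "document validity: fix XML well-formedness and size limits"
    if not urls_healthy:
        return "URL-level health: fix HTTP status or robots blocking"
    if not stable_24h:
        return "stability: add alerting and rate limits"
    return "resubmit and monitor discovery counts for 48-72 hours"

print(triage(reachable=False, xml_valid=True, urls_healthy=True, stable_24h=True))
```

Each argument maps to one of the numbered questions, so the first failing check always names the layer to work on next.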
Following this structured approach keeps sitemap issues contained and makes it obvious where to focus engineering effort: transport (network), document validity (XML rules), or URL-level health (HTTP/robots).