Crawler Stuck or Over Limit

Fix issues when Discovery/Ingestion seems stuck or you hit plan limits.

What’s happening

  • Large sites can take time to process.
  • Some pages are blocked (robots.txt, login).
  • Your plan limits may prevent more pages from being learned.

Quick checks

  • Keep the dashboard tab open during Ingestion.
  • Confirm your site is reachable in a normal browser tab (or use the quick check sketched after this list).
  • Check your plan capacity in Plan limits and upgrades.
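
If you prefer to check from a terminal, a quick standard-library Python sketch like the one below reports the HTTP status of a page. The URL is a placeholder; substitute your own homepage. An error status, or a redirect to a login page, usually means the crawler cannot learn that page either.

  import urllib.request
  import urllib.error

  URL = "https://example.com"  # placeholder: replace with your homepage

  try:
      with urllib.request.urlopen(URL, timeout=10) as resp:
          print(f"{URL} responded with HTTP {resp.status}")
  except urllib.error.HTTPError as exc:
      print(f"{URL} returned an error status: HTTP {exc.code}")
  except Exception as exc:
      print(f"{URL} could not be reached: {exc}")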

Steps

  1. Confirm Discovery finished

    • After “Add entire website,” you see a confirmation screen with Pages Found and your limits.
    • If you haven’t clicked Confirm, Ingestion won’t start.
  2. Check job progress

    • In Train Voice Agent, watch the Knowledge Base entries change from “processing” to “ready.”
    • Very large sites may take longer.
  3. Reduce scope if needed

    • Cancel the running job.
    • Restart with a smaller scope (begin at your main site URL and make sure your internal links are easy for Discovery to follow).
    • Add critical pages first; add more later.
  4. Handle plan limits

    • If Discovery found more pages than your plan allows, reduce the selection on the confirmation screen or upgrade your plan.
    • Check your current page capacity in Plan limits and upgrades.
  5. Retry unreachable pages

    • Some pages may be blocked or require login; only public pages are included. A quick way to spot them is sketched after this list.
    • For one page, use Scrape a single page.
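
If you are not sure which pages are public, a small standard-library Python sketch like the one below prints the HTTP status of each page you tried to add (the URLs are placeholders). Statuses 401 and 403 usually indicate a login or access wall, and 404 means the page no longer exists; drop or fix those pages before retrying.

  import urllib.request
  import urllib.error

  # Placeholder URLs: list the pages you tried to add.
  PAGES = [
      "https://example.com/faq",
      "https://example.com/pricing",
      "https://example.com/services",
  ]

  for url in PAGES:
      try:
          with urllib.request.urlopen(url, timeout=10) as resp:
              print(f"OK      {resp.status}  {url}")   # publicly readable
      except urllib.error.HTTPError as exc:
          print(f"BLOCKED {exc.code}  {url}")          # 401/403 usually means a login wall
      except Exception as exc:
          print(f"ERROR   ---  {url} ({exc})")         # DNS, timeout, TLS, etc.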

What you should see

  • New entries appearing in the Knowledge Base with status “processing,” then “ready.”
  • If you reduced scope, the job should complete faster.

Tips

  • Start from your homepage (https://example.com) so Discovery sees your main navigation.
  • Add high‑value pages first: FAQ, pricing, services.
  • For frequent updates, refresh only the pages that changed.

Troubleshooting

  • Discovery found more pages than allowed
    • On the confirmation screen, reduce the selection, or upgrade your plan.
  • Job won’t start
    • Make sure you clicked Confirm after Discovery.
  • Entries never become “ready”
    • Cancel and retry with fewer pages; check that the pages are publicly accessible.
  • Crawl completes instantly with a robots.txt warning and no pages learned
    • If the crawl card shows a red robots.txt warning and the job reports completion with zero pages processed, your site’s robots.txt blocked every candidate URL.
    • To resolve this:
      • Open your robots.txt file in a browser (for example, https://example.com/robots.txt).
      • Check for Disallow rules that block the sections you want your agent to learn from (the sketch after this list shows one way to test specific URLs).
      • Update robots.txt to allow crawling of those pages, then start a new crawl.
      • Alternatively, start from a different URL that is allowed by robots.txt, or use Scrape a single page for just a few important pages.
    • In the onboarding wizard, this state keeps the Next button disabled on the “Learning in progress” step until at least some pages can be learned.
    • For more details, see Crawl your website.
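
To test which URLs your robots.txt allows without guessing, you can use Python’s standard-library robots.txt parser, as in the sketch below. The site and page URLs are placeholders, and the check uses the wildcard user agent "*" because the crawler’s exact user-agent string isn’t listed here; rules scoped to a specific agent may give different results.

  from urllib.robotparser import RobotFileParser

  SITE = "https://example.com"                             # placeholder: your site
  PAGES = [f"{SITE}/", f"{SITE}/faq", f"{SITE}/pricing"]   # placeholder pages

  rp = RobotFileParser()
  rp.set_url(f"{SITE}/robots.txt")
  rp.read()                                                # fetch and parse robots.txt

  for url in PAGES:
      verdict = "allowed" if rp.can_fetch("*", url) else "blocked"
      print(f"{verdict}: {url}")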

Next steps

  • Crawl your website
  • Plan limits and upgrades
  • Scrape a single page