Crawler Stuck or Over Limit

Fix issues when Discovery/Ingestion seems stuck or you hit plan limits.

What’s happening

  • Large sites can take time to process.
  • Some pages are blocked (robots.txt, login).
  • Your plan limits may prevent more pages from being learned.

Quick checks

  • Keep the dashboard tab open during Ingestion.
  • Confirm your site is reachable in a normal browser tab (or use the quick check sketched after this list).
  • Check your plan capacity in Plan limits and upgrades.
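
If you prefer to check from a terminal, a quick standard-library Python sketch like the one below reports the HTTP status of a page. The URL is a placeholder; substitute your own homepage. An error status, or a redirect to a login page, usually means the crawler cannot learn that page either.

  import urllib.request
  import urllib.error

  URL = "https://example.com"  # placeholder: replace with your homepage

  try:
      with urllib.request.urlopen(URL, timeout=10) as resp:
          print(f"{URL} responded with HTTP {resp.status}")
  except urllib.error.HTTPError as exc:
      print(f"{URL} returned an error status: HTTP {exc.code}")
  except Exception as exc:
      print(f"{URL} could not be reached: {exc}")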

Steps

  1. Confirm Discovery finished

    • After “Add entire website,” you see a confirmation screen with Pages Found and your limits.
    • If you haven’t clicked Confirm, Ingestion won’t start.
  2. Check job progress

    • In Train Voice Agent, watch the Knowledge Base entries change from “processing” to “ready.”
    • Very large sites may take longer.
  3. Reduce scope if needed

    • Cancel the running job.
    • Restart with a smaller scope (begin at your main site URL and make sure your internal links are easy for Discovery to follow).
    • Add critical pages first; add more later.
  4. Handle plan limits

    • If Discovery found more pages than your plan allows, reduce the selection on the confirmation screen or upgrade your plan.
    • Check your current page capacity in Plan limits and upgrades.
  5. Retry unreachable pages

    • Some pages may be blocked or require login; only public pages are included. A quick way to spot them is sketched after this list.
    • For one page, use Scrape a single page.
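
If you are not sure which pages are public, a small standard-library Python sketch like the one below prints the HTTP status of each page you tried to add (the URLs are placeholders). Statuses 401 and 403 usually indicate a login or access wall, and 404 means the page no longer exists; drop or fix those pages before retrying.

  import urllib.request
  import urllib.error

  # Placeholder URLs: list the pages you tried to add.
  PAGES = [
      "https://example.com/faq",
      "https://example.com/pricing",
      "https://example.com/services",
  ]

  for url in PAGES:
      try:
          with urllib.request.urlopen(url, timeout=10) as resp:
              print(f"OK      {resp.status}  {url}")   # publicly readable
      except urllib.error.HTTPError as exc:
          print(f"BLOCKED {exc.code}  {url}")          # 401/403 usually means a login wall
      except Exception as exc:
          print(f"ERROR   ---  {url} ({exc})")         # DNS, timeout, TLS, etc.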

What you should see

  • New entries appearing in the Knowledge Base with status “processing,” then “ready.”
  • If you reduced scope, the job should complete faster.

Tips

  • Start from your homepage (https://example.com) so Discovery sees your main navigation.
  • Add high‑value pages first: FAQ, pricing, services.
  • For frequent updates, refresh only the pages that changed.

Troubleshooting

  • Discovery found more pages than allowed
    • On the confirmation screen, reduce the selection, or upgrade your plan.
  • Job won’t start
    • Make sure you clicked Confirm after Discovery.
  • Entries never become “ready”
    • Cancel and retry with fewer pages; check that the pages are publicly accessible.
  • Crawl completes instantly with a robots.txt warning and no pages learned
    • If the crawl card shows a red robots.txt warning and the job reports completion with zero pages processed, your site’s robots.txt blocked every candidate URL.
    • To resolve this:
      • Open your robots.txt file in a browser (for example, https://example.com/robots.txt).
      • Check for Disallow rules that block the sections you want your agent to learn from (the sketch after this list shows one way to test specific URLs).
      • Update robots.txt to allow crawling of those pages, then start a new crawl.
      • Alternatively, start from a different URL that is allowed by robots.txt, or use Scrape a single page for just a few important pages.
    • In the onboarding wizard, this state keeps the Next button disabled on the “Learning in progress” step until at least some pages can be learned.
    • For more details, see Crawl your website.
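
To test which URLs your robots.txt allows without guessing, you can use Python’s standard-library robots.txt parser, as in the sketch below. The site and page URLs are placeholders, and the check uses the wildcard user agent "*" because the crawler’s exact user-agent string isn’t listed here; rules scoped to a specific agent may give different results.

  from urllib.robotparser import RobotFileParser

  SITE = "https://example.com"                             # placeholder: your site
  PAGES = [f"{SITE}/", f"{SITE}/faq", f"{SITE}/pricing"]   # placeholder pages

  rp = RobotFileParser()
  rp.set_url(f"{SITE}/robots.txt")
  rp.read()                                                # fetch and parse robots.txt

  for url in PAGES:
      verdict = "allowed" if rp.can_fetch("*", url) else "blocked"
      print(f"{verdict}: {url}")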

Next steps

  • Crawl your website
  • Plan limits and upgrades
  • Scrape a single page