Crawler Stuck or Over Limit
Fix issues when Discovery/Ingestion seems stuck or you hit plan limits.
What’s happening
- Large sites can take time to process.
- Some pages are blocked (robots.txt, login).
- Your plan limits may prevent more pages from being learned.
Quick checks
- Keep the dashboard tab open during Ingestion.
- Check that your site is reachable in a normal browser tab (a quick command-line check is sketched after this list).
- Check your plan capacity in Plan limits and upgrades.
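If you prefer to verify reachability from the command line instead of a browser tab, a minimal sketch like the one below (Python standard library only; https://example.com stands in for your own site URL) reports the HTTP status or the connection error:

```python
# Minimal reachability check using only the Python standard library.
# Replace the URL with your own site; any successful response means the
# crawler should at least be able to connect.
import urllib.request
import urllib.error

url = "https://example.com"  # assumption: replace with your site URL

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        print(f"{url} responded with HTTP {response.status}")
except urllib.error.HTTPError as err:
    # The server answered, but with an error status (e.g. 403, 500).
    print(f"{url} responded with HTTP {err.code}")
except urllib.error.URLError as err:
    # DNS failure, refused connection, TLS problem, timeout, etc.
    print(f"Could not reach {url}: {err.reason}")
```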
Steps
Confirm Discovery finished
- After “Add entire website,” you see a confirmation screen with Pages Found and your limits.
- If you haven’t clicked Confirm, Ingestion won’t start.
Check job progress
- In Train Voice Agent, watch the Knowledge Base entries change from “processing” to “ready.”
- Very large sites may take longer.
Reduce scope if needed
- Cancel the running job.
- Restart with a smaller scope (begin at your main site URL and make sure your internal links are easy to follow; a link-listing sketch follows this list).
- Add critical pages first; add more later.
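To sanity-check what Discovery could find from the URL you restart at, you can list the internal links on that page yourself. This is a rough sketch using only the Python standard library; it assumes your navigation uses plain `<a href>` tags, so JavaScript-rendered menus won't appear:

```python
# Rough sketch: list same-site links found on one page, so you can see
# what a crawler starting from that URL could discover.
# Assumes navigation uses plain <a href> tags (not JavaScript-only menus).
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

START_URL = "https://example.com"  # assumption: your main site URL

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(START_URL, href))

with urllib.request.urlopen(START_URL, timeout=10) as response:
    html = response.read().decode("utf-8", errors="replace")

collector = LinkCollector()
collector.feed(html)

site_host = urlparse(START_URL).netloc
internal = sorted(u for u in collector.links if urlparse(u).netloc == site_host)
for link in internal:
    print(link)
print(f"{len(internal)} internal links found on {START_URL}")
```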
Handle plan limits
- If over capacity, remove older sources in the Knowledge Base to free space, or upgrade your plan in the Customer Portal.
- See details in Plan limits and upgrades.
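If you're unsure whether your site fits within your plan's page limit, counting the URLs in your sitemap gives a rough estimate before you start a crawl. The sketch below assumes a standard XML sitemap at /sitemap.xml and does not expand sitemap index files; the plan limit shown is a placeholder you should replace with your own:

```python
# Rough page-count estimate from a standard XML sitemap.
# Assumes a sitemap exists at /sitemap.xml; sitemap index files that
# point to further sitemaps are not expanded in this sketch.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # assumption: standard location
PLAN_LIMIT = 100  # placeholder: replace with your plan's page limit

with urllib.request.urlopen(SITEMAP_URL, timeout=10) as response:
    tree = ET.parse(response)

# Sitemap entries live in the standard sitemap namespace.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = tree.findall(".//sm:url/sm:loc", ns)

print(f"{len(urls)} URLs listed in {SITEMAP_URL}")
if len(urls) > PLAN_LIMIT:
    print(f"Over the {PLAN_LIMIT}-page limit: reduce scope or upgrade your plan.")
```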
Retry unreachable pages
- Some pages may be blocked or require login; only public pages are included (a quick accessibility check is sketched after this list).
- For one page, use Scrape a single page.
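To check whether a specific page is actually public, request it without any credentials and look at the result. This is a minimal sketch; it assumes a gated page either returns 401/403 or redirects to a URL containing "login", which may not match every site, and the page URL is only an example:

```python
# Check whether a page is publicly accessible (no cookies, no login).
# 401/403 responses, or a redirect to a login page, usually mean the
# crawler cannot learn that page either.
import urllib.request
import urllib.error

PAGE_URL = "https://example.com/pricing"  # hypothetical page to test

request = urllib.request.Request(PAGE_URL, headers={"User-Agent": "public-check/1.0"})
try:
    with urllib.request.urlopen(request, timeout=10) as response:
        final_url = response.geturl()
        if final_url != PAGE_URL and "login" in final_url.lower():
            print(f"Redirected to {final_url}: page appears to require login.")
        else:
            print(f"HTTP {response.status}: page looks publicly accessible.")
except urllib.error.HTTPError as err:
    if err.code in (401, 403):
        print(f"HTTP {err.code}: page is blocked or requires login.")
    else:
        print(f"HTTP {err.code}: page returned an error.")
```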
What you should see
- New entries appearing in the Knowledge Base with status “processing,” then “ready.”
- If you reduced scope, the job should complete faster.
Tips
- Start from your homepage (https://example.com) so Discovery sees your main navigation.
- Add high‑value pages first: FAQ, pricing, services.
- For frequent updates, refresh only the pages that changed.
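One lightweight way to spot which pages changed since your last refresh is to compare the Last-Modified or ETag headers against values you saved previously. The sketch below assumes your server sends those headers (if it doesn't, you'd need to compare page content instead); the saved values shown are hypothetical:

```python
# Sketch: detect changed pages by comparing Last-Modified / ETag headers
# against values saved from the previous refresh.
# Assumes the server sends these headers; otherwise compare content hashes.
import urllib.request

# Hypothetical values captured during the last refresh (replace with your own).
last_seen = {
    "https://example.com/pricing": {
        "etag": '"abc123"',
        "last_modified": "Mon, 01 Jan 2024 00:00:00 GMT",
    },
}

for url, previous in last_seen.items():
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        etag = response.headers.get("ETag")
        modified = response.headers.get("Last-Modified")
    changed = etag != previous["etag"] or modified != previous["last_modified"]
    print(f"{url}: {'changed, refresh it' if changed else 'unchanged'}")
```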
Troubleshooting
- Discovery found more pages than allowed
- On the confirmation screen, reduce the selection, or upgrade your plan.
- Job won’t start
- Make sure you clicked Confirm after Discovery.
- Entries never become “ready”
- Cancel and retry with fewer pages; check that the pages are publicly accessible.
- Crawl completes instantly with a robots.txt warning and no pages learned
- If the crawl card shows a red warning about robots.txt and the job says it completed but zero pages were processed, your site’s robots.txt blocked all candidate URLs. To resolve this:
  - Open your robots.txt file in a browser (for example, https://example.com/robots.txt).
  - Check for Disallow rules that block the sections you want your agent to learn from.
  - Update robots.txt to allow crawling of those pages, then start a new crawl.
  - Alternatively, start from a different URL that is allowed by robots.txt, or use Scrape a single page for just a few important pages.
- In the onboarding wizard, this state keeps the Next button disabled on the “Learning in progress” step until at least some pages can be learned.
- For more details, see Crawl your website.
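To see exactly which URLs your robots.txt blocks, Python's standard urllib.robotparser evaluates the same Disallow rules a crawler would. A small sketch: the crawler's real user-agent string may differ, so "*" is used here to cover the default rule group, and the candidate URLs are examples:

```python
# Check which URLs robots.txt allows or blocks, using the standard
# library's robots.txt parser. The real crawler's user-agent may differ;
# "*" evaluates the default rule group.
from urllib import robotparser

parser = robotparser.RobotFileParser("https://example.com/robots.txt")
parser.read()

candidate_urls = [  # hypothetical pages you want the agent to learn
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/faq",
]

for url in candidate_urls:
    allowed = parser.can_fetch("*", url)
    print(f"{'allowed' if allowed else 'BLOCKED'}: {url}")
```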
Next steps
- Add single high‑value pages with Scrape a single page.
- Review limits in Plan limits and upgrades.
