Crawl Your Website
Discover the pages on your site and add their content to your agent’s Knowledge Base.
Before you begin
- The site is public (no login required).
- You know your main site URL (for example: https://example.com).
- You are aware that your plan may limit the number of pages per crawl and the total number of pages per billing period.
What this does
Crawling has two phases:
- Discovery: We scan your site to find pages. You see how many pages were found and how this compares to your plan limits.
- Ingestion: After you confirm, we fetch the content from those pages and add it to your agent’s Knowledge Base.
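The FAQ below notes that content counts toward your limits only once Ingestion begins. If you want to sanity-check the numbers on the Discovery screen, the arithmetic could look like this Python sketch. It is an assumption, not Babelbeez’s actual rule: it supposes each ingested page counts once against a per-crawl limit and once against a per-billing-period allowance (the two limits named under Before you begin). `pages_that_fit` and its parameters are hypothetical names.

```python
def pages_that_fit(pages_found: int, per_crawl_limit: int,
                   period_limit: int, pages_used_this_period: int) -> int:
    """Hypothetical estimate of how many discovered pages fit under both limits.

    Assumes each ingested page counts once against the per-crawl limit and
    once against the billing-period allowance, and that Discovery itself
    counts against neither.
    """
    # Allowance left this period; never negative.
    remaining_this_period = max(period_limit - pages_used_this_period, 0)
    # The tightest of the three numbers decides how many pages can be ingested.
    return min(pages_found, per_crawl_limit, remaining_this_period)


# Made-up numbers: 340 pages found, 200 per crawl, 500 per period, 400 already used.
print(pages_that_fit(340, 200, 500, 400))  # 100
```

In this example the remaining period allowance (100 pages) is the tightest limit, even though the per-crawl limit is 200.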
Steps
- Open your agent in the Babelbeez dashboard.
- Go to Train Voice Agent.
- Choose Add entire website.
- Enter your site URL (for example, https://example.com) and click Start.
- Review the Discovery results:
  - Pages found on your site.
  - Your plan limits (sources per agent and total pages this period).
  - What will be ingested if you continue.
- Click Confirm to start Ingestion (learning the content).
  - You’ll see progress and each page added to the Knowledge Base as it completes.
What you should see
- A confirmation screen after Discovery with the number of pages found.
- New sources in the Knowledge Base list, each with a status that changes to “ready” when processing completes.
Tips
- Start with your main site URL (https://example.com), not a deep page.
- Use clear internal links on your site so important pages are discovered.
- If you only need a few pages, use Scrape a single page instead.
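To see why the starting URL and internal links matter, here is a simplified Python sketch of link-based discovery. It is an illustration under stated assumptions, not Babelbeez’s crawler: it follows only `<a href>` links that stay on the starting host, reads pages from a made-up in-memory dictionary instead of the network, and the names `discover`, `LinkCollector`, and `SITE` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(start_url: str, fetch: Callable[[str], Optional[str]]) -> set[str]:
    """Breadth-first discovery that follows links staying on start_url's host.

    `fetch` returns a page's HTML, or None if the page can't be read.
    """
    host = urlparse(start_url).netloc
    found = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreadable page: keep the URL, but its links can't be followed
        parser = LinkCollector()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute URL without #fragment
            if urlparse(link).netloc == host and link not in found:
                found.add(link)
                queue.append(link)
    return found


# A tiny made-up "site" held in memory instead of real HTTP requests.
SITE = {
    "https://example.com/": '<a href="/pricing">Pricing</a> <a href="/docs/">Docs</a>',
    "https://example.com/pricing": '<a href="/">Home</a>',
    "https://example.com/docs/": '<a href="setup">Setup</a> <a href="https://other.org/">Elsewhere</a>',
    "https://example.com/docs/setup": "<p>Setup guide</p>",
    "https://example.com/hidden": "<p>Nothing links to this page.</p>",
}

print(sorted(discover("https://example.com/", SITE.get)))
# ['https://example.com/', 'https://example.com/docs/', 'https://example.com/docs/setup', 'https://example.com/pricing']
print(sorted(discover("https://example.com/docs/setup", SITE.get)))
# ['https://example.com/docs/setup']
```

In this sketch, `/hidden` is never found because no page links to it, and starting from the deep page `/docs/setup` finds only that page. The off-site link to `other.org` is ignored.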
Manage jobs
- Cancel a running crawl
  - Use the cancel option shown during Ingestion to stop the job.
- Delete sources
  - In the Knowledge Base list, remove individual sources you no longer need.
Troubleshooting
- Too many pages found
  - Discovery found more pages than your plan allows. Reduce the scope (for example, by linking to fewer pages) or upgrade your plan. See Plan limits and upgrades.
- Crawl seems stuck
  - Check your internet connection and leave the tab open. Very large sites can take time. If needed, cancel and try again with a smaller scope.
- Pages missing
  - Some pages may be blocked by robots.txt or require a login. Only public pages are included.
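To check whether your own robots.txt blocks a page, you can use Python’s standard `urllib.robotparser` module, as in the sketch below. This page doesn’t say which user-agent name the Babelbeez crawler sends, so the sketch checks the generic `User-agent: *` rules only. A crawler matched by a more specific rule could get a different answer. The robots.txt content, URLs, and the `is_allowed` helper are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; paste your site's real file here to check it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network needed
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "https://example.com/pricing"))         # True
print(is_allowed(ROBOTS_TXT, "https://example.com/private/report"))  # False
```

To check the live file instead of pasted text, `RobotFileParser` also provides `set_url()` and `read()`.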
FAQ
- Will Discovery count against my limits?
  - No. Discovery only finds pages. Content counts toward your limits after you confirm and Ingestion begins.
- Can I crawl subdomains?
  - Crawling stays within the domain you provide. To include a subdomain, run a separate crawl for it (for example, blog.example.com).
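One way to picture “stays within the domain you provide” is an exact host match, shown in the Python sketch below. This is an assumption for illustration only. In particular, this page doesn’t say how `www.` versus the bare domain is handled. `in_crawl_scope` is a hypothetical helper, not part of Babelbeez.

```python
from urllib.parse import urlparse


def in_crawl_scope(start_url: str, candidate_url: str) -> bool:
    """Hypothetical scope rule: a URL is in scope only on start_url's exact host."""
    # .hostname is lowercased and excludes any port, so the comparison is by host only.
    return urlparse(candidate_url).hostname == urlparse(start_url).hostname


print(in_crawl_scope("https://example.com", "https://example.com/pricing"))          # True
print(in_crawl_scope("https://example.com", "https://blog.example.com/hello"))       # False
print(in_crawl_scope("https://blog.example.com", "https://blog.example.com/hello"))  # True
```

Under this rule, a crawl started at example.com never reaches blog.example.com, which is why the subdomain needs its own crawl.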
Next steps
- Review the Knowledge Base entries when they show “ready.”
- Test answers in Live Preview.
- Add more pages later with another crawl or Scrape a single page.