Crawl Your Website

Discover the pages on your site and add their content to your agent’s Knowledge Base.

Before you begin

  • The site is public (no login required).
  • You know your main site URL (for example: https://example.com).
  • You understand that your plan may limit pages per crawl and total pages per billing period.

What this does

Crawling has two phases:

  • Discovery: We scan your site to find pages. You see how many pages were found and how this compares to your plan limits.
  • Ingestion: After you confirm, we fetch the content from those pages and add it to your agent’s Knowledge Base.
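
To make the Discovery phase more concrete, here is a minimal sketch of how a crawler might find same-site links on a single page. It is an illustration only, not Babelbeez's actual implementation; the `LinkCollector` class, the `discover_links` helper, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(page_url, html):
    """Return the sorted, de-duplicated same-host URLs linked from one page.

    Hypothetical helper: a real crawler would repeat this for each new URL.
    """
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" suffixes.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        # Keep only web pages on exactly the same host as the start URL.
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return sorted(found)


sample_html = """
<a href="/pricing">Pricing</a>
<a href="about#team">About</a>
<a href="https://blog.example.com/post">Blog</a>
<a href="mailto:hi@example.com">Email</a>
"""
print(discover_links("https://example.com/", sample_html))
# → ['https://example.com/about', 'https://example.com/pricing']
```

In this sketch the link to blog.example.com is skipped because its host differs from the start URL, and pages that nothing links to are never found.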

Steps

  1. Open your agent in the Babelbeez dashboard.
  2. Go to Train Voice Agent.
  3. Choose Add entire website.
  4. Enter your site URL (for example: https://example.com). Click Start.
  5. Review the Discovery results:
    • Pages found on your site.
    • Your plan limits (sources per agent and total pages this period).
    • What will be ingested if you continue.
  6. Click Confirm to start Ingestion (learning the content).
  7. You’ll see progress and each page added to the Knowledge Base as it completes.
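
The comparison on the Discovery results screen (step 5) can be thought of as a simple check of the page count against the tighter of two limits. The sketch below assumes a per-crawl limit and a remaining allowance for the billing period; the function name, parameter names, and numbers are hypothetical, and your actual limits depend on your plan.

```python
def fits_plan(pages_found, per_crawl_limit, pages_left_this_period):
    """Return (ok, allowed): whether the crawl fits, and the tighter limit.

    Hypothetical helper with made-up limits, for illustration only.
    """
    allowed = min(per_crawl_limit, pages_left_this_period)
    return pages_found <= allowed, allowed


ok, allowed = fits_plan(
    pages_found=120, per_crawl_limit=100, pages_left_this_period=250
)
print(ok, allowed)
# → False 100
```

Here 120 pages were found but only 100 fit, so the crawl would need a smaller scope or a higher plan.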

What you should see

  • A confirmation screen after Discovery with the number of pages found.
  • New sources appear in the Knowledge Base list with a status that changes to “ready” when Ingestion completes.

Tips

  • Start with your main site URL (https://example.com), not a deep page.
  • Use clear internal links on your site so important pages are discovered.
  • If you only need a few pages, consider Scrape a single page.

Manage jobs

  • Cancel a running crawl
    • Use the cancel option shown during Ingestion to stop the job.
  • Delete sources
    • In the Knowledge Base list, remove individual sources you no longer need.

Troubleshooting

  • Too many pages found
    • Discovery found more pages than your plan allows. Reduce the scope (link fewer pages) or upgrade your plan. See Plan limits and upgrades.
  • Crawl seems stuck
    • Check your internet connection and leave the tab open. Very large sites can take time. If needed, cancel and try again with a smaller scope.
  • Pages missing
    • Some pages may be blocked by robots.txt or require login. Only public pages are included.
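
If you suspect robots.txt is the reason a page is missing, you can test a rule set yourself with Python's standard `urllib.robotparser` module. This is a minimal sketch: the rules, the `ExampleBot` user agent, and the `is_allowed` helper are hypothetical, and the user agent the crawler actually sends is not documented here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste your own site's rules here.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://example.com/pricing"))
# → True
print(is_allowed("https://example.com/private/report"))
# → False
```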

FAQ

  • Will Discovery count against my limits?
    • No. Discovery only finds pages. Content counts toward your limits only after you confirm and Ingestion begins.
  • Can I crawl subdomains?
    • Crawling stays within the domain you provide. To include subdomains, run separate crawls for each (for example: blog.example.com).

© 2025 Babelbeez. All rights reserved.