User-agent: * # -- Tier A: Indexable SEO pages (index + follow) ---------------------------- Allow: / Allow: /index.html Allow: /features.html Allow: /how-it-works.html Allow: /about.html Allow: /sitemap.html Allow: /sitemap.xml # -- Tier B: App layer (noindex via meta tag; excluded for crawl budget) ------ Disallow: /app/ # -- Backend directory (Python source, not served as HTML) -------------------- Disallow: /backend/ # -- Static assets (no SEO value) -------------------------------------------- Disallow: /static/ # -- API endpoints (hosted on HuggingFace) ------------------------------------ Disallow: /crawl Disallow: /crawl-status Disallow: /crawl-stream/ Disallow: /results Disallow: /analyze Disallow: /optimize Disallow: /optimize-status Disallow: /technical-seo Disallow: /site-audit Disallow: /competitor/ Disallow: /gemini-status Disallow: /ai-config Disallow: /set-api-key Disallow: /export Disallow: /healthz Disallow: /serp Disallow: /serp-track Disallow: /serp_engine Disallow: /docs Disallow: /redoc Disallow: /openapi.json Disallow: /api/ # -- Debug / dev paths -------------------------------------------------------- Disallow: /debug/ Disallow: /?debug= Disallow: /?test= # -- Infinite URL spaces via query parameters --------------------------------- Disallow: /*?job_id= Disallow: /*?format= Sitemap: https://bhavani5a8.github.io/crawliq.io/sitemap.xml # -- Lighthouse audit bot (prevent quota drain) -------------------------------- User-agent: Chrome-Lighthouse Disallow: /