CrawlIQ
A free, open-source technical SEO evaluation reference — built as a professional portfolio project to address a real gap in accessible SEO evaluation methodology.
Why CrawlIQ Exists
Professional technical SEO evaluation systems — Screaming Frog, Ahrefs, Semrush — are excellent but cost hundreds of dollars per month. Freelancers, indie developers, and small businesses often cannot justify that cost for occasional evaluations.
CrawlIQ was built to give every website owner access to the same depth of technical assessment, powered by modern AI models, at zero cost. No subscription. No credit card. No page-limit paywall on the core evaluation engine. Provide a URL and receive a complete technical assessment.
The project also serves as a working demonstration of what a Technical SEO Specialist can build: combining deep domain knowledge with full-stack Python development, async web evaluation, AI integration, and production deployment on HuggingFace Spaces.
Technical Architecture
CrawlIQ is a two-layer reference system: a static GitHub Pages frontend and a FastAPI backend hosted on HuggingFace Spaces.
Frontend (GitHub Pages)
Pure HTML, CSS, and vanilla JavaScript. No framework dependencies. The landing page and reference shell are fully static — they render meaningful content without waiting for the backend, which is critical for Googlebot's first-pass indexing of a JavaScript-heavy reference page.
Performance decisions include deferred script loading, non-blocking CSS via preload + onload, lazy-loaded analytics, and a cold-start fallback notice that appears when the HuggingFace free-tier backend is waking up.
Backend (FastAPI on HuggingFace Spaces)
The backend is a Python FastAPI application with 40+ endpoints covering site evaluation, AI remediation, competitor signal comparison, position tracking, and Excel export. Key libraries:
- aiohttp — async HTTP client for the BFS evaluation engine; handles SSL failures with automatic HTTP fallback
- BeautifulSoup4 — HTML parsing for on-page SEO signal extraction
- pandas + openpyxl — structured Excel report generation
- Server-Sent Events (SSE) — real-time evaluation progress streamed to the browser without WebSocket overhead
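The BFS (breadth-first) traversal at the heart of the evaluation engine can be sketched as follows. This is an illustrative sketch, not CrawlIQ's actual code: the link fetcher is stubbed with an in-memory graph where the real engine would issue concurrent aiohttp requests.

```python
import asyncio
from collections import deque

async def bfs_crawl(start_url, fetch_links, max_pages=50):
    """Breadth-first crawl: visit pages level by level, up to max_pages.
    fetch_links is an async callable returning a page's outlinks."""
    seen = {start_url}
    frontier = deque([start_url])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in await fetch_links(url):
            if link not in seen:        # deduplicate before enqueueing
                seen.add(link)
                frontier.append(link)
    return order

# Stubbed link graph standing in for real HTTP fetches
graph = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}

async def fake_fetch(url):
    return graph.get(url, [])

print(asyncio.run(bfs_crawl("/", fake_fetch)))  # → ['/', '/a', '/b', '/c']
```

Breadth-first order matters for SEO evaluation: pages closest to the start URL (usually the homepage) are crawled first, so a page-limited run still covers the most link-prominent pages.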
AI Integration
CrawlIQ routes AI requests to any of five backends, selected via environment variable: Groq (default, free), Google Gemini, Anthropic Claude, OpenAI, or a deterministic rules-based fallback that works with no API key at all. The rules-based fallback ensures the reference system always produces useful remediation output, even on HuggingFace's free tier where API keys are not set.
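The selection logic might look roughly like this. The environment variable names below are assumptions for illustration, not CrawlIQ's documented configuration:

```python
import os

PROVIDERS = ("groq", "gemini", "claude", "openai")

def pick_provider() -> str:
    """Choose an AI backend from the environment; fall back to the
    deterministic rules engine when no provider or key is configured.
    (AI_PROVIDER / *_API_KEY are illustrative variable names.)"""
    requested = os.environ.get("AI_PROVIDER", "").lower()
    if requested in PROVIDERS and os.environ.get(f"{requested.upper()}_API_KEY"):
        return requested
    return "rules"  # zero-key fallback, always available
```

The key design point is that the fallback is the default path, not an error path: an unset or misconfigured environment still yields a working remediation engine.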
Open Source
CrawlIQ is fully open-source under the MIT licence. The complete frontend and backend source code is available on GitHub. Contributions, bug reports, and feature requests are welcome.
Hosting
- Frontend: GitHub Pages — free static hosting with automatic deployment on every main branch push
- Backend API: HuggingFace Spaces — containerised FastAPI application running in Docker with a uvicorn server on port 7860
SEO Architecture of CrawlIQ Itself
Building an SEO evaluation reference creates an unusual constraint: the reference site itself must demonstrate the same SEO practices it documents. CrawlIQ's frontend is designed as a live reference implementation of technical SEO best practices.
The reference site uses a strict three-tier architecture:
- Tier A — Indexed documentation pages: The homepage, Capabilities, Methodology, and About pages. Each has a unique title tag, meta description, canonical URL, structured data schema, and 1,000+ words of crawlable body content. No interactive widgets or app-layer UI in these pages' DOM.
- Tier B — App layer (noindex): The /app/assessment interface is marked noindex, follow. It contains the full assessment interface but carries no SEO content, schemas, or canonical signals. Googlebot reads it but does not index it — intentional separation of application UI from ranking documentation content.
- Tier C — Blocked: Backend API endpoints, static assets, and admin paths are blocked in robots.txt and excluded from the XML sitemap.
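A robots.txt implementing that tier split might look like the following. The paths and domain are illustrative placeholders, not CrawlIQ's actual file:

```
User-agent: *
Disallow: /api/
Disallow: /static/
Disallow: /admin/

Sitemap: https://example.github.io/crawliq/sitemap.xml
```

Note that the Tier B app layer is deliberately not listed here: robots.txt blocking would prevent Googlebot from ever seeing the noindex directive, so Tier B stays crawlable and carries the noindex in its own markup instead.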
Every JSON-LD schema block is placed only on pages where it accurately describes the content: WebPage and FAQPage on the homepage, HowTo on the How It Works page, FAQPage where FAQ content exists, and BreadcrumbList on all Tier A sub-pages. This mirrors the schema discipline expected of any serious SEO implementation.
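As an example of that discipline, a BreadcrumbList on a Tier A sub-page might look like this (URLs and page names are illustrative placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "Home",
     "item": "https://example.github.io/crawliq/"},
    {"@type": "ListItem", "position": 2, "name": "Methodology",
     "item": "https://example.github.io/crawliq/methodology"}
  ]
}
```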
Skills Demonstrated by Building CrawlIQ
CrawlIQ was designed to be a concrete, verifiable demonstration of Technical SEO and full-stack development skills applied to a real problem. The areas covered include:
- Technical SEO: Crawl budget management, canonical architecture, structured data implementation, Core Web Vitals optimisation, robots.txt strategy, XML sitemap management, and schema discipline across a multi-page site
- Async Python: BFS evaluation engine built with aiohttp and asyncio; concurrent HTTP processing with per-domain rate limiting and SSL fallback handling
- API design: FastAPI REST endpoints for site evaluation, AI remediation, position tracking, competitor comparison, and Excel export; Server-Sent Events for real-time evaluation progress
- AI integration: Multi-provider LLM routing (Groq, Gemini, Claude, OpenAI) with structured prompt engineering for SEO-specific remediation output and a deterministic rules-based fallback for zero-key environments
- Frontend: Vanilla JavaScript SPA-like reference interface with panel routing, localStorage state, no framework dependencies, and full progressive enhancement for Googlebot compatibility
- DevOps: GitHub Actions CI/CD pipeline deploying the static GitHub Pages frontend and triggering HuggingFace Space rebuilds on every main branch push
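The per-domain rate limiting mentioned above can be sketched with one asyncio.Semaphore per hostname. This is a minimal sketch under assumed names, not CrawlIQ's actual implementation:

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse

class DomainLimiter:
    """Cap concurrent requests per hostname so one site is never hammered,
    while unrelated domains still crawl in parallel."""
    def __init__(self, per_domain: int = 2):
        # One semaphore per hostname, created lazily on first use
        self._limits = defaultdict(lambda: asyncio.Semaphore(per_domain))

    async def fetch(self, url, do_fetch):
        host = urlparse(url).netloc
        async with self._limits[host]:   # at most per_domain in flight per host
            return await do_fetch(url)

async def demo():
    async def slow_fetch(url):           # stand-in for a real HTTP request
        await asyncio.sleep(0.01)
        return url
    limiter = DomainLimiter(per_domain=2)
    urls = [f"http://a.test/{i}" for i in range(4)]
    return await asyncio.gather(*(limiter.fetch(u, slow_fetch) for u in urls))

print(asyncio.run(demo()))
```

Keying the semaphores by hostname rather than using one global limit is what lets a multi-site competitor comparison stay fast without any single origin seeing a burst of traffic.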
Honest Limitations
CrawlIQ is a free reference system hosted on HuggingFace's free tier. These constraints are worth knowing before using it for production evaluations:
- Cold start: HuggingFace free-tier containers sleep after 15 minutes of inactivity. The first request after a sleep period takes 30–90 seconds to respond. A cold-start notice appears in the app automatically.
- Page limit: The free unauthenticated tier is capped at 50 pages per evaluation. Free accounts extend this to 200 pages. Sites with thousands of pages will need Screaming Frog or Ahrefs Site Audit for full coverage.
- JavaScript rendering: CrawlIQ uses HTML-only evaluation. Fully JavaScript-rendered SPAs that serve empty HTML shells will show limited on-page data. Server-side rendered and static sites receive complete evaluation coverage.
- No backlink data: CrawlIQ analyses on-page and technical signals only. Backlink profiles and domain authority data require a commercial API (Ahrefs, Moz, Majestic) that is not integrated in the free version.
Contact and Feedback
CrawlIQ is an active project. Bug reports, feature suggestions, and pull requests are welcome on the GitHub repository. For professional enquiries, connect via LinkedIn.