CrawlIQ
A free, open-source technical SEO evaluation reference — built as a professional portfolio project to address a real gap in accessible SEO evaluation methodology.
Why CrawlIQ Exists
Professional technical SEO evaluation systems — Screaming Frog, Ahrefs, Semrush — are excellent but cost hundreds of dollars per month. Freelancers, indie developers, and small businesses often cannot justify that cost for occasional evaluations.
CrawlIQ was built to give every website owner access to the same depth of technical assessment, powered by modern AI models, at zero cost. No subscription. No credit card. No page-limit paywall on the core evaluation engine. Provide a URL and receive a complete technical assessment.
The project also serves as a working demonstration of what a Technical SEO Specialist can build: combining deep domain knowledge with full-stack Python development, async web evaluation, AI integration, and production deployment on HuggingFace Spaces.
Technical Architecture
CrawlIQ is a two-layer reference system: a static GitHub Pages frontend and a FastAPI backend hosted on HuggingFace Spaces.
Frontend (GitHub Pages)
Pure HTML, CSS, and vanilla JavaScript. No framework dependencies. The landing page and reference shell are fully static — they render meaningful content without waiting for the backend, which is critical for Googlebot's first-pass indexing of a JavaScript-heavy reference page.
Performance decisions include deferred script loading, non-blocking CSS via preload + onload, lazy-loaded analytics, and a cold-start fallback notice that appears when the HuggingFace free-tier backend is waking up.
Backend (FastAPI on HuggingFace Spaces)
The backend is a Python FastAPI application with 40+ endpoints covering site evaluation, AI remediation, competitor signal comparison, position tracking, and Excel export. Key libraries:
- aiohttp — async HTTP client for the BFS evaluation engine; handles SSL failures with automatic HTTP fallback
- BeautifulSoup4 — HTML parsing for on-page SEO signal extraction
- pandas + openpyxl — structured Excel report generation
- Server-Sent Events (SSE) — real-time evaluation progress streamed to the browser without WebSocket overhead
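The BFS (breadth-first) traversal at the heart of the evaluation engine can be sketched as follows. This is an illustrative sketch, not CrawlIQ's actual code: the link fetcher is stubbed with an in-memory graph where the real engine would issue concurrent aiohttp requests.

```python
import asyncio
from collections import deque

async def bfs_crawl(start_url, fetch_links, max_pages=50):
    """Breadth-first crawl: visit pages level by level, up to max_pages.
    fetch_links is an async callable returning a page's outlinks."""
    seen = {start_url}
    frontier = deque([start_url])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in await fetch_links(url):
            if link not in seen:        # deduplicate before enqueueing
                seen.add(link)
                frontier.append(link)
    return order

# Stubbed link graph standing in for real HTTP fetches
graph = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}

async def fake_fetch(url):
    return graph.get(url, [])

print(asyncio.run(bfs_crawl("/", fake_fetch)))  # → ['/', '/a', '/b', '/c']
```

Breadth-first order matters for SEO evaluation: pages closest to the start URL (usually the homepage) are crawled first, so a page-limited run still covers the most link-prominent pages.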
AI Integration
CrawlIQ routes AI requests to any of five backends, selected via environment variable: Groq (default, free), Google Gemini, Anthropic Claude, OpenAI, or a deterministic rules-based fallback that works with no API key at all. The rules-based fallback ensures the reference system always produces useful remediation output, even on HuggingFace's free tier where API keys are not set.
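The selection logic might look roughly like this. The environment variable names below are assumptions for illustration, not CrawlIQ's documented configuration:

```python
import os

PROVIDERS = ("groq", "gemini", "claude", "openai")

def pick_provider() -> str:
    """Choose an AI backend from the environment; fall back to the
    deterministic rules engine when no provider or key is configured.
    (AI_PROVIDER / *_API_KEY are illustrative variable names.)"""
    requested = os.environ.get("AI_PROVIDER", "").lower()
    if requested in PROVIDERS and os.environ.get(f"{requested.upper()}_API_KEY"):
        return requested
    return "rules"  # zero-key fallback, always available
```

The key design point is that the fallback is the default path, not an error path: an unset or misconfigured environment still yields a working remediation engine.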
Open Source
CrawlIQ is fully open-source under the MIT licence. The complete frontend and backend source code is available on GitHub. Contributions, bug reports, and feature requests are welcome.
Hosting
- Frontend: GitHub Pages — free static hosting with automatic deployment on every main branch push
- Backend API: HuggingFace Spaces — containerised FastAPI application running in Docker with a uvicorn server on port 7860
SEO Architecture of CrawlIQ Itself
Building an SEO evaluation reference creates an unusual constraint: the reference site itself must demonstrate the same SEO practices it documents. CrawlIQ's frontend is designed as a live reference implementation of technical SEO best practices.
The reference site uses a strict three-tier architecture:
- Tier A — Indexed documentation pages: The homepage, Capabilities, Methodology, and About pages. Each has a unique title tag, meta description, canonical URL, structured data schema, and 1,000+ words of crawlable body content. No interactive widgets or app-layer UI in these pages' DOM.
- Tier B — App layer (noindex): The /app/assessment interface is marked noindex, follow. It contains the full assessment interface but carries no SEO content, schemas, or canonical signals. Googlebot reads it but does not index it — intentional separation of application UI from ranking documentation content.
- Tier C — Blocked: Backend API endpoints, static assets, and admin paths are blocked in robots.txt and excluded from the XML sitemap.
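A robots.txt implementing that tier split might look like the following. The paths and domain are illustrative placeholders, not CrawlIQ's actual file:

```
User-agent: *
Disallow: /api/
Disallow: /static/
Disallow: /admin/

Sitemap: https://example.github.io/crawliq/sitemap.xml
```

Note that the Tier B app layer is deliberately not listed here: robots.txt blocking would prevent Googlebot from ever seeing the noindex directive, so Tier B stays crawlable and carries the noindex in its own markup instead.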
Every JSON-LD schema block is placed only on pages where it accurately describes the content: WebPage and FAQPage on the homepage, HowTo on the How It Works page, FAQPage where FAQ content exists, and BreadcrumbList on all Tier A sub-pages. This mirrors the schema discipline expected of any serious SEO implementation.
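As an example of that discipline, a BreadcrumbList on a Tier A sub-page might look like this (URLs and page names are illustrative placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "Home",
     "item": "https://example.github.io/crawliq/"},
    {"@type": "ListItem", "position": 2, "name": "Methodology",
     "item": "https://example.github.io/crawliq/methodology"}
  ]
}
```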
Skills Demonstrated by Building CrawlIQ
CrawlIQ was designed to be a concrete, verifiable demonstration of Technical SEO and full-stack development skills applied to a real problem. The areas covered include:
- Technical SEO: Crawl budget management, canonical architecture, structured data implementation, Core Web Vitals optimisation, robots.txt strategy, XML sitemap management, and schema discipline across a multi-page site
- Async Python: BFS evaluation engine built with aiohttp and asyncio; concurrent HTTP processing with per-domain rate limiting and SSL fallback handling
- API design: FastAPI REST endpoints for site evaluation, AI remediation, position tracking, competitor comparison, and Excel export; Server-Sent Events for real-time evaluation progress
- AI integration: Multi-provider LLM routing (Groq, Gemini, Claude, OpenAI) with structured prompt engineering for SEO-specific remediation output and a deterministic rules-based fallback for zero-key environments
- Frontend: Vanilla JavaScript SPA-like reference interface with panel routing, localStorage state, no framework dependencies, and full progressive enhancement for Googlebot compatibility
- DevOps: GitHub Actions CI/CD pipeline deploying the static GitHub Pages frontend and triggering HuggingFace Space rebuilds on every main branch push
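The per-domain rate limiting mentioned above can be sketched with one asyncio.Semaphore per hostname. This is a minimal sketch under assumed names, not CrawlIQ's actual implementation:

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse

class DomainLimiter:
    """Cap concurrent requests per hostname so one site is never hammered,
    while unrelated domains still crawl in parallel."""
    def __init__(self, per_domain: int = 2):
        # One semaphore per hostname, created lazily on first use
        self._limits = defaultdict(lambda: asyncio.Semaphore(per_domain))

    async def fetch(self, url, do_fetch):
        host = urlparse(url).netloc
        async with self._limits[host]:   # at most per_domain in flight per host
            return await do_fetch(url)

async def demo():
    async def slow_fetch(url):           # stand-in for a real HTTP request
        await asyncio.sleep(0.01)
        return url
    limiter = DomainLimiter(per_domain=2)
    urls = [f"http://a.test/{i}" for i in range(4)]
    return await asyncio.gather(*(limiter.fetch(u, slow_fetch) for u in urls))

print(asyncio.run(demo()))
```

Keying the semaphores by hostname rather than using one global limit is what lets a multi-site competitor comparison stay fast without any single origin seeing a burst of traffic.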
Honest Limitations
CrawlIQ is a free reference system hosted on HuggingFace's free tier. These constraints are worth knowing before using it for production evaluations:
- Cold start: HuggingFace free-tier containers sleep after 15 minutes of inactivity. The first request after a sleep period takes 30–90 seconds to respond. A cold-start notice appears in the app automatically.
- Page limit: The free unauthenticated tier is capped at 50 pages per evaluation. Free accounts extend this to 200 pages. Sites with thousands of pages will need Screaming Frog or Ahrefs Site Audit for full coverage.
- JavaScript rendering: CrawlIQ uses HTML-only evaluation. Fully JavaScript-rendered SPAs that serve empty HTML shells will show limited on-page data. Server-side rendered and static sites receive complete evaluation coverage.
- No backlink data: CrawlIQ analyses on-page and technical signals only. Backlink profiles and domain authority data require a commercial API (Ahrefs, Moz, Majestic) that is not integrated in the free version.
Contact and Feedback
CrawlIQ is an active project. Bug reports, feature suggestions, and pull requests are welcome on the GitHub repository. For professional enquiries, connect via LinkedIn.