How I Created This SEO Keyword Research Tool With AI

I built an AI-driven keyword tool by pairing embeddings, fast retrieval, and a clean scoring pipeline.

This post walks through the full build: the stack, the pipeline, and the trade-offs. I’ll show the data sources, the scoring logic, and the steps I took so you can ship a lean version without months of trial and error.

Building An AI Keyword Research App: My Stack And Method

I started with a narrow goal: surface topic ideas that match searcher intent, stack-rank them, and draft outlines that stay people-first. Everything hangs on three layers (collection, scoring, and generation), wired so each step can be swapped later without breaking the rest.

Layer One: Collection

I ingest seed topics from client briefs, site search logs, and niche forums. For volume, I add suggestions from ad tools and auto-complete feeds. Each term is normalized, de-duplicated, and stored with language, country, and source tags. No black-box scraping loops, just sources that are stable and allowed.
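Here is a minimal sketch of the normalize-and-de-duplicate step, assuming each incoming row is a (raw text, language, country, source) tuple. The `Term` class, the `collect` helper, and the choice to de-duplicate on text plus language plus country are illustrative assumptions, not the tool’s actual schema.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    language: str
    country: str
    source: str


def normalize(raw: str) -> str:
    """Apply Unicode NFKC, casefold, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw).casefold()
    return " ".join(text.split())


def collect(rows):
    """Keep the first occurrence of each (text, language, country) triple."""
    seen = set()
    terms = []
    for raw, language, country, source in rows:
        text = normalize(raw)
        key = (text, language, country)
        if text and key not in seen:
            seen.add(key)
            terms.append(Term(text, language, country, source))
    return terms


rows = [
    ("  Keyword  Research Tool", "en", "US", "brief"),
    ("keyword research tool", "en", "US", "autocomplete"),
    ("keyword research tool", "en", "GB", "forum"),
]
terms = collect(rows)
for term in terms:
    print(term)
```

The same string is kept once per market here, since a term can behave differently in the US and the UK. Only the first source tag survives in this sketch; a fuller version might merge source tags instead.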

Layer Two: Scoring

The goal here isn’t a perfect number; it’s a quick read on reach, difficulty, and fit. I compute a blended score from frequency signals, link context around ranking pages, and topical match between a term and the site’s proven themes. I also flag low-value traps like brand names we don’t serve or zero-click intents.

Layer Three: Generation

Once a batch passes filters, the system drafts outlines and title ideas, then parks them for a human pass. AI helps, but the human call still decides what ships. The whole loop pushes for people-first pages, not thin rewrites.

Build Phases, Inputs, Outputs
Phase	Key Inputs	Main Outputs
Collection	Seed topics, suggestions, logs	Clean term list with tags
Scoring	Volume hints, SERP scans, site themes	Reach/fit scores, filters
Generation	Approved terms, constraints	Briefs, outlines, titles

Why Embeddings Sit At The Core

Keyword strings can fool you. Two queries may look different yet ask the same thing; others look close but target a different task. I embed each term and the top ranking snippets so the system can judge topical closeness in vector space. That makes clustering tighter and cuts noise in the idea queue.

Model Choice And Settings

I use a small, cost-friendly embedding model for bulk work and a larger one for audits. Vectors live in a local store for small projects and in a managed index for bigger sites. Within an index I keep one model and one fixed dimension across runs, because vectors of different sizes can’t be compared and an update would otherwise break neighbor lookups. Cosine similarity drives match checks; a simple cap on neighbors keeps clusters sharp.
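The match check can be sketched in plain Python. The toy three-dimensional vectors, the `nearest` helper, and the 0.8 threshold are assumptions for illustration; real embeddings have hundreds of dimensions and a vector index would do this lookup for you.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must share one dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, vectors, max_neighbors=5, min_similarity=0.8):
    """Return up to max_neighbors (term, similarity) pairs above the threshold."""
    scored = [(term, cosine(query, vec)) for term, vec in vectors.items()]
    kept = [pair for pair in scored if pair[1] >= min_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_neighbors]


vectors = {
    "seo keyword tool": [0.9, 0.1, 0.0],
    "keyword research app": [0.8, 0.2, 0.1],
    "banana bread recipe": [0.0, 0.1, 0.9],
}
matches = nearest([1.0, 0.1, 0.0], vectors, max_neighbors=2)
for term, similarity in matches:
    print(f"{term}: {similarity:.3f}")
```

The dimension check is the guard that matters in practice: mixing vectors from two models fails loudly instead of returning quiet nonsense.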

Clustering And De-duplication

After embedding, I label each term by intent (informational, transactional, navigational, or mixed) and cluster within each label. Then I merge near-duplicates, keeping the best representative term. Any cluster that still mixes goals gets split so briefs don’t wander.
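One simple way to get this behavior is a greedy, single-pass clustering that runs separately for each intent label. This is a sketch under assumptions: the intent label is already attached to each term, the 0.9 threshold is arbitrary, and picking the representative by highest volume hint is my stand-in for “best”, not a rule stated above.

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_terms(items, threshold=0.9):
    """Greedy clustering per intent: join the first close-enough cluster, else seed one."""
    by_intent = defaultdict(list)
    for item in items:
        by_intent[item["intent"]].append(item)

    clusters = []
    for intent, group in by_intent.items():
        seeds = []  # clusters for this intent only, so intents never mix
        for item in group:
            for cluster in seeds:
                seed_vector = cluster["members"][0]["vector"]
                if cosine(item["vector"], seed_vector) >= threshold:
                    cluster["members"].append(item)
                    break
            else:
                seeds.append({"intent": intent, "members": [item]})
        clusters.extend(seeds)

    for cluster in clusters:
        # Representative = member with the highest volume hint (an assumption).
        best = max(cluster["members"], key=lambda member: member["volume"])
        cluster["representative"] = best["term"]
    return clusters


items = [
    {"term": "keyword tool", "intent": "informational", "vector": [1.0, 0.0], "volume": 500},
    {"term": "keyword tools", "intent": "informational", "vector": [0.99, 0.05], "volume": 900},
    {"term": "buy keyword tool", "intent": "transactional", "vector": [0.98, 0.1], "volume": 200},
    {"term": "seo audit checklist", "intent": "informational", "vector": [0.0, 1.0], "volume": 300},
]
clusters = cluster_terms(items)
for cluster in clusters:
    print(cluster["intent"], cluster["representative"], len(cluster["members"]))
```

Note that “buy keyword tool” sits close to “keyword tool” in vector space yet lands in its own cluster, because the intent label partitions the terms before any similarity check runs.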

Signals That Drive The Score

The score blends reach and effort with a site fit multiplier. I avoid fragile one-number magic; each piece is readable and testable on its own.

Reach Signals

Search volume ranges are fine as a hint. I also watch seasonality, query freshness, and the number of distinct subtopics sitting in the cluster. A fat cluster with steady interest tends to beat a single shiny head term.

Effort Signals

I scan the leading pages to see depth, media use, and link profile. Page age helps too. A well-aged page with steady links raises the bar, while a field of thin posts opens a path for a thorough guide.

Fit Signals

Here I compare cluster vectors with the site’s top pages to gauge topical fit. If a site has strong coverage in a theme, the multiplier bumps the final score. If the theme is outside the site’s lane, the score drops so we don’t chase off-brand visits.
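A sketch of how a fit multiplier could be derived, assuming one vector per cluster and one per top page. The linear mapping from the best cosine match onto a 0.5 to 1.5 range is an assumption of mine; other mappings (averaging over pages, a stepped scale) would work just as well.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_multiplier(cluster_vector, site_page_vectors, low=0.5, high=1.5):
    """Map the best cosine match against site pages onto [low, high]."""
    if not site_page_vectors:
        return low  # no proven themes yet, so assume the weakest fit
    best = max(cosine(cluster_vector, page) for page in site_page_vectors)
    best = min(max(best, 0.0), 1.0)  # negative similarity counts as no fit
    return low + (high - low) * best


site_pages = [[1.0, 0.0], [0.9, 0.3]]
on_theme = fit_multiplier([1.0, 0.1], site_pages)
off_theme = fit_multiplier([0.0, 1.0], [[1.0, 0.0]])
print(round(on_theme, 2), round(off_theme, 2))
```

A similarity of 0.5 maps to a neutral multiplier of 1.0 in this scheme, so only clusters that match the site better than that get a boost.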

Data Sources And Guardrails

For ad data, I pull ideas and ranges from public ad tools and verified sources. For rules on people-first content, I align with official guidance and keep claims grounded. That keeps the tool useful for real readers, not just rankings.

What I Avoid

No scraping of private dashboards. No claims that a number is perfect. No blind faith in AI drafts. Each output gets a human pass and a link to sources used. The tool exists to speed good work, not to spit out thin pages.

Pipeline Walkthrough

Here’s the run sequence I ship in production. It’s light, fast, and easy to debug.

1) Intake

Feed seed topics, country, language, and niche. The system tags the job and kicks off collection.

2) Expand

Pull suggestions and related terms. Keep variants that match the niche; drop brand names we don’t serve. Normalize case, trim stop words where safe, and remove dupes.

3) Embed

Create vectors for every term and a sample of ranking snippets. Cache them. Store metadata alongside each vector.

4) Cluster

Group terms by similarity and intent. Pick a representative term for each cluster. Split any group that mixes tasks.

5) Score

Compute reach, effort, and fit. Apply the multiplier and rank. Keep the formula readable so it can be explained to an editor in one minute.

6) Brief

Draft an outline with headings, angle, and link targets. Add a table prompt and any images that aid scan reading. Keep the copy lean and useful.

7) Review And Ship

Human edits the brief, checks sources, and approves a title. The CMS template handles markup, schema, and dates.
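The run sequence above can be sketched as a list of named, swappable steps that each take and return the job state. The `run_pipeline` helper and the two stand-in steps are hypothetical; real steps would call the suggestion sources, the embedding model, and the scorer.

```python
def run_pipeline(job, steps):
    """Run named steps in order; each step takes and returns the job state."""
    state = dict(job)
    completed = []
    for name, step in steps:
        state = step(state)
        completed.append(name)
    return state, completed


def expand(state):
    # Stand-in: a real step would pull suggestions for each seed topic.
    terms = [f"{seed} {suffix}" for seed in state["seeds"] for suffix in ("tool", "guide")]
    return {**state, "terms": terms}


def score(state):
    # Stand-in: longer terms rank first, purely to show the data flow.
    ranked = sorted(state["terms"], key=len, reverse=True)
    return {**state, "ranked": ranked}


job = {"seeds": ["keyword research"], "country": "US", "language": "en", "niche": "seo"}
state, completed = run_pipeline(job, [("expand", expand), ("score", score)])
print(completed, state["ranked"])
```

Because each step only reads and returns a dictionary, swapping the scorer or dropping in a different clustering step means changing one entry in the list.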

Scoring Formula At A Glance

The base idea is simple: Score = Reach × Fit ÷ Effort. Each piece is made from small parts you can test.

Score Components And Simple Scales
Component	Scale	Notes
Reach	1–5	Range, seasonality, cluster width
Effort	1–5	Depth of leaders, links, page age
Fit	0.5–1.5	Topical match with site themes
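The formula and the scales in the table translate directly into a few lines of Python. The `ClusterSignals` class and the sample values are made up for illustration; only the formula and the ranges come from the table.

```python
from dataclasses import dataclass


@dataclass
class ClusterSignals:
    term: str
    reach: float   # 1-5
    effort: float  # 1-5
    fit: float     # 0.5-1.5


def score(signals: ClusterSignals) -> float:
    """Score = Reach x Fit / Effort, with a range check on each input."""
    if not 1 <= signals.reach <= 5:
        raise ValueError("reach must be within 1-5")
    if not 1 <= signals.effort <= 5:
        raise ValueError("effort must be within 1-5")
    if not 0.5 <= signals.fit <= 1.5:
        raise ValueError("fit must be within 0.5-1.5")
    return signals.reach * signals.fit / signals.effort


candidates = [
    ClusterSignals("keyword clustering guide", reach=4, effort=2, fit=1.5),
    ClusterSignals("seo", reach=5, effort=5, fit=1.0),
    ClusterSignals("crm pricing", reach=3, effort=2, fit=0.5),
]
ranked = sorted(candidates, key=score, reverse=True)
for item in ranked:
    print(f"{item.term}: {score(item):.2f}")
```

With these scales the score always lands between 0.1 (reach 1, fit 0.5, effort 5) and 7.5 (reach 5, fit 1.5, effort 1). In the sample, the head term “seo” scores 1.0 and loses to the mid-reach, on-theme guide at 3.0.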

Engineering Choices That Saved Time

I kept the code in small modules. One file fetches ideas. One file embeds. One file clusters and scores. A config file sets model, index, and cutoffs. Logs print one line per step with timings so it’s easy to spot slow spots.
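A minimal sketch of the one-line-per-step timing log, using only the standard library. The config keys, the placeholder model and index names, and the `step` helper are hypothetical, not the tool’s real settings.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")

# Hypothetical config: the names and values are placeholders.
CONFIG = {
    "embedding_model": "example-small-embedding",
    "vector_index": "local",
    "similarity_cutoff": 0.8,
    "max_neighbors": 5,
}

timings = {}


@contextmanager
def step(job_id, name):
    """Time a pipeline step and log exactly one line when it ends."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings[name] = elapsed_ms
        log.info("job=%s step=%s duration_ms=%.1f", job_id, name, elapsed_ms)


with step("job-001", "collect"):
    terms = ["keyword tool", "keyword tools"]

with step("job-001", "embed"):
    vectors = [[float(len(term))] for term in terms]  # stand-in for a model call
```

The `finally` clause means a step that raises still logs its duration, which helps when a slow step is also a failing one.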

Stack Notes

Language: TypeScript for the API layer and Python for data steps. Store: Postgres for metadata, a vector index for search. Queue: a tiny job runner for batch work. Front end: a simple dashboard with filters, tables, and a review pane for briefs.

Cost Controls

Embeddings run in batches with retry logic and a cap per minute. I reuse vectors when terms repeat. Heavy jobs run off-peak. I also cache top SERP snippets with short TTLs to cut calls during testing.
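Batching, retries, and vector reuse can be sketched like this. The `embed_batch` callable stands in for whatever embedding API you use, and the batch size and backoff values are assumptions. The per-minute cap is left out to keep the sketch short.

```python
import time


def embed_with_cache(terms, embed_batch, cache, batch_size=64,
                     max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Embed only uncached terms, in batches, retrying failed batches."""
    # dict.fromkeys drops repeats while keeping first-seen order.
    missing = [term for term in dict.fromkeys(terms) if term not in cache]
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                vectors = embed_batch(batch)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the last allowed attempt
                sleep(base_delay * 2 ** attempt)  # exponential backoff
        cache.update(zip(batch, vectors))
    return [cache[term] for term in terms]


calls = {"count": 0}


def fake_embed(batch):
    """Stand-in for a real API call; fails once to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated transient failure")
    return [[float(len(term))] for term in batch]


cache = {}
result = embed_with_cache(["seo tool", "seo tool", "serp"], fake_embed, cache,
                          sleep=lambda seconds: None)
print(result, calls["count"])
```

Passing `sleep` as a parameter keeps tests fast: the demo swaps in a no-op, while production code would use the default `time.sleep`.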

Quality Checks Before A Term Ships

Each cluster gets a short review. I read the top pages, confirm the user task, and log gaps the post will fill. If I can’t state the reader payoff in one sentence, the term goes back in the queue.

People-First Checks

Meet the search task early. Keep sections tight and skimmable. Cite a source when a fact isn’t common knowledge. Avoid bloated intros and giant hero images that push the answer down the page.

How I Validate The Tool

I run head-to-head checks against live posts. Pick ten terms, ship two posts with the briefs, and track dwell time and inbound links. If a cluster style beats the old approach, I keep it; if not, I adjust the score mix or the brief template.

Where Official Guidance Fits

Two pages sit open while I work. One explains people-first content. The other walks through ad tool keyword ideas and ranges. Both help set guardrails so the tool stays aligned with readers and with search systems.

Practical Tips You Can Steal

Keep Clusters Small

Five to eight terms per cluster is plenty. Bigger than that, briefs drift and pages lose clarity.

Write The Answer First

Open each post with the direct answer in one punchy line. Then expand with steps, tables, and links that help the reader act.

Use Outlines, Not Long Drafts

Short prompts that list tasks and constraints beat long walls of text. Editors stay in control and AI stays on rails.

Deployment And Logging

The API runs on a small container with autoscaling. Jobs push logs to a central store with queryable fields for job id, step, and duration. When a step slows down, I can spot the bottleneck in seconds. Alerts ping before queues pile up.

Tracking Outcomes That Matter

I track leads from organic posts, scroll depth, and repeat visits to topic hubs. Rankings help, but reader actions tell the real story. When a post gets saved or linked, the brief likely hit the mark.

Common Build Mistakes

Giant clusters that mix tasks. Blind faith in a single score. Chasing head terms far outside the site’s lane. Overwriting with AI text instead of helping an editor craft a clear brief. Each of those slows growth.

UI Notes For Editors

Filters by country and theme sit at the top. The table shows the rep term, cluster size, and the three subtopics with the best reach. The brief pane sits on the right with a one-click copy button. Simple beats flashy.

Privacy And Compliance Basics

I keep PII out of logs, rotate keys, and gate projects by role. If a client needs data wiped, a single command drops vectors, metadata, and cache entries tied to that tag. Clear and traceable.
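The wipe can be sketched as one transaction that deletes by project tag. This uses an in-memory SQLite database so it runs anywhere; the table names and the `project_tag` column are hypothetical, and the production store described earlier is Postgres plus a vector index, which would each need their own delete call.

```python
import sqlite3

# Hypothetical table names; fixed in code, never taken from user input.
TABLES = ("vectors", "metadata", "cache_entries")


def wipe_project(conn, tag):
    """Delete every row tied to a project tag, all-or-nothing."""
    deleted = {}
    with conn:  # commits on success, rolls back if any delete fails
        for table in TABLES:
            cursor = conn.execute(
                f"DELETE FROM {table} WHERE project_tag = ?", (tag,)
            )
            deleted[table] = cursor.rowcount
    return deleted


conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (project_tag TEXT, payload TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [("client-a", "x"), ("client-a", "y"), ("client-b", "z")],
    )
conn.commit()

report = wipe_project(conn, "client-a")
print(report)
```

Returning the per-table row counts gives you something to log, which is what makes the wipe traceable as well as complete.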

What I’d Do Next

I plan to add change tracking for clusters, a notes field for editor feedback, and a report that shows wins by theme. I may also add a live check that warns when a draft leans into thin repeats.

If you want a tool that ships real briefs, borrow the stack, tweak the scores, and keep a human in the loop. With that, you’ll publish pages readers finish and share.

Guidance I keep handy: Google’s people-first content guidance and the Google Ads Keyword Planner documentation. Both set clear rails for content and research.