PostgreSQL Full-Text Search: Implementation Guide
Production-ready guide to PostgreSQL full-text search covering tsvector, tsquery, GIN indexes, weighted ranking, highlighting, fuzzy search with trigrams, and a complete Node.js Express implementation.
Overview
PostgreSQL ships with a powerful, built-in full-text search engine that most teams overlook in favor of external services like Elasticsearch or Algolia. For the majority of applications serving fewer than 10 million documents, PostgreSQL FTS delivers single-digit-millisecond search with ranking, highlighting, phrase matching, and fuzzy search -- all without adding another service to your infrastructure. This guide walks through a production-ready implementation, from raw SQL fundamentals to a complete Node.js/Express search API.
Prerequisites
- PostgreSQL 14+ installed and running
- Node.js 18+ with the pg driver
- Basic SQL knowledge (CREATE TABLE, SELECT, INSERT)
- Familiarity with Express.js routing
npm install pg express
Why PostgreSQL FTS Over Elasticsearch or Algolia
I have shipped Elasticsearch clusters in production. They work. They also require a JVM, a separate cluster, data synchronization pipelines, index management, and an operations team that understands shard allocation. For a search feature on a content site or a SaaS product, that is almost always overkill.
PostgreSQL full-text search gives you:
- Zero infrastructure overhead -- your data and your search index live in the same database. No sync lag, no eventual consistency bugs.
- ACID compliance -- search indexes update within the same transaction as your data writes.
- Lower cost -- no additional hosted search service fees.
- Sufficient performance -- GIN-indexed full-text search on 5 million rows returns results in 2-10ms on modest hardware.
When should you reach for Elasticsearch? When you need distributed search across billions of documents, complex aggregation pipelines, or real-time log analytics. For everything else, start with PostgreSQL.
tsvector and tsquery Fundamentals
PostgreSQL full-text search revolves around two data types: tsvector and tsquery.
A tsvector is a sorted list of normalized lexemes with positional information:
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
Notice: "foxes" became "fox", "jumped" became "jump", and stop words like "the" and "over" were removed. Each lexeme carries its position in the original text.
A tsquery represents a search query with boolean operators:
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs')
@@ to_tsquery('english', 'fox & jump');
-- Returns: true
Text Search Configurations
The 'english' parameter is a text search configuration controlling stemming, stop words, and synonyms. PostgreSQL ships with configurations for 20+ languages.
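To see which configurations your server actually ships with, you can query the pg_ts_config system catalog (or run \dF in psql); the exact list depends on how PostgreSQL was built:

```sql
-- List the installed text search configurations
SELECT cfgname FROM pg_ts_config ORDER BY cfgname;
```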
The 'simple' configuration performs no stemming and no stop word removal. Use it for identifiers, product codes, or proper nouns:
SELECT to_tsvector('simple', 'The Quick Brown Fox');
-- 'brown':3 'fox':4 'quick':2 'the':1
SELECT to_tsvector('english', 'The Quick Brown Fox');
-- 'brown':3 'fox':4 'quick':2
Creating Search-Optimized Columns
Always store tsvectors as a column. Computing to_tsvector on every row during a SELECT is expensive and prevents index usage.
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
slug VARCHAR(500) UNIQUE NOT NULL,
body TEXT NOT NULL,
category VARCHAR(100),
tags TEXT[] DEFAULT '{}',
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED
);
The GENERATED ALWAYS AS ... STORED syntax (PostgreSQL 12+) creates a computed column that automatically updates when title or body change. No triggers needed.
For older PostgreSQL versions, use a trigger:
ALTER TABLE articles ADD COLUMN search_vector tsvector;
CREATE FUNCTION articles_search_vector_update() RETURNS trigger AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER articles_search_vector_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW
EXECUTE FUNCTION articles_search_vector_update();
GIN vs GiST Indexes
You must index the search_vector column:
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
| Characteristic | GIN | GiST |
|---|---|---|
| Read performance | Faster | Slower |
| Write performance | Slower (rebalancing) | Faster |
| Index size | Larger | Smaller |
| Best for | Read-heavy workloads | Write-heavy, rarely queried |
Use GIN. If your write volume makes GIN maintenance a bottleneck, consider partitioning or batched inserts first.
Weighted Search: Title vs Body
The setweight function assigns importance levels A through D. When a user searches for "API gateway", a match in the title should rank higher than a match in paragraph 47.
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(subtitle, '')), 'B') ||
setweight(to_tsvector('english', coalesce(body, '')), 'C') ||
setweight(to_tsvector('english', coalesce(tags_text, '')), 'D')
Ranking with ts_rank and ts_rank_cd
SELECT title,
ts_rank_cd(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'api & gateway') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
ts_rank_cd uses cover density ranking, which considers the proximity of matching lexemes. It generally produces more intuitive results than ts_rank.
Pass a weight array to control each weight class's contribution. The array maps to {D, C, B, A}:
ts_rank_cd('{0.1, 0.2, 0.4, 1.0}', search_vector, query)
Normalize by document length to prevent long documents from dominating:
ts_rank_cd(search_vector, query, 2) -- Normalize by document length
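The normalization argument is a bitmask, so several behaviors can be combined with bitwise OR. The flag values below come from the PostgreSQL documentation; the constant names and the helper function are our own, shown here as a sketch of how an application might build the rank expression:

```javascript
// Normalization flags for ts_rank / ts_rank_cd (values per PostgreSQL docs;
// the RANK_NORM names are our own convention, not a pg API).
const RANK_NORM = {
  DEFAULT: 0,           // no normalization
  LOG_LENGTH: 1,        // divide rank by 1 + log(document length)
  LENGTH: 2,            // divide rank by document length
  HARMONIC_DIST: 4,     // divide by mean harmonic distance between extents (ts_rank_cd only)
  UNIQUE_WORDS: 8,      // divide by number of unique words
  LOG_UNIQUE_WORDS: 16, // divide by 1 + log(number of unique words)
  RANK_PLUS_ONE: 32,    // divide by rank + 1 (scales result into 0..1)
};

// Hypothetical helper: interpolate the combined flags into the SQL expression.
function rankExpression(flags) {
  return `ts_rank_cd(search_vector, query, ${flags})`;
}

const expr = rankExpression(RANK_NORM.LENGTH | RANK_NORM.RANK_PLUS_ONE);
// expr === 'ts_rank_cd(search_vector, query, 34)'
```

The RANK_PLUS_ONE flag is handy when you need ranks bounded to a 0..1 range, for example to mix search rank with other scoring signals.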
Search Highlighting with ts_headline
SELECT title,
ts_headline('english', body, query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20, MaxFragments=3'
) AS snippet
FROM articles, to_tsquery('english', 'api & gateway') AS query
WHERE search_vector @@ query
ORDER BY ts_rank_cd(search_vector, query) DESC;
Performance warning: ts_headline is expensive. Always apply it after filtering and limiting results in a CTE:
WITH ranked AS (
SELECT id, title, body,
ts_rank_cd(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'api & gateway') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20
)
SELECT id, title, rank,
ts_headline('english', body, to_tsquery('english', 'api & gateway'),
'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20'
) AS snippet
FROM ranked;
Query Parsing Functions
websearch_to_tsquery (PostgreSQL 11+) -- best for user-facing search. Supports Google-like syntax:
SELECT websearch_to_tsquery('english', '"api gateway" OR graphql -soap');
-- 'api' <-> 'gateway' | 'graphql' & !'soap'
plainto_tsquery -- safe fallback. Converts plain text to AND query:
SELECT plainto_tsquery('english', 'api gateway design');
-- 'api' & 'gateway' & 'design'
phraseto_tsquery -- treats input as a phrase:
SELECT phraseto_tsquery('english', 'api gateway design');
-- 'api' <-> 'gateway' <-> 'design'
Never pass raw user input to to_tsquery. Use websearch_to_tsquery for user-facing search.
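Since websearch_to_tsquery is documented to never raise a syntax error on arbitrary text, the only input handling left to the application is rejecting empty queries and capping length. A minimal sketch (the function name and limits are our own):

```javascript
// Prepare raw user input before binding it into websearch_to_tsquery($1).
// websearch_to_tsquery tolerates arbitrary text, so we only need to
// reject empty input and bound the length.
function prepareSearchInput(raw, maxLength = 200) {
  if (typeof raw !== 'string') return null;
  const trimmed = raw.trim();
  if (trimmed.length === 0) return null; // reject empty queries up front
  return trimmed.slice(0, maxLength);    // cap length to bound parsing work
}

// Usage with a parameterized query (never string-concatenate user input):
// pool.query(
//   "SELECT id FROM articles WHERE search_vector @@ websearch_to_tsquery('english', $1)",
//   [prepareSearchInput(req.query.q)]
// );
```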
Trigram Matching for Fuzzy Search
Full-text search requires lexeme matches. If a user misspells "postgrsql", FTS returns nothing. For typo tolerance, use the pg_trgm extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);
SELECT title, similarity(title, 'postgrsql') AS sim
FROM articles
WHERE title % 'postgrsql'
ORDER BY sim DESC;
Combine trigram fuzzy matching with full-text search: first try FTS, then fall back to trigram if FTS returns no results.
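The two-tier strategy can be sketched as a single function. Here `db` is assumed to be any object with a `query(sql, params)` method resolving to `{ rows }` (the shape of the pg Pool interface), and the SQL assumes the articles schema from this guide:

```javascript
// Two-tier search: full-text first, trigram fallback on zero results.
async function searchArticles(db, term) {
  const fts = await db.query(
    `SELECT id, title FROM articles
     WHERE search_vector @@ websearch_to_tsquery('english', $1)
     ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', $1)) DESC
     LIMIT 20`,
    [term]
  );
  if (fts.rows.length > 0) return { tier: 'fts', rows: fts.rows };

  // Fallback: trigram similarity catches misspellings like "postgrsql"
  const fuzzy = await db.query(
    `SELECT id, title FROM articles
     WHERE title % $1
     ORDER BY similarity(title, $1) DESC
     LIMIT 20`,
    [term]
  );
  return { tier: 'trigram', rows: fuzzy.rows };
}
```

Returning the tier alongside the rows lets the UI label fuzzy matches ("showing results for a similar term"), which avoids confusing users who typed an exact phrase.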
Complete Working Example: Node.js/Express Search API
var express = require('express');
var { Pool } = require('pg');
var app = express();
app.use(express.json());
var pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20,
});
app.get('/api/search', async function(req, res) {
var q = req.query.q;
var category = req.query.category;
var page = Math.max(1, parseInt(req.query.page, 10) || 1);
var limit = Math.min(100, Math.max(1, parseInt(req.query.limit, 10) || 20));
var offset = (page - 1) * limit;
if (!q || q.trim().length === 0) {
return res.status(400).json({ error: 'Query parameter "q" is required' });
}
var sanitizedQuery = q.trim().slice(0, 200);
try {
var sql = `
WITH search_query AS (
SELECT websearch_to_tsquery('english', $1) AS query
),
matched AS (
SELECT
a.id, a.title, a.slug, a.category, a.tags,
a.published_at, a.body,
ts_rank_cd('{0.1, 0.2, 0.4, 1.0}', a.search_vector, sq.query, 2) AS rank,
COUNT(*) OVER() AS total_count
FROM articles a, search_query sq
WHERE a.search_vector @@ sq.query
AND a.published_at IS NOT NULL
AND ($2::varchar IS NULL OR a.category = $2)
ORDER BY rank DESC
LIMIT $3 OFFSET $4
)
SELECT m.id, m.title, m.slug, m.category, m.tags,
m.published_at, m.rank, m.total_count,
ts_headline('english', m.body, sq.query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=40, MinWords=15, MaxFragments=2, FragmentDelimiter= ... '
) AS snippet
FROM matched m, search_query sq
ORDER BY m.rank DESC
`;
var params = [sanitizedQuery, category || null, limit, offset];
var result = await pool.query(sql, params);
var totalCount = result.rows.length > 0 ? parseInt(result.rows[0].total_count, 10) : 0;
var totalPages = Math.ceil(totalCount / limit);
res.json({
query: sanitizedQuery,
results: result.rows.map(function(row) {
return {
id: row.id,
title: row.title,
slug: row.slug,
category: row.category,
tags: row.tags,
publishedAt: row.published_at,
rank: parseFloat(row.rank),
snippet: row.snippet,
};
}),
pagination: {
page: page,
limit: limit,
totalCount: totalCount,
totalPages: totalPages,
hasNext: page < totalPages,
hasPrev: page > 1,
},
});
} catch (err) {
console.error('Search error:', err.message);
if (err.message.includes('syntax error in tsquery')) {
return res.status(400).json({ error: 'Invalid search query syntax' });
}
res.status(500).json({ error: 'Internal server error' });
}
});
app.get('/api/autocomplete', async function(req, res) {
var q = req.query.q;
if (!q || q.trim().length < 2) {
return res.status(400).json({ error: 'Query must be at least 2 characters' });
}
var sanitizedQuery = q.trim().slice(0, 100);
try {
var result = await pool.query(`
SELECT id, title, slug, similarity(title, $1) AS sim
FROM articles
WHERE published_at IS NOT NULL
AND (title % $1 OR title ILIKE $2)
ORDER BY sim DESC
LIMIT 10
`, [sanitizedQuery, '%' + sanitizedQuery + '%']);
res.json({
suggestions: result.rows.map(function(row) {
return { id: row.id, title: row.title, slug: row.slug, similarity: parseFloat(row.sim) };
}),
});
} catch (err) {
console.error('Autocomplete error:', err.message);
res.status(500).json({ error: 'Internal server error' });
}
});
var PORT = process.env.PORT || 3000;
app.listen(PORT, function() {
console.log('Search API running on port ' + PORT);
});
Common Issues and Troubleshooting
1. Empty search results when data clearly matches
Cause: The search vector column was not populated. A STORED generated column is backfilled when it is added, but with the trigger approach, rows inserted before the trigger existed still have a NULL vector. Touch every row so the trigger recomputes it:
UPDATE articles SET updated_at = NOW();
2. syntax error in tsquery on user input
ERROR: syntax error in tsquery: "hello world & "
Fix: Use websearch_to_tsquery() instead of to_tsquery(). It safely handles arbitrary user input.
3. Search is slow despite having a GIN index
Cause: You are calling to_tsvector() at query time instead of using the stored column.
-- WRONG (cannot use index):
WHERE to_tsvector('english', body) @@ query
-- RIGHT (uses the GIN index):
WHERE search_vector @@ query
4. ts_headline degrades performance
Fix: Apply ts_headline only to the final paginated result set using a CTE.
Best Practices
- Always use websearch_to_tsquery for user-facing search. It handles quoted phrases, OR, and negation safely.
- Store the tsvector as a generated column or trigger. Never compute it at query time.
- Apply ts_headline in a CTE after filtering and limiting. Running it over unfiltered results is the single biggest performance mistake in PostgreSQL search.
- Use ts_rank_cd over ts_rank. Cover density ranking produces more intuitive results.
- Normalize ranks for documents of varying lengths.
- Create partial indexes for published content: CREATE INDEX ... WHERE published_at IS NOT NULL;
- Implement two-tier search. Full-text first, trigram fallback for zero results.
- Limit input length. Truncate user queries to 100-200 characters.
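The partial-index item above can be sketched in SQL against the articles schema from this guide (the index name is our own):

```sql
-- GIN index covering only published rows: smaller, cheaper to maintain,
-- and usable whenever the query repeats the same WHERE condition.
CREATE INDEX idx_articles_search_published
ON articles USING GIN (search_vector)
WHERE published_at IS NOT NULL;
```

Note that the planner uses a partial index only when the query's WHERE clause implies the index predicate, so every search query must keep the published_at IS NOT NULL filter.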
