PostgreSQL Full-Text Search: Implementation Guide
Production-ready guide to PostgreSQL full-text search covering tsvector, tsquery, GIN indexes, weighted ranking, highlighting, fuzzy search with trigrams, and a complete Node.js Express implementation.
Overview
PostgreSQL ships with a powerful, built-in full-text search engine that most teams overlook in favor of external services like Elasticsearch or Algolia. For the majority of applications serving fewer than 10 million documents, PostgreSQL FTS delivers single-digit-millisecond search with ranking, highlighting, phrase matching, and fuzzy search -- all without adding another service to your infrastructure. This guide walks through a production-ready implementation, from raw SQL fundamentals to a complete Node.js/Express search API.
Prerequisites
- PostgreSQL 14+ installed and running
- Node.js 18+ with the pg driver
- Basic SQL knowledge (CREATE TABLE, SELECT, INSERT)
- Familiarity with Express.js routing
npm install pg express
Why PostgreSQL FTS Over Elasticsearch or Algolia
I have shipped Elasticsearch clusters in production. They work. They also require a JVM, a separate cluster, data synchronization pipelines, index management, and an operations team that understands shard allocation. For a search feature on a content site or a SaaS product, that is almost always overkill.
PostgreSQL full-text search gives you:
- Zero infrastructure overhead -- your data and your search index live in the same database. No sync lag, no eventual consistency bugs.
- ACID compliance -- search indexes update within the same transaction as your data writes.
- Lower cost -- no additional hosted search service fees.
- Sufficient performance -- GIN-indexed full-text search on 5 million rows returns results in 2-10ms on modest hardware.
When should you reach for Elasticsearch? When you need distributed search across billions of documents, complex aggregation pipelines, or real-time log analytics. For everything else, start with PostgreSQL.
tsvector and tsquery Fundamentals
PostgreSQL full-text search revolves around two data types: tsvector and tsquery.
A tsvector is a sorted list of normalized lexemes with positional information:
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
Notice: "foxes" became "fox", "jumped" became "jump", and stop words like "the" and "over" were removed. Each lexeme carries its position in the original text.
A tsquery represents a search query with boolean operators:
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs')
@@ to_tsquery('english', 'fox & jump');
-- Returns: true
Text Search Configurations
The 'english' parameter is a text search configuration controlling stemming, stop words, and synonyms. PostgreSQL ships with configurations for 20+ languages.
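To see which configurations your server actually ships with, you can query the pg_ts_config system catalog (or run \dF in psql); the exact list depends on how PostgreSQL was built:

```sql
-- List the installed text search configurations
SELECT cfgname FROM pg_ts_config ORDER BY cfgname;
```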
The 'simple' configuration performs no stemming and no stop word removal. Use it for identifiers, product codes, or proper nouns:
SELECT to_tsvector('simple', 'The Quick Brown Fox');
-- 'brown':3 'fox':4 'quick':2 'the':1
SELECT to_tsvector('english', 'The Quick Brown Fox');
-- 'brown':3 'fox':4 'quick':2
Creating Search-Optimized Columns
Always store tsvectors as a column. Computing to_tsvector on every row during a SELECT is expensive and prevents index usage.
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
slug VARCHAR(500) UNIQUE NOT NULL,
body TEXT NOT NULL,
category VARCHAR(100),
tags TEXT[] DEFAULT '{}',
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED
);
The GENERATED ALWAYS AS ... STORED syntax (PostgreSQL 12+) creates a computed column that automatically updates when title or body change. No triggers needed.
For older PostgreSQL versions, use a trigger:
ALTER TABLE articles ADD COLUMN search_vector tsvector;
CREATE FUNCTION articles_search_vector_update() RETURNS trigger AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER articles_search_vector_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW
EXECUTE FUNCTION articles_search_vector_update();
GIN vs GiST Indexes
You must index the search_vector column:
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
| Characteristic | GIN | GiST |
|---|---|---|
| Read performance | Faster | Slower |
| Write performance | Slower (rebalancing) | Faster |
| Index size | Larger | Smaller |
| Best for | Read-heavy workloads | Write-heavy, rarely queried |
Use GIN. If your write volume makes GIN maintenance a bottleneck, consider partitioning or batched inserts first.
Weighted Search: Title vs Body
The setweight function assigns importance levels A through D. When a user searches for "API gateway", a match in the title should rank higher than a match in paragraph 47.
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(subtitle, '')), 'B') ||
setweight(to_tsvector('english', coalesce(body, '')), 'C') ||
setweight(to_tsvector('english', coalesce(tags_text, '')), 'D')
Ranking with ts_rank and ts_rank_cd
SELECT title,
ts_rank_cd(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'api & gateway') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
ts_rank_cd uses cover density ranking, which considers the proximity of matching lexemes. It generally produces more intuitive results than ts_rank.
Pass a weight array to control each weight class's contribution. The array maps to {D, C, B, A}:
ts_rank_cd('{0.1, 0.2, 0.4, 1.0}', search_vector, query)
Normalize by document length to prevent long documents from dominating:
ts_rank_cd(search_vector, query, 2) -- Normalize by document length
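The normalization argument is a bitmask, so several behaviors can be combined with bitwise OR. The flag values below come from the PostgreSQL documentation; the constant names and the helper function are our own, shown here as a sketch of how an application might build the rank expression:

```javascript
// Normalization flags for ts_rank / ts_rank_cd (values per PostgreSQL docs;
// the RANK_NORM names are our own convention, not a pg API).
const RANK_NORM = {
  DEFAULT: 0,           // no normalization
  LOG_LENGTH: 1,        // divide rank by 1 + log(document length)
  LENGTH: 2,            // divide rank by document length
  HARMONIC_DIST: 4,     // divide by mean harmonic distance between extents (ts_rank_cd only)
  UNIQUE_WORDS: 8,      // divide by number of unique words
  LOG_UNIQUE_WORDS: 16, // divide by 1 + log(number of unique words)
  RANK_PLUS_ONE: 32,    // divide by rank + 1 (scales result into 0..1)
};

// Hypothetical helper: interpolate the combined flags into the SQL expression.
function rankExpression(flags) {
  return `ts_rank_cd(search_vector, query, ${flags})`;
}

const expr = rankExpression(RANK_NORM.LENGTH | RANK_NORM.RANK_PLUS_ONE);
// expr === 'ts_rank_cd(search_vector, query, 34)'
```

The RANK_PLUS_ONE flag is handy when you need ranks bounded to a 0..1 range, for example to mix search rank with other scoring signals.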
Search Highlighting with ts_headline
SELECT title,
ts_headline('english', body, query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20, MaxFragments=3'
) AS snippet
FROM articles, to_tsquery('english', 'api & gateway') AS query
WHERE search_vector @@ query
ORDER BY ts_rank_cd(search_vector, query) DESC;
Performance warning: ts_headline is expensive. Always apply it after filtering and limiting results in a CTE:
WITH ranked AS (
SELECT id, title, body,
ts_rank_cd(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'api & gateway') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20
)
SELECT id, title, rank,
ts_headline('english', body, to_tsquery('english', 'api & gateway'),
'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20'
) AS snippet
FROM ranked;
Query Parsing Functions
websearch_to_tsquery (PostgreSQL 11+) -- best for user-facing search. Supports Google-like syntax:
SELECT websearch_to_tsquery('english', '"api gateway" OR graphql -soap');
-- 'api' <-> 'gateway' | 'graphql' & !'soap'
plainto_tsquery -- safe fallback. Converts plain text to AND query:
SELECT plainto_tsquery('english', 'api gateway design');
-- 'api' & 'gateway' & 'design'
phraseto_tsquery -- treats input as a phrase:
SELECT phraseto_tsquery('english', 'api gateway design');
-- 'api' <-> 'gateway' <-> 'design'
Never pass raw user input to to_tsquery. Use websearch_to_tsquery for user-facing search.
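Since websearch_to_tsquery is documented to never raise a syntax error on arbitrary text, the only input handling left to the application is rejecting empty queries and capping length. A minimal sketch (the function name and limits are our own):

```javascript
// Prepare raw user input before binding it into websearch_to_tsquery($1).
// websearch_to_tsquery tolerates arbitrary text, so we only need to
// reject empty input and bound the length.
function prepareSearchInput(raw, maxLength = 200) {
  if (typeof raw !== 'string') return null;
  const trimmed = raw.trim();
  if (trimmed.length === 0) return null; // reject empty queries up front
  return trimmed.slice(0, maxLength);    // cap length to bound parsing work
}

// Usage with a parameterized query (never string-concatenate user input):
// pool.query(
//   "SELECT id FROM articles WHERE search_vector @@ websearch_to_tsquery('english', $1)",
//   [prepareSearchInput(req.query.q)]
// );
```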
Trigram Matching for Fuzzy Search
Full-text search requires lexeme matches. If a user misspells "postgrsql", FTS returns nothing. For typo tolerance, use the pg_trgm extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);
SELECT title, similarity(title, 'postgrsql') AS sim
FROM articles
WHERE title % 'postgrsql'
ORDER BY sim DESC;
Combine trigram fuzzy matching with full-text search: first try FTS, then fall back to trigram if FTS returns no results.
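The two-tier strategy can be sketched as a single function. Here `db` is assumed to be any object with a `query(sql, params)` method resolving to `{ rows }` (the shape of the pg Pool interface), and the SQL assumes the articles schema from this guide:

```javascript
// Two-tier search: full-text first, trigram fallback on zero results.
async function searchArticles(db, term) {
  const fts = await db.query(
    `SELECT id, title FROM articles
     WHERE search_vector @@ websearch_to_tsquery('english', $1)
     ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', $1)) DESC
     LIMIT 20`,
    [term]
  );
  if (fts.rows.length > 0) return { tier: 'fts', rows: fts.rows };

  // Fallback: trigram similarity catches misspellings like "postgrsql"
  const fuzzy = await db.query(
    `SELECT id, title FROM articles
     WHERE title % $1
     ORDER BY similarity(title, $1) DESC
     LIMIT 20`,
    [term]
  );
  return { tier: 'trigram', rows: fuzzy.rows };
}
```

Returning the tier alongside the rows lets the UI label fuzzy matches ("showing results for a similar term"), which avoids confusing users who typed an exact phrase.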
Complete Working Example: Node.js/Express Search API
var express = require('express');
var { Pool } = require('pg');
var app = express();
app.use(express.json());
var pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20,
});
app.get('/api/search', async function(req, res) {
var q = req.query.q;
var category = req.query.category;
var page = Math.max(1, parseInt(req.query.page, 10) || 1);
var limit = Math.min(100, Math.max(1, parseInt(req.query.limit, 10) || 20));
var offset = (page - 1) * limit;
if (!q || q.trim().length === 0) {
return res.status(400).json({ error: 'Query parameter "q" is required' });
}
var sanitizedQuery = q.trim().slice(0, 200);
try {
var sql = `
WITH search_query AS (
SELECT websearch_to_tsquery('english', $1) AS query
),
matched AS (
SELECT
a.id, a.title, a.slug, a.category, a.tags,
a.published_at, a.body,
ts_rank_cd('{0.1, 0.2, 0.4, 1.0}', a.search_vector, sq.query, 2) AS rank,
COUNT(*) OVER() AS total_count
FROM articles a, search_query sq
WHERE a.search_vector @@ sq.query
AND a.published_at IS NOT NULL
AND ($2::varchar IS NULL OR a.category = $2)
ORDER BY rank DESC
LIMIT $3 OFFSET $4
)
SELECT m.id, m.title, m.slug, m.category, m.tags,
m.published_at, m.rank, m.total_count,
ts_headline('english', m.body, sq.query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=40, MinWords=15, MaxFragments=2, FragmentDelimiter= ... '
) AS snippet
FROM matched m, search_query sq
ORDER BY m.rank DESC
`;
var params = [sanitizedQuery, category || null, limit, offset];
var result = await pool.query(sql, params);
var totalCount = result.rows.length > 0 ? parseInt(result.rows[0].total_count, 10) : 0;
var totalPages = Math.ceil(totalCount / limit);
res.json({
query: sanitizedQuery,
results: result.rows.map(function(row) {
return {
id: row.id,
title: row.title,
slug: row.slug,
category: row.category,
tags: row.tags,
publishedAt: row.published_at,
rank: parseFloat(row.rank),
snippet: row.snippet,
};
}),
pagination: {
page: page,
limit: limit,
totalCount: totalCount,
totalPages: totalPages,
hasNext: page < totalPages,
hasPrev: page > 1,
},
});
} catch (err) {
console.error('Search error:', err.message);
if (err.message.includes('syntax error in tsquery')) {
return res.status(400).json({ error: 'Invalid search query syntax' });
}
res.status(500).json({ error: 'Internal server error' });
}
});
app.get('/api/autocomplete', async function(req, res) {
var q = req.query.q;
if (!q || q.trim().length < 2) {
return res.status(400).json({ error: 'Query must be at least 2 characters' });
}
var sanitizedQuery = q.trim().slice(0, 100);
try {
var result = await pool.query(`
SELECT id, title, slug, similarity(title, $1) AS sim
FROM articles
WHERE published_at IS NOT NULL
AND (title % $1 OR title ILIKE $2)
ORDER BY sim DESC
LIMIT 10
`, [sanitizedQuery, '%' + sanitizedQuery + '%']);
res.json({
suggestions: result.rows.map(function(row) {
return { id: row.id, title: row.title, slug: row.slug, similarity: parseFloat(row.sim) };
}),
});
} catch (err) {
console.error('Autocomplete error:', err.message);
res.status(500).json({ error: 'Internal server error' });
}
});
var PORT = process.env.PORT || 3000;
app.listen(PORT, function() {
console.log('Search API running on port ' + PORT);
});
Common Issues and Troubleshooting
1. Empty search results when data clearly matches
Cause: The search vector column was not populated. A STORED generated column is backfilled when it is added, but with the trigger approach, rows inserted before the trigger existed still have a NULL vector. Touch every row so the trigger recomputes it:
UPDATE articles SET updated_at = NOW();
2. syntax error in tsquery on user input
ERROR: syntax error in tsquery: "hello world & "
Fix: Use websearch_to_tsquery() instead of to_tsquery(). It safely handles arbitrary user input.
3. Search is slow despite having a GIN index
Cause: You are calling to_tsvector() at query time instead of using the stored column.
-- WRONG (cannot use index):
WHERE to_tsvector('english', body) @@ query
-- RIGHT (uses the GIN index):
WHERE search_vector @@ query
4. ts_headline degrades performance
Fix: Apply ts_headline only to the final paginated result set using a CTE.
Best Practices
- Always use websearch_to_tsquery for user-facing search. It handles quoted phrases, OR, and negation safely.
- Store the tsvector as a generated column or trigger. Never compute it at query time.
- Apply ts_headline in a CTE after filtering and limiting. Running it over unfiltered results is the single biggest performance mistake in PostgreSQL search.
- Use ts_rank_cd over ts_rank. Cover density ranking produces more intuitive results.
- Normalize ranks for documents of varying lengths.
- Create partial indexes for published content: CREATE INDEX ... WHERE published_at IS NOT NULL;
- Implement two-tier search. Full-text first, trigram fallback for zero results.
- Limit input length. Truncate user queries to 100-200 characters.
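The partial-index item above can be sketched in SQL against the articles schema from this guide (the index name is our own):

```sql
-- GIN index covering only published rows: smaller, cheaper to maintain,
-- and usable whenever the query repeats the same WHERE condition.
CREATE INDEX idx_articles_search_published
ON articles USING GIN (search_vector)
WHERE published_at IS NOT NULL;
```

Note that the planner uses a partial index only when the query's WHERE clause implies the index predicate, so every search query must keep the published_at IS NOT NULL filter.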
