Implementing Powerful Full-Text Search in PostgreSQL: A Beginner’s Guide

Introduction

In today’s digital world, efficient search functionality is necessary for databases storing large amounts of textual data. Traditional SQL search methods like LIKE and ILIKE are not optimized for large datasets, making them slow and inefficient.

This is where Full-Text Search (FTS) in PostgreSQL comes in. It enables users to search for meaningful content rather than just matching exact words, making searches faster, more accurate, and more intelligent. This searching (or just text search) provides the capability to identify natural-language documents that satisfy a query and optionally sort them by relevance to the query.

The Limitations of Basic SQL Search (Or Textual Search)

Textual search operators have existed in databases for years. PostgreSQL has ~, ~*, LIKE, and ILIKE operators for textual data types, but they lack many essential properties required by modern information systems.

SELECT * FROM products WHERE name LIKE '%chair%';

Why is this inefficient?

Slow – Poor performance on large datasets due to insufficiently optimized indexing.
No Meaningful Search – Can’t handle plural forms, word variations, or synonyms like “chairs”, “seating”, or “armchair”
No Ranking – Returns results in any order and ignores the importance or relevance of the result.

To overcome these issues, PostgreSQL provides Full-Text Search, which is optimized for natural-language search.

What is Full-Text Search?

Full-Text Search (FTS) allows databases to find relevant matches inside text data based on meaning, not just exact words.

Key features:

Breaks text into searchable words.
Search by meaning, not just keywords.
Language-aware parsing and stemming.
Complex queries using AND, OR, and NOT.
Converts words to their root form (e.g., “running” → “run”).
Ignores common words like “is”, “the”, “a”, etc.
Fast querying with GIN or GiST indexes.
Custom dictionaries and synonym handling.
Sorts results based on importance
Handles Multi-Language searches like English, French, Spanish, and more

Example: Searching for “reclining sofas” should return results that mention “recline”, “sofa”, “couch”, etc., even if the words aren’t an exact match.

Prerequisites for full-text search setup

Before you implement FTS, ensure:

PostgreSQL version is 12 or higher (ideally 16+).
Tables have fields you want to search (e.g., name, description, etc.).
You have basic SQL knowledge.
Extensions like pg_trgm and unaccent are enabled (optional for enhancements).

How Full-Text Search Works in PostgreSQL

1. Tokenization: Text is split into words or lexemes using the to_tsvector() function.

SELECT to_tsvector('english', 'PostgreSQL is a powerful database system');

Result: ‘databas’:5 ‘power’:4 ‘postgresql’:1 ‘system’:6

2. Lexeme Normalization: Stem words to their base form and removes stop words (e.g., “is”, “the”).

3. Storage with tsvector: PostgreSQL stores normalized tokens as a tsvector, optionally with positions.

ALTER TABLE products ADD COLUMN fts_doc tsvector;
UPDATE products SET fts_doc = to_tsvector('english', name || ' ' || description);

Now, search queries are much faster!

4.Search with to_tsquery() or websearch_to_tsquery(): Finds matches based on lexemes and operators.

SELECT * FROM products WHERE fts_doc @@ to_tsquery('english', 'table & wood');

Finds “wooden table”, “tables made of wood”, etc.

5. Ranking and Indexing: Results are ranked and can be indexed using GIN (for speed) or GiST (for flexibility).

SELECT *, ts_rank(fts_doc, to_tsquery('english', 'table & wood')) AS rank FROM products ORDER BY rank DESC;

Higher-ranked results appear at the top.

What is tsvector in PostgreSQL?

In PostgreSQL, tsvector is a special data type designed to support Full-Text Search (FTS) operations. It stores a normalized representation of text, breaking the content into searchable tokens (called lexemes) and optionally recording their positions within the text. By converting raw text into a tsvector, PostgreSQL can search and rank documents more efficiently and accurately.

Key Features of tsvector

Normalization:

Removes stop words like “is”, “the”, and “a” to focus only on meaningful content.
Converts words into their lexeme form (the root/base form of the word).

Storage:

Stores lexemes in a sorted order, making them quick to search and compare.
Can include position information to enable advanced features like phrase search and proximity matching.

Indexing:

Optimized for search using GIN (Generalized Inverted Index) or GiST (Generalized Search Tree) indexes.
Indexes dramatically speed up search queries over large amounts of text.

Language Support:

tsvector works with multiple languages via PostgreSQL’s text search configurations (like English, French, etc.).
Language rules determine how text is tokenized and which stop words are removed.

Example: How tsvector Works

Imagine you have the following input text: When you convert this text into a tsvector (assuming the English text search configuration).

"PostgreSQL is a powerful database system."

Converted tsvector:

'databas':5 'power':4 'postgresql':1 'system':6

What’s happening here?

Words like “is” and “a” are considered stop words and are removed.
The remaining important words are:
- Postgresql
- Power
- Database
- System
Each word is stored with its position in the sentence:
- ‘postgresql’:1 – PostgreSQL is the 1st word
- ‘power’:4 – Power is the 4th word
- etc.

In short, tsvector is the foundation that makes Full-Text Search powerful, flexible, and fast in PostgreSQL.

Step-by-Step Implementation of Full-Text Search

1. Create a tsvector Column: Adds a new column named fts_doc to the item table. The column’s data type is tsvector, which is a special format optimized for searching text efficiently.

ALTER TABLE item ADD COLUMN fts_doc tsvector;

2. Create an Index for Faster Searches: Creates an index using GIN (Generalized Inverted Index) on the fts_doc column. Without an index, searching will require full table scans, making it very slow for large tables.

CREATE INDEX idx_fts_doc ON item USING GIN (fts_doc);

3. Create a function that automatically updates the tsvector on Insert/Update: Creates a PostgreSQL function that auto-generates a tsvector for every new or updated row. It combines multiple text columns (name, description, item_category.name) into one searchable field. COALESCE() ensures NULL values are treated as empty strings to avoid errors.

CREATE OR REPLACE FUNCTION update_fts_doc_item()                                                  
RETURNS TRIGGER AS $$
BEGIN
  NEW.fts_doc := to_tsvector('english',
    COALESCE(NEW.name, '') || ' ' || 
    COALESCE(NEW.description, '') || ' ' || 
    COALESCE((SELECT ic.name FROM item_category ic WHERE ic.id = NEW.item_category), '')
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

4. Attach a Trigger to Keep Data Updated: Creates a trigger that fires before every INSERT or UPDATE on the item table. This trigger calls the update_fts_doc_item() function automatically. Ensures that the fts_doc column is always updated whenever the row changes.

CREATE TRIGGER trigger_update_fts                                                                  
BEFORE INSERT OR UPDATE ON item
FOR EACH ROW EXECUTE FUNCTION update_fts_doc_item();

5. Perform a Search: Uses websearch_to_tsquery() to convert the search phrase into a form that can match the fts_doc. @@ is the match operator in PostgreSQL Full-Text Search. You can now perform Google-like natural searches instead of complicated SQL LIKE queries. websearch_to_tsquery() automatically understands AND, OR, and phrase search without needing special syntax.

SELECT * FROM item WHERE fts_doc @@ websearch_to_tsquery('english', 'reclining sofa');

Final Summary

Add a tsvector column to your table.
Create a GIN index on that column for fast searches.
Create a function and a trigger to auto-update the searchable text.
Use websearch_to_tsquery() to perform natural searches.

Advanced Features

Using websearch_to_tsquery() for User-Friendly Queries: This allows natural Google-like searches:

SELECT * FROM item WHERE fts_doc @@ websearch_to_tsquery('english', 'modern sofa');

No need for &, |, or ! operators!

Exclude Unwanted Words (Stop Words):

Edit: /usr/share/pgsql/tsearch_data/english.stop(For Linux).
Add the words you want to stop in english.stop file.
Restart PostgreSQL to apply changes.

furniture                                                                                         
color

Add Synonyms for Better Matching:

1. Create a Synonym Dictionary File in a ‘$SHAREDIR/tsearch_data/my_synonyms.syn’ location (where $SHAREDIR means the PostgreSQL installation’s shared-data directory). The file format is just one line per word to be substituted, with the word followed by its synonym, separated by white space. Blank lines and trailing spaces are ignored. For example:

couch sofa                                                                                       
settee sofa
Knole sofa
seat chair
timepiece clock

2. After adding synonyms words in the file, restart the PostgreSQL server and access my_synonyms.syn directly.

3. Define a Synonym Dictionary in PostgreSQL: You can then define a custom dictionary in PostgreSQL that uses this synonym file.

CREATE TEXT SEARCH DICTIONARY synonym_dict(                                                       
	template=synonym, 
	synonyms='my_synonyms'
);

4. Create a Text Search Configuration with the custom Synonym Dictionary and the default dictionary. Next, you create a text search configuration that uses the synonym dictionary.

CREATE TEXT SEARCH CONFIGURATION custom_search_configuration (copy=english);                      

ALTER TEXT SEARCH CONFIGURATION custom_search_configuration 
	ALTER MAPPING FOR asciiword WITH english_stem, synonym_dict;

5. Check your changes:

SELECT ts_lexize('synonym_dict', 'settee');
SELECT to_tsvector('custom_search_configuration', 'this is my timepiece');

Now, searching “couch” will return “sofa” too!

Common Mistakes to Avoid

Forgetting to create an Index: Without a GIN index, searches will be slow.
Using the Wrong Search Function: Use websearch_to_tsquery() for simple user input.
Ignoring Language Configuration: Specify the correct language in to_tsvector().

Conclusion

PostgreSQL Full-Text Search is a powerful, fast, and flexible solution to building smart search engines in your applications. It transforms basic text search into meaningful, indexed, and user-friendly functionality. Whether you’re building a product search, document indexer, or content platform, FTS is a tool worth mastering.

Start small, build your queries, and optimize with synonyms, dictionaries, and indexes!

Now it’s your turn!

Try implementing Full-Text Search in your PostgreSQL database and share your experience in the comments!

3Likes

Let’s make something together

Find us

Email us

Phone support