The choices of parser, dictionaries and which types of tokens to index are determined by the selected text search configuration (Section 12.7). It is possible to have many different configurations in the same database, and predefined configurations are available for various languages. In our example we used the default configuration english for the English language.

The function setweight can be used to label the entries of a tsvector with a given weight, where a weight is one of the letters A, B, C, or D. This is typically used to mark entries coming from different parts of a document, such as title versus body. Later, this information can be used for ranking of search results.

Because to_tsvector(NULL) will return NULL, and a NULL operand of || makes the whole concatenation NULL, it is recommended to use coalesce whenever a field might be null. Here is the recommended method for creating a tsvector from a structured document:

    setweight(to_tsvector(coalesce(title,'')), 'A')    ||
    setweight(to_tsvector(coalesce(keyword,'')), 'B')  ||
    setweight(to_tsvector(coalesce(abstract,'')), 'C') ||
    setweight(to_tsvector(coalesce(body,'')), 'D')

Here we have used setweight to label the source of each lexeme in the finished tsvector, and then merged the labeled tsvector values using the tsvector concatenation operator ||. (Section 12.4.1 gives details about these operations.)

PostgreSQL provides the functions to_tsquery, plainto_tsquery, phraseto_tsquery and websearch_to_tsquery for converting a query to the tsquery data type. to_tsquery offers access to more features than either plainto_tsquery or phraseto_tsquery, but it is less forgiving about its input. websearch_to_tsquery is a simplified version of to_tsquery with an alternative syntax, similar to the one used by web search engines.

    to_tsquery([ config regconfig, ] querytext text) returns tsquery

to_tsquery creates a tsquery value from querytext, which must consist of single tokens separated by the tsquery operators & (AND), | (OR), ! (NOT), and <-> (FOLLOWED BY), possibly grouped using parentheses.
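The weighting recipe and the query operators described above can be tried out without a database in a toy Python model. This is a sketch under stated assumptions, not PostgreSQL's implementation. The names naive_to_tsvector, setweight, concat, coalesce and matches are hypothetical Python stand-ins. The query-tree encoding is invented for illustration. naive_to_tsvector performs no stop-word removal or stemming. Unlike to_tsquery, matches expects already-normalized lexemes, and it only accepts <-> between two single lexemes. The model does show three documented behaviors: positions in the right operand of || are offset by the largest position in the left operand, a NULL field wipes out the whole result unless coalesce is used, and FOLLOWED BY requires adjacent positions.

```python
from typing import Optional

# Toy tsvector: lexeme -> list of (position, weight) pairs.
Vector = dict[str, list[tuple[int, str]]]


def coalesce(value: Optional[str], default: str) -> str:
    """Like two-argument SQL coalesce: return the first non-NULL value."""
    return default if value is None else value


def naive_to_tsvector(text: Optional[str]) -> Optional[Vector]:
    """Naive stand-in for to_tsvector: lowercased whitespace-separated words,
    no stop words or stemming. As with the real function, NULL in gives NULL out."""
    if text is None:
        return None
    vector: Vector = {}
    for position, word in enumerate(text.lower().split(), start=1):
        vector.setdefault(word, []).append((position, "D"))  # D is the default weight
    return vector


def setweight(vector: Optional[Vector], weight: str) -> Optional[Vector]:
    """Label every position of the vector with weight A, B, C or D."""
    if vector is None:
        return None
    return {lex: [(pos, weight) for pos, _ in entries] for lex, entries in vector.items()}


def concat(left: Optional[Vector], right: Optional[Vector]) -> Optional[Vector]:
    """Model of tsvector || tsvector: positions in the right operand are offset by
    the largest position in the left operand; a NULL operand yields NULL."""
    if left is None or right is None:
        return None
    offset = max((pos for entries in left.values() for pos, _ in entries), default=0)
    result = {lex: list(entries) for lex, entries in left.items()}
    for lex, entries in right.items():
        result.setdefault(lex, []).extend((pos + offset, w) for pos, w in entries)
    return result


def matches(vector: Vector, query: tuple) -> bool:
    """Evaluate a toy query tree: ("lex", w), ("&", q1, q2), ("|", q1, q2),
    ("!", q), or ("<->", w1, w2), where <-> is limited to two single lexemes."""
    op = query[0]
    if op == "lex":
        return query[1] in vector
    if op == "&":
        return matches(vector, query[1]) and matches(vector, query[2])
    if op == "|":
        return matches(vector, query[1]) or matches(vector, query[2])
    if op == "!":
        return not matches(vector, query[1])
    if op == "<->":
        # FOLLOWED BY: some position of w2 is exactly one past a position of w1
        first = {pos for pos, _ in vector.get(query[1], [])}
        second = {pos for pos, _ in vector.get(query[2], [])}
        return any(pos + 1 in second for pos in first)
    raise ValueError(f"unknown operator {op!r}")


if __name__ == "__main__":
    title, body = "fat rat", None
    # Without coalesce, the NULL body makes the whole document vector NULL (None).
    print(concat(setweight(naive_to_tsvector(title), "A"),
                 setweight(naive_to_tsvector(body), "D")))
    doc = concat(setweight(naive_to_tsvector(coalesce(title, "")), "A"),
                 setweight(naive_to_tsvector(coalesce(body, "")), "D"))
    print(doc)
    # Roughly the toy analogue of the query  fat <-> rat & !cat
    print(matches(doc, ("&", ("<->", "fat", "rat"), ("!", ("lex", "cat")))))
```

Running it prints None for the uncoalesced version. With coalesce, the empty body contributes an empty vector, so the title's weighted lexemes survive and the query matches.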
PostgreSQL provides the function to_tsvector for converting a document to the tsvector data type.

    to_tsvector([ config regconfig, ] document text) returns tsvector

to_tsvector parses a textual document into tokens, reduces the tokens to lexemes, and returns a tsvector which lists the lexemes together with their positions in the document. The document is processed according to the specified or default text search configuration. For example:

    SELECT to_tsvector('english', 'a fat cat sat on a mat - it ate a fat rats');
                         to_tsvector
    -----------------------------------------------------
     'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4

In the example above we see that the resulting tsvector does not contain the words a, on, or it, the word rats became rat, and the punctuation sign - was ignored.

The to_tsvector function internally calls a parser which breaks the document text into tokens and assigns a type to each token. For each token, a list of dictionaries (Section 12.6) is consulted, where the list can vary depending on the token type. The first dictionary that recognizes the token emits one or more normalized lexemes to represent the token. For example, rats became rat because one of the dictionaries recognized that the word rats is a plural form of rat. Some words are recognized as stop words (Section 12.6.1), which causes them to be ignored since they occur too frequently to be useful in searching. In our example these are a, on, and it. If no dictionary in the list recognizes the token then it is also ignored. In this example that happened to the punctuation sign - because there are in fact no dictionaries assigned for its token type (Space symbols), meaning space tokens will never be indexed.
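The steps just described can be mimicked in a toy Python model: tokenize, assign each token a type, drop tokens that no dictionary handles, drop stop words, normalize through a dictionary, and record positions. This is an illustrative sketch, not PostgreSQL's parser. The names STOP_WORDS, TOY_STEMS, toy_tokenize, toy_to_tsvector and format_tsvector are hypothetical. The stop-word set and the stemming lookup are tiny stand-ins for the real english configuration's stop-word list and stemming dictionary. The "word"/"blank" split is a simplification of the parser's many token types. The model does reproduce the positions in the example output above: stop words still consume a position, while the blank token around - does not.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (the real english list is much longer).
STOP_WORDS = {"a", "on", "it"}

# Hypothetical stand-in for a stemming dictionary: maps a few inflected forms
# to a normalized lexeme. Words not listed are kept as-is.
TOY_STEMS = {"rats": "rat", "cats": "cat"}

# Toy parser: runs of ASCII letters are "word" tokens, everything else is "blank".
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z]+)|(?P<blank>[^A-Za-z]+)")


def toy_tokenize(text):
    """Yield (token, token_type) pairs, the way a parser assigns a type to each token."""
    for match in TOKEN_RE.finditer(text):
        yield match.group(), match.lastgroup


def toy_to_tsvector(text):
    """Return {lexeme: [positions]} following the steps described in the text."""
    vector = defaultdict(list)
    position = 0
    for token, token_type in toy_tokenize(text):
        if token_type == "blank":
            # No dictionary handles blank tokens: ignored, and no position consumed.
            continue
        position += 1  # every word token takes a position, stop words included
        word = token.lower()
        if word in STOP_WORDS:
            continue  # stop word: dropped from the result
        vector[TOY_STEMS.get(word, word)].append(position)
    return dict(vector)


def format_tsvector(vector):
    """Render lexemes in sorted order as 'lexeme':pos1,pos2 ..."""
    return " ".join(
        f"'{lexeme}':{','.join(map(str, positions))}"
        for lexeme, positions in sorted(vector.items())
    )


if __name__ == "__main__":
    print(format_tsvector(toy_to_tsvector("a fat cat sat on a mat - it ate a fat rats")))
```

Running it prints 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4, matching the example. Swapping in a different stop-word set or dictionary table changes the result, which mirrors how the choice of text search configuration drives to_tsvector.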