================
Full text search
================

The database functions in the ``django.contrib.postgres.search`` module ease
the use of PostgreSQL's `full text search engine
<https://www.postgresql.org/docs/current/textsearch.html>`_.

For the examples in this document, we'll use the models defined in
:doc:`/topics/db/queries`.

.. seealso::

    For a high-level overview of searching, see the :doc:`topic documentation
    </topics/db/search>`.

.. currentmodule:: django.contrib.postgres.search
    
The ``search`` lookup
=====================

.. fieldlookup:: search

A common way to use full text search is to search a single term against a
single column in the database. For example::

    >>> Entry.objects.filter(body_text__search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

This creates a ``to_tsvector`` in the database from the ``body_text`` field
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
default database search configuration. The results are obtained by matching the
query and the vector.

To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
:setting:`INSTALLED_APPS`.
    
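To build some intuition for the matching step, here is a deliberately simplified, pure-Python model of it. It is not how PostgreSQL works internally: it assumes a document is just the set of its lowercased words and skips the stemming and stop-word removal that a real search configuration performs. What it does show is that a plain query, like the one ``plainto_tsquery`` builds, requires every query term to be present in the document. The entry bodies are made up for the sketch.

.. code-block:: python

    import re


    def to_vector(text):
        """Toy stand-in for to_tsvector: the set of lowercased words.

        A real search configuration would also stem words and drop stop words.
        """
        return set(re.findall(r"\w+", text.lower()))


    def plain_query_matches(query, document):
        """Toy stand-in for plainto_tsquery + @@: every query word must be present."""
        terms = to_vector(query)
        return bool(terms) and terms <= to_vector(document)


    # Made-up entry bodies, for illustration only.
    entries = {
        "Cheese on Toast recipes": "Melt the cheese and grill the toast.",
        "Pizza recipes": "Top the dough with tomato and cheese.",
        "Pain perdu": "Soak the bread in egg and milk.",
    }
    matches = [title for title, body in entries.items() if plain_query_matches("Cheese", body)]
    print(matches)  # ['Cheese on Toast recipes', 'Pizza recipes']
    print(plain_query_matches("tomato cheese", entries["Pizza recipes"]))  # True

Adding a second word narrows the results, because every term has to match.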
    
``SearchVector``
================

.. class:: SearchVector(*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The ``Entry``
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
To query against both fields, use a ``SearchVector``::

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', 'blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

The arguments to ``SearchVector`` can be any
:class:`~django.db.models.Expression` or the name of a field. Multiple
arguments are concatenated using a space so that the search document includes
them all.
    
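As a rough mental model of what concatenating the arguments "using a space" means, the pure-Python sketch below builds one search document from several column values. The helper name ``build_document`` is made up for the sketch, and treating a missing (``NULL``) value as an empty string is an assumption of the sketch. The point is that the separator keeps the last word of one field from fusing with the first word of the next.

.. code-block:: python

    def build_document(*values):
        """Join field values with a space, as SearchVector does with its arguments.

        Treating None (SQL NULL) as an empty string is an assumption of this sketch.
        """
        return " ".join("" if value is None else str(value) for value in values)


    body_text = "Grilled cheese"
    tagline = "toast every day"
    print(build_document(body_text, tagline))  # Grilled cheese toast every day
    print(body_text + tagline)  # Grilled cheesetoast every day  (words fused)
    print(repr(build_document(body_text, None)))  # 'Grilled cheese '

Without the separator, a search for ``cheese`` or ``toast`` would miss the fused word ``cheesetoast``.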
    
``SearchVector`` objects can be combined, allowing you to reuse them. For
example::

    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text') + SearchVector('blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See :ref:`postgresql-fts-search-configuration` and
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
and ``weight`` parameters.
    
``SearchQuery``
===============

.. class:: SearchQuery(value, config=None, search_type='plain')

``SearchQuery`` translates the terms the user provides into a search query
object that the database compares to a search vector. By default, all the words
the user provides are passed through the stemming algorithms, and then the
database looks for matches for all of the resulting terms.

If ``search_type`` is ``'plain'``, which is the default, the terms are treated
as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
formatted search query with terms and operators. If ``search_type`` is
``'websearch'``, then you can provide a formatted search query, similar to the
one used by web search engines. ``'websearch'`` requires PostgreSQL ≥ 11. Read
PostgreSQL's `Full Text Search docs`_ to learn about differences and syntax.
Examples::

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('red tomato')  # two keywords
    >>> SearchQuery('tomato red')  # same results as above
    >>> SearchQuery('red tomato', search_type='phrase')  # a phrase
    >>> SearchQuery('tomato red', search_type='phrase')  # a different phrase
    >>> SearchQuery("'tomato' & ('red' | 'green')", search_type='raw')  # boolean operators
    >>> SearchQuery("'tomato' ('red' OR 'green')", search_type='websearch')  # websearch operators

.. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
    
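The difference between ``'plain'`` and ``'phrase'`` in the examples above can be modelled in a few lines of Python. This is a simplified sketch, not PostgreSQL's algorithm: it skips stemming and stop words, and models a phrase as the query words appearing consecutively and in order (PostgreSQL expresses this with the ``<->`` "followed by" operator). The function names are made up for the sketch.

.. code-block:: python

    import re


    def words(text):
        """Lowercased words in order (no stemming or stop words in this sketch)."""
        return re.findall(r"\w+", text.lower())


    def matches_plain(query, document):
        """'plain': every query word appears somewhere, in any order."""
        return set(words(query)) <= set(words(document))


    def matches_phrase(query, document):
        """'phrase': the query words appear consecutively and in the same order."""
        q, d = words(query), words(document)
        return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


    document = "A salad of red tomato and basil"
    for query in ("red tomato", "tomato red"):
        print(query, matches_plain(query, document), matches_phrase(query, document))
    # red tomato True True
    # tomato red True False

Word order doesn't matter for the plain query, which is why ``'red tomato'`` and ``'tomato red'`` give the same results, while the two phrase queries differ.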
    
``SearchQuery`` terms can be combined logically to provide more flexibility::

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('meat') & SearchQuery('cheese')  # AND
    >>> SearchQuery('meat') | SearchQuery('cheese')  # OR
    >>> ~SearchQuery('meat')  # NOT

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.
    
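The ``&``, ``|`` and ``~`` operators work because ``SearchQuery`` overloads Python's ``__and__``, ``__or__`` and ``__invert__`` methods; Django then compiles the combination into SQL (to our understanding, using tsquery's ``&&``, ``||`` and ``!!`` operators). The hypothetical ``ToyQuery`` class below is not part of Django. It applies the same operator-overloading pattern but evaluates queries in Python against a set of words, so you can see how nested combinations compose.

.. code-block:: python

    class ToyQuery:
        """Hypothetical stand-in for SearchQuery, evaluated in Python, not SQL."""

        def __init__(self, predicate):
            self.predicate = predicate  # callable: set of words -> bool

        @classmethod
        def term(cls, word):
            return cls(lambda words: word in words)

        def __and__(self, other):
            return ToyQuery(lambda words: self.predicate(words) and other.predicate(words))

        def __or__(self, other):
            return ToyQuery(lambda words: self.predicate(words) or other.predicate(words))

        def __invert__(self):
            return ToyQuery(lambda words: not self.predicate(words))

        def matches(self, text):
            return self.predicate(set(text.lower().split()))


    meat, cheese = ToyQuery.term("meat"), ToyQuery.term("cheese")
    doc = "cheese and tomato toastie"
    print((meat & cheese).matches(doc))  # False
    print((meat | cheese).matches(doc))  # True
    print((~meat & cheese).matches(doc))  # True

As with ``SearchQuery``, ``~`` binds more tightly than ``&`` and ``|``, so ``~meat & cheese`` means "cheese but not meat".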
    
``SearchRank``
==============

.. class:: SearchRank(vector, query, weights=None, normalization=None, cover_density=False)

So far, we've returned all the results for which the vector and the query
match. You will likely want to order the results by some sort of relevancy.
PostgreSQL provides a ranking function which takes into account how often the
query terms appear in the document, how close together the terms are in the
document, and how important the part of the document where they occur is. The
better the match, the higher the value of the rank. To order by relevancy::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See :ref:`postgresql-fts-weighting-queries` for an explanation of the
``weights`` parameter.

Set the ``cover_density`` parameter to ``True`` to enable cover density
ranking, which means that the proximity of matching query terms is taken into
account.

Provide an integer to the ``normalization`` parameter to control rank
normalization. This integer is a bit mask, so you can combine multiple
behaviors::

    >>> from django.db.models import Value
    >>> Entry.objects.annotate(
    ...     rank=SearchRank(
    ...         vector,
    ...         query,
    ...         normalization=Value(2).bitor(Value(4)),
    ...     )
    ... )

The PostgreSQL documentation has more details about `different rank
normalization options`_.

.. _different rank normalization options: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING
    
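Because ``normalization`` is a bit mask, combining options is plain bitwise OR: ``Value(2).bitor(Value(4))`` in the example above asks the database for ``2 | 4``, which is ``6``. The pure-Python sketch below decodes such a mask using the flag meanings listed in the PostgreSQL ranking documentation. The ``FLAGS`` table and the ``describe_normalization`` helper are made up for the sketch and transcribed by hand, so double-check them against your PostgreSQL version. Note that bit 4 only applies to ``ts_rank_cd``, which, as we understand it, is the function Django uses when ``cover_density=True``.

.. code-block:: python

    # Flag meanings transcribed from the PostgreSQL "Ranking Search Results" docs.
    FLAGS = {
        1: "divide by 1 + log(document length)",
        2: "divide by document length",
        4: "divide by mean harmonic distance between extents (ts_rank_cd only)",
        8: "divide by number of unique words",
        16: "divide by 1 + log(number of unique words)",
        32: "divide rank by itself + 1",
    }


    def describe_normalization(mask):
        """List the normalization behaviours enabled in an integer bit mask."""
        if mask == 0:
            return ["ignore document length (default)"]
        unknown = mask & ~sum(FLAGS)  # sum(FLAGS) adds the keys: 63
        if unknown:
            raise ValueError(f"unknown normalization bits: {unknown}")
        return [text for bit, text in FLAGS.items() if mask & bit]


    mask = 2 | 4  # what Value(2).bitor(Value(4)) computes in the database
    print(mask)  # 6
    for line in describe_normalization(mask):
        print("-", line)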
    
``SearchHeadline``
==================

.. class:: SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)

Accepts a single text field or an expression, a query, a config, and a set of
options. Returns highlighted search results.

Set the ``start_sel`` and ``stop_sel`` parameters to the string values to be
used to wrap highlighted query terms in the document. PostgreSQL's defaults are
``<b>`` and ``</b>``.

Provide integer values to the ``max_words`` and ``min_words`` parameters to
determine the longest and shortest headlines. PostgreSQL's defaults are 35 and
15.

Provide an integer value to the ``short_word`` parameter to discard words of
this length or less at the start and end of each headline, unless they are
query terms. PostgreSQL's default is 3.

Set the ``highlight_all`` parameter to ``True`` to use the whole document in
place of a fragment and ignore the ``max_words``, ``min_words``, and
``short_word`` parameters. That's disabled by default in PostgreSQL.

Provide a non-zero integer value to the ``max_fragments`` parameter to set the
maximum number of fragments to display. That's disabled by default in
PostgreSQL.

Set the ``fragment_delimiter`` string parameter to configure the delimiter
between fragments. PostgreSQL's default is ``" ... "``.

The PostgreSQL documentation has more details on `highlighting search
results`_.

Usage example::

    >>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
    >>> query = SearchQuery('red tomato')
    >>> entry = Entry.objects.annotate(
    ...     headline=SearchHeadline(
    ...         'body_text',
    ...         query,
    ...         start_sel='<span>',
    ...         stop_sel='</span>',
    ...     ),
    ... ).get()
    >>> print(entry.headline)
    Sandwich with <span>tomato</span> and <span>red</span> cheese.

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

.. _highlighting search results: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-HEADLINE
    
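To picture what ``start_sel`` and ``stop_sel`` do, here is a pure-Python toy (the name ``toy_headline`` is made up) that wraps matching words in the selection strings. It is only an approximation: PostgreSQL's ``ts_headline`` also applies the search configuration, so stemmed variants of a word can match, and unless ``highlight_all`` is set it picks a fragment bounded by ``min_words`` and ``max_words``, which this sketch does not attempt. The default markers are PostgreSQL's documented ``<b>`` and ``</b>``.

.. code-block:: python

    import re


    def toy_headline(document, terms, start_sel="<b>", stop_sel="</b>"):
        """Wrap every word in `document` that equals a query term (case-insensitive).

        No stemming and no fragment selection: the whole document is returned,
        roughly like highlight_all=True.
        """
        wanted = {term.lower() for term in terms}

        def wrap(match):
            word = match.group(0)
            return f"{start_sel}{word}{stop_sel}" if word.lower() in wanted else word

        return re.sub(r"\w+", wrap, document)


    print(toy_headline("Sandwich with tomato and red cheese.", ["red", "tomato"],
                       start_sel="<span>", stop_sel="</span>"))
    # Sandwich with <span>tomato</span> and <span>red</span> cheese.

On this short document, the toy reproduces the headline from the usage example, because no stemming or fragment selection comes into play.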
    
.. _postgresql-fts-search-configuration:

Changing the search configuration
=================================

You can pass the ``config`` argument to a :class:`SearchVector` and
:class:`SearchQuery` to use a different search configuration. This allows using
different language parsers and dictionaries as defined by the database::

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config='french'),
    ... ).filter(search=SearchQuery('œuf', config='french'))
    [<Entry: Pain perdu>]

The value of ``config`` could also be stored in another column::

    >>> from django.db.models import F
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config=F('blog__language')),
    ... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
    [<Entry: Pain perdu>]
    
.. _postgresql-fts-weighting-queries:

Weighting queries
=================

Not every field has the same relevance in a query, so you can set the weights
of the various vectors before you combine them::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('-rank')

The weight should be one of the following letters: D, C, B, A. By default,
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
respectively. If you wish to weight them differently, pass a list of four
floats to :class:`SearchRank` as ``weights``, in the same order as above
(D, C, B, A)::

    >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
    >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')
    
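The order matters: the ``weights`` list is read as D, C, B, A, the reverse of the alphabetical order you might expect. This small pure-Python helper (``weight_map`` is a name made up for this sketch) pairs each letter with its value so you can check what a given list means.

.. code-block:: python

    DEFAULT_WEIGHTS = [0.1, 0.2, 0.4, 1.0]  # PostgreSQL's defaults, in D, C, B, A order


    def weight_map(weights=None):
        """Pair each weight letter with its value, following the D, C, B, A order."""
        weights = DEFAULT_WEIGHTS if weights is None else list(weights)
        if len(weights) != 4:
            raise ValueError("expected exactly four weights (D, C, B, A)")
        return dict(zip("DCBA", weights))


    print(weight_map())  # {'D': 0.1, 'C': 0.2, 'B': 0.4, 'A': 1.0}
    print(weight_map([0.2, 0.4, 0.6, 0.8]))  # {'D': 0.2, 'C': 0.4, 'B': 0.6, 'A': 0.8}

With the custom list from the example, a match in ``body_text`` (weight ``'A'``) counts for 0.8 rather than 1.0, and a match in ``blog__tagline`` (weight ``'B'``) for 0.6 rather than 0.4.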
    
Performance
===========

Special database configuration isn't necessary to use any of these functions;
however, if you're searching more than a few hundred records, you're likely to
run into performance problems. Full text search is a more intensive process
than comparing the size of an integer, for example.

If all the fields you're querying on are contained within one particular
model, you can create a functional
:class:`GIN <django.contrib.postgres.indexes.GinIndex>` or
:class:`GiST <django.contrib.postgres.indexes.GistIndex>` index which matches
the search vector you wish to use. For example::

    from django.contrib.postgres.indexes import GinIndex
    from django.contrib.postgres.search import SearchVector

    # For example, as an entry in the Entry model's Meta.indexes list:
    GinIndex(
        SearchVector('body_text', 'headline', config='english'),
        name='search_vector_idx',
    )

The PostgreSQL documentation has details on
`creating indexes for full text search
<https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.

``SearchVectorField``
---------------------

.. class:: SearchVectorField

If this approach becomes too slow, you can add a ``SearchVectorField`` to your
model. You'll need to keep it populated with triggers, for example, as
described in the `PostgreSQL documentation`_. You can then query the field as
if it were an annotated ``SearchVector``::

    >>> Entry.objects.update(search_vector=SearchVector('body_text'))
    >>> Entry.objects.filter(search_vector='cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

.. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
    
Trigram similarity
==================

Another approach to searching is trigram similarity. A trigram is a group of
three consecutive characters. In addition to the :lookup:`trigram_similar` and
:lookup:`trigram_word_similar` lookups, you can use several other expressions.

To use them, you need to activate the `pg_trgm extension
<https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
install it using the
:class:`~django.contrib.postgres.operations.TrigramExtension` migration
operation.

``TrigramSimilarity``
---------------------

.. class:: TrigramSimilarity(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram similarity between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramSimilarity
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     similarity=TrigramSimilarity('name', test),
    ... ).filter(similarity__gt=0.3).order_by('-similarity')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]
    
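To see where such scores come from, here is a pure-Python sketch of trigram similarity that follows the rules described in the pg_trgm documentation: strings are lowercased and split into words on non-alphanumeric characters, each word is padded with two spaces in front and one behind, and similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings. It's a sketch for understanding the numbers, not a substitute for the extension; it ignores details such as locale-specific character classes, and the function names are made up.

.. code-block:: python

    import re


    def trigrams(text):
        """Set of trigrams, following pg_trgm's documented rules."""
        result = set()
        for word in re.findall(r"[^\W_]+", text.lower()):  # alphanumeric runs
            padded = f"  {word} "  # two spaces before, one after
            result.update(padded[i:i + 3] for i in range(len(padded) - 2))
        return result


    def similarity(a, b):
        """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
        ta, tb = trigrams(a), trigrams(b)
        if not ta or not tb:
            return 0.0
        return len(ta & tb) / len(ta | tb)


    print(sorted(trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
    test = "Katie Stephens"
    for name in ("Katy Stevens", "Stephen Keats"):
        print(name, round(similarity(name, test), 3))
    # Katy Stevens 0.4
    # Stephen Keats 0.381

Under these rules, both names clear the ``similarity__gt=0.3`` filter in the usage example, and "Katy Stevens" ranks first.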
    
``TrigramWordSimilarity``
-------------------------

.. versionadded:: 4.0

.. class:: TrigramWordSimilarity(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram word similarity between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramWordSimilarity
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Kat'
    >>> Author.objects.annotate(
    ...     similarity=TrigramWordSimilarity(test, 'name'),
    ... ).filter(similarity__gt=0.3).order_by('-similarity')
    [<Author: Katy Stevens>]
    
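The pg_trgm documentation defines word similarity as the greatest similarity between the trigram set of the first string and any continuous extent of the ordered trigram sequence of the second string. The pure-Python sketch below brute-forces that definition (the function names are made up). PostgreSQL's own implementation is optimized, and we haven't checked it against this sketch in every edge case. The sketch does reproduce the ``word_similarity('word', 'two words')`` value of 0.8 given in the pg_trgm docs, and it scores "Kat" at 0.75 against "Katy Stevens" but only 0.25 against "Stephen Keats".

.. code-block:: python

    import re


    def ordered_trigrams(text):
        """Trigrams in order of appearance, padded per word as pg_trgm documents."""
        result = []
        for word in re.findall(r"[^\W_]+", text.lower()):
            padded = f"  {word} "
            result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
        return result


    def word_similarity(a, b):
        """Best similarity between trigrams(a) and any contiguous run of b's trigrams."""
        target = set(ordered_trigrams(a))
        sequence = ordered_trigrams(b)
        best = 0.0
        for start in range(len(sequence)):
            for end in range(start + 1, len(sequence) + 1):
                extent = set(sequence[start:end])
                best = max(best, len(target & extent) / len(target | extent))
        return best


    print(word_similarity("word", "two words"))  # 0.8
    for name in ("Katy Stevens", "Stephen Keats"):
        print(name, word_similarity("Kat", name))
    # Katy Stevens 0.75
    # Stephen Keats 0.25

Under these rules, only "Katy Stevens" passes the ``similarity__gt=0.3`` filter in the usage example.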
    
``TrigramDistance``
-------------------

.. class:: TrigramDistance(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram distance between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramDistance
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     distance=TrigramDistance('name', test),
    ... ).filter(distance__lte=0.7).order_by('distance')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]
    
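The pg_trgm docs define trigram distance (the ``<->`` operator) as one minus ``similarity()``. Small distances therefore mean close matches, and ``distance__lte=0.7`` is equivalent, up to floating-point rounding, to a similarity of at least 0.3. The self-contained, pure-Python sketch below uses the same simplified trigram rules as pg_trgm documents (function names made up) and puts the two names from the example at 0.6 and about 0.62.

.. code-block:: python

    import re


    def trigrams(text):
        """Set of trigrams using pg_trgm's documented padding rules (simplified)."""
        result = set()
        for word in re.findall(r"[^\W_]+", text.lower()):
            padded = f"  {word} "
            result.update(padded[i:i + 3] for i in range(len(padded) - 2))
        return result


    def trigram_distance(a, b):
        """One minus trigram similarity, like pg_trgm's <-> operator."""
        ta, tb = trigrams(a), trigrams(b)
        if not ta or not tb:
            return 1.0
        return 1 - len(ta & tb) / len(ta | tb)


    test = "Katie Stephens"
    results = sorted(
        (trigram_distance(name, test), name) for name in ("Stephen Keats", "Katy Stevens")
    )
    for distance, name in results:
        if distance <= 0.7:
            print(name, round(distance, 3))
    # Katy Stevens 0.6
    # Stephen Keats 0.619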
    
``TrigramWordDistance``
-----------------------

.. versionadded:: 4.0

.. class:: TrigramWordDistance(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram word distance between the two arguments.
    
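The pg_trgm docs define word distance (the ``<<->`` operator) as one minus ``word_similarity()``. Like word similarity it is not symmetric: the first argument is the short string you're looking for and the second is the longer text it might appear in, which is why ``TrigramWordDistance`` takes the search string first. The self-contained, pure-Python sketch below brute-forces the documented definition to show this asymmetry. It is not PostgreSQL's optimized implementation, and ``word_distance`` is a name made up for the sketch.

.. code-block:: python

    import re


    def ordered_trigrams(text):
        """Trigrams in order of appearance, padded per word as pg_trgm documents."""
        result = []
        for word in re.findall(r"[^\W_]+", text.lower()):
            padded = f"  {word} "
            result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
        return result


    def word_distance(needle, haystack):
        """1 - best similarity between needle's trigrams and any run of haystack's."""
        target = set(ordered_trigrams(needle))
        sequence = ordered_trigrams(haystack)
        best = 0.0
        for start in range(len(sequence)):
            for end in range(start + 1, len(sequence) + 1):
                extent = set(sequence[start:end])
                best = max(best, len(target & extent) / len(target | extent))
        return 1 - best


    print(word_distance("Kat", "Katy Stevens"))  # 0.25
    print(round(word_distance("Katy Stevens", "Kat"), 3))  # 0.769 (arguments swapped)

With the arguments in the documented order, "Katy Stevens" is at 0.25 and "Stephen Keats" at 0.75, so only the first passes the ``distance__lte=0.7`` filter in the usage example that follows.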
    
Usage example::

    >>> from django.contrib.postgres.search import TrigramWordDistance
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Kat'
    >>> Author.objects.annotate(
    ...     distance=TrigramWordDistance(test, 'name'),
    ... ).filter(distance__lte=0.7).order_by('distance')
    [<Author: Katy Stevens>]