1. ============================
    
  2. Database access optimization
    
  3. ============================
    
  4. 
    
  5. Django's database layer provides various ways to help developers get the most
    
  6. out of their databases. This document gathers together links to the relevant
    
  7. documentation, and adds various tips, organized under a number of headings that
    
  8. outline the steps to take when attempting to optimize your database usage.
    
  9. 
    
  10. Profile first
    
  11. =============
    
  12. 
    
  13. As general programming practice, this goes without saying. Find out :ref:`what
    
  14. queries you are doing and what they are costing you <faq-see-raw-sql-queries>`.
    
  15. Use :meth:`.QuerySet.explain` to understand how specific ``QuerySet``\s are
    
  16. executed by your database. You may also want to use an external project like
    
  17. django-debug-toolbar_, or a tool that monitors your database directly.
    
  18. 
    
  19. Remember that you may be optimizing for speed or memory or both, depending on
    
  20. your requirements. Sometimes optimizing for one will be detrimental to the
    
  21. other, but sometimes they will help each other. Also, work that is done by the
    
  22. database process might not have the same cost (to you) as the same amount of
    
  23. work done in your Python process. It is up to you to decide what your
    
  24. priorities are, where the balance must lie, and profile all of these as required
    
  25. since this will depend on your application and server.
    
  26. 
    
  27. With everything that follows, remember to profile after every change to ensure
    
  28. that the change is a benefit, and a big enough benefit given the decrease in
    
  29. readability of your code. **All** of the suggestions below come with the caveat
    
  30. that in your circumstances the general principle might not apply, or might even
    
  31. be reversed.
    
  32. 
    
  33. .. _django-debug-toolbar: https://github.com/jazzband/django-debug-toolbar/
    
  34. 
    
  35. Use standard DB optimization techniques
    
  36. =======================================
    
  37. 
    
  38. ...including:
    
  39. 
    
  40. * Indexes_. This is a number one priority, *after* you have determined from
    
  41.   profiling what indexes should be added. Use
    
  42.   :attr:`Meta.indexes <django.db.models.Options.indexes>` or
    
  43.   :attr:`Field.db_index <django.db.models.Field.db_index>` to add these from
    
  44.   Django. Consider adding indexes to fields that you frequently query using
    
  45.   :meth:`~django.db.models.query.QuerySet.filter()`,
    
  46.   :meth:`~django.db.models.query.QuerySet.exclude()`,
    
  47.   :meth:`~django.db.models.query.QuerySet.order_by()`, etc. as indexes may help
    
  48.   to speed up lookups. Note that determining the best indexes is a complex
    
  49.   database-dependent topic that will depend on your particular application.
    
  50.   The overhead of maintaining an index may outweigh any gains in query speed.
    
  51. 
    
  52. .. _Indexes: https://en.wikipedia.org/wiki/Database_index
    
  53. 
    
  54. * Appropriate use of field types.
    
  55. 
    
  56. We will assume you have done the things listed above. The rest of this document
    
  57. focuses on how to use Django in such a way that you are not doing unnecessary
    
  58. work. This document also does not address other optimization techniques that
    
  59. apply to all expensive operations, such as :doc:`general purpose caching
    
  60. </topics/cache>`.
    
  61. 
    
  62. Understand ``QuerySet``\s
    
  63. =========================
    
  64. 
    
  65. Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good
    
  66. performance with simple code. In particular:
    
  67. 
    
  68. Understand ``QuerySet`` evaluation
    
  69. ----------------------------------
    
  70. 
    
  71. To avoid performance problems, it is important to understand:
    
  72. 
    
  73. * that :ref:`QuerySets are lazy <querysets-are-lazy>`.
    
  74. 
    
  75. * when :ref:`they are evaluated <when-querysets-are-evaluated>`.
    
  76. 
    
  77. * how :ref:`the data is held in memory <caching-and-querysets>`.
    
  78. 
    
  79. Understand cached attributes
    
  80. ----------------------------
    
  81. 
    
  82. As well as caching of the whole ``QuerySet``, there is caching of the result of
    
  83. attributes on ORM objects. In general, attributes that are not callable will be
    
  84. cached. For example, assuming the :ref:`example blog models
    
  85. <queryset-model-example>`::
    
  86. 
    
  87.   >>> entry = Entry.objects.get(id=1)
    
  88.   >>> entry.blog   # Blog object is retrieved at this point
    
  89.   >>> entry.blog   # cached version, no DB access
    
  90. 
    
  91. But in general, callable attributes cause DB lookups every time::
    
  92. 
    
  93.   >>> entry = Entry.objects.get(id=1)
    
  94.   >>> entry.authors.all()   # query performed
    
  95.   >>> entry.authors.all()   # query performed again
    
  96. 
    
  97. Be careful when reading template code - the template system does not allow use
    
  98. of parentheses, but will call callables automatically, hiding the above
    
  99. distinction.
    
  100. 
    
  101. Be careful with your own custom properties - it is up to you to implement
    
  102. caching when required, for example using the
    
  103. :class:`~django.utils.functional.cached_property` decorator.
    
  104. 
    
  105. Use the ``with`` template tag
    
  106. -----------------------------
    
  107. 
    
  108. To make use of the caching behavior of ``QuerySet``, you may need to use the
    
  109. :ttag:`with` template tag.
    
  110. 
    
  111. Use ``iterator()``
    
  112. ------------------
    
  113. 
    
  114. When you have a lot of objects, the caching behavior of the ``QuerySet`` can
    
  115. cause a large amount of memory to be used. In this case,
    
  116. :meth:`~django.db.models.query.QuerySet.iterator()` may help.
    
  117. 
    
  118. Use ``explain()``
    
  119. -----------------
    
  120. 
    
  121. :meth:`.QuerySet.explain` gives you detailed information about how the database
    
  122. executes a query, including indexes and joins that are used. These details may
    
  123. help you find queries that could be rewritten more efficiently, or identify
    
  124. indexes that could be added to improve performance.
    
  125. 
    
  126. Do database work in the database rather than in Python
    
  127. ======================================================
    
  128. 
    
  129. For instance:
    
  130. 
    
  131. * At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
    
  132.   filtering in the database.
    
  133. 
    
  134. * Use :class:`F expressions <django.db.models.F>` to filter
    
  135.   based on other fields within the same model.
    
  136. 
    
  137. * Use :doc:`annotate to do aggregation in the database
    
  138.   </topics/db/aggregation>`.
    
  139. 
    
  140. If these aren't enough to generate the SQL you need:
    
  141. 
    
  142. Use ``RawSQL``
    
  143. --------------
    
  144. 
    
  145. A less portable but more powerful method is the
    
  146. :class:`~django.db.models.expressions.RawSQL` expression, which allows some SQL
    
  147. to be explicitly added to the query. If that still isn't powerful enough:
    
  148. 
    
  149. Use raw SQL
    
  150. -----------
    
  151. 
    
  152. Write your own :doc:`custom SQL to retrieve data or populate models
    
  153. </topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django
    
  154. is writing for you and start from there.
    
  155. 
    
  156. Retrieve individual objects using a unique, indexed column
    
  157. ==========================================================
    
  158. 
    
  159. There are two reasons to use a column with
    
  160. :attr:`~django.db.models.Field.unique` or
    
  161. :attr:`~django.db.models.Field.db_index` when using
    
  162. :meth:`~django.db.models.query.QuerySet.get` to retrieve individual objects.
    
  163. First, the query will be quicker because of the underlying database index.
    
  164. Also, the query could run much slower if multiple objects match the lookup;
    
  165. having a unique constraint on the column guarantees this will never happen.
    
  166. 
    
  167. So using the :ref:`example blog models <queryset-model-example>`::
    
  168. 
    
  169.   >>> entry = Entry.objects.get(id=10)
    
  170. 
    
  171. will be quicker than:
    
  172. 
    
  173.   >>> entry = Entry.objects.get(headline="News Item Title")
    
  174. 
    
  175. because ``id`` is indexed by the database and is guaranteed to be unique.
    
  176. 
    
  177. Doing the following is potentially quite slow:
    
  178. 
    
  179.   >>> entry = Entry.objects.get(headline__startswith="News")
    
  180. 
    
  181. First of all, ``headline`` is not indexed, which will make the underlying
    
  182. database fetch slower.
    
  183. 
    
  184. Second, the lookup doesn't guarantee that only one object will be returned.
    
  185. If the query matches more than one object, it will retrieve and transfer all of
    
  186. them from the database. This penalty could be substantial if hundreds or
    
  187. thousands of records are returned. The penalty will be compounded if the
    
  188. database lives on a separate server, where network overhead and latency also
    
  189. play a factor.
    
  190. 
    
  191. Retrieve everything at once if you know you will need it
    
  192. ========================================================
    
  193. 
    
  194. Hitting the database multiple times for different parts of a single 'set' of
    
  195. data that you will need all parts of is, in general, less efficient than
    
  196. retrieving it all in one query. This is particularly important if you have a
    
  197. query that is executed in a loop, and could therefore end up doing many database
    
  198. queries, when only one was needed. So:
    
  199. 
    
  200. Use ``QuerySet.select_related()`` and ``prefetch_related()``
    
  201. ------------------------------------------------------------
    
  202. 
    
  203. Understand :meth:`~django.db.models.query.QuerySet.select_related` and
    
  204. :meth:`~django.db.models.query.QuerySet.prefetch_related` thoroughly, and use
    
  205. them:
    
  206. 
    
  207. * in :doc:`managers and default managers </topics/db/managers>` where
    
  208.   appropriate. Be aware when your manager is and is not used; sometimes this is
    
  209.   tricky so don't make assumptions.
    
  210. 
    
  211. * in view code or other layers, possibly making use of
    
  212.   :func:`~django.db.models.prefetch_related_objects` where needed.
    
  213. 
    
  214. Don't retrieve things you don't need
    
  215. ====================================
    
  216. 
    
  217. Use ``QuerySet.values()`` and ``values_list()``
    
  218. -----------------------------------------------
    
  219. 
    
  220. When you only want a ``dict`` or ``list`` of values, and don't need ORM model
    
  221. objects, make appropriate usage of
    
  222. :meth:`~django.db.models.query.QuerySet.values()`.
    
  223. These can be useful for replacing model objects in template code - as long as
    
  224. the dicts you supply have the same attributes as those used in the template,
    
  225. you are fine.
    
  226. 
    
  227. Use ``QuerySet.defer()`` and ``only()``
    
  228. ---------------------------------------
    
  229. 
    
  230. Use :meth:`~django.db.models.query.QuerySet.defer()` and
    
  231. :meth:`~django.db.models.query.QuerySet.only()` if there are database columns
    
  232. you know that you won't need (or won't need in most cases) to avoid loading
    
  233. them. Note that if you *do* use them, the ORM will have to go and get them in
    
  234. a separate query, making this a pessimization if you use it inappropriately.
    
  235. 
    
  236. Don't be too aggressive in deferring fields without profiling as the database
    
  237. has to read most of the non-text, non-``VARCHAR`` data from the disk for a
    
  238. single row in the results, even if it ends up only using a few columns. The
    
  239. ``defer()`` and ``only()`` methods are most useful when you can avoid loading a
    
  240. lot of text data or for fields that might take a lot of processing to convert
    
  241. back to Python. As always, profile first, then optimize.
    
  242. 
    
  243. Use ``QuerySet.contains(obj)``
    
  244. ------------------------------
    
  245. 
    
  246. ...if you only want to find out if ``obj`` is in the queryset, rather than
    
  247. ``if obj in queryset``.
    
  248. 
    
  249. Use ``QuerySet.count()``
    
  250. ------------------------
    
  251. 
    
  252. ...if you only want the count, rather than doing ``len(queryset)``.
    
  253. 
    
  254. Use ``QuerySet.exists()``
    
  255. -------------------------
    
  256. 
    
  257. ...if you only want to find out if at least one result exists, rather than ``if
    
  258. queryset``.
    
  259. 
    
  260. But:
    
  261. 
    
  262. .. _overuse_of_count_and_exists:
    
  263. 
    
  264. Don't overuse ``contains()``, ``count()``, and ``exists()``
    
  265. -----------------------------------------------------------
    
  266. 
    
  267. If you are going to need other data from the QuerySet, evaluate it immediately.
    
  268. 
    
  269. For example, assuming a ``Group`` model that has a many-to-many relation to
    
  270. ``User``, the following code is optimal::
    
  271. 
    
  272.     members = group.members.all()
    
  273. 
    
  274.     if display_group_members:
    
  275.         if members:
    
  276.             if current_user in members:
    
  277.                 print("You and", len(members) - 1, "other users are members of this group.")
    
  278.             else:
    
  279.                 print("There are", len(members), "members in this group.")
    
  280. 
    
  281.             for member in members:
    
  282.                 print(member.username)
    
  283.         else:
    
  284.             print("There are no members in this group.")
    
  285. 
    
  286. It is optimal because:
    
  287. 
    
  288. #. Since QuerySets are lazy, this does no database queries if
    
  289.    ``display_group_members`` is ``False``.
    
  290. 
    
  291. #. Storing ``group.members.all()`` in the ``members`` variable allows its
    
  292.    result cache to be reused.
    
  293. 
    
  294. #. The line ``if members:`` causes ``QuerySet.__bool__()`` to be called, which
    
  295.    causes the ``group.members.all()`` query to be run on the database. If there
    
  296.    aren't any results, it will return ``False``, otherwise ``True``.
    
  297. 
    
  298. #. The line ``if current_user in members:`` checks if the user is in the result
    
  299.    cache, so no additional database queries are issued.
    
  300. 
    
  301. #. The use of ``len(members)`` calls ``QuerySet.__len__()``, reusing the result
    
  302.    cache, so again, no database queries are issued.
    
  303. 
    
  304. #. The ``for member`` loop iterates over the result cache.
    
  305. 
    
  306. In total, this code does either one or zero database queries. The only
    
  307. deliberate optimization performed is using the ``members`` variable. Using
    
  308. ``QuerySet.exists()`` for the ``if``, ``QuerySet.contains()`` for the ``in``,
    
  309. or ``QuerySet.count()`` for the count would each cause additional queries.
    
  310. 
    
  311. Use ``QuerySet.update()`` and ``delete()``
    
  312. ------------------------------------------
    
  313. 
    
  314. Rather than retrieve a load of objects, set some values, and save them
    
  315. individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
    
  316. <topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
    
  317. <topics-db-queries-delete>` where possible.
    
  318. 
    
  319. Note, however, that these bulk update methods cannot call the ``save()`` or
    
  320. ``delete()`` methods of individual instances, which means that any custom
    
  321. behavior you have added for these methods will not be executed, including
    
  322. anything driven from the normal database object :doc:`signals </ref/signals>`.
    
  323. 
    
  324. Use foreign key values directly
    
  325. -------------------------------
    
  326. 
    
  327. If you only need a foreign key value, use the foreign key value that is already on
    
  328. the object you've got, rather than getting the whole related object and taking
    
  329. its primary key. i.e. do::
    
  330. 
    
  331.    entry.blog_id
    
  332. 
    
  333. instead of::
    
  334. 
    
  335.    entry.blog.id
    
  336. 
    
  337. Don't order results if you don't care
    
  338. -------------------------------------
    
  339. 
    
  340. Ordering is not free; each field to order by is an operation the database must
    
  341. perform. If a model has a default ordering (:attr:`Meta.ordering
    
  342. <django.db.models.Options.ordering>`) and you don't need it, remove
    
  343. it on a ``QuerySet`` by calling
    
  344. :meth:`~django.db.models.query.QuerySet.order_by()` with no parameters.
    
  345. 
    
  346. Adding an index to your database may help to improve ordering performance.
    
  347. 
    
  348. Use bulk methods
    
  349. ================
    
  350. 
    
  351. Use bulk methods to reduce the number of SQL statements.
    
  352. 
    
  353. Create in bulk
    
  354. --------------
    
  355. 
    
  356. When creating objects, where possible, use the
    
  357. :meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
    
  358. number of SQL queries. For example::
    
  359. 
    
  360.     Entry.objects.bulk_create([
    
  361.         Entry(headline='This is a test'),
    
  362.         Entry(headline='This is only a test'),
    
  363.     ])
    
  364. 
    
  365. ...is preferable to::
    
  366. 
    
  367.     Entry.objects.create(headline='This is a test')
    
  368.     Entry.objects.create(headline='This is only a test')
    
  369. 
    
  370. Note that there are a number of :meth:`caveats to this method
    
  371. <django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
    
  372. for your use case.
    
  373. 
    
  374. Update in bulk
    
  375. --------------
    
  376. 
    
  377. When updating objects, where possible, use the
    
  378. :meth:`~django.db.models.query.QuerySet.bulk_update()` method to reduce the
    
  379. number of SQL queries. Given a list or queryset of objects::
    
  380. 
    
  381.     entries = Entry.objects.bulk_create([
    
  382.         Entry(headline='This is a test'),
    
  383.         Entry(headline='This is only a test'),
    
  384.     ])
    
  385. 
    
  386. The following example::
    
  387. 
    
  388.     entries[0].headline = 'This is not a test'
    
  389.     entries[1].headline = 'This is no longer a test'
    
  390.     Entry.objects.bulk_update(entries, ['headline'])
    
  391. 
    
  392. ...is preferable to::
    
  393. 
    
  394.     entries[0].headline = 'This is not a test'
    
  395.     entries[0].save()
    
  396.     entries[1].headline = 'This is no longer a test'
    
  397.     entries[1].save()
    
  398. 
    
  399. Note that there are a number of :meth:`caveats to this method
    
  400. <django.db.models.query.QuerySet.bulk_update>`, so make sure it's appropriate
    
  401. for your use case.
    
  402. 
    
  403. Insert in bulk
    
  404. --------------
    
  405. 
    
  406. When inserting objects into :class:`ManyToManyFields
    
  407. <django.db.models.ManyToManyField>`, use
    
  408. :meth:`~django.db.models.fields.related.RelatedManager.add` with multiple
    
  409. objects to reduce the number of SQL queries. For example::
    
  410. 
    
  411.     my_band.members.add(me, my_friend)
    
  412. 
    
  413. ...is preferable to::
    
  414. 
    
  415.     my_band.members.add(me)
    
  416.     my_band.members.add(my_friend)
    
  417. 
    
  418. ...where ``Bands`` and ``Artists`` have a many-to-many relationship.
    
  419. 
    
  420. When inserting different pairs of objects into
    
  421. :class:`~django.db.models.ManyToManyField` or when the custom
    
  422. :attr:`~django.db.models.ManyToManyField.through` table is defined, use
    
  423. :meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
    
  424. number of SQL queries. For example::
    
  425. 
    
  426.     PizzaToppingRelationship = Pizza.toppings.through
    
  427.     PizzaToppingRelationship.objects.bulk_create([
    
  428.         PizzaToppingRelationship(pizza=my_pizza, topping=pepperoni),
    
  429.         PizzaToppingRelationship(pizza=your_pizza, topping=pepperoni),
    
  430.         PizzaToppingRelationship(pizza=your_pizza, topping=mushroom),
    
  431.     ], ignore_conflicts=True)
    
  432. 
    
  433. ...is preferable to::
    
  434. 
    
  435.     my_pizza.toppings.add(pepperoni)
    
  436.     your_pizza.toppings.add(pepperoni, mushroom)
    
  437. 
    
  438. ...where ``Pizza`` and ``Topping`` have a many-to-many relationship. Note that
    
  439. there are a number of :meth:`caveats to this method
    
  440. <django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
    
  441. for your use case.
    
  442. 
    
  443. Remove in bulk
    
  444. --------------
    
  445. 
    
  446. When removing objects from :class:`ManyToManyFields
    
  447. <django.db.models.ManyToManyField>`, use
    
  448. :meth:`~django.db.models.fields.related.RelatedManager.remove` with multiple
    
  449. objects to reduce the number of SQL queries. For example::
    
  450. 
    
  451.     my_band.members.remove(me, my_friend)
    
  452. 
    
  453. ...is preferable to::
    
  454. 
    
  455.     my_band.members.remove(me)
    
  456.     my_band.members.remove(my_friend)
    
  457. 
    
  458. ...where ``Bands`` and ``Artists`` have a many-to-many relationship.
    
  459. 
    
  460. When removing different pairs of objects from :class:`ManyToManyFields
    
  461. <django.db.models.ManyToManyField>`, use
    
  462. :meth:`~django.db.models.query.QuerySet.delete` on a
    
  463. :class:`~django.db.models.Q` expression with multiple
    
  464. :attr:`~django.db.models.ManyToManyField.through`  model instances to reduce
    
  465. the number of SQL queries. For example::
    
  466. 
    
  467.     from django.db.models import Q
    
  468.     PizzaToppingRelationship = Pizza.toppings.through
    
  469.     PizzaToppingRelationship.objects.filter(
    
  470.         Q(pizza=my_pizza, topping=pepperoni) |
    
  471.         Q(pizza=your_pizza, topping=pepperoni) |
    
  472.         Q(pizza=your_pizza, topping=mushroom)
    
  473.     ).delete()
    
  474. 
    
  475. ...is preferable to::
    
  476. 
    
  477.     my_pizza.toppings.remove(pepperoni)
    
  478.     your_pizza.toppings.remove(pepperoni, mushroom)
    
  479. 
    
  480. ...where ``Pizza`` and ``Topping`` have a many-to-many relationship.