dbcore.Results: Avoid duplicate construction

Iterating over a result set multiple times should not pay the full
construction cost on every pass. We now keep the materialized objects around
and reuse them on subsequent iterations. This solves a performance problem in
the `play` plugin, which calls len(results) multiple times and therefore took
an unnecessary performance hit when the query was slow.
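
A minimal sketch of the caching idea described above, not the actual beets
code: `CachedResults`, `make_model`, and `calls` are hypothetical names used
only for illustration. It assumes that length is computed by iterating, so a
second len() call only replays the cached objects.

```python
class CachedResults:
    """Hypothetical stand-in for a lazily materialized result set."""

    def __init__(self, rows, make_model):
        self._make_model = make_model  # Builds one object from one row.
        self._objects = []             # Objects materialized so far.
        self._row_iter = iter(rows)    # Rows not yet materialized.

    def __iter__(self):
        # Replay the objects built by earlier iterations.
        for obj in self._objects:
            yield obj
        # Materialize the remaining rows, caching each new object.
        for row in self._row_iter:
            obj = self._make_model(row)
            self._objects.append(obj)
            yield obj

    def __len__(self):
        # Counting requires a full pass, but only the first pass
        # constructs anything.
        return sum(1 for _ in self)


calls = []  # Records every construction so the saving is visible.


def make_model(row):
    calls.append(row)
    return {"id": row}


results = CachedResults([1, 2, 3], make_model)
first = len(results)   # Constructs all three objects.
second = len(results)  # Served entirely from the cache.
```

After both len() calls, `calls` holds one entry per row rather than one per
row per call. Because the row iterator is shared, an iteration that stops
early leaves the remaining rows for the next pass to materialize.
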
Adrian Sampson 2014-10-11 12:08:27 -07:00
parent a2ce367c64
commit ea94ce5eef
2 changed files with 30 additions and 9 deletions

@ -459,14 +459,18 @@ class Results(object):
"""
def __init__(self, model_class, rows, db, query=None, sort=None):
"""Create a result set that will construct objects of type
`model_class`, which should be a subclass of `LibModel`, out of
the query result mapping in `rows`. The new objects are
associated with the database `db`.
If `query` is provided, it is used as a predicate to filter the results
for a "slow query" that cannot be evaluated by the database directly.
If `sort` is provided, it is used to sort the full list of results
before returning. This means it is a "slow sort" and all objects must
be built before returning the first one.
`model_class`.
`model_class` is a subclass of `LibModel` that will be
constructed. `rows` is a query result: a list of mappings. The
new objects will be associated with the database `db`.
If `query` is provided, it is used as a predicate to filter the
results for a "slow query" that cannot be evaluated by the
database directly. If `sort` is provided, it is used to sort the
full list of results before returning. This means it is a "slow
sort" and all objects must be built before returning the first
one.
"""
self.model_class = model_class
self.rows = rows
@ -474,16 +478,31 @@ class Results(object):
self.query = query
self.sort = sort
self._objects = [] # Model objects materialized *so far*.
self._row_iter = iter(self.rows) # Indicate next row to materialize.
def _get_objects(self):
"""Construct and generate Model objects for the query. The
objects are returned in the order emitted from the database; no
slow sort is applied.
For performance, this generator caches materialized objects to
avoid constructing them more than once. This way, iterating over
a `Results` object a second time should be much faster than the
first.
"""
for row in self.rows:
# Get the previously-materialized objects.
for object in self._objects:
yield object
# Now, for the rows that have not yet been processed, materialize
# objects and add them to the list.
for row in self._row_iter:
obj = self._make_model(row)
# If there is a slow-query predicate, ensure that the
# object passes it.
if not self.query or self.query.match(obj):
self._objects.append(obj)
yield obj
def __iter__(self):

@ -28,6 +28,8 @@ Fixes:
quantities (track numbers and durations), which was often confusing.
* Date-based queries that are malformed (not parse-able) no longer crash
beets and instead fail silently.
* Slow queries, such as those over flexible attributes, should now be much
faster when used with certain commands---notably, the :doc:`/plugins/play`.
1.3.8 (September 17, 2014)