flexattr blog post

2026-02-24 16:23:04 +01:00 · 2013-09-02 18:18:46 -07:00 · 2013-09-02 18:18:46 -07:00 · 701b856874
commit 701b856874
parent ec4ef23733
1 changed files with 80 additions and 0 deletions
--- a/_posts/2013-09-02-flexattr.md
+++ b/_posts/2013-09-02-flexattr.md
@@ -0,0 +1,80 @@
+---
+title: 'flexible attributes, in which schemaless and schemaful coexist peacefully'
+layout: main
+section: blog
+---
+Since [the beginning][first commit], beets has been conceptually built around an object--field data model. Your library is made up of *item* objects, which represent audio files, and each item has a bunch of *fields* corresponding to tags on those files: "artist," "title," and "year," for example.
+
+[first commit]: https://github.com/sampsyo/beets/commit/ee7bb4b9e8932cc186e46e7846a2d0006535c0c5
+
+Over the years, beets has grown to support a long [list of fields][]: by my count, we currently support 59. We've used an informal policy of adding almost every reasonable field that a user requests. But even this liberal approach can be too constricting: adding a new field at least requires getting the attention of a developer and then waiting for another beets release to roll around.
+
+[list of fields]: http://beets.readthedocs.org/en/latest/reference/pathformat.html#available-values
+
+But what if users could add any field they could think of, without needing to rely on changes to beets itself? Decoupling the list of fields from the beets schema would enable awesome new use cases and great new plugins that aren't possible today, and would even simplify the development of some [oft-requested features][attachments].
+
+
+## Flexible Attributes
+
+The next version of beets, 1.3, overhauls the internals to support *flexible attributes* (affectionately known as flexattrs). You're no longer constrained to the fields that beets ships with: you can now store information in *any field name at all*. For example, maybe you'd like to start classifying your music based on its *mood*. Even though beets itself doesn't use a `mood` field, you can set it on your music using the [modify][] command:
+
+    $ beet modify mood=sexy artist:miguel
+
+Then, later, if you want to see all your music whose mood is "sexy," just use the [list][] command:
+
+    $ beet ls mood:sexy
+
+This is exciting enough on its own, since you can classify music based on any criteria you can think of, but it also means many more possibilities for plugins in the future. ([Here](https://github.com/sampsyo/beets/issues/310) [are](https://github.com/sampsyo/beets/pull/101) [a][attachments] [few](https://github.com/sampsyo/beets/issues/266) [ideas](https://github.com/sampsyo/beets/issues/116).)
+
+[attachments]: https://github.com/sampsyo/beets/issues/111
+[modify]: http://beets.readthedocs.org/en/latest/reference/cli.html#modify
+[list]: http://beets.readthedocs.org/en/latest/reference/cli.html#list
+
+
+## How it Works
+
+I procrastinated for a long time in implementing flexible attributes because a clean, maintainable implementation was surprisingly tricky to get right. (This post is going to get a little nerdy from here on out; if you're interested in the technical details, read on.)
+
+As background, beets libraries are stored in [SQLite][] databases. The schema is simple: there's an `items` table and an `albums` table, and each field is a column on these tables. To support flexattrs, I initially considered switching to a schema-free database: [abusing SQLite as a key--value store][sqlite3dbm], [serializing the whole database to JSON][jsonshelve], [LevelDB][], or the like. With the help of other developers, however, I eventually realized that we could tack on flexible attributes alongside the existing *fixed* attributes and minimize the disruption.
+
+[SQLite]: http://www.sqlite.org/
+
+Here's what it looks like from a developer's perspective. If you write:
+
+    item.artist = 'Rhye'
+
+beets updates the fixed `artist` attribute, which is backed by an SQLite column. If you make up your own field name:
+
+    item.mood = 'ethereal'
+
+then beets detects that you're defining a flexattr and updates a separate key--value table instead. The idea here is that code that accesses fixed or flexible attributes should look identical. That way, we can prototype code using flexible attributes and eventually promote the attribute to a built-in field with zero changes to the client code. Three cheers for sound abstractions!
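+
+As a rough illustration of that dispatch (a minimal sketch, not the actual beets code), here's a toy `Item` class. The table name `item_attributes`, its columns, the `FIXED_FIELDS` tuple, and the class itself are all assumptions made up for this example:
+
```python
import sqlite3

# Hypothetical schema: fixed fields are columns, flexattrs live in a side table.
SCHEMA = """
CREATE TABLE items (id INTEGER PRIMARY KEY, artist TEXT);
CREATE TABLE item_attributes (
    entity_id INTEGER,
    key TEXT,
    value TEXT,
    UNIQUE (entity_id, key) ON CONFLICT REPLACE
);
"""
FIXED_FIELDS = ('artist',)


class Item(object):
    """Toy item: fixed and flexible attributes look identical to callers."""

    def __init__(self, conn, item_id):
        # Use object.__setattr__ so bookkeeping bypasses the field logic below.
        object.__setattr__(self, '_conn', conn)
        object.__setattr__(self, '_id', item_id)

    def __setattr__(self, key, value):
        if key in FIXED_FIELDS:
            # The column name comes from the FIXED_FIELDS whitelist,
            # never from user input, so formatting it in is safe here.
            self._conn.execute(
                'UPDATE items SET {0} = ? WHERE id = ?'.format(key),
                (value, self._id))
        else:
            # Unknown field name: store it as a key-value row instead.
            self._conn.execute(
                'INSERT INTO item_attributes (entity_id, key, value) '
                'VALUES (?, ?, ?)', (self._id, key, value))

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails, i.e. for fields.
        if key.startswith('_'):
            raise AttributeError(key)
        if key in FIXED_FIELDS:
            sql = 'SELECT {0} FROM items WHERE id = ?'.format(key)
            params = (self._id,)
        else:
            sql = ('SELECT value FROM item_attributes '
                   'WHERE entity_id = ? AND key = ?')
            params = (self._id, key)
        row = self._conn.execute(sql, params).fetchone()
        if row is None:
            raise AttributeError(key)
        return row[0]


conn = sqlite3.connect(':memory:')
conn.executescript(SCHEMA)
conn.execute('INSERT INTO items (id) VALUES (1)')

item = Item(conn, 1)
item.artist = 'Rhye'      # fixed: updates the items.artist column
item.mood = 'ethereal'    # flexible: inserts a key-value row
```
+
+The calling code on the last two lines can't tell which kind of attribute it's touching, which is the whole point.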
+
+### Queries, Fast and Slow
+
+Things get slightly more complicated when querying. For example, to query the database for songs by [Miguel][], beets constructs a SQL query like:
+
+    SELECT * FROM items WHERE artist = 'Miguel';
+
+This query can be fast because it's performed entirely by the native SQLite library; no Python is involved. But to match on a flexible attribute, we'd need to generate a complex query with a JOIN on the flexattr table. This gets especially hairy when wildcard-matching on any field or using a special query type like [regex queries][regex] or [fuzzy queries][fuzzy]. To avoid this complexity, beets instead evaluates queries involving flexattrs *in Python*. It fetches every item in the database and "manually" checks the query against each row. This is probably slower than the SQLite query used for purely fixed-attribute queries, but I suspect it's not much slower, if at all, than the complex JOIN on the flexattr table that would otherwise be required.
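+
+To make the trade-off concrete, here's a small sketch of both options against a toy database. The `item_attributes` table and the `load_items` helper are hypothetical names invented for this example; the real schema may differ:
+
```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, artist TEXT);
CREATE TABLE item_attributes (entity_id INTEGER, key TEXT, value TEXT);
INSERT INTO items VALUES (1, 'Miguel');
INSERT INTO items VALUES (2, 'Rhye');
INSERT INTO item_attributes VALUES (1, 'mood', 'sexy');
INSERT INTO item_attributes VALUES (2, 'mood', 'ethereal');
""")

# Option 1: push the flexattr match into SQLite with a JOIN.
join_ids = [row[0] for row in conn.execute(
    'SELECT items.id FROM items '
    'JOIN item_attributes ON item_attributes.entity_id = items.id '
    'WHERE item_attributes.key = ? AND item_attributes.value = ? '
    'ORDER BY items.id', ('mood', 'sexy'))]


def load_items(conn):
    """Fetch every item and merge in its flexattrs as plain dict keys."""
    items = {}
    for item_id, artist in conn.execute('SELECT id, artist FROM items'):
        items[item_id] = {'id': item_id, 'artist': artist}
    for entity_id, key, value in conn.execute(
            'SELECT entity_id, key, value FROM item_attributes'):
        items[entity_id][key] = value
    return [items[item_id] for item_id in sorted(items)]


# Option 2: fetch everything and check each item in Python.
python_ids = [item['id'] for item in load_items(conn)
              if item.get('mood') == 'sexy']
```
+
+Both options select the same item here. The JOIN is manageable for a single exact match like this one, but every additional flexattr term needs its own join, which is where the generated SQL starts to get unwieldy.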
+
+To support flexattr queries, beets now classifies all queries into *fast* and *slow* queries. As always, `Query` objects have a `clause()` method that returns the SQLite `WHERE` expression that evaluates them and a `match(item)` method that returns a boolean indicating whether a particular item passes the query. With the flexattr changes, the `clause()` method can now return `None`, indicating that the query is *slow* and cannot be evaluated in SQLite. The library then falls back to fetching every item and calling `match(item)` on each one.
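+
+Here's a minimal sketch of that fallback. Only the `clause()` and `match(item)` method names come from the description above; the `FieldQuery` class, the `fetch` and `load_item` helpers, the dict-based items, and the convention that a fast clause is an (expression, parameters) pair are assumptions for illustration:
+
```python
import sqlite3


class FieldQuery(object):
    """Matches one field exactly. Items are plain dicts in this sketch."""

    def __init__(self, field, value, fixed):
        self.field = field
        self.value = value
        self.fixed = fixed  # True if the field is backed by a real column

    def clause(self):
        if not self.fixed:
            return None  # slow: SQLite cannot evaluate this query
        # Assumption: a fast clause is a (WHERE expression, parameters) pair.
        return '{0} = ?'.format(self.field), (self.value,)

    def match(self, item):
        return item.get(self.field) == self.value


def load_item(conn, item_id, artist):
    """Build an item dict from its fixed column plus its flexattr rows."""
    item = {'id': item_id, 'artist': artist}
    item.update(conn.execute(
        'SELECT key, value FROM item_attributes WHERE entity_id = ?',
        (item_id,)).fetchall())
    return item


def fetch(conn, query):
    clause = query.clause()
    if clause is None:
        # Slow path: load every item, then test each one in Python.
        rows = conn.execute(
            'SELECT id, artist FROM items ORDER BY id').fetchall()
        items = [load_item(conn, *row) for row in rows]
        return [item for item in items if query.match(item)]
    # Fast path: let SQLite do the filtering.
    where, params = clause
    rows = conn.execute(
        'SELECT id, artist FROM items WHERE {0} ORDER BY id'.format(where),
        params).fetchall()
    return [load_item(conn, *row) for row in rows]


conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, artist TEXT);
CREATE TABLE item_attributes (entity_id INTEGER, key TEXT, value TEXT);
INSERT INTO items VALUES (1, 'Miguel');
INSERT INTO items VALUES (2, 'Rhye');
INSERT INTO item_attributes VALUES (2, 'mood', 'ethereal');
""")

fast_result = fetch(conn, FieldQuery('artist', 'Miguel', fixed=True))
slow_result = fetch(conn, FieldQuery('mood', 'ethereal', fixed=False))
```
+
+The caller of `fetch` never needs to know which path ran; it just gets matching items back.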
+
+This fast/slow distinction also had the knock-on effect of simplifying the way beets handles "special" queries like [regexes][regex] and [fuzzy matching][fuzzy]. These query types can't easily be written in SQLite, so they're implemented as Python predicates. Previously, to shoehorn everything into a SQLite query, we had to [*register* a Python function as a callback][create_function] and then name the function in the `clause()` result. Crossing the Python/C barrier is not great for performance, so sticking to SQLite queries here was likely not buying us much. Now, any query involving a complex predicate implemented in Python is just evaluated using the slow path---eliminating the need for callback registration.
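+
+For comparison, here's roughly what the two styles look like with Python's standard `sqlite3` module. This is a sketch rather than the code beets used: the `regexp` helper and the toy table are made up for the example, though `create_function` is the real `sqlite3.Connection` method linked above.
+
```python
import re
import sqlite3


def regexp(pattern, value):
    """Python predicate: does the regex match anywhere in the value?"""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, artist TEXT);
INSERT INTO items VALUES (1, 'Miguel');
INSERT INTO items VALUES (2, 'Rhye');
""")

# Callback style: register the predicate so a SQL clause can name it.
# SQLite evaluates `X REGEXP Y` by calling the user function regexp(Y, X),
# so the pattern arrives as the first argument.
conn.create_function('REGEXP', 2, regexp)
callback_artists = [row[0] for row in conn.execute(
    'SELECT artist FROM items WHERE artist REGEXP ? ORDER BY id',
    ('^Mig',))]

# Slow-path style: no registration; run the same predicate over every row.
slow_artists = [row[0] for row in conn.execute(
    'SELECT artist FROM items ORDER BY id') if regexp('^Mig', row[0])]
```
+
+Either way, the Python function runs once per row. The slow path just skips the registration step and the round trip through SQLite's C API.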
+
+### Next Steps
+
+I'm happy with the overall design for flexible attributes, but there's a laundry list of things we can improve over time:
+
+* Ideally, it should be easy to "promote" a flexible attribute to a built-in fixed attribute. This almost works already, but we'll need a way to transparently move the data from the flexattr tables to the main `items` and `albums` tables. We can cross this bridge when some field eventually needs migration.
+* A query that matches on both a fixed attribute and a flexible attribute, like ``artist:miguel mood:sexy``, is currently classified as "slow"---any slow component makes the whole query slow. This could be optimized by matching the fast component in SQLite and then filtering on the slow component in Python. In general, we need to measure the performance of this feature to see whether optimizations like this are necessary.
+* I'm not ruling out an eventual move to a completely schema-free design, perhaps using a simple key--value store like [LevelDB][] or a putative [SQLite4][sqlite4]. Eliminating the distinction between the two types of attributes will pay off in simplicity.
+* Beets needs a system for specifying value types. Currently, we have an ad-hoc data type classification that, for example, knows to print bitrates in kbps and track numbers with zero padding. We need a more principled way to specify conversions to and from strings. At the same time, we should let plugins specify this type information for attributes they define.
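+
+The mixed-query optimization from the second bullet could look something like the sketch below. To be clear, this is a possible future direction and not something beets does today; `fetch_mixed`, the `item_attributes` table, and the exact-match comparisons (real beets queries like `artist:miguel` are looser than `=`) are all assumptions for illustration:
+
```python
import sqlite3


def fetch_mixed(conn, fast_clauses, slow_predicates):
    """Filter in SQLite first, then apply Python predicates to survivors.

    fast_clauses: list of (WHERE expression, parameters) pairs.
    slow_predicates: list of callables taking an item dict.
    """
    # With no fast clauses, fall back to a WHERE that matches every row.
    where = ' AND '.join(expr for expr, _ in fast_clauses) or '1'
    params = [p for _, clause_params in fast_clauses for p in clause_params]
    rows = conn.execute(
        'SELECT id, artist FROM items WHERE {0} ORDER BY id'.format(where),
        params).fetchall()

    matched = []
    for item_id, artist in rows:
        # Only items that passed the fast clauses pay for a flexattr lookup.
        item = {'id': item_id, 'artist': artist}
        item.update(conn.execute(
            'SELECT key, value FROM item_attributes WHERE entity_id = ?',
            (item_id,)).fetchall())
        if all(predicate(item) for predicate in slow_predicates):
            matched.append(item)
    return matched


conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, artist TEXT);
CREATE TABLE item_attributes (entity_id INTEGER, key TEXT, value TEXT);
INSERT INTO items VALUES (1, 'Miguel');
INSERT INTO items VALUES (2, 'Rhye');
INSERT INTO items VALUES (3, 'Miguel');
INSERT INTO item_attributes VALUES (1, 'mood', 'sexy');
INSERT INTO item_attributes VALUES (2, 'mood', 'sexy');
INSERT INTO item_attributes VALUES (3, 'mood', 'ethereal');
""")

result = fetch_mixed(
    conn,
    fast_clauses=[('artist = ?', ('Miguel',))],
    slow_predicates=[lambda item: item.get('mood') == 'sexy'])
```
+
+Whether this is worth the extra code depends on measurements we haven't taken yet.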
+
+[regex]: http://beets.readthedocs.org/en/latest/reference/query.html#regular-expressions
+[fuzzy]: http://beets.readthedocs.org/page/plugins/fuzzy.html
+[SQLite4]: http://sqlite.org/src4/doc/trunk/www/design.wiki
+[Miguel]: http://www.officialmiguel.com/home
+[LevelDB]: http://code.google.com/p/leveldb/
+[sqlite3dbm]: https://github.com/Yelp/sqlite3dbm/
+[jsonshelve]: https://github.com/sampsyo/jsonshelve
+[create_function]: http://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.create_function