This uses two tables, item_attributes and album_attributes, as key/value
tables for the respective entities. Plugins register fields, at which point
they are magically materialized as properties on Items (Albums are not done).
This currently supports adding and modifying these fields but not retrieving
them (which will need some sort of join).
This message was being logged as an error every time MediaFile failed to parse a
file. But this is not actually an error -- the importer uses FileTypeErrors to
determine whether a file is music or not. This resulted in error logs for every
album art file, .m3u, etc. in the imported directory. Verbose output is a better
home for this message.
A user reported a problem with one of the logging statements where .format()
tried to convert a Unicode string to bytes because the log message was '', not
u''. As a rule, we should ensure that all logging statements use Unicode
literals.
With the new centralized print_obj function, we can greatly simplify the code
for the list command. This necessitated a couple of additional tweaks:
- For performance reasons, print_obj can now take a compiled template. (There's
still an issue with using the default/configured template, but we can cross
that bridge later).
- When listing albums, $path now expands to the album's item dir. So the format
string '$path' now exactly corresponds to passing the -p switch.
As an added bonus, we can now also reduce copypasta in the random plugin (which
behaves almost exactly the same as list).
This has been a long time coming, but we now finally keep track of ReplayGain
values in the database. This is an intermediate step toward a refactoring of the
RG plugin; at the moment, these values are not actually saved!
This should be used in *every* filename conversion, not just the destination
generation. Also required a change to sorted_walk (erroneously didn't use
syspath).
When we store paths in the database, we always use bytestrings for consistency.
But on Windows, these paths are converted back to Unicode before they reach the
FS API. This means that the codec used internally is immaterial.
However, we were naively using sys.getfilesystemencoding() for this internal
representation. On Windows, this is MBCS, a broken encoding that can't represent
all of Unicode. This change replaces that with UTF-8, a "real" codec.
The decoding bit now tries UTF-8 and falls back to MBCS for compatibility with
existing databases. The reality, however, is that existing databases may not
work with this change -- a byte string may represent something different in
UTF-8 from what it represents in MBCS. So users should recreated their DBs if
anything goes wrong.
This is the crux of the matter. This new lock synchronizes accesses to the
database itself, allowing only one thread to have transactions active at a time.
In effect, this makes sure that beets only has one SQLite transaction active at
a time (and is very conservative in this effort; no read sharing is allowed
either). This prevents all contention in SQLite and hides the pathological
HAVE_SLEEP=0 behavior.
We now properly synchronize access to _tx_stacks and _connections, which can be
concurrently mutated by different threads. This way I don't have to worry about
GIL semantics: DRF => SC!
The Library object is now (almost) safe to share across threads. Per-thread
resources are now automatically managed internally. There is no longer any need
for the _reopen_lib hack to get multiple copies of the Library object.
This is the first of several commits that will modernize the beets codebase for
Python 2.6 conventions. (Compatibility with Python 2.5 is hereby abandoned.)
In an attempt to finally address the longstanding SQLite locking issues, I'm
introducing a way to explicitly, lexically scope transactions. The Transaction
class is a context manager that always fully fetches after SELECTs and
automatically commits on exit. No direct access to the library is allowed, so
all changes will eventually be committed and all queries will be completed. This
will also provide a debugging mechanism to show where concurrent transactions
are beginning and ending.
To support composition (transaction reentrancy), an internal, per-Library stack
of transactions is maintained. Commits only happen when the outermost
transaction exits. This means that, while it's possible to introduce atomicity
bugs by invoking Library methods outside of a transaction, you can conveniently
call them *without* a currently-active transaction to get a single atomic
action.
Note that this "transaction stack" concepts assumes a single Library object per
thread. Because we need to duplicate Library objects for concurrent access due
to sqlite3 limitation already, this is fine for now. Later, the interface should
provide one transaction stack per thread for shared Library objects.
Instead of parsing the template at each call to destination(), it's now possible
to parse them *once*, a priori, and re-use the resulting template object. This
is analogous to the re module's compiled expressions.
Previously, all files would be moved/copied/deleted before the corresponding
Items and Albums were added to the database. Now, the in-place items are added
to the database; the files are moved; and then the new paths are saved to the
DB. The apply_choices coroutine now executes two database transactions per task.
This has a couple of benefits:
- %aunique{} requires album structures to be in place before the destination()
call, so this now works as expected.
- As an added bonus, the "in_album" parameter to move() and destination() --
along with its associated ugly hacks -- is no longer required.
For a less cumbersome uniquifying string, only a single field value is now used
instead of a prefix of a list of fields. The old semantics had two problems that
made it both unnecessary and insufficient:
- In the vast majority of cases, a single field suffices (year OR label OR
catalog number, for example) and forcing the string to include many identical
fields is unnecessary.
- If the albums are very similar, a prefix may be insufficient; a better
solution may be found with an arbitrary subset. (Of course, we can't afford to
search the whole power set.)
So we're going with a single field for now. This should cause far less
confusion.
The new fields are:
ALBUM: mb_releasegroupid asin catalognum script language country albumstatus
media albumdisambig
TRACK: disctitle encoder
These are not yet parsed from MusicBrainz responses (just added to MediaFile
and the database).
There's no longer a distinction between Unix and Windows substitutions. Enough
users reported problems with Windows-forbidden characters on Samba shares that
it seems appropriate to make all filenames Windows-safe, even on Unix. Users who
really want those additional characters (<>:"?*|\) can re-enable them via the
"replace" option. Nobody has complained about beets being *too* conservative.
This also adds sanitization of control characters, which is an all-around good
idea, and the substitution now runs in the Unicode (rather than byte) domain.
Previously, there was just an "artist sort name" field -- now there's a
corresponding sort name for both track artists and album artists. I also made
the names shorter (artist_sort and albumartist_sort).
Generates disambiguating strings to distinguish albums from one another. To be
used as the basis for a simpler path field, $unique, as a default disambiguator.
Using a class wrapper allows additional context to be provided to the functions.
Namely, we can now provide the Item itself to the function, so %foo{} could
conceivably do something useful even without arguments. This will be used for
the upcoming %unique function.
This is incorrect when the file was out-of-sync when moved. A possible approach
in the future could check whether the old mtime was up to date and, in that case
only, keep it up to date with the new filename.
Previously, ResultIterators would query the database lazily. Specifically, they
would only fetch a row from the underlying cursor when an Item was pulled from
the iterator. This was a performance optimization. However, it was causing
endless headaches due to SQLite's locking policy: as long as the cursor is
"open", it holds a reader lock. This led to many hard-to-diagnose problems when
trying to acquire a writer lock. This solution may require a little more memory,
but it should put an end to this kind of bug for good.
Only files which were modified after beets checked them will be checked again.
Implements feature request #227
--HG--
extra : transplant_source : K%F1d%C5%B1%1F%CA%AB%95ck%8C%AC%25m%F0%26%E4%9DB
- Inference must be enabled explicitly with the "infer_aa" flag. It does not
happen transparently.
- Infer both artist and artist ID.
- Fixed a bug where only the database row was using the inferred data, not the
returned data structure.
- Added tests.
The default path formats now include both a "default", which is the same as
before but now uses $albumartist instead of $artist, and a "comp" path, which
uses a Compilations directory. Old paths are supported as-is by letting $artist
refer to either a track artist (when present, as it is in all old library
tracks) or album artist (when the track artist isn't present, as is the case
with most albums imported now).
This required the introduction of a track_distance method on plugins. We'll also
need to add an album_distance method as well as a mechanism for extending the
search routine (so we can search for albums in MusicBrainz even when they have
no tags). This commit also adds the '-v' flag for printing debug logs (something
we should do more of).
When computing track destination paths, we now look for album-level values when
they're available. This has the effect of making albums go into a single
directory even when their tracks have heterogeneous metadata. We will need to
revisit this once we start explicitly supporting non-album tracks.
In the end, after all of this, it turns out that we basically need to abandon
the temptation of dealing with unicode paths altogether. The POSIX filesystem
API has no notion of unicode and is very much a bytes-only interface. This
means that undecodable pathnames are a reality we must deal with. This new
approach stores all paths as buffers (blobs) in SQLite and -- as transparently
as possible -- presents them as str objects to the Python code. Legacy
databases will have their paths automatically encoded into str objects, and
will lazily have their unicodes in the database replaced with buffers.
Decoding a path as latin1 when it appears undecodable is a non-solution
because, the next time we want to actually *use* the path, it will be encoded
differently and the file won't be found. Death to undecodable paths!
So. Apparently, os.listdir() will *try* to give you Unicode when you give it
Unicode, but will occasionally give you bytestrings when it can't decode a
filename. Also, I've now had two separate reports from users whose filesystems
report a UTF-8 filesystem encoding but whose files contain latin1 characters.
The choices were to (a) switch over to bytestrings entirely for filenames or
(b) just deal with the badly-encoded filenames. Option (a) is very unattractive
because it requires me to store bytestrings in sqlite (which is not only
complicated but would require more code to deal with legacy databases) and
complicates the construction of pathnames from (Unicode) metadata. Therefore,
I've implemented a static fallback to latin1 if the default pathname decode
fails. Furthermore, if that also fails, the _sorted_walk function just ignores
the badly-encoded file (and logs an error).
This makes way more sense than fetching every metadata request from the
database. The performance of "beet ls -a" and the like should be drastically
better.
As part of this, the BaseLibrary class was also adapted to include a notion of
albums. This is reflected by the new BaseAlbum class, which the Album class
(formerly _AlbumInfo) completely replaces in the concrete Library. The BaseAlbum
class just fetches metadata from the underlying items.