The changes introduced in rc1 caused paths to be syspath-ified before they were
passed to os.path.abspath. The magic prefix caused them to be interpreted as
absolute paths even if they were relative. The fix is, in this *isolated*
case, to use Unicode but prefix-free paths in calls to the os.path.* functions.
Those functions need to act on Unicode objects but seem to be purely syntactic
-- nothing is tripped up by using long filenames without the magic prefix.
This is an alternative to #58 that makes bytestring_path perform more like the
inverse of syspath on Windows. This way, we can convert to syspath, operate on
the path, and then bring back to internal representation without data loss. This
involves looking for the magic prefix on the Unicode string and removing it
before encoding to the internal (UTF-8) representation.
When we store paths in the database, we always use bytestrings for consistency.
But on Windows, these paths are converted back to Unicode before they reach the
FS API. This means that the codec used internally is immaterial.
However, we were naively using sys.getfilesystemencoding() for this internal
representation. On Windows, this is MBCS, a broken encoding that can't represent
all of Unicode. This change replaces that with UTF-8, a "real" codec.
The decoding bit now tries UTF-8 and falls back to MBCS for compatibility with
existing databases. The reality, however, is that existing databases may not
work with this change -- a byte string may represent something different in
UTF-8 from what it represents in MBCS. So users should recreated their DBs if
anything goes wrong.
In an attempt to finally address the longstanding SQLite locking issues, I'm
introducing a way to explicitly, lexically scope transactions. The Transaction
class is a context manager that always fully fetches after SELECTs and
automatically commits on exit. No direct access to the library is allowed, so
all changes will eventually be committed and all queries will be completed. This
will also provide a debugging mechanism to show where concurrent transactions
are beginning and ending.
To support composition (transaction reentrancy), an internal, per-Library stack
of transactions is maintained. Commits only happen when the outermost
transaction exits. This means that, while it's possible to introduce atomicity
bugs by invoking Library methods outside of a transaction, you can conveniently
call them *without* a currently-active transaction to get a single atomic
action.
Note that this "transaction stack" concepts assumes a single Library object per
thread. Because we need to duplicate Library objects for concurrent access due
to sqlite3 limitation already, this is fine for now. Later, the interface should
provide one transaction stack per thread for shared Library objects.
For a less cumbersome uniquifying string, only a single field value is now used
instead of a prefix of a list of fields. The old semantics had two problems that
made it both unnecessary and insufficient:
- In the vast majority of cases, a single field suffices (year OR label OR
catalog number, for example) and forcing the string to include many identical
fields is unnecessary.
- If the albums are very similar, a prefix may be insufficient; a better
solution may be found with an arbitrary subset. (Of course, we can't afford to
search the whole power set.)
So we're going with a single field for now. This should cause far less
confusion.
There's no longer a distinction between Unix and Windows substitutions. Enough
users reported problems with Windows-forbidden characters on Samba shares that
it seems appropriate to make all filenames Windows-safe, even on Unix. Users who
really want those additional characters (<>:"?*|\) can re-enable them via the
"replace" option. Nobody has complained about beets being *too* conservative.
This also adds sanitization of control characters, which is an all-around good
idea, and the substitution now runs in the Unicode (rather than byte) domain.
Generates disambiguating strings to distinguish albums from one another. To be
used as the basis for a simpler path field, $unique, as a default disambiguator.
Previously, an empty argument was treated as "not an argument at all". Now,
every function call always has at least one argument -- i.e., %foo{} is a
function call whose only argument is "" -- and %foo{,bar} is valid syntax.
- Inference must be enabled explicitly with the "infer_aa" flag. It does not
happen transparently.
- Infer both artist and artist ID.
- Fixed a bug where only the database row was using the inferred data, not the
returned data structure.
- Added tests.
The default path formats now include both a "default", which is the same as
before but now uses $albumartist instead of $artist, and a "comp" path, which
uses a Compilations directory. Old paths are supported as-is by letting $artist
refer to either a track artist (when present, as it is in all old library
tracks) or album artist (when the track artist isn't present, as is the case
with most albums imported now).