When we store paths in the database, we always use bytestrings for consistency.
But on Windows, these paths are converted back to Unicode before they reach the
FS API. This means that the codec used internally is immaterial.
However, we were naively using sys.getfilesystemencoding() for this internal
representation. On Windows, this is MBCS, a broken encoding that can't represent
all of Unicode. This change replaces that with UTF-8, a "real" codec.
The decoding bit now tries UTF-8 and falls back to MBCS for compatibility with
existing databases. The reality, however, is that existing databases may not
work with this change -- a byte string may represent something different in
UTF-8 from what it represents in MBCS. So users should recreated their DBs if
anything goes wrong.
The 'decode' call fails in what is already a unicode string. I'm not
sure under what circumstances the string is or isn't unicode (apparently
it varies), so I added a check. The test passes with the patch, at
least.
This allows matches to indicate both missing and unmatched tracks in their
candidates and solves some of the spaghetti tuples that were passed around
during autotagging.
This replaces order_items with assign_items, the first step to allowing unequal
numbers of items on either side of the equation (user files and canonical
tracks). Rather than returning a "holey" list and assuming that the TrackInfo
objects stay static, the function returns a dictionary mapping Item objects to
TrackInfo objects. To indicate unmatched objects, two sets are also returned.
For the moment, some temporary code is included to turn the result from this
new function into the old format (a holey Item list). This allowed me to test
this change in isolation before plunging ahead with the necessary refactoring to
expose all of this to the importer workflow, etc.
This essential import pipeline stage is now two: one that applies metadata
changes and one that manipulates the filesystem. This will eventually allow
lastgenere to apply its changes before destinations are calculated.
In an attempt to finally address the longstanding SQLite locking issues, I'm
introducing a way to explicitly, lexically scope transactions. The Transaction
class is a context manager that always fully fetches after SELECTs and
automatically commits on exit. No direct access to the library is allowed, so
all changes will eventually be committed and all queries will be completed. This
will also provide a debugging mechanism to show where concurrent transactions
are beginning and ending.
To support composition (transaction reentrancy), an internal, per-Library stack
of transactions is maintained. Commits only happen when the outermost
transaction exits. This means that, while it's possible to introduce atomicity
bugs by invoking Library methods outside of a transaction, you can conveniently
call them *without* a currently-active transaction to get a single atomic
action.
Note that this "transaction stack" concepts assumes a single Library object per
thread. Because we need to duplicate Library objects for concurrent access due
to sqlite3 limitation already, this is fine for now. Later, the interface should
provide one transaction stack per thread for shared Library objects.
Instead of parsing the template at each call to destination(), it's now possible
to parse them *once*, a priori, and re-use the resulting template object. This
is analogous to the re module's compiled expressions.
For a less cumbersome uniquifying string, only a single field value is now used
instead of a prefix of a list of fields. The old semantics had two problems that
made it both unnecessary and insufficient:
- In the vast majority of cases, a single field suffices (year OR label OR
catalog number, for example) and forcing the string to include many identical
fields is unnecessary.
- If the albums are very similar, a prefix may be insufficient; a better
solution may be found with an arbitrary subset. (Of course, we can't afford to
search the whole power set.)
So we're going with a single field for now. This should cause far less
confusion.
- Copying and moving are mutually exclusive. Moving overrides copying so the
user only has to add one line ("import_move: true") to disable copying and
enable moving in its place.
- Deleting is only possible when copying.
- Deprecating the "delete" option (moving is almost always better).
- Removed command-line switch for moving. It's somewhat "unsafe", so this
removes some potential for accidental irreversible changes.
- Changelog & thanks.
- Update docs to refer to import_move instead of import_delete as the
correct solution for ending up with only one copy of the file.
The new fields are:
ALBUM: mb_releasegroupid asin catalognum script language country albumstatus
media albumdisambig
TRACK: disctitle encoder
These are not yet parsed from MusicBrainz responses (just added to MediaFile
and the database).