The changes introduced in rc1 caused paths to be syspath-ified before they were
passed to os.path.abspath. The magic prefix caused them to be interpreted as
absolute paths even if they were relative. The fix is, in this *isolated*
case, to use Unicode but prefix-free paths in calls to the os.path.* functions.
Those functions need to act on Unicode objects but seem to be purely syntactic
-- nothing is tripped up by using long filenames without the magic prefix.
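A minimal sketch of the fix, assuming Windows-style paths are manipulated through the standard `ntpath` module (so the example also runs on non-Windows systems). `strip_magic_prefix` and `windows_abspath` are hypothetical helpers, not beets' actual functions. The explicit `cwd` argument stands in for the process's working directory.

```python
import ntpath

# The literal characters \\?\ (the Windows long-path "magic prefix").
WINDOWS_MAGIC_PREFIX = "\\\\?\\"


def strip_magic_prefix(upath: str) -> str:
    """Return the Unicode path without the long-path prefix, if present."""
    if upath.startswith(WINDOWS_MAGIC_PREFIX):
        return upath[len(WINDOWS_MAGIC_PREFIX):]
    return upath


def windows_abspath(upath: str, cwd: str) -> str:
    """Make upath absolute using only syntactic ntpath operations.

    The prefix is removed first: with it attached, ntpath treats even a
    relative path as absolute, which is the bug described above.
    """
    plain = strip_magic_prefix(upath)
    if ntpath.isabs(plain):
        return ntpath.normpath(plain)
    return ntpath.normpath(ntpath.join(cwd, plain))
```

For example, `ntpath.isabs("\\\\?\\music\\a.mp3")` is true even though `music\a.mp3` is relative, while `windows_abspath` resolves it against `cwd` as intended.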
These tests were written when I knew almost nothing about Python and even less
about unittest. The class-generating magic never worked with nose for a crazy
reason I won't get into here. This has a bit more copypasta but the workings
are more obvious and we no longer generate enormous numbers of independent
tests. There should be a more representative number of dots in the test runner
output now.
Fixed a number of issues with the changes to fetchart:
- Remove redundant fetches. This was making the Amazon source download every
image twice even when art resizing was not enabled!
- Restore local_only switch in plugin hook, which got lost in the shuffle at
some point.
- Don't replace the original image file in-place; use a temporary file instead.
This would clobber the original source image on the filesystem with the
downscaled version!
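The temporary-file approach from the last item can be sketched like this. `resize_to_temp` is a hypothetical helper, and `resize(src, dst)` is an assumed callback that reads `src` and writes the downscaled image to `dst`; the point is only that the source file is never opened for writing.

```python
import os
import tempfile
from typing import Callable


def resize_to_temp(source: str, resize: Callable[[str, str], None]) -> str:
    """Write a downscaled copy of `source` to a fresh temporary file.

    The original image is left untouched; the caller gets the temp path.
    """
    _, ext = os.path.splitext(source)
    fd, dest = tempfile.mkstemp(suffix=ext)
    os.close(fd)  # the callback opens dest itself
    try:
        resize(source, dest)
    except Exception:
        os.remove(dest)  # don't leave a half-written temp file behind
        raise
    return dest
```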
This is an alternative to #58 that makes bytestring_path perform more like the
inverse of syspath on Windows. This way, we can convert to syspath, operate on
the path, and then bring back to internal representation without data loss. This
involves looking for the magic prefix on the Unicode string and removing it
before encoding to the internal (UTF-8) representation.
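The round trip can be illustrated with simplified stand-ins for the two functions named above. This is a sketch, not beets' implementation: it assumes Windows behavior unconditionally and UTF-8 as the internal codec.

```python
# The literal characters \\?\ (Windows long-path prefix).
WINDOWS_MAGIC_PREFIX = "\\\\?\\"


def syspath(path: bytes) -> str:
    """Internal UTF-8 bytes -> prefixed Unicode path for the Windows FS API."""
    upath = path.decode("utf-8")
    if not upath.startswith(WINDOWS_MAGIC_PREFIX):
        upath = WINDOWS_MAGIC_PREFIX + upath
    return upath


def bytestring_path(path) -> bytes:
    """Inverse of syspath: drop the magic prefix, then encode as UTF-8."""
    if isinstance(path, bytes):
        return path  # already in the internal representation
    if path.startswith(WINDOWS_MAGIC_PREFIX):
        path = path[len(WINDOWS_MAGIC_PREFIX):]
    return path.encode("utf-8")
```

Because the prefix is removed before encoding, `bytestring_path(syspath(p)) == p` holds for any UTF-8 path `p`.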
This has been a long time coming, but we now finally keep track of ReplayGain
values in the database. This is an intermediate step toward a refactoring of the
RG plugin; at the moment, these values are not actually saved!
This is fixed by allowing MediaFiles to convert strings to integers on
assignment. An eventual complete fix will perform these type conversions in the
Item interface.
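One way to convert strings to integers on assignment is a data descriptor. `IntField` and `ExampleMediaFile` are hypothetical names for illustration; treating `None` and the empty string as 0 is an assumption of this sketch.

```python
class IntField:
    """Descriptor that coerces assigned values to int (hypothetical)."""

    def __set_name__(self, owner, name):
        self.attr = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.attr, 0)  # unset fields read as 0

    def __set__(self, obj, value):
        if value is None or value == "":
            value = 0
        # int() accepts both ints and numeric strings such as "7".
        setattr(obj, self.attr, int(value))


class ExampleMediaFile:
    track = IntField()
    disc = IntField()
```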
When we store paths in the database, we always use bytestrings for consistency.
But on Windows, these paths are converted back to Unicode before they reach the
FS API. This means that the codec used internally is immaterial.
However, we were naively using sys.getfilesystemencoding() for this internal
representation. On Windows, this is MBCS, a broken encoding that can't represent
all of Unicode. This change replaces that with UTF-8, a "real" codec.
The decoding bit now tries UTF-8 and falls back to MBCS for compatibility with
existing databases. The reality, however, is that existing databases may not
work with this change -- a byte string may represent something different in
UTF-8 from what it represents in MBCS. So users should recreate their DBs if
anything goes wrong.
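A minimal sketch of the encode/decode policy described above. The helper names are hypothetical. Note that Python's `"mbcs"` codec exists only on Windows, so the fallback codec is a parameter here.

```python
def encode_db_path(upath: str) -> bytes:
    """Internal representation: always UTF-8, never the filesystem codec."""
    return upath.encode("utf-8")


def decode_db_path(raw: bytes, legacy_codec: str = "mbcs") -> str:
    """Try UTF-8 first; fall back to the legacy codec for old databases.

    Bytes that happen to be valid UTF-8 are never reinterpreted, which is
    why some legacy paths may decode differently than before.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(legacy_codec)
```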
The 'decode' call fails on what is already a unicode string. I'm not
sure under what circumstances the string is or isn't unicode (apparently
it varies), so I added a check. The test passes with the patch, at
least.
This allows matches to indicate both missing and unmatched tracks in their
candidates and solves some of the spaghetti tuples that were passed around
during autotagging.
This replaces order_items with assign_items, the first step to allowing unequal
numbers of items on either side of the equation (user files and canonical
tracks). Rather than returning a "holey" list and assuming that the TrackInfo
objects stay static, the function returns a dictionary mapping Item objects to
TrackInfo objects. To indicate unmatched objects, two sets are also returned.
For the moment, some temporary code is included to turn the result from this
new function into the old format (a holey Item list). This allowed me to test
this change in isolation before plunging ahead with the necessary refactoring to
expose all of this to the importer workflow, etc.
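The new return shape (a mapping plus two sets of leftovers) can be illustrated with a brute-force sketch. This `assign_items` is a hypothetical re-creation that searches every permutation, so it is only practical for small inputs; a real implementation would use a polynomial-time assignment solver such as the Munkres (Hungarian) algorithm. Items and tracks must be unique, hashable values.

```python
from itertools import permutations
from typing import Callable, Dict, Hashable, List, Set, Tuple


def assign_items(
    items: List[Hashable],
    tracks: List[Hashable],
    cost: Callable[[Hashable, Hashable], float],
) -> Tuple[Dict[Hashable, Hashable], Set[Hashable], Set[Hashable]]:
    """Pair items with tracks at minimum total cost.

    Returns (mapping, extra_items, extra_tracks).
    """
    if len(items) <= len(tracks):
        # Each item gets a distinct track; some tracks may stay unmatched.
        candidates = (
            list(zip(range(len(items)), perm))
            for perm in permutations(range(len(tracks)), len(items))
        )
    else:
        # Each track gets a distinct item; some items may stay unmatched.
        candidates = (
            list(zip(perm, range(len(tracks))))
            for perm in permutations(range(len(items)), len(tracks))
        )
    best: Dict[Hashable, Hashable] = {}
    best_cost = float("inf")
    for pairs in candidates:
        total = sum(cost(items[i], tracks[j]) for i, j in pairs)
        if total < best_cost:
            best_cost = total
            best = {items[i]: tracks[j] for i, j in pairs}
    extra_items = set(items) - set(best)
    extra_tracks = set(tracks) - set(best.values())
    return best, extra_items, extra_tracks
```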
This essential import pipeline stage is now split into two: one that applies
metadata changes and one that manipulates the filesystem. This will eventually
allow lastgenre to apply its changes before destinations are calculated.
In an attempt to finally address the longstanding SQLite locking issues, I'm
introducing a way to explicitly, lexically scope transactions. The Transaction
class is a context manager that always fully fetches after SELECTs and
automatically commits on exit. No direct access to the library is allowed, so
all changes will eventually be committed and all queries will be completed. This
will also provide a debugging mechanism to show where concurrent transactions
are beginning and ending.
To support composition (transaction reentrancy), an internal, per-Library stack
of transactions is maintained. Commits only happen when the outermost
transaction exits. This means that, while it's possible to introduce atomicity
bugs by invoking Library methods outside of a transaction, you can conveniently
call them *without* a currently-active transaction to get a single atomic
action.
Note that this "transaction stack" concept assumes a single Library object per
thread. Because we already need to duplicate Library objects for concurrent
access due to sqlite3 limitations, this is fine for now. Later, the interface should
provide one transaction stack per thread for shared Library objects.
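A minimal sketch of a lexically scoped, reentrant transaction, assuming Python's standard `sqlite3` module and a toy `items` table. The names `Library.transaction`, `query`, and `mutate` are illustrative, not beets' actual API. Following the description, only the outermost exit commits, and it commits even if the block raised; a production version might prefer to roll back in that case.

```python
import sqlite3


class Library:
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT)"
        )
        self.conn.commit()
        self._tx_stack = []  # one stack per Library, hence one Library per thread

    def transaction(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    """Context manager; nested transactions share the outermost commit."""

    def __init__(self, lib: Library):
        self.lib = lib

    def __enter__(self) -> "Transaction":
        self.lib._tx_stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.lib._tx_stack.pop()
        if not self.lib._tx_stack:
            self.lib.conn.commit()  # outermost exit: commit everything
        return False  # never swallow exceptions

    def query(self, sql: str, params=()) -> list:
        # Fetch all rows immediately so no cursor is left half-read.
        return self.lib.conn.execute(sql, params).fetchall()

    def mutate(self, sql: str, params=()) -> int:
        return self.lib.conn.execute(sql, params).lastrowid
```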
Instead of parsing the template at each call to destination(), it's now possible
to parse them *once*, a priori, and re-use the resulting template object. This
is analogous to the re module's compiled expressions.
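The parse-once idea can be shown with a toy template class. `CompiledTemplate` is hypothetical and only understands plain `$field` references (no `${...}` or `%func{...}` syntax); parsing happens once in the constructor, and each `substitute` call just walks the pre-split parts.

```python
import re

_FIELD_RE = re.compile(r"\$(\w+)")


class CompiledTemplate:
    def __init__(self, fmt: str):
        # Parse once: re.split with a capturing group alternates
        # literal text (even indices) and field names (odd indices).
        self.parts = _FIELD_RE.split(fmt)

    def substitute(self, values: dict) -> str:
        return "".join(
            values[part] if i % 2 else part for i, part in enumerate(self.parts)
        )
```

Usage: build `CompiledTemplate("$albumartist/$album/$title")` once, then call `substitute` for every item.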
For a less cumbersome uniquifying string, only a single field value is now used
instead of a prefix of a list of fields. The old semantics had two problems that
made it both unnecessary and insufficient:
- In the vast majority of cases, a single field suffices (year OR label OR
catalog number, for example) and forcing the string to include many identical
fields is unnecessary.
- If the albums are very similar, a prefix may be insufficient; a better
solution may be found with an arbitrary subset. (Of course, we can't afford to
search the whole power set.)
So we're going with a single field for now. This should cause far less
confusion.
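A sketch of the single-field rule: pick the first field whose value sets this album apart from every similar album. The function name, the dict-based albums, and the default field order are assumptions for illustration.

```python
from typing import Dict, List, Optional, Sequence


def unique_field_value(
    album: Dict,
    others: List[Dict],
    fields: Sequence[str] = ("year", "label", "catalognum"),
) -> Optional[str]:
    """Return the first field value that no other album shares, else None."""
    for field in fields:
        value = album.get(field)
        if value in (None, ""):
            continue  # an empty field can't disambiguate anything
        if all(other.get(field) != value for other in others):
            return str(value)
    return None
```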
- Copying and moving are mutually exclusive. Moving overrides copying so the
user only has to add one line ("import_move: true") to disable copying and
enable moving in its place.
- Deleting is only possible when copying.
- Deprecating the "delete" option (moving is almost always better).
- Removed command-line switch for moving. It's somewhat "unsafe", so this
removes some potential for accidental irreversible changes.
- Changelog & thanks.
- Update docs to refer to import_move instead of import_delete as the
correct solution for ending up with only one copy of the file.
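The precedence rules in the list above can be summarized as a small decision function. The function name and the returned strings are hypothetical; only the precedence (move over copy, delete only alongside copy) comes from the text.

```python
def import_operation(copy: bool, move: bool, delete: bool) -> str:
    """Resolve the import_* options into a single file operation (sketch)."""
    if move:
        return "move"  # moving overrides copying
    if copy:
        return "copy+delete" if delete else "copy"
    return "leave"  # delete has no effect without copying
```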
The new fields are:
ALBUM: mb_releasegroupid asin catalognum script language country albumstatus
media albumdisambig
TRACK: disctitle encoder
These are not yet parsed from MusicBrainz responses (just added to MediaFile
and the database).
There's no longer a distinction between Unix and Windows substitutions. Enough
users reported problems with Windows-forbidden characters on Samba shares that
it seems appropriate to make all filenames Windows-safe, even on Unix. Users who
really want those additional characters (<>:"?*|\) can re-enable them via the
"replace" option. Nobody has complained about beets being *too* conservative.
This also adds sanitization of control characters, which is an all-around good
idea, and the substitution now runs in the Unicode (rather than byte) domain.
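A sketch of Unicode-domain sanitization for a single path component. The default rule list is hypothetical, modeled on the characters named above (plus `/` as the other separator); the `replace` parameter stands in for the user-configurable option.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical defaults; each rule is (pattern, replacement).
DEFAULT_REPLACE: List[Tuple[Pattern, str]] = [
    (re.compile(r"[\x00-\x1f]"), "_"),   # control characters
    (re.compile(r'[<>:"?*|\\/]'), "_"),  # Windows-forbidden chars and separators
]


def sanitize_component(name: str, replace=DEFAULT_REPLACE) -> str:
    """Apply each substitution in order to a Unicode path component."""
    for pattern, repl in replace:
        name = pattern.sub(repl, name)
    return name
```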
Previously, there was just an "artist sort name" field -- now there's a
corresponding sort name for both track artists and album artists. I also made
the names shorter (artist_sort and albumartist_sort).
Generates disambiguating strings to distinguish albums from one another. To be
used as the basis for a simpler path field, $unique, as a default disambiguator.
Based on the "remove_duplicates" flag on ImportTask, the apply_choices coroutine
now looks for duplicates (using an extended version of the _duplicate_check
functions) and removes items from the library. It also *deletes* files
associated with those items when they are located inside the beets library
directory. Files outside of the directory are left on disk (but their DB entry
is still removed). This should "do the right thing" in most cases -- again, this
is something we can add a config option for if it comes up.
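The "inside the library directory" check is easy to get wrong with a plain string-prefix test (`/music/libx` starts with `/music/lib`). A sketch that compares whole path components instead; the function names and the callbacks are hypothetical.

```python
import os


def should_delete_file(path: str, library_dir: str) -> bool:
    """True if path lies inside library_dir, comparing whole components."""
    path = os.path.abspath(path)
    library_dir = os.path.abspath(library_dir)
    try:
        return os.path.commonpath([path, library_dir]) == library_dir
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def remove_duplicates(dup_paths, library_dir, remove_from_db, delete_file):
    for p in dup_paths:
        remove_from_db(p)  # the DB entry is always removed
        if should_delete_file(p, library_dir):
            delete_file(p)  # only files inside the library are deleted
```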
For the recently-added samplerate, bitdepth, and channels properties on
MediaFile, a few things were fixed:
- tests in test_mediafile_basic
- never return None (zero when unavailable)
- make channels work with MP3 files (by looking at the codec "mode")
Also added docstrings to all of the properties.
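Deriving a channel count from the MP3 channel mode might look like this. The numbering is assumed to mirror the constants in `mutagen.mp3` (`STEREO=0`, `JOINTSTEREO=1`, `DUALCHANNEL=2`, `MONO=3`); the function name is hypothetical.

```python
from typing import Optional

# Assumed to match mutagen.mp3's MONO constant.
MONO = 3


def mp3_channels(mode: Optional[int]) -> int:
    """Return 1 for mono, 2 for the stereo-type modes, 0 when unavailable."""
    if mode is None:
        return 0  # never return None
    return 1 if mode == MONO else 2
```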
With import_delete enabled and performing a re-import (which moves files), the
former location of the file would be "deleted". This would lead to a "file does
not exist" error.
When a partial match is found, its first item (task.items[0]) may be None, and
_infer_album_fields would crash in this case. This solution walks through the
items list and finds the first non-None item.
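Finding the first non-None entry of a "holey" list is a one-liner with `next()` and a default. `first_present` is a hypothetical helper name.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_present(items: Sequence[Optional[T]]) -> Optional[T]:
    """Return the first non-None entry, or None if every slot is empty."""
    return next((item for item in items if item is not None), None)
```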
Previously, an empty argument was treated as "not an argument at all". Now,
every function call always has at least one argument -- i.e., %foo{} is a
function call whose only argument is "" -- and %foo{,bar} is valid syntax.
This was causing a problem in situations where }} would have semantic meaning
other than escaping a }. Specifically, %func{%func{arg}} contains a }} but
should not escape the }. $} seems to cover this situation. However, ${ is not
permitted as an escape sequence because it looks like the beginning of a symbol
(variable reference) like ${foo}. This is OK because { can be used anywhere as a
literal.
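The argument-splitting rules from the two changes above can be sketched as follows. `split_args` is a hypothetical helper that operates on the text between one call's braces. The escape set is partly an assumption: `$}` and the non-escape `${` come from the text, while `$$` and `$,` are added for illustration.

```python
# "${" is deliberately not an escape: it starts a ${symbol} reference.
ESCAPES = {"$", ",", "}"}


def split_args(text: str) -> list:
    """Split a call's argument text on top-level commas.

    Every call has at least one argument: "" -> [""], ",bar" -> ["", "bar"].
    Commas inside nested calls (brace depth > 0) do not split.
    """
    args, current, depth, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "$" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            # Unescape at top level; keep nested escapes for the inner parse.
            current.append(text[i + 1] if depth == 0 else text[i:i + 2])
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    args.append("".join(current))
    return args
```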
- Plugins are sent the unadulterated, None-ridden ordered items lists. Changed
the lastid plugin to accommodate this.
- Make colorization optional in partial album warnings.
- Fix some tests.
In the function order_items, instead of automatically rejecting the
canonical candidate if it has more tracks, the function still tries to
find matches for the tracks amongst the items and otherwise uses None
to fill the gaps, in order to keep the information about the track
numbers.
If the user has some songs from a specific album, but not all of them,
the real solution is immediately discarded. This commit is the first of a
series that will implement support for these incomplete albums.
The point of this patch is to make sure missing tracks are taken into
account when calculating the distance between an album and its canonical
data.
Note that in order not to break API compatibility, the album_distance
call for the plugins receives a purged version of both the items and the
album info, at some potential cost in accuracy if the plugin relies
on the index of a track in album_info.tracks.
The interface no longer specifies the type of the image embedded in the file; it
just returns a bytestring blob. When a type must be stored, it is inferred using
the imghdr module, which should reduce the potential for weird bugs when the
formats don't correspond.
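The `imghdr` module was deprecated in Python 3.11 and removed in 3.13, so a present-day sketch of the same idea sniffs magic numbers directly. `image_mime_type` is a hypothetical helper, and it covers only JPEG, PNG, and GIF.

```python
from typing import Optional

# Leading-byte signatures for common cover-art formats.
_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def image_mime_type(data: bytes) -> Optional[str]:
    """Infer an image MIME type from a blob's leading bytes."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    return None  # unknown format
```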
"beet import -i" now tags items instead of albums. There are many loose ends to
tie up (marked with TODOs in the source):
- What to do about applying non-track metadata to matched tracks? Currently it's
just left in place.
- Plugin autotag candidates for tracks.
- No user querying yet.
- Non-autotagged -i imports are unimplemented.
And, on top of those:
- Need to remove the action.TRACKS workflow and replace it with an option that
lets you jump over to the individual-track interface from the album tagger.
I'm shuffling around the feature-creeping importer code to keep it as
interface-agnostic as possible. The "importer" module now takes care of the
basic, increasingly complicated workflow while the ui.commands module is
relegated to containing actual user-interface stuff.
The import_resume option (nee import_progress) now exactly reflects the behavior
of -p and -P on the command line, which I think is way less confusing. That
option now has three settings: yes, no, and "ask" (the default). The "ask"
behavior cannot be specified on the command line, but I think that's OK. It's
also important to note that "no" means that progress is disabled entirely
(including saving progress for later resumes). The -q flag still overrides the
config option.
- Inference must be enabled explicitly with the "infer_aa" flag. It does not
happen transparently.
- Infer both artist and artist ID.
- Fixed a bug where only the database row was using the inferred data, not the
returned data structure.
- Added tests.
test_library.py so far just documents existing behavior of album
objects, especially the fact that trying to access unset fields causes
an AttributeError.
The default path formats now include both a "default", which is the same as
before but now uses $albumartist instead of $artist, and a "comp" path, which
uses a Compilations directory. Old paths are supported as-is by letting $artist
refer to either a track artist (when present, as it is in all old library
tracks) or album artist (when the track artist isn't present, as is the case
with most albums imported now).
I've essentially loaded up the string distance function with heuristics that
apply different weights to different kinds of string cruft that one encounters
in music tags. For example, tracks ending with "feat. Somebody" shouldn't be
penalized for all those extra characters. Now the weight of that part of the
string is significantly reduced.
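A sketch of one such heuristic: split a "feat. X" suffix off each string and count edits in that part at a reduced weight. The pattern, the weight of 0.1, and the normalization are hypothetical choices for illustration, not the actual tuning.

```python
import re

# Hypothetical cruft pattern and weight.
FEAT_RE = re.compile(r"\s*[(\[]?\s*\b(feat\.|featuring)\s.*$", re.IGNORECASE)
CRUFT_WEIGHT = 0.1


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def split_cruft(s: str):
    """Return (core, cruft), where cruft is a trailing "feat. ..." part."""
    m = FEAT_RE.search(s)
    return (s[:m.start()], s[m.start():]) if m else (s, "")


def string_dist(a: str, b: str) -> float:
    """Normalized edit distance with "feat." suffixes down-weighted."""
    core_a, cruft_a = split_cruft(a.lower())
    core_b, cruft_b = split_cruft(b.lower())
    num = levenshtein(core_a, core_b) + CRUFT_WEIGHT * levenshtein(cruft_a, cruft_b)
    den = max(len(core_a) + CRUFT_WEIGHT * len(cruft_a),
              len(core_b) + CRUFT_WEIGHT * len(cruft_b), 1)
    return num / den
```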
This involves yet another new plugin method: album_distance. The last major
puzzle piece left for lastid is the ability to augment the initial search
into MB (i.e., to start a search using fingerprinted metadata).
(I'm not sure why, but the weight for track index mismatches was set to 0.0.
This way, the tagger will be slightly more reluctant to frivolously reorder.)
When computing track destination paths, we now look for album-level values when
they're available. This has the effect of making albums go into a single
directory even when their tracks have heterogeneous metadata. We will need to
revisit this once we start explicitly supporting non-album tracks.
In the end, after all of this, it turns out that we basically need to abandon
the temptation of dealing with unicode paths altogether. The POSIX filesystem
API has no notion of unicode and is very much a bytes-only interface. This
means that undecodable pathnames are a reality we must deal with. This new
approach stores all paths as buffers (blobs) in SQLite and -- as transparently
as possible -- presents them as str objects to the Python code. Legacy
databases will have their paths automatically encoded into str objects, and
the Unicode strings stored in the database will lazily be replaced with buffers.
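In Python 3 terms, `bytes` plays the role of the str objects mentioned above, and the standard `sqlite3` module stores `bytes` parameters as BLOBs. A sketch of reading paths with lazy upgrading of legacy TEXT rows follows; the table layout and helper names are hypothetical, and UTF-8 for legacy rows is an assumption.

```python
import sqlite3


def path_from_db(value) -> bytes:
    """Always hand bytes back to Python code, even for legacy TEXT rows."""
    if isinstance(value, str):
        return value.encode("utf-8")  # legacy Unicode row (UTF-8 assumed)
    return bytes(value)


def load_path(conn: sqlite3.Connection, item_id: int) -> bytes:
    (value,) = conn.execute(
        "SELECT path FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    path = path_from_db(value)
    if isinstance(value, str):
        # Lazily replace the legacy TEXT value with a BLOB.
        conn.execute("UPDATE items SET path = ? WHERE id = ?", (path, item_id))
    return path
```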
As part of this, the BaseLibrary class was also adapted to include a notion of
albums. This is reflected by the new BaseAlbum class, which the Album class
(formerly _AlbumInfo) completely replaces in the concrete Library. The BaseAlbum
class just fetches metadata from the underlying items.
In the case that Mutagen throws an exception while trying to read a file, we
throw an UnreadableFileError, which is a new superclass for FileTypeError.
This entailed:
- changing the "flac" storage style option to "etc" to encompass both
flac and vorbis as the tags are very similar
- permitting multiple StorageStyles per field/format, to allow a
read-any/store-all approach to multiple field options
current metadata to be correct if it's complete
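The read-any/store-all idea for multiple storage styles can be sketched over a plain dict of Vorbis-comment-style tags (each key maps to a list of values). `MultiStyleField` is a hypothetical name, and the `DATE`/`YEAR` keys are illustrative.

```python
from typing import Dict, List, Optional


class MultiStyleField:
    """Read from the first key present; write to every key."""

    def __init__(self, *keys: str):
        self.keys = keys

    def read(self, tags: Dict[str, List[str]]) -> Optional[str]:
        for key in self.keys:
            values = tags.get(key)
            if values:
                return values[0]
        return None

    def write(self, tags: Dict[str, List[str]], value: str) -> None:
        for key in self.keys:
            tags[key] = [value]
```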
Previously, we were using the Munkres algorithm (minimum bipartite matching) to
order tracks intelligently only as a fallback if the current metadata was
paradoxical or incomplete. This was because of a concern about the performance
of the potentially-O(n^3) Munkres solver. However, it was found that (a) the
performance is actually not bad, taking on the order of 0.02 to perform a
matching, and (b) there was no recourse for the tagger to reorder tracks that
were legitimately in the wrong order. Now, we get intelligent reordering of
badly tagged music even when the metadata seems to be complete.
To retain some of the functionality of the old orderer, the track distance
metric was expanded to include a component reflecting the track index.
In doing this, another bug was discovered in the UI that showed the track name
differences based on an arbitrary ordering. Now, the tag_album function returns
a reordered items list with every candidate.
This is especially important for read(), which performs many assignments but, in many cases, causes few actual changes. A store() that follows soon after is now much more lightweight.
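A sketch of change tracking that makes a follow-up store() cheap: assignments only mark a field dirty when the value actually differs. `Record` is a hypothetical class; `store()` here just returns the changed fields rather than writing to a database.

```python
class Record:
    """Hypothetical item that remembers which fields actually changed."""

    def __init__(self):
        self._values = {}
        self._dirty = set()

    def __setitem__(self, key, value):
        # Re-assigning an identical value does not mark the field dirty.
        if key not in self._values or self._values[key] != value:
            self._values[key] = value
            self._dirty.add(key)

    def __getitem__(self, key):
        return self._values[key]

    def store(self) -> dict:
        """Return (and clear) only the changed fields, e.g. for an UPDATE."""
        changed = {key: self._values[key] for key in self._dirty}
        self._dirty.clear()
        return changed
```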
Also, new organization for tests and automatic loader. Fixed bugs uncovered by new tests.