The encoder that produced this file included an empty image as the cover art
for some reason, which was confusing the tests.
I left the null check in place to deal with this situation in the future. I
think returning None is better than returning the empty string (which is of
course not a valid image).
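A minimal sketch of the guard (the helper name is illustrative, not
MediaFile's actual internals):

    def cover_art(data):
        # Treat empty image data as "no art" rather than returning an
        # invalid empty string.
        if not data:
            return None
        return data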
This is an effort to make the distance object feel slightly more dict-like.
The name has changed and the order of the tuples is reversed: we now yield
(key, value) instead of (value, key), which I think is a little more intuitive.
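A minimal sketch of the new iteration order, assuming a hypothetical
_penalties mapping inside the distance object:

    class Distance(object):
        def __init__(self, penalties):
            self._penalties = penalties

        def items(self):
            # Previously this yielded (value, key) pairs.
            for key, value in self._penalties.items():
                yield (key, value)

    for key, value in Distance({'year': 0.5}).items():
        print(key, value)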
We were incorrectly adding 1 to the length of options to avoid a divide
by zero, when we should instead default the length to 1. Otherwise we
skew the penalty towards zero.
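In code, the fix looks roughly like this (names are illustrative):

    def normalized_penalty(index, options):
        # Default the length to 1 instead of computing len(options) + 1,
        # which skewed every penalty toward zero.
        return index / float(len(options) or 1)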
The new Distance object knows how to perform various types of distance
calculations (expression, equality, number, priority, string).
It will keep track of each individual penalty that has been applied so
that we can utilise that information in the UI and when making decisions
about the recommendation level.
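A hypothetical sketch of the penalty tracking (the real API may differ in
names and weighting details):

    class Distance(object):
        def __init__(self, weights):
            self._weights = weights  # e.g. {'year': 1.0, 'country': 0.5}
            self._penalties = {}

        def add(self, key, dist):
            # Remember each individual penalty under its key so the UI
            # and the recommendation logic can inspect it later.
            self._penalties.setdefault(key, []).append(dist)

        @property
        def distance(self):
            # Weighted sum of the recorded penalties, normalized by the
            # total weight of the keys that were actually applied.
            applied = sum(self._weights[k] for k in self._penalties) or 1.0
            raw = sum(self._weights[k] * sum(v) / len(v)
                      for k, v in self._penalties.items())
            return raw / applied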
We now display the top 3 penalties (sorted by weight) on the release
list (and "..." if there are more than 3), and we display all penalties
on the album info line and track change line.
The implementation of the `max_rec` setting has been simplified by
removing duplicate validation and instead looking at the penalties that
have been applied to a distance. As a result, we can now configure a
maximum recommendation for any penalty that might be applied.
We have a few new checks when calculating album distance:
`match: preferred: countries` and `match: preferred: media` can each be
set to a list of countries and media in order of your preference. These
are empty by default. A value that matches the first item will have no
penalty, and a value that doesn't match any item will have an unweighted
penalty of 1.0.
If `match: preferred: original_year` is set to "yes", beets will apply
an unweighted penalty of 1.0 for each year of difference between the
release year and the original year.
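A sketch of both checks (function names are illustrative):

    def preferred_penalty(value, preferred):
        # No penalty for the first item in the list, a graded penalty
        # for later items, and an unweighted 1.0 when nothing matches.
        # An empty list applies no penalty.
        if not preferred:
            return 0.0
        if value in preferred:
            return preferred.index(value) / float(len(preferred))
        return 1.0

    def original_year_penalty(release_year, original_year):
        # An unweighted 1.0 for each year of difference.
        return float(abs(release_year - original_year))

    print(preferred_penalty('GB', ['GB', 'US']))  # 0.0
    print(preferred_penalty('DE', ['GB', 'US']))  # 1.0
    print(original_year_penalty(2010, 2008))      # 2.0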
We now configure individual weights for `mediums` (disctotal), `label`,
`catalognum`, `country` and `albumdisambig` instead of a single generic
`minor` weight. This gives more control, but more importantly separates
and names the applied penalties so that the UI can convey exactly which
fields have contributed to the overall distance penalty.
Likewise, `missing tracks` and `unmatched tracks` are penalised and
displayed in the UI separately, instead of a combined `partial` penalty.
Display the non-MusicBrainz source in the disambiguation string, and
"source" in the list of penalties if a release is penalised for coming from
a non-MusicBrainz source.
This isn't finished yet; it needs some input on how to organize the data, and
on where to implement the use of this data, but it already works for setting
the date.
The configuration was not loaded for these tests because they didn't inherit
the common test harness. These failures were hidden on my system because of
some kind of dependency on another test.
The initial idea for this refactor was motivated by the need to make
PluginQuery.match() have the same method signature as the match() methods on
other queries. That is, it needed to take an *item*, not the pattern and
value. (The pattern is supplied when the query is constructed.) So it made
sense to move the value-to-pattern code to a class method.
But then I realized that all the other FieldQuery subclasses needed to do
essentially the same thing. So I eliminated PluginQuery altogether and
refactored FieldQuery to subsume its functionality. I then changed all the
other FieldQuery subclasses to conform to the same pattern.
This has the side effect of allowing different kinds of queries (even
non-field queries) down the road.
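A sketch of the resulting shape (simplified; names assumed):

    class FieldQuery(object):
        def __init__(self, field, pattern):
            self.field = field
            self.pattern = pattern

        @classmethod
        def value_match(cls, pattern, value):
            # Subclasses override this to define their own semantics.
            return pattern == value

        def match(self, item):
            # Same signature as the other queries: take an item.
            return self.value_match(self.pattern,
                                    getattr(item, self.field))

    class SubstringQuery(FieldQuery):
        @classmethod
        def value_match(cls, pattern, value):
            return pattern.lower() in value.lower()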
The generic test harness now uses a temporary directory for beets' various
files as well as $HOME. As one packager pointed out, there were various test
failures when $HOME did not exist. This is no longer the case.
* util.prune_dirs now accepts glob patterns as clutter when determining emptiness.
* New config option, 'clutter' (a list of filenames/glob patterns).
* ImportTask.prune passes this option's value to prune_dirs.
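A sketch of the emptiness test (the helper name is hypothetical):

    import fnmatch
    import os

    def contains_only_clutter(path, clutter):
        # A directory counts as empty when everything in it matches one
        # of the clutter filename/glob patterns.
        return all(any(fnmatch.fnmatch(name, pat) for pat in clutter)
                   for name in os.listdir(path))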
For example, catalogue numbers like "[REACT217]" shouldn't bypass the nested
multi-disc detection and automatically include all subdirs.
Do nested multi-disc detection first, so that `collapse_pat` is only set for
flattened albums, and we can skip the ancestry check on subsequent folders.
- Remove "part", "volume", "vol." multi-disc markers. These are often
part of album titles, and not necessarily indicative of a multi-disc
album. Only look for "CD X" and "disc X" (case insensitive), ignoring
white space and other non-word characters.
- Don't only expect each disc to be in a subdirectory of a common parent
directory, with all siblings belonging to the same release. Also match
any consecutive siblings (even when the parent contains other albums)
that are named with the same prefix and multi-disc marker.
- The `albums_in_dir(path)` function now always yields a list of paths
along with each list of items. `ImportTask.path` is now always a list of
paths.
- The `displayable_path(path)` function now accepts a list of paths, and
will join them with "; " by default. This can be changed with the
`separator` argument.
- The `sorted_walk()` function now does a case insensitive sort on
directories, but still returns case sensitive results. This allows
better multi-disc album detection.
- The `art_for_album()` function now takes a list of paths as its second
argument, instead of a single path.
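A sketch of the narrowed marker detection mentioned in the first item above
(the pattern is illustrative):

    import re

    # Only "disc N" and "CD N", case-insensitive, ignoring whitespace
    # and other non-word characters between marker and number.
    MARKER_PAT = re.compile(r'\b(disc|cd)\W*(\d+)', re.I)

    for name in (u'Album CD 1', u'Album (disc 2)', u'Album, Vol. 2'):
        print(name, bool(MARKER_PAT.search(name)))
    # "Vol. 2" no longer counts as a multi-disc marker.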
The changes introduced in rc1 caused paths to be syspath-ified before they were
passed to os.path.abspath. The magic prefix caused them to be interpreted as
absolute paths even if they were relative. The fix is, in this *isolated*
case, to use Unicode but prefix-free paths in calls to the os.path.* functions.
Those functions need to act on Unicode objects but seem to be purely syntactic
-- nothing is tripped up by using long filenames without the magic prefix.
These tests were written when I knew almost nothing about Python and even less
about unittest. The class-generating magic never worked with nose for a crazy
reason I won't get into here. This has a bit more copypasta but the workings
are more obvious and we no longer generate enormous numbers of independent
tests. There should be a more representative number of dots in the test runner
output now.
Fixed a number of issues with the changes to fetchart:
- Remove redundant fetches. This was making the Amazon source download every
image twice even when art resizing was not enabled!
- Restore local_only switch in plugin hook, which got lost in the shuffle at
some point.
- Don't replace the original image file in-place; use a temporary file instead.
This would clobber the original source image on the filesystem with the
downscaled version!
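The last fix amounts to something like this (resize_func is a hypothetical
callable, not fetchart's real API):

    import os
    import tempfile

    def downscaled_copy(source_path, resize_func):
        # Write the downscaled image to a temporary file and hand that
        # path onward, never overwriting the source in place.
        fd, tmp_path = tempfile.mkstemp(
            suffix=os.path.splitext(source_path)[1])
        os.close(fd)
        resize_func(source_path, tmp_path)
        return tmp_path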
This is an alternative to #58 that makes bytestring_path perform more like the
inverse of syspath on Windows. This way, we can convert to syspath, operate on
the path, and then bring back to internal representation without data loss. This
involves looking for the magic prefix on the Unicode string and removing it
before encoding to the internal (UTF-8) representation.
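A sketch of the behavior (simplified from the real function):

    import sys

    WINDOWS_MAGIC_PREFIX = u'\\\\?\\'

    def bytestring_path(path):
        # Strip the magic prefix from the Unicode string before
        # encoding to the internal (UTF-8) representation, making this
        # the inverse of syspath on Windows.
        if isinstance(path, bytes):
            return path
        if sys.platform == 'win32' and \
                path.startswith(WINDOWS_MAGIC_PREFIX):
            path = path[len(WINDOWS_MAGIC_PREFIX):]
        return path.encode('utf-8')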
This has been a long time coming, but we now finally keep track of ReplayGain
values in the database. This is an intermediate step toward a refactoring of the
RG plugin; at the moment, these values are not actually saved!
This is fixed by allowing MediaFiles to convert strings to integers on
assignment. An eventual complete fix will perform these type conversions in the
Item interface.
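An illustration of the idea, not MediaFile's actual descriptor machinery:

    def coerce_int(value):
        # Convert string values to integers on assignment, falling back
        # to zero for junk values.
        if isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                return 0
        return value

    print(coerce_int('2003'))  # 2003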
When we store paths in the database, we always use bytestrings for consistency.
But on Windows, these paths are converted back to Unicode before they reach the
FS API. This means that the codec used internally is immaterial.
However, we were naively using sys.getfilesystemencoding() for this internal
representation. On Windows, this is MBCS, a broken encoding that can't represent
all of Unicode. This change replaces that with UTF-8, a "real" codec.
The decoding bit now tries UTF-8 and falls back to MBCS for compatibility with
existing databases. The reality, however, is that existing databases may not
work with this change -- a byte string may represent something different in
UTF-8 from what it represents in MBCS. So users should recreate their DBs if
anything goes wrong.
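The fallback looks roughly like this (the 'mbcs' codec exists only on
Windows):

    def path_to_unicode(path_bytes):
        # Try the new internal codec (UTF-8) first, then fall back to
        # MBCS for compatibility with existing databases.
        try:
            return path_bytes.decode('utf-8')
        except UnicodeDecodeError:
            return path_bytes.decode('mbcs')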
The 'decode' call fails when the string is already unicode. I'm not sure
under what circumstances the string is or isn't unicode (apparently it
varies), so I added a check. The test passes with the patch, at least.
This allows matches to indicate both missing and unmatched tracks in their
candidates and solves some of the spaghetti tuples that were passed around
during autotagging.
This replaces order_items with assign_items, the first step to allowing unequal
numbers of items on either side of the equation (user files and canonical
tracks). Rather than returning a "holey" list and assuming that the TrackInfo
objects stay static, the function returns a dictionary mapping Item objects to
TrackInfo objects. To indicate unmatched objects, two sets are also returned.
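A toy stand-in for the new return shape (this is not the real matching
algorithm, just its interface):

    def assign_items(items, tracks):
        # Pair greedily by position and report the leftovers on each
        # side as sets.
        mapping = dict(zip(items, tracks))
        extra_items = set(items[len(tracks):])
        extra_tracks = set(tracks[len(items):])
        return mapping, extra_items, extra_tracks

    mapping, extra_items, extra_tracks = assign_items(
        ['item1', 'item2', 'item3'], ['track1', 'track2'])
    print(mapping)       # {'item1': 'track1', 'item2': 'track2'}
    print(extra_items)   # {'item3'}
    print(extra_tracks)  # set()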
For the moment, some temporary code is included to turn the result from this
new function into the old format (a holey Item list). This allowed me to test
this change in isolation before plunging ahead with the necessary refactoring to
expose all of this to the importer workflow, etc.
This essential import pipeline stage is now split in two: one stage that
applies metadata changes and one that manipulates the filesystem. This will
eventually allow lastgenre to apply its changes before destinations are
calculated.
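Schematically (this uses plain generators, not the pipeline's actual
coroutine API):

    def apply_metadata_stage(tasks):
        # First stage: apply metadata changes only.
        for task in tasks:
            task['metadata_applied'] = True
            yield task

    def manipulate_files_stage(tasks):
        # Second stage: filesystem manipulation, after plugins such as
        # lastgenre have had a chance to run in between.
        for task in tasks:
            task['files_manipulated'] = True
            yield task

    for task in manipulate_files_stage(apply_metadata_stage([{}])):
        print(task)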
In an attempt to finally address the longstanding SQLite locking issues, I'm
introducing a way to explicitly, lexically scope transactions. The Transaction
class is a context manager that always fully fetches after SELECTs and
automatically commits on exit. No direct access to the library is allowed, so
all changes will eventually be committed and all queries will be completed. This
will also provide a debugging mechanism to show where concurrent transactions
are beginning and ending.
To support composition (transaction reentrancy), an internal, per-Library stack
of transactions is maintained. Commits only happen when the outermost
transaction exits. This means that, while it's possible to introduce atomicity
bugs by invoking Library methods outside of a transaction, you can conveniently
call them *without* a currently-active transaction to get a single atomic
action.
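A sketch of the idea (names assumed; simplified for clarity):

    class Transaction(object):
        def __init__(self, lib):
            self.lib = lib

        def __enter__(self):
            self.lib._tx_stack.append(self)
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.lib._tx_stack.pop()
            if not self.lib._tx_stack:
                # Only the outermost transaction commits.
                self.lib.connection.commit()

        def query(self, sql, subvals=()):
            # Fully fetch after SELECTs so every statement completes
            # before the transaction ends.
            return self.lib.connection.execute(sql, subvals).fetchall()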
Note that this "transaction stack" concepts assumes a single Library object per
thread. Because we need to duplicate Library objects for concurrent access due
to sqlite3 limitation already, this is fine for now. Later, the interface should
provide one transaction stack per thread for shared Library objects.
Instead of parsing a template at each call to destination(), it's now possible
to parse it *once*, a priori, and re-use the resulting template object. This
is analogous to the re module's compiled expressions.
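An illustration using the stdlib's string.Template as a stand-in for beets'
own template syntax:

    import string

    DEST_TEMPLATE = string.Template('$artist/$album/$title')  # parsed once

    for values in ({'artist': 'A', 'album': 'B', 'title': 'C'},):
        print(DEST_TEMPLATE.substitute(values))  # re-used per call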
For a less cumbersome uniquifying string, only a single field value is now used
instead of a prefix of a list of fields. The old semantics had two problems that
made it both unnecessary and insufficient:
- In the vast majority of cases, a single field suffices (year OR label OR
catalog number, for example) and forcing the string to include many identical
fields is unnecessary.
- If the albums are very similar, a prefix may be insufficient; a better
solution may be found with an arbitrary subset. (Of course, we can't afford to
search the whole power set.)
So we're going with a single field for now. This should cause far less
confusion.
- Copying and moving are mutually exclusive. Moving overrides copying, so
the user only has to add one line ("import_move: true") to disable copying
and enable moving in its place (see the sketch after this list).
- Deleting is only possible when copying.
- Deprecating the "delete" option (moving is almost always better).
- Removed command-line switch for moving. It's somewhat "unsafe", so this
removes some potential for accidental irreversible changes.
- Changelog & thanks.
- Update docs to refer to import_move instead of import_delete as the
correct solution for ending up with only one copy of the file.
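A sketch of the interplay from the first two items (the function is
hypothetical):

    def resolve_file_actions(copy, move, delete):
        # Moving overrides copying, and deleting the original only
        # makes sense when copying.
        if move:
            copy = False
        delete = delete and copy
        return copy, move, delete

    print(resolve_file_actions(copy=True, move=True, delete=True))
    # (False, True, False)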
The new fields are:
ALBUM: mb_releasegroupid asin catalognum script language country albumstatus
media albumdisambig
TRACK: disctitle encoder
These are not yet parsed from MusicBrainz responses (just added to MediaFile
and the database).
There's no longer a distinction between Unix and Windows substitutions. Enough
users reported problems with Windows-forbidden characters on Samba shares that
it seems appropriate to make all filenames Windows-safe, even on Unix. Users who
really want those additional characters (<>:"?*|\) can re-enable them via the
"replace" option. Nobody has complained about beets being *too* conservative.
This also adds sanitization of control characters, which is an all-around good
idea, and the substitution now runs in the Unicode (rather than byte) domain.
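A sketch of the unified substitutions (the patterns are illustrative, not
the exact defaults):

    import re

    REPLACEMENTS = [
        (re.compile(r'[\\/]'), u'_'),       # path separators
        (re.compile(r'[<>:"?*|]'), u'_'),   # Windows-forbidden characters
        (re.compile(r'[\x00-\x1f]'), u''),  # control characters
    ]

    def sanitize_component(text):
        # Works on unicode text, not bytes.
        for pattern, repl in REPLACEMENTS:
            text = pattern.sub(repl, text)
        return text

    print(sanitize_component(u'AC/DC: Back in Black?'))
    # AC_DC_ Back in Black_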
Previously, there was just an "artist sort name" field -- now there's a
corresponding sort name for both track artists and album artists. I also made
the names shorter (artist_sort and albumartist_sort).
Generates disambiguating strings to distinguish albums from one another. To be
used as the basis for a simpler path field, $unique, as a default disambiguator.
Based on the "remove_duplicates" flag on ImportTask, the apply_choices coroutine
now looks for duplicates (using an extended version of the _duplicate_check
functions) and removes items from the library. It also *deletes* files
associated with those items when they are located inside the beets library
directory. Files outside of the directory are left on disk (but their DB entry
is still removed). This should "do the right thing" in most cases -- again, this
is something we can add a config option for if it comes up.
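A sketch of the policy (names assumed):

    import os

    def remove_duplicates(lib, duplicates, libdir):
        # Delete the file only when it lives inside the beets library
        # directory; the database entry is removed either way.
        prefix = os.path.normpath(libdir) + os.sep
        for item in duplicates:
            if os.path.normpath(item.path).startswith(prefix):
                os.remove(item.path)
            lib.remove(item)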