(I'm not sure why, but the weight for track index mismatches was previously set
to 0.0. With a nonzero weight, the tagger will be slightly more reluctant to
reorder tracks frivolously.)
So. Apparently, os.listdir() will *try* to give you Unicode when you give it
Unicode, but will occasionally give you bytestrings when it can't decode a
filename. Also, I've now had two separate reports from users whose systems
report a UTF-8 filesystem encoding but whose filenames contain latin1-encoded
characters.
The choices were to (a) switch over to bytestrings entirely for filenames or
(b) just deal with the badly-encoded filenames. Option (a) is very unattractive
because it requires me to store bytestrings in sqlite (which is not only
complicated but would require more code to deal with legacy databases) and
complicates the construction of pathnames from (Unicode) metadata. Therefore,
I've implemented a static fallback to latin1 if the default pathname decode
fails. Furthermore, if that also fails, the _sorted_walk function just ignores
the badly-encoded file (and logs an error).
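
For illustration, the fallback chain described above could look roughly like
this sketch. It is not the actual implementation: decode_filename and its
encodings parameter are hypothetical names, and it is written for Python 3,
where undecoded filenames arrive as bytes.

```python
import logging
import sys

log = logging.getLogger("example")  # hypothetical logger name


def decode_filename(raw, encodings=None):
    """Return raw as text, or None if no candidate encoding can decode it."""
    if isinstance(raw, str):
        # Already decoded (the common case when listing a text path).
        return raw
    if encodings is None:
        # Assumption: try the reported filesystem encoding, then latin1.
        encodings = (sys.getfilesystemencoding() or "utf-8", "latin1")
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Note: Python's latin1 codec maps every byte value, so with the default
    # list this branch is purely defensive.
    log.error("skipping badly-encoded filename: %r", raw)
    return None
```

A caller walking a directory would simply skip any entry for which this returns
None.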
Previously, importing without autotagging just imported a bunch of Items. Now,
like the autotagging version, "import -A" creates albums based on the directory
hierarchy. The effect is exactly as if the user chose "use as-is" every time in
the interactive procedure. One side effect is that "import -A" can now only take
directories, where previously it could take single items on the command line. We
need a new solution for this kind of import in the future.
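
A rough sketch of what "albums based on the directory hierarchy" means here,
assuming each directory that directly contains music files becomes one album.
The function name and the extension list are made up for the example and are
not taken from the real importer.

```python
import os


def albums_from_tree(root, extensions=(".mp3", ".flac")):
    """Yield (directory, sorted track paths) for each directory with music."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        tracks = sorted(
            name for name in filenames if name.lower().endswith(extensions)
        )
        if tracks:
            yield dirpath, [os.path.join(dirpath, name) for name in tracks]
```

Because the grouping is driven by os.walk over a root directory, a lone file
passed on the command line has no directory grouping to hang an album on, which
is the limitation mentioned above.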
On terminals where the LANG environment variable didn't list UTF-8 as the
terminal's character encoding, the Python print statement threw an error when
it encountered a character that couldn't be encoded. So now we manually use the
"replace" policy for all output to the terminal.