Doing test-specific cleanup in tearDown before general sandbox deletion helps
avoiding contamination of global state between tests when cleanup fails.
Current Windows status:
Ran 1106 tests in 72.373s
FAILED (SKIP=10, errors=13, failures=15)
Closer!
* `encode()` raises an error when the command returns with non-zero exit
status. We catch that in the higher-level conversion functions and skip to
the next item without writing tags.
* Simplified the handling of the `keep_new` flag.