docs: Rewrite Handling Paths chapter (pathlib vs utils) (#6116)

## Description

Updates the docs chapter "Handling Paths" describing how to modernise
old code and intentionally includes historical details. Examples should
further guide contributors while refactoring. 

Also moved the guide from the contribution guide into the dev docs.
Sebastian Mohr 2025-10-28 13:02:25 +01:00 committed by GitHub
commit adc0d9e477
4 changed files with 69 additions and 25 deletions


@ -286,31 +286,6 @@ according to the specifications required by the project.
Similarly, run ``poe format-docs`` and ``poe lint-docs`` to ensure consistent
documentation formatting and check for any issues.
Handling Paths
~~~~~~~~~~~~~~
A great deal of convention deals with the handling of **paths**. Paths are
stored internally—in the database, for instance—as byte strings (i.e., ``bytes``
instead of ``str`` in Python 3). This is because POSIX operating systems' path
names are only reliably usable as byte strings—operating systems typically
recommend but do not require that filenames use a given encoding, so violations
of any reported encoding are inevitable. On Windows, the strings are always
encoded with UTF-8; on Unix, the encoding is controlled by the filesystem. Here
are some guidelines to follow:
- If you have a Unicode path or you're not sure whether something is Unicode or
not, pass it through the ``bytestring_path`` function in the ``beets.util``
module to convert it to bytes.
- Pass every path name through the ``syspath`` function (also in ``beets.util``)
before sending it to any *operating system* file operation (``open``, for
example). This is necessary to use long filenames (which, maddeningly, must be
Unicode) on Windows. This allows us to consistently store bytes in the
database but use the native encoding rule on both POSIX and Windows.
- Similarly, the ``displayable_path`` utility function converts bytestring paths
to a Unicode string for displaying to the user. Every time you want to print
out a string to the terminal or log it with the ``logging`` module, feed it
through this function.
Editor Settings
~~~~~~~~~~~~~~~


@ -26,6 +26,10 @@ For packagers:
Other changes:
- The documentation chapter :doc:`dev/paths` has been moved to the "For
Developers" section and revised to reflect current best practices (pathlib
usage).
2.5.1 (October 14, 2025)
------------------------


@ -18,6 +18,7 @@ configuration files, respectively.
plugins/index
library
paths
importer
cli
../api/index

docs/dev/paths.rst Normal file

@ -0,0 +1,64 @@
Handling Paths
==============
``pathlib`` provides a clean, cross-platform API for working with filesystem
paths.
Use the ``.filepath`` property on ``Item`` and ``Album`` library objects to
access paths as ``pathlib.Path`` objects. This produces a readable, native
representation suitable for printing, logging, or further processing.
Normalize paths using ``Path(...).expanduser().resolve()``, which expands ``~``
and resolves symlinks.
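As a minimal sketch of that normalization using only the standard library: the helper name ``normalize`` is hypothetical and not part of Beets. Note that ``resolve()`` also makes the path absolute and collapses ``..`` components, and by default it does not require the path to exist.

```python
from pathlib import Path


def normalize(raw: str) -> Path:
    """Expand a leading ``~`` and return an absolute, resolved path.

    Hypothetical helper for illustration only; not a Beets function.
    """
    # expanduser() replaces a leading "~" with the current user's home directory.
    # resolve() makes the path absolute, collapses ".." components and follows
    # symlinks; with the default strict=False the path need not exist.
    return Path(raw).expanduser().resolve()


normalized = normalize("~/Music/../Artist")
print(normalized)  # absolute path ending in "Artist"; the prefix depends on the user
```

The exact result depends on the home directory and on any symlinks along the way, so the example does not show a fixed output.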
Cross-platform differences, such as path separators, Unicode handling, and
long-path support (Windows), are automatically managed by ``pathlib``.
When storing paths in the database, however, convert them to bytes with
``bytestring_path()``. Paths in Beets are currently stored as bytes, although
there are plans to eventually store ``pathlib.Path`` objects directly. To access
media file paths in their stored form, use the ``.path`` property on ``Item``
and ``Album``.
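To illustrate the round trip between ``pathlib.Path`` and bytes without importing Beets, the sketch below uses the standard library's ``os.fsencode`` and ``os.fsdecode`` as a stand-in. This is an assumption made for illustration: ``bytestring_path()`` is the function to use in Beets code, and its behavior may differ from ``os.fsencode``, particularly on Windows.

```python
import os
from pathlib import Path

original = Path("/some/path/to/file.mp3")

# Stand-in for bytestring_path(): encode the path with the filesystem encoding.
# (Assumption for illustration; Beets code should call bytestring_path().)
stored = os.fsencode(original)

# Going back: decode the stored bytes and wrap the result in a Path again.
restored = Path(os.fsdecode(stored))

print(type(stored).__name__)  # bytes
print(restored == original)   # True
```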
Legacy utilities
----------------
Historically, Beets used custom utilities to ensure consistent behavior across
Linux, macOS, and Windows before ``pathlib`` became reliable:
- ``syspath()``: worked around Windows Unicode and long-path limitations by
converting to a system-safe string (adding the ``\\?\`` prefix where needed).
- ``normpath()``: normalized slashes and removed ``./`` or ``..`` parts but did
not expand ``~``.
- ``bytestring_path()``: converted paths to bytes for database storage (still
used for that purpose today).
- ``displayable_path()``: converted byte paths to Unicode for display or
logging.
These functions remain safe to use in legacy code, but new code should rely
solely on ``pathlib.Path``.
Examples
--------
Old style

.. code-block:: python

    displayable_path(item.path)
    normpath("~/Music/../Artist")
    syspath(path)

New style

.. code-block:: python

    item.filepath
    Path("~/Music/../Artist").expanduser().resolve()
    Path(path)

When storing paths in the database

.. code-block:: python

    path_bytes = bytestring_path(Path("/some/path/to/file.mp3"))