Fix Discogs ID extractor to support short format

- Discogs release links were often written in the short form
  discogs.com/release/<id>, which the extractor did not match.
- Extend one of the existing regex patterns to support that by making the
  trailing dash (-) optional.
- Save a new test regex on regex101.com and update the link to it.
J0J0 Todos 2023-01-11 11:37:53 +01:00
parent aaa4cfce49
commit c48fa0a830

@@ -47,14 +47,15 @@ def extract_discogs_id_regex(album_id):
     # - plain integer, optionally wrapped in brackets and prefixed by an
     # 'r', as this is how discogs displays the release ID on its webpage.
     # - legacy url format: discogs.com/<name of release>/release/<id>
+    # - legacy url short format: discogs.com/release/<id>
     # - current url format: discogs.com/release/<id>-<name of release>
     # See #291, #4080 and #4085 for the discussions leading up to these
     # patterns.
-    # Regex has been tested here https://regex101.com/r/wyLdB4/2
+    # Regex has been tested here https://regex101.com/r/TOu7kw/1
     for pattern in [
             r'^\[?r?(?P<id>\d+)\]?$',
-            r'discogs\.com/release/(?P<id>\d+)-',
+            r'discogs\.com/release/(?P<id>\d+)-?',
             r'discogs\.com/[^/]+/release/(?P<id>\d+)',
     ]:
         match = re.search(pattern, album_id)
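
To illustrate the effect of the change, here is a minimal standalone sketch. The three patterns are copied from the diff. The names `extract_id_sketch`, `OLD_CURRENT_FORMAT` and `PATTERNS`, the return handling, and the sample URLs and ID are assumptions made for this example, since the hunk ends before the loop body.

```python
import re

# The pattern as it was before this commit: the trailing dash is required.
OLD_CURRENT_FORMAT = r'discogs\.com/release/(?P<id>\d+)-'

# The three patterns after this commit, copied from the diff.
PATTERNS = [
    r'^\[?r?(?P<id>\d+)\]?$',
    r'discogs\.com/release/(?P<id>\d+)-?',
    r'discogs\.com/[^/]+/release/(?P<id>\d+)',
]


def extract_id_sketch(album_id):
    """Hypothetical stand-in for the extractor.

    Returns the first matching ID as an int, or None if nothing matches.
    The return handling is an assumption, not taken from the diff.
    """
    for pattern in PATTERNS:
        match = re.search(pattern, album_id)
        if match:
            return int(match.group('id'))
    return None


# Made-up sample inputs, one per supported format.
short_url = 'https://www.discogs.com/release/4354798'
current_url = 'https://www.discogs.com/release/4354798-Some-Name'
legacy_url = 'https://www.discogs.com/Some-Name/release/4354798'

for value in (short_url, current_url, legacy_url, '[r4354798]', 'not an id'):
    print(value, '->', extract_id_sketch(value))

# The old pattern rejects the short format because no dash follows the ID.
print(re.search(OLD_CURRENT_FORMAT, short_url))
```

In this sketch, the short URL only matches once the dash is optional. The other formats resolve the same way as before.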