XOR Media

String Truncate Middle With Ellipsis

There are times when you need to truncate a string in the middle. In many cases it’s for UX/human purposes, though in some situations it’s the best way to generate a unique string for a length-limited field. That was the case I ran into recently while trying to automate submission of in-app purchases (IAPs) to both Google Play and the App Store, both of which require short, unique names for each SKU.

The Setup

Consider the following titles, each of which is 39 characters long.

Midsomer Murders - Series 1 - Episode 1
Midsomer Murders - Series 1 - Episode 2
Midsomer Murders - Series 1 - Episode 3
Midsomer Murders - Series 1 - Episode 4
Midsomer Murders - Series 1 - Episode 5
Midsomer Murders - Series 2 - Episode 1
Midsomer Murders - Series 2 - Episode 2
Midsomer Murders - Series 2 - Episode 3
Midsomer Murders - Series 2 - Episode 4
Midsomer Murders - Series 2 - Episode 5
Midsomer Murders - Series 3 - Episode 1
Midsomer Murders - Series 3 - Episode 2
Midsomer Murders - Series 3 - Episode 3
Midsomer Murders - Series 3 - Episode 4
Midsomer Murders - Series 3 - Episode 5
...

Assume we had to fit these strings into a field we have no control over that requires them to be unique and 32 characters or fewer, or perhaps we’re displaying them in a UI where there isn’t enough room for the full title. The naive approach would be to cut them down to 32 characters, with the last three replaced by an ellipsis to make it clear that the title has been truncated.

Midsomer Murders - Series 1 -...
Midsomer Murders - Series 1 -...
...
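A minimal sketch of that naive approach, together with a quick collision check, makes the problem concrete. `truncate_end` and `find_collisions` are hypothetical helper names used only for illustration; the check groups the original titles by their shortened form and keeps only the shortened names shared by more than one title.

```python
from collections import defaultdict


def truncate_end(s, n, ellipsis='...'):
    """Naive truncation: keep the first n - len(ellipsis) characters."""
    if len(s) <= n:
        return s
    return s[:n - len(ellipsis)] + ellipsis


def find_collisions(titles, shorten):
    """Map each shortened name to the originals that produced it,
    keeping only names shared by more than one title."""
    groups = defaultdict(list)
    for title in titles:
        groups[shorten(title)].append(title)
    return {name: originals for name, originals in groups.items()
            if len(originals) > 1}


# The example titles: series 1-3, episodes 1-5.
titles = ['Midsomer Murders - Series {0} - Episode {1}'.format(s, e)
          for s in range(1, 4) for e in range(1, 6)]

collisions = find_collisions(titles, lambda t: truncate_end(t, 32))
for name, originals in sorted(collisions.items()):
    # prints e.g. "Midsomer Murders - Series 1 -... 5"
    print(name, len(originals))
```

Every series collapses to a single name shared by all five of its episodes.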

That doesn’t work particularly well, because every episode in a series truncates to the same string. Taking a closer look at the format of the titles, which in this case is consistent, we notice that two parts uniquely identify an episode: the series number and the episode number. So what if we truncate in the middle rather than at the end?

Midsomer Murders...1 - Episode 1
Midsomer Murders...1 - Episode 2
...

The code to do this is fairly simple and looks something like the following.

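def truncate_middle(s, n):
    """Shorten s to at most n characters by replacing its middle with '...'."""
    if len(s) <= n:
        # string is already short enough
        return s
    # characters kept from the end: half of the size, minus the 3 .'s
    # (// keeps this an int on Python 3, where / would give a float)
    n_2 = n // 2 - 3
    if n_2 <= 0:
        # s[-0:] is the whole string, so very small limits can't work
        raise ValueError('n must be at least 8')
    # characters kept from the start: whatever's left
    n_1 = n - n_2 - 3
    return '{0}...{1}'.format(s[:n_1], s[-n_2:])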

This approach isn’t perfect, though: with a different set of titles it may truncate out the series number and result in duplicates. For UI purposes this may be acceptable (a best effort), but for a field that requires uniqueness it isn’t enough. In my particular situation the items have hex UUIDs as unique identifiers, so the simplest thing to do was to append a few characters of the UUID to the end of the title before truncating. For all practical purposes this ensures uniqueness. What other solutions can you think of?
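Here is a rough sketch of the UUID-suffix idea, assuming each item's ID is a UUID (or a UUID string) and that six hex characters is enough; `short_unique_name` and `suffix_len` are hypothetical names, and the middle-truncation function is repeated so the snippet runs on its own. Because the suffix sits at the very end, it survives as long as it fits in the tail that the middle truncation keeps.

```python
import uuid


def truncate_middle(s, n):
    """Shorten s to at most n characters by replacing its middle with '...'."""
    if len(s) <= n:
        return s
    n_2 = n // 2 - 3  # characters kept from the end
    if n_2 <= 0:
        raise ValueError('n must be at least 8')
    n_1 = n - n_2 - 3  # characters kept from the start
    return '{0}...{1}'.format(s[:n_1], s[-n_2:])


def short_unique_name(title, item_id, n, suffix_len=6):
    """Append the first suffix_len hex digits of item_id (a UUID or UUID
    string) to title, then middle-truncate the result to n characters.

    The suffix is at the very end, so it survives truncation as long as it
    fits in the n // 2 - 3 characters that truncate_middle keeps from the end.
    """
    if suffix_len > n // 2 - 3:
        raise ValueError('suffix_len too large: truncation would cut the suffix')
    suffix = uuid.UUID(str(item_id)).hex[:suffix_len]
    return truncate_middle('{0} {1}'.format(title, suffix), n)


# Usage with the example titles and freshly generated (random) UUIDs.
titles = ['Midsomer Murders - Series {0} - Episode {1}'.format(s, e)
          for s in range(1, 4) for e in range(1, 6)]
names = [short_unique_name(t, uuid.uuid4(), 32) for t in titles]
assert all(len(name) <= 32 for name in names)
# A UUID prefix is not guaranteed unique, so check rather than assume.
if len(set(names)) != len(names):
    raise RuntimeError('suffix collision; try a larger suffix_len')
```

With n = 32 the truncation keeps 16 leading and 13 trailing characters, so "Midsomer Murders - Series 1 - Episode 1" with the suffix 000001 becomes "Midsomer Murders...sode 1 000001". The series number is cut, and the UUID prefix alone carries the uniqueness, which is why the final check is worth keeping.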