The “String” type is, no doubt, one of the most used types in every programming language. It’s probably the data type that we humans understand best. Strings are so important that they can considerably alter our perception of how easy or hard a programming language is to learn, based on the use of strings alone. As an example, consider C, the archetypical language and the king of programming languages for decades. C has always been considered one of the most difficult languages to master when it comes to memory management, and this is directly related to how Strings are represented in C: as pointers to characters.

char myString1[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
char * myString2;
char myStringBuffer[1024] = "Goodbye!";

As you can see, there’s no single way of defining or initialising a String in C. What’s even worse, the treatment of a String depends greatly on how it was defined and initialised. myString2, for instance, is really just a pointer, an address that does not yet refer to any allocated memory, whereas myString1 has been defined and initialised as a fixed-size array of 6 characters (including the null termination character).

There is a big chance that, if you are not using custom data structures in C, and you can represent your data with just integers, strings and floats, you won’t need to deal with structs or complex memory allocations. But as soon as you have strings, you do have to manage pointers and deal with memory allocation, null termination characters, etc. This situation is responsible not only for the reputation of C as a “difficult” and “dirty” language, but also for the myriad of bugs, vulnerabilities and exploits that have plagued software for decades (and still do).

On the flip side of the coin, we have languages like JavaScript, where a string is “defined” and initialised as simply as this:

var myString = "Hello world";

Easy as pie, right? The language also shields you from memory allocation, buffer sizes and internal representation details.

But just as important as how you define, initialise and start working with a type are the APIs and libraries that the language natively offers to handle these data structures. If you have ever programmed in C/C++, you know that strings are a complete nightmare to use for even the simplest of operations. You can introduce a bug when just trying to concatenate two strings:

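#include <stdio.h>
#include <string.h>

int main () {

 char string1[6] = "Hello";
 char string2[6] = "World";
 /* 5 + 5 characters plus the terminating '\0' need 11 bytes, so 12 is enough. */
 char resultString[12];

 /* Copy the first string into the result buffer... */
 strcpy(resultString, string1);
 /* ...then append the second one to the result (not to string1, which has no room left). */
 strcat(resultString, string2);

 printf("%s\n", resultString);

 return 0;
}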

You will notice that this is a lot of code for just concatenating two strings. On top of that, you have to be careful with a lot of details. In the general case you need to check that the resultString buffer is large enough to hold string1 plus string2 plus the null terminator, and you would probably use bounded functions such as strncpy and strncat to make sure you never copy more bytes than you intended… The conclusion is that the C APIs for dealing with strings are terrible, and completely developer-unfriendly.

On the other hand, we have JavaScript, which is again the example of how an API becomes transparent for the developer:

var hello = "hello"
var world = "world"
var helloWorld = hello + world

See the difference? You might argue that once you have programmed in C long enough to handle string operations like these quickly, or once you have your own libraries in place, C is just as fast to work with. But at the end of the day, if such easy operations require such a complex sequence of code, more complex operations, like regular expression matching, will most probably take you a lot of time and distract you from writing the code that actually solves the problem at hand.

Note: I’m of course not saying here that Javascript is a better language than C or Swift, or that language X is better than language Y. I’m using Javascript as an example of a language that’s easy to use, and that’s why it’s so popular, and the reason why things like Node.js exist today.

So the API for a data type is as important as the definition of the data type itself. In this post, we’ll have a look at how Swift deals with strings in its current (and past) versions, and at the plans for the Swift 4 release.

Strings in Swift

Swift has not been very kind to strings. Practically every version of Swift has modified the API of the String type in one way or another. Also, the relationship between String and its out-of-fashion cousin, NSString, has varied wildly over the years. Let’s have a look, for example, at something as seemingly easy as counting the characters of a string, and at how different versions of Swift have coped with it:

let string = "Hello world!"

// Swift 1 and up to 1.1:
let swift1Length = countElements(string)

// Swift 1.2:
let swift1_2Length = count(string)

// Swift 2.X:
let swift2Length = string.characters.count

Luckily, Swift 3 maintains the Swift 2.X syntax. Notice how we have had up to three completely different ways of getting the length of a string. Notice also how, before Swift 2, the length was retrieved by means of a global function, while in Swift 2 it became a property reached through the string instance. Besides, before Swift 2, String conformed to the CollectionType protocol, and you could say that a String was just a collection of Characters. In Swift 2, they changed that, so now you basically have to use the “characters” property to get the length of a String.

I don’t know if Swift is your only language or you code in many different programming languages, but this string.characters.count (and the previous countElements and count) gets my vote for the worst way of retrieving the length of a String ever (and that includes C’s strlen()).

To make things even worse, when Apple introduced Swift as an emoji-friendly language, another problem appeared. In Objective-C, NSStrings were just a sequence of UTF-16 code units. However, emojis cannot be represented with a fixed number of bits: depending on the encoding, a single one can take 8, 16, 32, 64 bits or more. Thus, the length of a String was no longer the length of the string if emojis (or other Unicode characters) were involved. A property called utf16Count appeared in strings, only to be removed in Swift 1.2 and eventually replaced by asking for string.utf16.count. This led to great confusion among Swift developers about something as simple as getting the length of a String.

Here is an example of the current situation in Swift 3.

let oneEmoji = "😀"

oneEmoji.characters.count // 1
oneEmoji.utf16.count      // 2
oneEmoji.utf8.count       // 4

So, as you can see, one wonders why it was so difficult for Apple to just add a simple string.length property, like practically every other language out there has.
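To see that the three counts can disagree even without emojis, here is a minimal sketch using the letter “é” written in two different ways. It assumes a Swift 4 or later toolchain for the plain `count` on String (on Swift 3 you would write `.characters.count` instead); the variable names are mine, just for illustration.

```swift
// "é" written two ways: one precomposed scalar, or "e" plus a combining accent.
let precomposed = "\u{E9}"      // U+00E9
let decomposed = "e\u{301}"     // U+0065 followed by U+0301

// Both are one user-visible character, and Swift treats them as equal text.
let sameText = (precomposed == decomposed)                    // true

// Assumption: Swift 4 or later, where `count` counts Characters directly.
let characterCounts = (precomposed.count, decomposed.count)   // (1, 1)

// The number of underlying units depends on the spelling and on the encoding.
let scalarCounts = (precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // (1, 2)
let utf16Counts = (precomposed.utf16.count, decomposed.utf16.count)                    // (1, 2)
let utf8Counts = (precomposed.utf8.count, decomposed.utf8.count)                       // (2, 3)

print(sameText, characterCounts, scalarCounts, utf16Counts, utf8Counts)
```

So “the length” really is four different numbers, which is at least part of the reason a single `length` is harder to define than it looks.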

This API instability has affected most of the String type, and many other basic types, from the very beginning of Swift.

Substrings and Indexes you say?

Closely related to how strings are represented in Swift, and to the difficulty of getting something as simple as the length of a string, is another really weird API that has distressed the Swift developer community: the Index system. It inherits the worst of the NSRange approach (location + length) of Objective-C NSStrings and adds the upsetting strongly typed nature of Swift to the equation.

Because every character in a Swift string can take a variable number of bytes, an index is not just an integer, and you cannot jump freely around a Swift string. You need to advance or move back the index through the characters of the string. Indexes are also tied to the string they came from, so you cannot just get the starting index of a string and add 4 to it, or use the same index on another string. This greatly complicates String manipulation in Swift.
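A minimal sketch of what that means in practice, using only `index(_:offsetBy:)` and `index(_:offsetBy:limitedBy:)`, which String offers from Swift 3 onwards. The constant names are just for illustration.

```swift
let greeting = "Hello World!"

// An index is an opaque String.Index that you obtain from the string itself.
let start = greeting.startIndex

// Moving forward means asking the string to walk four characters ahead.
let fifth = greeting.index(start, offsetBy: 4)
let letter = greeting[fifth]            // the Character "o"

// let broken = start + 4               // does not compile: no integer arithmetic on String.Index

// The limitedBy variant returns nil instead of crashing when the offset runs past the end.
let tooFar = greeting.index(start, offsetBy: 20, limitedBy: greeting.endIndex)

print(letter, tooFar == nil)
```

Every step goes through the string that owns the index, which is exactly the ceremony the rest of this section complains about.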

As an example, let’s see how from “Hello World!” we can extract the second part “World!” and then by modifying the index, just the exclamation mark “!” in Javascript:

var string = "Hello World!"
var index = string.indexOf("World!")
var substr = string.substring(index) // "World!"
index += 5
var substr2 = string.substring(index) // "!"

Now, this might seem like a dumb example, and it is, but a lot of your code is probably composed of simple operations like these.

Let’s see the same code in Swift:

var string = "Hello World!"

// Option 1: ranges
if let range = string.range(of: "World!") {
 let substring = string.substring(with: range) // "World!"
 let anotherLowBound = string.characters.index(range.lowerBound, offsetBy: 5)
 let anotherRange = Range(uncheckedBounds: (anotherLowBound, range.upperBound))
 let substring2 = string.substring(with: anotherRange) // "!"
}

// Option 2: indexes.
if let index = string.range(of: "World!")?.lowerBound {
 let substring = string.substring(from: index) // "World!"
 let anotherIndex = string.characters.index(index, offsetBy: 5)
 let substring2 = string.substring(from: anotherIndex) // "!"
}

Notice how, conceptually, in JavaScript we are just performing simple operations on a string with an integer index. I am certain that an 8- or 9-year-old kid would be able to grasp the concept if you spent some time explaining it to him.

However, try to explain any of the Swift options to this kid… It is just not intuitive. You are managing ranges and entities called String.Index that cannot be modified by themselves, only by going through a “characters” property of the string containing them… If you really like the Swift way of doing it… you need to get out more, like, seriously more.

Indexes and substrings are just two examples of basic functionality that Swift strings should offer in an easy, intuitive way, but fails to do so. Let’s not even get started on matching regular expressions.

What to expect from Swift 4 Strings

So, what’s coming up for Swift 4 in terms of our beloved data type? Well, there are some options being considered: some are good, some are bad, some are OK. They are all detailed in a proposal manifesto document that will probably go through some modifications, but anyway, here’s a short developer-friendly summary.

Restoring Collection conformance and dropping .characters

This will be greatly appreciated, at least by me. If String becomes a collection again, we won’t need that artefact called “characters” that nobody actually cares about except for getting the length of the string. Being able to get it again as, probably, string.count would put it really close to being actually nice (I would prefer length, however, because I write a lot of Node.js code and I tend to write .count in my JavaScript code or vice versa).
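As a rough sketch of what Collection conformance buys you, assuming a toolchain where String is a Collection of Character again (Swift 4 or later), the usual collection operations work directly on the string, with no “characters” detour. The names below are only illustrative.

```swift
let greeting = "Hello world!"

// The length is just `count`, as on any other collection.
let length = greeting.count                       // 12

// Iterating yields Characters directly.
let vowelSet: Set<Character> = ["a", "e", "i", "o", "u"]
let vowels = String(greeting.filter { vowelSet.contains($0) })   // "eoo"

// Generic collection algorithms apply as well.
let backwards = String(greeting.reversed())       // "!dlrow olleH"
let firstLetter = greeting.first                  // Optional("H")

print(length, vowels, backwards)
```

The explicit `String(...)` wrappers are there so the sketch does not depend on what type `filter` and `reversed()` return in a given Swift version.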

Providing a more general, composable slicing syntax

This might sound good, but the proposal itself sounds terrible to me. Quoting the document:

When implementing substring slicing, languages are faced with three options:
1. Make the substrings the same type as string, and share storage.
2. Make the substrings the same type as string, and copy storage when making the substring.
3. Make substrings a different type, with a storage copy on conversion to string.
We think number 3 is the best choice. A walk-through of the tradeoffs follows.

Basically what they are proposing is the introduction of a new data type probably called Substring, that will be the result of the slicing operations on Strings.

I don’t really care what optimisation implications they go through, or how this will be better for preventing leaks and dealing with memory management. This, to me, sounds like a terrible, terrible idea (are these ideas discussed with a normal, everyday developer at all?). The String, Range and String.Index APIs are complex enough already without adding a new Substring that we need to take care of, translate back to String, handle carefully because of Swift’s strongly typed nature, and adapt to as methods return Substrings here and Strings there. No, thanks.
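To picture what option 3 means for everyday code, here is a small sketch. It assumes slicing returns a distinct Substring type, as the proposal describes (this is how Swift 4 and later toolchains behave); `bracketed` is a made-up function standing in for any API that takes a String.

```swift
let greeting = "Hello World!"
let worldStart = greeting.index(greeting.startIndex, offsetBy: 6)

// Slicing yields a Substring that shares storage with `greeting`.
let slice: Substring = greeting[worldStart...]

// Hypothetical function that, like most APIs, expects a String.
func bracketed(_ text: String) -> String {
    return "[" + text + "]"
}

// let rejected = bracketed(slice)        // does not compile: Substring is not String
let accepted = bracketed(String(slice))   // the explicit conversion copies the characters

// The slice keeps using the indices of the original string.
let sharesIndices = (slice.startIndex == worldStart)

print(accepted, sharesIndices)
```

The `String(slice)` call is the extra translation step I am complaining about. In exchange, the slice itself is created without copying.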

Altering Comparable so that parameterized (e.g. case-insensitive) comparison is usable

This, again, is a fantastic proposal in theory. Apple’s implementation of comparisons, again, lacks conceptual clarity, and this is stated in the proposal:

Because the current Comparable protocol expresses all comparisons with binary operators, string comparisons—which may require additional options—do not fit smoothly into the existing syntax.

Fair enough. The proposed fix turns the comparison into a method instead of binary operators, which will allow for optional parameters (like case insensitivity) that now require the use of not-so-intuitive methods.

I like this proposal. However, I’m worried about how they are planning to do that only for strings without altering the Comparable paradigm, and whether strings will then still adhere to “Comparable” at all, or will adhere to “Comparable” and another protocol like “StringComparable”…
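Just to illustrate the idea of a comparison method with options, here is a sketch of my own. The `matches(_:ignoringCase:)` helper is hypothetical: it is not the API in the proposal or in the standard library, and lowercasing both sides is only a naive stand-in for proper case folding.

```swift
extension String {
    /// Hypothetical helper, for illustration only: equality as a method,
    /// with an optional parameter that an operator like == cannot carry.
    func matches(_ other: String, ignoringCase: Bool = false) -> Bool {
        if ignoringCase {
            // Naive approach: lowercase both sides before comparing.
            return self.lowercased() == other.lowercased()
        }
        return self == other
    }
}

let exact = "Swift".matches("swift")                          // false
let relaxed = "Swift".matches("swift", ignoringCase: true)    // true

print(exact, relaxed)
```

With the default argument, the plain call reads almost like the operator, and the option only shows up when you need it.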

Clearly separating language-dependent operations from […] operations […] for machine processing

Under this proposal, they discuss several changes aimed at making the APIs easier for humans to use. I completely agree with one of the discussions, regarding the internationalisation of strings. I think there is a lot of confusion and misuse of this API because it’s somewhat obscure and pushes the complexity onto the developer:

There is strong evidence that developers cannot determine how to use internationalization APIs correctly. Although documentation could and should be improved, the sheer size, complexity, and diversity of these APIs is a major contributor to the problem, causing novices to tune out, and more experienced programmers to make avoidable mistakes.

I would really love to see changes in this direction, but I’m wary, and frankly mistrustful, that the changes are going to go in the right direction.

Relocating APIs that fall outside the domain of basic string processing and discouraging the proliferation of ad-hoc extensions

Well, this is certainly the point. In my case at least, to make String close to usable in my everyday projects, I have a collection of ad-hoc extensions and categories for String that help me cope with basic operations such as slicing, indexing and matching strings while keeping my sanity.
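For the record, this is the kind of ad-hoc extension I mean. It is a sketch, not a recommendation: `substring(fromOffset:)` is a made-up name, and it walks the string from the start on every call, so it is O(n) rather than constant time.

```swift
extension String {
    /// Hypothetical convenience: the text from an integer character offset to the end,
    /// or nil when the offset is negative or past the end of the string.
    func substring(fromOffset offset: Int) -> String? {
        guard offset >= 0,
              let start = index(startIndex, offsetBy: offset, limitedBy: endIndex) else {
            return nil
        }
        return String(self[start..<endIndex])
    }
}

let text = "Hello World!"
let world = text.substring(fromOffset: 6)       // Optional("World!")
let bang = text.substring(fromOffset: 11)       // Optional("!")
let nothing = text.substring(fromOffset: 50)    // nil

print(world ?? "nil", bang ?? "nil", nothing ?? "nil")
```

It brings back the integer offsets of the JavaScript example, at the price of hiding the linear walk.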

I would really love to see this one become a reality.

Conclusion

String in Swift is broken. Utterly. Completely broken. Seriously. I think Apple engineers should start from scratch and come up with something truly intuitive and easy to use. Come on! If Java, JavaScript, Perl, Ruby, etc. can do it, certainly Apple engineers can too.

I do really appreciate that Apple engineers, programming language experts and developers all around the world try to get the most out of the language in terms of efficiency and optimisation, and thoroughly consider all the design options for structures, data types and memory management methodologies when making these proposals. I won’t even dare to compare myself with them in terms of formal knowledge of compilers, memory, leaks, algorithms, optimisation, etc., of course…

… but I am an Apple developer, and let’s not forget that, at the end of the day, programs are written (for the moment) mainly by us humans, and I strongly believe that Swift can greatly benefit from a more stable, easy to use and intuitive API for at least the basic data types and foundation classes. I would gladly exchange 1% of efficiency in my programs just to be able to write string.length now and know that it will still mean the same in 5 years, for every string.

The Swift 4 proposal for strings introduces some welcome changes that I like, but also some that have the potential of becoming new upsetting changes for me. I am not really happy with the evolution of Swift since its 2.0 release, and I’m getting worried that I will grow disenchanted with Apple development if this situation gets worse. I really love developing for iOS, watchOS and tvOS, and I hope I will be able to cope with all these changes in Swift as the years pass… but I cannot be certain.

This post is dedicated to Tibor Bodecs for a short but inspiring Twitter chat that encouraged me to write it.