String Homogeneousness Algorithms
Posted on
Today, I tweeted the following code while I was working on Whiskey:
import Foundation
extension String {
var isHomogeneous: Bool {
var homogeneous = true
var character: NSString?
enumerateSubstringsInRange(Range(start: startIndex, end: endIndex), options: [.ByComposedCharacterSequences]) { substring, _, _, stop in
if let character = character {
if character != substring {
homogeneous = false
stop = true
}
} else {
character = substring
}
}
return homogeneous
}
}
It just checks to see if all of the characters in a string are all the same. Here's the test:
import XCTest
class StringExtensionTests: XCTestCase {
func testHomogeneous() {
XCTAssertTrue("~~~".isHomogeneous)
XCTAssertTrue("aaa".isHomogeneous)
XCTAssertTrue("ππ".isHomogeneous)
XCTAssertTrue("π".isHomogeneous)
XCTAssertTrue("".isHomogeneous)
XCTAssertFalse("as".isHomogeneous)
XCTAssertFalse(" ~~~".isHomogeneous)
}
It seemed like there would be a better solution. Terry Lewis and several others suggessted putting the characters in a set and counting the set. Here's Terry's solution (Kelly Sutton suggested making it more consice):
extension String {
var isHomogeneous: Bool {
return Set(characters).count <= 1
}
}
At first, I was all about this. No Foundation dependency is great. It's also way less code. Ayaka Nonaka suggested comparing their performance. I benchmarked them and they were basically equal. This makes sense. Either way, it has to iterate over all of the characters and then compare them.
The one benefit to my approach is bailing out early. Let's think about the following test:
XCTAssertFalse("πππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππππ ".isHomogeneous)
For 1,000 iterations, my first approach took 0.013 seconds. The set approach took 2.932 second! Crazy. Totally makes sense though. It doesn't bother getting the rest of the characters if there is a mismatch anywhere.
Next, the great Nate Cook suggested doing it all in Swift:
extension String {
var isHomogeneous: Bool {
if let firstChar = characters.first {
for char in dropFirst(characters) where char != firstChar {
return false
}
}
return true
}
}
Fantastic.