2004.10.30
Finding palindromes in texts
by Karel Thönissen
In a thread on Joel on Software I proposed the following algorithm for finding palindromes in texts (small editorial changes aside):

Finding palindromes is very simple using a different view of the matter.
Iterate over all the characters in the input text. Then comes the second and interesting part: assume that this character is the *middle* of the palindrome. So all you have to do is check the characters to its left and right etc. The third step is similar, but for even numbered palindromes: if this character and the next one are equal then inspect the characters to the left and the right of this pair, etc.
Obviously this is O(n^2) but it is simple to program and prove correct. It has good average time performance, approaching O(n), since most input texts have no or few palindromes.
[addition of 2004.11.01]:
Two simplifications are possible in the algorithm I presented. If a palindrome should not be sensitive to punctuation, then drop the punctuation from the input text *before* the searching. This greatly simplifies the most complex part of the algorithm. However, be aware that the result thus obtained should be transformed back to the form where it had its punctuation. That should not be too difficult though.
The second simplification involves adding two different sentinel values at either side of the input text before the searching begins. These values must be unique in the entire resulting text string. Since they are unique, they can never be part of any palindrome. Therefore, they act as a very simple stopping criterion. Of course the outer iteration should now start at low+1 and stop at high-1.
|