Advanced SubString (Unicode Support)

How to take a proper substring of a string that contains Unicode characters

I wrote this article after running into a "broken surrogate pair" issue in Java. After some investigation, I found that the root cause was an improper substring operation that cut a surrogate pair in half.

Unicode Problem

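  • Java stores text as 16-bit char values (UTF-16 code units).
  • A Unicode character (code point) above U+FFFF, such as many emoji, needs two 16-bit units, called a surrogate pair.
  • How can we cut such a string by character index, rather than by char index, without splitting a pair?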
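One possible answer, sketched here as an assumption rather than as the exact fix from my original code, is to translate code point indices into char offsets before calling String.substring. The helper name substringByCodePoint is hypothetical. It relies only on the standard String methods offsetByCodePoints and codePointCount, and it also shows how a naive char-based cut leaves a lone high surrogate behind.

```java
// Sketch: substring by code point index instead of UTF-16 char index.
// The class and method names are hypothetical, chosen for illustration.
class CodePointSubstring {

    // Returns the code points in the range [begin, end) of s.
    static String substringByCodePoint(String s, int begin, int end) {
        if (begin < 0 || begin > end) {
            throw new IllegalArgumentException("invalid range: " + begin + ".." + end);
        }
        // Convert code point offsets into char offsets; a surrogate pair
        // counts as one code point, so it is never split.
        int beginIdx = s.offsetByCodePoints(0, begin);
        int endIdx = s.offsetByCodePoints(beginIdx, end - begin);
        return s.substring(beginIdx, endIdx);
    }

    public static void main(String[] args) {
        // "a", U+1F600 (stored as a surrogate pair), "b"
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                      // 4 (chars)
        System.out.println(s.codePointCount(0, s.length())); // 3 (code points)

        // Naive char-based cut: keeps only the high half of the pair.
        String broken = s.substring(0, 2);
        System.out.println(Character.isHighSurrogate(broken.charAt(1))); // true

        // Code point based cut keeps the pair intact.
        System.out.println(substringByCodePoint(s, 0, 2).equals("a\uD83D\uDE00")); // true
        System.out.println(substringByCodePoint(s, 1, 3).equals("\uD83D\uDE00b")); // true
    }
}
```

offsetByCodePoints walks the string from the given index, so each call is linear in the distance covered. Also keep in mind that a code point is not always what a user perceives as one character: combining marks or emoji sequences span several code points, and cutting those safely needs grapheme-aware iteration such as java.text.BreakIterator.getCharacterInstance().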
