Analysis of information sources in the references of the Wikipedia article "UTF-16" (English-language version).
I first came up with the idea for this Top Ten List over 10 years ago, which was prompted by some environments that still supported only BMP code points. The idea, of course, was to motivate the developers of such environments to support code points beyond the BMP by providing an enumerated list of reasons to do so. And yes, there are still some environments that support only BMP code points, such as the VivaDesigner app.
[…] the file system treats path and file names as an opaque sequence of WCHARs
These functions use UTF-16 (wide character) encoding (…) used for native Unicode encoding on Windows operating systems.
Windows 2000 introduces support for basic input, output, and simple sorting of supplementary characters. However, not all system components are compatible with supplementary characters.
As of Windows version 1903 (May 2019 update), you can use the ActiveCodePage property in the appxmanifest for packaged apps, or the fusion manifest for unpackaged apps, to force a process to use UTF-8 as the process code page. [...] CP_ACP equates to CP_UTF8 only if running on Windows version 1903 (May 2019 update) or above and the ActiveCodePage property described above is set to UTF-8. Otherwise, it honors the legacy system code page. We recommend using CP_UTF8 explicitly.
By operating in UTF-8, you can ensure maximum compatibility [..] Windows operates natively in UTF-16 (or WCHAR), which requires code page conversions by using MultiByteToWideChar and WideCharToMultiByte. This is a unique burden that Windows places on code that targets multiple platforms. [..] The Microsoft Game Development Kit (GDK) and Windows in general are moving forward to support UTF-8 to remove this unique burden of Windows on code targeting or interchanging with multiple platforms and the web. Also, this results in fewer internationalization issues in apps and games and reduces the test matrix that's required to get it right.
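The conversion step that the quote above describes can be illustrated with a small round trip between UTF-8 bytes and 16-bit UTF-16 code units. This is only a sketch in Python using the standard codecs; it does not call MultiByteToWideChar or WideCharToMultiByte, and the helper names `utf8_to_utf16_units` and `utf16_units_to_utf8` are hypothetical.

```python
def utf8_to_utf16_units(data: bytes) -> list:
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units as ints.

    Hypothetical helper: conceptually similar to a UTF-8 -> wide-character
    conversion, but implemented with Python's built-in codecs.
    """
    text = data.decode("utf-8")
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_units_to_utf8(units) -> bytes:
    """Turn a sequence of UTF-16 code units back into UTF-8 bytes."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le").encode("utf-8")


original = "héllo 😀".encode("utf-8")
units = utf8_to_utf16_units(original)
restored = utf16_units_to_utf8(units)
```

Code that keeps its text in UTF-8 throughout does not need either helper, which is the burden the quote refers to.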
InputStreamReader
File name editing in Windows dialogs is broken (delete requires 2 presses on backspace)
Each encoding form maps the Unicode code points U+0000..U+D7FF and U+E000..U+10FFFF
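The two ranges in the quote above exclude the surrogate block U+D800..U+DFFF. A minimal sketch of that range check, assuming Python's built-in codecs as a cross-check (the function names are hypothetical):

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp lies in U+0000..U+D7FF or U+E000..U+10FFFF."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def encodable_in_all_forms(cp: int) -> bool:
    """Cross-check: can Python encode this code point as UTF-8, UTF-16 and UTF-32?"""
    try:
        ch = chr(cp)  # raises ValueError outside 0..0x10FFFF
        for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
            ch.encode(encoding)  # raises UnicodeEncodeError for lone surrogates
        return True
    except (ValueError, UnicodeEncodeError):
        return False


# Boundary values around the surrogate block and the top of the code space.
boundaries = [0x0000, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000]
results = {hex(cp): is_scalar_value(cp) for cp in boundaries}
```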
[...] the term UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard.
UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1 [...]
UTF-16 uses a single 16-bit code unit to encode over 60,000 of the most common characters in Unicode
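Code points above U+FFFF do not fit in one 16-bit code unit and are encoded as a surrogate pair instead. The following sketch shows the arithmetic; `utf16_code_units` is a hypothetical name, and real code would normally rely on a library codec.

```python
def utf16_code_units(cp: int) -> list:
    """Return the UTF-16 code units (as ints) for a single code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # BMP code point: one 16-bit code unit, numerically equal to cp.
        return [cp]
    # Supplementary code point: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 | (offset >> 10)    # high (lead) surrogate, D800..DBFF
    low = 0xDC00 | (offset & 0x3FF)   # low (trail) surrogate, DC00..DFFF
    return [high, low]


euro = utf16_code_units(0x20AC)       # [0x20AC]
emoji = utf16_code_units(0x1F600)     # [0xD83D, 0xDE00]
```

An editor that deletes one code unit at a time would therefore need two deletions to remove a character such as U+1F600.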
UTF-16 encodings are the only encodings that this specification needs to treat as not being ASCII-compatible encodings.
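To see why UTF-16 is not ASCII-compatible, compare the bytes that ASCII text produces in each encoding. This is a small illustrative check; `is_ascii_transparent` is a hypothetical helper, not a term from the specification.

```python
text = "Hi"
utf8_bytes = text.encode("utf-8")         # b'Hi'
utf16le_bytes = text.encode("utf-16-le")  # b'H\x00i\x00'
utf16be_bytes = text.encode("utf-16-be")  # b'\x00H\x00i'


def is_ascii_transparent(encoding: str) -> bool:
    """True if every ASCII character encodes to its own single byte value."""
    return all(chr(b).encode(encoding) == bytes([b]) for b in range(0x80))


utf8_ok = is_ascii_transparent("utf-8")
utf16_ok = is_ascii_transparent("utf-16-le")
```

In UTF-16 every ASCII character takes two bytes, one of which is zero, so byte-oriented ASCII tools cannot read the text as is.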
The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the universal coded character set. Therefore for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the UTF-8 encoding. [..] The problems outlined here go away when exclusively using UTF-8, which is one of the many reasons that UTF-8 is now the mandatory encoding for all text things on the Web.