Proj3000: Only use UTF-8 without BOM
The byte-order mark (BOM) is a particular usage of the special Unicode character code, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text.
[…] Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information. […]
The use of BOM for UTF-8 however, can be problematic:
- Files that hold no text are no longer empty because they always contain the BOM.
- Files that hold text within the ASCII subset of UTF-8 are no longer themselves ASCII because the BOM is not ASCII. This can cause existing, often hard to replace, (legacy) tools to break down.
- It is not possible to concatenate several files together (without changes) because each file now has a BOM at the beginning.
- Multiple standards (including CSS, JSON, YAML) explicitly disallow the use of UTF-8 BOM.
Therefor, this rule will warn when a file is stored with an UTF-8 BOM.