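The file displays normally under DOS (although viewing it with `type` also shows the problem), while on Linux it displays as:
<feff><?xml version="1.0" encoding="UTF-8"?>
The details are shown below: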
[itlife365@linux tempConf]$ file student.xml
student.xml: UTF-8 Unicode text, with CRLF line terminators
[itlife365@linux tempConf]$ file student.xml.bak_38
student.xml.bak_38: UTF-8 Unicode text, with CRLF, LF line terminators
[student.xml@linux96181 tempConf]$ file student.xml.bak_ok_zh_bj_
student.xml.bak_ok_zh_bj_tj: XML 1.0 document text
[itlife365@linux96181 tempConf]$
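The `<feff>` in front of the XML declaration is the byte order mark (U+FEFF) stored at the start of the UTF-8 file. The following is a minimal Python sketch of how that character survives decoding and how it can be dropped. The sample bytes and the helper name `strip_utf8_bom` are illustrative assumptions, not taken from the files above.

```python
import codecs

# Hypothetical sample: a UTF-8 BOM followed by an XML declaration with CRLF.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?>\r\n<students/>\r\n'

# The plain "utf-8" codec keeps the BOM as the character U+FEFF ...
as_utf8 = raw.decode("utf-8")
# ... while "utf-8-sig" drops a leading BOM if one is present.
as_utf8_sig = raw.decode("utf-8-sig")


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(repr(as_utf8[0]))
print(as_utf8_sig[:5])
```

Working on bytes, as `strip_utf8_bom` does, leaves the CRLF line terminators untouched, which may matter if only the BOM is meant to change.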
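If the actual encoding, the external encoding, and the internal encoding (BOM or XML declaration) of an XML document disagree, the document is unreadable. The one exception is when the external encoding is Unicode (for example, a Java String, which uses UTF-16): any internal encoding is then ignored. A common problem arises when a process that is not XML-aware transcodes the document (that is, changes its actual encoding) or modifies the document without honoring the internal encoding. Some string handling in the Java language, CLI, and embedded SQL applications can transcode a document without changing its internal encoding.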
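Both cases can be sketched in Python with the standard library's `xml.etree.ElementTree`. The sample document is made up for illustration, and the behavior shown is that of this particular parser; other parsers may reject the input instead of ignoring or trusting the declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the declaration claims ISO-8859-1.
# "\u00e9" is the character e-acute.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>\u00e9</name>'

# External encoding is Unicode (a Python str): the declaration is ignored
# and the character comes through intact.
from_str = ET.fromstring(text)

# Actual encoding is UTF-8 but the internal encoding says ISO-8859-1:
# the parser trusts the declaration and reads the two UTF-8 bytes of
# e-acute (C3 A9) as two separate Latin-1 characters.
from_bytes = ET.fromstring(text.encode("utf-8"))

print(repr(from_str.text))
print(repr(from_bytes.text))
```

The second parse does not raise an error; it silently produces wrong text, which is why a mismatch of this kind can go unnoticed until the data is displayed.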
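Why Unicode encoding exists
Every legacy encoding is limited, because it can represent text in only a few languages. Managing multiple encodings is a headache, not only because most applications and databases can handle just one encoding, but for many other reasons as well. Unicode was invented to solve this problem. It is a character set intended to represent every character of every language in use, with room left to grow.
Table 1. Byte Order Marks of the Unicode encodings
BOM type | BOM value | Encoding |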
---|---|---|
UTF-8 | X'EFBBBF' | UTF-8 |
UTF-16 Big Endian | X'FEFF' | UTF-16 |
UTF-16 Little Endian | X'FFFE' | UTF-16 |
UTF-32 Big Endian | X'0000FEFF' | UTF-32 |
UTF-32 Little Endian | X'FFFE0000' | UTF-32 |
The table above is taken from
http://www.ibm.com/developerworks/cn/education/data/db2-cert7333/section4.html
DB2 9 Application Development (Exam 733) certification guide, Part 3: XML data manipulation, section "XML encoding"
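As a rough illustration of how Table 1 can be applied, the sketch below matches the start of a byte string against the BOM constants in Python's `codecs` module. The function name `detect_bom` and the list `BOMS` are assumptions made for this example.

```python
import codecs

# Check the four-byte marks first: the UTF-32 Little Endian mark (FF FE 00 00)
# begins with the UTF-16 Little Endian mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 Big Endian"),
    (codecs.BOM_UTF32_LE, "UTF-32 Little Endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian"),
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian"),
]


def detect_bom(data: bytes):
    """Return the BOM type from Table 1 that data starts with, or None."""
    for mark, name in BOMS:
        if data.startswith(mark):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbf<?xml"))
print(detect_bom(b"<?xml"))
```

One ambiguity remains: UTF-16 Little Endian text whose first character after the BOM is U+0000 looks the same as a UTF-32 Little Endian BOM, so the result is a strong hint rather than proof.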
The following is taken from: http://social.msdn.microsoft.com/Forums/zh-CN/Vsexpressvb/thread/6fbe8086-7950-43f4-a703-19491cb1d9f6
This is all about the differences in the encoding with different files, as discussed in the earlier thread. I've looked into it a little more and believe the following is correct.
If there is no BOM then the reader assumes it is UTF-8.
Your original file doesn't have a BOM but it uses UTF-16.
Depending upon how you edit the file it may get saved as UTF-8 (VS) or UTF-16 (Notepad).
It's all a bit of a mess really.
Imports System.IO
Imports System.Text

' Guesses a file's encoding from its first two bytes: a BOM check first,
' then a zero-byte heuristic for UTF-16 text saved without a BOM.
Private Function GetEncoding(ByVal Filename As String) As Encoding
    Dim Header(1) As Byte
    Dim BytesRead As Integer
    ' Using closes the stream even if Read throws.
    Using FS As New FileStream(Filename, FileMode.Open, FileAccess.Read)
        BytesRead = FS.Read(Header, 0, 2)
    End Using
    ' Fewer than two bytes: nothing to inspect, so fall back to UTF-8.
    If BytesRead < 2 Then Return Encoding.UTF8

    ' "X2" pads each byte to two hex digits (Hex() would drop a leading zero).
    Dim HeaderString As String = Header(0).ToString("X2") & Header(1).ToString("X2")
    Select Case HeaderString
        Case "FFFE"
            ' Unicode UTF-16, little endian
            Return Encoding.Unicode
        Case "FEFF"
            ' Unicode UTF-16, big endian
            Return Encoding.BigEndianUnicode
        Case "EFBB"
            ' probably UTF-8 (only the first two of the three BOM bytes are checked)
            Return Encoding.UTF8
        Case Else ' No BOM (maybe)
            If Header(0) = 0 Then
                ' probably big endian
                Return Encoding.BigEndianUnicode
            ElseIf Header(1) = 0 Then
                ' probably little endian
                Return Encoding.Unicode
            Else
                ' probably UTF-8 but I wouldn't put money on it
                Return Encoding.UTF8
            End If
    End Select
End Function