ars);
        System.out.println("Point 1 : " + str);
        System.out.println(" UTF-8 - ISO-8859-1 " + new String(str.getBytes("UTF-8"), "ISO-8859-1"));
        System.out.println(" ISO-8859-1 - UTF-8 " + new String(str.getBytes("ISO-8859-1"), "UTF-8"));
        System.out.println();

        chars = new char[]{'\uE840'};
        str = new String(chars);
        System.out.println("Point 2 : " + str);
        // Just a sample; you can use this method to verify more characters.
        System.out.println(" No less than 7F " + getHexString(str));

        chars = new char[]{'\u2260'};
        str = new String(chars);
        // Just a sample; you can use this method to verify more characters.
        System.out.println("Point 3 : " + str);
        System.out.println(" Range of 1st Byte " + getHexString(str));
    }

    public static String getHexString(String num) throws Exception {
        StringBuilder sb = new StringBuilder();
        // You must specify UTF-8 here; otherwise the default encoding is used,
        // which depends on your environment.
        byte[] bytes = num.getBytes("UTF-8");
        for (int i = 0; i < bytes.length; i++) {
            // Mask with 0xFF so that negative byte values print as 80..FF.
            sb.append(Integer.toHexString(bytes[i] & 0xFF).toUpperCase() + " ");
        }
        return sb.toString();
    }
}
---------------------------------------------------------------------------------
Principle of representing a Unicode character in UTF-8:

U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

How to use the principle above?

Sample: The Unicode character U+00A9 = 1010 1001 (copyright sign) is encoded in UTF-8 as
11000010 10101001 = 0xC2 0xA9

Explanation:
A: 1010  9: 1001
Principle 2 applies, because 00000080 <= 00A9 <= 000007FF. Its pattern is 110xxxxx 10xxxxxx. Fill in the x's from low to high:
1. There are 6 x's in the low byte, so we cut the last 6 bits from 10101001 (A9), which gives 101001.
2. There are 5 x's in the high byte, so we take the remaining 2 bits of A9, which are 10, and pad them on the left to 5 bits with three 0s, which gives 00010.
Complete the low byte with the prefix 10: (10) combined with (101001) -> 10101001
Complete the high byte with the prefix 110: (110) combined with (00010) -> 11000010
The result is 11000010 10101001 = 0xC2 0xA9

You can verify the following character the same way, using principle 3:
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
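The character U+2260 = 0010 0010 0110 0000 (not equal to) is encoded as:
11100010 10001001 10100000 = 0xE2 0x89 0xA0
Here the 16 bits split, from high to low, into 0010, 001001 and 100000, which fill the three groups of x's after the prefixes 1110, 10 and 10.

Reference: http://www.cl.cam.ac.uk/~mgk25/unicode.html#unicode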
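The bit-cutting steps used for U+00A9 and U+2260 can also be written as a small program. The following is only a sketch, not part of the original sample: the class name Utf8Sketch and the methods encode and toHex are hypothetical names chosen for illustration. It covers the first four rows of the table only, on the assumption that code points stop at U+10FFFF as they do in Java, and it does not treat surrogate values (U+D800 to U+DFFF) specially. Each result is compared with what String.getBytes produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {

    // Hypothetical helper: encode one code point by hand, following the
    // bit patterns in the first four rows of the table.
    public static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a valid code point: " + cp);
        }
        if (cp <= 0x7F) {
            // 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp <= 0x7FF) {
            // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))
            };
        } else if (cp <= 0xFFFF) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))
            };
        } else {
            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
    }

    // Hypothetical helper: format bytes as two-digit upper-case hex,
    // separated by single spaces.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = { 0x00A9, 0x2260 };
        for (int cp : samples) {
            byte[] manual = encode(cp);
            // Let the JDK encode the same character for comparison.
            byte[] library = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            System.out.println(String.format("U+%04X", cp) + " -> " + toHex(manual)
                    + " (same as getBytes: " + Arrays.equals(manual, library) + ")");
        }
    }
}
```

For the two worked examples, encode yields C2 A9 for U+00A9 and E2 89 A0 for U+2260, the same bytes derived by hand above. Passing StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException, which is purely a convenience choice in this sketch.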