
Explanation of UTF-8 and Unicode

Author: Min Tao    Source: Min Tao's study notes    Updated: 2009/4/25 0:44:56

        String str = new String(chars);
        System.out.println("Point 1 : " + str);
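        // Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1
        System.out.println("   UTF-8 - ISO-8859-1 "
                + new String(str.getBytes("UTF-8"), "ISO-8859-1"));
        // Encode as ISO-8859-1, then decode the bytes as UTF-8
        System.out.println("   ISO-8859-1 - UTF-8 "
                + new String(str.getBytes("ISO-8859-1"), "UTF-8"));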
        System.out.println();

        chars = new char[]{'\uE840'};
        str = new String(chars);
        System.out.println("Point 2 : " + str);
        // U+E840 is above 0x7F, so every byte of its UTF-8 form is at least 0x80.
        // getHexString works for any other character you want to check.
        System.out.println("   All bytes >= 80      " + getHexString(str));

        chars = new char[]{'\u2260'};
        str = new String(chars);
        System.out.println("Point 3 : " + str);
        // U+2260 needs 3 bytes; the first byte (1110xxxx) lies in the range E0-EF
        System.out.println("   Range of 1st Byte      " + getHexString(str));
    }

    public static String getHexString(String s) throws Exception {
        StringBuilder sb = new StringBuilder();
        // You must specify UTF-8 here; otherwise getBytes() uses the default
        // encoding, which depends on your environment.
        byte[] bytes = s.getBytes("UTF-8");
        for (byte b : bytes) {
            // b & 0xFF turns the signed byte (-128..127) into 0..255;
            // %02X pads to two uppercase hex digits (e.g. 0A, not A).
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString();
    }
}
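The two Point 1 lines do different amounts of damage. Below is a minimal sketch, assuming Java 7+ for `StandardCharsets` and a hypothetical class name `MojibakeDemo`. UTF-8 bytes misread as ISO-8859-1 can be repaired, because ISO-8859-1 maps every byte 0x00-0xFF to exactly one character. Going the other way is lossy: encoding a character outside Latin-1 as ISO-8859-1 replaces it with `?`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: contrasts the two directions used in Point 1.
public class MojibakeDemo {

    // UTF-8 bytes decoded as ISO-8859-1: one char per byte, so nothing is lost yet.
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo garble(): recover the original bytes, then decode them correctly.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Characters outside Latin-1 cannot be encoded; getBytes substitutes '?' (0x3F).
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\u2260"; // not-equal sign
        String g = garble(s);
        System.out.println(g.length());                    // prints 3 (one char per UTF-8 byte)
        System.out.println(repair(g).equals(s));           // prints true
        System.out.println(Arrays.toString(toLatin1(s)));  // prints [63], i.e. '?'
    }
}
```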
---------------------------------------------------------------------------------
Principle of representing a Unicode character in UTF-8:

U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
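U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Note: since RFC 3629 (2003), UTF-8 is restricted to U+0000 through U+10FFFF, so valid UTF-8 uses only the first four rows (at most 4 bytes per character); the 5- and 6-byte forms are obsolete.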
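Each row of the table can be turned into shifts and masks. Below is a minimal Java sketch, assuming the input is a single valid code point. It implements only the first four rows because RFC 3629 caps UTF-8 at U+10FFFF. The class name `Utf8Encoder` is hypothetical, and the results are checked against the JDK's own encoder (`String.getBytes(StandardCharsets.UTF_8)`, Java 7+).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical encoder that applies the table row by row (1- to 4-byte forms only).
// Assumption: cp is a Unicode scalar value; surrogates (U+D800-U+DFFF) are not rejected here.
public class Utf8Encoder {

    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("out of range: " + Integer.toHexString(cp));
        }
        if (cp <= 0x7F) {                        // 0xxxxxxx
            return new byte[] {(byte) cp};
        } else if (cp <= 0x7FF) {                // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp <= 0xFFFF) {               // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {                                 // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xA9, 0x2260, 0xE840, 0x1F600};
        for (int cp : samples) {
            // Compare the hand-built bytes with the JDK's UTF-8 encoder.
            byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X %b%n", cp, Arrays.equals(encode(cp), expected));
        }
    }
}
```

Every line printed should end in `true`. The `cp <= ...` comparisons follow the table's upper bounds, so the same checks also tell you which row (and how many bytes) a character needs.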

How do you apply the principle above?

Example:
The Unicode character U+00A9 (copyright sign, ©) = 1010 1001 in binary is encoded in UTF-8 as

11000010 10101001 = 0xC2 0xA9

Explanation:

A9 in binary: A = 1010, 9 = 1001, so A9 = 1010 1001.

Principle 2 applies because 00000080 <= 00A9 <= 000007FF, so U+00A9 takes two bytes: 110xxxxx 10xxxxxx.

Fill in the x positions from the low-order bits upward:

1. The low byte (10xxxxxx) has 6 x positions, so take the last 6 bits of 10101001 (A9): 101001.

2. The high byte (110xxxxx) has 5 x positions. The remaining 2 bits of A9 are 10; pad them on the left with three 0s to get 5 bits: 00010.

3. Complete the low byte with its 10 prefix: (10) + (101001) -> 10101001

4. Complete the high byte with its 110 prefix: (110) + (00010) -> 11000010

The result is

11000010 10101001 = 0xC2 0xA9
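The same steps can be written with shifts and masks instead of cutting bit strings by hand. Below is a small Java sketch; the class `CopyrightSignDemo` and its helper names are hypothetical. `cp & 0x3F` takes the last 6 bits, `cp >> 6` takes the remaining high bits, and OR-ing with 0x80 and 0xC0 adds the 10 and 110 prefixes.

```java
// Hypothetical demo: the two-byte steps for U+00A9 using bit operations.
public class CopyrightSignDemo {

    static int[] encodeTwoBytes(int cp) {
        int low6 = cp & 0x3F;      // last 6 bits      -> 101001
        int high5 = cp >> 6;       // remaining bits   -> 00010
        int low = 0x80 | low6;     // prefix 10        -> 10101001
        int high = 0xC0 | high5;   // prefix 110       -> 11000010
        return new int[] {high, low};
    }

    // Binary form left-padded with zeros to 8 digits.
    static String bits(int b) {
        return String.format("%8s", Integer.toBinaryString(b)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int[] r = encodeTwoBytes(0xA9);
        // prints 11000010 10101001 = 0xC2 0xA9
        System.out.println(bits(r[0]) + " " + bits(r[1])
                + " = 0x" + Integer.toHexString(r[0]).toUpperCase()
                + " 0x" + Integer.toHexString(r[1]).toUpperCase());
    }
}
```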

You can also verify the following character with principle 3, in the same way as above:

U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx

The character U+2260 (not equal to, ≠) = 0010 0010 0110 0000 is encoded as:

11100010 10001001 10100000 = 0xE2 0x89 0xA0
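The three-byte case follows the same pattern with one more 6-bit group: the top 4 bits go after 1110, the middle 6 bits and the low 6 bits each go after 10. Below is a hedged Java sketch, with the hypothetical class `NotEqualSignDemo` and Java 7+ assumed for `StandardCharsets`. It builds the bytes for U+2260 by hand and prints them next to the JDK's own UTF-8 output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: principle 3 (three bytes) applied to U+2260.
public class NotEqualSignDemo {

    static int[] encodeThreeBytes(int cp) {
        int b1 = 0xE0 | (cp >> 12);          // 1110 + top 4 bits    (0010)
        int b2 = 0x80 | ((cp >> 6) & 0x3F);  // 10 + middle 6 bits   (001001)
        int b3 = 0x80 | (cp & 0x3F);         // 10 + low 6 bits      (100000)
        return new int[] {b1, b2, b3};
    }

    public static void main(String[] args) {
        int[] manual = encodeThreeBytes(0x2260);
        byte[] jdk = "\u2260".getBytes(StandardCharsets.UTF_8);
        // prints "0xE2 0xE2", "0x89 0x89", "0xA0 0xA0" on three lines
        for (int i = 0; i < 3; i++) {
            System.out.printf("0x%02X 0x%02X%n", manual[i], jdk[i] & 0xFF);
        }
    }
}
```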

Reference:

http://www.cl.cam.ac.uk/~mgk25/unicode.html#unicode
