使用UTF8字符集存储中文生僻字
发布日期:2021-05-09 05:08:47 浏览次数:11 分类:博客文章

本文共 4871 字,大约阅读时间需要 16 分钟。

使用UTF8字符集存储中文生僻字

一、相关学习BLOG

https://www.cnblogs.com/jyzhao/p/8654412.htmlhttp://blog.itpub.net/781883/viewspace-1411259/https://www.qqxiuzi.cn/bianma/zifuji.phphttps://blog.csdn.net/iteye_7853/article/details/82516888

 

二、需求详情:

客户提出,关于氮卓斯汀变更为氮䓬斯汀,系统出现乱码问题
产生问题的原因为:oracle数据库字符集为:ZHS16GBK,对于部分生僻字是无法正常保存的。

三、客户提出的解决方案:

1. 修改数据库字符集为:UTF-8。此方法需对oracle字符集进行修改,但修改后,可能会将原有数据全部变成乱码。
2. 程序改造:将所有会涉及到生僻字的字段(例如产品名称、通用名等),存入数据库时,转码为16进制存,然后读取时再进行解码后展示到页面。此方法涉及修改代码庞大,且数据库内容可读性很差,手动刷数据、导出数据难度也很大。

四、解决思路:

1)直接修改数据库字符集,除非是子集修改为超集,否则不建议修改,从上述链接blog可以发现强行将db字符集从gbk修改为utf8后,plsql登录提示存在字符不匹配现象;
2)应用程序修改,代码量大,且可读写性太差;
3)建议将生僻字业务表,迁移至utf8 db库中存储(与开发人员沟通,实际存储生僻字的表只有20余个,可以单独对这些表进行迁移,业务修改查询表的代码(通过db_link),或者直接连接新的db,再或者通过创建db_link+同义词指向迁移后的远程表进行查询不修改应用代码(应用不修改,无感知);

五、实验测试

1.测试环境导出业务表
2.导入到UTF8环境下,进行读写测试

5.1源环境导出

修改字符集报错SQL> alter database character set al32utf8;alter database character set al32utf8*第 1 行出现错误:ORA-12712: 新字符集必须为旧字符集的超集SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%'PARAMETER VALUE------------------------------ ------------------------------NLS_CHARACTERSET ZHS16GBKNLS_NCHAR_CHARACTERSET AL16UTF16SQL> conn scott/tigerSQL> create table test(id int,c_name varchar2(200));表已创建。SQL> insert into test values(1,'板蓝根');SQL> insert into test values(2,'氮䓬斯汀');SQL> commit;SQL> insert into test values(3,'氮卓斯汀');SQL> commit;SQL> select * from testID C_NAME---------- --------------------1 板蓝根2 氮?斯汀3 氮卓斯汀C:\Users\Thinkpad>exp scott/tiger FILE=C:\Users\Thinkpad\Desktop\temp\hr_test.dmp TABLES=testExport: Release 11.2.0.4.0 - Production on 星期三 6月 26 13:20:58 2019Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing options已导出 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集即将导出指定的表通过常规路径.... . 正在导出表 TEST导出了 3 行成功终止导出, 没有出现警告。

 

 

5.2目标环境导入

 

SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%';PARAMETER VALUE------------------------------ ------------------------------NLS_CHARACTERSET AL32UTF8NLS_NCHAR_CHARACTERSET AL16UTF16$env|grep LANGNLS_LANG=american_america.ZHS16GBKLANG=en_US.UTF-8enmo:/home/oracleimp scott/tiger file=/home/oracle/hr_test.dmp full=yImport: Release 11.2.0.4.0 - Production on Wed Jun 26 01:27:22 2019Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing optionsExport file created by EXPORT:V11.02.00 via conventional pathimport done in ZHS16GBK character set and AL16UTF16 NCHAR character setimport server uses AL32UTF8 character set (possible charset conversion). importing SCOTT's objects into SCOTT. importing SCOTT's objects into SCOTT. . importing table "TEST" 3 rows importedImport terminated successfully without warnings.SQL> select * from test;ID C_NAME---------- ------------------------------1 2 3 ˹͡以上Oracle进行字符转换后,中文字符直接配置为Null修改语言格式,让Oracle无需进行字符转换export NLS_LANG=american_america.AL32UTF8enmo:/home/oracleimp scott/tiger file=/home/oracle/hr_test.dmp full=yExport file created by EXPORT:V11.02.00 via conventional pathimport done in AL32UTF8 character set and AL16UTF16 NCHAR character setexport client uses ZHS16GBK character set (possible charset conversion). importing SCOTT's objects into SCOTT. importing SCOTT's objects into SCOTT. . importing table "TEST" 3 rows importedImport terminated successfully without warnings.enmo:/home/oraclesqlplus / as sysdbaSQL*Plus: Release 11.2.0.4.0 Production on Wed Jun 26 02:40:32 2019Copyright (c) 1982, 2013, Oracle. All rights reserved.Connected to:Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing options

 

数据验证

 

SQL> conn scott/tigerConnected.SQL> select * from test;ID C_NAME------------------------------------1 板蓝根2 氮?斯汀3 氮卓斯汀本次数据是有了,UTF8字符集SQL> select dump('氮卓斯汀') from dual;DUMP('氮卓斯汀')--------------------------------------------------------------Typ=96 Len=12: 230,176,174,229,141,147,230,150,175,230,177,128GBK字符集SQL> select dump('氮卓斯汀') from dual;DUMP('氮卓斯汀')---------------------------------------------Typ=96 Len=8: 181,170,215,191,203,185,205,161SQL> desc scott.test名称 是否为空? 类型----------------------------------------- -------- ----------------------------ID NUMBER(38)C_NAME VARCHAR2(200)对于两套环境test表字段进行收缩,可以发现UTF8字符集表,实际存储是使用三个字节存储一个汉字UTF8SQL> alter table scott.test modify c_name varchar2(8);alter table scott.test modify c_name varchar2(8)*ERROR at line 1:ORA-01441: cannot decrease column length because some value is too big SQL> alter table scott.test modify c_name varchar2(12);Table altered.GBKGBK存储中文两个字节存储一个汉字SQL> alter table scott.test modify c_name varchar2(8);表已更改。

 

上一篇:监听lsnrctl status查询状态报错linux error 111:connection refused
下一篇:创建新表,自动授权trigger

发表评论

最新留言

第一次来,支持一个
[***.219.124.196]2025年04月13日 11时40分59秒