Writing UTF-16 to a file in binary mode

Author: 黄晓敏3023 | 2023-09-03 23:14

I'm trying to write a wstring to a file with an ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried:

ofstream outFile("test.txt", std::ios::out | std::ios::binary);
wstring hello = L"hello";
outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t));
outFile.close();

Opening test.txt in, for example, Firefox with the encoding set to UTF16, it shows up as:

h�e�l�l�o�

Could anyone tell me why this happens?

EDIT:

Opening the file in a hex editor I get:

FF FE 68 00 00 00 65 00 00 00 6C 00 00 00 6C 00 00 00 6F 00 00 00

It looks like I'm getting two extra bytes between every character for some reason?

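1> Martin York:

Here we run into the little-used locale properties. If you output your string as a string (rather than as raw data), you can get the locale to do the appropriate conversion automatically.

N.B. This code does not take into account the endianness of the wchar_t character.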

#include <locale>
#include <fstream>
#include <iostream>
// See Below for the facet
#include "UTF16Facet.h"

int main(int argc,char* argv[])
{
   // construct a custom unicode facet and add it to a locale.
   UTF16Facet *unicodeFacet = new UTF16Facet();
   const std::locale unicodeLocale(std::cout.getloc(), unicodeFacet);

   // Create a stream and imbue it with the facet
   std::wofstream   saveFile;
   saveFile.imbue(unicodeLocale);


   // Now the stream is imbued we can open it.
   // NB If you open the file stream first, any attempt to imbue it with a locale will silently fail.
   saveFile.open("output.uni");
   saveFile << L"This is my Data\n";


   return(0);
}

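File: UTF16Facet.h

#include <locale>
#include <string>

class UTF16Facet: public std::codecvt<wchar_t,char,std::char_traits<wchar_t>::state_type>
{
   typedef std::codecvt<wchar_t,char,std::char_traits<wchar_t>::state_type> MyType;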
   typedef MyType::state_type          state_type;
   typedef MyType::result              result;


   /* This function deals with converting data from the input stream into the internal stream.*/
   /*
    * from, from_end:  Points to the beginning and end of the input that we are converting 'from'.
    * to,   to_limit:  Points to where we are writing the conversion 'to'
    * from_next:       When the function exits this should have been updated to point at the next location
    *                  to read from. (ie the first unconverted input character)
    * to_next:         When the function exits this should have been updated to point at the next location
    *                  to write to.
    *
    * status:          This indicates the status of the conversion.
    *                  possible values are:
    *                  error:      An error occurred the bad file bit will be set.
    *                  ok:         Everything went to plan
    *                  partial:    Not enough input data was supplied to complete any conversion.
    *                  nonconv:    no conversion was done.
    */
   virtual result  do_in(state_type &s,
                           const char  *from,const char *from_end,const char* &from_next,
                           wchar_t     *to,  wchar_t    *to_limit,wchar_t*    &to_next) const
   {
       // Loop over both the input and output arrays.
       // Each step consumes two input bytes, so stop when fewer than two remain.
       for(;(from_end - from >= 2) && (to < to_limit);from += 2,++to)
       {
           /*Input the Data*/
           /* As the input 16 bits may not fill the wchar_t object
            * Initialise it so that zero out all its bit's. This
            * is important on systems with 32bit wchar_t objects.
            */
           (*to)                               = L'\0';

           /* Next read the data from the input stream into
            * wchar_t object. Remember that we need to copy
            * into the bottom 16 bits no matter what size the
            * the wchar_t object is.
            */
           reinterpret_cast<char*>(to)[0]  = from[0];
           reinterpret_cast<char*>(to)[1]  = from[1];
       }
       from_next   = from;
       to_next     = to;

       // Report partial if any input bytes were left unconverted.
       return((from < from_end)?partial:ok);
   }



   /* This function deals with converting data from the internal stream to a C/C++ file stream.*/
   /*
    * from, from_end:  Points to the beginning and end of the input that we are converting 'from'.
    * to,   to_limit:  Points to where we are writing the conversion 'to'
    * from_next:       When the function exits this should have been updated to point at the next location
    *                  to read from. (ie the first unconverted input character)
    * to_next:         When the function exits this should have been updated to point at the next location
    *                  to write to.
    *
    * status:          This indicates the status of the conversion.
    *                  possible values are:
    *                  error:      An error occurred the bad file bit will be set.
    *                  ok:         Everything went to plan
    *                  partial:    Not enough input data was supplied to complete any conversion.
    *                  nonconv:    no conversion was done.
    */
   virtual result do_out(state_type &state,
                           const wchar_t *from, const wchar_t *from_end, const wchar_t* &from_next,
                           char          *to,   char          *to_limit, char*          &to_next) const
   {
       // Each step writes two output bytes, so stop when fewer than two fit.
       for(;(from < from_end) && (to_limit - to >= 2);++from,to += 2)
       {
           /* Output the Data */
           /* NB I am assuming the characters are encoded as UTF-16.
            * This means they are 16 bits inside a wchar_t object.
            * As the size of wchar_t varies between platforms I need
            * to take this into consideration and only take the bottom
            * 16 bits of each wchar_t object.
            */
           to[0]     = reinterpret_cast<const char*>(from)[0];
           to[1]     = reinterpret_cast<const char*>(from)[1];

       }
       from_next   = from;
       to_next     = to;

       // Report partial if the output buffer filled up before all input was written.
       return((from < from_end)?partial:ok);
   }
};

2> Jon Skeet:

I suspect that sizeof(wchar_t) is 4 in your environment, i.e. it's writing out UTF-32/UCS-4 instead of UTF-16. That's certainly what the hex dump looks like.

That's easy enough to test (just print out sizeof(wchar_t)), but I'm pretty sure that's what's going on.

To go from a UTF-32 wstring to UTF-16 you'll need to apply a proper encoding, as surrogate pairs come into play.

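3> 小智:

It's easy if you use the C++11 standard (because there are a lot of additional includes, such as "utf8", that solve this problem for good).

But if you want multi-platform code that works with older standards, you can use this method to write with streams:

Read the article about the UTF converter for streams

Add stxutif.h from the sources above to your project

Open the file in ANSI mode and add the UTF-8 BOM to the start of the file, like this: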

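std::ofstream fs;
fs.open(filepath, std::ios::out|std::ios::binary);

// UTF-8 byte order mark
const unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };

// write() emits exactly three bytes; operator<< would treat the array
// as a null-terminated string and read past its end.
fs.write(reinterpret_cast<const char*>(smarker), sizeof(smarker));
fs.close();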

Then open the file as UTF and write your content there:

std::wofstream fs;
fs.open(filepath, std::ios::out|std::ios::app);

std::locale utf8_locale(std::locale(), new utf8cvt);
fs.imbue(utf8_locale); 

fs << .. // Write anything you want...
