I'm trying to write a wstring to a file with an ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried:
    ofstream outFile("test.txt", std::ios::out | std::ios::binary);
    wstring hello = L"hello";
    outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t));
    outFile.close();
Opening test.txt in, for example, Firefox with the encoding set to UTF16, it shows up as:
h�e�l�l�o�
Could anyone tell me why this happens?
EDIT:
Opening the file in a hex editor I get:
FF FE 68 00 00 00 65 00 00 00 6C 00 00 00 6C 00 00 00 6F 00 00 00
It looks like I get two extra bytes between every character for some reason?
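Here we run into the little-used locale properties. If you output your string as a string (rather than as raw data), you can get the locale to do the appropriate conversion automatically.
N.B. This code does not take into account the endianness of the wchar_t characters.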
    #include <locale>
    #include <fstream>
    #include <iostream>
    // See below for the facet
    #include "UTF16Facet.h"

    int main(int argc, char* argv[])
    {
        // Construct a custom Unicode facet and add it to a locale.
        // The locale takes ownership of the facet.
        UTF16Facet *unicodeFacet = new UTF16Facet();
        const std::locale unicodeLocale(std::cout.getloc(), unicodeFacet);

        // Create a stream and imbue it with the facet.
        std::wofstream saveFile;
        saveFile.imbue(unicodeLocale);

        // Now that the stream is imbued we can open it.
        // NB: if you open the file stream first, any attempt to imbue it
        // with a locale will silently fail.
        saveFile.open("output.uni");
        saveFile << L"This is my Data\n";

        return 0;
    }
File: UTF16Facet.h
    #include <locale>
    #include <string>

    class UTF16Facet : public std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type>
    {
        typedef std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type> MyType;
        typedef MyType::state_type  state_type;
        typedef MyType::result      result;

        /* This function deals with converting data from the input stream into the internal stream.
         *
         * from, from_end: Point to the beginning and end of the input that we are converting 'from'.
         * to, to_limit:   Point to where we are writing the conversion 'to'.
         * from_next:      When the function exits this should have been updated to point at the next
         *                 location to read from (i.e. the first unconverted input character).
         * to_next:        When the function exits this should have been updated to point at the next
         *                 location to write to.
         *
         * status:         This indicates the status of the conversion. Possible values are:
         *      error:     An error occurred; the bad file bit will be set.
         *      ok:        Everything went to plan.
         *      partial:   Not enough input data (or output space) was supplied to complete the conversion.
         *      noconv:    No conversion was done.
         */
        virtual result do_in(state_type &s,
                             const char *from, const char *from_end, const char* &from_next,
                             wchar_t *to, wchar_t *to_limit, wchar_t* &to_next) const
        {
            // Loop while a whole 2-byte unit is available and there is room in the output.
            for (; (from_end - from >= 2) && (to < to_limit); from += 2, ++to)
            {
                /* Input the data.
                 * As the 16 input bits may not fill the wchar_t object, initialise it
                 * so that all its bits are zero. This is important on systems with
                 * 32-bit wchar_t objects.
                 */
                (*to) = L'\0';

                /* Next read the data from the input stream into the wchar_t object.
                 * We copy into the first two bytes in memory, which are the bottom
                 * 16 bits only on a little-endian host (see the endianness note).
                 */
                reinterpret_cast<char*>(to)[0] = from[0];
                reinterpret_cast<char*>(to)[1] = from[1];
            }
            from_next = from;
            to_next   = to;

            // Anything left unread means the conversion is incomplete.
            return (from == from_end) ? ok : partial;
        }

        /* This function deals with converting data from the internal stream to a C/C++ file stream.
         * The parameters and status values have the same meaning as for do_in.
         */
        virtual result do_out(state_type &state,
                              const wchar_t *from, const wchar_t *from_end, const wchar_t* &from_next,
                              char *to, char *to_limit, char* &to_next) const
        {
            // Loop while input remains and there is room for a whole 2-byte unit.
            for (; (from < from_end) && (to_limit - to >= 2); ++from, to += 2)
            {
                /* Output the data.
                 * NB: I am assuming the characters are encoded as UTF-16.
                 * This means they are 16 bits inside a wchar_t object.
                 * As the size of wchar_t varies between platforms I need
                 * to take this into consideration and only take the bottom
                 * 16 bits of each wchar_t object.
                 */
                to[0] = reinterpret_cast<const char*>(from)[0];
                to[1] = reinterpret_cast<const char*>(from)[1];
            }
            from_next = from;
            to_next   = to;

            // Anything left unwritten means the conversion is incomplete.
            return (from == from_end) ? ok : partial;
        }
    };
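The facet copies the first two bytes of each wchar_t exactly as they sit in memory, so the byte order that ends up in the file follows the host. As a rough sketch of an endianness-independent alternative, assuming little-endian UTF-16 output is what you want, the two byte copies could be replaced by shifts. The helper names `put_utf16le_unit` and `get_utf16le_unit` are made up for illustration and are not part of the facet above:

```cpp
#include <cstdint>

// Hypothetical helper: store the low 16 bits of c at to[0..1], low byte first,
// regardless of the byte order of the host.
inline void put_utf16le_unit(wchar_t c, char* to) {
    const std::uint32_t v = static_cast<std::uint32_t>(c);
    to[0] = static_cast<char>(v & 0xFF);
    to[1] = static_cast<char>((v >> 8) & 0xFF);
}

// Hypothetical helper: rebuild one 16-bit unit from two little-endian bytes.
inline wchar_t get_utf16le_unit(const char* from) {
    const std::uint32_t lo = static_cast<unsigned char>(from[0]);
    const std::uint32_t hi = static_cast<unsigned char>(from[1]);
    return static_cast<wchar_t>(lo | (hi << 8));
}
```

Inside do_out the loop body would then call `put_utf16le_unit(*from, to)`, and inside do_in it would assign `*to = get_utf16le_unit(from)`.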
I suspect that sizeof(wchar_t) is 4 in your environment, i.e. it's writing out UTF-32/UCS-4 instead of UTF-16. That's certainly what the hex dump looks like.
That's easy enough to test (just print out sizeof(wchar_t)), but I'm pretty sure that's what's going on.
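A minimal sketch of that check. The function names `raw_byte_count` and `wchar_is_32_bit` are hypothetical, chosen only for this example:

```cpp
#include <cstddef>
#include <string>

// Number of bytes the write() call in the question emits for a wide string:
// one wchar_t per character, whatever size wchar_t happens to have.
std::size_t raw_byte_count(const std::wstring& s) {
    return s.length() * sizeof(wchar_t);
}

// True when wchar_t is 4 bytes, i.e. wide enough for a whole UTF-32 code point.
bool wchar_is_32_bit() {
    return sizeof(wchar_t) == 4;
}
```

With a 4-byte wchar_t, `raw_byte_count(L"hello")` is 20, which matches the twenty bytes that follow FF FE in the hex dump. With a 2-byte wchar_t (as on Windows) it would be 10.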
To go from a UTF-32 wstring to UTF-16 you'll need to apply a proper encoding, because surrogate pairs come into play.
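As an illustration of what that encoding step involves, here is a minimal sketch that turns UTF-32 code points into UTF-16 little-endian bytes. The function name `utf32_to_utf16le` is hypothetical, and it takes a `std::u32string` so that it does not depend on the size of wchar_t. Replacing invalid code points with U+FFFD is a choice made for this sketch, not something the answer prescribes:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-16 little-endian bytes.
std::string utf32_to_utf16le(const std::u32string& in) {
    std::string out;
    // Append one 16-bit code unit, low byte first.
    auto put16 = [&out](std::uint16_t u) {
        out.push_back(static_cast<char>(u & 0xFF));
        out.push_back(static_cast<char>(u >> 8));
    };
    for (char32_t cp : in) {
        // Out-of-range values and lone surrogates are not valid code points;
        // this sketch replaces them with U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single code unit.
            put16(static_cast<std::uint16_t>(cp));
        } else {
            // Supplementary planes: split the 20-bit offset into a surrogate pair.
            const char32_t v = cp - 0x10000;
            put16(static_cast<std::uint16_t>(0xD800 + (v >> 10)));   // high surrogate
            put16(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

On a platform where wchar_t is 4 bytes, a wstring `ws` can be copied into the input with `std::u32string(ws.begin(), ws.end())`, and the result written with `ofstream::write` in binary mode.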
It is easy if you use the C++11 standard (because it adds a lot of extra includes, such as "utf8", which solve this problem for good).
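A rough sketch of the C++11 route, assuming the facet meant here is `std::codecvt_utf8` from `<codecvt>`. The function name `write_utf8` is hypothetical. Note that `<codecvt>` was deprecated in C++17, so newer compilers may warn about it:

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write text to path as UTF-8 using the C++11 <codecvt> facet.
bool write_utf8(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening; the locale takes ownership of the facet.
    out.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::out | std::ios::binary);
    if (!out) {
        return false;
    }
    out << text;
    out.close();          // flushes, which is when the conversion is written out
    return !out.fail();
}
```

For UTF-16 output instead, the same header provides `std::codecvt_utf16`, which can be used in the same position.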
But if you want multi-platform code that works with older standards, you can use this method to write with streams:
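Read the article about a UTF converter for streams.
Add stxutif.h from the sources above to your project.
Open the file in ANSI mode and add the BOM to the start of the file, like this: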
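    std::ofstream fs;
    fs.open(filepath, std::ios::out | std::ios::binary);

    // UTF-8 byte order mark
    const unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };

    // write() emits exactly three bytes; operator<< would treat the array as a
    // NUL-terminated string and read past its end.
    fs.write(reinterpret_cast<const char*>(smarker), sizeof(smarker));

    fs.close();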
Then open the file as UTF and write your content there:
    std::wofstream fs;
    fs.open(filepath, std::ios::out | std::ios::app);

    std::locale utf8_locale(std::locale(), new utf8cvt);
    fs.imbue(utf8_locale);

    fs << .. // Write anything you want...