
C Unicode: How do I apply the C11 standard amendment DR488 fix to the C11 standard function c16rtomb()?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:14:37

Question:

The C reference page for the function c16rtomb on CPPReference states the following in its Notes section:

In C11 as published, unlike mbrtoc16, which converts variable-width multibyte (such as UTF-8) to variable-width 16-bit (such as UTF-16) encoding, this function can only convert single-unit 16-bit encoding, meaning it cannot convert UTF-16 to UTF-8 despite that being the original intent of this function. This was corrected by the post-C11 defect report DR488.

Below that passage, the C reference page provides example source code, preceded by the following sentence:

Note: this example assumes the fix for the defect report 488 is applied.

This sentence implies that there is a way to take DR488 and somehow "apply" the fix to the C11 standard function c16rtomb.

I would like to know how to apply the fix for GCC. As far as I can tell, the fix has already been applied in Visual Studio 2017 Visual C++ as of v141.

The behavior I see with GCC, when debugging the code in GDB, is consistent with what DR488 describes:

Section 7.28.1 describes the function c16rtomb(). In particular, it states "When c16 is not a valid wide character, an encoding error occurs". "wide character" is defined in section 3.7.3 as "value representable by an object of type wchar_t, capable of representing any character in the current locale". This wording seems to imply that, e.g. for the common cases (e.g, an implementation that defines __STDC_UTF_16__ and a program that uses an UTF-8 locale), c16rtomb() will return -1 when it encounters a character that is encoded as multiple char16_t (for UTF-16 a wide character can be encoded as a surrogate pair consisting of two char16_t). In particular, c16rtomb() will not be able to process strings generated by mbrtoc16().

The bold text is the behavior being described.
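The case the defect report singles out, a character encoded as a surrogate pair, can be probed directly. The following is only a minimal sketch: `probe_surrogate_pair` is a hypothetical helper name, and it assumes the caller passes the name of a UTF-8 locale that exists on the system. Under the DR488 reading, the first call is expected to store the high surrogate in the conversion state and return 0, and the second call is expected to write the complete multibyte character.

```c
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical probe, not part of any library.
 * Returns  1 if the surrogate pair for U+1F600 converts as DR488 describes,
 *          0 if c16rtomb does not behave that way (e.g. it returns -1),
 *         -1 if the requested locale is unavailable.
 * Note: this changes the process-wide locale via setlocale(). */
int probe_surrogate_pair(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial conversion state */
    char out[MB_LEN_MAX];

    /* High surrogate 0xD83D: no complete character yet, so nothing should be
     * written and 0 should be returned. */
    size_t first = c16rtomb(out, 0xD83D, &state);
    if (first != 0)
        return 0;

    /* Low surrogate 0xDE00 completes U+1F600. */
    size_t second = c16rtomb(out, 0xDE00, &state);
    if (second == (size_t)-1 || second == 0)
        return 0;
    return 1;
}
```

Calling `probe_surrogate_pair("en_US.utf8")` and checking for a result of 1 is one way to see whether a given C library already behaves as the defect report resolution intends; the locale name is an assumption and varies between systems.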

Source code:

#include <stdio.h>
#include <uchar.h>

#define __STD_UTF_16__

int main() {
    char16_t* ptr_string = (char16_t*) u"我是誰";

    //C++ disallows variable-length arrays.
    //GCC uses GNUC++, which has a C++ extension for variable length arrays.
    //It is not a truly standard feature in C++ pedantic mode at all.
    //https://stackoverflow.com/questions/40633344/variable-length-arrays-in-c14
    char buffer[64];
    char* bufferOut = buffer;

    //Must zero this object before attempting to use mbstate_t at all.
    mbstate_t multiByteState = {};

    //c16 = 16-bit Characters or char16_t typed characters
    //r = restartable (the conversion state is kept in the mbstate_t argument)
    //tomb = to Multi-Byte Strings
    while (*ptr_string) {
        char16_t character = *ptr_string;
        size_t size = c16rtomb(bufferOut, character, &multiByteState);
        if (size == (size_t) -1)
            break;
        bufferOut += size;
        ptr_string++;
    }

    size_t bufferOutSize = bufferOut - buffer;
    printf("Size: %zu - ", bufferOutSize);
    for (int i = 0; i < bufferOutSize; i++) {
        printf("%#x ", +(unsigned char) buffer[i]);
    }

    //This statement is used to set a breakpoint. It does not do anything else.
    int debug = 0;
    return 0;
}

Output from Visual Studio:

Size: 9 - 0xe6 0x88 0x91 0xe6 0x98 0xaf 0xe8 0xaa 0xb0

Output from GCC:

Size: 0 -

1 Answer

On Linux, you should be able to fix this by calling setlocale(LC_ALL, "en_US.utf8");
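As a rough sketch of that suggestion: the program starts in the "C" locale, so c16rtomb has no UTF-8 multibyte encoding to convert into until a UTF-8 locale is selected. The helper names below (`select_utf8_locale`, `utf16_to_multibyte`) are hypothetical, and the list of locale names is an assumption, since the installed names differ between systems.

```c
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: tries a few common UTF-8 locale names.
 * Returns 1 if one of them could be selected, 0 otherwise. */
int select_utf8_locale(void)
{
    static const char *const names[] = { "en_US.utf8", "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (setlocale(LC_ALL, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: converts a NUL-terminated char16_t string to the
 * multibyte encoding of the current locale and NUL-terminates the result.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t)-1 on an encoding error or if dst is too small. */
size_t utf16_to_multibyte(const char16_t *src, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return (size_t)-1;

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* initial conversion state */
    char tmp[MB_LEN_MAX];              /* room for one multibyte character */
    size_t used = 0;

    for (; *src != 0; ++src) {
        size_t n = c16rtomb(tmp, *src, &state);
        if (n == (size_t)-1)
            return (size_t)-1;         /* encoding error in the current locale */
        if (used + n >= dst_size)
            return (size_t)-1;         /* keep one byte for the terminator */
        memcpy(dst + used, tmp, n);
        used += n;
    }
    dst[used] = '\0';
    return used;
}
```

With a UTF-8 locale selected, converting u"我是誰" this way should yield the same 9 bytes shown in the Visual Studio output above. Without the setlocale call, the first c16rtomb call is expected to fail, which matches the "Size: 0" result.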

Example on ideone

This function does the following, as stated in the Microsoft documentation:

Convert a UTF-16 wide character into a multibyte character in the current locale.

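The POSIX documentation is similar. __STD_UTF_16__ seems to have no effect in either compiler. It is supposed to specify the encoding of the source, which should be UTF-16. It does not specify the encoding of the destination.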
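For comparison, the macro named in the DR488 text quoted in the question is __STDC_UTF_16__ (with "STDC"), and it is predefined by the implementation rather than by the program. A small sketch of testing for it follows; `char16_is_utf16` is a hypothetical helper name.

```c
#include <uchar.h>

/* Hypothetical helper: reports whether the implementation predefines
 * __STDC_UTF_16__, i.e. promises that char16_t values are UTF-16 encoded.
 * Returns 1 if so, 0 otherwise. */
int char16_is_utf16(void)
{
#ifdef __STDC_UTF_16__
    return 1;
#else
    return 0;
#endif
}
```

This only describes the encoding of char16_t data. The multibyte encoding that c16rtomb writes still depends on the current locale.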

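The Windows documentation seems less consistent, since it appears to imply either that setting the locale is required or that converting to the ANSI code page is an option.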

Regarding "C Unicode: How do I apply the C11 standard amendment DR488 fix to the C11 standard function c16rtomb()?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53148386/
