
Swift UTF-16 data stream: problem splitting it into blocks

Reposted. Author: Walker 123. Updated: 2023-11-28 08:29:20

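Can I ask for help with splitting a UTF-16 data stream into blocks?

Unfortunately, it is quite hard to find the letter boundaries.

Any help is appreciated. I have already spent a few evenings on this and would really like to understand the problem.

A Java version that works fine (is there some automatic correction here, given that even when only the first two bytes are split off, the output still gives the correct string as part 2?):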

public class Main {
    public static void main(String[] args) throws Exception {
        String encoding = "UTF-16";
        byte[] data = "ČŘŠŤĎŽŇčřšťďňě".getBytes(encoding);

        System.out.println("Data size: " + data.length);

        // Split at every even byte offset up to the middle of the data.
        for (int index = 2; index <= data.length / 2; index += 2) {
            byte[] part1 = java.util.Arrays.copyOfRange(data, 0, index);
            byte[] part2 = java.util.Arrays.copyOfRange(data, index, data.length);

            assert (part1.length + part2.length == data.length);

            System.out.println("--------------------- " + index);

            System.out.println(new String(part1, encoding));
            System.out.println(new String(part2, encoding));
        }
    }
}

Java output:

Data size: 30
--------------------- 2

ČŘŠŤĎŽŇčřšťďňě
--------------------- 4
Č
ŘŠŤĎŽŇčřšťďňě
--------------------- 6
ČŘ
ŠŤĎŽŇčřšťďňě
--------------------- 8
....

Swift (Xcode 8 beta 6, Swift 3) Playground code:

import Foundation

let encoding = String.Encoding.utf16
let data = "ČŘŠŤĎŽŇčřšťďňě".data(using: encoding)!

print("Data size: \(data.count)")

for index in stride(from: 2, to: data.count/2, by: 2)
{
    let part1 = data.subdata(in: 0..<index)
    let part2 = data.subdata(in: index..<data.count)

    assert(part1.count + part2.count == data.count)

    print("--------------------- \(index)")
    print(String(data: part1, encoding: encoding))
    print(String(data: part2, encoding: encoding))
}

Swift output:

Data size: 30
--------------------- 2
Optional("")
Optional("ఁ堁态搁ก紁䜁ഁ夁愁攁༁䠁ᬁ")
--------------------- 4
Optional("Č")
Optional("堁态搁ก紁䜁ഁ夁愁攁༁䠁ᬁ")
--------------------- 6
Optional("ČŘ")
Optional("态搁ก紁䜁ഁ夁愁攁༁䠁ᬁ")
--------------------- 8
Optional("ČŘŠ")
Optional("搁ก紁䜁ഁ夁愁攁༁䠁ᬁ")
--------------------- 10
Optional("ČŘŠŤ")
Optional("ก紁䜁ഁ夁愁攁༁䠁ᬁ")
--------------------- 12
Optional("ČŘŠŤĎ")
Optional("紁䜁ഁ夁愁攁༁䠁ᬁ")

If I change the Swift encoding to String.Encoding.utf8, the output is as expected, but with utf16 and utf32 I don't understand what is going on.

Thanks.

Best answer

Short answer: use the utf16LittleEndian or utf16BigEndian encoding to get the expected result:

Data size: 28
--------------------- 2
Optional("Č")
Optional("ŘŠŤĎŽŇčřšťďňě")
--------------------- 4
Optional("ČŘ")
Optional("ŠŤĎŽŇčřšťďňě")
--------------------- 6
Optional("ČŘŠ")
Optional("ŤĎŽŇčřšťďňě")
...
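As a minimal sketch of the short answer, the split from the question can be wrapped in a small function that takes the encoding as a parameter. The name `splitAndDecode` and its signature are hypothetical, chosen only for illustration. With an explicit byte order no byte-order mark is written, so both halves decode with the same encoding.

```swift
import Foundation

// Hypothetical helper (not part of the original answer): encodes `text`,
// splits the bytes at `offset`, and decodes both halves with the same encoding.
func splitAndDecode(_ text: String, at offset: Int, encoding: String.Encoding) -> (String?, String?) {
    guard let data = text.data(using: encoding), offset >= 0, offset <= data.count else {
        return (nil, nil)
    }
    let part1 = data.subdata(in: 0..<offset)
    let part2 = data.subdata(in: offset..<data.count)
    return (String(data: part1, encoding: encoding), String(data: part2, encoding: encoding))
}

// An explicit byte order means no BOM is prepended, so offset 4 falls after two letters.
let (head, tail) = splitAndDecode("ČŘŠŤĎŽŇčřšťďňě", at: 4, encoding: .utf16LittleEndian)
print(head ?? "nil", tail ?? "nil")
```

The offset still has to be even, because every character in this sample occupies exactly two bytes in UTF-16.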

Longer answer: the utf16 encoding converts the string to little-endian UTF-16 data, prepended by a byte-order mark (BOM):

let data = "abc".data(using: .utf16)!
print(data as NSData) // <fffe6100 62006300>

When the data is split into two parts, the second part no longer has a leading byte-order mark:

let part1 = data.subdata(in: 0..<4)
let part2 = data.subdata(in: 4..<8)
print(part1 as NSData, part2 as NSData) // <fffe6100> <62006300>

The part without a byte-order mark is evidently converted incorrectly, because big-endian byte order is now assumed:

print(String(data: part1, encoding: .utf16)) // Optional("a")
print(String(data: part2, encoding: .utf16)) // Optional("戀挀")
print(String(data: part2, encoding: .utf16LittleEndian)) // Optional("bc")
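When the incoming data does carry a byte-order mark, one possible approach is to inspect it once, strip it, and then decode every block with the matching explicit encoding. The sketch below is not from the original answer. `decodeUTF16Blocks` is a hypothetical name, the big-endian fallback for data without a BOM is an assumption, and the function also assumes that no surrogate pair straddles a block boundary.

```swift
import Foundation

// Hypothetical helper: decodes UTF-16 data block by block.
// Assumes blockSize is a positive even number and that no surrogate pair
// is cut in half by a block boundary.
func decodeUTF16Blocks(_ data: Data, blockSize: Int) -> [String]? {
    precondition(blockSize > 0 && blockSize % 2 == 0, "blockSize must be a positive even number")

    // Inspect the BOM once. Without a BOM, big-endian is assumed (an assumption of this sketch).
    var encoding = String.Encoding.utf16BigEndian
    var payload = Data(data)
    if data.starts(with: [0xFF, 0xFE] as [UInt8]) {
        encoding = .utf16LittleEndian
        payload = Data(data.dropFirst(2))
    } else if data.starts(with: [0xFE, 0xFF] as [UInt8]) {
        payload = Data(data.dropFirst(2))
    }

    // Decode each block with the explicit encoding, so no block depends on a BOM.
    var blocks: [String] = []
    var start = 0
    while start < payload.count {
        let end = min(start + blockSize, payload.count)
        guard let text = String(data: payload.subdata(in: start..<end), encoding: encoding) else {
            return nil
        }
        blocks.append(text)
        start = end
    }
    return blocks
}

// Build little-endian data with a BOM by hand, so the example does not rely on
// what the plain .utf16 encoding emits on a given platform.
let sample = Data([0xFF, 0xFE]) + "abc".data(using: .utf16LittleEndian)!
print(decodeUTF16Blocks(sample, blockSize: 4) ?? [])
```

Text containing characters outside the Basic Multilingual Plane would additionally need a check that a block does not end on a high surrogate.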

This question about splitting a Swift UTF-16 data stream into blocks comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/39175962/
