
Encoding Behavior Could've Been Better
Reported by Wil | December 29th, 2010 @ 12:26 PM
What I did:
1. Open a non-UTF-8-encoded file (in my case, Shift-JIS).
2. Kod's encoding detection falls back to UTF-8 (which is wrong).
3. Attempt to fix the file by adding an "encoding: shift-JIS" comment line at the top.
4. Save the file. At this point, Kod saves the file with an xattr of "utf-8".
5. Reopen the file. Kod looks at the xattr first and resolves the encoding to "utf-8" instead of "Shift-JIS".
Note that steps 3 and 4 could be done in TextEdit instead (since it doesn't write the file's xattr), and reopening the file in Kod then shows the correct encoding... though this doesn't really "solve the problem".
What I expected to happen: the behavior is open to discussion, but here are my suggestions:
1. Change the order of detection: honor the explicit "charset/encoding" line in the file over the file's xattr. The rationale is that it's easier to modify the file's contents than to edit the xattr by hand.
2. If encoding detection fails completely, don't default to UTF-8 or "NSISOLatin1StringEncoding"; ask the user which encoding to use instead.
3. If encoding detection fails completely, don't write the xattr when saving.
4. In the file open dialog, provide an option to select which encoding to use (defaulting to "auto-detect").
5. Make the status bar more useful by showing the file's current encoding along with the file type.
6. Provide a menu for the user to select which encoding to use.
   Upon inspecting the code, however, this turns out to be impossible without reloading the file (the NSData is released after the converted text is assigned to the view), which is not a good solution because:
   - it discards the changes in the current buffer, and asking the user to save first is awkward (what if the file was opened over HTTP?)
   - it throws away the undo history
   - if the file is on the network, loading it again takes time (I don't think Kod caches remote files)
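The first three suggestions amount to a different resolution order: in-file marker first, then the xattr, and no silent fallback. Below is a minimal, language-neutral sketch of that order in Python (Kod itself is Objective-C). It assumes the xattr value has already been read into a string, and the marker syntax it accepts is a hypothetical one modeled loosely on the Emacs/PEP 263 `coding:` convention; `explicit_marker` and `resolve_encoding` are illustrative names, not Kod's API.

```python
import codecs
import re
from typing import Optional

# Hypothetical marker syntax: matches "encoding: shift-JIS" as well as the
# Emacs/PEP 263 style "-*- coding: utf-8 -*-".
_MARKER = re.compile(rb"coding[:=]\s*([-\w.]+)", re.IGNORECASE)


def explicit_marker(data: bytes) -> Optional[str]:
    """Return the normalized encoding named by a marker in the first two lines."""
    for line in data.splitlines()[:2]:
        match = _MARKER.search(line)
        if match:
            try:
                return codecs.lookup(match.group(1).decode("ascii")).name
            except LookupError:
                return None  # unknown encoding name: ignore the marker
    return None


def resolve_encoding(data: bytes, xattr_encoding: Optional[str]) -> Optional[str]:
    """Proposed order: in-file marker, then xattr, then strict UTF-8.

    Returns None when every step fails, so the caller can ask the user
    instead of silently falling back, and can skip writing the xattr.
    """
    marker = explicit_marker(data)
    if marker is not None:
        return marker          # the file's own contents win over metadata
    if xattr_encoding:
        return xattr_encoding  # hypothetical: value already read from the xattr
    try:
        data.decode("utf-8")   # strict decode: any invalid byte raises
        return "utf-8"
    except UnicodeDecodeError:
        return None            # detection failed; do not guess
```

Returning `None` rather than a guessed encoding is what lets the caller both prompt the user and leave the xattr unwritten on save.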
I'm willing to work on this, but I'll need some input from the devs on the best course of action. I don't think implementing all of the above items would be good (item 6, for instance, is already a no-go).
A related question is "can the user select the encoding to use when saving?", because right now files are always saved as UTF-8 unless the opened file already had an encoding marked (xattr, file markers, etc.).
Comments and changes to this ticket
rsms December 31st, 2010 @ 04:20 PM
- State changed from new to open
- Assigned user set to rsms
- Tag set to encoding, file, text, writing
First, thanks for a very good and "meaty" ticket.
I've given this some thought and believe Kod should only be able to write Unicode data, but be able to read any encoding. The world of text encoding is a scary place whose streets are slowly being cleaned up by Unicode.
This is what should happen:
- Read a file and interpret it in any possible encoding (Kod should support a wide range of encodings).
- Store text internally as UTF-16 (host byte order AFAIK).
- When writing to a file, encode the text as UTF-8 or, if an explicit output encoding has been chosen, use that (the list should be fairly short).
List of output encodings:
- UTF-8
- UTF-16
- UTF-32
- ISO-8859-1 (aka Latin-1)
For the record, TextMate only allows writing files using MacRoman, UTF-8, UTF-16 or Latin-1 text encoding.
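The write policy above can be sketched in a few lines of Python. This assumes the allowed list is exactly the four encodings named; `encode_for_save` and `ALLOWED_OUTPUT` are hypothetical names. Python's `str` stands in for the in-memory text rather than UTF-16, so the sketch only illustrates the decision, not Kod's storage.

```python
import codecs
from typing import Optional

# The short list of output encodings proposed in the ticket, normalized via
# codecs.lookup so that "UTF-8", "utf8" and "ISO-8859-1" match their aliases.
ALLOWED_OUTPUT = {
    codecs.lookup(name).name for name in ("utf-8", "utf-16", "utf-32", "latin-1")
}


def encode_for_save(text: str, chosen: Optional[str] = None) -> bytes:
    """Encode the buffer for writing: UTF-8 unless an allowed encoding was chosen."""
    if chosen is None:
        return text.encode("utf-8")  # default: always write Unicode
    name = codecs.lookup(chosen).name  # raises LookupError for unknown names
    if name not in ALLOWED_OUTPUT:
        raise ValueError(f"{chosen!r} is not an allowed output encoding")
    # errors="strict" (the default) raises UnicodeEncodeError instead of
    # silently dropping characters, e.g. CJK text saved as Latin-1.
    return text.encode(name)
```

Two details are worth deciding explicitly. Python's `utf-16` and `utf-32` codecs prepend a BOM and use native byte order. Strict encoding makes a Latin-1 save fail loudly rather than lose characters Latin-1 cannot represent.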