February 24th, 2006

senji
2006/02/24 17:44:00 - One Encoding to rule them all,
One Encoding to find them,
One Encoding to bring them all
And in the darkness bind them.

Since chiark has now moved to Debian sarge, I feel it is time to uproot my life from the simple realms of ASCII/Latin-1 and venture forth into the hazardous lands of Unicode. This has been done many a time before, and I'm sure someone must have written a document on the process, settings, and pitfalls of such a migration; does anyone happen to know of one?
Current Mood: geeky
Entry Tags: geeky, unicode, unix

Felix [ex_http://www.matthew.ath.cx/564]
2006/02/24 18:38:51
For IRC specifically, I have an irssi/screen HOWTO here: http://www.matthew.ath.cx/misc/utf8 (I got bored of telling people how to do this over and over, so I wrote it down).

pm215
2006/02/24 23:02:27

I recommend either (1) setting LANG as above but setting LC_COLLATE=C, or (2) setting LANG=C and LC_CTYPE=en_GB.UTF-8. The rationale is that the charset setting (LC_CTYPE) must be UTF-8, that you really want POSIX collation (unless you like weird case-insensitive sorting messing up all your command-line runes), and that most of the time it's pretty irrelevant what you set the rest to.
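A minimal Python sketch of those two setups, assuming the en_GB.UTF-8 locale has been generated. The helper names `effective` and `sort_names` are hypothetical: `effective` applies the POSIX precedence (LC_ALL, then the category's own variable, then LANG) to show that both options end up with a UTF-8 charset and C collation, and `sort_names` shows the plain code-point ordering that LC_COLLATE=C gives you.

```python
import locale

# The two suggested setups, as environment-variable dicts.
# Assumption: en_GB.UTF-8 has been generated on the machine.
OPTION_1 = {"LANG": "en_GB.UTF-8", "LC_COLLATE": "C"}
OPTION_2 = {"LANG": "C", "LC_CTYPE": "en_GB.UTF-8"}


def effective(env, category):
    """Return the locale a category resolves to under POSIX precedence:
    LC_ALL wins, then the category's own variable, then LANG, else C."""
    return env.get("LC_ALL") or env.get(category) or env.get("LANG") or "C"


def sort_names(names, collate):
    """Sort names under the given LC_COLLATE, restoring the old setting after."""
    old = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collate)
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)


if __name__ == "__main__":
    for label, env in (("option 1", OPTION_1), ("option 2", OPTION_2)):
        print(label, "charset:", effective(env, "LC_CTYPE"),
              "collation:", effective(env, "LC_COLLATE"))
    # Under C collation names sort by code point, so capitals come first.
    print(sort_names(["makefile", "README", "Makefile", "readme"], "C"))
```

Both options report en_GB.UTF-8 for the charset and C for collation, which is the point: the two categories that matter come out the same either way.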

The rune to put in your .screenrc is "defutf8 on".
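If you'd rather script that change than edit the file by hand, here is a small Python helper. `ensure_line` is a hypothetical name, not an existing tool; it appends the setting only if an identical line isn't already there, so running it twice is harmless.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append line to the file at path unless it already contains that line.

    Creates the file if needed and returns True if anything was written,
    so running it repeatedly never duplicates the setting.
    """
    p = Path(path).expanduser()
    text = p.read_text() if p.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    with p.open("a") as f:
        # Make sure the new setting starts on its own line.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True
```

Usage would be `ensure_line("~/.screenrc", "defutf8 on")`. It only checks for an exact match, so an existing "defutf8 off" line is left for you to deal with by hand.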

pluto is a cute hack I run IRC in -- it turns iso-8859-x into UTF-8 (while letting real UTF-8 through) so you get the best of both worlds. Unfortunately when fed ambiguous data it assumes iso-8859-x rather than UTF-8; I must get round to fixing that.
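pluto's implementation isn't described here, so this is only a Python sketch of the same idea, with the ambiguity resolved the other way round: each line is tried as UTF-8 first and only reinterpreted as ISO-8859-1 (standing in for "iso-8859-x") if that fails. Real Latin-1 text rarely happens to form valid UTF-8 sequences, so preferring UTF-8 for ambiguous input is usually the safer guess. The function name `to_utf8` is hypothetical.

```python
def to_utf8(line: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return line as UTF-8 bytes.

    Valid UTF-8 (including plain ASCII) passes through untouched; anything
    else is decoded with the fallback charset and re-encoded as UTF-8.
    """
    try:
        line.decode("utf-8")
        return line
    except UnicodeDecodeError:
        return line.decode(fallback).encode("utf-8")


print(to_utf8(b"caf\xe9"))              # Latin-1 input: b'caf\xc3\xa9'
print(to_utf8("café".encode("utf-8")))  # UTF-8 input passes through unchanged
```

Deciding per line (or per IRC message) rather than per connection means one user on Latin-1 doesn't spoil everyone else's UTF-8.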

Hopefully senji wasn't foolish enough to create files with names or data in iso-8859-x, as that will complicate the transition :-)
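Before switching, it may be worth hunting such files down. A minimal Python sketch, assuming any name that isn't valid UTF-8 is Latin-1: `non_utf8_names` walks a tree using byte paths (so undecodable names survive intact) and `latin1_to_utf8_name` shows what each name would become. Both are hypothetical helpers, and they only report; nothing is renamed.

```python
import os


def non_utf8_names(top):
    """Yield byte paths under top whose final component is not valid UTF-8."""
    # Passing bytes to os.walk makes it return raw bytes names, so names
    # in iso-8859-1 are never mangled by a decoding step.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)


def latin1_to_utf8_name(path):
    """Return path with its last component reinterpreted from Latin-1 to UTF-8."""
    head, tail = os.path.split(path)
    return os.path.join(head, tail.decode("iso-8859-1").encode("utf-8"))


if __name__ == "__main__":
    for bad in non_utf8_names("."):
        print(bad, "->", latin1_to_utf8_name(bad))
```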

senji
2006/02/24 23:25:42
I have a very limited number of files in iso-8859-1, but not enough to worry about. My oggs are already in Unicode (which makes them hard to read in iso-8859-1, but...).

I'm also worried by simpler questions like "will my fixed font support Unicode?" and "where do I get a Unicode-capable console font?". Oh, and "how do I enter all these funky characters into things?".

"Will nmh cope?", "What about trn?", etc.
pm215
2006/02/25 10:42:01

If you're using pterm you probably need this X resource:

pterm.Font: -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
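What makes that font name Unicode-capable is its last two fields, the charset registry and encoding "iso10646-1". As an illustration (the helper names are hypothetical), this Python sketch splits a fully specified XLFD name into its 14 standard fields and checks them.

```python
# The 14 fields of an X Logical Font Description, in order.
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "resolution_x", "resolution_y",
    "spacing", "average_width", "charset_registry", "charset_encoding",
)


def parse_xlfd(name):
    """Split a fully specified XLFD font name into a dict of its fields."""
    parts = name.split("-")
    # A full XLFD name starts with "-", so the first split element is empty.
    if parts[0] != "" or len(parts) != len(XLFD_FIELDS) + 1:
        raise ValueError(f"not a fully specified XLFD name: {name!r}")
    return dict(zip(XLFD_FIELDS, parts[1:]))


def is_unicode_font(name):
    """True if the font is encoded as ISO 10646 (Unicode), i.e. ends iso10646-1."""
    fields = parse_xlfd(name)
    return (fields["charset_registry"], fields["charset_encoding"]) == ("iso10646", "1")


font = "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1"
print(parse_xlfd(font)["family"], is_unicode_font(font))  # fixed True
```

The same "fixed" family with an iso8859-1 ending is the old Latin-1 font, which is why the resource needs spelling out in full.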

xterm has support for finding the right font by default, if invoked as uxterm or via the locale-checking wrapper lxterm. (Make sure the x-terminal-emulator alternative is pointing at the right thing.)
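The "locale-checking" part boils down to asking which codeset the current locale uses. Here is a hypothetical Python sketch of that decision. It is not lxterm's actual logic, just the idea: read the codeset with `locale.nl_langinfo` (Unix-only) and pick uxterm when it is UTF-8.

```python
import locale


def terminal_for(codeset):
    """Hypothetical policy: uxterm for UTF-8 locales, plain xterm otherwise."""
    normalized = codeset.upper().replace("-", "").replace("_", "")
    return "uxterm" if normalized == "UTF8" else "xterm"


def current_codeset():
    """Return the codeset of the locale named by LC_ALL / LC_CTYPE / LANG."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Requested locale isn't installed; the previous LC_CTYPE stays in place.
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    codeset = current_codeset()
    print(codeset, "->", terminal_for(codeset))
```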

trn's i18n support is nonexistent anyway, it can't get any worse.

nmh seems to work fine, at least to the extent that it worked before.

unifont is a handy 'everything' font (it has a Debian package).

ghworg.myopenid.com
2006/02/25 17:49:12
I found this one useful: http://melkor.dnp.fmph.uniba.sk/~garabik/debian-utf8/HOWTO/howto.html

One Encoding to rule them all, - Squaring the circle... — LiveJournal
