09 December 2005 @ 03:10 pm
Various Things (On The Road)  
So, here I am in DIA, with at least a half-hour to kill (well, more now, my flight's been delayed). So, consider this a test of Xjournal offline, as well as a general entry.

As I'm on the PowerBook here, I suppose now is as good a time as any to talk about my impressions so far (having used it for a little more than a week now).

Got a bag and a Bluetooth mouse -- I actually liked the USB mouse I used before better, but this one's okay now that I'm getting used to it. And it frees up one of the two USB ports for an iPod, thumb drive, external USB HD or whatever (I have one of each of those). The bag is actually seeing its first use this weekend -- I'm only carrying the PowerBook to Houston with me; for the last week or so I've been carrying both laptops around in the bag I have for the 17" Dell. As for what kind of bag it is -- well, they didn't have the super-cheap ($25) Targus bags at Micro Center when I went shopping, so I got the $49 bag instead. It's a fairly nice bag, and I'm happy with it.

Got the Carbonized version of Emacs, which is pretty slick. It already has Ruby mode installed, which is more than I can say for the last few installations of Emacs I've used (two Linux, one Windows). It's actually really nice; my only complaint is that I can't get it to do reverse video (a flaw it seems to share with the Windows version). Well, that and the icon, which is pretty crappy.

Adium is very nice (or at least seems to be, from what little I've used it so far). It is (of course) exactly what I was looking for. I already like it better than Trillian, which doesn't take much -- Trillian does exactly what I want (and does it better than GAIM), but other than that, it faintly sucks.

Tensai is okay as far as it goes -- it works fine for looking up the English meaning of Japanese words, but it kinda sucks at finding Japanese words from the English. Of course, that's also been true to some extent of every other Japanese computer dictionary I've used, but I think I'll keep looking for something a little more comprehensive (also, something that uses EDICT and ENAMDICT, as I do like those dictionaries quite a bit). I've been thinking of maybe getting GTK working so I can try GJiten (which I use under Linux), but I'm not sure that would be worth the effort (GJiten is nice -- but also rather unstable).

Grabbed a couple of LJ clients, and obviously settled on Xjournal here, after checking out the online communities for them, etc. Xjournal just seemed to be a little better supported, although I suspect iJournal would have been fine, too. We'll see how it goes.

Also grabbed a lot of other open source stuff (Scribus, VLC, MPlayer, PostgreSQL, Inkscape, Blender, Azureus), but haven't messed with any of them yet, and probably won't until I have some use for them. Which I suppose means some of those install packages may sit on my hard drive forever, unused.

And finally, the fun you can have repeatedly triggering Exposé and Dashboard by moving the mouse into the corners of your screen should probably be illegal.
In the mood: deferred
Justingoatbag on January 11th, 2006 04:34 am (UTC)
Hi Doug, I'm Justin, the guy who's making Tensai. I found your post through Google's blog search; hope you don't mind me commenting here.

What do you want to see changed with its E-J lookups? There's only so much I can do, since the dictionary files were designed for J-E use. Looking for compounds that mean "pretty" won't find you anything that means "beautiful" for instance. I'd have to include an entire thesaurus and cross-reference queries against it to guess at what you're looking for, and then the nuanced meanings of some words would be lost.

I'd really like to know what you'd like changed in Tensai, so please reply to this or email me at justin@tensaimac.com if you have the time.

Also, the JMdict used by Tensai is a superset of EDICT, in XML format instead of EDICT's flat text format. The people at Monash University, who maintain the dictionaries, are trying to phase out the EDICT format files.
Douglas Triggs: languagedoubt72 on January 11th, 2006 10:57 am (UTC)
Re: Tensai
Hi, Justin. You're more than welcome to comment here.

Anyways... You're right -- it's a lot harder to translate into the language you don't know than into the language you do. In the second case you already have the proper word (and hence only need the meaning); in the first, it's harder to choose the right word because you don't know what shades of meaning the candidate words might have, nor how common they are, etc.

That's a difficult problem, of course, and not easily solved, but along those lines, I found the "top 20k words" metadata useful for choosing which word to use. As far as I know, you're using it to sort the results, but I don't know that for sure (there are no docs that say one way or the other, and that information doesn't show up in the search results themselves, unlike in WWWJDIC and the like), and that makes me a little nervous about choosing words when I'm trying to compose Japanese "blind" without embarrassing myself. And even if they're sorted in that order, I'd still like to know where the "top 20k" ends and the more unusual forms begin, as that's a useful data point (of course, even finer frequencies would be nice, but last I checked, "top 20k" was it).

That's really my chief complaint. Another thing I'd like to see is the inclusion of the more specialized dictionaries (especially ENAMDICT -- or JMnedict? -- etc.), which I used to use quite a bit, and maybe some sort of basic extension to hit KANJIDIC[2] to show information on the kanji (a full search interface would be great, I suppose, but that's a project unto itself; simply displaying a kanji's particulars for a specific search result would be nice).

Other than that, it's a fairly clean program and I've come to appreciate its simplicity, although it does load a little slowly on my PowerBook (I suspect because it's loading the dictionary). Not that a G4 PowerBook is exactly a powerhouse by any definition, but it's the sort of app you'd expect to be a little more lightweight. At least it's nowhere near as bad as NeoOffice or Firefox, which are downright dogs, and it's only the load that's slow; it's perfectly peppy once it's up.

Oh, and it needs an icon. :) (Need a volunteer for that?)
Justingoatbag on January 11th, 2006 10:31 pm (UTC)
Re: Tensai
Oh no, don't worry about asking too much. Most of the feedback I get is either "all i want is an icon" or "It needs voice recognition and laser beams." I'm just glad you're asking for something reasonable and worthwhile.

Marking frequent words won't be difficult, but I am curious how you would like them marked. One way I had been thinking of marking them was to change the bullet on the left to a star or some other symbol for those entries. Would you prefer them marked by a shape, by coloring the text, or something else? I'm also considering making Tensai only return the most frequent results first, then all results if you ask for them. There are actually 4 or 5 different kinds of frequent words in JMdict, and they're mostly measuring a word's frequency in newspapers, which isn't always a good metric due to the differences between newspapers and normal language in Japanese.
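For reference, here is a minimal Python sketch of the "mark frequent words and show them first" idea. It assumes JMdict's XML layout, where each `entry` holds `k_ele`/`keb` (kanji spellings) with optional `ke_pri` tags, `r_ele`/`reb` (readings) with optional `re_pri` tags, and `sense`/`gloss` (meanings). As I understand the JMdict DTD, the priority tags come in the handful of flavors mentioned above (news1/2, ichi1/2, spec1/2, gai1/2 and nfxx), and entries tagged news1, ichi1, spec1, spec2 or gai1 are the ones EDICT flags with "(P)". The sample entries and their tags are made up for illustration, and `lookup_english`, `is_common` and `headword` are hypothetical helpers, not Tensai's actual code.

```python
import xml.etree.ElementTree as ET

# Priority values that, per the JMdict DTD notes, correspond to EDICT's "(P)" marker.
COMMON_TAGS = {"news1", "ichi1", "spec1", "spec2", "gai1"}

# Tiny hand-made sample using JMdict's element names. The priority tags are
# illustrative only, not copied from the real dictionary.
SAMPLE = """<JMdict>
<entry>
  <k_ele><keb>麗しい</keb></k_ele>
  <r_ele><reb>うるわしい</reb></r_ele>
  <sense><gloss>beautiful</gloss><gloss>lovely</gloss></sense>
</entry>
<entry>
  <k_ele><keb>美しい</keb><ke_pri>ichi1</ke_pri><ke_pri>news1</ke_pri></k_ele>
  <r_ele><reb>うつくしい</reb></r_ele>
  <sense><gloss>beautiful</gloss></sense>
</entry>
<entry>
  <k_ele><keb>綺麗</keb></k_ele>
  <r_ele><reb>きれい</reb><re_pri>spec1</re_pri></r_ele>
  <sense><gloss>pretty</gloss><gloss>beautiful</gloss></sense>
</entry>
</JMdict>"""


def is_common(entry):
    """True if any kanji or reading element carries a 'common' priority tag."""
    tags = {el.text for el in entry.iter() if el.tag in ("ke_pri", "re_pri")}
    return bool(tags & COMMON_TAGS)


def headword(entry):
    """Prefer the first kanji spelling; fall back to the first reading."""
    keb = entry.findtext("k_ele/keb")
    return keb if keb is not None else entry.findtext("r_ele/reb")


def lookup_english(root, word, include_rare=False):
    """Exact-match E-J lookup on glosses, returning (headword, is_common) pairs.

    With include_rare=False only common entries are returned, i.e. the
    "frequent results first, everything on request" idea.
    """
    word = word.lower()
    hits = []
    for entry in root.iter("entry"):
        glosses = [(g.text or "").lower() for g in entry.iter("gloss")]
        if word in glosses:
            hits.append((headword(entry), is_common(entry)))
    if not include_rare:
        hits = [h for h in hits if h[1]]
    # sorted() is stable, so entries keep file order within each group.
    return sorted(hits, key=lambda h: not h[1])


root = ET.fromstring(SAMPLE)
print(lookup_english(root, "beautiful"))
print(lookup_english(root, "beautiful", include_rare=True))
```

A real lookup would need fuzzier matching than exact gloss equality, but the common-first sort and the `include_rare` switch are the parts that matter for marking frequent words.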

Adding JMnedict isn't difficult, but including it would make the program huge. Tensai runs on a custom search engine of mine, which I only wrote because Apple's SearchKit library was excruciatingly slow on OS X 10.3 when dealing with foreign languages; it would have taken days to index the whole JMdict on a fast computer. Tensai, on the other hand, can index the whole JMdict in under 20 minutes on my 5-year-old PowerBook. The indexes it creates are enormous, though, usually around the same size as the dictionaries they come from. JMdict is 36 MB uncompressed and makes a 30 MB index. JMnedict is 84 MB uncompressed and would make a comparably sized index. So adding JMnedict would, I'm guessing, double or triple the size of Tensai. Also, JMdict contains definitions not just in English but in something like 6 other languages. To keep file sizes down, I only index English meanings, but at some point I'd like to make those other definitions available to users in other countries.
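As a back-of-the-envelope check on that guess, assume (hypothetically) that a JMnedict index would keep the same index-to-dictionary ratio as JMdict's, and that the uncompressed dictionary and index data dominate Tensai's footprint:

```latex
\begin{aligned}
\text{index ratio} &= \tfrac{30}{36} \approx 0.83 \\
\text{JMnedict index} &\approx \tfrac{30}{36} \times 84 = 70\ \text{MB} \\
\text{current data} &= 36 + 30 = 66\ \text{MB} \\
\text{with JMnedict} &\approx 66 + 84 + 70 = 220\ \text{MB} \\
\text{growth factor} &\approx 220 / 66 \approx 3.3
\end{aligned}
```

Under those assumptions the data grows by a factor of a little over three; the actual download size would depend on compression and on how the index behaves with names rather than ordinary vocabulary.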

I spent the majority of my time before the beta researching information retrieval algorithms and trying to get index sizes and load times down to something reasonable. I feel Tensai's at least reasonable right now in terms of file size and speed. The new SearchKit in 10.4 is orders of magnitude faster than 10.3's, so at some point in the future — I'm thinking 1.5 — I'll make Tensai require 10.4 and use SearchKit.

So the short answer is that JMnedict and other specialized dictionaries won't get in until after 1.0 due to technical limitations.

Kanji lookups from JMdict will be easier in 0.92. It has contextual menu items which let you look up the selected text in JMdict or kanjidic2, so you won't have to do a 'copy, switch dictionaries, paste, hit enter' routine.

To look up kanji by nanori, use the "Textual" or "Reading" search types. "Textual" searches by kanji, readings, nanori, and meanings; "Reading" searches by reading and nanori.

There's a lot of work still to be done for kanji lookup, especially in terms of how it's displayed. I'll start tackling that in either 0.93 or 0.94.

As for the icon, I've made one and it will be in 0.92 — which incidentally should be out within a week. I'm thinking it'll just be a temporary icon though. I'm still open to icon submissions/suggestions.

Also (not sure if you know about it), you can select text and press Cmd-Shift-T in just about any app (though not in Mail or TextEdit, for example) to launch Tensai and look up the selected text. People seem to find that feature useful.
Douglas Triggs: taodoubt72 on January 12th, 2006 09:11 am (UTC)
Re: Tensai
Heh. Laser beams.

But seriously -- for frequent words, I don't care how they're marked, as long as they're marked. As for using newspaper frequency -- it's better than nothing, and for most things, works fine (it's only for more conversational forms that it really breaks down -- but what are the other frequency markers in JMdict? Assuming that's the database behind WWWJDIC, which I'd expect it to be, I've only seen the one. And I don't remember EDICT having more). And having only the frequent words returned with an option for more doesn't sound like a bad idea.

Given the size of JMnedict, I suppose it might make sense to have it load optionally somehow. Dunno how you'd do that, though; I'd have to think about it. Maybe an optional package that would get loaded only if installed? That makes things more complicated, though (which wouldn't bother me, but might bother some people).

As for nanori (なのり) -- is that in there? I thought I tested it, and it wasn't matching things I thought it should match -- okay, I checked again, and I just missed the kanji I was looking for. However, that does bring up one thing: the ability to sort kanji results by grade, frequency, stroke count, and what have you would be nice.
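A rough Python sketch of sorting kanji results by grade, frequency, or stroke count, assuming KANJIDIC2's XML layout (`character` elements with a `literal` and a `misc` block containing `grade`, `stroke_count` and an optional `freq`). If I'm reading the KANJIDIC2 DTD right, grade 8 covers the remaining jōyō kanji, so the same field also answers the jōyō-level question, and `freq` is a newspaper frequency rank that many less common kanji simply lack. The freq values in the sample are invented, and `sort_kanji`/`misc_value` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hand-made sample in KANJIDIC2's layout. Grades and stroke counts follow the
# real kanji; the freq values are invented, and 麗 deliberately has no freq.
SAMPLE = """<kanjidic2>
<character><literal>麗</literal><misc><grade>8</grade><stroke_count>19</stroke_count></misc></character>
<character><literal>美</literal><misc><grade>3</grade><stroke_count>9</stroke_count><freq>300</freq></misc></character>
<character><literal>字</literal><misc><grade>1</grade><stroke_count>6</stroke_count><freq>500</freq></misc></character>
</kanjidic2>"""


def misc_value(char, field):
    """Integer value of misc/<field>, or None if the kanji has no such field.

    findtext() returns the first match; in KANJIDIC2 the first stroke_count
    is the accepted one (any later ones are common miscounts).
    """
    text = char.findtext(f"misc/{field}")
    return int(text) if text else None


def sort_kanji(root, field):
    """Return kanji literals sorted by the given misc field, missing values last."""
    def key(char):
        value = misc_value(char, field)
        return (value is None, value if value is not None else 0)
    return [c.findtext("literal") for c in sorted(root.iter("character"), key=key)]


root = ET.fromstring(SAMPLE)
for field in ("grade", "freq", "stroke_count"):
    print(field, sort_kanji(root, field))
```

Putting kanji with no value last, rather than treating a missing rank as zero, keeps rare kanji from jumping to the top of a frequency sort.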

Anyways... I assume you'll announce the new version in your blog; I've added it to my reading list.

I'll mail you the icon I threw together for Tensai on my box; honestly, I'm not all that happy with it, but I guess it looks okay. If you've got any other ideas for designs, toss them at me and I'll give it a shot. Or random web art or whatever (everything I do seems to turn into an LJ icon eventually, too).
Douglas Triggs: languagedoubt72 on January 11th, 2006 11:02 am (UTC)
Re: Tensai
Oh, and call me an idiot, but now I see it does do kanji lookup, just not the way I expected. It would be nice to have a button hanging off each vocabulary entry that sends you there as a shortcut, though, say when you're searching via kana or English, instead of having to cut and paste or retype the entry and switch dictionaries (especially in the latter case).
Douglas Triggs: taodoubt72 on January 11th, 2006 11:08 am (UTC)
Re: Tensai
Oh, and the jōyō (じょうよう) level, and the ability to search by nanori (なのり). But those aren't particularly major.

(See? Ask me a question, and I want the moon.)