BidiWiki

About the Bidi Wiki

This is a logically separate Wiki, but technically it's part of [MacMac], which is kindly hosted by [WWW] http://plonter.co.il. Perhaps it will be moved to a really separate Wiki one day, but there is no pressing need. I refer you to [Wiki]WikiWikiWeb to learn what a Wiki is, [Wiki]WhyWikiWorks, etc. All bidi content should be in subpages of this page (see [[WWW] http://twistedmatrix.com/users/jh.twistd/moin/moin.cgi/HelpOnEditing_2fSubPages MoinMoin's HelpOnEditing/SubPages] for how to use them).

This Wiki is devoted to the handling by computers of bi-directional text ("bidi"), which routinely occurs in /RightToLeftScripts. It's mainly aimed at programmers working on these matters, but is intended to be useful to users who just want to understand more about it. It also targets programmers who don't use any such language but want to support bidi (which is one reason for using English); if you are such a person, be sure to read /WhatIsBidi. Eventually we should prepare a friendly /HowToSupportBidi guide for them. The initial purpose of this Wiki is to collect and document what is known about bidi handling, because no such resource is available yet. Then we can proceed to discuss the future.

One thing which does not belong here is settings and workarounds for bidi support in specific programs - because there are already places for them, e.g. the [WWW] IGLU FAQ (Arabic links, anybody?). However, if you find no better place, this policy could be changed.

/!\ Work very much in progress, just started.

The real stuff: bidi handling by computers

From here on, it's assumed that you know /WhatIsBidi. Now we want computers to support it. I'll try to structure the problem into parts; note that most names are my inventions for this task, improvements welcome... -- /BeniCherniavsky

In the background of all bidi support on computers looms the /MinorityFactor: there are not enough motivated hackers to ensure bidi support in all programs, unless it requires *really* minimal changes. So we need to find a really smart architecture to allow this.

The early efforts to get /RTL text out of computers were based on /VisualOrder, making it trivial to display text but extremely inconvenient to edit and process it (even such basic things as /LineBreaking don't work!).
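To make the /LineBreaking complaint concrete, here is a small sketch. It uses a toy convention that is purely an assumption of this example: uppercase ASCII letters stand in for right-to-left letters, and "visual order" is simply the reversed string. Wrapping the visual-order text naively puts the end of the sentence on the top line; wrapping in logical order first and reversing each line afterwards gives the expected result.

```python
import textwrap

# Toy convention (an assumption of this sketch): uppercase ASCII letters
# stand in for right-to-left letters.
logical = "FIRST SECOND THIRD FOURTH"   # order in which the text is read
visual = logical[::-1]                  # what a visual-order store would hold

# Naive wrapping of visual-order text: the top line receives the END of the
# sentence, because the stored string starts with the last word.
naive_lines = textwrap.wrap(visual, width=13)

# Wrapping in logical order, then reversing each line for display, keeps the
# first words on the top line.
correct_lines = [line[::-1] for line in textwrap.wrap(logical, width=13)]

print(naive_lines)    # ['HTRUOF DRIHT', 'DNOCES TSRIF']
print(correct_lines)  # ['DNOCES TSRIF', 'HTRUOF DRIHT']
```

Read right to left, the naive top line says "THIRD FOURTH", which is the wrong half of the sentence.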

Forward direction: bidi display

The general consensus now is that /VisualOrderIsEvil and one should use /LogicalOrder as far as possible. This lets us set aside the /DisplayProblem: given text in /LogicalOrder, produce correct mixed-order display. Implementing this is a bit tricky but not challenging, and it has been done many times. Actually this is not just about text order on the paragraph level (/LogicalToVisual), but also about text layout: left vs. right alignment, etc. (/BidiLayout). This means that you need to make layout engines aware of bidi; the /CoordinateFlipping trick can make this relatively easy.
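This page does not spell out /CoordinateFlipping, so the following is only one plausible reading of it: let the layout engine place boxes as if everything were left-to-right, then mirror the horizontal coordinates inside the container. The `Box` type and `flip_boxes` function are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: int       # distance of the box's left edge from the container's left edge
    width: int


def flip_boxes(boxes, container_width):
    """Mirror an LTR layout horizontally to obtain the RTL layout."""
    return [Box(container_width - b.x - b.width, b.width) for b in boxes]


# Two boxes laid out left to right in a container 100 units wide.
ltr_layout = [Box(0, 30), Box(30, 50)]
rtl_layout = flip_boxes(ltr_layout, 100)
print(rtl_layout)  # [Box(x=70, width=30), Box(x=20, width=50)]
```

Flipping twice returns the original layout, so the same helper can map mouse coordinates back into the engine's left-to-right coordinate space.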

Can you now forget about bidi for the rest of your system? Mostly yes, but not entirely. You still need to record which parts of the text go in which direction, and how they nest. Let's call this the /EncodingProblem. You can't avoid it because the /EncodingProblemIsNotTrivial. This is probably the most important, challenging and creative part of the bidi architecture; it's /WhyBidiIsInteresting IMHO. If you get this right, you will have happy users; if you get it wrong, you will just never get bidi working in all the applications you need.

We already said that the right encoding should follow logical order. Another important distinction is between /ExplicitBidi encodings, where the bidi information must be supplied by special codes, and /ImplicitBidi encodings, where part of the information is inferred from the text itself. Explicit encodings are much more irritating to users and are usually a bad idea in user-visible formats. Even if not visible, they require more bidi-awareness from programs that process them. However, implicit formats are never enough by themselves - I'd estimate that you need to override the implicit "guessing" at least once per page of text, and in some documents much more frequently.
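As an illustration of implicit "guessing", here is a minimal sketch of a first-strong-character heuristic for the base direction of a paragraph, built on Python's standard `unicodedata.bidirectional`. The function name `guess_base_direction` is made up for this example, and the heuristic is deliberately simplified; it is not a full implementation of any standard.

```python
import unicodedata


def guess_base_direction(text, default="ltr"):
    """Guess paragraph direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # only digits, spaces, punctuation, ...


hebrew = "\u05e9\u05dc\u05d5\u05dd"                # a Hebrew word
print(guess_base_direction("hello " + hebrew))     # ltr
print(guess_base_direction(hebrew + " hello"))     # rtl
print(guess_base_direction("123"))                 # ltr (the default)
```

The first call shows why overrides are needed: a Hebrew sentence that happens to start with a Latin word (a product name, say) is guessed as left-to-right.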

Unicode specifies a standard /UnicodeBidiAlgorithm for the /DisplayProblem (but it ignores /BidiLayout issues). Its input is Unicode plain text. Unicode includes several characters (/UnicodeBidiMarks) to override the default behavior. The resulting encoding is implicit in the simple cases but more explicit than it could be in complex cases. It also suffers from too much complexity and subtlety - very few people understand all its implications. No seriously better scheme has been implemented to date, partly because it's the only standard we have got, so nobody dares break compatibility with it.
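For reference, a sketch listing some of the classic Unicode directional formatting characters, with two hypothetical helpers (`embed_rtl` and `strip_bidi_marks` are names invented here) for adding and removing them. The code points and names are standard; the list is not complete.

```python
import unicodedata

LRM, RLM = "\u200e", "\u200f"                  # invisible strong L / R marks
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"   # embeddings and their terminator
LRO, RLO = "\u202d", "\u202e"                  # hard overrides

BIDI_MARKS = (LRM, RLM, LRE, RLE, PDF, LRO, RLO)


def embed_rtl(text):
    """Mark text as a right-to-left embedding."""
    return RLE + text + PDF


def strip_bidi_marks(text):
    """Remove the formatting characters listed above."""
    return "".join(ch for ch in text if ch not in BIDI_MARKS)


mark_names = [unicodedata.name(ch) for ch in BIDI_MARKS]
for ch, name in zip(BIDI_MARKS, mark_names):
    print("U+%04X %s" % (ord(ch), name))
```

Note that these characters are invisible but still count as characters, so `len(embed_rtl("abc"))` is 5; that is part of the bidi-awareness explicit codes demand from programs.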

What about /HigherLevelFormats? Since the /EncodingProblemIsNotTrivial, any format that is to allow bidi output must be augmented with new constructs to represent bidi information. This is a major inconvenience and any improvement here would be very welcome.
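HTML is one familiar example of such augmentation: its `dir` attribute carries direction information on elements. The sketch below wraps a text fragment in a `<span>` with an explicit direction; the helper name `span_with_dir` is hypothetical, and a real application would likely need more than this.

```python
from html import escape


def span_with_dir(text, direction):
    """Wrap text in an HTML span carrying an explicit base direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return '<span dir="{}">{}</span>'.format(direction, escape(text))


print(span_with_dir("a<b", "rtl"))  # <span dir="rtl">a&lt;b</span>
```

Every program that generates the format has to know when to emit the extra construct, which is exactly the inconvenience described above.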

One more issue about complex formats can be called the /SourceProblem: when text undergoes complex transformations, the bidi context may change. This makes it hard to edit bidi source texts because the editor generally can't apply bidi the way it will happen in the end. This problem could be reduced by applying domain-specific knowledge but this is little-practiced so far. First we need to figure out an easy way to provide /BidiHints carrying such domain-specific knowledge.

Backward direction: bidi editing

The next step in complexity comes when you close the loop from the user back to the original text. Any application involving such feedback receives mouse / keyboard information related to what the user sees on screen, which is in /VisualOrder. The minimal requirement is a /VisualToLogical translation that recovers the positions in the /LogicalOrder text. However, the /LogicalToVisual transform is not reversible, so to go in the backward direction you must record the correspondence during the forward direction.
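A minimal sketch of recording that correspondence follows. The forward pass here is a toy, not the real algorithm: it assumes a left-to-right paragraph, treats uppercase ASCII letters as stand-ins for RTL letters, and simply reverses each uppercase run. All function names are invented for this example. What matters is that the forward pass returns an index map alongside the visual text.

```python
def is_rtl(ch):
    # Toy convention (assumption): uppercase ASCII stands in for RTL letters.
    return ch.isupper()


def logical_to_visual(text):
    """Return (visual_text, visual_to_logical) for an LTR paragraph."""
    v2l = []
    i = 0
    while i < len(text):
        j = i
        rtl = is_rtl(text[i])
        while j < len(text) and is_rtl(text[j]) == rtl:
            j += 1
        run = range(i, j)
        v2l.extend(reversed(run) if rtl else run)   # RTL runs are displayed reversed
        i = j
    return "".join(text[k] for k in v2l), v2l


def invert(v2l):
    """Build the logical-to-visual map from the recorded visual-to-logical map."""
    l2v = [0] * len(v2l)
    for visual_pos, logical_pos in enumerate(v2l):
        l2v[logical_pos] = visual_pos
    return l2v


visual, v2l = logical_to_visual("abc DEF xyz")
print(visual)    # abc FED xyz
print(v2l[4])    # 6: a click on the 5th displayed character hits logical index 6
```

With `v2l` kept around, a mouse click at a visual position is translated with a single list lookup, and `invert(v2l)` places the caret for a given logical position.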

last edited 2008-10-13 02:43:04 by ניר סופר