Coding for Right-To-Left text in DotNet (using VB.NET and Syriac) 

I want to write some software to work with unicode text which is entered and displayed right-to-left.  However I am NOT running Arabic Windows or any other special version: the software must work on 'normal' English US/UK Windows XP.  My development environment is Visual Studio 2003.

It can be done, and I include a simple sample application, done in VB.NET for simplicity.  But there are an awful lot of pitfalls along the way.

Special note: my interest is Syriac, but most of this will apply equally to Arabic or Hebrew.  The fonts chosen will be different.



1.  Getting set up to enter unicode Syriac text

The best way to test all this out is to make sure that you can enter text in Microsoft Word.  If it doesn't work there, you've done something wrong.

1.1  Install and activate the Syriac keyboard.

Microsoft provide two Syriac keyboards.  You have to install them specially, however, because Syriac is one of the scripts known to Microsoft as "complex scripts".

A.  Activate Language Bar (instructions cribbed from here)

B. Install East Asian and Complex Script Utilities (instructions cribbed from here)

Two Syriac keyboards are available in Windows, but you may have to install them from the Windows System disk because Syriac is a complex script. Follow the instructions below.

  1. Go to Start » Settings » Control Panel to open the Control Panel window, then click on the Regional and Language Options icon.
  2. In the Regional Options window, click on the Language tab on the top.
  3. In the Languages tab see if the options for "complex script and right-to-left languages" or "East Asian languages" are present and checked. If they are checked, skip the next step and go straight on to activating the keyboards (section C).


    [Screenshot: Language tab of the Regional control panel, with checks to install complex languages.]

  4. Check the options for the scripts needed.  For Syriac you only need the first box ("complex scripts").  The "East Asian" ones are Chinese, etc, and require HUGE fonts -- don't check that box. (You may be asked to insert the Windows XP CD-ROM with the system files, but I was not asked to). The install may take a few minutes and will require you to reboot the machine.

C.  Activate Keyboards

  1. Once all the scripts have been installed, click the Details button in the Languages tab to open the Settings window.
  2. In the Settings window, click on the Add button to open the Add Input Language window
  3. From the Input language drop-down list, select the language, then the Keyboard Layout/IME corresponding to the keyboard you wish to activate.   Note that for Syriac there are two keyboards.  DO NOT INSTALL BOTH.  The first keyboard, "Syriac", has a strange layout.  The second keyboard, "Syriac phonetic", is the one that you will use, so choose that.  Click OK to close the Add Input Language window.
  4. Back in the Settings window, you should see the new language or keyboard listed in the Input language menu. (Do not make the added language the default!)
    [Screenshot: Regional control panel, Input Details, with the Properties button to the right.]

  5. You will normally switch between keyboards by clicking the language bar at the top right of your screen.  But if you want to assign some keyboard shortcuts to switch between multiple keyboards, make sure you are in the Settings window (shown in Step #4) , then click the Key Settings button on the bottom to open a new window. Adjust the sequence as desired.
  6. Click OK to close the Regional Options windows.
    NOTE: If you close a window without clicking OK, none of the settings will be changed.
  7. You can also minimise the language bar, so that it is merely an icon on your task bar.  Do this by clicking the horizontal line at the top right of the language bar.  You change language by simply clicking on the minimised icon.  This interferes a lot less with your work.

You are now ready to input Syriac text.

1.2  Get the Meltho fonts and macros

This is about getting the right fonts.  It will apply also to Arabic and Hebrew, but I have no information on what fonts should be obtained.  So what follows is mainly about Syriac.

To enter Syriac text you will need a Windows font that (a) is a unicode font and (b) includes characters for the Syriac letters.  Some people imagine that all unicode fonts include all the symbols defined for unicode, but this is not so.  Only a few (e.g. Titus Cyberbit, Arial Unicode MS) include these.  For Syriac the actual letters can appear in three different forms anyway, depending on the script: the ancient Estrangelo script (as in the Estrangelo Edessa font, which ships with Windows XP -- use charmap to browse the characters that this font contains), or the later West Syrian or East Syrian scripts.  The letter Alap means the same in all three scripts (=A), and has the same code in unicode (known as a "code point"), but looks rather different in each.

A pack of fonts is available free online.  These are the "Meltho" fonts.  

1.3 Microsoft Visual Keyboard

The Microsoft Visual Keyboard is a utility which allows you to view the keyboard layout for each Input Locale within Microsoft Office applications.  You will find it most useful to see what key will give what result.

You can download the utility onto your own computer from http://office.microsoft.com/downloads/2002/VkeyInst.aspx. Follow the posted instructions to install and use.

The Visual Keyboard can be opened from Start » All programs » Microsoft Office Tools » Microsoft Visual Keyboard. Switch to the appropriate keyboard in the Language Bar to see its layout. Keys highlighted in white are typically "hot keys" for adding accents.

The image below shows a sample layout window of a Hebrew keyboard as seen in the Microsoft Visual Keyboard.

[Screenshot: XP Visual Keyboard]

It also allows you to enter text by hitting those keys.

1.4 Entering Syriac text right-to-left in Microsoft Word

Start up Word.  You should get an extra menu item "Meltho".  (If you do not, then probably your security settings for macros are too high -- put them down to medium).

Now change your language to "Syriac" by left-clicking on the language bar hovering at the top right, and the keyboard to "Syriac phonetic" by left-clicking on the bit of the language bar to the right of the keyboard icon.

Now change the font in Word to Estrangelo Edessa, font size to 20.  Click on the page, and hit MALKA.  This should appear, right-to-left (i.e. as AKLAM).  If it does not, you've done something wrong; go back and recheck your steps.

You can repeat this exercise in Notepad as well, and it will work.

1.5 Keystrokes for Syriac phonetic keyboard for Serto Jerusalem font

The Meltho macros show layouts for the Estrangelo consonants, but most of us learn Serto first.  Here are the key-mappings (case is important):

Consonants
  alaph   a        mim     m
  beth    b        nun     n
  gamal   g        semkat  s
  dalat   d        ayin    i
  he      h        pe      p
  waw     w        tsade   x
  zayn    z        qop     q
  het     ;        resh    r
  tet     t        shin    v
  yod     y        taw     j
  kaph    k
  lamad   l

Vowels          (above)  (below)
  ptaha  (a)       Q        A
  zqafa  (ā)       W        S
  rhboso (e)       E        D
  hsoso  (i)       R        F
  usoso  (u)       T        G

Other marks
  seyame            I
  qussaye           P
  rukkaka           :
  underscore        L
  full stop         .
  end of paragraph  ,

In Word, you can change keyboard from English to Syriac mid-sentence and the direction of the characters will reverse.  You can then change back and forth as many times as you like.

1.6 Problems

1.6.1  It doesn't seem to work in Word 2000 on Windows XP.

Correct.  It won't.  You must use Word XP.

1.6.2  It works in Word XP on Windows XP but I can't see any diacritics

Mark Dickens writes:

I've discovered what the problem is. My wife had said to me, "It's probably some little box you haven't ticked somewhere" and indeed it was. So, FYI (in case you run across this problem again), in Word XP (and I assume later versions of Word, but not in Word 2000 or earlier versions), under Tools/Options there is a tab called Complex Scripts. On that tab, there is a section Show and under that is a box for Diacritics to select. Once I selected it, my diacritics are showing up fine. 

As to how it got unselected or whether Microsoft Office just assumed I wouldn't want to see my diacritics and so installed itself with that box unselected, I will never know. One of life's little mysteries...


2.  Using right-to-left unicode text in a Windows application

This is actually simple, but you will not find this out from any other source on the net.  There are also some real limits on what you can do.

2.1  A RichTextBox will not work!

Trust me on this.  You can spend as much time as you like on this, but you will NEVER be able to get a RichTextBox to support right-to-left text entry and display as Word and Notepad do.  It doesn't work.  You can set the "right to left" property, you can set "right align", you can mirror the control (which also doesn't work); nothing will make any difference.

Solution: use the TextBox control, with Multiline = True.  This DOES work.
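
As a rough illustration of that solution, here is a minimal sketch of a form holding one multi-line TextBox set up for right-to-left entry.  The class name RtlDemoForm and the field name SyriacBox are made up for this example, and it assumes the Estrangelo Edessa font is installed (Windows substitutes another font if it is not).  It is not taken from the sample project.

```vbnet
Imports System
Imports System.Drawing
Imports System.Windows.Forms

'-- Hypothetical demo form: one multi-line TextBox that accepts RTL text
Public Class RtlDemoForm
    Inherits Form

    Private ReadOnly SyriacBox As New TextBox()

    Public Sub New()
        Me.Text = "RTL TextBox sketch"
        Me.ClientSize = New Size(400, 300)

        '-- A plain TextBox, not a RichTextBox
        SyriacBox.Multiline = True
        '-- Right-to-left reading order and alignment
        SyriacBox.RightToLeft = System.Windows.Forms.RightToLeft.Yes
        SyriacBox.ScrollBars = System.Windows.Forms.ScrollBars.Vertical
        SyriacBox.Dock = DockStyle.Fill
        '-- Assumption: this font is installed (it ships with Windows XP)
        SyriacBox.Font = New Font("Estrangelo Edessa", 20.0F)

        Me.Controls.Add(SyriacBox)
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New RtlDemoForm())
    End Sub
End Class
```

Switch the language bar to Syriac, click in the box and type: the text should run from the right-hand edge, as it does in Notepad.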

2.2  Mirroring

Some controls can be flipped so that they display completely right-to-left.  This isn't hard to do in VB.NET either.  Microsoft document this here:

Here are the controls which do allow mirroring:

Control      Should not allow layout inheritance
ListView     No
Panel        Yes
StatusBar    Yes
TabControl   Yes
TabPage      Yes
ToolBar      No
TreeView     No
Form         Yes
Splitter     Yes

To mirror a form, add this to the top of the form.vb file:

Public Class Form1
    Inherits System.Windows.Forms.Form

    Const WS_EX_LAYOUTRTL As Integer = &H400000
    Const WS_EX_NOINHERITLAYOUT As Integer = &H100000

    Protected Overrides ReadOnly Property CreateParams() As System.Windows.Forms.CreateParams
        Get
            Dim CP As System.Windows.Forms.CreateParams = MyBase.CreateParams
            '-- Leave the form unmirrored in the designer
            If Not MyBase.DesignMode Then
                CP.ExStyle = CP.ExStyle Or WS_EX_LAYOUTRTL
                '-- To stop child controls inheriting the mirroring, use instead:
                '-- CP.ExStyle = CP.ExStyle Or WS_EX_LAYOUTRTL Or WS_EX_NOINHERITLAYOUT
            End If
            Return CP
        End Get
    End Property

    '-- rest of code below

WS_EX_NOINHERITLAYOUT stops child controls from inheriting the mirroring.  I have it commented out here, since I wanted to see the reversed buttons, and not just a reversed titlebar on the form.

To mirror a control, you need to subclass it, and add something similar at the top.  I created a myRichTextBoxClass1.vb, which started:

Imports System.ComponentModel

    Public Class MyRichTextBoxClass1
    Inherits System.Windows.Forms.RichTextBox

    Const WS_EX_LAYOUTRTL As Integer = &H400000
    Const WS_EX_NOINHERITLAYOUT As Integer = &H100000

    Private _mirrored As Boolean = False

    <Description("Change to the right-to-left layout."), _
        DefaultValue(False), Localizable(True), _
        Category("Appearance"), Browsable(True)> _
        Public Property Mirrored() As Boolean
        Get
            Return _mirrored
        End Get

        Set(ByVal Value As Boolean)
            If _mirrored <> Value Then
                _mirrored = Value
                MyBase.OnRightToLeftChanged(EventArgs.Empty)
            End If
        End Set

    End Property

    Protected Overrides ReadOnly Property CreateParams() _
        As System.Windows.Forms.CreateParams
        Get
            Dim CP As System.Windows.Forms.CreateParams = _
                MyBase.CreateParams
            If Mirrored Then
                CP.ExStyle = CP.ExStyle Or WS_EX_LAYOUTRTL 'Or _
                'WS_EX_NOINHERITLAYOUT
            End If
            Return CP
        End Get
    End Property

    ' Rest of control code here

While this does NOT work for RichTextBoxes, since they won't mirror, it would work perfectly well for a TreeView control.

2.3 Demo of VB.NET 2003 application

A .zip of the project and all its files is here.  This will also work with VB.NET 2005, which is available for free download from the Microsoft site.

2.4 Handling Unicode in VB.NET

Just assign the value in the textbox to a string.  It will work fine!  You can split it into characters using ToCharArray().  Here are some sample bits of code:

    Public AscArray() As Char = {"A", "B", "G", "D", "H", "W", "Z", _
       "h", "t", "Y", "K", "L", "M", "N", "S", _
        "E", "P", "z", "Q", "R", "s", "T", _
        "'", ",", "*"}
    Public IntArray() As Integer = {&H710, &H712, &H713, &H715, &H717, &H718, &H719, _
       &H71A, &H71B, &H71D, &H71F, &H720, &H721, &H722, &H723, _
       &H725, &H726, &H728, &H729, &H72A, &H72B, &H72C, _
       &H741, &H742, &H308}

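    '-- Take an ascii char in sedra encoding and return a syriac code point
    Private Function AscToSyriac(ByVal ch As Char) As Integer

        Dim i As Integer

        '-- Try an exact match first: the table is case-sensitive
        '-- ("H" is he but "h" is het), so do not upper-case up front
        For i = 0 To AscArray.Length - 1
            If AscArray(i) = ch Then
                Return IntArray(i)
            End If
        Next i

        '-- Otherwise retry in upper case, so that "a", "b", "g" etc still match
        Dim upper As Char = Char.ToUpper(ch)
        For i = 0 To AscArray.Length - 1
            If AscArray(i) = upper Then
                Return IntArray(i)
            End If
        Next i

        '-- If drops through
        Return AscW(ch)

    End Function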

    '-- Take a syriac code point and return a sedra ascii character
    Private Function SyriacToAsc(ByVal ch As Integer) As Char

        Dim i As Integer
        For i = 0 To IntArray.Length - 1
            If IntArray(i) = ch Then
                Return AscArray(i)
            End If
        Next i

        '-- If drops through
        Return ChrW(ch)

    End Function

    Public Sub dumpUnicode(ByVal mystr As String)

        Dim i As Integer
        For i = 0 To mystr.Length - 1
            Dim lstr As String = mystr.Substring(i, 1)
            BottomBox.Text = BottomBox.Text & IIf(i = 0, "", vbCrLf) & "&H" & Hex(AscW(lstr))
        Next i

    End Sub

You work out what the hex code for the character is using charmap, and then need some way to map an integer containing that value (I specify these in hex, since that is what charmap gives me:  &H0701 = hex 0701) to an ASCII character.  In the above example, I have two parallel arrays, one of ASCII characters and one of code points, and use these to convert from one to the other.  Then internally I just process all the unicode characters as ASCII, and convert them back when the time comes to display.
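
The same two-way mapping can also be sketched with a pair of lookup tables instead of parallel arrays, converting whole strings at a time.  This is only an illustration: the module name SedraMapSketch and its functions are invented here, and the table holds just a handful of the pairs from the arrays above.  Hashtable is used so that it compiles under both VB.NET 2003 and 2005.

```vbnet
Imports System
Imports System.Collections
Imports System.Text
Imports Microsoft.VisualBasic

'-- Hypothetical helper: ASCII <-> Syriac conversion via two lookup tables
Public Module SedraMapSketch

    Private ReadOnly ToSyriac As New Hashtable()
    Private ReadOnly ToAscii As New Hashtable()

    Sub New()
        '-- A few pairs only, copied from AscArray / IntArray above
        AddPair("A"c, &H710)
        AddPair("B"c, &H712)
        AddPair("K"c, &H71F)
        AddPair("L"c, &H720)
        AddPair("M"c, &H721)
    End Sub

    Private Sub AddPair(ByVal ascii As Char, ByVal codePoint As Integer)
        ToSyriac(ascii) = ChrW(codePoint)
        ToAscii(ChrW(codePoint)) = ascii
    End Sub

    '-- ASCII to Syriac; unknown characters pass through unchanged
    Public Function Decode(ByVal source As String) As String
        Return Translate(source, ToSyriac)
    End Function

    '-- Syriac to ASCII; unknown characters pass through unchanged
    Public Function Encode(ByVal source As String) As String
        Return Translate(source, ToAscii)
    End Function

    Private Function Translate(ByVal source As String, ByVal table As Hashtable) As String
        Dim sb As New StringBuilder(source.Length)
        For Each ch As Char In source
            If table.ContainsKey(ch) Then
                sb.Append(DirectCast(table(ch), Char))
            Else
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function

End Module
```

For instance, Decode("MLKA") gives the four code points &H721, &H720, &H71F, &H710, and Encode() of that result gives "MLKA" back.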

Split a string separated by spaces and @'s :

Dim split As String() = mytext.Split(New Char() {" "c, "@"c})

StrReverse() works fine on the unicode string, although you may get a shock when it is displayed if the unicode character has more than one appearance depending on what is to the left and right of it (as is the case for Syriac, and Arabic and all these languages with characters joined together): this is one reason why you cannot fake RTL in the RichTextBox.

ChrW will convert your integer representation of a character back to a char containing the unicode symbol.  Display it in the TextBox using

TextBox1.Text = TextBox1.Text & mychar & vbCrLf

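If you use the TextBox to handle RTL, the strings will internally all be in logical order, i.e. the order in which the characters were typed, with the first character at index 0 just as for LTR text.  So you won't need to do anything special; the RTL will be transparent to your code.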
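
A small sketch of what that means in practice: the consonants of "malka" are built here from their code points in typing order (mim, lamad, kaph, alaph) and then listed by index.  The module and function names are invented for this example.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LogicalOrderSketch

    '-- Returns the hex code point of each character, in storage order
    Function CodePoints(ByVal s As String) As String()
        Dim result(s.Length - 1) As String
        For i As Integer = 0 To s.Length - 1
            result(i) = Hex(AscW(s.Chars(i)))
        Next
        Return result
    End Function

    Sub Main()
        '-- mim, lamad, kaph, alaph: in the order they would be typed
        Dim malka As String = ChrW(&H721) & ChrW(&H720) & ChrW(&H71F) & ChrW(&H710)
        '-- prints: 721 720 71F 710
        Console.WriteLine(String.Join(" ", CodePoints(malka)))
    End Sub

End Module
```

Index 0 holds the mim, even though the TextBox draws it at the right-hand end of the word.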


3.  Syriac automatic transcription utility

I have adapted the above example to take Syriac text right-to-left in the left hand box and transcribe it in the right-hand box left-to-right.  It requires the Meltho fonts for the left-hand window (using Serto Jerusalem) and the free Titus Cyberbit Basic font for the right hand one (only so that it can handle any untranslated Syriac codes and also handle shin - s with a caron over it).

3.1 Install

Install the utility (it's very simple) under XP by downloading and double-clicking on setup.msi. A more sophisticated version of this with a different interface can be downloaded as setupqs.msi

3.2 Source code

Here's the code (painting the form, assigning the font to each window, etc all being done by clicking items in the IDE)

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim i As Integer
        Dim x As String
        Dim y As String
        x = TextBox1.Text
        y = ""

        Dim chararray As Char()
        chararray = x.ToCharArray()

        '-- Walk around string and process character in turn
        For i = 0 To chararray.Length - 1
            ' display each character in hex
            'MsgBox(Hex(AscW(chararray(i))))

            '-- handle underscore for unpronounced letters as brackets
            '-- look ahead one character to see if there is an underscore under the current character
            If i < chararray.Length - 1 Then
                If AscW(chararray(i + 1)) = &H331 Then
                    y = y + "("
                End If
            End If

            '-- Process current character
            y = y + cvt(AscW(chararray(i)))
        Next i

        RichTextBox1.Text = y
    End Sub

    Function cvt(ByVal c As Integer) As Char
        Select Case c
            Case &H710
                cvt = ChrW(&H2019)    '-- alap
            Case &H712
                cvt = "b"
            Case &H713
                cvt = "g"
            Case &H715
                cvt = "d"
            Case &H717
                cvt = "h"
            Case &H718
                cvt = "w"
            Case &H719
                cvt = "z"
            Case &H71A
                cvt = ChrW(&H1E25)    '-- het
            Case &H71B
                cvt = ChrW(&H1E6D)    '-- tet
            Case &H71D
                cvt = "y"
            Case &H71F
                cvt = "k"
            Case &H720
                cvt = "l"
            Case &H721
                cvt = "m"
            Case &H722
                cvt = "n"
            Case &H723
                cvt = "s"
            Case &H725
                cvt = ChrW(&H2018)    '-- ayin
            Case &H726
                cvt = "p"
            Case &H728
                cvt = ChrW(&H1E63)    '-- tsade
            Case &H729
                cvt = "q"
            Case &H72A
                cvt = "r"
            Case &H72B
                cvt = ChrW(&H161)    '-- shin
            Case &H72C
                cvt = "t"
            Case &H730, &H731
                cvt = "a"
            Case &H733, &H734
                cvt = ChrW(&H101)    '-- zqafa
            Case &H736, &H737
                cvt = "e"   '-- rhbasa
            Case &H73A, &H73B
                cvt = "i"   '-- hbasa
            Case &H73D, &H73E
                cvt = "u"   '-- esasa/usoso
            Case &H331
                cvt = ")"
            Case Else  '-- passthrough
                cvt = ChrW(c)
        End Select
    End Function
End Class

There is, thus, very little to it other than a bit of look-ahead for the underscore.

3.3 Complete VB.NET 2005 Project

Download syriactranscription.zip.


4.  Changing keyboard in your application

It is quite likely that you will only want to enter Syriac text in certain boxes in your application, while still using English menus and entering English text in other boxes.  Your user will get very tired very quickly of changing keyboard, so you must handle this for them.

Note:  I have been unable to get this to work with more than one Syriac keyboard installed.

I created a class, clsPlatform, to handle this.  I ran its KeyboardCheck() method when the program started; it stores the keyboard id's of the 'Original language' and of the Syriac language.  The public string ErrorMessage is set if there was an error.

The class has two more methods, ActivateOriginalKeyboard() and ActivateSyriacKeyboard(), which I called from GotFocus() events in my code (i.e. whenever a user clicked on a box to enter text, I called one of these).

Public Class clsPlatform

    '-- Win32 declarations.  What was a Long in VB6 is an Integer (32 bits) in VB.NET,
    '-- so handles, BOOLs and flags are declared As Integer.  This assumes 32-bit Windows.
    Declare Function GetKeyboardLayoutList Lib "user32" (ByVal nBuff As Integer, ByRef lpList As Integer) As Integer
    Declare Function ActivateKeyboardLayout Lib "user32" (ByVal HKL As Integer, ByVal flags As Integer) As Integer
    Declare Function GetLocaleInfo Lib "kernel32" Alias "GetLocaleInfoA" (ByVal Locale As Integer, ByVal LCType As Integer, ByVal lpLCData As String, ByVal cchData As Integer) As Integer
    Declare Function IsValidLocale Lib "kernel32" (ByVal Locale As Integer, ByVal dwFlags As Integer) As Integer
    Const LOCALE_SENGCOUNTRY As Integer = &H1002 '// English name of country
    Const LOCALE_SLANGUAGE As Integer = &H2  'localized name of language
    Const LCID_INSTALLED As Integer = &H1 '-- is locale present?

    Declare Function GetKeyboardLayout Lib "user32" (ByVal dwLayout As Integer) As Integer
    Declare Function GetKeyboardLayoutName Lib "user32" Alias "GetKeyboardLayoutNameA" (ByVal pwszKLID As String) As Integer

    Public ErrorMessage As String

    '-- Store these so can use when switching in editors
    Public OriginalKeyboardCode As Long = 0
    Public SyriacKeyboardCode As Long = 0


    Public Function KeyboardCheck() As ArrayList
        '-- Make sure Syriac installed.  We do not care which Syriac keyboard the user uses

        Dim rc As Long
        Dim i As Integer

        Dim lLayouts(50) As Integer
        Dim retval As New ArrayList
        Dim buf As String = "                                 "
        Dim layout As String
        Dim SyriacFound As Boolean = False

        ErrorMessage = ""

        'Save current configuration
        OriginalKeyboardCode = GetKeyboardLayout(0)
        rc = GetKeyboardLayoutName(buf)
        layout = buf.Substring(0, 8)
        Dim layoutCode As String = "&H" & layout  '-- a long
        Dim layoutName As String = getLocale(layoutCode, LOCALE_SLANGUAGE)
        If layoutName.Contains("English") = False Then
            MsgBox("Your keyboard is currently not set to English, but to " + layoutName _
            + ".  It will be reset to English.", MsgBoxStyle.Exclamation)
        End If

        'Get the first 50 supported keyboard layouts (50 is max supported for now)
        rc = GetKeyboardLayoutList(50, lLayouts(0))
        'Loop through all the keyboard layouts
        'Ignore the first one on 0 which is negative
        For i = 0 To UBound(lLayouts)
            If lLayouts(i) = 0 Then   '-- all entries beyond those installed are 0
                Exit For
            End If
            '--Activate the keyboard layout and get its name
            rc = ActivateKeyboardLayout(lLayouts(i), 0)
            rc = GetKeyboardLayoutName(buf)
            '-- This returns a long, i.e. 8 digits.  The first 4 are something else.
            '-- The second 4 are the locale id.
            '-- Note that 0,8 gives a long, but getLocaleInfo only takes an int
            layout = buf.Substring(4, 4)
            layoutCode = "&H" & layout
            If IsValidLocale(layoutCode, LCID_INSTALLED) = 0 Then
                MsgBox("invalid locale " + i.ToString + " " + layoutCode)
            End If
            layoutName = getLocale(layoutCode, LOCALE_SLANGUAGE)
            MsgBox(i.ToString + " " + Hex(lLayouts(i)) + vbCrLf + buf.Substring(0, 4) + " " + layoutCode + vbCrLf + layoutName)
            If layoutName.Contains("Syriac") Then
                SyriacKeyboardCode = lLayouts(i)
                SyriacFound = True
            End If
            If layoutName.Contains("English") And OriginalKeyboardCode = 0 Then
                OriginalKeyboardCode = lLayouts(i)
            End If
            retval.Add(layoutCode + ":" + layoutName)
        Next i

        'Restore current configuration
        ActivateOriginalKeyboard()

        If SyriacFound = False Then
            ErrorMessage = "The Syriac language and phonetic keyboard are not installed on your PC.  Please correct this."
        End If

        Return retval
    End Function

    Private Function getLocale(ByVal m_LocaleLCID As Long, ByVal reqInfo As Integer) As String
        Dim Buffer As String = "                                                              "
        If GetLocaleInfo(m_LocaleLCID, reqInfo, Buffer, Buffer.Length) = 0 Then
            MsgBox("Unable to get locale info")
        End If
        getLocale = StripNull(Buffer)
    End Function

    Private Function StripNull(ByVal StrIn As String) As String
        Dim nul As Long
        nul = InStr(StrIn, vbNullChar)
        Select Case nul
            Case Is > 1
                StripNull = Left$(StrIn, nul - 1)
            Case 1
                StripNull = ""
            Case 0
                StripNull = Trim$(StrIn)
            Case Else
                StripNull = StrIn
        End Select

    End Function

    Private Function LoWord(ByVal wParam As Long) As Integer
        If wParam And &H8000& Then
            LoWord = &H8000& Or (wParam And &H7FFF&)
        Else
            LoWord = wParam And &HFFFF&
        End If
    End Function

    Public Function ActivateOriginalKeyboard() As Long
        If OriginalKeyboardCode = 0 Then Return -1
        Return (ActivateKeyboardLayout(OriginalKeyboardCode, 0))
    End Function

    Public Function ActivateSyriacKeyboard() As Long
        If SyriacKeyboardCode = 0 Then Return -1
        Return (ActivateKeyboardLayout(SyriacKeyboardCode, 0))
    End Function

End Class
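
For comparison, the .NET Framework also exposes keyboard switching through the System.Windows.Forms.InputLanguage class, without any API declarations.  The following is only a sketch of that alternative, not the code I used: the class name KeyboardSwitcher and its members are invented for this example, it hooks the Enter and Leave events rather than GotFocus, and it simply takes the first installed input language whose culture reports the ISO code "syr".  I have not checked how it behaves with more than one Syriac keyboard installed.

```vbnet
Imports System
Imports System.Windows.Forms

'-- Hypothetical helper: switches to Syriac while a given TextBox has the focus
Public Class KeyboardSwitcher

    '-- The input language in force when the helper was created
    Private ReadOnly original As InputLanguage = InputLanguage.CurrentInputLanguage
    '-- Nothing if no Syriac input language is installed
    Private ReadOnly syriac As InputLanguage = FindSyriac()

    Private Shared Function FindSyriac() As InputLanguage
        For Each lang As InputLanguage In InputLanguage.InstalledInputLanguages
            If lang.Culture.ThreeLetterISOLanguageName = "syr" Then
                Return lang
            End If
        Next
        Return Nothing
    End Function

    Public ReadOnly Property SyriacAvailable() As Boolean
        Get
            Return Not syriac Is Nothing
        End Get
    End Property

    '-- Call once for each TextBox that should take Syriac input
    Public Sub Attach(ByVal box As TextBox)
        AddHandler box.Enter, AddressOf OnEnterSyriacBox
        AddHandler box.Leave, AddressOf OnLeaveSyriacBox
    End Sub

    Private Sub OnEnterSyriacBox(ByVal sender As Object, ByVal e As EventArgs)
        If Not syriac Is Nothing Then
            InputLanguage.CurrentInputLanguage = syriac
        End If
    End Sub

    Private Sub OnLeaveSyriacBox(ByVal sender As Object, ByVal e As EventArgs)
        InputLanguage.CurrentInputLanguage = original
    End Sub

End Class
```

In a form's constructor you would create one KeyboardSwitcher, check SyriacAvailable, and call Attach() for each Syriac box; every other control keeps the original keyboard.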

Another approach to this is to find out if a complex script is installed using the IsValidLocale function with the LCID_INSTALLED flag on any locale that requires complex script support, such as:

BOOL fComplexScripts = IsValidLocale(LANG_HEBREW, LCID_INSTALLED);
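
The line above is C.  A rough VB.NET equivalent, reusing the IsValidLocale declaration from clsPlatform, might look like the sketch below.  The module name is invented, and the locale identifier &H45A (Syriac, Syria) is my assumption for the locale to test; any locale that needs complex script support should do.

```vbnet
Imports System

Module ComplexScriptCheck

    Declare Function IsValidLocale Lib "kernel32" (ByVal Locale As Integer, ByVal dwFlags As Integer) As Integer

    Const LCID_INSTALLED As Integer = &H1   '-- is locale present?
    Const LCID_SYRIAC As Integer = &H45A    '-- assumption: Syriac (Syria)

    '-- True if Windows reports the Syriac locale as installed
    Function ComplexScriptsInstalled() As Boolean
        Return IsValidLocale(LCID_SYRIAC, LCID_INSTALLED) <> 0
    End Function

    Sub Main()
        Console.WriteLine("Complex scripts installed: " & ComplexScriptsInstalled().ToString())
    End Sub

End Module
```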

A table of language identifiers is linked below.  You can find the keyboard codes on your own machine by using regedit to look under HKEY_CURRENT_USER\Keyboard Layout, in the Preload and Substitutes keys.

Links

http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/VB_Controls/Q_21208243.html?query=Regional+Options&topics=94
http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/VB_Controls/Q_21409380.html
http://custom.programming-in.net/articles/art9-2.asp?lib=user32.dll (reference for vb.net calls; what was a long in VB6 is an integer in VB.NET)
http://vbnet.mvps.org/index.html?code/locale/localecountry.htm  (stuff on country info) 
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_238z.asp - table of language identifiers.



Constructive feedback is welcomed to Roger Pearse.

Written 30th August 2006.
Updated 28th December 2006 with key strokes for Syriac and transcription utility.
Updated 12th January 2007 with minimising language bar and changing keyboard in your code.

This page has been online since 30th August 2006.
