I've been researching Hebrew on the web for several months. A friend of mine at Hebrew College asked me to look at several URLs and figure out what he could do to put things online that would be equally accessible under Hebrew-enabled Macs or PCs (or, for that matter, Linux, Unix, whatever).
As folks who have done this for years know, this is messy. There are two general standards: the Windows way (charset=Windows-1255) and the supposedly standard way (charset=iso-8859-8). If you encode your pages as straight UTF-8, you can also take advantage of Unicode.
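To make the difference between those three labels concrete, here is a minimal Python sketch. The sample word and the variable names are illustrative only; cp1255, iso8859_8 and utf-8 are Python's built-in codec names for the three charsets. It encodes the same Hebrew word three ways and prints the resulting bytes.

```python
# Encode one Hebrew word under the three charsets mentioned above.
word = "\u05e9\u05dc\u05d5\u05dd"  # shalom: shin, lamed, vav, final mem

encoded = {
    "windows-1255": word.encode("cp1255"),
    "iso-8859-8": word.encode("iso8859_8"),
    "utf-8": word.encode("utf-8"),
}

for charset, data in encoded.items():
    # Show the byte count and the bytes in hex for each encoding.
    hex_bytes = " ".join(f"{b:02x}" for b in data)
    print(f"{charset:13} {len(data)} bytes  {hex_bytes}")
```

The two older charsets use one byte per letter, while UTF-8 uses two bytes per Hebrew letter, so the charset a page declares has to match the bytes that were actually saved.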
Last year I did some tests with my friend Jack Woehr and we discovered that if you really write Unicode, Hebrew displays fine on both Mac and PC using UTF-8. This year I got a quick project to get some Hebrew up on the web for "We are the future" and jumped in to see if I could find something simple. The results mostly work on PC, but there are some issues on the Mac, under OS X, using Safari.
The main text flows correctly - if you try to read this using Safari under OS X, you won't at first notice any problems. Then you'll note that the ambiguous characters (glyphs such as punctuation that could be placed differently depending on the language the browser thinks is the base for the current paragraph) end up in the wrong place.
Bidirectionality is a messy subject. The problem is that when you tell a browser that you are using UTF-8, for instance, it is easy for the browser not to be sure where to put punctuation marks: if the base language is English, say, then periods go on one side of the displayed sentence. If the base language is Hebrew, they go on the opposite side. On the pages I did for "We are the future" (see, for instance, www.wearethefuture.com/he/concert_about.html), everything looks fine on a PC using IE or Mozilla. But fire up OS X and take a look with Safari, and you see hyphens and periods that are placed wrong. Not fatal - at least the text flows nicely from right to left as it should - but not what I want.
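To see why punctuation is the part that goes astray, it may help to look at the bidirectional class Unicode assigns to each character. The sketch below uses Python's standard unicodedata module; the helper name bidi_classes and the sample string are hypothetical, chosen only for illustration. Letters carry a strong direction of their own, while periods and hyphens do not, so their placement depends on the base direction the browser settles on.

```python
import unicodedata

STRONG = {"L", "R", "AL"}  # bidi classes with an inherent direction


def bidi_classes(text):
    """Return (character name, bidi class, is_strong) for each character."""
    rows = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        rows.append((unicodedata.name(ch, "UNNAMED"), cls, cls in STRONG))
    return rows


sample = "a\u05d0.-"  # Latin a, Hebrew alef, period, hyphen
for name, cls, strong in bidi_classes(sample):
    label = "strong" if strong else "weak or neutral"
    print(f"{name:22} {cls:3} {label}")
```

An explicit dir attribute on the enclosing element is the usual way to hand the browser that base direction rather than leaving it to guess.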
Here is what I used:
- In the head of each document, I noted utf-8: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
- Paragraphs got styled thus: <p align="right" lang="HE" dir="RTL">. (Normally I'd put this in a style sheet, but this was a drop-in and I didn't want to isolate the pages I worked on from the general style sheet and any global changes. And the job was too fast to comfortably isolate the items in a local style sheet.)
- The Hebrew characters were encoded using some Microsoft(?) character set, as numeric references in the range (aleph to taf) &#1488; through &#1514;. I got this by saving a Word document containing Hebrew to HTML. I used Word 2000, under Windows NT, with Hebrew resources installed. (I don't think that what was saved is Unicode, which I recall having a different hex offset that is much larger - but I'm new to this and could be entirely wrong.)
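On the question of whether those numbers are Unicode, a quick check sketched in Python (variable names are illustrative) suggests they are. Numeric character references of the form &#N; give a code point in decimal, and 1488 through 1514 in decimal is 0x05D0 through 0x05EA in hex, which is where the Unicode Hebrew letters alef through tav sit.

```python
import html
import unicodedata

first, last = 1488, 1514  # decimal values from the saved Word HTML

for code in (first, last):
    ch = html.unescape(f"&#{code};")  # decode the reference to a character
    print(code, hex(code), f"U+{code:04X}", unicodedata.name(ch))

# Going the other way: turn Hebrew text into decimal references.
word = "\u05e9\u05dc\u05d5\u05dd"  # shin, lamed, vav, final mem
refs = "".join(f"&#{ord(c)};" for c in word)
print(refs)
```

Because the references name code points rather than bytes, they mean the same thing whatever charset the page itself declares.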
שלום
Your page www.wearethefuture.com/he/concert_about.html looks great in my Safari 2.0 under Mac OS 10.4 (i.e. Tiger) with text encoding set to UTF-8.
patrick iglesias-zemmour
iglesias@math.huji.ac.il
Thanks!