SamC
White Water Yakist
3467 Posts |
Posted - 2003-05-08 : 16:00:52
I'm retrieving an adVarWChar value using an ADO .Parameter. It seems to be getting the data correctly, but I don't recall that ASP variables support Unicode - or do they? It seems that ASP character strings are one type only - 1 byte per character. If I retrieve a Unicode string

strMyHeader = .Parameters("@MyHeader")

is there some special treatment other than

Response.Write strMyHeader

to properly format the output as Unicode characters? Note: I'm using the UTF-8 character set in my HTML page, and static Unicode characters display properly.

Sam
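For reference, a minimal sketch of the retrieval being described, assuming a hypothetical stored procedure GetHeader with an NVARCHAR output parameter @MyHeader (the procedure name, connection string, and parameter size are made up for illustration):

<%
' Sketch only: names and connection string are placeholders.
Const adCmdStoredProc = 4
Const adVarWChar = 202
Const adParamOutput = 2

Dim cn, cmd, strMyHeader
Set cn = Server.CreateObject("ADODB.Connection")
cn.Open "Provider=SQLOLEDB;Data Source=(local);Initial Catalog=MyDb;Integrated Security=SSPI"

Set cmd = Server.CreateObject("ADODB.Command")
Set cmd.ActiveConnection = cn
cmd.CommandText = "GetHeader"
cmd.CommandType = adCmdStoredProc
' adVarWChar keeps the value as a wide (Unicode) string end to end
cmd.Parameters.Append cmd.CreateParameter("@MyHeader", adVarWChar, adParamOutput, 255)
cmd.Execute

' VBScript strings are Unicode internally, so the variable holds the
' characters intact; the conversion to bytes only happens on output.
strMyHeader = cmd.Parameters("@MyHeader").Value
Response.Write strMyHeader

cn.Close
%>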
SamC
White Water Yakist
3467 Posts |
Posted - 2003-05-08 : 17:47:55
|
I'm not at my workstation, but I think the answer may be:

Response.Write Server.URLEncode(strMyHeader)

Sam
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-02 : 13:42:58
|
I found the answer to writing Unicode properly from ASP... codepage.

Sam
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-02 : 13:53:02
|
Hey Sam, I am having a similar problem. What codepage property did you change? I tried Response.CodePage (not available in Win2k), Session.CodePage, and the AspCodePage metabase property of the application. None worked. Any helpful pointers?

Owais
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-02 : 14:15:15
|
Well, first, you've got to know the codepage mapping values. Then I used

<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>

65001 is the codepage for UTF-8 (Unicode). Of course, stored procedure parameters carrying Unicode must be NVARCHAR, and ADO must declare these parameters as adVarWChar.

I'd like to hear from you if you have any new discoveries using Unicode (or other) languages. I'm still on my learning curve myself. I'm using a meta tag declaring charset=UTF-8, which works very well for lots of languages. I'm happily surprised that Netscape 4.x browsers render the UTF-8 page (after an unexpected automatic refresh).

I thought Arnold got stuck at this point too. Arnold?

Sam
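To make that concrete, a minimal page skeleton along the lines described above (a sketch only; strMyHeader is assumed to have been fetched through an adVarWChar parameter as in the earlier example):

<%@ LANGUAGE="VBSCRIPT" CODEPAGE="65001" %>
<%
' CODEPAGE=65001 makes ASP translate Response.Write output to UTF-8.
' Response.CharSet and the meta tag only label the output so the
' browser knows how to decode it.
Response.CharSet = "utf-8"
%>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
  <% Response.Write strMyHeader %>
</body>
</html>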
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-02 : 14:53:50
|
Thanx for the input, Sam... actually the story goes something like this: we have been developing bilingual web applications (English/Arabic), and so far the only way we have been able to get the web server to interpret Arabic characters coming from the database correctly is to set the default codepage of the web server OS to Arabic (CP 1256). Of course, that doesn't really make sense: what's the point of being able to install multiple codepages if you can't use more than one at a time? Web server to browser is not a problem, just DB server to web server. So I was toying with all these settings to see if I could get it to work!

Incidentally, when I started working with Visual Studio .NET, I found that ASP.NET automatically detected incoming Arabic characters and switched everything to Unicode, including codepage and charset. Since then I have been trying to replicate the same behavior in classic ASP with no luck. It has to be some elusive setting somewhere!

Owais
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-02 : 15:08:40
|
I'd suggest you try a test page that uses UTF-8 with the 65001 codepage. UTF-8 supports Arabic, Japanese, and many other character sets, so there would be no need to select the proper codepage for a given character set.

Sam

PS: I'd bet a SQLTeam beer that .NET uses 65001 / UTF-8.
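One way to run that test without involving the database at all is to hard-code a few non-Latin characters with ChrW and see whether they survive the trip to the browser (a sketch; the sample codepoints spell Arabic "marhaba" and Japanese "Nihon"):

<%@ LANGUAGE="VBSCRIPT" CODEPAGE="65001" %>
<%
' Self-contained test page: no database, so any garbling must come
' from the ASP-to-browser leg.
Response.CharSet = "utf-8"
Dim s
s = ChrW(&H645) & ChrW(&H631) & ChrW(&H62D) & ChrW(&H628) & ChrW(&H627)   ' Arabic
s = s & " / " & ChrW(&H65E5) & ChrW(&H672C)                               ' Japanese
Response.Write s
%>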
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-03 : 14:30:40
|
Thanks for the tips, Sam. I changed the column datatypes to nvarchar, and the encoding of the .asp file to UTF-8, and voila!! Everything is Unicode, and shows up perfectly in the browser. Now I just need to convert 200 column definitions and 300 stored procs. Thanx, pal!

Owais
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-03 : 14:37:37
|
So your column definitions are VARCHAR now? And you can store Arabic in 1-byte character form? Hmm...

Converting columns to NVARCHAR will require more table space. Unicode characters take from 1 to 6 bytes each. I'm not sure how SQL handles variable character lengths when allocating space.

Sam
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-03 : 14:47:14
|
Yeah, all columns are varchar currently, and it works fine. That's because the database collation is set to SQL_Latin_1256_CS_AI (which is Arabic, CP 1256). And unlike Japanese, Arabic has few enough characters to fit within 255 slots. I don't think space will be a major issue here; we don't have that much data. I just hope my boss doesn't ask me to develop German, Japanese, Korean, etc. versions.

Owais
 |
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2003-06-04 : 05:43:52
|
quote: UNICODE characters take from 1 to 6 bytes each. I'm not sure how SQL handles variable character lengths when allocating space..
Not quite. Unicode has somewhat over a million codepoints -- numbers that can be given to a character. At present, around 100,000 are assigned to characters. In order to store Unicode characters, there are several character encodings that can be used.

UTF-32 represents any codepoint as a 32-bit value (4 bytes). It is used in some APIs, but rarely in storage.

UTF-16 represents codepoints from U+0000 to U+FFFF as 16-bit values (2 bytes) and U+10000 to U+10FFFF as two 16-bit values (4 bytes). Data stored in SQL Server in nvarchar (etc.) columns should be encoded as UTF-16.[1]

UTF-8 represents codepoints from U+00 to U+7F as 8-bit values (1 byte, maintaining ASCII compatibility) and other codepoints as between 2 and 6 bytes. UTF-8 is the preferred encoding for XML (and consequently XHTML).

Presumably, ASP should be able to handle the conversion from the UTF-16 encoding of nvarchar to UTF-8 on web pages automatically -- I'm not an ASP user, so I don't know.

[1] Actually, that's a bit misleading. As long as you work within the BMP (codepoints from U+0000 -- U+FFFF) you should be fine. However, SQL Server 2000 doesn't have any real concept of 4-byte character encodings, so it will treat UTF-16-encoded codepoints from U+10000 -- U+10FFFF as two characters, the first being in (what current versions of Unicode consider to be) the high surrogate range, the second in the low surrogate range. This might not be as bad as it sounds: characters didn't start getting assigned to codepoints beyond the BMP until Unicode 3.1. I don't know whether the ASP libraries would interpret UTF-16 surrogate pairs correctly or not.

Edited by - Arnold Fribble on 06/04/2003 06:20:00
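For anyone who wants to see those byte counts from ASP, here is a rough sketch that uses ADODB.Stream to convert a VBScript (UTF-16) string to UTF-8 and measure the result; the 3-byte subtraction accounts for the UTF-8 byte-order mark the Stream object writes:

<%
Function Utf8ByteCount(s)
    Dim stm
    Set stm = Server.CreateObject("ADODB.Stream")
    stm.Type = 2            ' adTypeText
    stm.Charset = "utf-8"
    stm.Open
    stm.WriteText s
    Utf8ByteCount = stm.Size - 3   ' subtract the 3-byte UTF-8 BOM
    stm.Close
End Function

Response.Write Utf8ByteCount("A") & "<br>"           ' 1 byte  (U+0041)
Response.Write Utf8ByteCount(ChrW(&H627)) & "<br>"   ' 2 bytes (Arabic alef, U+0627)
Response.Write Utf8ByteCount(ChrW(&H65E5)) & "<br>"  ' 3 bytes (CJK ideograph, U+65E5)
%>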
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-04 : 07:45:25
|
quote: UTF-8 represents codepoints from U+00 to U+7F as 8-bit values (1 byte, maintaining ASCII compatibility) and other codepoints as between 2 and 6 bytes. UTF-8 is the preferred encoding for XML (and consequently XHTML). Presumably, ASP should be able to handle the conversion from the UTF-16 encoding of nvarchar to UTF-8 on web pages automatically -- I'm not an ASP user, so I don't know.

Thanks for the correction, Arnold. UTF-8 (not Unicode) characters are 1 to 6 bytes each; I misquoted from this article.

quote: This might not be as bad as it sounds: characters didn't start getting assigned to codepoints beyond the BMP until Unicode 3.1. I don't know whether the ASP libraries would interpret UTF-16 surrogate pairs correctly or not.

Because ASP handles the mapping from SQL/UTF-16/NVARCHAR to the target page through the CODEPAGE=65001 declaration, the basis for translating UTF-16 surrogate pairs to UTF-8 is present. It would be interesting to run a test.

Sam
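One way that test might look: emit a character beyond the BMP as a UTF-16 surrogate pair and check what arrives at the browser (a sketch; U+1D11E, the musical G clef, is just a convenient supplementary-plane example):

<%@ LANGUAGE="VBSCRIPT" CODEPAGE="65001" %>
<%
' U+1D11E lies outside the BMP, so in a UTF-16 string it is built
' from the high/low surrogate pair D834 / DD1E.
Response.CharSet = "utf-8"
Dim gClef
gClef = ChrW(&HD834) & ChrW(&HDD1E)
Response.Write gClef
' If ASP pairs the surrogates correctly, the browser should receive the
' single 4-byte UTF-8 sequence F0 9D 84 9E (font support permitting).
%>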
 |