SamC
White Water Yakist
3467 Posts |
Posted - 2003-05-08 : 16:00:52
I'm retrieving an adVarWChar value using an ADO .Parameter. It seems to be getting the data correctly, but I don't recall that ASP variables support Unicode - or do they? It seems that ASP character strings are one type only - 1 byte per character. If I retrieve a Unicode string

strMyHeader = .Parameters("@MyHeader")

is there some special treatment other than

Response.Write strMyHeader

to properly format the output as Unicode characters? Note: I'm using the UTF-8 character set in my HTML page, and static Unicode characters display properly.

Sam
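For reference, a minimal sketch of the retrieval being described, assuming a hypothetical stored procedure GetHeader with an NVARCHAR output parameter @MyHeader (the procedure name, connection string, and parameter size are made up for illustration):

<%
' Sketch only: names and connection string are placeholders.
Const adCmdStoredProc = 4
Const adVarWChar = 202
Const adParamOutput = 2

Dim cn, cmd, strMyHeader
Set cn = Server.CreateObject("ADODB.Connection")
cn.Open "Provider=SQLOLEDB;Data Source=(local);Initial Catalog=MyDb;Integrated Security=SSPI"

Set cmd = Server.CreateObject("ADODB.Command")
Set cmd.ActiveConnection = cn
cmd.CommandText = "GetHeader"
cmd.CommandType = adCmdStoredProc
' adVarWChar keeps the value as a wide (Unicode) string end to end
cmd.Parameters.Append cmd.CreateParameter("@MyHeader", adVarWChar, adParamOutput, 255)
cmd.Execute

' VBScript strings are Unicode internally, so the variable holds the
' characters intact; the conversion to bytes only happens on output.
strMyHeader = cmd.Parameters("@MyHeader").Value
Response.Write strMyHeader

cn.Close
%>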
SamC
White Water Yakist
3467 Posts |
Posted - 2003-05-08 : 17:47:55
|
I'm not at my workstation, but I think the answer may be:

Response.Write Server.URLEncode(strMyHeader)

Sam
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-02 : 13:42:58
|
I found the answer to writing Unicode properly from ASP... codepage.

Sam
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-02 : 13:53:02
|
Hey Sam, I am having a similar problem. What codepage property did you change? I tried Response.CodePage (not available in Win2k), Session.CodePage, and the AspCodePage metabase property of the application. None worked. Any helpful pointers?

Owais
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-02 : 14:15:15
|
Well, first, you've got to know the codepage mapping values. Then I used

<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>

65001 is the codepage for UTF-8 (Unicode). Of course, stored procedure parameters carrying Unicode must be NVARCHAR, and ADO must declare these parameters as adVarWChar.

I'd like to hear from you if you have any new discoveries using Unicode (or other) languages. I'm still on my learning curve myself. I'm using a meta tag declaring charset=UTF-8, which works very well for lots of languages. I'm happily surprised that Netscape 4.x browsers render the UTF-8 page (after an unexpected automatic refresh).

I thought Arnold got stuck at this point too. Arnold?

Sam
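To make that concrete, a minimal page skeleton along the lines described above (a sketch only; strMyHeader is assumed to have been fetched through an adVarWChar parameter as in the earlier example):

<%@ LANGUAGE="VBSCRIPT" CODEPAGE="65001" %>
<%
' CODEPAGE=65001 makes ASP translate Response.Write output to UTF-8.
' Response.CharSet and the meta tag only label the output so the
' browser knows how to decode it.
Response.CharSet = "utf-8"
%>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
  <% Response.Write strMyHeader %>
</body>
</html>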
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-02 : 14:53:50
|
Thanx for the input, Sam... actually the story goes something like this: we have been developing bilingual web applications (English/Arabic), and so far the only way we have been able to get the web server to interpret Arabic characters coming from the database correctly is to set the default codepage of the web server OS to Arabic (CP 1256). Of course, that doesn't really make sense: what's the point of being able to install multiple codepages if you can't use more than one at a time? Web server to browser is not a problem, just DB server to web server. So I was toying with all these settings to see if I could get it to work!

Incidentally, when I started working with Visual Studio .NET, I found that ASP.NET automatically detected incoming Arabic characters and switched everything to Unicode, including codepage and charset. Since then I have been trying to replicate the same behavior in classic ASP with no luck. It has to be some elusive setting somewhere!

Owais
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-02 : 15:08:40
|
I'd suggest you try a test page that uses UTF-8 with the 65001 codepage. UTF-8 supports Arabic, Japanese, and many other character sets, so there would be no need to select the proper codepage for a given character set.

Sam

PS: I'd bet a SQLTeam beer that .NET uses 65001 / UTF-8.
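One way to run that test without involving the database at all is to hard-code a few non-Latin characters with ChrW and see whether they survive the trip to the browser (a sketch; the sample codepoints spell Arabic "marhaba" and Japanese "Nihon"):

<%@ LANGUAGE="VBSCRIPT" CODEPAGE="65001" %>
<%
' Self-contained test page: no database, so any garbling must come
' from the ASP-to-browser leg.
Response.CharSet = "utf-8"
Dim s
s = ChrW(&H645) & ChrW(&H631) & ChrW(&H62D) & ChrW(&H628) & ChrW(&H627)   ' Arabic
s = s & " / " & ChrW(&H65E5) & ChrW(&H672C)                               ' Japanese
Response.Write s
%>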
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-03 : 14:30:40
|
Thanks for the tips, Sam. I changed the column datatypes to nvarchar, and the encoding of the .asp file to UTF-8, and voila!! Everything is Unicode, and shows up perfectly in the browser. Now I just need to convert 200 column definitions and 300 stored procs. Thanx, pal!

Owais
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-03 : 14:37:37
|
So your column definitions are VARCHAR now? And you can store Arabic in 1-byte character form? Hmm...

Converting columns to NVARCHAR will require more table space. Unicode characters take from 1 to 6 bytes each. I'm not sure how SQL handles variable character lengths when allocating space.

Sam
 |
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-06-03 : 14:47:14
|
Yeah, all columns are varchar currently, and it works fine. That's because the database collation is set to SQL_Latin_1256_CS_AI (which is Arabic, CP 1256). And unlike Japanese, Arabic has few enough characters to fit within 255 slots. I don't think space will be a major issue here; we don't have that much data. I just hope my boss doesn't ask me to develop German, Japanese, Korean, etc. versions.

Owais
 |
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2003-06-04 : 05:43:52
|
quote: UNICODE characters take from 1 to 6 bytes each. I'm not sure how SQL handles variable character lengths when allocating space..
Not quite. Unicode has somewhat over a million codepoints -- numbers that can be given to a character. At present, around 100,000 are assigned to characters. In order to store Unicode characters, there are several character encodings that can be used.

UTF-32 represents any codepoint as a 32-bit value (4 bytes). It is used in some APIs, but rarely in storage.

UTF-16 represents codepoints from U+0000 to U+FFFF as 16-bit values (2 bytes) and U+10000 to U+10FFFF as two 16-bit values (4 bytes). Data stored in SQL Server in nvarchar (etc.) columns should be encoded as UTF-16.[1]

UTF-8 represents codepoints from U+00 to U+7F as 8-bit values (1 byte, maintaining ASCII compatibility) and other codepoints as between 2 and 6 bytes. UTF-8 is the preferred encoding for XML (and consequently XHTML).

Presumably, ASP should be able to handle the conversion from the UTF-16 encoding of nvarchar to UTF-8 on web pages automatically -- I'm not an ASP user, so I don't know.

[1] Actually, that's a bit misleading. As long as you work within the BMP (codepoints from U+0000 -- U+FFFF) you should be fine. However, SQL Server 2000 doesn't have any real concept of 4-byte character encodings, so it will treat UTF-16-encoded codepoints from U+10000 -- U+10FFFF as two characters, the first being in (what current versions of Unicode consider to be) the high surrogate range, the second in the low surrogate range. This might not be as bad as it sounds: characters didn't start getting assigned to codepoints beyond the BMP until Unicode 3.1. I don't know whether the ASP libraries would interpret UTF-16 surrogate pairs correctly or not.

Edited by - Arnold Fribble on 06/04/2003 06:20:00
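For anyone who wants to see those byte counts from ASP, here is a rough sketch that uses ADODB.Stream to convert a VBScript (UTF-16) string to UTF-8 and measure the result; the 3-byte subtraction accounts for the UTF-8 byte-order mark the Stream object writes:

<%
Function Utf8ByteCount(s)
    Dim stm
    Set stm = Server.CreateObject("ADODB.Stream")
    stm.Type = 2            ' adTypeText
    stm.Charset = "utf-8"
    stm.Open
    stm.WriteText s
    Utf8ByteCount = stm.Size - 3   ' subtract the 3-byte UTF-8 BOM
    stm.Close
End Function

Response.Write Utf8ByteCount("A") & "<br>"           ' 1 byte  (U+0041)
Response.Write Utf8ByteCount(ChrW(&H627)) & "<br>"   ' 2 bytes (Arabic alef, U+0627)
Response.Write Utf8ByteCount(ChrW(&H65E5)) & "<br>"  ' 3 bytes (CJK ideograph, U+65E5)
%>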
 |
SamC
White Water Yakist
3467 Posts |
Posted - 2003-06-04 : 07:45:25
|
quote: UTF-8 represents codepoints from U+00 to U+7F as 8-bit values (1 byte, maintaining ASCII compatibility) and other codepoints as between 2 and 6 bytes. UTF-8 is the preferred encoding for XML (and consequently XHTML). Presumably, ASP should be able to handle the conversion from the UTF-16 encoding of nvarchar to UTF-8 on web pages automatically -- I'm not an ASP user, so I don't know.

Thanks for the correction, Arnold. UTF-8 (not Unicode) characters are 1 to 6 bytes each; I misquoted from this article.

quote: This might not be as bad as it sounds: characters didn't start getting assigned to codepoints beyond the BMP until Unicode 3.1. I don't know whether the ASP libraries would interpret UTF-16 surrogate pairs correctly or not.

Because ASP handles the mapping from SQL/UTF-16/NVARCHAR to the target page through the CODEPAGE=65001 declaration, the basis for translating UTF-16 surrogate pairs to UTF-8 is present. It would be interesting to run a test.

Sam
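One way that test might look: emit a character beyond the BMP as a UTF-16 surrogate pair and check what arrives at the browser (a sketch; U+1D11E, the musical G clef, is just a convenient supplementary-plane example):

<%@ LANGUAGE="VBSCRIPT" CODEPAGE="65001" %>
<%
' U+1D11E lies outside the BMP, so in a UTF-16 string it is built
' from the high/low surrogate pair D834 / DD1E.
Response.CharSet = "utf-8"
Dim gClef
gClef = ChrW(&HD834) & ChrW(&HDD1E)
Response.Write gClef
' If ASP pairs the surrogates correctly, the browser should receive the
' single 4-byte UTF-8 sequence F0 9D 84 9E (font support permitting).
%>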
 |