Исходники.RU - Information server for programmers: source code from around the world.

MFC UNICODE Support ??


Raja Segar -- rsz@pc.jaring.my
Monday, November 25, 1996

Environment : VC++4.2b Patched , Win95, NT4.0

Hi there ...
 I am confused about UNICODE development support in MFC.
 As usual, I have a few questions:

 1). Do I have to use the macros in TCHAR.H, put __TEXT in front of string
     literals, use TCHAR instead of char, etc., or do I just code for ANSI
     and let MFC take care of the rest if I have UNICODE defined? BTW, what
     are the guidelines to follow?

 2). Do I have to do something like the following in my source code to
     support UNICODE?
       #ifdef _UNICODE
           x = wcslen(lpMyString);
       #else
           x = strlen( lpMyString );
       #endif

 2). Is there anything to look out for when writing code that compiles as
     both UNICODE and ANSI?

 3). Suppose I want to have just a single EXE to distribute to my WIN95
     and NT customers. Should I choose the one compiled with UNICODE or ANSI?
     If I chose the UNICODE version, would the points below apply?
      a) Win95 will be doing on-the-fly conversion to ANSI and back while
         running the code.
      b) On NT the code would fly all the way.

 4). Will the UNICODE version take up almost double the string space?

 5). Someone told me that if I compile with UNICODE, at run time the program
     will use either the ANSI version or the wide-character version depending
     on the host operating system. I really doubt this. Is it true?
 
 Hopefully someone can shed some light on this matter.
 Any help would be greatly appreciated.
 Thanks in advance.
 Bye
 
 (  _ \/ __)(_   )
  )   /\__ \ / /_ 
 (_)\_)(___/(____)@pc.jaring.my

Mike Blaszczak -- mikeblas@nwlink.com
Monday, November 25, 1996

[Mini-digest: 6 responses]

At 19:46 11/25/96 +0800, Raja Segar wrote:
>Environment : VC++4.2b Patched , Win95, NT4.0

> 1). Do i have to use the macros in TCHAR.H and put __TEXT in front of
>string literals and use
>     TCHAR is instead of char .. etc  or do i just code for ANSI and MFC
>takes care
>     of the rest if i got UNICODE defined ? BTW what are the guidelines to
>follow ?

You have to code expecting UNICODE.  Using TCHAR is one way to do it.
A program that has been built for UNICODE or ANSI can use both UNICODE and
ANSI strings, if it needs to.

If you want to get an ANSI string, just use quotes:

        "here is my string literal"

if you want to get a UNICODE string, use the L prefix:

        L"here is my UNICODE string literal"

If you want to have a UNICODE string when you compile with _UNICODE defined
and an ANSI string otherwise, you should use the _T() macro:

        _T("here is a string which switches")

Sometimes, even when you make an ANSI program, you need to create a UNICODE
string.  OLE likes to have UNICODE strings, for example, even if you've built
for ANSI.
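
The three literal spellings above can be compared side by side. This is only a sketch: on Windows compilers it uses the real <tchar.h>, and elsewhere it falls back to stand-in definitions of TCHAR and _T() that are assumptions made so the example builds anywhere. The helper function names are hypothetical.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <tchar.h>   // real TCHAR and _T() on Windows compilers
#else
// Assumed stand-ins so the sketch also builds where <tchar.h> is missing.
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define _T(s) L##s
#else
typedef char TCHAR;
#define _T(s) s
#endif
#endif

static const char    kNarrow[]  = "abc";      // always char
static const wchar_t kWide[]    = L"abc";     // always wchar_t
static const TCHAR   kGeneric[] = _T("abc");  // follows _UNICODE

// Bytes per character in the generic literal: sizeof(char) in an ANSI
// build, sizeof(wchar_t) in a _UNICODE build.
std::size_t generic_char_bytes() { return sizeof(kGeneric[0]); }

// The number of array elements (terminating null included) is the same
// for every spelling; only the element size differs.
std::size_t narrow_elems() { return sizeof(kNarrow) / sizeof(kNarrow[0]); }
std::size_t wide_elems()   { return sizeof(kWide) / sizeof(kWide[0]); }
```

Both element counts are 4 ("abc" plus the null); what changes between builds is how many bytes each element occupies.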

> 2). Do i have to do something like the following in my source code to
>support UNICODE?
>       #ifdef _UNICODE
>           x = wcslen(lpMyString);
>       #else
>           x = strlen( lpMyString );
>       #endif       

This would work, but you don't have to do it.  You could just use

        x = _tcslen(lpMyString);

and the compiler would automatically use a UNICODE-compatible function if
_UNICODE was defined and an MBCS-compatible function if it wasn't.  _tcslen()
isn't a function; it's actually a macro that's defined in TCHAR.H.
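
A rough sketch of that mapping follows. On Windows compilers it relies on the real <tchar.h>; the fallback branch is an assumed, simplified imitation of what that header does, included only so the example builds elsewhere. generic_length is a hypothetical helper name.

```cpp
#include <stddef.h>
#include <string.h>  // strlen
#include <wchar.h>   // wcslen

#ifdef _WIN32
#include <tchar.h>
#else
// Assumed stand-ins mirroring the idea behind <tchar.h>.
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define _T(s) L##s
#define _tcslen wcslen
#else
typedef char TCHAR;
#define _T(s) s
#define _tcslen strlen
#endif
#endif

// Length in characters (not bytes), whichever way the file is compiled.
// The preprocessor has already replaced _tcslen with strlen or wcslen
// by the time the compiler proper sees this line.
size_t generic_length(const TCHAR* s) {
    return _tcslen(s);
}
```

Calling generic_length(_T("hello")) gives 5 in either build, with no #ifdef at the call site.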

> 2). Is there any thing to look out for while doing code that compiles on
>both UNICODE
>     and ANSI ?

This question is too vague.

> 3). Suppose i want to have just a single EXE to distribute to my WIN95
>     and NT customers. Should i choose the one compiled with UNICODE or ANSI?

Something compiled for UNICODE won't work on Windows 95, period.

> 4). The UNICODE version will take up almost double the string space? 

Yes.

> 5). Someone told me if I compiled with UNICODE, at run time the program
>will use either the ANSI
>     version or the wide character version depending on the host operating
>system. I really doubt this.
>     Is it true ?   

No, it is not true.

My book has an appendix which describes most of this stuff.

.B ekiM
http://www.nwlink.com/~mikeblas/
I'm afraid I've become some sort of speed freak.
These words are my own. I do not speak on behalf of Microsoft.

-----From: "Claus Michelsen" 

Dear Raja,

1) Yes you should use _T(""), TCHAR, LPTSTR, etc. instead of using normal
string literals, char, and char*. These will compile correctly in both the
UNICODE or MBCS version. The CString class will behave correctly in both
versions.
2) You can use macros that will compile to the correct function in all
versions of your application (you would use _tcslen in the example you
mention). See "Routine Mappings" in the documentation for a list of all
these mappings for C runtime functions. The Win32 functions use the same
principle (e.g.  the TextOut API exist in a TextOutA and TextOutW version,
but by simply using the TextOut "macro" you get the correct API function in
the different versions of your application).
2a) Just remember to consistently use TCHAR and similar definitions.
This also means that you cannot expect a character to be of a specific
size (ANSI is 1 byte, MBCS is 1 OR 2, Unicode is always 2). You should also
take great care with string manipulation, since there are many
pitfalls here. You can find a good description of the pitfalls in the book
"Developing International Software for Windows 95 and Windows NT" from
Microsoft Press (great book).
3) There is limited Unicode support under Win95, so you will have to call
MultiByteToWideChar and WideCharToMultiByte to convert between ANSI and
Unicode. An MBCS version will work under both operating systems, but with
the problems that it has (codepages). Please note that most Unicode
versions of the API functions (post-fixed with "W") are only stubs under
Win95: they do no work and fail with an error code.
4) Yes, if compared to ansi (not mbcs).
5) No, that is not true. If it is compiled with UNICODE it will call the
unicode versions of the API function, etc.
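
The explicit conversion mentioned in point 3 might look roughly like the sketch below. To keep it buildable on any compiler, it swaps in the standard C library functions mbstowcs and wcstombs, which convert using the current C locale; on Windows the corresponding calls would be MultiByteToWideChar and WideCharToMultiByte. The function names widen and narrow are hypothetical.

```cpp
#include <cstdlib>   // std::mbstowcs, std::wcstombs, MB_CUR_MAX
#include <string>
#include <vector>

// Narrow (multibyte) -> wide, using the current C locale's encoding.
// Returns an empty string if the input contains an invalid sequence.
std::wstring widen(const std::string& narrow_text) {
    // A wide string never has more characters than the narrow one has bytes.
    std::vector<wchar_t> buf(narrow_text.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow_text.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();
    return std::wstring(buf.data(), n);
}

// Wide -> narrow (multibyte), using the current C locale's encoding.
// Returns an empty string if a character cannot be represented.
std::string narrow(const std::wstring& wide_text) {
    // MB_CUR_MAX is the most bytes one character can need in this locale.
    std::vector<char> buf(wide_text.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide_text.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();
    return std::string(buf.data(), n);
}
```

Plain ASCII text round-trips through both helpers in the default "C" locale; anything beyond ASCII depends on the locale (or, on Windows, the code page) in effect.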

I hope this helps you.

Best Regards
Claus Michelsen

-----From: Mario Contestabile

> 1). Do i have to use the macros in TCHAR.H and put __TEXT in front of
>string literals and use
>     TCHAR is instead of char .. etc  or do i just code for ANSI and MFC
>takes care
>     of the rest if i got UNICODE defined ? BTW what are the guidelines to
>follow ?

You want to build the same source as two different binaries, one ANSI and one
UNICODE, with little or no changes, except for #defining _UNICODE.
Use TCHAR instead of char, and put strings inside the _T("") macro.

> 2). Do i have to do something like the following in my source code to
>support UNICODE?
>      #ifdef _UNICODE
>           x = wcslen(lpMyString);
>       #else
>           x = strlen( lpMyString );
>       #endif       

All strxxx routines have an equivalent _tcsxxx routine.
In this case you should use _tcslen, and remove the #ifdefs.

> 2). Is there any thing to look out for while doing code that compiles on
>both UNICODE
>     and ANSI ?

OLE always uses UNICODE.
Tooltips (the TTN_NEEDTEXT notification) come in both an ANSI and a UNICODE
form, TTN_NEEDTEXTA and TTN_NEEDTEXTW, and you need to handle both, even on 95.

> 3). Suppose i want to have just a single EXE to distribute to my WIN95
>     and NT customers. Should i choose the one compiled with UNICODE or ANSI?
>     If i did choose the UNICODE version will the points mentioned below apply ?
>      a) Win95 will be doing on the fly conversion to ANSI and back while
>running the code.
>      b) On NT the code would fly all the way.

Distribute 2 binaries. If you must distribute one, it must be the ANSI version;
95 will not run the UNICODE version. NT will run the ANSI version, but convert
all the strings to UNICODE for you, at a small price.

> 4). The UNICODE version will take up almost double the string space? 

The executable will be bigger.

 > 5). Someone told me if I compiled with UNICODE, at run time the program
>will use either the ANSI
>     version or the wide character version depending on the host operating
>system. I really doubt this.
>     Is it true ?   

See 3 above.

mcontest@universal.com

-----From: Pradeep Tapadiya 

At 07:46 PM 11/25/96 +0800, you wrote:
>Environment : VC++4.2b Patched , Win95, NT4.0
>
>Hi there ...
> I am confused about this UNICODE development support in MFC.
> As usual i have a few questions ?
>
> 1). Do i have to use the macros in TCHAR.H and put __TEXT in front of
>string literals and use
>     TCHAR is instead of char .. etc  or do i just code for ANSI and MFC
>takes care

This is correct.

Instead of using TEXT("my string"), you can also use _T("my string"). Both
macros apply only to string and character literals, not to variables such
as lpMyString.

>     of the rest if i got UNICODE defined ? BTW what are the guidelines to
>follow ?
>
> 2). Do i have to do something like the following in my source code to
>support UNICODE?
>       #ifdef _UNICODE
>           x = wcslen(lpMyString);
>       #else
>           x = strlen( lpMyString );
>       #endif       

For each function that has two different equivalents
for unicode/non-unicode, there is a macro defined (look at <tchar.h>).
In the above case, you will just need
   x = _tcslen(lpMyString);

>
> 2). Is there any thing to look out for while doing code that compiles on
>both UNICODE
>     and ANSI ?

Watch out for items that are really supposed to be char or wchar_t
and NOT TCHAR. For example, the COM API requires Unicode strings as parameters.
If you use any such API, you will have to explicitly convert between
ANSI and UNICODE strings.

If you opt to use TCHAR, remember to run nmake on both UNICODE as well as
ANSI version (I have seen programmers have a tendency to compile just the
ANSI version and take the UNICODE version for granted). Pay close attention
to the compiler warnings.

For unicode version, define both of the following macros in project
settings.

UNICODE
_UNICODE

If you are using MFC all the way, defining just the first macro
is enough. MFC defines the other macro for you.
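
One way to catch a half-configured build is a compile-time check that the two macros agree. The sketch below is an assumption about how one might do this, not something MFC or the SDK provides: UNICODE is the macro the Windows API headers test, and _UNICODE is the one the C runtime and <tchar.h> test. If a framework header is relied on to define the missing macro, the check has to come after that header. is_unicode_build is a hypothetical helper.

```cpp
// Fail the build early if only one of the two macros is set.
#if defined(UNICODE) && !defined(_UNICODE)
#error UNICODE is defined but _UNICODE is not
#elif defined(_UNICODE) && !defined(UNICODE)
#error _UNICODE is defined but UNICODE is not
#endif

// True in a Unicode build, false in an ANSI/MBCS build.
bool is_unicode_build() {
#ifdef UNICODE
    return true;
#else
    return false;
#endif
}
```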

>    
> 3). Suppose i want to have just a single EXE to distribute to my WIN95
>     and NT customers. Should i choose the one compiled with UNICODE or ANSI?

It all depends on the intended use of your product. If you want your
product to be used internationally, you will need to ship the UNICODE version.
(There is more to internationalization than just using the UNICODE macro. Look
into MSDEV help on this topic).

>     If i did choose the UNICODE version will the points mentioned below
> apply ?
>      a) Win95 will be doing on the fly conversion to ANSI and back while
>running the code.
>      b) On NT the code would fly all the way.

There is no "on-the-fly" conversion on any platform for any version. 
When compiling, depending on the UNICODE setting, your functions get
mapped to the appropriate function. As a matter of fact, you can
actually use (if you wish to) Unicoded versions of functions from 
within your ANSI version (and vice-versa).

>
> 4). The UNICODE version will take up almost double the string space? 

Yes. However, you typically store all your strings in a resource file
(which helps in internationalization). These strings get loaded on demand.
Therefore, though your .exe file size could be bigger, it doesn't imply
a similar memory footprint.

> 
> 5). Someone told me if I compiled with UNICODE, at run time the program
>will use either the ANSI
>     version or the wide character version depending on the host operating
>system. I really doubt this.
>     Is it true ?   

NO.

Just to illustrate, you have both strlen and wcslen available to you
under both UNICODE as well as ANSI platform. You can use them directly
if you wish to under any platform. If you use _tcslen, it appropriately
gets mapped to strlen or wcslen DURING COMPILE TIME.

Hope this helps.

Pradeep

-----From: Dave Ryan 

Please note there are times when you have to use #ifdef UNICODE. However,
there are macros and functions for most of the char and wchar_t types. For
example, for wcslen there exists a macro _tcslen. Yes, you must use TCHAR and
the _T("") macro. These macros expand to either char or wchar_t types
depending on whether you define _UNICODE or _MBCS in your flags section.

If developing an application for Win95 and WinNT, you have the option
of either building two release builds (UNICODE for NT and MBCS for Win95)
or developing the application with ANSI/MBCS, which will run on both Win95
and NT. The main difference would be that NT would convert your ANSI calls
to UNICODE and then call the wide versions of the functions. Note that
ANSI function names end with A and UNICODE function names end with W. The
compiler and the operating system take care of calling the correct
function or making the conversion on NT. Also note there are wide string
types in command line parameters as well.

A smart move would be to ensure your application will compile as UNICODE
or MBCS, using the macros and whatever code is necessary to convert when
needed. Then you have the option of delivering either release at the end.
Note Microsoft has an excellent article noting major pitfalls when
programming for both operating systems. It's available on the
www.microsoft.com/win32dev web site. You'll also find good info on Unicode
there. If you design properly, you can save the decision until ship time.
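
The A/W naming convention described above can be imitated in a few lines. This is a sketch with made-up names: DescribeA, DescribeW, Describe and DESCRIBE_TEXT are hypothetical and stand in for a real API pair such as TextOutA/TextOutW and the TEXT() macro.

```cpp
#include <string>

// Hypothetical API pair named in the Win32 style: A = ANSI, W = wide.
std::string DescribeA(const char*)    { return "ANSI entry point"; }
std::string DescribeW(const wchar_t*) { return "wide entry point"; }

// A header then maps the unsuffixed name to one of them at compile time.
#ifdef UNICODE
#define Describe DescribeW
#define DESCRIBE_TEXT(s) L##s
#else
#define Describe DescribeA
#define DESCRIBE_TEXT(s) s
#endif

// Code written against the unsuffixed name compiles in either build.
std::string which_entry_point() {
    return Describe(DESCRIBE_TEXT("hello"));
}
```

Both suffixed functions stay callable by name in either build; only the unsuffixed spelling changes meaning, and it does so at compile time, not at run time.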

-----From: "George V. Reilly" 

> 1). Do i have to use the macros in TCHAR.H and put __TEXT in
> front of string literals and use TCHAR is instead of char
> .. etc or do i just code for ANSI and MFC takes care of the
> rest if i got UNICODE defined ? BTW what are the guidelines to
> follow ?

Surround all your string and character constants with _T() or
_TEXT(); e.g., _T('\n'), _T("Hello, World!").  I think _T is
preferred nowadays; it's certainly shorter.

> 2). Do i have to do something like the following in my source
> code to support UNICODE?
>       #ifdef _UNICODE
>           x = wcslen(lpMyString);
>       #else
>           x = strlen( lpMyString );
>       #endif       

No, use the _tcs* macros in tchar.h instead, making this
	x = _tcslen(lpMyString).

Note: If you use CString, it automatically takes care of most of
this stuff for you.

> 2). Is there any thing to look out for while doing code that
> compiles on both UNICODE and ANSI ?

Some APIs work in terms of characters, some in bytes.  A character
occupies one byte in ANSI, but two in UNICODE (and one-and-a-bit, on
average, in MBCS, but let's not get into that).  You often need to
multiply by or divide by sizeof(TCHAR); e.g., this fragment will work
correctly with both ANSI and UNICODE.
	dwCount = ::ExpandEnvironmentStrings((LPCTSTR) pbData, NULL, 0);
	ptszExpanded = (LPTSTR) _alloca(dwCount * sizeof(TCHAR));
	::ExpandEnvironmentStrings((LPCTSTR) pbData, ptszExpanded, dwCount);

Be sure to build both ANSI and UNICODE versions of your program.  The
compiler will catch some of your mistakes, but not all.  It wouldn't
notice a failure to multiply by sizeof(TCHAR) in the call to _alloca
above, for example.  You really need to look at every bit of your
code that deals with strings and characters with a sceptical eye.
Should this be TCHAR, wchar_t, WCHAR, char, or CHAR?  Should I be
using wcscpy, strcpy, or _tcscpy here?  Sometimes you will want to
explicitly deal with ANSI chars (reading from a file, perhaps) or
with WCHARs (OLE stuff).
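
The characters-versus-bytes point can be sketched without any Windows headers by writing the helper as a template over the character type, since TCHAR is just a typedef for char or wchar_t. copy_to_raw_buffer is a hypothetical helper, and note that sizeof(wchar_t) is 2 on Windows but may be larger on other platforms.

```cpp
#include <cstddef>
#include <cstring>   // std::memcpy
#include <string>    // std::char_traits
#include <vector>

// Copies a null-terminated string into a raw byte buffer.
// Works for char and wchar_t alike.
template <typename Ch>
std::vector<unsigned char> copy_to_raw_buffer(const Ch* s) {
    // Count in characters (+1 for the terminating null)...
    std::size_t chars = std::char_traits<Ch>::length(s) + 1;
    // ...but allocate and copy in bytes.
    std::size_t bytes = chars * sizeof(Ch);
    std::vector<unsigned char> raw(bytes);
    std::memcpy(raw.data(), s, bytes);
    return raw;
}
```

Dropping the multiplication by sizeof(Ch) would still compile, and would still work for char, which is exactly why this kind of bug only shows up once the UNICODE build is actually exercised.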

> 3). Suppose i want to have just a single EXE to distribute to my
> WIN95 and NT customers. Should i choose the one compiled with
> UNICODE or ANSI? If i did choose the UNICODE version will the
> points mentioned below apply ?
>      a) Win95 will be doing on the fly conversion to ANSI and back
>         while running the code.
>      b) On NT the code would fly all the way.

Win95 doesn't have Unicode support built in.  If you look in win*.h,
you'll see that ExpandEnvironmentStrings is really a macro that's
defined to be either ExpandEnvironmentStringsW (if UNICODE is
defined) or ExpandEnvironmentStringsA.  Similarly for all of the
other APIs that are defined as taking TCHAR, LPTSTR, or LPCTSTR
arguments.  On Win95, the wide-character versions are stubs that
return error codes.  The wcs* functions, on the other hand, live in
the C runtime library and are available on both Win95 and NT.  This
is useful because OLE is the one big exception to the rule.  The
32-bit version of OLE2 uses wide characters on both Win95 and NT.

If you need to build an application that runs on both Win95 and NT,
use ANSI.  If it runs only on NT, you can use UNICODE.

> 4). The UNICODE version will take up almost double the string space? 

All string resources in your DLLs and EXEs are UNICODE anyway, but
using UNICODE will double the size of any strings you allocate in
your program.  It should make your program faster on NT however,
because you'll skip the layer in the Win32 APIs that converts all
ANSI strings into UNICODE.

> 5). Someone told me if I compiled with UNICODE, at run time the
> program will use either the ANSI version or the wide character
> version depending on the host operating system. I really doubt
> this. Is it true ?

Not in my understanding.

Look in the VC documentation and MSDN and you'll find various
pieces describing TCHARs.  It'll all start to make sense after a
while.
-- 
/George V. Reilly      
MicroCrafts, Inc., 17371 NE 67th Ct #205, Redmond, WA 98052, USA.
Tel: 206/250-0014  Fax: 250-0100  Web: http://www.microcrafts.com
Vim 4 (vi clone) for NT & Windows 95: http://www.halcyon.com/gvr/
pgp fingerprint: e2 b4 83 64 11 52 21 ea  bf d8 51 c2 11 00 78 fc


TA -- siemens@inet.uni-c.dk
Wednesday, November 27, 1996

[Mini-digest: 2 responses]

> From: Raja Segar 
> To: mfc-l@netcom.com
> Subject: MFC UNICODE Support ??
> Date: 25. november 1996 12:46
> 
> Environment : VC++4.2b Patched , Win95, NT4.0
> 
> Hi there ...
>  I am confused about this UNICODE development support in MFC.
>  As usual i have a few questions ?
> 
>  1). Do i have to use the macros in TCHAR.H and put __TEXT in front of
> string literals and use
>      TCHAR is instead of char .. etc  or do i just code for ANSI and MFC
> takes care
>      of the rest if i got UNICODE defined ? BTW what are the guidelines
> to follow ?

Yes, you have to enclose literal strings in the macro _TEXT("")(or _T("")
which is shorthand for the same and saves you some typing). The UNICODE
definition is, of course, only handled by MFC if you use its classes for
string manip., see next answer.
 
>  2). Do i have to do something like the following in my source code to
> support UNICODE?
>        #ifdef _UNICODE
>            x = wcslen(lpMyString);
>        #else
>            x = strlen( lpMyString );

That depends; if you choose to use the CString class for your strings, this
will be taken care of automatically. Using the CString class adds very
little overhead in exchange for the power that lies in it. However, if you
choose to manipulate strings on your own, use the _tcs...() and related
functions, which are generic string functions that map to any of the character
sets MBCS, SBCS (ANSI) and UNICODE. Look them up in the online docs.

>        #endif       

> 
>  2). Is there any thing to look out for while doing code that compiles on
> both UNICODE
>      and ANSI ?

Don't know for sure, but I don't think so (other than what I stated in the
previous answers).

>  3). Suppose i want to have just a single EXE to distribute to my WIN95
>      and NT customers. Should i choose the one compiled with UNICODE or
> ANSI?
>      If i did choose the UNICODE version will the points mentioned below
> apply ?
>       a) Win95 will be doing on the fly conversion to ANSI and back while
> running the code.
>       b) On NT the code would fly all the way.

As far as I know, Win95 doesn't even support UNICODE. I myself use MBCS and
neither UNICODE nor ANSI.
 
>  4). The UNICODE version will take up almost double the string space? 
Yes.
  
>  5). Someone told me if I compiled with UNICODE, at run time the program
> will use either the ANSI
>      version or the wide character version depending on the host
> operating system. I really doubt this.
>      Is it true ?   

I doubt that too.

>  Hopefully someone can shed some light on this matter.
>  Any help would be greatly appreciated.
>  Thanks in advance.
>  Bye
>  
>  (  _ \/ __)(_   )
>   )   /\__ \ / /_ 
>  (_)\_)(___/(____)@pc.jaring.my
> 

Mike Thomas Jakobsen, 
Siemens A/S, department of engineering
Siemens@inet.uni-c.dk
Siemens A/S
Borupvang 3
DK-2750 Ballerup
+45 4477 4477
-----From: Tom Allen 

1) You definitely don't just code for ANSI and expect MFC to take care
of the rest.  The proper way to enable MFC to handle ANSI and UNICODE
reliably is through consistent use of TCHAR, the _TEXT() macro for
character literal comparisons/assignments, and the _tcsxxx functions
defined in TCHAR.H. Since UNICODE specifies that all characters are
16 bits wide, you cannot use the basic 8-bit 'char' datatype in a
UNICODE application - your array indexing and pointer arithmetic on
character strings would be incorrect.  Consistently using TCHAR instead
of CHAR and _T() or _TEXT() in literal comparisons and assignments
ensures portability for ANSI and UNICODE.

2a) Rather than clutter your code with #ifdefs, simply use the _tcsxxx
equivalent functions from TCHAR.H.  If you are compiling with _UNICODE
defined, the compiler will do the work for you.

2b) Not if you follow the rules precisely and avoid making any
assumptions about the size of a character.

4) Yes, since all character storage would be 16-bit characters.

There is an excellent backgrounder/white paper in the MSDN Library
entitled "Multibyte Character Set (MBCS) Survival Guide" that explains
TCHAR usage and many other considerations for dealing with extended
character sets.
