----------------------------------------------------------------------
--- Knud van Eeden --- 31 May 2009 - 09:15 am ------------------------
Computer: Editor: TSE: File: Format: ASCII: Unicode: Operation: Convert: How to save an ASCII file as a UTF-8 file?
---
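A UTF-8 file as saved by Notepad starts with the three bytes 0xEF, 0xBB, 0xBF
(the byte order mark, or BOM), followed by zero or more characters
(1, 2, 3 or 4 bytes per character).
The BOM is optional in UTF-8, but some programs use it to recognize the encoding.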
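As a cross-check outside TSE, the test for the 3 byte marker can be sketched in Python.
This is only an illustration, not part of the macro; the function name
starts_with_utf8_bom is made up for this example.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string begins with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)
```

Reading the file with open( fileName, "rb" ).read() and passing the result to
this function tells whether the marker is already present.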
---
An ASCII character (code 0 to 127) takes exactly 1 byte in UTF-8,
namely the character itself, so the text of a pure ASCII file needs no further conversion.
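The byte counts can be checked with a small Python sketch (outside TSE;
utf8_length is a hypothetical helper name, and the sample characters are
written as escapes so the sketch itself stays pure ASCII).

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """Return True if pure ASCII text encodes to identical bytes in both encodings.

    Raises UnicodeEncodeError if the text is not pure ASCII.
    """
    return text.encode("ascii") == text.encode("utf-8")
```

For example utf8_length( "A" ) gives 1, while a character such as "\u20ac"
(the euro sign) needs 3 bytes.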
---
So to convert from ASCII to UTF-8:
1. -Insert the 3 bytes at the beginning of the file (this is what the program below does)
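For comparison, the same conversion can be sketched in Python. It is an
illustration under assumptions, not the TSE method itself: it assumes the whole
file fits in memory, and it refuses input that is not pure ASCII. The names
ascii_to_utf8_with_bom and convert_file are made up for this example.

```python
import codecs


def ascii_to_utf8_with_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM to pure ASCII data.

    Raises UnicodeDecodeError if any byte is outside the ASCII range 0..127.
    """
    data.decode("ascii")  # validation only, the decoded text is not used
    return codecs.BOM_UTF8 + data


def convert_file(file_name_ascii: str, file_name_utf8: str) -> None:
    """Read the ASCII file as raw bytes and write the BOM-prefixed copy."""
    with open(file_name_ascii, "rb") as source:
        data = source.read()
    with open(file_name_utf8, "wb") as target:
        target.write(ascii_to_utf8_with_bom(data))
```

Working on raw bytes keeps the line endings exactly as they are in the input file.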
===
Note:
The result has been tested on a few files only, and worked OK there.
Test method:
1. -Create or load some ASCII file
2. -Load and save that ASCII file in TSE using this macro
3. -Load and save the same ASCII file in Notepad in UTF-8 format
4. -Compare the two results for differences
   1. -Visually in TSE, using the hex editor in a horizontally split window
       with the two UTF-8 files created by Notepad and TSE loaded.
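The visual comparison can be complemented by a byte-for-byte check. A minimal
Python sketch (outside TSE; first_difference is a hypothetical helper name)
that reports the offset of the first differing byte:

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if a and b are equal.

    If one byte string is a prefix of the other, the length of the shorter
    one is returned.
    """
    for offset, (byte_a, byte_b) in enumerate(zip(a, b)):
        if byte_a != byte_b:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

Both files would be read in binary mode ( "rb" ) first, so that the 3 byte
marker and the line endings take part in the comparison.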
===
Steps: Overview:
1. -E.g. create the following program:
--- cut here: begin --------------------------------------------------
FORWARD PROC Main()
FORWARD PROC PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( STRING s1, STRING s2 )
// --- MAIN --- //
STRING fileNameInputGS[255] = "" // global variable
STRING fileNameOutputGS[255] = "" // global variable
//
PROC Main()
 IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameAsciiS = ", fileNameInputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
 IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameUnicodeS = ", fileNameOutputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
 PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( fileNameInputGS, fileNameOutputGS )
END
<F12> Main()
// --- LIBRARY --- //
// library: file: save: change: convert: ascii: to: unicode: utf8 <description></description> <version>1.0.0.0.7</version> (filenamemacro=savefiuu.s) [kn, ri, sa, 30-05-2009 21:07:17]
PROC PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( STRING fileNameAsciiS, STRING fileNameUtf8S )
// e.g. STRING fileNameInputGS[255] = "" // global variable
// e.g. STRING fileNameOutputGS[255] = "" // global variable
// e.g. //
// e.g. PROC Main()
// e.g. IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameAsciiS = ", fileNameInputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
// e.g. IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameUnicodeS = ", fileNameOutputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
// e.g. PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( fileNameInputGS, fileNameOutputGS )
// e.g. END
// e.g.
// e.g. <F12> Main()
//
// Method: only add a 3 byte marker (=BOM) at the beginning of the file
//
// *If* the given file is a pure ASCII file, then
// you only have to
//
// 1. insert a 3 byte marker (0xEF, 0xBB, 0xBF)
//
// 2. do nothing more, because ASCII characters keep the same
// byte values in their UTF-8 representation
//
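 // load the ASCII file
 EditFile( fileNameAsciiS )
 // save a copy under the UTF-8 filename, overwriting an existing file
 SaveAs( fileNameUtf8S, _OVERWRITE_ )
 // load that copy
 EditFile( fileNameUtf8S )
 // go to the beginning of the file
 BegFile()
 // insert the 3 BOM bytes, in this order
 // (no check is made whether a BOM is already present)
 InsertText( Chr( 0xEF ), _INSERT_ )
 InsertText( Chr( 0xBB ), _INSERT_ )
 InsertText( Chr( 0xBF ), _INSERT_ )
 // save the result
 SaveFile()
 //
END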
--- cut here: end ----------------------------------------------------
2. -Run the program
3. -Input the ASCII filename
4. -Input the UTF-8 filename in which to store the result
5. -Tested successfully on
Microsoft Windows XP Professional (service pack 3),
running
TSE v4.x
---
----------------------------------------------------------------------