----------------------------------------------------------------------
--- Knud van Eeden --- 31 May 2009 - 09:15 am ------------------------

Computer: Editor: TSE: File: Format: ASCII: Unicode: Operation: Convert: How to save an ASCII file as a UTF-8 file?

---

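A UTF-8 file, as meant here and as saved by Notepad, is a file starting with
the three bytes 0xEF, 0xBB, 0xBF (the byte order mark, or BOM),
followed by zero or more characters (1, 2, 3 or 4 bytes per character).
In general the BOM is optional in UTF-8, but Notepad writes it when saving
in UTF-8 format.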
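
As a quick cross-check outside TSE, the following Python sketch tests whether
a byte string starts with those three bytes. The function name has_utf8_bom is
made up for this illustration; codecs.BOM_UTF8 is the Python standard library
constant for 0xEF, 0xBB, 0xBF.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the 3 byte UTF-8 marker (BOM)."""
    return data.startswith(codecs.BOM_UTF8)

print(has_utf8_bom(b"\xef\xbb\xbfhello"))  # True
print(has_utf8_bom(b"hello"))              # False
```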

---

In the special case of an ASCII character (code 0 to 127), 1 byte per character
is used, namely the character itself, so the content of a pure ASCII file needs
no further conversion.
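
This can be checked outside TSE with a few lines of Python. The sample text is
an arbitrary assumption; any string of pure ASCII characters should behave the
same way.

```python
text = "Hello, TSE!"  # sample text, ASCII characters only (assumption)

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text both encodings give exactly the same bytes
print(ascii_bytes == utf8_bytes)         # True

# A non-ASCII character (here U+00E9) needs more than 1 byte in UTF-8
print(len("\u00e9".encode("utf-8")))     # 2
```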

---

So to convert from ASCII to UTF-8:

 1. -In this program the 3 bytes are inserted at the beginning of the file
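
The same idea can be sketched in Python on a byte string. This is not the TSE
macro, only an illustration: the name ascii_to_utf8_with_bom is hypothetical,
and the check for pure ASCII input is an added assumption that the macro
itself does not perform.

```python
import codecs

def ascii_to_utf8_with_bom(data: bytes) -> bytes:
    """Prepend the 3 byte UTF-8 marker to pure ASCII data."""
    # Assumption: refuse input that is not pure ASCII.
    # decode() raises UnicodeDecodeError for any byte above 0x7F.
    data.decode("ascii")
    return codecs.BOM_UTF8 + data

print(ascii_to_utf8_with_bom(b"abc"))  # b'\xef\xbb\xbfabc'
```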

===

Note:

The result has been tested on a few files only, and worked OK there.
The test was done by:

 1. -Creating or loading some ASCII file

 2. -Then loading and saving the same ASCII file in TSE using this macro

 3. -Then loading and saving the same ASCII file in Notepad in UTF-8 format

 4. -The result was compared for differences

     1. -Visually in TSE using the hex editor in a horizontally split window
         with the two UTF-8 files created by Notepad and TSE loaded.
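
The visual comparison in the hex editor can be complemented by a byte-for-byte
check outside TSE. A minimal Python sketch follows; the function names
first_difference and compare_files are hypothetical.

```python
from itertools import zip_longest

def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    # zip_longest pads the shorter input with None, so a length
    # difference is reported at the end of the shorter input.
    for offset, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return offset
    return None

def compare_files(path_a: str, path_b: str):
    """Compare two files in binary mode using first_difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())
```

Calling compare_files with the file saved by TSE and the file saved by Notepad
(for example two hypothetical files tse.txt and notepad.txt) returns None when
both files are identical, else the offset of the first difference.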


===

Steps: Overview:

 1. -E.g. create the following program:

--- cut here: begin --------------------------------------------------
FORWARD PROC Main()
FORWARD PROC PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( STRING s1, STRING s2 )
//
// --- MAIN --- //
//
STRING fileNameInputGS[255] = "" // global variable
STRING fileNameOutputGS[255] = "" // global variable
//
PROC Main()
 IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameAsciiS = ", fileNameInputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
 IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameUnicodeS = ", fileNameOutputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
 PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( fileNameInputGS, fileNameOutputGS )
END
//
<F12> Main()
//
// --- LIBRARY --- //
//
// library: file: save: change: convert: ascii: to: unicode: utf8 <description></description> <version>1.0.0.0.7</version> (filenamemacro=savefiuu.s) [kn, ri, sa, 30-05-2009 21:07:17]
PROC PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( STRING fileNameAsciiS, STRING fileNameUtf8S )
 // e.g. STRING fileNameInputGS[255] = "" // global variable
 // e.g. STRING fileNameOutputGS[255] = "" // global variable
 // e.g. //
 // e.g. PROC Main()
 // e.g.  IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameAsciiS = ", fileNameInputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
 // e.g.  IF NOT AskFilename( "file: save: change: convert: save: ascii: to: unicode: fileNameUnicodeS = ", fileNameOutputGS, _DEFAULT_, _EDIT_HISTORY_ ) RETURN() ENDIF
 // e.g.  PROCFileSaveChangeConvertAsciiToUnicodeUtf8B( fileNameInputGS, fileNameOutputGS )
 // e.g. END
 // e.g.
 // e.g. <F12> Main()
 //
 // Method: only add a 3 byte marker (=BOM) at the beginning of the file
 //
 // *If* the given file is a pure ASCII file, then
 // you only have to
 //
 // 1. insert a 3 byte marker (0xEF, 0xBB, 0xBF)
 //
 // 2. and because the ASCII characters are converted unchanged to
 //    their UTF-8 representation, you do not have to do anything more
 //
 EditFile( fileNameAsciiS )
 //
 SaveAs( fileNameUtf8S, _OVERWRITE_ )
 //
 EditFile( fileNameUtf8S )
 //
 BegFile()
 //
 InsertText( Chr( 0xEF ), _INSERT_ )
 InsertText( Chr( 0xBB ), _INSERT_ )
 InsertText( Chr( 0xBF ), _INSERT_ )
 //
 SaveFile()
 //
END
--- cut here: end ----------------------------------------------------

 2. -Run the program

 3. -Input the ASCII filename

 4. -Input the UTF-8 filename in which to store the result

 5. -Tested successfully on
     Microsoft Windows XP Professional (service pack 3),
     running
     TSE v4.x

----------------------------------------------------------------------