files:encoding

files:encoding [2015/01/21 01:31] – [If there are problems] admin files:encoding [2018/10/04 17:14] (current) – external edit 127.0.0.1
Line 3: Line 3:
 ===== Preconditions ===== ===== Preconditions =====
  
-  - File can have one encoding (same as code page). Encoding can be as unicode ( UTF16 LE, BE (1200, 1201), UTF8 (65001) ) as not Unicode (for example 1252 (Western European) etc). +  - The file can have one encoding (same as code page). Encoding can be Unicode ( UTF16 LE, BE (1200, 1201), UTF8 (65001) ) and not Unicode (for example 1252 (Western European) etc). 
   - There are several places, where encoding conversion can be applied to document: Open, Save As, New, Search and Replace   - There are several places, where encoding conversion can be applied to document: Open, Save As, New, Search and Replace
-  - The encoding can be selected/changed in File Open/Save dialog, via context menu or status bar, in Project Settings, in Tools->Options->Document settings, in syntax specification (here you can set as preferred encoding, as forced encoding). In addition to this HippoEDIT does an auto detection of the encoding using different algorithms (Check BOM bytes, statistics test for UTF16 LE/BE, statistics test for UTF8, check by encoding strings and same checks as IE uses).  +  - The encoding can be selected/changed in File Open/Save dialog, via context menu or status bar, in Project Settings, in Tools->Options->Document settings, in syntax specification (here you can set as preferred encoding, as forced encoding). In addition to this HippoEDIT does an auto-detection of the encoding using different algorithms (Check BOM bytes, statistics test for UTF16 LE/BE, statistics test for UTF8, check by encoding strings and same checks as IE uses).  
-  - If encoding for document once changed by user, this preference has priority over all the rest of settings. Preferences is machine specific but can be reseted, if HippoEDIT temp files would be deleted or format of them would change in new version. +  - If encoding for document once changed by the user, this preference has priority over all the rest of settings. Preferences are machine specific but can be reset, if HippoEDIT temp files would be deleted or format of them would change in the new version.
  
 So, how all this works together (or designed to work  ) : So, how all this works together (or designed to work  ) :
  
 ===== New File ===== ===== New File =====
-For the new File encoding selection (if setting is not defined, or set to Automatic next taken).+For the new File encoding selection (if the setting is not defined, or set to Automatic next taken).
   * Syntax force encoding   * Syntax force encoding
   * Current Project settings encoding   * Current Project settings encoding
Line 22: Line 22:
   * Syntax force encoding   * Syntax force encoding
   * Encoding selected in File Open dialog   * Encoding selected in File Open dialog
-  * Auto-detected encoding with usage of all algorithms mentioned above. Set of applied algorithms can be changed in settings.xml+  * Auto-detected encoding with the usage of all algorithms mentioned above. Set of applied algorithms can be changed in settings.xml
   * Syntax preferred encoding   * Syntax preferred encoding
   * Project settings encoding taken   * Project settings encoding taken
Line 32: Line 32:
   * Encoding selected in File Save dialog   * Encoding selected in File Save dialog
   * Current document encoding   * Current document encoding
-  * During save, HippoEDIT checks consistence of current document encoding and encoding found with encoding strings (XML, HTML etc). If encoding does not match, user would be asked to select which encoding to use +  * During save, HippoEDIT checks the consistency of current document encoding and encoding found with encoding strings (XML, HTML etc). If encoding does not match, the user would be asked to select which encoding to use 
-  * Because HippoEDIT internally works with Unicode representation of text (UTF16 LE), on save, can happen that current text could not be saved without lost of information with currently selected encoding. In this case HippoEDIT should pop-up a warning, informing user about possible data loss and suggest to save document as Unicode or using some another encoding. This behavior controlled by flag Check encoding accuracy in Tools->Options->Formatting +  * Because HippoEDIT internally works with Unicode representation of text (UTF16 LE), on saving, can happen that current text could not be saved without loss of information with currently selected encoding. In this case HippoEDIT should pop-up a warning, informing the user about possible data loss and suggest to save the document as Unicode or using some another encoding. This behavior controlled by flag Check encoding accuracy in Tools->Options->Formatting
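
The accuracy check described in the last item can be illustrated outside of HippoEDIT. The following is only a minimal sketch in Python, not HippoEDIT's implementation: the helper name ''can_save_losslessly'' is hypothetical, and Python codec names such as ''cp1252'' and ''cp1251'' are assumed to stand in for the corresponding Windows code pages.

<code python>
def can_save_losslessly(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be represented in `codec`.

    Hypothetical helper for illustration only. Strict encoding raises
    UnicodeEncodeError as soon as one character has no mapping in the
    target code page, which is the "possible data loss" case.
    """
    try:
        text.encode(codec)  # default error handling is "strict"
    except UnicodeEncodeError:
        return False
    return True
</code>

For example, Cyrillic text cannot be stored in code page 1252 but fits into 1251, so a check like this would report a problem for the first and none for the second. Saving as UTF-8 or UTF-16 always succeeds for ordinary text, which matches the suggestion to save the document as Unicode.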
  
 ===== Search and Replace ===== ===== Search and Replace =====
-Search and Replace encoding uses same logic as for Open/Save file, just interactive selection of encoding, with Open/Save dialog, not available.+Search and Replace encoding uses same logic as for Open/Save file, but interactive selection of encoding, with Open/Save dialog, not available.
  
 ===== If there are problems ===== ===== If there are problems =====
  
 So, if you see that documents are open with wrong encoding, you have several choices of how to solve this: So, if you see that documents are open with wrong encoding, you have several choices of how to solve this:
-  Explicitly select correct encoding in File Open dialog +  Explicitly select correct encoding in File Open dialog 
-  Set, for syntax you are using, forced encoding:  +  Set, for syntax you are using, forced encoding in SPECIFICATION section of schema spec file:<code xml><Encoding default="852" force="true"/></code> 
-<code xml> +  * Disable extended auto-detection (IE algorithms). It can return the wrong result if data for analysis is not sufficient.
-<Encoding default="852" force="true"/>  +
-</code> +
-in SPECIFICATION section of schema spec file. +
-  - Ordered List ItemDisable extended auto-detection (IE algorithms). It can return wrong result if data for analysis is not sufficient.+
 It can be done with xml flags in settings.xml, section General: It can be done with xml flags in settings.xml, section General:
 <code xml> <code xml>
Line 53: Line 49:
 </code> </code>
  
-Also from now on, extended encoding detection is enabled by default only for syntaxes inherited from deftext (as Plain Text, XML and HTML). +Also from now on, extended encoding detection is enabled by default only for syntaxes inherited from //deftext// (as Plain Text, XML and HTML).
  
 You can control encoding detection in an even more granular way by disabling individual detection methods, which in most cases do not produce false positives: You can control encoding detection in an even more granular way by disabling individual detection methods, which in most cases do not produce false positives:
 +  * **extended** - heuristically based detection of encoding
 +  * **min_confidence** - minimal confidence level for extended encoding detection, default is 90, may be higher than 100
   * **bom** (default //true//) - use BOM signs for encoding detection   * **bom** (default //true//) - use BOM signs for encoding detection
   * **unicode** (default //true//) - use UTF16 (LE/BE) statistic detection logic in addition to BOM detection (if BOM is not defined)   * **unicode** (default //true//) - use UTF16 (LE/BE) statistic detection logic in addition to BOM detection (if BOM is not defined)
   * **enc_strings** (default //true//) - use "magic" encoding strings for detection (check [[syntax:specification|EncodingDetection]] in syntax definition)   * **enc_strings** (default //true//) - use "magic" encoding strings for detection (check [[syntax:specification|EncodingDetection]] in syntax definition)
   * **utf8** (default //true//) - use extended algorithms for UTF8 detection in addition to BOM detection (if BOM is not defined)   * **utf8** (default //true//) - use extended algorithms for UTF8 detection in addition to BOM detection (if BOM is not defined)
 +<code xml>
 +<EncodingDetection extended="false" bom="true" unicode="false" enc_string="true" utf8="true" min_confidence="90"/> 
 +</code>
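
To make the **bom** method more concrete, here is a minimal Python sketch of BOM-based detection. It is an illustration under stated assumptions, not HippoEDIT's code: the function name ''detect_bom'' and the returned codec names are hypothetical choices, and only the UTF8 and UTF16 LE/BE signatures mentioned on this page are checked (UTF-32 is deliberately ignored).

<code python>
import codecs
from typing import Optional

# Signatures are checked in order; the first match wins.
# Only the encodings mentioned on this page are covered (assumption).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return a codec name if `data` starts with a known BOM, else None.

    Hypothetical helper: None means that BOM detection gave no answer
    and another detection method would have to decide.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
</code>

A result of ''None'' corresponds to the "if BOM is not defined" case in the list above, where the statistical UTF16 and UTF8 checks are used instead.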
 ===== How to set default encoding for specific syntax ===== ===== How to set default encoding for specific syntax =====
 So, you would like documents with a specific syntax (html, js, css, clipper) to always be opened using a predefined code page (encoding).  So, you would like documents with a specific syntax (html, js, css, clipper) to always be opened using a predefined code page (encoding). 
Line 69: Line 70:
 </Encoding> </Encoding>
 </code> </code>
 +
 +More info can be found in [[syntax:specification|SPECIFICATION]] definition.
  
 There are two parts of logic you can influence by changing [[terms:syntax-schema|syntax schema]]: There are two parts of logic you can influence by changing [[terms:syntax-schema|syntax schema]]:
  
 ==== Encoding for Syntax ==== ==== Encoding for Syntax ====
-if you set default encoding for syntax (attribute default), you give a hint to HE, that if none of detection algoithms succeeded, HE should use this default encoding instead of default encoding defined in Options->Document->Defaults (or from Project if project is active and redefines defualt encoding). In example, default encoding for xml documents (and all inherited syntaxes) set "utf-8" If you want to chnage default encoding for some language to 1251 for example, just add this line: +If you set default encoding for syntax (attribute default), you give a hint to HE, that if none of the detection algorithms succeeded, HE should use this default encoding instead of default encoding defined in Options->Document->Defaults (or from Project if the project is active and redefines default encoding). In the example, the default encoding for XML documents (and all inherited syntaxes) is set to "utf-8". If you want to change default encoding for some language to 1251 for example, just add this line:
 <code xml> <code xml>
 <Encoding default="1251"/>  <Encoding default="1251"/> 
Line 80: Line 83:
  
 ==== Force default encoding ==== ==== Force default encoding ====
-If in your case automatic detection of encoding often makes mistakes and determines wrong encoding, you can as globally disable encoding determination as instruct HE to not use automatic detection for specific syntax.  +If in your case automatic detection of encoding often makes mistakes and determines wrong encoding, you can as globally disable encoding determination as instruct HE to not use automatic detection for specific syntax.  
-To disable HE encoding auto detection for syntax you need to extend previously described definition of default encoding with force flag:+To disable HE encoding auto-detection for syntax you need to extend previously described definition of default encoding with force flag:
 <code xml> <code xml>
 <Encoding default="1251" force="true"/> <Encoding default="1251" force="true"/>
 </code> </code>
-If such settings exist in schema, HE will never auto detect encoding and always will use specified encoding for documents with this [[terms:syntax-schema|syntax schema]]. Independently from any default settings for document or project, but with respect to explicit selecting of the encoding with menu or file open dialog.+If such settings exist in the schema, HE will never auto detect encoding and always will use specified encoding for documents with this [[terms:syntax-schema|syntax schema]]. Independently from any default settings for document or project, but with respect to explicit selecting of the encoding with menu or file open dialog.
  
 When making changes to [[terms:syntax-schema|syntax schema]] files, please keep in mind that default syntax schemas can be overwritten on update (but a modified [[terms:syntax-schema|syntax schema]] will be copied to a *.old name).  When making changes to [[terms:syntax-schema|syntax schema]] files, please keep in mind that default syntax schemas can be overwritten on update (but a modified [[terms:syntax-schema|syntax schema]] will be copied to a *.old name). 
-Safest way here will be to create your own [[terms:syntax-schema|syntax schema]], inheriting from default, and overwriting of the settings you do not like.+The safest way here will be to create your own [[terms:syntax-schema|syntax schema]], inheriting from default, and overwriting of the settings you do not like.