How encoding detection works

Preconditions

  1. A file has a single encoding (the same thing as a code page). The encoding can be Unicode (UTF-16 LE or BE (code pages 1200 and 1201), UTF-8 (65001)) or non-Unicode (for example, 1252 (Western European)).
  2. There are several places where encoding conversion can be applied to a document: Open, Save As, New, and Search and Replace.
  3. The encoding can be selected or changed in the File Open/Save dialog, via the context menu or status bar, in Project Settings, in Tools→Options→Document settings, and in the syntax specification (where it can be set as a preferred or as a forced encoding). In addition, HippoEDIT auto-detects the encoding using several algorithms (BOM check, statistical test for UTF-16 LE/BE, statistical test for UTF-8, check by encoding strings, and the same checks that IE uses).
  4. Once the user has changed the encoding for a document, this preference has priority over all other settings. Preferences are machine-specific and can be reset if the HippoEDIT temp files are deleted or their format changes in a new version.
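
To make the first and third detection steps from item 3 more concrete, here is a minimal Python sketch of a BOM check and a crude UTF-8 test. It is only an illustration and not HippoEDIT's actual implementation: the function names `detect_by_bom` and `looks_like_utf8` are hypothetical, and the UTF-8 test is a plain validity check rather than a real statistical test.

```python
import codecs

# Windows code page numbers as used in the list above:
# 65001 = UTF-8, 1200 = UTF-16 LE, 1201 = UTF-16 BE.
# UTF-32 BOMs are deliberately ignored in this sketch.
BOM_TO_CODE_PAGE = [
    (codecs.BOM_UTF8, 65001),
    (codecs.BOM_UTF16_LE, 1200),
    (codecs.BOM_UTF16_BE, 1201),
]


def detect_by_bom(data: bytes):
    """Return the code page implied by a leading BOM, or None if there is none."""
    for bom, code_page in BOM_TO_CODE_PAGE:
        if data.startswith(bom):
            return code_page
    return None


def looks_like_utf8(data: bytes) -> bool:
    """Crude UTF-8 test: the data decodes as UTF-8 and has at least one non-ASCII byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII decodes as UTF-8 too, but it says nothing about the encoding.
    return any(byte >= 0x80 for byte in data)
```

A file without a BOM yields `None` from `detect_by_bom`, in which case an editor would presumably fall through to the next detection method.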

So, here is how all of this works together (or is designed to work):

New File

Encoding selection for a new file (if a setting is not defined, or is set to Automatic, the next one is taken).

Open File

Encoding selection for opening a file (if a setting is not defined, or is set to Automatic, the next one is taken).

Saving of the file

Encoding selection for saving a file (if a setting is not defined, or is set to Automatic, the next one is taken).

Search and Replace

Search and Replace uses the same encoding logic as Open/Save file, except that interactive selection of the encoding through the Open/Save dialog is not available.

If there are problems

So, if you see that documents are opened with the wrong encoding, you have several options for solving this:

This can be done with XML flags in the General section of settings.xml:

<EncodingDetection extended="false"/> 

Also, from now on, extended encoding detection is enabled by default only for syntaxes inherited from deftext (such as Plain Text, XML, and HTML).

You can control encoding detection in an even more granular way by switching individual detection methods on or off, keeping those that in most cases do not produce false positives. For example:

<EncodingDetection extended="false" bom="true" unicode="false" enc_string="true" utf8="true" min_confidence="90"/> 
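
If you want to check which flags a given line actually sets, for example before pasting it into settings.xml, the following Python sketch reads them into a dictionary. The helper name `read_detection_flags` is hypothetical. The attribute names are copied from the line above, and the sketch does not claim anything about how HippoEDIT interprets each of them.

```python
import xml.etree.ElementTree as ET

# The example line from above, copied verbatim.
LINE = (
    '<EncodingDetection extended="false" bom="true" unicode="false" '
    'enc_string="true" utf8="true" min_confidence="90"/>'
)


def read_detection_flags(xml_text: str) -> dict:
    """Parse an EncodingDetection element into {attribute: bool or int}."""
    element = ET.fromstring(xml_text)
    flags = {}
    for name, value in element.attrib.items():
        if value in ("true", "false"):
            flags[name] = value == "true"
        else:
            # Assumption: every non-boolean attribute is an integer,
            # as min_confidence is in the example.
            flags[name] = int(value)
    return flags


flags = read_detection_flags(LINE)
enabled = sorted(name for name, value in flags.items() if value is True)
print(enabled)
```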

How to set default encoding for specific syntax

So, you would like documents with a specific syntax (html, js, css, clipper) to always be opened using a predefined code page (encoding).

Here is an example from xml_spec.xml:

<Encoding default="utf-8" force="false" bom="false">
...
</Encoding>

More information can be found in the SPECIFICATION definition.

There are two parts of the logic you can influence by changing the syntax schema:

Encoding for Syntax

If you set a default encoding for a syntax (attribute default), you give HE a hint: if none of the detection algorithms succeeds, HE should use this encoding instead of the default encoding defined in Options→Document→Defaults (or in the project, if a project is active and redefines the default encoding). In the example, the default encoding for XML documents (and all inherited syntaxes) is set to “utf-8”. If you want to change the default encoding for some language, to 1251 for example, just add this line:

<Encoding default="1251"/> 

inside the SPECIFICATION part of the syntax schema, unless it is already defined there. Keep in mind that encoding settings are inherited, so you can also place the line in a base schema to make it available in all inherited child schemas, as is done for xml_spec.xml.
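
The inheritance rule can be pictured as a walk up the chain of parent schemas until one of them defines a default encoding. The Python sketch below only models that idea. The `Schema` class and the `inherited_default_encoding` function are hypothetical names, not part of HippoEDIT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Schema:
    """Hypothetical stand-in for a syntax schema."""
    name: str
    parent: Optional["Schema"] = None
    # Value of <Encoding default="..."/> if this schema defines one.
    default_encoding: Optional[str] = None


def inherited_default_encoding(schema: Optional[Schema]) -> Optional[str]:
    """Return the nearest default encoding defined on the schema or its ancestors."""
    while schema is not None:
        if schema.default_encoding is not None:
            return schema.default_encoding
        schema = schema.parent
    return None


base = Schema("base", default_encoding="utf-8")
child = Schema("child", parent=base)                           # inherits utf-8
override = Schema("override", parent=base, default_encoding="1251")
```

In this model a child schema needs its own `<Encoding default="..."/>` line only when it should differ from its base schema.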

Force default encoding

If automatic detection often makes mistakes and determines the wrong encoding in your case, you can either disable encoding detection globally or instruct HE not to use automatic detection for a specific syntax. To disable HE encoding auto-detection for a syntax, extend the previously described default encoding definition with the force flag:

<Encoding default="1251" force="true"/>

If such a setting exists in the schema, HE will never auto-detect the encoding and will always use the specified encoding for documents with this syntax schema. This happens independently of any default settings for the document or project, but an explicit selection of the encoding via the menu or the File Open dialog is still respected.
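
Putting the statements on this page together, the order in which an encoding is chosen when a file is opened can be sketched roughly as below. This is a reading of the rules described here, not HippoEDIT's code. The function name `choose_open_encoding` and all of its parameters are hypothetical.

```python
from typing import Callable, Optional


def choose_open_encoding(
    user_choice: Optional[str],
    syntax_default: Optional[str],
    syntax_force: bool,
    detect: Callable[[], Optional[str]],
    project_default: Optional[str],
    global_default: str,
) -> str:
    """Pick an encoding; each step is used only if the previous ones gave nothing."""
    # 1. An explicit user selection wins over everything else.
    if user_choice is not None:
        return user_choice
    # 2. force="true" on the syntax skips auto-detection entirely.
    if syntax_force and syntax_default is not None:
        return syntax_default
    # 3. Otherwise try auto-detection (BOM, statistical tests, and so on).
    detected = detect()
    if detected is not None:
        return detected
    # 4. Detection failed: fall back to the syntax default, if any.
    if syntax_default is not None:
        return syntax_default
    # 5. Then the project default, and finally the global document default.
    if project_default is not None:
        return project_default
    return global_default
```

For example, with `syntax_default="1251"` and `syntax_force=True`, the `detect` callback is never called unless the user has picked an encoding explicitly, in which case the user's choice is returned.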

When making changes to syntax schema files, please keep in mind that the default syntax schemas can be overwritten on update (although a modified syntax schema will be copied to a *.old name). The safest approach is to create your own syntax schema that inherits from the default one and overrides the settings you do not like.