Elettrotecnica

  • Replace OCIEnvCreate with OCIEnvNlsCreate
  • Introduce two new configuration options corresponding to the extra arguments that OCIEnvNlsCreate accepts: EnvCharset -> charset and EnvNXCharset -> ncharset

This allows scenarios where the database is encoded differently from NaviServer.

I am opening this as a draft because we still need to test it. In the meantime, feel free to comment.

One possible improvement would be to specify the client encoding via its human-readable name, but I could not find an idiom that does just that. It seems one needs either an environment or a handle to exist already... but maybe I am wrong.

See e.g. https://docs.oracle.com/en/database/oracle/oracle-database/21/lnoci/oci-locale-functions.html#GUID-6C255C8A-180F-4328-9300-CE7BA9BEEE6D

@gustafn
Contributor

gustafn commented Sep 26, 2024

While I cannot comment on whether this "works" or not (I have no access to such an environment), I have a few comments from looking at the code:
a) Since the last two arguments of OCIEnvNlsCreate() specify the client character set, both should be set to UTF-8 by default.
b) I wonder in which situations the client character set is NOT set to UTF-8. The nsdb driver converts from and to UTF-8. If there are cases where nsoracle writes directly to Tcl (there are such cases in nsdbpg), these conversions should be added there (otherwise there will be problems when working with different Tcl versions).
c) Therefore, there should not be any need for new configuration variables for this purpose. But one should probably add a boolean configuration variable controlling whether OCIEnvNlsCreate() is used instead of OCIEnvCreate(), so that users can switch back to the old behavior in case the "Nls" variant causes problems somewhere.
d) Compatibility: it seems that OCIEnvNlsCreate() was introduced with Oracle9i in 2001, so using it should be safe from this point of view. I am not sure whether there are platform differences (Windows/BSD). Googling turns up many "OCIEnvNlsCreate() failed" messages, but these seem to be due to missing general environment settings.

Somewhat unrelated to the change: I wonder whether some support for the Oracle data type NCHAR is needed in the driver.

@Elettrotecnica
Author

My comments on your feedback

a) I do not see why the default behavior should change. Running the database and NaviServer with different encodings is not the recommended setup, just a situation that one might have to deal with, hence the extra configuration option.
b) As far as I know, the default client encoding is assumed to be that of the database. In our use case, this is set to a Latin charset. I would need to assess further whether conversions to Tcl are needed...
c) If the default behavior stays the same, this would not be a problem.
d) if support for older versions was needed we could release the new code as a separate version

We are currently testing the change on development

Ciao

@Elettrotecnica
Author

I ran some tests on the first 65536 characters of Unicode with an idiom like:

set i 1
set ffff [expr {int(pow(16,4))}]
while {$i < $ffff} {
    set char_code [format %llx $i]
    set oracle_char [subst -nocommands -novariables \\u$char_code]
    set oracle_db_char [::xo::dco get_value -dbn the_oracle_database char_round_trip {
        select :oracle_char from dual
    }]
 
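    # Report any character that does not survive the round trip
    if {$oracle_db_char ne $oracle_char} {
        ns_log warning "round trip failed for U+$char_code"
    }

    incr i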
}

The test seems to succeed on the whole set when the client encoding, set via the new parameters, matches that of the database rather than UTF-8 (WE8ISO8859P1, or code 31, in this case).

The documentation for OCIEnvNlsCreate says that the NLS_LANG and NLS_NCHAR settings control the default client charset when none is specified. Maybe just setting the relevant environment variables for the client can achieve the same effect (see e.g. https://docs.oracle.com/en/database/oracle/oracle-database/21/nlspg/setting-up-globalization-support-environment.html#GUID-D5C74C82-8622-46F4-8760-0F8ABA28A816)...

@gustafn
Contributor

gustafn commented Sep 27, 2024

When setting the charset parameters to WE8ISO8859P1, you are telling the server that the client is ISO 8859-1, which is not true. The C interface expects valid UTF-8; otherwise, it can and will crash, especially with Tcl 9, which is less tolerant of invalid UTF-8.

With your test, you are testing the transparency (can bytes travel unchanged between Tcl and the DB?), but not the conversion. Conversion is necessary when, for example, an umlaut (such as "ö", in ISO 8859-1 the single byte F6) is stored by some other application in the database with the encoding WE8ISO8859P1. This byte must be converted to the 2-byte representation for UTF-8. Similarly, when storing a 2-byte umlaut from Tcl in the DB, it should show up there with the single-byte representation.
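
To make the difference between the two representations concrete, here is a minimal plain-Tcl sketch. It is an illustration only, not code from the driver, and it does not touch the database; the variable names are made up for the example. It encodes "ö" once as ISO 8859-1 and once as UTF-8 and prints the resulting bytes in hex.

```tcl
# Illustration only: the same character "ö" (U+00F6) in two encodings.
set umlaut "\u00f6"

# Bytes as a WE8ISO8859P1 (ISO 8859-1) database would store them: one byte.
set latin1_bytes [encoding convertto iso8859-1 $umlaut]

# Bytes as required for valid UTF-8: two bytes.
set utf8_bytes [encoding convertto utf-8 $umlaut]

puts [binary encode hex $latin1_bytes]   ;# f6
puts [binary encode hex $utf8_bytes]     ;# c3b6

# Conversion step: decode the single Latin-1 byte back into a Tcl string.
set decoded [encoding convertfrom iso8859-1 $latin1_bytes]
puts [string equal $decoded $umlaut]     ;# 1
```

A pure round-trip test cannot tell these two byte sequences apart, because it only checks that whatever goes in comes back out unchanged.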

You can check the representation in the Database via sqlplus using e.g.

select VALUE, dump(VALUE,1016) from TABLE ....

This returns the value together with the character set name of the field, its length, and its hex representation.

What is the exact problem you are addressing?
a) a single-byte umlaut stored in the DB is retrieved mangled in Tcl?
b) a 2-byte umlaut saved from Tcl appears mangled in the DB (as seen from other applications)?
c) both?

It might be the case that the problem lies not in whether OCIEnvNlsCreate() is used but somewhere else in the driver.

I will commit a version which makes the Tcl requirements explicit, but I am not fully confident that it addresses your problem, since the default behavior should already be correct for most cases. To activate it, set the parameter "ConvertEncoding" to true.

@Elettrotecnica
Author

Ok, when the change is there we can pull it and see how it works! Thanks a lot!
