Jose Manuel Barrueco writes
I've managed to see correct utf8 characters in:
CitEc database -> AMF files
but now the problem is in the ACIS database.
A bit of background here. JMBC and I have been working on the issue of citations lost between RAS and CitEc. It appears that there are issues in the character sets of the reference string (refstring). CitEc produced latin-1 refstrings and stuck them into the AMF files. We changed the column of the reference to utf-8.
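To illustrate the kind of mismatch described above, here is a minimal Python sketch. It is not taken from the CitEc or ACIS code; the sample refstring is hypothetical and only stands in for a reference containing accented characters. It shows what happens when UTF-8 bytes are read as latin-1, and when latin-1 bytes are read as UTF-8.

```python
# Hypothetical refstring with accented characters (not real CitEc data).
refstring = "Muñoz, J. (2003), Opinión"

utf8_bytes = refstring.encode("utf-8")
latin1_bytes = refstring.encode("latin-1")

# UTF-8 bytes interpreted as latin-1: each accented character
# turns into two characters (classic mojibake).
mojibake = utf8_bytes.decode("latin-1")

# latin-1 bytes interpreted as UTF-8: the accented bytes are not
# valid UTF-8 sequences here, so strict decoding fails.
try:
    latin1_bytes.decode("utf-8")
    latin1_as_utf8_ok = True
except UnicodeDecodeError:
    latin1_as_utf8_ok = False

print(mojibake)
print(latin1_as_utf8_ok)
```

Either direction of the mismatch can make a string comparison between two systems fail, which would be consistent with citations getting lost between them.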
There, the character set used in the citations table is still latin1. I've re-processed a document with character problems (RePEc:mar:volksw:200425) to test the changes. Before, the characters were OK in ACIS but wrong in CitEc. Now we have the problem on the other side. Try for instance:
mysql> select clid,cnid,ostring from citations where ostring like "SALMON, P. (2003), As%";
Should we change the character set for ACIS too?
I think this will have to be done. I am not sure how it is best done and hope that Ivan can advise. We can change the columns to utf-8 and reload all the citations. Maybe at this stage we should remove the link to the citations screen temporarily so that we have a chance to test things.
Cheers,
Thomas Krichel http://openlib.org/home/krichel RePEc:per:1965-06-05:thomas_krichel skype: thomaskrichel
If nobody has comments or suggestions on that, I would suggest going ahead with the change from latin-1 to utf8. If we proceed as in CitEc, it should not be necessary to reload all citations, since the proportion of references affected is quite small...
On Sat, 21 Feb 2009, Thomas Krichel wrote:
Jose Manuel Barrueco writes
I've managed to see correct utf8 characters in:
CitEc database -> AMF files
but now the problem is in the ACIS database.
A bit of background here. JMBC and I have been working on the issue of citations lost between RAS and CitEc. It appears that there are issues in the character sets of the reference string (refstring). CitEc produced latin-1 refstrings and stuck them into the AMF files. We changed the column of the reference to utf-8.
There, the character set used in the citations table is still latin1. I've re-processed a document with character problems (RePEc:mar:volksw:200425) to test the changes. Before, the characters were OK in ACIS but wrong in CitEc. Now we have the problem on the other side. Try for instance:
mysql> select clid,cnid,ostring from citations where ostring like "SALMON, P. (2003), As%";
Should we change the character set for ACIS too?
I think this will have to be done. I am not sure how it is best done and hope that Ivan can advise. We can change the columns to utf-8 and reload all the citations. Maybe at this stage we should remove the link to the citations screen temporarily so that we have a chance to test things.
Cheers,
Thomas Krichel http://openlib.org/home/krichel RePEc:per:1965-06-05:thomas_krichel skype: thomaskrichel
--- José Manuel Barrueco http://www.uv.es/=barrueco
Jose Manuel Barrueco writes
If nobody has comments or suggestions on that, I would suggest going ahead with the change from latin-1 to utf8. If we proceed as in CitEc, it should not be necessary to reload all citations, since the proportion of references affected is quite small...
I have the following log of a Skype chat with Ivan. He seems to think that the change should be done.
da? meaning we can change the column type
[5:12:44 PM] Иван В. Курманов: hi Thomas
[5:12:50 PM] Иван В. Курманов: :)
[5:13:05 PM] Иван В. Курманов: since you asked...
[5:13:13 PM] Иван В. Курманов: it should be in utf 8 already
[5:13:24 PM] Thomas Krichel: it is not
[5:13:25 PM] Иван В. Курманов: it should have been in utf8 from the start
[5:13:29 PM] Иван В. Курманов: how do you know?
[5:13:39 PM] Thomas Krichel: we looked it up in phpmyadmin
[5:13:55 PM] Иван В. Курманов: ok
[5:14:07 PM] Иван В. Курманов: but that's just the mysql type
[5:14:34 PM] Иван В. Курманов: it matters for comparisons and string functions
[5:14:37 PM] Thomas Krichel: JMBC was amazed that you turned his latin-1 to utf-8 somehow
[5:14:49 PM] Иван В. Курманов: ?
[5:15:25 PM] Thomas Krichel: you took his latin-1 in amf, and made it somehow utf-8
[5:15:33 PM] Thomas Krichel: we did not see a conversion.
[5:16:02 PM] Иван В. Курманов: did he put latin-1 in AMF? did it have <?xml declaration with encoding='latin-1' ?
[5:16:10 PM] Thomas Krichel: no
[5:16:14 PM] Thomas Krichel: apparently not
[5:16:28 PM] Thomas Krichel: the whole thing is very strange.
[5:17:27 PM] Thomas Krichel: I don't want to bother you with this, because this is not an emergency, so I think we can just change the type and see what comes out.
[5:17:48 PM] Thomas Krichel: I will also check the acis code that creates the tables.
[5:18:03 PM] Иван В. Курманов: the fact that something in ACIS database does not have UTF8 type is wrong, but I don't think it is responsible for any real trouble you are having
[5:18:31 PM] Thomas Krichel: no, the origin was latin-1 from jmbc
[5:18:46 PM] Иван В. Курманов: i might be wrong, (as things change), but this type does not by itself cause any charset conversion
[5:18:59 PM] Thomas Krichel: correct.
[5:19:12 PM] Thomas Krichel: but it will convert if you change the column type
[5:19:18 PM] Thomas Krichel: I think.
[5:19:35 PM] Иван В. Курманов: i doubt; re-check that
[5:20:05 PM] Thomas Krichel: I don't know how to check, but the query took a very long time to run when we made the change on mutabor.
[5:20:14 PM] Thomas Krichel: so it must have been converting something.
[5:20:31 PM] Иван В. Курманов: "I don't know how to check" -- documentation
[5:21:11 PM] Thomas Krichel: I know I have to look up the documentation. I have not seen it in there last time I looked, but I did not investigate.
[5:21:38 PM] Иван В. Курманов: in this case I meant the mysql database
[5:22:07 PM] Иван В. Курманов: which would definitely answer if the data conversion takes place if you change the column charset
[5:23:10 PM] Thomas Krichel: I know you mean the database.
[5:23:23 PM] Thomas Krichel: I am not sure what command phpmyadmin runs.
[5:26:15 PM] Thomas Krichel: one thing for sure is that under latin-1 the column does not take the utf-8 output that JMBC produces now.
[5:26:43 PM] Thomas Krichel: I think we can close the matter here so far.
[5:26:52 PM] Thomas Krichel: I hope you are doing fine.
[5:27:07 PM] Иван В. Курманов: alright. thanks, Thomas, I'm ok
I suggest that JMBC proceeds Monday morning; I will be at hand on Monday to watch out for problems.
Cheers,
Thomas Krichel http://openlib.org/home/krichel RePEc:per:1965-06-05:thomas_krichel skype: thomaskrichel
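On the open question in the chat of whether changing the column character set converts the stored data: as far as I understand the MySQL manual, an ALTER TABLE that changes a column's character set does convert the values, and it assumes that the stored bytes really are in the old declared character set. The Python sketch below only simulates that behaviour outside the database. The function name `convert_column` and the sample text are hypothetical, and Python's latin-1 codec is used as a simplification of MySQL's latin1.

```python
def convert_column(stored: bytes, old: str, new: str) -> bytes:
    """Simulate a converting charset change: decode the stored bytes
    with the declared old charset, then re-encode in the new one."""
    return stored.decode(old).encode(new)

text = "Opinión"  # hypothetical sample value

# Case 1: the column is declared latin1 and really holds latin-1 bytes.
genuine = text.encode("latin-1")
converted_ok = convert_column(genuine, "latin-1", "utf-8")

# Case 2: the column is declared latin1 but already holds UTF-8 bytes.
# A converting change then encodes the text a second time.
mislabelled = text.encode("utf-8")
double_encoded = convert_column(mislabelled, "latin-1", "utf-8")

# Relabelling only (bytes untouched, declared charset changed) is what
# case 2 would need instead of a conversion.
relabelled = mislabelled

print(converted_ok.decode("utf-8"))
print(double_encoded.decode("utf-8"))
print(relabelled.decode("utf-8"))
```

If the ACIS column turns out to be in the second situation, a plain conversion would produce double-encoded text, so it seems worth checking a few known rows such as the RePEc:mar:volksw:200425 references before and after the change.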
participants (2): Jose Manuel Barrueco, Thomas Krichel