Unicode problem

9 messages

Unicode problem

papaiking
NewGenLib supports Unicode 32, but the search results in the OPAC are empty.
It seems to be caused by Apache Solr not supporting Unicode.

I want to enter Vietnamese characters in Unicode UTF-8 (this is the standard character encoding in Vietnam) and have the search engine support them.
How do I solve this issue?

Thanks to the developer team for your support.

Re: Unicode problem

verussolutions
Administrator
1. Stop the NewGenLib server (apache-tomcat).
2. Go to the apache-tomcat-6.0.XX/conf directory and open server.xml with your favourite text editor.
3. At approximately lines 69 to 71 you will find the following lines:
<Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"/>

You need to add URIEncoding="UTF-8" and maxHttpHeaderSize="16000".
4. Replace the above lines with these:
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" URIEncoding="UTF-8" maxHttpHeaderSize="16000" />
5. Start the NewGenLib server.
6. Reindex (double-click the BuildIndex.bat file in the downloaded InstallNGL3.0 directory).

You should now see your Vietnamese records.
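Why URIEncoding matters, as a small illustration: Tomcat 6 decodes the bytes of a request URI as ISO-8859-1 unless URIEncoding is set, so a UTF-8 search term sent by the browser arrives as mojibake. The sketch below uses the JDK's URLDecoder rather than Tomcat's own code, and the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriEncodingDemo {
    // Decodes a percent-encoded query value with the given charset,
    // roughly what the servlet container does with the request URI.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The Vietnamese word "Ban" with U+00E0, as a browser sends it:
        // U+00E0 is the two UTF-8 bytes C3 A0, each percent-encoded.
        String query = "B%C3%A0n";

        // Without URIEncoding, Tomcat 6 falls back to ISO-8859-1:
        // each UTF-8 byte becomes its own character (4 characters in total).
        String withoutSetting = decode(query, "ISO-8859-1");

        // With URIEncoding="UTF-8", the two bytes become one character (3 in total).
        String withSetting = decode(query, "UTF-8");

        System.out.println(withoutSetting.length()); // 4
        System.out.println(withSetting.length());    // 3
        System.out.println(withSetting.equals("B\u00E0n")); // true
    }
}
```

A term garbled this way no longer matches what the indexer stored, which is one way an OPAC search can come back empty even when the records are indexed correctly.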


Re: Unicode problem

papaiking
I did that, but the result did not change.
I think the problem is here:
MarcReader reader = new MarcStreamReader(input);

MARC4J may use ISO-8859-1 as the default instead of UTF-8.
We need to specify UTF-8 when using MarcStreamReader.
Here is the console output from my server:

org.marc4j.MarcException: error parsing data field for tag: 245 with data:   aBàn về t�
        at org.marc4j.MarcStreamReader.next(MarcStreamReader.java:220)
        at newgenlib.marccomponent.conversion.Converter.getMarcModelsFromMarc(Converter.java:469)
        at org.verus.ngl.indexing.NewBibliographicSolrIndexCreator.indexingData(NewBibliographicSolrIndexCreator.java:113)
        at eof.techProcessing.BuildIndexingPanel.buildIndex(BuildIndexingPanel.java:216)
        at eof.techProcessing.BuildIndexingPanel$2.construct(BuildIndexingPanel.java:198)
        at tools.SwingWorker$2.run(SwingWorker.java:119)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: subfield not terminated
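The � at the end of the logged 245 field hints at one possible cause, which has not been confirmed against this data. In a Unicode MARC 21 record the directory gives each field's length in octets (bytes), and Vietnamese text takes more UTF-8 bytes than characters. If a record were written with lengths counted in characters, a reader would slice the field too short, missing its terminators and cutting a multi-byte character in half. The plain-JDK sketch below (no MARC4J involved; the method names are hypothetical) only demonstrates that size mismatch and the half-character effect.

```java
import java.nio.charset.StandardCharsets;

public class MarcLengthDemo {
    // Length as Java counts it: UTF-16 code units (one per character for this text).
    static int charLength(String s) {
        return s.length();
    }

    // Length in UTF-8 octets, the unit a Unicode MARC 21 directory entry uses.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Cuts the UTF-8 bytes of a value to byteCount and decodes them again,
    // mimicking a reader that trusts a field length that is too short.
    static String truncateBytes(String s, int byteCount) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, byteCount, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ban ve" with Vietnamese marks, written as precomposed code points
        // U+00E0 and U+1EC1 so the example does not depend on source-file encoding.
        String title = "B\u00E0n v\u1EC1";

        System.out.println(charLength(title)); // 6
        System.out.println(utf8Length(title)); // 9

        // Stopping after 7 of the 9 bytes splits the 3-byte U+1EC1;
        // the decoder substitutes U+FFFD, the same symbol seen in the log.
        System.out.println(truncateBytes(title, 7).endsWith("\uFFFD")); // true
    }
}
```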

Re: Unicode problem

verussolutions
Administrator
Please download the latest Indexer file from http://sourceforge.net/projects/newgenlib/files/NewGenLib/NGL3Indexer/Indexer.zip/download

1. Extract the above zip file. You will get a directory called Indexer, which contains a directory called lib and a file named NGLIndexer.jar. Copy them.
2. Paste them into your InstallNGL3.0 directory, replacing the old lib directory and the old NGLIndexer.jar file already there.
3. Start the NewGenLib server.
4. Double-click BuildIndex.bat in the InstallNGL3.0 directory.
Indexing should now take place.

The latest Indexer is already part of InstallNGL3.0U1 (Update 1).


Re: Unicode problem

papaiking
Many functions cause errors like the one above.
I want to point out that MarcReader reader = new MarcStreamReader(input, "UTF-8"); does not run correctly if the input stream is in Unicode.
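One thing worth ruling out is what each record itself declares. In MARC 21, leader position 09 is the character coding scheme: blank means MARC-8 and 'a' means UCS/Unicode. How MARC4J's MarcStreamReader weighs this flag against an explicit encoding argument may depend on the MARC4J version, which has not been checked here. The sketch below is plain JDK code, not part of NewGenLib or MARC4J; the class name and the two sample leaders are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class LeaderCheck {
    static final int LEADER_LENGTH = 24;

    // Returns leader position 09 (character coding scheme) of a raw MARC record.
    static char codingScheme(byte[] record) {
        if (record.length < LEADER_LENGTH) {
            throw new IllegalArgumentException("record shorter than the 24-byte leader");
        }
        // The leader is ASCII, so a byte-to-char cast is safe here.
        return (char) record[9];
    }

    static String describe(char scheme) {
        switch (scheme) {
            case 'a': return "UCS/Unicode";
            case ' ': return "MARC-8";
            default:  return "unknown (" + scheme + ")";
        }
    }

    public static void main(String[] args) {
        // Hypothetical leaders, differing only at position 09
        byte[] unicode = "00714cam a2200205 a 4500".getBytes(StandardCharsets.US_ASCII);
        byte[] marc8   = "00714cam  2200205 a 4500".getBytes(StandardCharsets.US_ASCII);

        System.out.println(describe(codingScheme(unicode))); // UCS/Unicode
        System.out.println(describe(codingScheme(marc8)));   // MARC-8
    }
}
```

Run against the first 24 bytes of each exported record, a mismatch between the declared scheme and the real encoding of the data would point at the record export rather than at the reader.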

Re: Unicode problem

verussolutions
Administrator
Hi,
That's true.
1. Did you install the new Indexer as instructed earlier?
2. Is your database encoding set to UTF-8?

Can you please send us the database backup for examination by the development team?

Re: Unicode problem

verussolutions
Administrator
Please email the backup to info@verussolutions.biz

Re: Unicode problem

papaiking
These are my database properties:

CREATE DATABASE newgenlib
  WITH OWNER = newgenlib
       ENCODING = 'UTF8';



Re: Unicode problem

verussolutions
Administrator
In reply to this post by verussolutions
Hi,
The database has been examined. A new Indexer has been uploaded to sourceforge.net. I am pasting the installation instructions for your convenience.
Please download the latest Indexer file from http://sourceforge.net/projects/newgenlib/files/NewGenLib/NGL3Indexer/Indexer.zip/download

1. Extract the above zip file. You will get a directory called Indexer, which contains a directory called lib and a file named NGLIndexer.jar. Copy them.
2. Paste them into your InstallNGL3.0 directory, replacing the old lib directory and the old NGLIndexer.jar file already there.
3. Start the NewGenLib server.
4. Double-click BuildIndex.bat in the InstallNGL3.0 directory.
Indexing should now take place.

------- Reason for the problem
We have seen that you created a new library, whose library id is 2, and all the catalog records were created under that new library. We strongly recommend having only one library, and its library id must be 1.
NewGenLib's multi-library creation procedure is different and is not done just by creating a row in the library table. Hence we request that you wait until the multi-library routines are available.
For now, to change your library name, go to Administration -> Configure System -> General Menu -> Library.
Enter the library name and put the same string in the Network name field as well. Save it and restart the NewGenLib server.