Copyright (C) 1996 UEKI Masahiro
1. Tools for analysing Japanese sentences using the EDR Japanese word
dictionary
(1) Overview
This system integrates morphological and syntactic analysis.
(2) Features
* Uses the EDR Japanese word dictionary
* Integrates morphological and syntactic analysis
(3) Functions
This system uses the EDR Japanese word dictionary, developed by
EDR, and integrates morphological and syntactic analysis. The EDR
Japanese word dictionary is very large, containing about 250,000
words. It is one of a series of EDR dictionaries; the others include
a conceptual dictionary, a bilingual dictionary, etc. These
dictionaries are useful in many fields. This system provides
morphological and syntactic analysis; combined with the other
dictionaries and analysers, it becomes useful for a much broader
range of purposes.
2. Environment
Machine type:
Sun SparcStation10/51
Operating System:
SunOS 4.1.3
Languages:
C (gcc 2.7.2)
protcl ver1.4 (sicstus prolog 2.1#9 + tcl 7.4 + tk 4.0)
perl 4.036
Size of source code:
45k bytes dictionary server (necessary)
71k bytes LR table generator (optional)
72k bytes MSLR parser (not including LR table)
Notes: This package does not contain the EDR dictionary, because it is
not a free dictionary.
(for details, see http://www.iijnet.or.jp/edr)
This system uses the search library (written in C) provided by EDR.
It is available from http://www.iijnet.or.jp/edr/E_Tool.html
3. List of files
README.euc this file
dsv/ dictionary server
Makefile
check_dic.c
consult.c
dsv.h
hash.c
hash.h
hinsi_sai2.h
hira_kana2.c
make_exe.c
serv.c
mlr/ LR table generator
LR.c
LR.h
Makefile
README
ReadRules.c
WriteLR.c
WriteLR_cpm.c
config.h
connect.c
cpm.c
error.c
fileio.c
follow.c
gen_sglr.c
grform.c
make_sglr.prl
mlr2.c
time.c
mslr/ MSLR parser
bigram_sai.pl
client.pl
compile.pl
cond.pl
consult.pl
hinshi.pl
hostdsv.prl
jiritsugo.pl
load.pl
pack.pl
parser1-2.sglr.pl
sglr.pl
stack.pl
tree.pl
unpack.pl
usage
util.pl
xmslr.tk
sample/ example
dtrans.patch
sample.bun
sample.con
sample.dic
sample.gr
search.patch
sicstusrc.sample
4. Installation
This system consists of 4 modules: the LR table generator, a utility
to transform the LR table, the dictionary server and the MSLR parser.
The MSLR parser module in this package already contains an LR table,
so if you do not want to change the grammar, you need only the
dictionary server and the MSLR parser.
For the LR table generator, see the appendix.
(1) dictionary server
First, build the EDR dictionary search library and the system
dictionary for the Japanese word dictionary (see the appendix).
Set the two environment variables "EDR_LIB_DIR" and
"EDR_INCLUDE_DIR" to the proper paths.
Then run make in the dsv directory.
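The two environment settings above can be sketched for a Bourne-style
shell as follows. This is only an illustration: the helper name
set_edr_env and the paths in the comment are assumptions, not part of
this package. Use the directories where you actually built the EDR
search library.

```shell
# Hypothetical helper: export the two variables named in the step above.
# Both arguments are user-chosen paths; nothing here is fixed by the package.
set_edr_env() {
    lib_dir=$1
    include_dir=$2
    # Refuse to continue if either directory is missing.
    if [ ! -d "$lib_dir" ] || [ ! -d "$include_dir" ]; then
        echo "set_edr_env: directory not found" >&2
        return 1
    fi
    EDR_LIB_DIR=$lib_dir
    EDR_INCLUDE_DIR=$include_dir
    export EDR_LIB_DIR EDR_INCLUDE_DIR
}

# Example (placeholder paths, adjust to your installation):
#   set_edr_env /usr/local/edr/lib /usr/local/edr/include && (cd dsv && make)
```

Checking the directories first is a design choice of this sketch, so
that a mistyped path is reported before make is started.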
(2) MSLR parser
Nothing in particular is required, but it is better to 'compile' the
prolog sources. To compile, start sicstus prolog and then just
'consult' "compile.pl".
| ?- [compile].
If fastcode compilation is available, it is better to set the
prolog_flag before compiling.
| ?- prolog_flag(compiling,_,fastcode).
Note that compiled objects (*.ql) are about 10 times as big as the
original sources (*.pl).
The LR table (sglr.pl) in the package is quite big, so it takes a
long time to 'consult'. It is better to 'save' the prolog state.
(load library)
| ?- use_module(library(ordsets)).
| ?- use_module(library(lists)).
(if you compiled prolog sources)
| ?- [load].
(if you didn't compile prolog sources)
| ?- [consult].
(to save)
| ?- save(mslr).
To run the MSLR parser, run the saved object 'mslr'.
% ./mslr
5. How to use
Before running the MSLR parser, don't forget to start the dictionary
server 'dsv'.
(1) To analyse all sentences in a file
| ?- mslr(filename).
(2) To analyse some sentences in a file
| ?- kioku(filename). <-- read all sentences from a file
| ?- disp_sentence. <-- display input sentences
| ?- mslr(number). <-- analyse a sentence specified by number
You can also pass a list.
| ?- mslr([filename1, filename2, filename3]).
| ?- mslr([number1, number2, number3]).
To keep a log, set the option 'output'.
| ?- set_option(output).
* An additional function is available only to protcl users.
When you use protcl, you can see the parse tree on a Tk canvas.
If you have already analysed some sentences, just specify the
sentence number.
| ?- xdisp_tree(number).
Otherwise, use xmslr instead of mslr.
| ?- xmslr(number).
'xmslr' first analyses the sentence, and then calls xdisp_tree
automatically.
----------------------------------------------------------------------
Appendix
A. LR table generation
To generate an LR table, you need two programs, mlr2 and gen_sglr,
and two data files, a grammar and a connection matrix. First build
the programs in the proper directory.
The grammar must be written in CFG format. The current system always
requires a connection matrix, even if you do not want to use one. In
that case, prepare a matrix in which every pair can connect.
To make an LR table that the MSLR parser can use, run the script as
follows.
% ./make_sglr.prl -g grammar -c connection_matrix -o sglr.pl
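Because the connection matrix is required even when it is not used, a
small wrapper that checks both inputs before calling the script may
help. The following is a minimal sketch for a Bourne-style shell; the
function name make_lr_table is hypothetical, and only the
make_sglr.prl options shown above (-g, -c, -o) are used.

```shell
# Hypothetical wrapper around the make_sglr.prl call shown above.
# It only verifies that the grammar and the connection matrix are
# readable, then runs the script from the current directory.
make_lr_table() {
    grammar=$1
    matrix=$2
    output=$3
    for input in "$grammar" "$matrix"; do
        if [ ! -r "$input" ]; then
            echo "make_lr_table: cannot read $input" >&2
            return 1
        fi
    done
    ./make_sglr.prl -g "$grammar" -c "$matrix" -o "$output"
}

# Example, using the same file names as the command above:
#   make_lr_table grammar connection_matrix sglr.pl
```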
B. EDR dictionary search library and EDR system dictionary
Some programs related to the EDR dictionary are available via WWW.
http://www.iijnet.or.jp/edr/E_Tool.html
Download the dictionary browser for Windows
(Tool-WIndows-Archives.tar.gz).
The directory src/search/ contains the sources of the dictionary
search library, and src/dtrans/unix contains those of the dictionary
translator. Note that the sources of the search library are written in
S-JIS code, so don't forget to convert the kanji code before
compiling. This package contains some patches; apply them before
compiling the programs.
Then run make in each directory.
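One possible way to do the kanji code conversion mentioned above is
with iconv. This is a minimal sketch, assuming an iconv that accepts
the encoding names SHIFT_JIS and EUC-JP (GNU iconv does) and assuming
EUC is the target code on your system. The helper name sjis_to_euc and
the output directory name are illustrative only.

```shell
# Hypothetical helper: copy every regular file in a source directory to
# an output directory, converting it from Shift_JIS to EUC-JP.
# The originals are left untouched. Plain ASCII files pass through
# unchanged; non-text files would make iconv fail and stop the loop.
sjis_to_euc() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        iconv -f SHIFT_JIS -t EUC-JP "$f" > "$out_dir/$(basename "$f")" || return 1
    done
}

# Example (the output directory name is a placeholder):
#   sjis_to_euc src/search src/search.euc
```

Writing to a separate directory is a choice of this sketch, so that
the conversion can be repeated from the unmodified sources.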
To make the system dictionary, run dtrans.
% dtrans -WD < JWD.DIC
C. Examples
% ./mslr
yes
| ?- kioku('../sample/sample.bun').
yes
| ?- disp_sentence.
yes
| ?- mslr(1).
consulting dictionary ... done
time for consult = 1260 msec
analyzing sentence o.o..oooooooooooooo..oooooooo..oooooooooooooooooooooooooooo.oooooo..o..ooooo done
time for analyze = 3240 msec
unpacking forest ... done
time for unpack = 90 msec
extracting bunsetsu ... done
time for extracting bunsetsu = 130 msec
Number of result = 1
time for total = 4780 msec
yes
D. Sample dictionary
The EDR institute has allowed me to include a sample dictionary in
this package. It contains the words in the example sentence above.