| author | Monika <monika_hedman@brown.edu> | 2019-06-26 17:54:18 -0400 |
|---|---|---|
| committer | Monika <monika_hedman@brown.edu> | 2019-06-26 17:54:18 -0400 |
| commit | f08914e7b376b92e9046dd8bd4bc4dc2f5996e6f (patch) | |
| tree | 20cace0012e229a5419b360227025241e3cf9fdb /solr-8.1.1/example/files/conf/lang/userdict_ja.txt | |
| parent | 0df1e6093ee5cc2b9b7510b8f4ea5325fd47ffe8 (diff) | |
installed solr
Diffstat (limited to 'solr-8.1.1/example/files/conf/lang/userdict_ja.txt')
| -rw-r--r-- | solr-8.1.1/example/files/conf/lang/userdict_ja.txt | 29 |
1 files changed, 29 insertions, 0 deletions
diff --git a/solr-8.1.1/example/files/conf/lang/userdict_ja.txt b/solr-8.1.1/example/files/conf/lang/userdict_ja.txt
new file mode 100644
index 000000000..6f0368e4d
--- /dev/null
+++ b/solr-8.1.1/example/files/conf/lang/userdict_ja.txt
@@ -0,0 +1,29 @@
+#
+# This is a sample user dictionary for Kuromoji (JapaneseTokenizer)
+#
+# Add entries to this file in order to override the statistical model in terms
+# of segmentation, readings and part-of-speech tags. Notice that entries do
+# not have weights since they are always used when found. This is by-design
+# in order to maximize ease-of-use.
+#
+# Entries are defined using the following CSV format:
+# <text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
+#
+# Notice that a single half-width space separates tokens and readings, and
+# that the number tokens and readings must match exactly.
+#
+# Also notice that multiple entries with the same <text> is undefined.
+#
+# Whitespace only lines are ignored. Comments are not allowed on entry lines.
+#
+
+# Custom segmentation for kanji compounds
+日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
+関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞
+
+# Custom segmentation for compound katakana
+トートバッグ,トート バッグ,トート バッグ,かずカナ名詞
+ショルダーバッグ,ショルダー バッグ,ショルダー バッグ,かずカナ名詞
+
+# Custom reading for former sumo wrestler
+朝青龍,朝青龍,アサショウリュウ,カスタム人名
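The header comments of the added file describe a four-field CSV format in which tokens and readings are separated by a single half-width space and must match in number. A minimal sketch of a checker for those rules follows. It is not part of this commit and is not how Kuromoji itself loads the file; the names `UserDictEntry` and `parse_user_dict` are hypothetical, and rejecting a duplicate `<text>` is an assumption made here because the file only says such duplicates are undefined.

```python
from typing import Iterable, List, NamedTuple


class UserDictEntry(NamedTuple):
    """One parsed entry (hypothetical type, for illustration only)."""
    text: str
    tokens: List[str]
    readings: List[str]
    pos: str


def parse_user_dict(lines: Iterable[str]) -> List[UserDictEntry]:
    """Parse user-dictionary lines and enforce the rules from the file header."""
    seen = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Whitespace-only lines are ignored; '#' lines are comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        text, token_field, reading_field, pos = fields
        # A single half-width space separates tokens and readings.
        tokens = token_field.split(" ")
        readings = reading_field.split(" ")
        if len(tokens) != len(readings):
            raise ValueError(
                f"line {lineno}: {len(tokens)} tokens but {len(readings)} readings"
            )
        # Assumption: treat a repeated <text> as an error, since the file
        # header calls the behavior undefined.
        if text in seen:
            raise ValueError(f"line {lineno}: duplicate entry for {text!r}")
        seen[text] = UserDictEntry(text, tokens, readings, pos)
    return list(seen.values())


if __name__ == "__main__":
    sample = [
        "# Custom segmentation for kanji compounds",
        "日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞",
        "",
        "朝青龍,朝青龍,アサショウリュウ,カスタム人名",
    ]
    for entry in parse_user_dict(sample):
        print(entry.text, entry.tokens, entry.readings, entry.pos)
```

Running such a check before deploying an edited `userdict_ja.txt` would surface a mismatched token and reading count early, with the offending line number in the error message.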
