########################################################
Getting Started with Haystack + Elasticsearch + kuromoji
########################################################

:Updated: 2013-09-28
:Version: 0.0.9
:Author: @voluntas
:URL: http://voluntas.github.io/

**A supplement to "Django + Elasticsearch コトハジメ" (Getting Started with Django + Elasticsearch)**

https://gist.github.com/voluntas/21759d5c45aacc0e6656/

TODO
====

Overview
========

Goals
=====

- Make Japanese full-text search easy to use from Haystack
- Write a kuromoji-enabled Elasticsearch backend for Haystack

Environment
===========

:Python: 2.7.5
:Elasticsearch: 0.90.5
:redis: 2.6.16

Setup
=====

This assumes Elasticsearch 0.90.5 is already installed.

Installing kuromoji
-------------------

:github: https://github.com/elasticsearch/elasticsearch-analysis-kuromoji

The plugin installs with a single command:

::

    $ cd elasticsearch-0.90.5
    $ bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
    -> Installing elasticsearch/elasticsearch-analysis-kuromoji/1.5.0...
    Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-analysis-kuromoji/elasticsearch-analysis-kuromoji-1.5.0.zip...
    Downloading .......................................DONE
    Installed elasticsearch/elasticsearch-analysis-kuromoji/1.5.0 into /Users/nakai/src/other/elasticsearch-0.90.5/plugins/analysis-kuromoji

Edit ``elasticsearch-0.90.5/config/elasticsearch.yml`` so that the default analyzer uses the kuromoji tokenizer:

.. code-block:: yaml

    index.analysis.analyzer.default.type: custom
    index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer

For the available settings, refer to the plugin source. Here is the URL, pinned to a specific commit hash:

:url: https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/blob/fc23bfd8f2fc66b32bec0ab292c2cb9a50ef1783/src/test/java/org/elasticsearch/index/analysis/kuromoji_analysis.json

.. code-block:: json

    {
        "index": {
            "analysis": {
                "filter": {
                    "kuromoji_rf": {
                        "type": "kuromoji_readingform",
                        "use_romaji": "true"
                    },
                    "kuromoji_pos": {
                        "type": "kuromoji_part_of_speech",
                        "enable_position_increment": "false",
                        "stoptags": ["# verb-main:", "動詞-自立"]
                    },
                    "kuromoji_ks": {
                        "type": "kuromoji_stemmer",
                        "minimum_length": 6
                    }
                },
                "tokenizer": {
                    "kuromoji": {
                        "type": "kuromoji_tokenizer"
                    }
                },
                "analyzer": {
                    "kuromoji_analyzer": {
                        "type": "custom",
                        "tokenizer": "kuromoji_tokenizer"
                    }
                }
            }
        }
    }

KuromojiElasticBackend
======================

Define a backend whose index settings add the kuromoji analyzer, tokenizer and filters to Haystack's default n-gram settings:

.. code-block:: python

    # -*- coding: utf-8 -*-
    # The coding line is required on Python 2 because the settings
    # below contain a non-ASCII part-of-speech tag.
    from haystack.backends.elasticsearch_backend import (
        ElasticsearchSearchBackend,
        ElasticsearchSearchEngine,
    )


    class KuromojiElasticBackend(ElasticsearchSearchBackend):
        # Haystack's n-gram / edge n-gram settings plus the kuromoji entries.
        # Haystack passes self.DEFAULT_SETTINGS when it creates the index,
        # so a class attribute is enough; no __init__ override is needed.
        DEFAULT_SETTINGS = {
            "settings": {
                "analysis": {
                    "analyzer": {
                        "ngram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_ngram"],
                        },
                        "edgengram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_edgengram"],
                        },
                        # kuromoji addition
                        "kuromoji_analyzer": {
                            "type": "custom",
                            "tokenizer": "kuromoji_tokenizer",
                        },
                    },
                    "tokenizer": {
                        "haystack_ngram_tokenizer": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15,
                        },
                        "haystack_edgengram_tokenizer": {
                            "type": "edgeNGram",
                            "min_gram": 2,
                            "max_gram": 15,
                            "side": "front",
                        },
                        # kuromoji addition
                        "kuromoji": {
                            "type": "kuromoji_tokenizer",
                        },
                    },
                    "filter": {
                        "haystack_ngram": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15,
                        },
                        "haystack_edgengram": {
                            "type": "edgeNGram",
                            "min_gram": 5,
                            "max_gram": 15,
                        },
                        # kuromoji additions
                        "kuromoji_rf": {
                            "type": "kuromoji_readingform",
                            "use_romaji": "true",
                        },
                        "kuromoji_pos": {
                            "type": "kuromoji_part_of_speech",
                            "enable_position_increment": "false",
                            "stoptags": ["# verb-main:", u"動詞-自立"],
                        },
                        "kuromoji_ks": {
                            "type": "kuromoji_stemmer",
                            "minimum_length": 6,
                        },
                    },
                },
            },
        }


    class KuromojiElasticSearchEngine(ElasticsearchSearchEngine):
        # Point the ENGINE entry of HAYSTACK_CONNECTIONS at this class
        # to use the backend above.
        backend = KuromojiElasticBackend

Django setting for the default analyzer::

    ELASTICSEARCH_DEFAULT_ANALYZER = "snowball"

Bonus
=====

Installing elasticsearch-head
-----------------------------

:github: https://github.com/mobz/elasticsearch-head
:url: http://mobz.github.io/elasticsearch-head/

A plugin for browsing an Elasticsearch cluster from a web UI.
It installs as an ordinary Elasticsearch plugin:

::

    $ bin/plugin -install mobz/elasticsearch-head
    $ open http://127.0.0.1:9200/_plugin/head/

References
==========

elasticsearch/elasticsearch-py
    https://github.com/elasticsearch/elasticsearch-py

Python Elasticsearch Client (Elasticsearch 0.4.1 documentation)
    http://elasticsearch-py.readthedocs.org/en/latest/

Stretching Haystack's ElasticSearch Backend (The Wellfire Blog)
    http://www.wellfireinteractive.com/blog/custom-haystack-elasticsearch-backend/

Using kuromoji with ElasticSearch, ES 0.90.Beta1 + kuromoji 1.2.0 edition (Qiita, in Japanese)
    http://qiita.com/hotchpotch/items/134b049a59fe396c9475

How to use Kuromoji with elasticsearch (akishin999の日記, in Japanese)
    http://d.hatena.ne.jp/akishin999/20130307/1362611100

Japanese full-text search with elasticsearch and the kuromoji plugin (yuhei.kagaya, in Japanese)
    http://yuheikagaya.hatenablog.jp/entry/2013/08/06/012150

elasticsearch-head, a GUI for elasticsearch, is very handy (yuhei.kagaya, in Japanese)
    http://yuheikagaya.hatenablog.jp/entry/2013/07/14/185752

EdgeNgramField min and max letters in django haystack (Stack Overflow)
    http://stackoverflow.com/questions/18908131/edgengramfield-min-and-max-letters-in-django-haystack
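
Variation: merging instead of copying
=====================================

The KuromojiElasticBackend above copies Haystack's whole default settings dictionary by hand. One alternative, shown here as a minimal sketch under stated assumptions, is to merge only the kuromoji entries into a copy of the defaults. ``deep_merge``, ``HAYSTACK_DEFAULTS`` and ``KUROMOJI_SETTINGS`` are hypothetical names for illustration; ``HAYSTACK_DEFAULTS`` is a trimmed stand-in for ``ElasticsearchSearchBackend.DEFAULT_SETTINGS`` so the sketch runs without Haystack installed.

.. code-block:: python

    import copy

    # Hypothetical, trimmed stand-in for Haystack's
    # ElasticsearchSearchBackend.DEFAULT_SETTINGS (not the full dict).
    HAYSTACK_DEFAULTS = {
        "settings": {
            "analysis": {
                "analyzer": {
                    "ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "lowercase",
                        "filter": ["haystack_ngram"],
                    },
                },
                "filter": {
                    "haystack_ngram": {"type": "nGram", "min_gram": 3, "max_gram": 15},
                },
            },
        },
    }

    # Hypothetical name: only the kuromoji-specific entries from the
    # KuromojiElasticBackend settings (abbreviated to one filter).
    KUROMOJI_SETTINGS = {
        "settings": {
            "analysis": {
                "analyzer": {
                    "kuromoji_analyzer": {
                        "type": "custom",
                        "tokenizer": "kuromoji_tokenizer",
                    },
                },
                "tokenizer": {
                    "kuromoji": {"type": "kuromoji_tokenizer"},
                },
                "filter": {
                    "kuromoji_rf": {"type": "kuromoji_readingform", "use_romaji": "true"},
                },
            },
        },
    }


    def deep_merge(base, extra):
        """Return a new dict with `extra` merged into a copy of `base`.

        Nested dicts are merged key by key; for any other value, `extra` wins.
        Neither argument is modified.
        """
        merged = copy.deepcopy(base)
        for key, value in extra.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged


    SETTINGS = deep_merge(HAYSTACK_DEFAULTS, KUROMOJI_SETTINGS)

    if __name__ == "__main__":
        analysis = SETTINGS["settings"]["analysis"]
        print(sorted(analysis["analyzer"]))  # ['kuromoji_analyzer', 'ngram_analyzer']

Inside the backend, the same helper could be applied as ``DEFAULT_SETTINGS = deep_merge(ElasticsearchSearchBackend.DEFAULT_SETTINGS, KUROMOJI_SETTINGS)``. The trade-off: the index then follows whatever defaults the installed Haystack version ships instead of a pinned copy, and any deliberate override (such as ``min_gram`` 5 on ``haystack_edgengram``) must be listed in ``KUROMOJI_SETTINGS`` explicitly, since values from the second argument win.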