Analyzers in Elasticsearch not working
I am using Elasticsearch to store tweets that I receive from the Twitter streaming API. Before storing them I'd like to apply an English stemmer to the tweet content, and to do that I'm trying to use Elasticsearch analyzers, with no luck.
This is the current template I am using:
PUT _template/twitter
{
  "template": "139*",
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "english": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "en_stemmer", "stop_english", "asciifolding"]
          }
        },
        "filter": {
          "stop_english": { "type": "stop", "stopwords": ["_english_"] },
          "en_stemmer": { "type": "stemmer", "name": "english" }
        }
      }
    }
  },
  "mappings": {
    "tweet": {
      "_timestamp": { "enabled": true, "store": true, "index": "analyzed" },
      "_index": { "enabled": true, "store": true, "index": "analyzed" },
      "properties": {
        "geo": {
          "properties": {
            "coordinates": { "type": "geo_point" }
          }
        },
        "text": { "type": "string", "analyzer": "english" }
      }
    }
  }
}
When I start streaming and the index is created, all the mappings I've defined seem to apply correctly, but the text is stored as it comes from Twitter, completely raw. The index metadata shows:
"settings" : {
  "index" : {
    "uuid" : "xiokecoysaezorr7pjetng",
    "analysis" : {
      "filter" : {
        "en_stemmer" : { "type" : "stemmer", "name" : "english" },
        "stop_english" : { "type" : "stop", "stopwords" : [ "_english_" ] }
      },
      "analyzer" : {
        "english" : {
          "type" : "custom",
          "filter" : [ "lowercase", "en_stemmer", "stop_english", "asciifolding" ],
          "tokenizer" : "standard"
        }
      }
    },
    "number_of_replicas" : "1",
    "number_of_shards" : "5",
    "version" : { "created" : "1010099" }
  }
},
"mappings" : {
  "tweet" : {
    [...]
    "text" : { "analyzer" : "english", "type" : "string" },
    [...]
  }
}
What am I doing wrong? The analyzers seem to be applied correctly, but nothing is happening :/
Thank you!
PS: This is the search query I use to check that the analyzer is not being applied:
curl -XGET 'http://localhost:9200/_all/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            { "query_string": { "query": "_index:1397574496990" } }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "match_all": {} },
            { "exists": { "field": "geo.coordinates" } }
          ]
        }
      }
    }
  },
  "fields": ["geo.coordinates", "text"],
  "size": 50000
}'
This should return the stemmed text as one of the fields, but the response is:
{
  "took": 29,
  "timed_out": false,
  "_shards": { "total": 47, "successful": 47, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.97402453,
    "hits": [
      {
        "_index": "1397574496990",
        "_type": "tweet",
        "_id": "456086643423068161",
        "_score": 0.97402453,
        "fields": {
          "geo.coordinates": [ -118.21122533, 33.79349318 ],
          "text": [ "happy turtle tuesday ! week crawling wednesday morning 🌊🐢🐢🐢☀️#turtles… http://t.co/wavmcxnf76" ]
        }
      },
      {
        "_index": "1397574496990",
        "_type": "tweet",
        "_id": "456086701451259904",
        "_score": 0.97333175,
        "fields": {
          "geo.coordinates": [ -81.017636, 33.998741 ],
          "text": [ "tuesday twins day on here, apparently (it's far occurrence) #tuesdaytwinsday… http://t.co/umhtp6sox6" ]
        }
      }
    ]
  }
}
The text field is exactly the same as it came from Twitter (I'm using the streaming API). I expected the text fields to be stemmed, since the analyzer is applied.
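Analyzers don't affect the way the data is stored. So, no matter which analyzer you are using, you will get the same text back from the source and from stored fields. The analyzer is applied to the indexed terms and to your query when you search. So, by searching for text:twin and finding records with the word twins, you will know that the stemmer was applied.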
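To make the distinction concrete, here is a minimal Python sketch of the idea. It is not Elasticsearch code: `toy_analyze`, `ToyIndex` and `STOP_WORDS` are hypothetical names, and the "stemmer" only strips a trailing "s", which is far cruder than the real English stemmer. It only illustrates that the stored text and the indexed terms are kept separately.

```python
import re

# Hypothetical stand-in for the "english" analyzer from the template:
# lowercase, a crude plural-stripping "stemmer", then a tiny stop word list.
# The real Elasticsearch filters are far more thorough.
STOP_WORDS = {"a", "an", "the", "is", "on", "of", "and"}


def toy_analyze(text):
    """Return the terms that would go into the inverted index."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    stemmed = [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]
    return [t for t in stemmed if t not in STOP_WORDS]


class ToyIndex:
    def __init__(self):
        self.stored = {}    # doc id -> original text, kept untouched
        self.inverted = {}  # analyzed term -> set of doc ids

    def index(self, doc_id, text):
        self.stored[doc_id] = text  # what a search hit returns
        for term in toy_analyze(text):
            self.inverted.setdefault(term, set()).add(doc_id)

    def search(self, query):
        # The query goes through the same analyzer before the lookup.
        ids = set()
        for term in toy_analyze(query):
            ids |= self.inverted.get(term, set())
        return [self.stored[i] for i in sorted(ids)]


idx = ToyIndex()
idx.index(1, "Tuesday twins day on here")
idx.index(2, "Happy turtle Tuesday")

# "twin" matches the tweet containing "twins", yet the text comes back raw.
print(idx.search("twin"))
```

The hit comes back with its original wording because only the lookup structure holds the stemmed terms, which is the same behaviour you are seeing in your response.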