Elasticsearch: Basic Concepts and Usage

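Introduction to ES

Elasticsearch is an open-source search engine built on Apache Lucene(TM). Lucene is arguably the most advanced, best-performing, and most fully featured search engine library available today, whether open source or proprietary.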

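History

Many years ago, a recently married, unemployed developer named Shay Banon followed his wife to London, where she was going to study to become a chef. While looking for a job, he set out to build a recipe search engine for her and began working with an early version of Lucene.

Working directly with Lucene is difficult, so Shay started building an abstraction layer over it that would let Java programmers add search to their applications. He released it as his first open-source project, called "Compass".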

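Shay later took a job in a distributed environment of high-performance, in-memory data grids, where the need for a high-performance, real-time, distributed search engine was obvious. He decided to rewrite the Compass library as a standalone server called Elasticsearch.

The first public release came out in February 2010. Since then Elasticsearch has become one of the most popular projects on GitHub, with more than 300 code contributors. A company has formed around Elasticsearch to provide commercial support and develop new features, with the stated promise that Elasticsearch will always remain open source and available to everyone.

Shay's wife is still waiting for her recipe search...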

ES vs. RDB

Elasticsearch -> index -> type -> document -> field

Relational DB -> database -> table -> row -> column
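
To make the analogy concrete, here is a minimal Python sketch that assembles the REST path of a single document from the three levels above a field (index, type, document ID). The helper name `doc_path` is hypothetical and used only for illustration; it is not part of any Elasticsearch client.

```python
def doc_path(index, doc_type, doc_id):
    """Build the REST path of one document: /<index>/<type>/<id>."""
    return "/" + "/".join([index, doc_type, str(doc_id)])


# The employee document used throughout this post lives at /test/employee/1
print(doc_path("test", "employee", 1))
```

Read next to the relational row, this corresponds roughly to "database test, table employee, row with primary key 1".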

RESTful API

GET /_search
{
    "query": {
        "match_all": {}
    }
}

PUT /test/employee/1
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

DELETE /index

DELETE /pre_*
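
Each request above is an HTTP verb, a path, and an optional JSON body. The following sketch builds (but does not send) such a request using only the Python standard library. `BASE_URL` and the helper `build_request` are assumptions made for illustration; the node address is taken from the `localhost:9200` URLs used elsewhere in this post.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a node running locally


def build_request(method, path, body=None):
    """Return a urllib Request for an Elasticsearch REST call (not sent yet)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request("PUT", "/test/employee/1", {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
})
print(req.get_method(), req.full_url)
```

Sending it is then a matter of calling `urllib.request.urlopen(req)`, which requires a reachable node.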

Simple searches

URL form:
http://localhost:9200/test/_search?q=-about:love
http://localhost:9200/test/_search?q=collect|build
http://localhost:9200/test/_search?q=age:>30
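
Characters such as `:` and `>` in the `q` parameter are best percent-encoded when the URL is built programmatically. A minimal sketch, assuming the same local node as above; the helper name `search_url` is hypothetical:

```python
from urllib.parse import urlencode


def search_url(index, query, base="http://localhost:9200"):
    """Build a URL-form search, percent-encoding the query string."""
    return f"{base}/{index}/_search?" + urlencode({"q": query})


print(search_url("test", "age:>30"))
# prints: http://localhost:9200/test/_search?q=age%3A%3E30
```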

DSL form:
GET /test/employee/_search
{
    "query": {
        "match": {
            "last_name": "smith"
        }
    }
}

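Response fields

took: time the search took, in milliseconds
_shards: shards that took part in the search
hits: the matching results
_score: relevance score of a hit
_source: the original document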
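
The sketch below shows how these fields might be read from a response in Python. The `SAMPLE` body is hand-written for illustration (its values, including the score, are made up), and it assumes the older response shape in which `hits.total` is a plain integer, matching the type-based API used in this post.

```python
import json

# Hypothetical response body, written by hand for illustration only.
SAMPLE = """
{
  "took": 4,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "test", "_type": "employee", "_id": "1",
        "_score": 0.30685282,
        "_source": {"first_name": "John", "last_name": "Smith", "age": 25}
      }
    ]
  }
}
"""


def summarize(response):
    """Pull the commonly used fields out of a search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "shards_ok": response["_shards"]["successful"],
        "total": hits["total"],
        "docs": [(h["_score"], h["_source"]) for h in hits["hits"]],
    }


summary = summarize(json.loads(SAMPLE))
print(summary["took_ms"], summary["total"], summary["docs"][0][1]["last_name"])
```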

Slightly more complex searches

GET /test/employee/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "smith"
                }
            }
        }
    }
}

Query types

match (full-text search)
match_phrase (phrase search)
term (exact match)
range (range match)
exists & missing (field exists / field is missing)

match: full-text search

GET /test/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

match_phrase: phrase search

GET /test/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
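
The difference between the two queries can be illustrated with a deliberately simplified in-memory model: `match` accepts a document that contains any of the query terms, while `match_phrase` requires the terms to appear next to each other in the same order. This is only a sketch under those assumptions; it uses lowercase whitespace tokenization and ignores scoring, analyzers, and slop. The sample sentences are illustrative.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(text, query):
    """True if the text contains at least one of the query terms."""
    words = set(tokenize(text))
    return any(term in words for term in tokenize(query))


def match_phrase(text, query):
    """True if the query terms occur consecutively and in order."""
    words, phrase = tokenize(text), tokenize(query)
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


climber = "I love to go rock climbing"
collector = "I like to collect rock albums"

print(match(climber, "rock climbing"), match(collector, "rock climbing"))
print(match_phrase(climber, "rock climbing"), match_phrase(collector, "rock climbing"))
```

Under this model both sentences satisfy `match`, but only the first one satisfies `match_phrase`.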

term: exact match

GET /test/employee/_search
{
    "query" : {
        "term" : {
            "age" : 25
        }
    }
}

range: range match

GET /test/employee/_search
{
    "query" : {
        "range": {
            "age": {
                "gte":  20,
                "lt":   30
            }
        }
    }
}

bool: logical combination

GET /test/employee/_search
{
    "query" : {
        "bool": {
            "should": [
                {
                    "bool":{
                        "must":     { "match": { "about": "rock" }},
                        "must_not": { "range": { "age": {"lt":30}}}
                    }
                },
                { "term": { "first_name":  "john"   }}
            ]
        }
    }
}

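must:: (AND) the clause must match for a document to be included.
must_not:: (NOT) the clause must not match; documents that match it are excluded.
should:: (OR) the clause is optional; a document that matches it gets a higher relevance score. When a bool query has no must clause, at least one should clause has to match.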
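
The semantics of the three clauses can be checked with a small in-memory evaluator. This is a simplified model written for illustration, not how Elasticsearch executes queries: it supports only `bool`, `match`, `term`, and `range`, tokenizes by lowercasing and splitting on whitespace, and ignores scoring. The three sample documents are assumptions.

```python
import operator

_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def _as_list(clauses):
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def _tokens(value):
    items = value if isinstance(value, list) else [value]
    return {tok for item in items for tok in str(item).lower().split()}


def matches(doc, query):
    """Return True if doc satisfies the (simplified) query."""
    [(kind, body)] = query.items()
    if kind == "bool":
        must = _as_list(body.get("must"))
        must_not = _as_list(body.get("must_not"))
        should = _as_list(body.get("should"))
        if not all(matches(doc, q) for q in must):
            return False
        if any(matches(doc, q) for q in must_not):
            return False
        if should and not must:
            # With no must clause, at least one should clause has to match.
            return any(matches(doc, q) for q in should)
        return True
    [(field, arg)] = body.items()
    if kind == "match":
        return bool(_tokens(arg) & _tokens(doc.get(field, "")))
    if kind == "term":
        return str(arg) in _tokens(doc.get(field, ""))
    if kind == "range":
        value = doc.get(field)
        return value is not None and all(_OPS[op](value, b) for op, b in arg.items())
    raise ValueError(f"unsupported query type: {kind}")


QUERY = {"bool": {"should": [
    {"bool": {"must": {"match": {"about": "rock"}},
              "must_not": {"range": {"age": {"lt": 30}}}}},
    {"term": {"first_name": "john"}},
]}}

DOCS = [  # hypothetical sample documents
    {"first_name": "John", "age": 25, "about": "I love to go rock climbing"},
    {"first_name": "Jane", "age": 32, "about": "I like to collect rock albums"},
    {"first_name": "Douglas", "age": 35, "about": "I like to build cabinets"},
]

print([d["first_name"] for d in DOCS if matches(d, QUERY)])
```

With these documents, John is matched through the `term` clause (the nested bool rejects him because he is under 30), Jane is matched through the nested bool, and Douglas matches neither branch.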

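filter: filtering

Filtering and matching look similar, and writing a filter condition into the query part does return the same result set. The difference is that conditions in the query affect the relevance score, whereas conditions in the filter only include or exclude documents and affect neither scoring nor ordering.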

GET /_search
{
    "query": {
        "filtered": {
            "query": {"match": {"about": "rock"}}, 
            "filter":   { "range": { "age": {"gt":30} }}
        }
    }
}
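
The `filtered` query shown above belongs to older Elasticsearch releases; it was deprecated in 2.0 and removed in 5.0, where the same intent is expressed as a `bool` query with a `filter` clause. The helper below is a minimal sketch of that rewrite for query bodies held as Python dicts. The function name `filtered_to_bool` is hypothetical, and the sketch only handles a top-level `filtered` with `query` and `filter` keys.

```python
def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as an equivalent {"bool": {...}} query."""
    if "filtered" not in query:
        return query  # nothing to rewrite
    inner = query["filtered"]
    bool_body = {}
    if "query" in inner:
        bool_body["must"] = inner["query"]     # scored part
    if "filter" in inner:
        bool_body["filter"] = inner["filter"]  # non-scoring part
    return {"bool": bool_body}


old_query = {
    "filtered": {
        "query": {"match": {"about": "rock"}},
        "filter": {"range": {"age": {"gt": 30}}},
    }
}
print(filtered_to_bool(old_query))
```

The scored condition moves to `must` and the non-scoring condition to `filter`, which preserves the distinction described above.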

aggs: aggregations

GET /test/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

GET /test/employee/_search
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}
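
Conceptually, a `terms` aggregation groups documents by the distinct values of a field and counts the documents in each bucket. The following in-memory sketch mimics that behaviour under simplifying assumptions (exact values, no analysis, no sharding); the helper `terms_agg` and the sample documents are hypothetical.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Count documents per distinct field value, most frequent first."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))  # each document counts once per value
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ranked[:size]]


employees = [  # hypothetical sample documents
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]

print(terms_agg(employees, "interests"))
```

Combining the aggregation with a query, as in the second request above, corresponds to filtering the document list first, for example `terms_agg([e for e in employees if e["last_name"] == "Smith"], "interests")`.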

highlight: highlighting matches

GET /test/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {"about" : {}}
    }
}

from, size: pagination

GET /test/employee/_search
{
    "query": {
        "match": {
            "last_name": "smith"
        }
    },
    "from": 1,
    "size": 2
}
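
Note that `from` is a zero-based offset into the result list, so the request above skips the first hit and returns the second and third. A small sketch of turning a one-based page number into these two parameters; the helper name `page_params` is hypothetical:

```python
def page_params(page, size):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


body = {"query": {"match": {"last_name": "smith"}}}
body.update(page_params(page=2, size=2))
print(body)
```

Here the second page of two results starts at offset 2.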
