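People in chat groups often ask: how do you change the scoring mechanism in Solr?
The usual answer is that a custom Similarity is a good way to go.
But concrete instructions are rare. The official wiki only says that it can be customized and gives no example of how to actually do it.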

First, Solr 4.0 itself ships with several similarity factories:

org.apache.solr.search.similarities.BM25SimilarityFactory
org.apache.solr.search.similarities.DefaultSimilarityFactory
org.apache.solr.search.similarities.DFRSimilarityFactory
org.apache.solr.search.similarities.IBSimilarityFactory
org.apache.solr.search.similarities.LMDirichletSimilarityFactory
org.apache.solr.search.similarities.LMJelinekMercerSimilarityFactory
org.apache.solr.search.similarities.SchemaSimilarityFactory

How they differ is out of scope for today; here is just how to configure one.

schema.xml

<field name="bm25_test" type="text_bm25" indexed="true" stored="true" required="false" multiValued="true"/>

<fieldType name="text_bm25" class="solr.TextField">
  <similarity class="solr.BM25SimilarityFactory">
  </similarity>
</fieldType>

</types>
<similarity class="solr.SchemaSimilarityFactory"/>
</schema>

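With this, the bm25_test field is scored by BM25SimilarityFactory (the global SchemaSimilarityFactory is what lets a per-fieldType similarity take effect). So how do you customize further? Enough talk, here is the code: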

package org.nlp.lucene.search.similarities;

import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;

// Factory that Solr instantiates from schema.xml; it hands out the custom Similarity.
public class NlpSimilarityFactory extends SimilarityFactory {

    @Override
    public Similarity getSimilarity() {
        return new NlpSimilarity();
    }

}

package org.nlp.lucene.search.similarities;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public class NlpSimilarity extends DefaultSimilarity {

    // idf is always 1.0f, so rare and common terms weigh the same
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0F;
    }

    // tf is always 1.0f, so repeated occurrences do not raise the score
    @Override
    public float tf(float freq) {
        return 1.0F;
    }

    @Override
    public String toString() {
        return "nlpSimilarity";
    }

}

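The SimilarityFactory is just boilerplate you can copy. To see which methods NlpSimilarity can override, look at DefaultSimilarity. The same approach works for BM25SimilarityFactory and the others, of course; that part is up to you, and this is as far as I can take you.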
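To get a feel for what the two overrides change, here is a small plain-Java sketch that does not touch Lucene at all. The class name TfIdfSketch and its method names are made up for illustration. The "default" formulas are written from memory of DefaultSimilarity (square root of the frequency for tf, ln(numDocs / (docFreq + 1)) + 1 for idf), so treat them as an approximation and check the source of your Lucene version.

```java
// Illustration only: plain Java, no Lucene classes involved.
// TfIdfSketch and its method names are hypothetical.
final class TfIdfSketch {

    // Assumed to mirror DefaultSimilarity.tf: sqrt of the term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed to mirror DefaultSimilarity.idf: ln(numDocs / (docFreq + 1)) + 1.
    static float defaultIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // What NlpSimilarity returns instead: a constant, whatever the input.
    static float flatTf(float freq) {
        return 1.0F;
    }

    static float flatIdf(long docFreq, long numDocs) {
        return 1.0F;
    }

    public static void main(String[] args) {
        float freq = 4f;      // term occurs 4 times in the field
        long docFreq = 9;     // term occurs in 9 documents
        long numDocs = 100;   // index holds 100 documents

        System.out.println("default tf  = " + defaultTf(freq));
        System.out.println("default idf = " + defaultIdf(docFreq, numDocs));
        System.out.println("flat tf     = " + flatTf(freq));
        System.out.println("flat idf    = " + flatIdf(docFreq, numDocs));
    }
}
```

With both factors pinned to 1.0, how often a term occurs and how rare it is no longer affect the score; what remains are the other factors of the similarity, such as norms and boosts.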

What, you want to know how to check that the scoring change actually took effect?

Just add the parameter debug=true to the query.
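As a rough sketch, this is how such a query URL could be put together in plain Java. The host, port, core name (collection1) and the query itself are assumptions for a local test setup, and DebugQueryUrl is a made-up helper, not part of Solr or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a /select URL with debug=true appended.
final class DebugQueryUrl {

    static String build(String coreUrl, String query) {
        try {
            // URL-encode the query so characters like ':' survive the trip.
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&debug=true";
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance and core name; adjust to your setup.
        System.out.println(build("http://localhost:8983/solr/collection1", "bm25_test:solr"));
    }
}
```

Open the printed URL in a browser and look at the explain part of the debug output. If the custom similarity is active, the tf and idf entries for the field should all show 1.0.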