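People in the chat group often ask: how do you change the way Solr scores documents?
The usual answer is that a custom Similarity is a good option.
But few people spell out exactly how to do it. The official wiki only says that customization is possible and gives no example.
First, Solr 4.0 itself ships with several similarity factories: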
org.apache.solr.search.similarities.BM25SimilarityFactory
org.apache.solr.search.similarities.DefaultSimilarityFactory
org.apache.solr.search.similarities.DFRSimilarityFactory
org.apache.solr.search.similarities.IBSimilarityFactory
org.apache.solr.search.similarities.LMDirichletSimilarityFactory
org.apache.solr.search.similarities.LMJelinekMercerSimilarityFactory
org.apache.solr.search.similarities.SchemaSimilarityFactory
How they differ from one another is out of scope today; here is how to configure one.
schema.xml
<field name="bm25_test" type="text_bm25" indexed="true" stored="true" required="false" multiValued="true"/>
<fieldType name="text_bm25" class="solr.TextField">
  <similarity class="solr.BM25SimilarityFactory">
  </similarity>
</fieldType>
</types>
<similarity class="solr.SchemaSimilarityFactory"/>
</schema>
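The bm25_test field is now scored with BM25SimilarityFactory. The global SchemaSimilarityFactory is what allows a fieldType to carry its own similarity. So how do you customize further? Enough talk, here is the code: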
// NlpSimilarityFactory.java
package org.nlp.lucene.search.similarities;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;
// Factory that Solr instantiates from schema.xml to obtain the custom Similarity
public class NlpSimilarityFactory extends SimilarityFactory {
    @Override
    public Similarity getSimilarity() {
        return new NlpSimilarity();
    }
}
// NlpSimilarity.java
package org.nlp.lucene.search.similarities;
import org.apache.lucene.search.similarities.DefaultSimilarity;
public class NlpSimilarity extends DefaultSimilarity {
    // idf is always 1.0f, so rare and common terms weigh the same
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0F;
    }
    // tf is always 1.0f, so repeated occurrences of a term add nothing
    @Override
    public float tf(float freq) {
        return 1.0F;
    }
    @Override
    public String toString() {
        return "nlpSimilarity";
    }
}
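For the SimilarityFactory, just copy the pattern above. To see which methods NlpSimilarity can override, look at DefaultSimilarity. To use the result, put the compiled classes on Solr's classpath and set the similarity element's class attribute to org.nlp.lucene.search.similarities.NlpSimilarityFactory in place of solr.BM25SimilarityFactory. BM25SimilarityFactory and the others can of course be customized the same way; that part is up to you, this is as far as I can take you.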
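To get a feel for what flattening tf and idf does, here is a minimal standalone sketch. It is not Lucene's code: it assumes a simplified per-term score of tf * idf^2, leaves out norms, boosts, coord and queryNorm, and uses the formulas documented for DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). The names TfIdfSketch, Scoring, DEFAULT and FLAT, and the statistics in main, are made up for illustration.

```java
// Simplified, standalone model of classic TF-IDF term scoring; NOT Lucene's actual code.
public class TfIdfSketch {

    /** Hypothetical stand-in for the two methods that NlpSimilarity overrides. */
    interface Scoring {
        float tf(float freq);
        float idf(long docFreq, long numDocs);
    }

    /** Formulas as documented for DefaultSimilarity. */
    static final Scoring DEFAULT = new Scoring() {
        public float tf(float freq) {
            return (float) Math.sqrt(freq);
        }
        public float idf(long docFreq, long numDocs) {
            return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }
    };

    /** Same constants as NlpSimilarity: both factors are pinned to 1.0. */
    static final Scoring FLAT = new Scoring() {
        public float tf(float freq) {
            return 1.0F;
        }
        public float idf(long docFreq, long numDocs) {
            return 1.0F;
        }
    };

    /** Contribution of one matching term under the simplifying assumptions above. */
    static float termScore(Scoring s, float freq, long docFreq, long numDocs) {
        float idf = s.idf(docFreq, numDocs);
        return s.tf(freq) * idf * idf;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: the term occurs 4 times in the document
        // and appears in 9 of 1000 documents.
        System.out.println("default: " + termScore(DEFAULT, 4, 9, 1000));
        System.out.println("flat:    " + termScore(FLAT, 4, 9, 1000));
    }
}
```

Under FLAT every matching term contributes the same amount regardless of how often it occurs or how rare it is, so within this simplified model ranking depends only on which query terms match, plus whatever norms and boosts apply in the real engine.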
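What, how do you confirm that the scoring change actually took effect?
Just add the parameter debug=true to the query and read the score breakdown in the explain section of the response.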
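As a small sketch of such a request, the snippet below builds a select URL with debug=true appended. The base URL http://localhost:8983/solr/collection1, the query bm25_test:solr, and the class name DebugQueryUrl are assumptions for illustration; substitute your own host, core and query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DebugQueryUrl {

    /** Builds a /select URL for the given query and appends debug=true. */
    static String build(String baseUrl, String query) {
        try {
            // URL-encode the query so characters such as ':' are safe in the query string
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&debug=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed host, core and field; adjust to your installation
        System.out.println(build("http://localhost:8983/solr/collection1", "bm25_test:solr"));
        // prints http://localhost:8983/solr/collection1/select?q=bm25_test%3Asolr&debug=true
    }
}
```

Open the printed URL in a browser or fetch it with any HTTP client. If the custom similarity is active, the tf and idf entries in the explain output for that field should read 1.0.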