Tuesday, January 13, 2015

Do you want to build a phrase query whose words contain wildcards?

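    // Builds a phrase-like query in which each whitespace-separated word may contain
    // wildcards (* or ?). Every word becomes a WildcardQuery wrapped as a span query,
    // and the spans must occur next to each other (slop 0) and in order.
    public static Query getPhraseQueryOfWildcards(String text, String field) {
        // Split on runs of whitespace so repeated spaces do not produce empty terms.
        String[] phraseWords = text.trim().split("\\s+");
        SpanQuery[] queryParts = new SpanQuery[phraseWords.length];
        for (int i = 0; i < phraseWords.length; i++) {
            WildcardQuery wildQuery = new WildcardQuery(new Term(field, phraseWords[i]));
            queryParts[i] = new SpanMultiTermQueryWrapper<WildcardQuery>(wildQuery);
        }
        return new SpanNearQuery(queryParts, 0, true);
    }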
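
To make the matching rule concrete, here is a minimal plain-Java sketch (no Lucene involved) of what a slop-0, in-order query over wildcard words accepts: every pattern must match one token, and the matched tokens must be adjacent and in the same order. The class `WildcardPhraseSketch` and its methods are hypothetical names used only for illustration. The sketch assumes the usual wildcard syntax where `*` matches any character sequence and `?` matches a single character, and it ignores backslash escaping.

```java
import java.util.regex.Pattern;

// Illustrative model only: this is not Lucene code.
final class WildcardPhraseSketch {

    // Translate a wildcard term into a regex: '*' -> ".*", '?' -> ".", rest literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if some run of adjacent tokens matches the patterns in order (slop 0).
    static boolean matches(String[] tokens, String phrase) {
        String[] parts = phrase.trim().split("\\s+");
        for (int start = 0; start + parts.length <= tokens.length; start++) {
            boolean ok = true;
            for (int i = 0; i < parts.length && ok; i++) {
                ok = toRegex(parts[i]).matcher(tokens[start + i]).matches();
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = {"i", "am", "learning", "lucene"};
        System.out.println(matches(tokens, "learn* luc?ne")); // true
        System.out.println(matches(tokens, "am luc*"));       // false
    }
}
```

The second call fails because "am" and "lucene" are not adjacent, which is the same reason the span query above would reject it with a slop of 0.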

Monday, January 12, 2015

Construct your own PhraseQuery from a Lucene TokenStream

    public PhraseQuery getPhraseQuery(String text, String fieldName) throws IOException, ParseException {
        PhraseQuery pq = new PhraseQuery();
        pq.setSlop(0);
        // Analyzer and TokenStream are both Closeable, so release them when done.
        try (Analyzer analyzer = new StandardAnalyzer();
             TokenStream ts = analyzer.tokenStream(fieldName, text)) {
            // addAttribute returns the attribute instance, so fetch each one once.
            CharTermAttribute charTermAttr = ts.addAttribute(CharTermAttribute.class);
            PositionIncrementAttribute posIncrAttr = ts.addAttribute(PositionIncrementAttribute.class);
            ts.reset();
            int pos = 0;
            while (ts.incrementToken()) {
                // Accumulate increments so that gaps left by removed tokens are preserved.
                pos += posIncrAttr.getPositionIncrement();
                pq.add(new Term(fieldName, charTermAttr.toString()), pos - 1);
            }
            ts.end();
        }
        return pq;
    }
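
The position arithmetic in the loop is easy to get wrong, so here is a small plain-Java sketch of it, kept separate from Lucene. It assumes each token arrives with a position increment (1 for a token that directly follows the previous one, larger when tokens before it were dropped) and converts those increments into the zero-based positions passed to `pq.add`. `PositionSketch` and `absolutePositions` are hypothetical names, and the input increments are made-up sample values.

```java
import java.util.Arrays;

// Illustrative model only: mirrors the "pos += increment; pos - 1" logic above.
final class PositionSketch {

    // Convert per-token position increments into zero-based absolute positions.
    static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int pos = 0;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            positions[i] = pos - 1;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Assumed sample: three tokens, the second one preceded by one dropped token.
        int[] positions = absolutePositions(new int[] {1, 2, 1});
        System.out.println(Arrays.toString(positions)); // [0, 2, 3]
    }
}
```

Position 1 stays empty in this sample, so a phrase query built this way still lines up with a document that was indexed with the same gap.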

Monday, January 5, 2015

Join in Lucene: ToParentBlockJoinQuery and ToChildBlockJoinQuery

ToParentBlockJoinQuery:

- Returns results in groups, where each group represents a parent document and its matching child documents.


ToChildBlockJoinQuery:

- For a given parent document, returns all of its child documents.
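
The two directions can be pictured with a minimal plain-Java model, which is not Lucene code. It assumes the block layout that block joins rely on: the children of a parent are stored contiguously and the parent is the last document of its block. `BlockJoinSketch`, `toParent` and `toChild` are hypothetical names, and document ids are simply array indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: each block is a run of child docs followed by
// their parent, and isParent[id] marks the parent documents.
final class BlockJoinSketch {

    // Group matched child ids under the parent that closes their block.
    static Map<Integer, List<Integer>> toParent(boolean[] isParent, Set<Integer> matchedChildren) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        List<Integer> pending = new ArrayList<>();
        for (int id = 0; id < isParent.length; id++) {
            if (isParent[id]) {
                if (!pending.isEmpty()) {
                    groups.put(id, pending);
                    pending = new ArrayList<>();
                }
            } else if (matchedChildren.contains(id)) {
                pending.add(id);
            }
        }
        return groups;
    }

    // List every child id in the block that ends at the given parent id.
    static List<Integer> toChild(boolean[] isParent, int parentId) {
        List<Integer> children = new ArrayList<>();
        for (int id = parentId - 1; id >= 0 && !isParent[id]; id--) {
            children.add(0, id);
        }
        return children;
    }

    public static void main(String[] args) {
        // Two blocks: children 0,1 -> parent 2; children 3,4,5 -> parent 6.
        boolean[] isParent = {false, false, true, false, false, false, true};
        Set<Integer> matched = new HashSet<>(Arrays.asList(1, 4, 5));
        System.out.println(toParent(isParent, matched)); // {2=[1], 6=[4, 5]}
        System.out.println(toChild(isParent, 6));        // [3, 4, 5]
    }
}
```

In this model a parent with no matching child never appears in the `toParent` result, while `toChild` returns every child of the parent regardless of any child-level match.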

 ***Examples***
  1. Use ToParentBlockJoinQuery to retrieve parent docs based on their matching child docs.
    private List<Document> searchParentDocs(Query childQuery, Query parentQuery)
            throws IOException {
        List<Document> parentDocs = new ArrayList<Document>();

        // Identifies the parent documents; this example marks them with path:root.
        Filter parentFilter = new FixedBitSetCachingWrapperFilter(
                new QueryWrapperFilter(new TermQuery(new Term("path", "root"))));

        // Count the parents matching parentQuery in order to size the collector below.
        TotalHitCountCollector tc = new TotalHitCountCollector();
        SearchUtility.getFixSearcher().search(parentQuery, tc);
        logger.debug("Total parent matching parentQuery:" + tc.getTotalHits());

        // No parent matches parentQuery, so the combined query cannot match either.
        if (tc.getTotalHits() == 0)
            return parentDocs;

        // Maps each matching child document to its parent document.
        ToParentBlockJoinQuery parentJoinQuery = new ToParentBlockJoinQuery(
                childQuery, parentFilter, ScoreMode.None);

        ToParentBlockJoinCollector pjc = new ToParentBlockJoinCollector(
                Sort.RELEVANCE, tc.getTotalHits(), false, false);

        // A parent must match parentQuery and have at least one child matching childQuery.
        BooleanQuery searchQuery = new BooleanQuery();
        searchQuery.add(parentJoinQuery, Occur.MUST);
        searchQuery.add(parentQuery, Occur.MUST);

        SearchUtility.getFixSearcher().search(searchQuery, pjc);

        // One group per parent; groupValue holds the parent's doc id.
        TopGroups<Integer> topgroups = pjc.getTopGroupsWithAllChildDocs(
                parentJoinQuery, Sort.RELEVANCE, 0, 0, true);

        if (topgroups == null)
            return parentDocs;

        GroupDocs<Integer>[] groupdocs = topgroups.groups;

        logger.debug("Total groupdocs:" + groupdocs.length);

        for (GroupDocs<Integer> g : groupdocs) {
            Document parentDoc = SearchUtility.getFixSearcher().doc(
                    g.groupValue);
            logger.debug("Parent DocId:" + g.groupValue);
            parentDocs.add(parentDoc);
        }
        return parentDocs;
    }

Thursday, January 1, 2015

PhraseQuery in Lucene: PhraseQuery does not match capitalized text such as ABC

Indexed text: I am learning ABC.
  • Suppose you indexed the above text as a TextField, using StandardAnalyzer for the IndexWriter:
  • doc.add(new TextField("fieldName", "I am learning ABC", Field.Store.YES));
  • You then try to search with a PhraseQuery like this:
  • PhraseQuery pq=new PhraseQuery();
            pq.add(new Term("fieldName", "learning"));
            pq.add(new Term("fieldName", "ABC"));
            pq.setSlop(0);
  • The search results will not contain the above document. When a TextField is indexed with StandardAnalyzer, terms are converted to lowercase before they are indexed, so the index holds "abc" and the query term "ABC" does not match it. Using the lowercase term "abc" in the query makes the phrase match.
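
The mismatch described above can be reproduced without Lucene in a few lines of plain Java. This is only a rough sketch: `LowercaseSketch.analyze` is a hypothetical stand-in that splits on non-alphanumeric characters and lowercases, which is a simplification of what a real analyzer does, and `containsTerm` models an exact, unanalyzed term lookup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative model only: a crude stand-in for an analyzer that lowercases tokens.
final class LowercaseSketch {

    // Lowercase the text, then split it on runs of non-letter, non-digit characters.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    // Exact term lookup: the query term is compared as-is, with no analysis applied.
    static boolean containsTerm(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("I am learning ABC");
        System.out.println(indexed);                      // [i, am, learning, abc]
        System.out.println(containsTerm(indexed, "ABC")); // false
        System.out.println(containsTerm(indexed, "abc")); // true
    }
}
```

The design point is that the query term has to go through the same normalization as the indexed text, either by lowercasing it by hand or by analyzing the query text with the analyzer used at index time.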