Lucene help: January 2015

Tuesday, January 13, 2015

Do you want to build a PhraseQuery whose words contain wildcards?

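import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;

public static Query getPhraseQueryOfWildcards(String text, String field) {
        // Split on runs of whitespace so repeated spaces do not produce empty terms.
        String[] phraseWords = text.trim().split("\\s+");
        SpanQuery[] queryParts = new SpanQuery[phraseWords.length];
        for (int i = 0; i < phraseWords.length; i++) {
            // Wildcard terms are not analyzed, so pass them in indexed form (e.g. lowercase).
            WildcardQuery wildQuery = new WildcardQuery(new Term(field,
                    phraseWords[i]));
            queryParts[i] = new SpanMultiTermQueryWrapper<WildcardQuery>(
                    wildQuery);
        }
        // slop = 0 and inOrder = true: the words must be adjacent and in the given order.
        return new SpanNearQuery(queryParts, 0, true);
    }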
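
To make the matching rule concrete, here is a small Lucene-free sketch of what a zero-slop, in-order span query over wildcard words accepts. It is only an illustration under simplifying assumptions: the class and method names (WildcardPhraseSketch, toPattern, matches) are made up for this example, the field is modelled as a plain array of already-lowercased tokens, and only `*` and `?` are handled (escaping is ignored).

```java
import java.util.regex.Pattern;

class WildcardPhraseSketch {

    // Translate wildcards into a regex: '*' is any run of characters, '?' is exactly one.
    public static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if some run of consecutive tokens matches the wildcard words, in order, with no gaps.
    public static boolean matches(String[] tokens, String phrase) {
        String[] words = phrase.trim().split("\\s+");
        for (int start = 0; start + words.length <= tokens.length; start++) {
            boolean all = true;
            for (int i = 0; i < words.length && all; i++) {
                all = toPattern(words[i]).matcher(tokens[start + i]).matches();
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = {"i", "am", "learning", "lucene"};
        System.out.println(matches(tokens, "learn* luc?ne")); // true: adjacent and in order
        System.out.println(matches(tokens, "am luc*"));       // false: "learning" sits in between
    }
}
```

The second call shows why slop 0 matters: both words occur in the text, but not next to each other.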

Monday, January 12, 2015

Construct your own PhraseQuery with Lucene's TokenStream

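    import java.io.IOException;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    public PhraseQuery getPhraseQuery(String text, String fieldName) throws IOException {
       PhraseQuery pq = new PhraseQuery();
       pq.setSlop(0);
       // try-with-resources closes the stream even if tokenization fails
       try (TokenStream ts = new StandardAnalyzer().tokenStream(fieldName, text)) {
           // addAttribute returns the attribute instance, so fetch each one once up front
           CharTermAttribute charTermAttr = ts.addAttribute(CharTermAttribute.class);
           PositionIncrementAttribute posIncrAttr = ts.addAttribute(PositionIncrementAttribute.class);
           ts.reset();
           int pos = 0;
           while (ts.incrementToken()) {
               // increments are relative; accumulate them to get the absolute position
               pos += posIncrAttr.getPositionIncrement();
               pq.add(new Term(fieldName, charTermAttr.toString()), pos - 1);
           }
           ts.end();
       }
       return pq;
   }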
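
The position arithmetic in the loop above can be checked in isolation. The following Lucene-free sketch (the class and method names are invented for illustration) turns a list of position increments into the zero-based positions that would be passed to the PhraseQuery. The sample increments are an assumption: they model a text such as "learning the basics" in which a stop filter dropped "the", leaving a gap.

```java
import java.util.Arrays;

class PositionIncrementSketch {

    // Convert per-token position increments into absolute, zero-based positions.
    public static int[] toPositions(int[] increments) {
        int[] positions = new int[increments.length];
        int pos = 0;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];      // same accumulation as in the TokenStream loop
            positions[i] = pos - 1;    // the first token (increment 1) lands on position 0
        }
        return positions;
    }

    public static void main(String[] args) {
        // "learning" has increment 1; "basics" has increment 2 because one token was removed.
        System.out.println(Arrays.toString(toPositions(new int[] {1, 2}))); // [0, 2]
    }
}
```

Keeping the gap means the phrase only matches documents where exactly one token sat between the two words at index time.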

Monday, January 5, 2015

Joins in Lucene: ToParentBlockJoinQuery and ToChildBlockJoinQuery

ToParentBlockJoinQuery:

-Returns results in groups, where each group represents a parent document and its matching child documents.

ToChildBlockJoinQuery:

-For each matching parent document, returns all of its child documents.
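
Both queries rely on documents being indexed as a block, with the children first and the parent as the last document of the block. The sketch below is a Lucene-free model of that layout, not the library's implementation: the class and method names are hypothetical, and a java.util.BitSet stands in for the parent filter's bit set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BlockJoinSketch {

    // Child to parent: the parent is the first parent doc at or after the child.
    public static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    // Parent to children: every doc between the previous parent and this one.
    public static List<Integer> childrenOf(BitSet parents, int parentDoc) {
        // previousSetBit returns -1 when there is no earlier parent, so the block starts at 0.
        int firstChild = parents.previousSetBit(parentDoc - 1) + 1;
        List<Integer> children = new ArrayList<>();
        for (int doc = firstChild; doc < parentDoc; doc++) {
            children.add(doc);
        }
        return children;
    }

    public static void main(String[] args) {
        // Doc ids 0..5 form two blocks, [0, 1, 2] and [3, 4, 5]; docs 2 and 5 are the parents.
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(5);
        System.out.println(parentOf(parents, 3));   // 5
        System.out.println(childrenOf(parents, 5)); // [3, 4]
        System.out.println(childrenOf(parents, 2)); // [0, 1]
    }
}
```

This is also why the parent filter has to match every parent document: a missing parent bit would merge two blocks into one.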

***Examples***

Use ToParentBlockJoinQuery to retrieve parent documents based on their matching child documents.

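import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.search.join.FixedBitSetCachingWrapperFilter;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinCollector;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;

private List<Document> searchParentDocs(Query childQuery, Query parentQuery)
            throws IOException {
        List<Document> parentDocs = new ArrayList<Document>();

        // Marks which documents are parents: here, every doc whose "path" field is "root".
        Filter parentFilter = new FixedBitSetCachingWrapperFilter(
                new QueryWrapperFilter(new TermQuery(new Term("path", "root"))));

        // Count the parents matching parentQuery; the count sizes the join collector.
        TotalHitCountCollector tc = new TotalHitCountCollector();
        SearchUtility.getFixSearcher().search(parentQuery, tc);

        logger.debug("Total parent matching parentQuery:" + tc.getTotalHits());

        if (tc.getTotalHits() == 0)
            return parentDocs; // no parent matches, so there is nothing to join

        // Maps each matching child document to its parent.
        ToParentBlockJoinQuery parentJoinQuery = new ToParentBlockJoinQuery(
                childQuery, parentFilter, ScoreMode.None);

        ToParentBlockJoinCollector pjc = new ToParentBlockJoinCollector(
                Sort.RELEVANCE, tc.getTotalHits(), false, false);

        // A parent must both satisfy parentQuery and have at least one matching child.
        BooleanQuery searchQuery = new BooleanQuery();
        searchQuery.add(parentJoinQuery, Occur.MUST);
        searchQuery.add(parentQuery, Occur.MUST);

        SearchUtility.getFixSearcher().search(searchQuery, pjc);

        // One group per parent; groupValue is the parent's doc id.
        TopGroups<Integer> topgroups = pjc.getTopGroupsWithAllChildDocs(
                parentJoinQuery, Sort.RELEVANCE, 0, 0, true);

        if (topgroups == null)
            return parentDocs;

        GroupDocs<Integer>[] groupdocs = topgroups.groups;

        logger.debug("Total groupdocs:" + groupdocs.length);

        for (GroupDocs<Integer> g : groupdocs) {
            Document parentDoc = SearchUtility.getFixSearcher().doc(
                    g.groupValue);
            logger.debug("Parent DocId:" + g.groupValue);
            parentDocs.add(parentDoc);
        }
        return parentDocs;
    }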

Thursday, January 1, 2015

PhraseQuery in Lucene: PhraseQuery does not match capitalized text such as ABC

Indexed text: I am learning ABC.

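Suppose you indexed the above text as a TextField, with StandardAnalyzer configured on the IndexWriter:
doc.add(new TextField("fieldName", "I am learning ABC", Field.Store.YES));
You then try to search with a PhraseQuery like this:
PhraseQuery pq=new PhraseQuery();
        pq.add(new Term("fieldName", "learning"));
        pq.add(new Term("fieldName", "ABC"));
        pq.setSlop(0);
The search results will not contain the above document. When a TextField is indexed with StandardAnalyzer, terms are converted to lowercase before they are indexed, so the index holds "abc", not "ABC". Terms added to a PhraseQuery directly are not analyzed, so they must be given in their indexed form: use new Term("fieldName", "abc"), or run the query text through the same analyzer first.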
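
The mismatch can be reproduced without Lucene. The sketch below is only a rough stand-in for analysis, under stated assumptions: its hypothetical analyze method splits on non-alphanumeric characters and lowercases each token, and the index is modelled as a plain list of terms. All class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class LowercaseTermSketch {

    // Stand-in for index-time analysis: split on non-alphanumerics and lowercase each token.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Exact phrase check: the query terms must appear consecutively and compare equal.
    public static boolean containsPhrase(List<String> indexed, List<String> phrase) {
        return Collections.indexOfSubList(indexed, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("I am learning ABC.");
        // Raw query terms, as in the failing PhraseQuery: "ABC" is not an indexed term.
        System.out.println(containsPhrase(indexed, Arrays.asList("learning", "ABC"))); // false
        // Query terms passed through the same analysis: "abc" matches.
        System.out.println(containsPhrase(indexed, analyze("learning ABC")));          // true
    }
}
```

The same rule applies to the real query: whatever normalization runs at index time has to be applied to the query terms as well.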