Contributor

@moshaad7 moshaad7 commented Oct 10, 2023

Description

The aim is to let an embedder register analyzers in bleve at run time.
These registered analyzers can then be specified in the index mapping as the analyzers for fields.

Change log

  • New Analyzer interface
    • New Type() method.
    • The Analyze() method now returns an interface{} instead of a TokenStream.
      • The caller can type-assert the returned value to the appropriate type based on analyzer.Type().
      • For example, some analyzers return a TokenStream, while others return a TokenStream and an error.
  • Updates to the Field interface
    • The Analyze() method of a field can now return an error.
    • Error handling is done by scorch/upside_down.
  • New registry to store embedder-submitted analysis hooks
  • The analyzer registry is updated to also hold analyzers created from hooks
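
The change-log items above can be pictured with a minimal, self-contained sketch. It is not the PR's code: `Token`, `TokenStream`, `HookResult`, `AnalysisHook`, `hookAnalyzer`, `Registry`, `RegisterHook`, `Lookup`, and `whitespaceHook` are hypothetical stand-ins invented for illustration. Only the `Type()`/`Analyze()` shape and the `"hook_token"` type string are taken from the description and the diff.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Token and TokenStream are simplified stand-ins for bleve's analysis types.
type Token struct {
	Term     []byte
	Position int
}

type TokenStream []*Token

// Type string as it appears in the PR diff.
const HookTokensAnalyzerType = "hook_token"

// Analyzer follows the shape described in the change log:
// Type() names the analyzer kind, Analyze() returns an untyped value.
type Analyzer interface {
	Type() string
	Analyze(input []byte) interface{}
}

// HookResult is a hypothetical carrier for a token stream plus an error.
type HookResult struct {
	Tokens TokenStream
	Err    error
}

// AnalysisHook is a hypothetical signature for an embedder-supplied callback.
type AnalysisHook func(input []byte) (TokenStream, error)

// hookAnalyzer adapts an AnalysisHook to the Analyzer interface.
type hookAnalyzer struct{ hook AnalysisHook }

func (h *hookAnalyzer) Type() string { return HookTokensAnalyzerType }

func (h *hookAnalyzer) Analyze(input []byte) interface{} {
	ts, err := h.hook(input)
	return HookResult{Tokens: ts, Err: err}
}

// Registry is a hypothetical run-time store of analyzers keyed by name.
type Registry struct{ analyzers map[string]Analyzer }

func NewRegistry() *Registry { return &Registry{analyzers: map[string]Analyzer{}} }

// RegisterHook wraps the hook in an analyzer; duplicate names are rejected.
func (r *Registry) RegisterHook(name string, hook AnalysisHook) error {
	if _, exists := r.analyzers[name]; exists {
		return fmt.Errorf("analyzer %q already registered", name)
	}
	r.analyzers[name] = &hookAnalyzer{hook: hook}
	return nil
}

// Lookup returns the analyzer registered under name, if any.
func (r *Registry) Lookup(name string) (Analyzer, bool) {
	a, ok := r.analyzers[name]
	return a, ok
}

// whitespaceHook splits on whitespace and reports an error on empty input.
func whitespaceHook(input []byte) (TokenStream, error) {
	fields := strings.Fields(string(input))
	if len(fields) == 0 {
		return nil, errors.New("empty input")
	}
	ts := make(TokenStream, 0, len(fields))
	for i, f := range fields {
		ts = append(ts, &Token{Term: []byte(f), Position: i + 1})
	}
	return ts, nil
}

func main() {
	reg := NewRegistry()
	if err := reg.RegisterHook("my_hook", whitespaceHook); err != nil {
		panic(err)
	}
	a, _ := reg.Lookup("my_hook")
	// The caller picks the concrete type to assert based on a.Type().
	if a.Type() == HookTokensAnalyzerType {
		res := a.Analyze([]byte("hello bleve")).(HookResult)
		fmt.Println(len(res.Tokens), res.Err)
	}
}
```

Keying the type assertion on `Type()` keeps a single `Analyze()` signature for every analyzer kind, at the cost of moving the type check from compile time to run time.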

Related changes:

const (
	TokensAnalyzerType     = "token"
	HookTokensAnalyzerType = "hook_token"
	VectorAnalyzerType     = "vector"
Contributor Author

move vector related stuff to a separate file with "vector" build tag

	Err error
}

func AnalyzeForTokens(analyzer Analyzer, input []byte) (TokenStream, error) {
Contributor Author

@moshaad7 moshaad7 Oct 10, 2023

add comment:
// A utility function for analyzing an input to generate a TokenStream (and an error, if any).

Previously, the Analyze() method of an analyzer returned a TokenStream.
With the change in this PR, Analyze() now returns a value of type interface{}.
(That value can be validated and used based on analyzer.Type().)

Thus, for users of the old Analyzer interface, this utility will come in handy when migrating to the new Analyzer interface.

	analyzerType := analyzer.Type()
	if analyzerType != TokensAnalyzerType &&
		analyzerType != HookTokensAnalyzerType {
		return nil, fmt.Errorf("cannot analyze text with analyzer of type: %s",
Contributor Author

alternate error msg: "given analyzer is not compatible to be used as a token analyzer"

@moshaad7 moshaad7 self-assigned this Oct 10, 2023
- While analyzing a doc, analysis of a few fields can fail.
- We want to index the part of the doc for which analysis succeeded.
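
The two points above describe partial indexing. A minimal sketch of the idea follows, assuming a field-level `Analyze()` that returns an error as the change log states. `Field` here is a simplified stand-in, and `textField` and `analyzeDocument` are hypothetical names; the actual handling in scorch/upside_down may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a simplified stand-in for a document field whose analysis
// can fail, as described in the change log.
type Field interface {
	Name() string
	Analyze() error
}

// textField is a hypothetical field that tokenizes its value on whitespace.
type textField struct {
	name  string
	value string
	terms []string
}

func (f *textField) Name() string { return f.name }

// Analyze fails when the value contains no terms.
func (f *textField) Analyze() error {
	terms := strings.Fields(f.value)
	if len(terms) == 0 {
		return fmt.Errorf("field %q: nothing to analyze", f.name)
	}
	f.terms = terms
	return nil
}

// analyzeDocument analyzes every field. Fields that succeed are returned
// for indexing; failures are collected instead of aborting the document.
func analyzeDocument(fields []Field) (indexable []Field, failures []error) {
	for _, f := range fields {
		if err := f.Analyze(); err != nil {
			failures = append(failures, err)
			continue
		}
		indexable = append(indexable, f)
	}
	return indexable, failures
}

func main() {
	fields := []Field{
		&textField{name: "title", value: "hello world"},
		&textField{name: "body", value: "   "},
	}
	indexable, failures := analyzeDocument(fields)
	for _, f := range indexable {
		fmt.Println("index:", f.Name())
	}
	for _, err := range failures {
		fmt.Println("skipped:", err)
	}
}
```

Collecting the errors rather than returning on the first one lets the indexer decide whether to log, surface, or ignore individual field failures.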