abacusai.document_retriever
===========================

.. py:module:: abacusai.document_retriever


Classes
-------

.. autoapisummary::

   abacusai.document_retriever.VectorStoreConfig
   abacusai.document_retriever.DocumentRetrieverConfig
   abacusai.document_retriever.DocumentRetrieverVersion
   abacusai.document_retriever.AbstractApiClass
   abacusai.document_retriever.DocumentRetriever


Module Contents
---------------

.. py:class:: VectorStoreConfig

   Bases: :py:obj:`abacusai.api_class.abstract.ApiClass`


   Config for indexing options of a document retriever. Default values of optional arguments are heuristically selected by the Abacus.AI platform based on the underlying data.

   :param chunk_size: The size of text chunks in the vector store.
   :type chunk_size: int
   :param chunk_overlap_fraction: The fraction of overlap between chunks.
   :type chunk_overlap_fraction: float
   :param text_encoder: Encoder used to index texts from the documents.
   :type text_encoder: VectorStoreTextEncoder
   :param chunk_size_factors: Enables chunking the data with multiple sizes. The specified list of factors is used to calculate additional chunk sizes on top of `chunk_size`.
   :type chunk_size_factors: list
   :param score_multiplier_column: If provided, will use the values in this metadata column to modify the relevance score of returned chunks for all queries.
   :type score_multiplier_column: str
   :param prune_vectors: Transform vectors using SVD so that the average component of the vectors in the corpus is removed.
   :type prune_vectors: bool


   .. py:attribute:: chunk_size
      :type:  int


   .. py:attribute:: chunk_overlap_fraction
      :type:  float


   .. py:attribute:: text_encoder
      :type:  abacusai.api_class.enums.VectorStoreTextEncoder


   .. py:attribute:: chunk_size_factors
      :type:  list


   .. py:attribute:: score_multiplier_column
      :type:  str


   .. py:attribute:: prune_vectors
      :type:  bool
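
The sketch below illustrates one plausible reading of ``chunk_size``, ``chunk_overlap_fraction``, and ``chunk_size_factors``. It is not the platform's chunking code: the helper names ``chunk_sizes`` and ``split_into_chunks`` are hypothetical, and both the word-based splitting and the "size times factor" rule are assumptions made for illustration.

```python
def chunk_sizes(chunk_size, chunk_size_factors=None):
    """Return the base size plus one extra size per factor.

    Assumption: each extra size is chunk_size * factor, truncated to an int.
    """
    sizes = [chunk_size]
    for factor in chunk_size_factors or []:
        extra = int(chunk_size * factor)
        if extra > 0 and extra not in sizes:
            sizes.append(extra)
    return sizes


def split_into_chunks(words, chunk_size, chunk_overlap_fraction):
    """Split a list of words into chunks of at most chunk_size words.

    Consecutive chunks share int(chunk_size * chunk_overlap_fraction) words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap_fraction < 1:
        raise ValueError("chunk_overlap_fraction must be in [0, 1)")
    overlap = int(chunk_size * chunk_overlap_fraction)
    step = chunk_size - overlap  # at least 1 because the fraction is below 1
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


words = "a b c d e f g h i j".split()
print(split_into_chunks(words, chunk_size=4, chunk_overlap_fraction=0.5))
print(chunk_sizes(100, [0.5, 2]))  # [100, 50, 200]
```

With ten words, a chunk size of 4, and an overlap fraction of 0.5, each chunk starts two words after the previous one, so four chunks are produced.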


.. py:class:: DocumentRetrieverConfig(client, chunkSize=None, chunkOverlapFraction=None, textEncoder=None, scoreMultiplierColumn=None, pruneVectors=None)

   Bases: :py:obj:`abacusai.return_class.AbstractApiClass`


   A config for document retriever creation.

   :param client: An authenticated API Client instance
   :type client: ApiClient
   :param chunkSize: The size of chunks for vector store, i.e., maximum number of words in the chunk.
   :type chunkSize: int
   :param chunkOverlapFraction: The fraction of overlap between two consecutive chunks.
   :type chunkOverlapFraction: float
   :param textEncoder: The text encoder used to encode texts in the vector store.
   :type textEncoder: str
   :param scoreMultiplierColumn: The values in this metadata column are used to modify the relevance scores of returned chunks.
   :type scoreMultiplierColumn: str
   :param pruneVectors: Corpus specific transformation of vectors that applies dimensional reduction techniques to strip common components from the vectors.
   :type pruneVectors: bool


   .. py:method:: __repr__()

      Return repr(self).



   .. py:method:: to_dict()

      Get a dict representation of the parameters in this class

      :returns: The dict value representation of the class parameters
      :rtype: dict
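
As a rough illustration of what ``scoreMultiplierColumn`` is described as doing, the sketch below scales each result's relevance score by a numeric metadata value and re-sorts. The exact formula used by the platform is not stated in this reference, so plain multiplication is an assumption suggested by the parameter name; the function name ``apply_score_multiplier`` and the result layout (``score`` and ``metadata`` keys) are hypothetical.

```python
def apply_score_multiplier(results, score_multiplier_column):
    """Scale each result's score by its metadata value and re-sort.

    Assumption: new_score = score * metadata[score_multiplier_column].
    The input list and its dicts are left unmodified.
    """
    rescored = []
    for result in results:
        multiplier = float(result["metadata"][score_multiplier_column])
        rescored.append({**result, "score": result["score"] * multiplier})
    # Highest adjusted score first.
    return sorted(rescored, key=lambda r: r["score"], reverse=True)


results = [
    {"chunk": "a", "score": 0.9, "metadata": {"boost": 0.5}},
    {"chunk": "b", "score": 0.6, "metadata": {"boost": 1.0}},
]
ranked = apply_score_multiplier(results, "boost")
print([r["chunk"] for r in ranked])  # ['b', 'a']
```

Chunk ``a`` starts with the higher raw score, but its multiplier of 0.5 drops it below chunk ``b`` after adjustment.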



.. py:class:: DocumentRetrieverVersion(client, documentRetrieverId=None, documentRetrieverVersion=None, createdAt=None, status=None, deploymentStatus=None, featureGroupId=None, featureGroupVersion=None, error=None, numberOfChunks=None, embeddingFileSize=None, warnings=None, resolvedConfig={})

   Bases: :py:obj:`abacusai.return_class.AbstractApiClass`


   A version of a document retriever.

   :param client: An authenticated API Client instance
   :type client: ApiClient
   :param documentRetrieverId: The unique identifier of the Document Retriever.
   :type documentRetrieverId: str
   :param documentRetrieverVersion: The unique identifier of the Document Retriever version.
   :type documentRetrieverVersion: str
   :param createdAt: When the Document Retriever was created.
   :type createdAt: str
   :param status: The status of creating the Document Retriever version.
   :type status: str
   :param deploymentStatus: The status of deploying the Document Retriever version.
   :type deploymentStatus: str
   :param featureGroupId: The feature group id associated with the document retriever.
   :type featureGroupId: str
   :param featureGroupVersion: The unique identifier of the feature group version at which the Document Retriever version is created.
   :type featureGroupVersion: str
   :param error: The error message if creating the document retriever version failed.
   :type error: str
   :param numberOfChunks: The number of chunks for the document retriever.
   :type numberOfChunks: int
   :param embeddingFileSize: The size of the embedding file for the document retriever.
   :type embeddingFileSize: int
   :param warnings: The warning messages when creating the document retriever.
   :type warnings: list
   :param resolvedConfig: The resolved configurations, such as default settings, for indexing documents.
   :type resolvedConfig: DocumentRetrieverConfig


   .. py:method:: __repr__()

      Return repr(self).



   .. py:method:: to_dict()

      Get a dict representation of the parameters in this class

      :returns: The dict value representation of the class parameters
      :rtype: dict



   .. py:method:: refresh()

      Calls describe and refreshes the current object's fields

      :returns: The current object
      :rtype: DocumentRetrieverVersion



   .. py:method:: describe()

      Describe a document retriever version.

      :param document_retriever_version: A unique string identifier associated with the document retriever version.
      :type document_retriever_version: str

      :returns: The document retriever version object.
      :rtype: DocumentRetrieverVersion



   .. py:method:: wait_for_results(timeout=3600)

      A waiting call until the document retriever version is complete.

      :param timeout: The maximum time, in seconds, to wait for the call to finish. If it does not finish within the allotted time, the call times out.
      :type timeout: int



   .. py:method:: wait_until_ready(timeout=3600)

      A waiting call until the document retriever version is ready. It restarts the document retriever if it is stopped.

      :param timeout: The maximum time, in seconds, to wait for the call to finish. If it does not finish within the allotted time, the call times out.
      :type timeout: int



   .. py:method:: wait_until_deployment_ready(timeout = 3600)

      A waiting call until the document retriever deployment is ready to serve.

      :param timeout: The maximum time, in seconds, to wait for the call to finish. If it does not finish within the allotted time, the call times out. The default is 3600 seconds.
      :type timeout: int



   .. py:method:: get_status()

      Gets the status of the document retriever version.

      :returns: A string describing the status of a document retriever version (pending, complete, etc.).
      :rtype: str



   .. py:method:: get_deployment_status()

      Gets the deployment status of the document retriever version.

      :returns: A string describing the deployment status of a document retriever version (pending, deploying, etc.).
      :rtype: str
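
The waiting methods above all follow a poll-until-done-or-timeout pattern. The sketch below shows that pattern in isolation; it is not the library's implementation. The helper ``wait_for_status``, its ``poll_interval``, ``clock``, and ``sleep`` parameters, the upper-case status strings, and the choice to raise ``TimeoutError`` are all assumptions made for illustration.

```python
import time


def wait_for_status(get_status, done_statuses, timeout=3600, poll_interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a value in done_statuses.

    Raises TimeoutError if `timeout` seconds elapse first.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done_statuses:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Stand-in for an object whose get_status() changes over time.
fake_statuses = iter(["PENDING", "INDEXING", "COMPLETE"])
final = wait_for_status(lambda: next(fake_statuses), {"COMPLETE", "FAILED"},
                        timeout=10, poll_interval=0)
print(final)  # COMPLETE
```

In real use, ``get_status`` would be a bound method such as ``version.get_status`` on a ``DocumentRetrieverVersion``, and the set of terminal statuses would need to match the values the platform actually reports.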



.. py:class:: AbstractApiClass(client, id)

   .. py:method:: __eq__(other)

      Return self==value.



   .. py:method:: _get_attribute_as_dict(attribute)


.. py:class:: DocumentRetriever(client, name=None, documentRetrieverId=None, createdAt=None, featureGroupId=None, featureGroupName=None, indexingRequired=None, latestDocumentRetrieverVersion={}, documentRetrieverConfig={})

   Bases: :py:obj:`abacusai.return_class.AbstractApiClass`


   A vector store that stores embeddings for a list of document chunks.

   :param client: An authenticated API Client instance
   :type client: ApiClient
   :param name: The name of the document retriever.
   :type name: str
   :param documentRetrieverId: The unique identifier of the vector store.
   :type documentRetrieverId: str
   :param createdAt: When the vector store was created.
   :type createdAt: str
   :param featureGroupId: The feature group id associated with the document retriever.
   :type featureGroupId: str
   :param featureGroupName: The feature group name associated with the document retriever.
   :type featureGroupName: str
   :param indexingRequired: Whether the document retriever is required to be indexed due to changes in underlying data.
   :type indexingRequired: bool
   :param latestDocumentRetrieverVersion: The latest version of vector store.
   :type latestDocumentRetrieverVersion: DocumentRetrieverVersion
   :param documentRetrieverConfig: The config for vector store creation.
   :type documentRetrieverConfig: DocumentRetrieverConfig


   .. py:method:: __repr__()

      Return repr(self).



   .. py:method:: to_dict()

      Get a dict representation of the parameters in this class

      :returns: The dict value representation of the class parameters
      :rtype: dict



   .. py:method:: rename(name)

      Updates an existing document retriever.

      :param name: The name to update the document retriever with.
      :type name: str

      :returns: The updated document retriever.
      :rtype: DocumentRetriever



   .. py:method:: create_version(feature_group_id = None, document_retriever_config = None)

      Creates a document retriever version from the latest version of the feature group that the document retriever is associated with.

      :param feature_group_id: The ID of the feature group to update the document retriever with.
      :type feature_group_id: str
      :param document_retriever_config: The configuration, including chunk_size and chunk_overlap_fraction, for document retrieval.
      :type document_retriever_config: VectorStoreConfig

      :returns: The newly created document retriever version.
      :rtype: DocumentRetrieverVersion



   .. py:method:: refresh()

      Calls describe and refreshes the current object's fields

      :returns: The current object
      :rtype: DocumentRetriever



   .. py:method:: describe()

      Describe a Document Retriever.

      :param document_retriever_id: A unique string identifier associated with the document retriever.
      :type document_retriever_id: str

      :returns: The document retriever object.
      :rtype: DocumentRetriever



   .. py:method:: list_versions(limit = 100, start_after_version = None)

      List all the document retriever versions with a given ID.

      :param limit: The number of vector store versions to retrieve.
      :type limit: int
      :param start_after_version: An offset parameter to exclude all document retriever versions up to this specified one.
      :type start_after_version: str

      :returns: All the document retriever versions associated with the document retriever.
      :rtype: list[DocumentRetrieverVersion]



   .. py:method:: get_document_snippet(document_id, start_word_index = None, end_word_index = None)

      Get a snippet from documents in the document retriever.

      :param document_id: The ID of the document to retrieve the snippet from.
      :type document_id: str
      :param start_word_index: If provided, will start the snippet at the index (of words in the document) specified.
      :type start_word_index: int
      :param end_word_index: If provided, will end the snippet at the index (of words in the document) specified.
      :type end_word_index: int

      :returns: The document snippet found in the document retriever.
      :rtype: DocumentRetrieverLookupResult



   .. py:method:: restart()

      Restarts the document retriever if it is stopped. This starts the deployment of the document retriever but does not wait for it to be ready. Call ``wait_until_ready`` to wait until the deployment is ready.


      :param document_retriever_id: A unique string identifier associated with the document retriever.
      :type document_retriever_id: str



   .. py:method:: wait_until_ready(timeout = 3600)

      A waiting call until the document retriever is ready. It restarts the document retriever if it is stopped.

      :param timeout: The maximum time, in seconds, to wait for the call to finish. If it does not finish within the allotted time, the call times out. The default is 3600 seconds.
      :type timeout: int



   .. py:method:: wait_until_deployment_ready(timeout = 3600)

      A waiting call until the document retriever deployment is ready to serve.

      :param timeout: The maximum time, in seconds, to wait for the call to finish. If it does not finish within the allotted time, the call times out. The default is 3600 seconds.
      :type timeout: int



   .. py:method:: get_status()

      Gets the indexing status of the document retriever.

      :returns: A string describing the status of a document retriever (pending, complete, etc.).
      :rtype: str



   .. py:method:: get_deployment_status()

      Gets the deployment status of the document retriever.

      :returns: A string describing the deployment status of document retriever (pending, deploying, active, etc.).
      :rtype: str



   .. py:method:: get_matching_documents(query, filters = None, limit = None, result_columns = None, max_words = None, num_retrieval_margin_words = None, max_words_per_chunk = None, score_multiplier_column = None, min_score = None, required_phrases = None, filter_clause = None, crowding_limits = None)

      Looks up the deployed document retriever and returns the documents matching the given query.

      Original documents are split into chunks and stored in the document retriever. This lookup function returns the relevant chunks
      from the document retriever. When the provided settings permit, the returned chunks may be expanded to include more words from the
      original documents and merged if they overlap. The returned chunks are sorted by relevance.


      :param query: The query to search for.
      :type query: str
      :param filters: A dictionary mapping column names to a list of values to restrict the retrieved search results.
      :type filters: dict
      :param limit: If provided, will limit the number of results to the value specified.
      :type limit: int
      :param result_columns: If provided, will limit the column properties present in each result to those specified in this list.
      :type result_columns: list
      :param max_words: If provided, will limit the total number of words in the results to the value specified.
      :type max_words: int
      :param num_retrieval_margin_words: If provided, will add this number of words from left and right of the returned chunks.
      :type num_retrieval_margin_words: int
      :param max_words_per_chunk: If provided, will limit the number of words in each chunk to the value specified. If the value provided is smaller than the actual chunk size on disk, which is determined during document retriever creation, the actual chunk size will be used. That is, chunks looked up from document retrievers will not be split into smaller chunks during lookup due to this setting.
      :type max_words_per_chunk: int
      :param score_multiplier_column: If provided, will use the values in this column to modify the relevance score of the returned chunks. Values in this column must be numeric.
      :type score_multiplier_column: str
      :param min_score: If provided, will filter out the results with score lower than the value specified.
      :type min_score: float
      :param required_phrases: If provided, each result will have at least one of the phrases.
      :type required_phrases: list
      :param filter_clause: If provided, filters the results of the query using this SQL WHERE clause.
      :type filter_clause: str
      :param crowding_limits: A dictionary mapping metadata columns to the maximum number of results per unique value of the column. This is used to ensure diversity of metadata attribute values in the results. If a particular attribute value has already reached its maximum count, further results with that same attribute value will be excluded from the final result set.
      :type crowding_limits: dict

      :returns: The relevant document results found in the document retriever.
      :rtype: list[DocumentRetrieverLookupResult]
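
The ``crowding_limits`` behavior described above can be pictured as a single pass over the ranked results. The sketch below is an illustration of that description only, not the service's implementation; the function name ``apply_crowding_limits`` and the result layout (a ``metadata`` dict per result) are hypothetical.

```python
from collections import defaultdict


def apply_crowding_limits(results, crowding_limits):
    """Keep results in ranked order, skipping any result whose metadata value
    has already been kept the maximum number of times for a limited column.

    `crowding_limits` maps a metadata column name to the maximum number of
    results allowed per unique value of that column.
    """
    counts = defaultdict(int)  # (column, value) -> number of results kept
    kept = []
    for result in results:
        keys = [(column, result["metadata"].get(column))
                for column in crowding_limits]
        # Drop the result if any limited column is already at its cap.
        if any(counts[key] >= crowding_limits[key[0]] for key in keys):
            continue
        for key in keys:
            counts[key] += 1
        kept.append(result)
    return kept


ranked = [
    {"id": 1, "metadata": {"source": "A"}},
    {"id": 2, "metadata": {"source": "A"}},
    {"id": 3, "metadata": {"source": "B"}},
    {"id": 4, "metadata": {"source": "A"}},
    {"id": 5, "metadata": {"source": "B"}},
]
diverse = apply_crowding_limits(ranked, {"source": 2})
print([r["id"] for r in diverse])  # [1, 2, 3, 5]
```

Result 4 is dropped because two results from source ``A`` were already kept, while lower-ranked result 5 survives because source ``B`` had not yet reached its limit.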



