
Yung-Hsiang Hu · 2 min read

Kuwa's RAG application (DocQA/WebQA/DatabaseQA/SearchQA) supports customization of advanced parameters through the Bot's model file starting from version v0.3.1, allowing a single Executor to be virtualized into multiple RAG applications. Detailed parameter descriptions and examples are as follows.

Parameter Description

The following parameter contents are the default values for the v0.3.1 RAG application.

Shared Parameters for All RAGs

PARAMETER retriever_embedding_model "thenlper/gte-base-zh" # Embedding model name
PARAMETER retriever_mmr_fetch_k 12 # Number of candidate chunks fetched before MMR re-ranking
PARAMETER retriever_mmr_k 6 # Number of chunks finally selected by MMR
PARAMETER retriever_chunk_size 512 # Length of each chunk in characters (not restricted for DatabaseQA)
PARAMETER retriever_chunk_overlap 128 # Overlap length between chunks in characters (not restricted for DatabaseQA)
PARAMETER generator_model None # Specify which model to answer, None means auto-selection
PARAMETER generator_limit 3072 # Length limit of the entire prompt in characters
PARAMETER display_hide_ref False # Do not show references
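The `retriever_chunk_size` and `retriever_chunk_overlap` parameters control how source documents are split before embedding. A minimal sketch of fixed-size sliding-window splitting with these defaults (illustrative only; Kuwa's actual splitter may also honor separators such as sentence boundaries):

```python
def split_into_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 128):
    """Split text into chunks of at most `chunk_size` characters where
    consecutive chunks share `chunk_overlap` characters (illustrative)."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_into_chunks("a" * 1000, chunk_size=512, chunk_overlap=128)
# Each chunk is at most 512 characters; adjacent chunks overlap by 128.
```

The overlap preserves context that would otherwise be cut at a chunk boundary, at the cost of some duplicated embedding work.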

DocQA, WebQA, SearchQA Specific Parameters

PARAMETER crawler_user_agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36" # Crawler UA string

SearchQA Specific Parameters

PARAMETER search_advanced_params "" # Advanced search parameters (SearchQA only)
PARAMETER search_num_url 3 # Number of search results to retrieve [1~10] (SearchQA only)

DatabaseQA Specific Parameters

PARAMETER retriever_database None # Path to vector database on local Executor

Usage Example

Suppose you want to create a DatabaseQA knowledge base and specify which model answers. You can create a Bot, select DocQA as the base model, and fill in the following Modelfile:

PARAMETER generator_model "model_access_code" # Specify which model to answer, None means auto-selection
PARAMETER generator_limit 3072 # Length limit of the entire prompt in characters
PARAMETER retriever_database "/path/to/local/database/on/executor" # Path to vector database on local Executor
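Each `PARAMETER` line above follows a simple `PARAMETER <key> <value> # comment` shape. As a hedged sketch (this is not Kuwa's actual parser; value typing and validation are simplified), such lines could be read into a dict like this:

```python
import shlex

def parse_parameters(modelfile: str) -> dict:
    """Parse PARAMETER lines into {key: value}. Trailing '# ...' comments
    are stripped and quoted values keep embedded spaces (illustrative)."""
    params = {}
    for line in modelfile.splitlines():
        line = line.strip()
        if not line.startswith("PARAMETER"):
            continue
        # shlex honors quotes and, with comments=True, drops '# ...' parts
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 3:
            continue  # skip malformed lines
        key, value = tokens[1], " ".join(tokens[2:])
        params[key] = value
    return params

modelfile = '''
PARAMETER generator_model "model_access_code"  # model to answer
PARAMETER generator_limit 3072  # prompt length limit
'''
params = parse_parameters(modelfile)
```

All values come back as strings here; a real implementation would coerce numbers and the `None`/`True`/`False` literals seen in the defaults.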

Yung-Hsiang Hu · 5 min read

Hi everyone, Kuwa v0.3.1 is out. This update focuses on multimodal input and output, now supporting both speech and images. Combined with the previously launched Bot system and group-chat functions, this enables practical applications such as meeting summaries, speech summaries, simple image generation, and image editing:

  1. Supports the Whisper speech-to-text model, which can output transcripts from uploaded audio files, and features multi-speaker recognition and timestamps.
  2. Supports the Stable Diffusion image generation model, which can generate images from text input or modify uploaded images based on user instructions.
  3. The Hugging Face executor supports integration with vision-language models such as Phi-3-Vision and LLaVA.
  4. RAG supports direct parameter adjustment through the Web UI and Modelfile, simplifying the calibration process.
  5. RAG supports displaying original documents and cited passages, making it easier to review search results and identify hallucinations.
  6. Supports importing pre-built RAG vector databases, facilitating knowledge sharing across different systems.
  7. Simplified selection of various open models during installation.
  8. Multi-chat Web UI supports direct export of chat records in PDF, Doc/ODT formats.
  9. Multi-chat Web UI supports Modelfile syntax highlighting, making it easy to edit Modelfiles.
  10. Kernel API supports passing website language, allowing the Executor to customize based on user language.
  11. The Executor removes the default System prompt to avoid compromising model performance.
info

kuwa-v0.3.1 download information: https://github.com/kuwaai/genai-os/releases/tag/v0.3.1
kuwa-v0.3.1 single-executable download link: https://dl.kuwaai.org/kuwa-os/v0.3.1/

Yung-Hsiang Hu · 1 min read

Kuwa v0.3.1 has preliminary support for commonly used visual language models (VLMs). In addition to text inputs, such models can also take images as input and respond to user instructions based on the content of the images. This tutorial will guide you through the initial setup and usage of VLMs.

Yung-Hsiang Hu · 5 min read

Kuwa v0.3.1 adds the Kuwa Speech Recognizer, based on the Whisper speech recognition model, which generates transcripts from uploaded audio files and supports timestamps and speaker labels.

Known Issues and Limitations

Hardware requirements

By default, the Whisper medium model is used with speaker diarization disabled. GPU VRAM consumption is shown in the following table.

| Model name | Parameters | VRAM requirement | Relative recognition speed |
| --- | --- | --- | --- |
| tiny | 39 M | ~1 GB | ~32x |
| base | 74 M | ~1 GB | ~16x |
| small | 244 M | ~2 GB | ~6x |
| medium | 769 M | ~5 GB | ~2x |
| large | 1550 M | ~10 GB | 1x |
| pyannote/speaker-diarization-3.1 (speaker diarization) | - | ~3 GB | - |
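Given the figures above, you can estimate in advance which Whisper model fits on your GPU. A small illustrative helper (VRAM numbers copied from the table; the model names and budget logic here are assumptions for illustration, not a Kuwa API):

```python
# Approximate VRAM needs in GB, taken from the table above (illustrative)
WHISPER_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
DIARIZATION_VRAM_GB = 3  # pyannote/speaker-diarization-3.1

def largest_model_that_fits(free_vram_gb: float, with_diarization: bool = False):
    """Return the largest Whisper model fitting the given free VRAM, or None."""
    budget = free_vram_gb - (DIARIZATION_VRAM_GB if with_diarization else 0)
    for name in ["large", "medium", "small", "base", "tiny"]:
        if WHISPER_VRAM_GB[name] <= budget:
            return name
    return None

# e.g. a 6 GB GPU fits "medium" alone, but only "small" with diarization
```

Actual consumption varies with batch size, audio length, and runtime overhead, so leave some headroom beyond these estimates.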

Known limitations

  1. Currently, the input language cannot be detected automatically and must be specified manually.
  2. Currently, the speaker identification module runs separately from the main pipeline, so its model is reloaded on each request, which lengthens response time.
  3. Transcription is easily misrecognized when multiple speakers talk at the same time (overlapping speech).