Content provided by HPR Volunteer and Hacker Public Radio. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by HPR Volunteer and Hacker Public Radio or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal

HPR4337: Open Web UI

 
Manage episode 471946105 series 32765

This show has been flagged as Explicit by the host.

OpenWebUI notes ...

Open WebUI installer: https://github.com/freeload101/SCRIPTS/blob/master/Bash/OpenWebUI_Fast.bash

Older Professor Synapse prompt you can use: https://raw.githubusercontent.com/freeload101/SCRIPTS/refs/heads/master/Prof%20Synapse%20Old.txt

Fabric prompts you can import into Open WebUI (pattern sources: https://github.com/danielmiessler/fabric/tree/main/patterns ): https://github.com/freeload101/SCRIPTS/blob/master/MISC/Fabric_Prompts_Open_WebUI_OpenWebUI_20241112.json

Example Windows scheduled-task (AT) startup script to make Kokoro start at boot and not die: https://github.com/freeload101/SCRIPTS/blob/master/MISC/StartKokoro.xml

Open WebUI RAG fail sauce ... https://youtu.be/CfnLrTcnPtY

Open registration

Model list / order

NAME                                                 ID            SIZE    MODIFIED
hf.co/mradermacher/L3-8B-Stheno-v3.2-i1-GGUF:Q4_K_S  017d7a278e7e  4.7 GB  2 days ago
qwen2.5:32b                                          9f13ba1299af  19 GB   3 days ago
deepsex:latest                                       c83a52741a8a  20 GB   3 days ago
HammerAI/openhermes-2.5-mistral:latest               d98003b83e17  4.4 GB  2 weeks ago
Sweaterdog/Andy-3.5:latest                           d3d9dc04b65a  4.7 GB  2 weeks ago
nomic-embed-text:latest                              0a109f422b47  274 MB  2 weeks ago
deepseek-r1:32b                                      38056bbcbb2d  19 GB   4 weeks ago
psyfighter2:latest                                   c1b3d5e5be73  7.9 GB  2 months ago
CognitiveComputations/dolphin-llama3.1:latest        ed9503dedda9  4.7 GB  2 months ago
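The table above matches the plain-text layout of `ollama list` output. As a quick illustration, you can trim a listing like that down to just name and size with awk (sample rows copied from the table; the field arithmetic assumes SIZE is always two tokens like "19 GB" and MODIFIED three like "3 days ago"):

```shell
# Trim `ollama list`-style output to name + size.
# Assumes SIZE is two tokens ("19 GB") and MODIFIED is three ("3 days ago").
awk 'NR > 1 { printf "%s\t%s %s\n", $1, $(NF-4), $(NF-3) }' <<'EOF'
NAME ID SIZE MODIFIED
qwen2.5:32b 9f13ba1299af 19 GB 3 days ago
nomic-embed-text:latest 0a109f422b47 274 MB 2 weeks ago
EOF
```

On a live system you would pipe the real command instead: `ollama list | awk ...`.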

Disable Arena models

Documents (WIP): the built-in RAG is not good yet.

Discord notes:

https://discord.com/channels/1170866489302188073/1340112218808909875

  • Abhi Chaturvedi: @(Operat0r) try this. To reduce latency and improve accuracy, modify the .env file:
  • Enable RAG: ENABLE_RAG=true
  • Use hybrid mode (retrieval + reranking for better context): RAG_MODE=hybrid
  • Reduce the number of retrieved documents (default: 5): RETRIEVAL_TOP_K=3
  • Use a fast embedding model (instead of OpenAI's Ada-002): EMBEDDING_MODEL=all-MiniLM-L6-v2 # faster and lightweight
  • Optimize the vector database: VECTOR_DB_TYPE=chroma, CHROMA_DB_IMPL=hnsw # faster search, CHROMA_DB_PATH=/root/open-webui/backend/data/vector_db
  • Optimize backend performance: increase the Uvicorn worker count (improves concurrency): UVICORN_WORKERS=4
  • Increase the FastAPI request timeout (prevents RAG failures): FASTAPI_TIMEOUT=60
  • Optimize the database connection pool (for better query performance): SQLALCHEMY_POOL_SIZE=10
  • JamesK: So probably the first thing to do is increase the top K value in admin -> settings -> documents, or you could try the new "full context mode" for rag documents. You may also need to increase the context size on the model, but it will make it slower, so you probably don't want to do that unless you start seeing the "truncating input" warnings.
  • JamesK: Ah, I see. The rag didn't work great for you in this prompt. There are three hits and the first two are duplicates, so there isn't much data for the model to work with
  • [9:12 PM] JamesK: context section
  • I see a message warning that you are using the default 2048 context length, but not the message saying you've hit that limit (from my logs the warning looks like
  • level=WARN source=runner.go:126 msg="truncating input prompt" limit=32768 prompt=33434 numKeep=5
  • [6:06 AM] JamesK: If you set the env var OLLAMA_DEBUG=1 before running ollama serve it will dump the full prompt being sent to the model, that should let you confirm what the rag has put in the prompt
  • JamesK: Watch the console output from ollama and check for warnings about overflowing the context. If you have the default 2k context you may need to increase it until the warnings go away
  • [8:58 PM] JamesK: But also, if you're using the default rag, it chunks the input into small fragments, then matches the fragments against your prompt and only inserts a few fragments into the context, not the entire document. So it's easily possible for the information you want to not be present.
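Collected in one place, the settings from Abhi Chaturvedi's message form this .env sketch. Treat it as a sketch only: the variable names are quoted from the Discord thread and may not match what your Open WebUI release actually reads, so verify each one against the official configuration docs before relying on it.

```shell
# .env sketch assembled from the Discord notes above.
# Variable names are as quoted in the thread — verify against your
# Open WebUI version before relying on any of them.
ENABLE_RAG=true
RAG_MODE=hybrid                  # hybrid = retrieval + reranking
RETRIEVAL_TOP_K=3                # fewer retrieved documents (default: 5)
EMBEDDING_MODEL=all-MiniLM-L6-v2 # faster, lightweight embedding model
VECTOR_DB_TYPE=chroma
CHROMA_DB_IMPL=hnsw              # faster search
CHROMA_DB_PATH=/root/open-webui/backend/data/vector_db
UVICORN_WORKERS=4                # more backend concurrency
FASTAPI_TIMEOUT=60               # longer request timeout, fewer RAG failures
SQLALCHEMY_POOL_SIZE=10          # bigger DB connection pool
```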

Auto updates

Note that entries in /etc/crontab take a user field between the schedule and the command (root here):

echo '0,12 */4 * * * root docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui' >> /etc/crontab 

This runs Watchtower once at minutes 0 and 12 of every fourth hour to update the open-webui container.

Search

Important (red) note: keep your API keys secret.

  1. Go to Google Developers' Programmable Search Engine page and log in or create an account.
  2. Go to the control panel and click the Add button.
  3. Enter a search engine name, set the other properties to suit your needs, verify you're not a robot, and click the Create button.
  4. Generate an API key and get the Search engine ID (available after the engine is created).
  5. With the API key and Search engine ID, open the Open WebUI Admin panel, click the Settings tab, then click Web Search.
  6. Enable Web search and set Web Search Engine to google_pse.
  7. Fill Google PSE API Key with the API key and Google PSE Engine Id with the ID from step 4.
  8. Click Save.
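Before wiring the credentials into Open WebUI, it can help to test them directly against Google's Custom Search JSON API, which is the endpoint Programmable Search Engine keys work with. YOUR_KEY and YOUR_CX below are placeholders for the API key and engine ID from step 4:

```shell
# Build the Custom Search JSON API test URL from placeholder credentials.
# YOUR_KEY / YOUR_CX are stand-ins — substitute the real API key and
# engine ID before actually calling the endpoint.
KEY="YOUR_KEY"
CX="YOUR_CX"
URL="https://www.googleapis.com/customsearch/v1?key=${KEY}&cx=${CX}&q=test"
echo "$URL"
# curl -s "$URL"   # run with real credentials to test them
```

A working key/ID pair returns JSON containing an "items" array; an error object usually means the key or engine ID is wrong.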

Note

You have to enable Web search in the prompt field, using the plus ( + ) button. Search the web ;-)

Kokoro / Open WebUI

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

https://github.com/remsky/Kokoro-FastAPI?tab=readme-ov-file

apt update 
apt upgrade 
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list 
sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list 
sudo apt-get update 
sudo apt-get install -y nvidia-container-toolkit 
apt install docker.io -y 
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.2 

API base URL (enter this in Open WebUI's audio/TTS settings): http://localhost:8880/v1

Voice: af_bella
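To exercise the server and voice above from the command line, a request along these lines should work, assuming the OpenAI-compatible /v1/audio/speech route that the Kokoro-FastAPI README describes (the JSON field names are assumptions to verify against your version). The block only builds and prints the payload; the commented curl sends it once the container is running:

```shell
# Build a TTS request body for the local Kokoro server started above.
# Route and field names follow the OpenAI-compatible speech API that
# Kokoro-FastAPI advertises — verify them for your version.
PAYLOAD='{"model": "kokoro", "input": "Hello from HPR", "voice": "af_bella"}'
echo "$PAYLOAD"
# With the container running, send it and save the audio:
# curl -s http://localhost:8880/v1/audio/speech \
#   -H "Content-Type: application/json" -d "$PAYLOAD" -o hello.mp3
```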

Import Fabric prompts

https://raw.githubusercontent.com/freeload101/Python/46317dee34ebb83b01c800ce70b0506352ae2f3c/Fabric_Prompts_Open_WebUI_OpenWebUI.py

Provide feedback on this episode.
