Ollama Quick-Start Guide

Ollama is your local AI engine for running fast, private language models - no cloud required. Shinydocs Pro integrates with Ollama to power document enrichment, data extraction, and intelligent tagging, all behind your firewall.

This guide walks you through installing Ollama, binding it to 0.0.0.0 so Shinydocs Pro can connect, and setting up GPU drivers to get the most out of your hardware.

For more detailed information about installing Ollama on your operating system, check out Ollama’s official documentation.

Graphics Processing Unit (GPU) Drivers

For best performance, make sure your system has the latest NVIDIA or AMD drivers installed. Failing to do so may result in Ollama using only your CPU and RAM (very slow).

To confirm your GPU is supported by Ollama, check the official compatibility list here:

https://github.com/ollama/ollama/blob/main/docs/gpu.md

If your card is listed and drivers are up to date, Ollama will automatically take advantage of your GPU when available.

For the latest NVIDIA drivers, check out: https://www.nvidia.com/en-us/drivers/

For the latest AMD drivers, check out: Drivers and Support for Processors and Graphics
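
Once Ollama is installed (see the platform sections below), you can sanity-check GPU usage. For example, on NVIDIA systems nvidia-smi shows the installed driver and detected GPUs, and ollama ps reports whether a loaded model is running on the GPU or falling back to the CPU:

CODE
nvidia-smi
ollama ps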

Sizing models based on your resources

There’s no exact formula for matching hardware to LLM performance. In general, smaller models are faster but less accurate, and larger models are slower but more capable. The key is finding the sweet spot between speed, cost, and the level of insight you need.

General guide for VRAM

Use this as a starting point when deciding which models your hardware can realistically handle:

Model Size | Minimum VRAM Needed | Notes
1–2B | 4 GB | Great for testing, basic tasks, and low-resource setups
3–4B | 6 GB | Good balance for lightweight enrichment and Q&A
7–8B | 8–12 GB | Ideal for general-purpose use on modern GPUs (e.g., RTX 3060+)
12–14B | 16 GB | High-performing models for advanced use cases
70B+ / MoE | 24 GB or more | Workstation-class cards or multi-GPU setups needed

Additional Notes

  • ~1 GB of VRAM per billion parameters is a rough baseline.

  • Actual memory use varies based on model type, quantization, and input length.

  • Using quantized models (like :q4_K_M) significantly lowers VRAM demands.

  • Always test with a small set of documents first to find what works best on your setup.

The goal isn’t just “bigger is better”; it’s about choosing the model that gives good enough answers fast enough for your workflow.
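
As a quick worked example of that baseline: an 8B model needs on the order of 8 GB of VRAM, which lines up with the 8–12 GB row above, while a 4-bit quantized build of the same model cuts the weight footprint roughly in half. Context length and KV cache add overhead on top of the weights, so leave some headroom.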

Windows

Install Ollama

  1. Download the installer: https://ollama.com/download

  2. Run the installer and complete the setup

  3. Open Command Prompt or PowerShell, and test it:

    CODE
    ollama run llama3
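
The first run downloads the model weights (several GB) and then opens an interactive prompt; type /bye to exit. If you get a response, Ollama is working locally.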

Bind to 0.0.0.0 for external server access

To allow Shinydocs Pro to connect to Ollama:

  1. Press Win + S, search for Environment Variables, and open the system settings

  2. Under System Variables, click New

    • Variable name: OLLAMA_HOST

    • Variable value: 0.0.0.0

  3. Click OK on all dialogs to apply

  4. Restart the Ollama app

Ollama will now listen on all interfaces, allowing Shinydocs Pro to connect using your machine's IP address.
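
If you prefer the command line, the same machine-wide variable can be set from an elevated Command Prompt or PowerShell (restart Ollama afterwards):

CODE
setx OLLAMA_HOST 0.0.0.0 /M

Ollama listens on port 11434 by default, so also make sure Windows Firewall allows inbound connections on that port from the Shinydocs Pro server.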

Linux

Install Ollama

  1. Visit Download Ollama on Linux for the latest version
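
    On most distributions, the download page provides a one-line install script; as of this writing it is:

    CODE
    curl -fsSL https://ollama.com/install.sh | sh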

  2. Start the Ollama service:

    CODE
    sudo systemctl start ollama
  3. Enable on boot:

    CODE
    sudo systemctl enable ollama
  4. Test it:

    CODE
    ollama run llama3

Bind to 0.0.0.0 for external server access

To allow network access for Shinydocs Pro:

  1. Run:

    CODE
    sudo systemctl edit ollama.service
  2. Add the following lines:

    CODE
    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"
  3. Save and close

  4. Reload the systemd daemon and restart Ollama:

    CODE
    sudo systemctl daemon-reload
    sudo systemctl restart ollama

Now Ollama will bind to all interfaces.
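
To confirm it is reachable, query the Ollama API from the Shinydocs Pro server (replace the placeholder with your Ollama server’s IP; 11434 is Ollama’s default port):

CODE
curl http://<ollama-server-ip>:11434/api/tags

A JSON list of installed models means the connection works; a timeout usually means a firewall is blocking port 11434.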

Pull some models for use in Shinydocs Pro

We highly recommend using nomic-embed-text as your embedding model. You can pull this from Ollama by running the following command:

CODE
ollama pull nomic-embed-text:latest
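
You can confirm which models are installed locally at any time:

CODE
ollama list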

Ollama supported models for different use cases

Make sure the model you download fits in your GPU’s VRAM (see the sizing guide above).

Feel free to explore Ollama’s models and how they might fit your use cases: Ollama Models

Gemma3 (Google DeepMind)

  Use-cases: document Q&A, creative output, summarization, sentiment

  Pull commands:
    • ollama pull gemma3:4b
    • ollama pull gemma3:12b
    • ollama pull gemma3:27b

  Library page: gemma3

Cogito

  Use-cases: text classification, PII and sensitive info detection, pattern parsing, analysis-based questions, sentiment

  Pull commands:
    • ollama pull cogito:8b
    • ollama pull cogito:14b
    • ollama pull cogito:32b
    • ollama pull cogito:70b

  Library page: cogito

Llama3.2 (Meta)

  Use-cases: general conversation, document Q&A, summarization, creative non-technical prompts

  Pull commands:
    • ollama pull llama3.2:1b
    • ollama pull llama3.2:3b

  Library page: llama3.2

Llama3.1 (Meta)

  Use-cases: high-performance Q&A, structured enrichment, general enterprise analysis

  Pull commands:
    • ollama pull llama3.1:8b
    • ollama pull llama3.1:70b
    • ollama pull llama3.1:405b

  Library page: llama3.1

Mistral (Mistral AI)

  Use-cases: efficient instruction following, general enrichment, fast summarization

  Pull commands:
    • ollama pull mistral:7b

  Library page: mistral

Phi-4 (Microsoft)

  Use-cases: complex context understanding, compliance QA, technical/scientific summary and analysis

  Pull commands:
    • ollama pull phi4:14b

  Library page: phi4

Phi-3 (Microsoft)

  Use-cases: lightweight structured enrichment, low-resource Q&A, efficient summarization

  Pull commands:
    • ollama pull phi3:3.8b

  Library page: phi3

SmolLM2

  Use-cases: basic document Q&A, structured extraction on small text bodies, low-resource summarization

  Pull commands:
    • ollama pull smollm2:135m
    • ollama pull smollm2:360m
    • ollama pull smollm2:1.7b

  Library page: smollm2

Qwen 3 (Alibaba)

  Use-cases: multilingual support, complex text, math/numbers-based insights, code understanding

  Pull commands:
    • ollama pull qwen3:0.6b
    • ollama pull qwen3:1.7b
    • ollama pull qwen3:4b
    • ollama pull qwen3:8b
    • ollama pull qwen3:14b
    • ollama pull qwen3:30b
    • ollama pull qwen3:32b
    • ollama pull qwen3:235b

  Library page: qwen3

IBM Granite 3.3

  Use-cases: complex reasoning, compliance QA, larger (128k+ token) context windows

  Pull commands:
    • ollama pull granite3.3:2b
    • ollama pull granite3.3:8b

  Library page: granite3.3

DeepSeek-R1

  Notes: thinking model (responds with <think> tags to show its reasoning); responses are slower because the reasoning adds extra text

  Use-cases: detailed compliance QA, analytical document workflows

  Pull commands:
    • ollama pull deepseek-r1:1.5b
    • ollama pull deepseek-r1:7b
    • ollama pull deepseek-r1:8b
    • ollama pull deepseek-r1:14b
    • ollama pull deepseek-r1:32b
    • ollama pull deepseek-r1:70b
    • ollama pull deepseek-r1:671b

  Library page: deepseek-r1
