
AI Enrichment (Content Tagging)

This feature is currently in development but is available to use. Please note that UI and functionality might change from release to release.

Important tip!

This process often takes a bit of trial and error.

We recommend starting with a small set of sample documents, some relevant and some irrelevant to the prompts you're testing in Shinydocs Pro. This lets you quickly see how the model responds and make adjustments to your prompt or model choice without sifting through a massive dataset.

Different models can interpret prompts differently, and outputs may not always be what you expect. Testing with a small, focused sample helps you dial it in faster.

AI Enrichment allows you to analyze document content and generate structured metadata fields using large language models. These fields can be used for tagging, classification, or extraction during a crawl. Enrichments are defined through prompts and executed conditionally based on the document’s metadata.

This guide will walk you through how to configure your enrichment, what each field does, and how to design effective prompts.

[Image: Example of an AI Enrichment card]

Enable the Feature

To access AI Enrichment, you must be licensed for AI features. Please contact your Shinydocs sales rep or visit Artificial Intelligence for more information.

Enable the Feature Flags

Warning! Changing hidden flags can break your system.

Only enable or disable options recommended by Shinydocs (in this guide or via support).

  1. Access the Shinydocs Control Center hidden flags page: https://<ShinydocsProServerHostname>:9701/flags
    e.g. https://localhost:9701/flags

  2. Enable the following feature flags:

    1. Tool configuration

    2. AI Analysis Tool

  3. Navigate to Sources from the left-side menu

  4. Disable the AI Analysis Tool for all sources other than your test source (you can re-enable them later)

    1. Select the kebab (3 vertical dots) menu for the source

    2. Slide the toggle for AI Analysis Tool to disable

    3. Repeat for other, non-testing sources.

Step 1 - Choose an Intelligence Engine

Start by selecting an Intelligence engine from the dropdown. This list comes from what you've configured in the Intelligence settings. You must configure at least one model there first.

Step 2 - Configure Options

If needed, you can add Options like temperature, top_k, seed, or max_tokens to control model behavior. These settings behave the same as in the Intelligence setup.

Important tip!

The seed option is useful for repeatability. When testing prompts in Shinydocs Pro or working with enrichment tasks, adding seed to your model options ensures the model generates consistent outputs each time. This is especially helpful when refining prompts or comparing model behavior, since randomness is reduced or eliminated. Pick a number, any number! Try using 42; the number itself doesn't really matter.
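To make the seed idea concrete, here is a minimal sketch of a model options payload with a fixed seed. The payload shape loosely follows the Ollama /api/generate convention; the function name, model name, and exact wire format are assumptions for illustration, not Shinydocs Pro's internal format.

```python
# Hypothetical sketch: assembling model options with a fixed seed so that
# repeated runs of the same prompt produce the same output.
import json

def build_options(seed=42, temperature=0.2, num_predict=128):
    """Assemble model options; a fixed seed makes outputs repeatable."""
    return {
        "seed": seed,              # any fixed integer gives repeatable output
        "temperature": temperature,
        "num_predict": num_predict,
    }

payload = {
    "model": "llama3:8b",          # example model name
    "prompt": "Summarize the attached document.",
    "options": build_options(),
}
print(json.dumps(payload["options"]))
```

Changing the seed (or omitting it) reintroduces run-to-run variation, which is why it is worth fixing while you iterate on prompts.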

Options List

Option | Description | Usage
Seed | Random number used to get repeatable responses from the model. | Use if you want the same result every time for a given prompt.
Temperature | Controls randomness. 0 = more predictable, 1 = more creative. | Use 0.2–0.5 for reliable responses, 0.7+ for brainstorming.
Max Supported Tokens | The max tokens for input + output. Defaults to 4096 for text, 8192 for embeddings. | Match this to your model's actual context limit.
Mirostat Sampling | Algorithm for adaptive sampling. Can be set to Disabled, Mirostat, or Mirostat v2. | Only change if you're experimenting.
Mirostat ETA | Learning rate for Mirostat feedback. Default is 0.1. | Lower = slower learning, higher = faster.
Mirostat Tau | Balances coherence vs. diversity. Lower = more focused output. | Try 5.0 to start.
Context Window Size | The total tokens the model can "see" at once. | Match your model's capability, e.g. 32768 for llama3:8b.
Repeat Last N | Prevents repetition. 0 = off, 64 = standard. | Helps avoid loops in responses.
Tail Free Sampling | Reduces the impact of less probable tokens. Default is 1.0. | Lower it for more conservative output.
Tokens To Predict | Max tokens to generate. 128 is a good default; use -1 for infinite generation. | Adjust if you need longer or shorter responses.
Top K | Limits token options to the top K most likely. Higher = more diversity. | 40 is a balanced default.
Min P | Filters low-probability tokens. 0.0 = no filtering. | Increase to make the output more precise.
Max Batch Size | Requests handled in parallel. 1 = safest. | Increase only if you know your setup can handle it.

Step 3 - Write the Prompts

System Prompt

The system prompt defines the role and goal of the model. This is what sets context for the model's behavior and should explain the task in simple but precise terms.

Important tip!

Write this like you're briefing a junior analyst. Be specific about the task, what you expect returned, and how to handle uncertain cases.

[Image: Example of a System prompt]

Example

You are a data extraction assistant. Your task is to analyze a document and determine if it pertains to any of the client matters listed below. If you identify a match, return all fields from the matching row exactly as shown. If you are not sure or cannot confidently match the document to a matter, respond with "N/A".

User Prompt

This is where you inject the actual text of the document. Use {{text}} in your prompt to insert the full document content that Shinydocs Pro extracted during the crawl. The User prompt is usually much simpler than the System prompt.

[Image: Example of a User prompt]

Example

Document to check:
{{text}}

You can also add follow-up instructions after the injected text if needed, such as "Return results in JSON format" or "Respond with field values only", which can help with older or smaller-parameter models.
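The {{text}} injection described above can be sketched as a simple template substitution. The helper name below is illustrative; Shinydocs Pro performs this injection internally during the crawl.

```python
# Sketch of the {{text}} placeholder substitution. Shinydocs Pro does this
# internally; the function name here is illustrative only.

USER_PROMPT = "Document to check:\n{{text}}\n\nRespond with field values only."

def render_user_prompt(template: str, document_text: str) -> str:
    """Replace the {{text}} placeholder with the extracted document content."""
    return template.replace("{{text}}", document_text)

rendered = render_user_prompt(USER_PROMPT, "Invoice #1042 for Acme Corp.")
print(rendered)
```

Note that any follow-up instruction placed after the placeholder ("Respond with field values only") survives the substitution and reaches the model verbatim.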

Step 4 - Set Query Conditions

Query conditions let you control when an enrichment should run. Only files that meet all conditions will be sent to the model.


Each condition checks metadata about the file. Some examples:

Field | Operator | Example Value / Purpose
fulltext | exists | Ensures the file has extracted text
extension | is one of | .docx, .pdf, .txt
schemaType | is one of | filesystem_sharepoint-online
ai-text-analysis-custom1 | does not exist | Avoids re-processing files where the field ai-text-analysis-custom1 already has a value

You can add multiple conditions to narrow down processing scope.

We recommend including at least the following query conditions, which ensure the tool only runs on new or modified data.

Field | Operator | Value | Purpose
fulltext | exists | — | Ensures the file has extracted text
outdated-binary | exists | — | Files new to the system automatically have this field, telling the tools "I'm new or have been modified since you last saw me"
path-valid | is not | false | During Shinydocs Pro's delta analysis, it checks whether a file still exists. This condition prevents the tool from running on data that no longer exists.

Step 5 - Ignored Responses

Define any values that should be treated as a non-result and skipped from enrichment output. For example:

  • N/A

  • [N/A]

  • NA

If the model returns any of these values, the enrichment will be considered empty and no fields will be written.

By default, the UI prefills N/A and [N/A] for you. Adjust these values based on how your model responds.
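The ignored-responses check can be pictured as a simple set membership test. The function name is illustrative; Shinydocs Pro applies this filtering internally before writing any fields.

```python
# Sketch of the ignored-responses check: values in this set are treated as
# "no result" and nothing is written to the index. Name is illustrative.

IGNORED_RESPONSES = {"N/A", "[N/A]", "NA"}

def is_empty_result(response: str) -> bool:
    """Return True when the model's answer should be discarded."""
    return response.strip() in IGNORED_RESPONSES

print(is_empty_result("  N/A "))      # → True (treated as empty, skipped)
print(is_empty_result("Acme Corp"))   # → False (kept as a real result)
```

If your model tends to answer with other sentinel values ("none", "unknown"), add those to the list in the UI as well.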

Step 6 - Define Result Fields

This section is where you map out what you want the AI to extract from the document and where that extracted information will be stored.

When the model processes a document, it doesn't just give you raw text back. Instead, it responds with a structured result: a set of named fields, like client, matter_number, or custom_description. Shinydocs Pro prefixes these fields with ai-text-analysis- so you know the field's data came from AI enrichment. The values the model returns for those fields are saved as metadata alongside the document in the Shinydocs index, which means they can be searched, filtered, visualized, or exported just like any other metadata.

You’re telling the system: “Here’s what I expect back from the AI, and this is what type of data it should be.”

How it works

You're setting up the schema for what the AI should return - field names, their data types, and whether they're required. This tells the system how to handle the response, and it tells the model what kind of format you're expecting.

The results of AI Enrichment depend on the chosen model. Different models respond in different ways, which is why choosing the right model for the task at hand is so important. Most models will respect the instructions you provide, but behavior varies model by model. Experimentation is key!

Each field definition includes:

  • Field name: This is the name that will be stored in the index. Shinydocs will automatically prefix it with ai-text-analysis-.

    • For example, client becomes ai-text-analysis-client.

  • Field type: The expected data format of the returned value. Options include:

    • string: A single piece of text (most common).

    • number: A numeric value (integer or decimal).

    • boolean: true or false.

    • array: A list of values (e.g. a list of names, IDs, keywords).

      • When to use array:

        The array type is used when you expect the model to return multiple items for the same field, for example:

        • A list of people mentioned in the document

        • Multiple project codes or legal clauses

        • A list of risks or key terms

  • Required?: If checked, this field must be returned for the enrichment to be considered valid. If it's missing or blank, the entire result will be discarded for that file.

Example field setup

Field Name | Type | Required?
client_name | string | ✅ Yes
related_tags | array | ❌ No

In this case:

  • client_name is required because it's the primary value you're trying to extract. If the model can't confidently identify a client name, you likely don't want to keep or act on the result at all. It's the anchor for the rest of your logic.

  • related_tags is optional because it's supplemental. The model might extract useful keywords, topics, or flags if they exist, but you're not relying on them to validate the result. If they're missing or the document doesn't contain any, that's fine; you still want to keep the enrichment.
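The required/optional logic above can be sketched as a small validation pass: check types, enforce required fields, and apply the ai-text-analysis- prefix before values would be written to the index. This is a hedged sketch of the behavior described in this step, not Shinydocs Pro's actual implementation.

```python
# Hedged sketch of result-field validation: type checks, required-field
# enforcement, and the ai-text-analysis- prefix. Illustrative only.

FIELD_DEFS = {
    "client_name": {"type": str, "required": True},
    "related_tags": {"type": list, "required": False},
}

def validate_and_prefix(result: dict):
    """Return prefixed fields, or None when a required field is missing/blank."""
    out = {}
    for name, spec in FIELD_DEFS.items():
        value = result.get(name)
        if value in (None, "", []):
            if spec["required"]:
                return None            # whole result discarded for this file
            continue                   # optional field may simply be absent
        if not isinstance(value, spec["type"]):
            return None
        out[f"ai-text-analysis-{name}"] = value
    return out

good = validate_and_prefix({"client_name": "Acme Corp", "related_tags": ["nda"]})
bad = validate_and_prefix({"related_tags": ["nda"]})   # missing required field
print(good, bad)
```

Here the first result is kept (with prefixed field names), while the second is discarded entirely because the required client_name is missing, mirroring the "entire result will be discarded" rule above.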

Step 7 - Save

Once your enrichment is ready, click Save Changes. Your enrichment configuration will now run as part of your next crawl, processing only documents that match the conditions and storing extracted data into the specified fields.

Need help writing your prompt or tuning results? Reach out to the team or use one of the sample prompts we provide in the documentation.
