Supported Content Types
itellicoAI supports four types of knowledge items, each designed for different content sources and use cases. Understanding how each type works will help you choose the right format for your information. For guidance on organizing these items, see knowledge base architecture.
Text Items
Enter content directly using the built-in editor
File Uploads
Upload PDF, Word, Excel, Text, Markdown, CSV, JSON, YAML, and XML files up to 10MB
URL Scraping
Pull content from one web page
Website Crawl
Discover and import multiple pages from one public site
Text Items
What Are Text Items?
Text items are content you enter directly into the itellicoAI knowledge base editor. They are the most straightforward and reliable content type: immediately available, with no processing delay.
How to Add a Text Item
Write content
Enter your content in the editor. Use formatting for clarity:
- Headings for sections
- Bullet points for lists
- Numbers for steps
- Bold for emphasis
Best Practices
Structure for retrieval
Write content in clear, self-contained sections. Each section should answer a specific question so RAG (Retrieval-Augmented Generation) retrieval returns focused results.
Use descriptive titles
Improve organization and retrieval accuracy with clear names like “Return Policy - Digital Products” instead of “Policy 4.”
When to Use Text Items
Writing FAQs
Create question-and-answer pairs directly in the system.
Creating policy summaries
Write clear, concise policy statements.
Documenting procedures
Step-by-step instructions for processes.
Quick reference information
Brief, frequently referenced information.
Limitations
- No file attachment support: content must be typed or pasted
- Large volumes of content are better managed as file uploads
File Upload Items
What Are File Upload Items?
File upload items allow you to upload existing documents in various formats. The system extracts the text content and makes it available to your agents.
How to Add a File Item
Processing Details
The system uses advanced document parsing to extract text from uploaded files:
- Text extraction: text-based PDFs and Word documents have their content extracted directly
- OCR (Optical Character Recognition) processing: scanned documents and images within PDFs are read with OCR, a technology that recognizes text in scanned images
- Chunking: extracted content is split into chunks for vector indexing (preparing content for semantic search) so it can be retrieved
- Formats: PDF, Word (.doc, .docx), Excel (.xlsx), Text (.txt, .log), Markdown (.md), and data formats: CSV/TSV, JSON, YAML (.yaml, .yml), and XML
- Size limit: 10MB maximum
- Content: Text-based documents and scanned images (advanced parsing handles most scans)
- Protection: Files must not be password-protected
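The format and size requirements above can be checked locally before uploading. The following is a minimal sketch, not part of the itellicoAI product: the function name is hypothetical, 10MB is assumed to mean 10 × 1024 × 1024 bytes, and the `.csv`, `.tsv`, `.json`, and `.xml` extensions are assumed from the format names because the list above does not spell them out. It does not detect password protection.

```python
from pathlib import Path

# Assumption: "10MB" is interpreted as 10 MiB.
MAX_BYTES = 10 * 1024 * 1024

# Extensions listed in the documentation, plus assumed ones for
# CSV/TSV, JSON, and XML (the documentation names only the formats).
ALLOWED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xlsx", ".txt", ".log", ".md",
    ".yaml", ".yml",
    ".csv", ".tsv", ".json", ".xml",  # assumed extensions
}


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems: list[str] = []
    extension = Path(name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

For a file on disk, the check could be called as `check_upload(path.name, path.stat().st_size)` with `path` being a `pathlib.Path`.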
Best Practices
Optimize before upload
- Compress large files
- Remove unnecessary images
- Use text-based documents when possible
- Keep under 5MB for faster processing
Test extraction
- Review extracted content after processing
- Check for formatting issues
- Verify critical information is accurate
- Re-upload if extraction is poor
Limitations
- Maximum file size of 10MB
- Password-protected files cannot be processed
- Very poor quality scans may produce incomplete or inaccurate text
- Complex layouts (multi-column, heavy tables) may not extract perfectly; review extracted content and consider converting to text items if needed
Troubleshooting
Processing failed
Causes:
- File exceeds 10MB
- File is password-protected
- File is corrupted
- Very poor quality scanned images
Solutions:
- Compress the file or split it into smaller files
- Remove password protection
- Re-export the file from its source
- For very poor quality scans, copy the content into a text item instead
Content extracted incorrectly
Causes:
- Complex layouts (multi-column, tables)
- Very poor quality scanned images
- Special fonts or encoding
- Form fields and interactive elements
Solutions:
- Check the extracted content in edit mode
- Re-create the content as a text item with proper formatting
- Simplify the document layout before uploading
- Export the document as plain text
Processing takes too long
What to do:
- Wait 5-10 minutes before assuming failure
- Check file size and page count
- For large files, consider splitting into multiple files
- Convert the content to text and add it as text items instead
URL Items
What Are URL Items?
URL items scrape content from a single web page and store it in your knowledge base. This is useful for referencing a specific online documentation page, help article, or blog post.
How to Add a URL Item
Processing Details
When you add a URL item, the system:
- Fetches the page at the provided URL
- Extracts the main text content, stripping navigation, ads, and boilerplate
- Stores the extracted text as the knowledge item content
- Indexes the content for vector search, just like text and file items
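The extraction step above can be pictured with a small sketch that keeps visible text and skips navigation, scripts, and similar boilerplate. This is only an illustration using Python's standard `html.parser`; the tag list and the names `MainTextExtractor` and `extract_main_text` are assumptions, and itellicoAI's actual extractor is not documented here. Fetching the page is left out so the example runs offline.

```python
from html.parser import HTMLParser

# Assumption: these tags hold boilerplate rather than main content.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any of SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # how many skipped tags we are currently inside
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_main_text(html: str) -> str:
    """Return the non-boilerplate text of an HTML document, one fragment per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

A parser like this only sees the HTML the server returns, so text that a page adds later with JavaScript never appears in the output.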
Best Practices
Test accessibility
- Open URL in incognito window first
- Verify no login is required
- Check content is visible without JavaScript
- Ensure page loads quickly
Review scraped content
- Check content after scraping completes
- Verify correct content was captured
- Look for formatting issues
- Confirm no extra content (ads, sidebars) was included
Limitations
- Authentication: pages requiring login cannot be scraped
- JavaScript-heavy pages: single-page applications and dynamically loaded content may not be captured
- Paywalled content: content behind paywalls is inaccessible
- No automatic refresh: content is scraped once; you must re-create the item to update it
- robots.txt (a file websites use to control automated access): scraping fails on sites that block it
URL scraping works best with simple, text-based web pages. If scraping fails or produces incomplete content, copy the content manually into a text item instead.
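If you suspect a robots.txt rule is the reason a scrape fails, you can check the rules yourself. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text you have already downloaded. The helper name `is_allowed` is hypothetical, and the user agent that itellicoAI's scraper identifies as is not documented here, so the wildcard `*` is an assumption.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Report whether robots.txt rules permit `user_agent` to fetch `url`.

    `robots_txt` is the text content of the site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everything under /private/ is blocked for all agents.
EXAMPLE_RULES = "User-agent: *\nDisallow: /private/\n"
blocked = is_allowed(EXAMPLE_RULES, "https://example.com/private/page")
open_page = is_allowed(EXAMPLE_RULES, "https://example.com/docs/")
```

A page that passes this check can still fail to scrape for the other reasons listed above, such as login requirements or JavaScript rendering.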
Troubleshooting
Scraping failed
Causes:
- Page requires login/authentication
- URL is incorrect or broken
- Content loads via JavaScript
- Website blocks scraping (robots.txt)
- Page doesn’t exist (404)
Solutions:
- Verify the URL is publicly accessible
- Test the URL in an incognito browser window
- Check that the URL is complete and correct
- Copy the content manually into a text item
- Use a PDF export of the page instead
Content incomplete or wrong
Causes:
- JavaScript-rendered content not captured
- Dynamic content loading
- Multiple tabs/sections on page
- Comments or sidebar scraped instead of main content
Solutions:
- Inspect the scraped content in edit mode
- Use a direct URL to the specific content section
- Copy the desired content into a text item
- Export the page as a PDF and upload it instead
Content becomes outdated
Solution:
Single-page URL content is scraped once at creation time. To update:
- Delete and re-create the URL item
- Or copy current content into a text item for manual updates
For content that changes frequently, consider instead:
- Manual text items you update regularly
- PDF exports you refresh periodically
Website Crawl Items
What Are Website Crawl Items?
Website crawl items discover multiple public pages from one website and import the pages you select. Use this when one knowledge source spans several URLs, such as a help center or documentation site.
How to Add a Website Crawl
Crawl Settings
Open Advanced options before discovery to control the crawl scope and refresh behavior.
| UI setting | Default | What it controls | When to change it |
|---|---|---|---|
| Max pages to discover | 100 | The maximum number of URLs to discover from the starting site. Available values are 25, 50, 100, 250, and 500. This limits discovery only; you still choose which discovered pages to import. | Lower it for small sites or quick tests. Raise it for larger help centers or documentation sites. |
| Auto-refresh interval | Never | How often the system resyncs already imported pages. Options are Never, Every 24 hours, Every 7 days, and Every 30 days. | Use Every 7 days or Every 30 days for public docs, pricing, policy, or help-center pages that change over time. |
| Include subdomains | Off | Whether discovery may include pages under subdomains of the starting host. If you start from docs.example.com, this allows hosts such as api.docs.example.com; it does not include sibling domains such as help.example.com. | Turn it on only when the site you want to import is split across subdomains under the same host. |
| Discover new pages on refresh | Off, hidden while auto-refresh is Never | When refresh is enabled, the system can re-run discovery and stage newly found pages for review. Newly discovered pages are not included automatically. | Turn it on when the site regularly adds new pages and you want to review them from View pages. |
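The Include subdomains rule in the table can be expressed as a short host comparison. This is a sketch of the rule as described, not the crawler's actual code; the function name and parameters are hypothetical.

```python
from urllib.parse import urlsplit


def in_crawl_scope(start_url: str, candidate_url: str, include_subdomains: bool = False) -> bool:
    """Decide whether `candidate_url` is in scope for a crawl started at `start_url`.

    The starting host itself is always in scope. With `include_subdomains`,
    hosts nested under the starting host are also in scope; sibling hosts
    under the same parent domain are not.
    """
    start_host = (urlsplit(start_url).hostname or "").lower()
    host = (urlsplit(candidate_url).hostname or "").lower()
    if not start_host or not host:
        return False
    if host == start_host:
        return True
    # The leading dot prevents "notdocs.example.com" from matching "docs.example.com".
    return include_subdomains and host.endswith("." + start_host)
```

Starting from `https://docs.example.com` with the option on, `api.docs.example.com` is in scope while `help.example.com` is not, which matches the example in the table.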
Limitations
- Public pages work best; authenticated pages are not supported
- JavaScript-heavy pages may not extract cleanly
- Crawls count toward the knowledge base URL/website item limit
- Imported pages still need successful content processing and vector indexing before RAG can retrieve them
Processing Status Flow
Knowledge items go through two separate processing pipelines:
- Content Processing: extracting text from files, URLs, and website pages
- Vector Indexing: preparing content for RAG (semantic search)
Processing Status
Orange means the item is still processing. Green means it is ready to use. If an item shows an error, click Reindex to retry.
Choosing the Right Content Type
| Your Situation | Best Content Type |
|---|---|
| Writing FAQs from scratch | TEXT |
| Have existing Word/PDF docs under 10MB | FILE |
| Have documents over 10MB | Split into smaller files or extract to TEXT |
| One public web page | URL (with TEXT as backup) |
| Multi-page public documentation or help center | Website Crawl |
| Private/authenticated content | Copy to TEXT |
| Need immediate availability | TEXT (no processing delay) |
| Complex formatting matters | FILE |
Next Steps
Context vs RAG
Learn how agents access your knowledge content
Create Knowledge Bases
Follow the step-by-step creation guide
Architecture Overview
Understand knowledge base structure
Template Syntax
Reference knowledge in your agent prompt