Gemini File Search Builder lets developers build knowledge bases from any website, with automatic citations and direct integration with Google's Gemini platform. It turns a website into a queryable repository of information, suited to building AI chatbots, searchable knowledge bases, and RAG applications.

What You Get

With the Gemini File Search Builder, you get unlimited queries with automatic citations, making it easy to create permanent knowledge bases from any website. The best part? Storage is free, and queries use standard Gemini model pricing (subject to Google's rates).

Key Benefits

  • One-time scraping: Actor fee of $0.02 per run plus $0.0015 per page (Apify scraper and Gemini costs are billed separately)
  • Automatic citations: Every answer includes sources
  • Free storage: File Search stores persist indefinitely at no cost
  • Cross-platform compatibility: Query from Python, web, or mobile
  • Challenge compliance: Filters out 100% of banned scrapers

Key Features

The Gemini File Search Builder boasts a range of features that make it an ideal solution for swift app development:

  • Automatic RAG pipeline: Scrape → Clean → Upload to Gemini (all in one run)
  • Built-in citations: Every answer includes source documents
  • No per-query fees: Queries use standard Gemini token pricing (no File Search markup)
  • Challenge compliance: Filters out 100% of banned scrapers (Instagram, Amazon, Google Maps, etc.)
  • Zero setup: Just provide URL + Gemini API key
  • Cost-optimized: Smart scraper selection based on your budget

Use Cases

The Gemini File Search Builder is perfect for a range of use cases, including:

  • Documentation indexing: Convert technical docs into queryable knowledge bases
  • Research databases: Create searchable archives from academic sites
  • Content libraries: Index blog posts, articles, tutorials
  • Internal wikis: Transform company knowledge bases for AI access

How It Works

The Gemini File Search Builder takes a website URL through scraper selection, content extraction, and document conversion, then uploads the result to Gemini. A run has five stages:

  1. Smart scraper selection: Analyzes the target and selects the optimal Apify scraper
  2. Content cleaning: Removes ads and navigation, and extracts the main content
  3. Document creation: Formats the content as clean text with metadata
  4. Gemini upload: Creates a File Search Store (persistent, free storage)
  5. Query guide: Returns instructions for using your knowledge base
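The content cleaning and document creation stages can be pictured with a small sketch. This is not the Actor's actual implementation: the `SKIP_TAGS` set, the `MainTextExtractor` class, and the `to_document` helper are hypothetical names, and the "Source:" header is only one possible way to attach metadata. The sketch uses Python's standard `html.parser` to drop navigation and script content and keep the visible text.

```python
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (an assumption, not the Actor's list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than zero while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def to_document(url, html):
    """Return plain text for one page, prefixed with a small metadata header."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    body = "\n".join(parser.chunks)
    return f"Source: {url}\n\n{body}"


page = (
    "<html><nav>Home | About</nav>"
    "<main><h1>Install</h1><p>Run pip.</p></main>"
    "<script>track()</script></html>"
)
print(to_document("https://example.com/install", page))
```

Keeping the source URL in each document is what would let citations point back to the original page, although how the Actor stores that metadata is not specified here.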

How to Build a Gemini Knowledge Base (3 Steps)

  1. Get API Keys:
  • Visit https://aistudio.google.com/apikey and create a new API key (free tier available)
  • ⚠️ Important: Use the SAME key you'll use to query the knowledge base later
  2. Run the Actor:
  • {"target": "https://docs.python.org", "max_pages": 100, "scraper_budget": "optimal", "corpus_name": "python-docs", "gemini_api_key": "YOUR_GEMINI_KEY", "apify_token": "YOUR_APIFY_TOKEN"}
  3. Query Your Knowledge Base:
  • After the Actor completes, query your knowledge base using Google AI Studio (web interface), the Python SDK (for developers), or the Gemini mobile apps (iOS/Android)
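For the Python SDK route, a query might look like the sketch below. It assumes a recent release of the `google-genai` package that includes the File Search tool. The function name `query_knowledge_base` and the default model `gemini-2.5-flash` are illustrative choices, not something this Actor prescribes. The store name is the `file_search_store_name` value from the Actor's output.

```python
def query_knowledge_base(store_name, question, api_key, model="gemini-2.5-flash"):
    """Ask a question against a File Search store and return the answer text."""
    # Store names in the Actor output look like "fileSearchStores/<id>".
    if not store_name.startswith("fileSearchStores/"):
        raise ValueError("expected a store name like 'fileSearchStores/...'")

    # Imported here so the check above works even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)  # same key that was used for indexing
    response = client.models.generate_content(
        model=model,
        contents=question,
        config=types.GenerateContentConfig(
            tools=[
                types.Tool(
                    file_search=types.FileSearch(
                        file_search_store_names=[store_name]
                    )
                )
            ]
        ),
    )
    return response.text


# Example call (needs network access and a valid key):
# answer = query_knowledge_base(
#     "fileSearchStores/pythondocs-abc123",
#     "How do I open a file in Python?",
#     "YOUR_GEMINI_KEY",
# )
```

In the SDK versions this sketch assumes, the cited sources should be available on the response's grounding metadata (for example `response.candidates[0].grounding_metadata`) rather than in `response.text`.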

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| target | string | ✅ | - | Website URL to scrape and index |
| max_pages | integer | | 10 | Maximum pages to scrape (1-2000) |
| scraper_budget | string | | "optimal" | Cost strategy: minimal, optimal, premium |
| corpus_name | string | ✅ | - | Unique name for your knowledge base |
| gemini_api_key | string | ✅ | - | Google Gemini API key |
| apify_token | string | ✅ | - | Apify API token |
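Checking the input locally before starting a run can catch mistakes before any fees apply. The sketch below encodes only the defaults and constraints listed in the parameter table. The helper name `build_run_input` is hypothetical, and the Actor may validate its input differently.

```python
ALLOWED_BUDGETS = {"minimal", "optimal", "premium"}
REQUIRED = ("target", "corpus_name", "gemini_api_key", "apify_token")


def build_run_input(**params):
    """Apply the documented defaults and check the documented constraints."""
    run_input = {"max_pages": 10, "scraper_budget": "optimal", **params}

    missing = [name for name in REQUIRED if not run_input.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    pages = run_input["max_pages"]
    if not isinstance(pages, int) or not 1 <= pages <= 2000:
        raise ValueError("max_pages must be an integer from 1 to 2000")

    if run_input["scraper_budget"] not in ALLOWED_BUDGETS:
        raise ValueError("scraper_budget must be minimal, optimal, or premium")

    return run_input


run_input = build_run_input(
    target="https://docs.python.org",
    corpus_name="python-docs",
    gemini_api_key="YOUR_GEMINI_KEY",
    apify_token="YOUR_APIFY_TOKEN",
    max_pages=100,
)
print(run_input["max_pages"], run_input["scraper_budget"])
```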

Output

{"file_search_store_name": "fileSearchStores/pythondocs-abc123","files_indexed": 150,"total_size_mb": 2.5,"estimated_tokens": 125000,"indexing_cost_usd": 0.0188,"storage_type": "File Search Store","storage_persistence": "Indefinite (free)","query_cost_estimate": "$0.001 per query","query_guide_url": "https://docs.google.com/..."}

How Much Does It Cost?

The total cost includes three separate components billed by different services:

  1. Actor Fees: This Actor uses pay-per-page pricing:
  • Actor start: $0.02 per run (one-time)
  • Page processed: $0.0015 per page (base price)
  2. Apify Scraper Costs: The Actor uses Apify scrapers to extract content. You pay Apify separately for:
  • Scraper compute time (varies by scraper and site complexity)
  • Typical cost: $0.001-0.01 per page (depends on the scraper_budget setting)
  3. Gemini API Costs: Google charges for File Search usage as follows:
  • One-time indexing costs:
    + Embeddings: $0.15 per 1M tokens
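The three components can be combined into a rough estimate. The sketch below uses only the prices listed above and assumes the embedding rate is per 1M tokens, which is consistent with the sample output (125,000 tokens indexed for about $0.0188). The function name `estimate_cost` is hypothetical, query-time Gemini charges are not included, and actual bills depend on Apify's and Google's current rates.

```python
ACTOR_START_USD = 0.02                       # per run
ACTOR_PER_PAGE_USD = 0.0015                  # per page processed
SCRAPER_PER_PAGE_USD = (0.001, 0.01)         # typical low and high, per page
EMBEDDING_USD_PER_TOKEN = 0.15 / 1_000_000   # assumes $0.15 per 1M tokens


def estimate_cost(pages, tokens):
    """Return a rough one-time cost breakdown in USD for a single run."""
    actor = ACTOR_START_USD + pages * ACTOR_PER_PAGE_USD
    embeddings = tokens * EMBEDDING_USD_PER_TOKEN
    scraper_low = pages * SCRAPER_PER_PAGE_USD[0]
    scraper_high = pages * SCRAPER_PER_PAGE_USD[1]
    return {
        "actor": actor,
        "embeddings": embeddings,
        "total_low": actor + scraper_low + embeddings,
        "total_high": actor + scraper_high + embeddings,
    }


estimate = estimate_cost(pages=100, tokens=125_000)
for name, usd in estimate.items():
    print(f"{name}: ${usd:.4f}")
```

For a 100-page run the scraper component dominates the range, so the scraper_budget setting is the main lever on total cost.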

Built on Google's Gemini platform, the Gemini File Search Builder turns your website content into a queryable knowledge base with cited answers.