# LocalData MCP Data Sources Reference Complete reference for connecting to all supported data sources with connection examples and best practices. ## Table of Contents 1. SQL Databases (6) 2. NoSQL Databases (6) 3. File Formats (8) 4. Spreadsheets (3) 5. Analytical Formats (4) 6. Graph Formats (4) 7. RDF & Semantic Web (3) 8. SPARQL Endpoints (1) 9. Authentication & Security 10. Connection Best Practices --- ## SQL Databases (6) ### SQLite File-based SQL database, ideal for local development and embedded use. **Connection String Format:** ``` sqlite:///path/to/database.db ``` **MCP Tool Call:** ```python connect_database("local_db", "sqlite", "sqlite:///data/mydata.db") ``` **Examples:** ```python # In-memory database (for testing) connect_database("test", "sqlite", "sqlite:///:memory:") # Local file connect_database("myapp", "sqlite", "sqlite:////var/data/app.db") # Relative path connect_database("project", "sqlite", "sqlite:///./data.db") ``` **Special Notes:** - Supports full SQL: SELECT, INSERT, UPDATE, DELETE - File locking may occur with concurrent access - Best for < 1GB databases - No authentication required --- ### PostgreSQL Enterprise-grade relational database with advanced features. **Connection String Format:** ``` postgresql://user:password@host:port/database postgresql+psycopg2://user:password@host:port/database ``` **MCP Tool Call:** ```python connect_database("prod_db", "postgresql", "postgresql://user:password@localhost:5432/production") ``` **Examples:** ```python # Standard connection connect_database("analytics", "postgresql", "postgresql://analyst:secret@db.company.com:5432/analytics") # With SSL (recommended for remote) connect_database("secure", "postgresql", "postgresql://user:pass@remote.db:5432/db?sslmode=require") # Connection pooling connect_database("pool", "postgresql", "postgresql://user:pass@localhost/db?poolclass=StaticPool&pool_size=10") ``` **Authentication:** ```python # Standard username/password (in connection string) connect_database("mydb", "postgresql", "postgresql://user:password@host/db") # Environment variables # Set: PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE ``` **Special Notes:** - Full SQL support with advanced features (CTEs, window functions, JSON operators) - Excellent performance for analytical workloads - Supports array types, JSON/JSONB, full-text search - SSL/TLS encryption available --- ### MySQL Widely-used open-source relational database. **Connection String Format:** ``` mysql+pymysql://user:password@host:port/database mysql+mysqldb://user:password@host:port/database ``` **MCP Tool Call:** ```python connect_database("app_db", "mysql", "mysql+pymysql://user:password@localhost:3306/appdb") ``` **Examples:** ```python # Basic connection connect_database("sales", "mysql", "mysql+pymysql://user:pass@localhost:3306/sales_db") # Remote with SSL connect_database("remote", "mysql", "mysql+pymysql://user:pass@remote.host:3306/db?ssl_ca=/path/to/ca.pem") # MariaDB (MySQL-compatible) connect_database("mariadb_server", "mysql", "mysql+pymysql://user:pass@mariadb.server:3306/mydb") ``` **Special Notes:** - Compatible with MySQL and MariaDB - Good for web applications - Text search and JSON support available - Connection timeout defaults to 30 seconds --- ### DuckDB Fast analytical SQL engine, great for interactive analysis. **Connection String Format:** ``` duckdb:///path/to/database.db duckdb:///:memory: ``` **MCP Tool Call:** ```python connect_database("analytics", "duckdb", "duckdb:///data/analytics.db") ``` **Examples:** ```python # In-memory for fast analysis connect_database("temp", "duckdb", "duckdb:///:memory:") # Persistent file connect_database("warehouse", "duckdb", "duckdb:////data/warehouse.db") # Read Parquet directly connect_database("parquet_src", "duckdb", "duckdb:///data/") ``` **Special Notes:** - Optimized for OLAP (analytical) queries - Excellent performance on compressed data - Direct Parquet, CSV, JSON file support - Can query files without loading into database - No authentication required --- ### Oracle Database Enterprise database system with advanced features. **Connection String Format:** ``` oracle+cx_oracle://user:password@host:port/service_name oracle+oracledb://user:password@host:port/service_name ``` **MCP Tool Call:** ```python connect_database("enterprise", "oracle", "oracle+cx_oracle://user:password@oracle.server:1521/ORCL") ``` **Examples:** ```python # Basic connection connect_database("prod", "oracle", "oracle+cx_oracle://admin:secret@orahost:1521/PROD") # TNS alias (from tnsnames.ora) connect_database("tns_db", "oracle", "oracle+cx_oracle://user:password@") # Wallet authentication (PKI) connect_database("secure", "oracle", "oracle+cx_oracle://user@oracle.server:1521/ORCL", auth='{"method": "wallet", "wallet_path": "/opt/oracle/wallet"}') ``` **Authentication:** ```python # Username/password (in connection string) connect_database("db", "oracle", "oracle+cx_oracle://user:pass@host:1521/SERVICE") # Kerberos (requires LDAP configured) connect_database("kerberos_db", "oracle", "oracle+cx_oracle://host:1521/SERVICE", auth='{"method": "kerberos"}') # Wallet (PKI certificates) connect_database("wallet_db", "oracle", "oracle+cx_oracle://user@host:1521/SERVICE", auth='{"method": "wallet", "wallet_path": "/opt/oracle/wallet", "wallet_password": "secret"}') ``` **Special Notes:** - Enterprise features: partitioning, compression, advanced security - Requires Oracle client libraries installed - Large result sets benefit from memory-aware streaming - Wallet authentication ideal for production --- ### MS SQL Server Microsoft's enterprise database platform. **Connection String Format:** ``` mssql+pyodbc://user:password@host:port/database?driver=ODBC+Driver+17+for+SQL+Server mssql+pymssql://user:password@host:port/database ``` **MCP Tool Call:** ```python connect_database("mssql_db", "mssql", "mssql+pyodbc://user:password@sqlserver:1433/mydb?driver=ODBC+Driver+17+for+SQL+Server") ``` **Examples:** ```python # With ODBC connect_database("reporting", "mssql", "mssql+pyodbc://reporter:pass@mssql.company.com:1433/Reports?driver=ODBC+Driver+17+for+SQL+Server") # Azure SQL Database connect_database("azure", "mssql", "mssql+pyodbc://user@server:pass@server.database.windows.net:1433/db?driver=ODBC+Driver+17+for+SQL+Server") # Named instance connect_database("instance", "mssql", "mssql+pyodbc://user:pass@host\\INSTANCE:1433/db?driver=ODBC+Driver+17+for+SQL+Server") ``` **Special Notes:** - Windows and Azure SQL variants supported - Requires ODBC driver installation - T-SQL extensions available - Active Directory authentication supported on Windows --- ## NoSQL Databases (6) ### MongoDB Document-oriented NoSQL database. **Connection String Format:** ``` mongodb://:@:/ mongodb+srv://:@.mongodb.net/ ``` **MCP Tool Call:** ```python connect_database("mongo_db", "mongodb", "mongodb://:@localhost:27017/mydb") ``` **Examples:** ```python # Local instance (no auth) connect_database("local_mongo", "mongodb", "mongodb://localhost:27017/test") # MongoDB Atlas (cloud) connect_database("atlas", "mongodb", "mongodb+srv://:@.mongodb.net/dbname") # With authentication connect_database("auth_db", "mongodb", "mongodb://:@mongo.example.com:27017/mydb?authSource=admin") ``` **Special Notes:** - Flexible schema with BSON documents - Collections accessible as tables through LocalData - Complex nested documents automatically normalized - Supports aggregation pipeline queries --- ### Redis In-memory data store for caching and sessions. **Connection String Format:** ``` redis://[:password@]host:port/db rediss://[:password@]host:port/db ``` **MCP Tool Call:** ```python connect_database("cache", "redis", "redis://localhost:6379/0") ``` **Examples:** ```python # Default instance connect_database("cache", "redis", "redis://localhost:6379") # With authentication connect_database("secure_cache", "redis", "redis://:mypassword@redis.company.com:6379/0") # SSL/TLS connect_database("ssl_redis", "redis", "rediss://user:password@secure.redis.io:6380/0") ``` **Special Notes:** - Key-value store, not relational - Excellent for caching and real-time analytics - TTL support for expiring keys - Pub/sub messaging available --- ### Elasticsearch Search and analytics engine for full-text search and logging. **Connection String Format:** ``` elasticsearch://host:port elasticsearch+https://user:password@host:port ``` **MCP Tool Call:** ```python connect_database("search_index", "elasticsearch", "elasticsearch://localhost:9200") ``` **Examples:** ```python # Local cluster connect_database("local_es", "elasticsearch", "elasticsearch://localhost:9200") # With authentication connect_database("cloud_es", "elasticsearch", "elasticsearch+https://user:password@elastic.company.com:9200") # Multiple nodes (load balanced) connect_database("cluster", "elasticsearch", "elasticsearch://node1:9200,node2:9200,node3:9200") ``` **Special Notes:** - Documents organized in indices (like tables) - Full-text search with ranking and scoring - Aggregations for analytics (facets, histograms, etc.) - Real-time indexing available --- ### InfluxDB Time-series database optimized for metrics and events. **Connection String Format:** ``` influxdb://host:port/database influxdb://user:password@host:port/database ``` **MCP Tool Call:** ```python connect_database("metrics", "influxdb", "influxdb://localhost:8086/mymetrics") ``` **Examples:** ```python # InfluxDB 1.x connect_database("tsdb", "influxdb", "influxdb://user:password@influx.server:8086/metrics") # Local development connect_database("local_ts", "influxdb", "influxdb://localhost:8086/test") # InfluxDB Cloud connect_database("cloud", "influxdb", "influxdb://token@us-west-2-1.aws.cloud2.influxdata.com:443/org") ``` **Special Notes:** - Schema-less time series storage - Tags (indexed) and fields (not indexed) - Excellent for monitoring and sensor data - Retention policies for data aging --- ### Neo4j Graph database for connected data. **Connection String Format:** ``` neo4j://user:password@host:port neo4j+s://user:password@host:port neo4j+ssc://user:password@host:port ``` **MCP Tool Call:** ```python connect_database("graph_db", "neo4j", "neo4j://user:password@localhost:7687") ``` **Examples:** ```python # Local instance connect_database("local_graph", "neo4j", "neo4j://user:password@localhost:7687") # Aura (cloud) connect_database("aura", "neo4j", "neo4j+s://user:password@abcd1234.databases.neo4j.io:7687") # Cluster connect_database("cluster", "neo4j", "neo4j://user:password@neo4j-lb.company.com:7687") ``` **Special Notes:** - Property graph with nodes, relationships, properties - Cypher query language for graph traversal - Excellent for social networks, recommendations, hierarchies - ACID transactions supported --- ### CouchDB Document store with HTTP API and replication. **Connection String Format:** ``` couchdb://user:password@host:port ``` **MCP Tool Call:** ```python connect_database("doc_store", "couchdb", "couchdb://user:password@localhost:5984") ``` **Examples:** ```python # Local Couch connect_database("local", "couchdb", "couchdb://admin:password@localhost:5984") # Cloudant (IBM's hosted Couch) connect_database("cloudant", "couchdb", "couchdb://user:password@account.cloudant.com") ``` **Special Notes:** - JSON documents with flexible schema - Built-in replication and sync - View-based indexing with MapReduce - REST API for HTTP access --- ## File Formats (8) ### CSV (Comma-Separated Values) Tabular text format, widely supported and human-readable. **Connection String:** Path to CSV file **MCP Tool Call:** ```python connect_database("data", "csv", "/path/to/file.csv") ``` **Examples:** ```python # Local file connect_database("sales", "csv", "/data/sales_2024.csv") # With options (in URL) connect_database("quoted", "csv", "/data/file.csv?delimiter=;") # Remote HTTP connect_database("remote", "csv", "https://example.com/data.csv") ``` **Special Notes:** - Auto-detects delimiter (comma, semicolon, tab) - Header row inferred from first line - No type inference; all columns start as text - Large files automatically streamed --- ### TSV (Tab-Separated Values) Tab-delimited variant of CSV, common in bioinformatics. **Connection String:** Path to TSV file **MCP Tool Call:** ```python connect_database("genes", "tsv", "/path/to/data.tsv") ``` **Examples:** ```python # Bioinformatics data connect_database("variants", "tsv", "/genomics/variants.tsv") # Standard TSV connect_database("tab_data", "tsv", "/data/file.txt") ``` **Special Notes:** - Tab delimiter enforced - Same behavior as CSV otherwise - Common in Unix text processing --- ### JSON (JavaScript Object Notation) Hierarchical text format, ideal for nested data. **Connection String:** Path to JSON file **MCP Tool Call:** ```python connect_database("config", "json", "/path/to/file.json") ``` **Examples:** ```python # Object with nested structure connect_database("nested", "json", "/data/config.json") # Array of objects connect_database("records", "json", "/data/items.json") # Single large object (normalized to rows) connect_database("hierarchy", "json", "/data/taxonomy.json") ``` **Special Notes:** - Supports hierarchical navigation with tree tools - Deep nesting automatically normalized to rows - Good for configuration files and API responses - Unicode and special characters preserved --- ### JSONL (JSON Lines) One JSON object per line, ideal for streaming and large files. **Connection String:** Path to JSONL file **MCP Tool Call:** ```python connect_database("logs", "jsonl", "/path/to/file.jsonl") ``` **Examples:** ```python # Log file with JSON events connect_database("events", "jsonl", "/logs/application.jsonl") # Newline-delimited objects connect_database("stream", "jsonl", "/data/stream.ndjson") ``` **Special Notes:** - One complete JSON object per line - Excellent for append-only logs - Efficient streaming and chunking - Each line parsed independently --- ### YAML (YAML Ain't Markup Language) Human-friendly structured data format. **Connection String:** Path to YAML file **MCP Tool Call:** ```python connect_database("manifest", "yaml", "/path/to/file.yaml") ``` **Examples:** ```python # Kubernetes manifest connect_database("k8s", "yaml", "/config/deployment.yaml") # Configuration file connect_database("app_config", "yaml", "/etc/config.yml") # Multi-document YAML connect_database("multi", "yaml", "/data/documents.yaml") ``` **Special Notes:** - Supports hierarchical tree navigation - Human-readable with indentation-based structure - Type hints with tags (!!int, !!str, etc.) - Comments preserved as metadata --- ### TOML (Tom's Obvious, Minimal Language) Config file format designed for clarity. **Connection String:** Path to TOML file **MCP Tool Call:** ```python connect_database("cfg", "toml", "/path/to/config.toml") ``` **Examples:** ```python # Application config connect_database("settings", "toml", "/etc/app/settings.toml") # Cargo.toml (Rust projects) connect_database("rust_meta", "toml", "/project/Cargo.toml") ``` **Special Notes:** - Strict schema with clear semantics - Tables and array-of-tables support - Good for application configuration - Comments supported --- ### XML (eXtensible Markup Language) Hierarchical markup language with schema support. **Connection String:** Path to XML file **MCP Tool Call:** ```python connect_database("doc", "xml", "/path/to/file.xml") ``` **Examples:** ```python # SVG document connect_database("diagram", "xml", "/images/drawing.svg") # RSS feed connect_database("feed", "xml", "/feeds/news.xml") # SOAP response connect_database("soap", "xml", "/responses/service.xml") ``` **Special Notes:** - Hierarchical tree navigation available - Attributes and text content both accessible - Schema (XSD) validation available - Namespaces preserved --- ### INI (Initialization File Format) Simple key-value format for configuration. **Connection String:** Path to INI file **MCP Tool Call:** ```python connect_database("ini_cfg", "ini", "/path/to/file.ini") ``` **Examples:** ```python # Windows configuration connect_database("windows_cfg", "ini", "C:\\Program Files\\app\\config.ini") # Unix config connect_database("unix_cfg", "ini", "/etc/application/settings.ini") ``` **Special Notes:** - Sections group related key-value pairs - Comments with semicolon or hash - Simple tree structure - Flat sections (no nesting) --- ## Spreadsheets (3) ### Excel (XLSX, XLS) Spreadsheet format from Microsoft Office. **Connection String:** Path to Excel file **MCP Tool Call:** ```python connect_database("sheet", "excel", "/path/to/file.xlsx") ``` **MCP Tool Call with Sheet Selection:** ```python connect_database("sheet", "excel", "/path/to/file.xlsx", sheet_name="Sales") ``` **Examples:** ```python # All sheets connect_database("workbook", "excel", "/data/report.xlsx") # Specific sheet connect_database("sales_data", "excel", "/data/financial.xlsx", sheet_name="Monthly Sales") # Multiple connections to different sheets connect_database("q1_sales", "excel", "/data/sales.xlsx", sheet_name="Q1") connect_database("q2_sales", "excel", "/data/sales.xlsx", sheet_name="Q2") ``` **Special Notes:** - Auto-detects XLSX (modern) vs XLS (legacy) - Merged cells automatically expanded - Formulas converted to values - Empty rows and columns handled gracefully --- ### ODS (OpenDocument Spreadsheet) Open standard spreadsheet from LibreOffice/OpenOffice. **Connection String:** Path to ODS file **MCP Tool Call:** ```python connect_database("ods_sheet", "ods", "/path/to/file.ods", sheet_name="Data") ``` **Examples:** ```python # Default sheet connect_database("calc", "ods", "/data/spreadsheet.ods") # Named sheet connect_database("results", "ods", "/data/results.ods", sheet_name="Analysis") ``` **Special Notes:** - Open standard format, fully supported - Same behavior as Excel - Good for cross-platform compatibility --- ### Numbers (NUMBERS) Spreadsheet format from Apple iWork. **Connection String:** Path to Numbers file **MCP Tool Call:** ```python connect_database("numbers_sheet", "numbers", "/path/to/file.numbers", sheet_name="Sheet 1") ``` **Examples:** ```python # Default sheet connect_database("apple", "numbers", "/Users/username/Documents/data.numbers") # Named sheet connect_database("monthly", "numbers", "/data/reports.numbers", sheet_name="January") ``` **Special Notes:** - macOS/iOS native format - Supports multiple sheets and tables - Styled cells converted to plain values - Large files automatically streamed --- ## Analytical Formats (4) ### Parquet Columnar storage format optimized for analytical queries. **Connection String:** Path to Parquet file or directory **MCP Tool Call:** ```python connect_database("analytics", "parquet", "/path/to/file.parquet") ``` **Examples:** ```python # Single file connect_database("sales_parquet", "parquet", "/data/sales.parquet") # Directory with multiple Parquet files connect_database("dataset", "parquet", "/warehouse/data/") # Partitioned directory connect_database("partitioned", "parquet", "/data/year=2024/month=01/") ``` **Special Notes:** - Highly compressed columnar format - Excellent for large analytical datasets - Schema preserved (data types, null info) - Partition pruning for efficient querying - Industry standard for data warehouses --- ### Feather Fast serialization format for data frames. **Connection String:** Path to Feather file **MCP Tool Call:** ```python connect_database("feather_data", "feather", "/path/to/file.feather") ``` **Examples:** ```python # Arrow/Feather V2 format connect_database("fast", "feather", "/data/cache.feather") ``` **Special Notes:** - Fast read/write with minimal parsing - Language-agnostic (Python, R, etc.) - Good for intermediate storage in pipelines - Preserves data types exactly --- ### Arrow Apache Arrow columnar in-memory format. **Connection String:** Path to Arrow file **MCP Tool Call:** ```python connect_database("arrow_data", "arrow", "/path/to/file.arrow") ``` **Examples:** ```python # Arrow IPC format connect_database("ipc", "arrow", "/data/buffer.arrow") ``` **Special Notes:** - Industry standard columnar format - Zero-copy data access - GPU acceleration possible - Excellent for high-performance workflows --- ### HDF5 (Hierarchical Data Format) Scientific data format for complex hierarchical data. **Connection String:** Path to HDF5 file **MCP Tool Call:** ```python connect_database("scientific", "hdf5", "/path/to/file.h5") ``` **MCP Tool Call with Dataset Selection:** ```python connect_database("dataset", "hdf5", "/path/to/file.h5", sheet_name="dataset_name") ``` **Examples:** ```python # Scientific data connect_database("genomics", "hdf5", "/data/sequences.h5", sheet_name="DNA") # Multi-dataset HDF5 connect_database("imaging", "hdf5", "/data/microscopy.h5", sheet_name="channel_1") ``` **Special Notes:** - Hierarchical groups and datasets - Supports arrays of arbitrary dimensions - Metadata and attributes preserved - Compression built-in - Common in scientific computing --- ## Graph Formats (4) ### DOT Graph description language from Graphviz. **Connection String:** Path to DOT file **MCP Tool Call:** ```python connect_database("graph", "dot", "/path/to/file.dot") ``` **Examples:** ```python # Directed graph connect_database("digraph", "dot", "/graphs/network.dot") # Undirected graph connect_database("network", "dot", "/graphs/undirected.dot") ``` **Special Notes:** - Human-readable text format - Supports hierarchical clusters - Node and edge attributes - Use `export_graph` to convert to other formats --- ### GML (Graph Modelling Language) Graph format with attributes and metadata. **Connection String:** Path to GML file **MCP Tool Call:** ```python connect_database("gml_graph", "gml", "/path/to/file.gml") ``` **Examples:** ```python # Complex network connect_database("social_network", "gml", "/networks/communities.gml") ``` **Special Notes:** - Structured attribute support - Weighted and directed graphs - Good for social networks - Preserves node/edge metadata --- ### GraphML XML-based graph format with rich metadata. **Connection String:** Path to GraphML file **MCP Tool Call:** ```python connect_database("graphml_data", "graphml", "/path/to/file.graphml") ``` **Examples:** ```python # Network with attributes connect_database("org_chart", "graphml", "/data/organization.graphml") # Yfiles format connect_database("yfiles_export", "graphml", "/graphs/diagram.graphml") ``` **Special Notes:** - XML-based, extensible - Widely supported by graph tools - Rich attribute system - Can include visualization hints --- ### Mermaid ASCII-based diagram format. **Connection String:** Path to Mermaid file **MCP Tool Call:** ```python connect_database("diagram", "mermaid", "/path/to/file.mmd") ``` **Examples:** ```python # Flowchart connect_database("flow", "mermaid", "/diagrams/workflow.mmd") # Sequence diagram connect_database("sequence", "mermaid", "/diagrams/interaction.mmd") ``` **Special Notes:** - Human-readable ASCII diagrams - Supports flowcharts, sequences, gantt, state diagrams - Markdown-compatible - Good for documentation --- ## RDF & Semantic Web (3) ### Turtle (TTL) Terse RDF Triple Language, human-readable format. **Connection String:** Path to Turtle file **MCP Tool Call:** ```python connect_database("rdf_data", "turtle", "/path/to/file.ttl") ``` **Examples:** ```python # Ontology or knowledge graph connect_database("ontology", "turtle", "/rdf/dbpedia.ttl") # Linked data connect_database("linked_data", "turtle", "/data/schema.ttl") ``` **Special Notes:** - Subject-Predicate-Object triples - Namespace prefixes for IRIs - Literal values with language tags and types - Equivalent to RDF/XML but more readable --- ### N-Triples Line-based RDF format, one triple per line. **Connection String:** Path to N-Triples file **MCP Tool Call:** ```python connect_database("ntriples_data", "ntriples", "/path/to/file.nt") ``` **Examples:** ```python # Streaming RDF connect_database("stream", "ntriples", "/data/large.nt") # SPARQL query results connect_database("results", "ntriples", "/sparql/results.nt") ``` **Special Notes:** - Simpler subset of Turtle - One triple (S, P, O) per line - Easier to parse and stream - Good for large RDF datasets --- ### SPARQL Query language and endpoint for RDF data. **Connection String:** SPARQL endpoint URL **MCP Tool Call:** ```python connect_database("sparql_endpoint", "sparql", "http://dbpedia.org/sparql") ``` **Examples:** ```python # DBpedia connect_database("dbpedia", "sparql", "http://dbpedia.org/sparql") # Wikidata connect_database("wikidata", "sparql", "https://query.wikidata.org/sparql") # Local Fuseki server connect_database("local_sparql", "sparql", "http://localhost:3030/ds/sparql") ``` **Special Notes:** - Remote query interface to RDF data - SPARQL query language (similar to SQL) - Supports CONSTRUCT, SELECT, ASK, DESCRIBE queries - Service federation available - Public endpoints usually rate-limited --- ## Authentication & Security ### Basic Patterns **Username/Password (in connection string):** ```python # PostgreSQL connect_database("db", "postgresql", "postgresql://user:password@host/db") # MySQL connect_database("db", "mysql", "mysql+pymysql://user:password@host/db") # MongoDB connect_database("db", "mongodb", "mongodb://user:password@host/db") ``` **Environment Variables:** ```python # Set environment variables first # export PGUSER=user PGPASSWORD=secret PGHOST=localhost # Then connection string uses environment connect_database("db", "postgresql", "postgresql://localhost/db") ``` **OAuth 2.0 / OAuth Credentials:** ```python # Not directly supported in connection string # Use service account JSON key for cloud services ``` ### Advanced Authentication **Oracle Wallet (PKI):** ```python connect_database( "prod", "oracle", "oracle+cx_oracle://user@host:1521/SERVICE", auth='{"method": "wallet", "wallet_path": "/opt/oracle/wallet", "wallet_password": "secret"}' ) ``` **Kerberos (Enterprise AD):** ```python connect_database( "enterprise", "oracle", "oracle+cx_oracle://host:1521/SERVICE", auth='{"method": "kerberos"}' ) ``` **AWS IAM (RDS):** ```python # AWS RDS with IAM auth token connect_database( "rds", "postgresql", "postgresql://user@rds-instance.abc123.us-east-1.rds.amazonaws.com/db?sslmode=require" ) ``` **Azure AD (SQL Server):** ```python # Uses Windows authentication or service principal connect_database( "azure", "mssql", "mssql+pyodbc://SERVER.database.windows.net/DATABASE?driver=ODBC+Driver+17+for+SQL+Server&Authentication=ActiveDirectoryIntegrated" ) ``` --- ## Connection Best Practices ### 1. Connection Naming Use descriptive, consistent names: ```python # Good: describes purpose and environment connect_database("prod_analytics", "postgresql", ...) connect_database("staging_warehouse", "duckdb", ...) connect_database("cache_layer", "redis", ...) # Avoid: ambiguous or generic connect_database("db1", "postgresql", ...) connect_database("temp", "csv", ...) ``` ### 2. Credential Management Never hardcode credentials: ```python # Bad: credentials in code connect_database("mydb", "postgresql", "postgresql://user:password@host/db") # Good: use environment variables import os user = os.getenv("DB_USER") password = os.getenv("DB_PASSWORD") host = os.getenv("DB_HOST") connect_database("mydb", "postgresql", f"postgresql://{user}:{password}@{host}/db") # Good: use config file (not in version control) import json with open("/etc/config.json") as f: config = json.load(f) connect_database("mydb", "postgresql", config["database_url"]) ``` ### 3. Connection Pooling For production workloads: ```python # PostgreSQL with connection pooling connect_database( "prod_pool", "postgresql", "postgresql://user:pass@host/db?poolclass=QueuePool&pool_size=20&max_overflow=40" ) ``` ### 4. SSL/TLS for Remote Connections Always use encryption for network traffic: ```python # PostgreSQL with SSL connect_database( "secure_pg", "postgresql", "postgresql://user:pass@remote.host/db?sslmode=require" ) # MySQL with SSL certificates connect_database( "secure_mysql", "mysql", "mysql+pymysql://user:pass@host/db?ssl_ca=/etc/ssl/certs/ca.pem" ) ``` ### 5. Connection Lifecycle Properly manage connections: ```python # Connect connect_database("mydb", "postgresql", "postgresql://localhost/db") # Use queries execute_query("mydb", "SELECT * FROM users") # Disconnect when done disconnect_database("mydb") ``` ### 6. Large File Streaming For large files, use memory-aware settings: ```python # CSV with chunking connect_database("large_csv", "csv", "/data/huge_file.csv") results = execute_query("large_csv", "SELECT * FROM data") # Then use next_chunk for pagination next_chunk(query_id, 1, "1000") # 1000 rows at a time ``` ### 7. Remote Data Sources For HTTP-accessible files: ```python # Remote CSV connect_database("remote_data", "csv", "https://data.example.com/export.csv") # Remote Parquet dataset connect_database("cloud_data", "parquet", "https://warehouse.example.com/data/") ``` ### 8. Testing Connections Verify connectivity before processing: ```python # Test connection databases = list_databases() # Check if connection exists if any(db["name"] == "mydb" for db in databases): # Use connection describe_database("mydb") ``` ### 9. Error Handling Anticipate and handle connection failures: ```python try: connect_database("flaky_db", "postgresql", "postgresql://unreliable.host/db") execute_query("flaky_db", "SELECT 1") except Exception as e: # Fallback or retry logic print(f"Connection failed: {e}") ``` ### 10. Performance Optimization Match database features to workload: ```python # OLAP analysis: use DuckDB or analytical database connect_database("warehouse", "duckdb", "duckdb:///data/warehouse.db") # Real-time: use PostgreSQL or MongoDB connect_database("app_db", "postgresql", "postgresql://localhost/app") # Caching: use Redis or similar connect_database("cache", "redis", "redis://localhost:6379/0") # Search: use Elasticsearch connect_database("search", "elasticsearch", "elasticsearch://localhost:9200") ``` --- ## Quick Reference Table | Data Source | Type | Best For | Auth Type | |-------------|------|----------|-----------| | SQLite | SQL | Local dev | None | | PostgreSQL | SQL | Analytics | User/pass | | MySQL | SQL | Web apps | User/pass | | DuckDB | SQL | OLAP | None | | Oracle | SQL | Enterprise | Wallet/Kerberos | | MS SQL | SQL | Azure/Windows | AD/User | | MongoDB | NoSQL | Documents | User/pass | | Redis | NoSQL | Caching | Password | | Elasticsearch | Search | Full-text | User/pass | | InfluxDB | Time-series | Metrics | Token | | Neo4j | Graph | Networks | User/pass | | CouchDB | Document | Sync | User/pass | | CSV/TSV | File | Bulk | None | | JSON/JSONL | File | Hierarchical | None | | YAML/TOML | File | Config | None | | XML | File | Markup | None | | Excel/ODS | Sheet | Spreadsheets | None | | Parquet | Analytics | Warehouse | None | | Feather | Analytics | Pipelines | None | | HDF5 | Scientific | Research | None | | DOT/GML/GraphML | Graph | Diagrams | None | | Turtle/N-Triples | RDF | Semantic web | None | | SPARQL | RDF | Linked data | Endpoint | --- **For tool usage examples and workflow composition, see the main tools reference guide.**