# Flat Data Files

LocalData MCP treats flat files as queryable SQL tables. Connect to a file, then query it with standard SQL using `execute_query`. The connection remains active until you disconnect or the session ends.

## Supported formats

### CSV and TSV

Delimiters are detected automatically. The file contents become a single SQL table named `data`.

```python
connect_database("sales", "csv", "./sales_data.csv")
execute_query("sales", "SELECT region, SUM(amount) FROM data GROUP BY region")
```

TSV files use the same connection type:

```python
connect_database("logs", "csv", "./access_log.tsv")
execute_query("logs", "SELECT status_code, COUNT(*) FROM data GROUP BY status_code")
```

For files larger than 100 MB, LocalData MCP automatically loads the data into a temporary SQLite database on disk. Queries behave identically — the storage strategy is transparent.

### Excel (.xlsx, .xls)

Each worksheet becomes a separate SQL table. Sheet names are sanitized: spaces become underscores, and names starting with a digit get an underscore prefix. A sheet named `Q1 Results` becomes the table `Q1_Results`; a sheet named `2024 Data` becomes `_2024_Data`.

```python
connect_database("report", "xlsx", "./annual_report.xlsx")
describe_database("report")   # lists all sheets as tables
execute_query("report", "SELECT * FROM Q1_Results LIMIT 20")
```

To connect to a single sheet, append `?sheet=SheetName` to the path:

```python
connect_database("q1", "xlsx", "./annual_report.xlsx?sheet=Q1 Results")
execute_query("q1", "SELECT * FROM data")
```

When a single sheet is selected this way, the table is named `data` rather than the sheet name.

### LibreOffice Calc (.ods)

Behavior is identical to Excel. Each sheet becomes a table with the same name sanitization rules.

```python
connect_database("budget", "ods", "./budget.ods")
execute_query("budget", "SELECT category, SUM(amount) FROM Expenses GROUP BY category")
```

### Apple Numbers (.numbers)

Behavior is identical to Excel and ODS. Each sheet becomes a table.

```python
connect_database("plan", "numbers", "./project_plan.numbers")
describe_database("plan")
execute_query("plan", "SELECT milestone, due_date FROM Timeline WHERE status = 'pending'")
```

### Parquet, Feather, and Arrow

These binary columnar formats are read directly and queried with SQL. They are well-suited to large analytical datasets because column pruning reduces memory usage when only a subset of columns are selected.

```python
connect_database("events", "parquet", "./events.parquet")
execute_query("events", "SELECT event_type, COUNT(*) FROM data GROUP BY event_type ORDER BY 2 DESC")
```

```python
connect_database("features", "feather", "./model_features.feather")
execute_query("features", "SELECT * FROM data WHERE label = 1 LIMIT 1000")
```

Arrow IPC files use the same pattern with type `"arrow"`.

### HDF5

HDF5 files store datasets in a hierarchical group structure. Each top-level dataset is exposed as a SQL table.

```python
connect_database("experiment", "hdf5", "./results.h5")
describe_database("experiment")   # shows available datasets
execute_query("experiment", "SELECT * FROM measurements WHERE temperature > 37.5")
```

## Available tools

All flat file formats share the same tool set.

| Tool | Description |
|---|---|
| `connect_database(name, type, path)` | Open a file and register it under `name` |
| `disconnect_database(name)` | Close the connection and release resources |
| `list_databases()` | Show all active connections and their types |
| `describe_database(name)` | List tables (sheets/datasets) and row counts |
| `describe_table(name, table)` | Show column names and types for a table |
| `find_table(name, pattern)` | Search table names by substring or pattern |
| `execute_query(name, sql)` | Run a SQL SELECT statement |
| `next_chunk(name, buffer_id)` | Retrieve the next chunk from a buffered result |

## Large result handling

When a query returns more than 100 rows, LocalData MCP returns the first chunk immediately and buffers the remainder. The response includes a `buffer_id` and a `has_more` flag.

```python
result = execute_query("sales", "SELECT * FROM data")
# result["has_more"] == True
# result["buffer_id"] == "buf_abc123"
# result["rows"] contains the first chunk

while result.get("has_more"):
    result = next_chunk("sales", result["buffer_id"])
    # process result["rows"]
```

Call `next_chunk` repeatedly until `has_more` is `False`. Buffers are released automatically when exhausted. If you stop consuming a buffer early, call `disconnect_database` to release the resources, or the buffer will expire after the session ends.

## Notes

- File access is restricted to the current working directory and its subdirectories. Paths containing `../` are rejected.
- SQL queries are read-only. `INSERT`, `UPDATE`, `DELETE`, and `DROP` are not permitted.
- Column names are taken directly from the file header row. Names with special characters may need quoting in SQL: `SELECT "unit price" FROM data`.
- For CSV files without a header row, columns are named `col0`, `col1`, and so on.