23 File I/O
This chapter covers file operations in Tactus, from basic file reading and writing to volume mounting and data format operations.
23.1 In-Sandbox File Operations
Tactus procedures run inside Docker containers by default. The container filesystem is ephemeral—files created during execution are destroyed when the container exits, unless they’re written to mounted volumes.
23.1.1 Basic File Operations
The File module provides core file operations:
-- Read a text file
local content = File.read("/workspace/data.txt")
-- Write a text file
File.write("/workspace/output.txt", "Hello, World!")
-- Check if a file exists
if File.exists("/workspace/config.json") then
    -- file exists
end
-- List files in a directory
local files = File.list("/workspace")
for _, file in ipairs(files) do
    print(file)
end
Path conventions:
- /workspace - The current directory (mounted by default)
- Absolute paths required - relative paths not supported inside procedures
- Use forward slashes (/) even on Windows
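The conventions above can be checked before a path reaches File.read or File.write. The following is a minimal sketch in plain Lua. The name check_sandbox_path is hypothetical and not part of the Tactus API; it encodes only the rules listed in this section.

```lua
-- Hypothetical helper (not part of the Tactus API): validates a path
-- against the conventions listed above.
local function check_sandbox_path(path)
    if type(path) ~= "string" or path == "" then
        return false, "path must be a non-empty string"
    end
    if path:find("\\", 1, true) then
        return false, "use forward slashes (/), not backslashes"
    end
    if path:sub(1, 1) ~= "/" then
        return false, "path must be absolute (start with /)"
    end
    return true
end

-- Usage: returns true, or false plus a reason
print(check_sandbox_path("/workspace/data.txt"))
print(check_sandbox_path("data.txt"))
```

Returning a reason alongside the boolean lets a procedure report a clear message instead of failing later inside a file call.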
23.1.2 JSON Operations
The Json module handles JSON encoding and decoding:
-- Parse JSON string
local data = Json.decode('{"name": "Alice", "age": 30}')
print(data.name) -- "Alice"
-- Encode Lua table to JSON
local json_str = Json.encode({
    name = "Bob",
    age = 25,
    active = true
})
-- Read and parse JSON file
local json_content = File.read("/workspace/data.json")
local data = Json.decode(json_content)
-- Write data as JSON file
local output = Json.encode(results)
File.write("/workspace/output.json", output)
23.1.3 Filesystem Helpers
The tactus.io.fs module provides additional filesystem utilities:
local fs = require("tactus.io.fs")
-- Create directory
fs.mkdir("/workspace/reports")
-- Remove file
fs.remove("/workspace/temp.txt")
-- Check if path is a directory
if fs.isdir("/workspace/data") then
    -- it's a directory
end
-- Get file size
local size = fs.size("/workspace/large_file.dat")
23.2 Volume Mounting and Filesystem Boundaries
Understanding how files get into and out of the container is crucial for working with procedures that need persistent storage or access to external data.
23.2.1 Default Mount Behavior
By default, Tactus mounts your current directory at /workspace in read-write mode (:rw), so procedures can:
- Read source code and configuration files
- Write outputs and reports
- Work with project data naturally
Example - Reading project files:
-- Procedure running in /Users/alice/my-project
-- Current directory automatically mounted to /workspace
-- Read a config file from the project
local config = File.read("/workspace/config.json")
-- Write results back to the project
File.write("/workspace/results.csv", output_csv)
This is safe because:
- Container isolation: The procedure can only access the mounted project directory, not your entire filesystem
- Git version control: In a Git repository, all changes can be reviewed with git status and git diff
- Project scope: Only the current project is exposed, not your home directory or system files
23.2.2 Path Resolution
Volume mount paths in sidecar configuration files are resolved relative to the procedure’s directory:
# /Users/alice/my-project/analyze.tac.yml
sandbox:
  volumes:
    - "../data:/workspace/data:ro"  # Resolves to /Users/alice/data
    - "./output:/workspace/output:rw"  # Resolves to /Users/alice/my-project/output
    - "~/shared:/shared:ro"  # Expands to /Users/alice/shared
    - "/abs/path:/data:ro"  # Absolute path used as-is
23.2.3 Additional Volume Mounts
Mount other directories via sidecar configuration:
# procedure.tac.yml
sandbox:
  volumes:
    - "../external-repo:/workspace/external:ro"  # Sibling repository
    - "/data/shared:/data:ro"  # Shared data directory
    - "./reports:/workspace/reports:rw"  # Output directory
From Lua:
-- Access files from mounted volumes
local external_data = File.read("/workspace/external/data.csv")
local shared_config = File.read("/data/config.json")
-- Write to output directory
File.write("/workspace/reports/summary.txt", summary)
23.2.4 Read-Only vs Read-Write Mounts
Control write permissions with volume modes:
Read-only (:ro) - Safer when you don’t need writes:
sandbox:
  volumes:
    - "/sensitive/data:/data:ro"  # Cannot modify source data
    - "../reference:/reference:ro"  # Cannot modify reference materials
Read-write (:rw) - When you need to modify files:
sandbox:
  volumes:
    - "./cache:/cache:rw"  # Can update cache
    - "./output:/workspace/output:rw"  # Can write outputs
Default: If you omit the mode, :rw is assumed.
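To make the host:container:mode format concrete, here is a small sketch that splits a volume entry into its parts and applies the :rw default. The name parse_volume is hypothetical. This is not the parser Tactus uses, and it does not resolve relative or ~ paths.

```lua
-- Illustrative parser for "host:container[:mode]" volume entries.
-- A sketch only; not the parser Tactus itself uses.
local function parse_volume(spec)
    local mode = "rw"                       -- default when the mode is omitted
    local rest = spec
    local suffix = spec:match(":(r[ow])$")  -- trailing ":ro" or ":rw"
    if suffix then
        mode = suffix
        rest = spec:sub(1, -4)              -- drop the 3-character mode suffix
    end
    -- Split at the last remaining colon so a Windows drive letter
    -- ("C:/data") stays part of the host path.
    local host, container = rest:match("^(.+):([^:]+)$")
    if not host then
        return nil, "expected host:container[:mode], got: " .. spec
    end
    return { host = host, container = container, mode = mode }
end

local v = parse_volume("./cache:/cache")
print(v.host, v.container, v.mode)
```

Splitting at the last colon rather than the first is a deliberate choice here, since host paths such as C:/data contain a colon of their own.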
23.2.5 Common Volume Mounting Patterns
Pattern 1: Multi-Repository Access
# Access multiple repositories for analysis
sandbox:
  volumes:
    - "../tactus:/workspace/tactus:ro"
    - "../tactus-examples:/workspace/examples:ro"
    - "./analysis-output:/workspace/output:rw"
Use case: Cross-repository analysis, documentation generation, dependency scanning.
Pattern 2: Persistent Outputs
# Keep outputs separate from source
sandbox:
  volumes:
    - "./data:/workspace/data:ro"  # Input data (read-only)
    - "./output:/workspace/output:rw"  # Output directory (read-write)
Use case: Data processing pipelines, report generation, build artifacts.
Pattern 3: Shared Data Directories
# Access team-wide data
sandbox:
  volumes:
    - "/data/team-datasets:/datasets:ro"  # Shared team data
    - "~/local-cache:/cache:rw"  # Personal cache
Use case: ML training, data science workflows, shared reference data.
Pattern 4: Read-Only Project + Writable Output
# Principle of least privilege
sandbox:
  mount_current_dir: false  # Disable default RW mount
  volumes:
    - ".:/workspace:ro"  # Read-only project access
    - "./output:/workspace/output:rw"  # Read-write output
Use case: Production deployments, untrusted procedures, compliance requirements.
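Pattern 4 relies on nested mounts: /workspace is read-only, while /workspace/output, mounted on top of it, is writable. The sketch below models that rule in plain Lua, where the deepest mount point containing a path decides its mode. The mounts table and the effective_mode function are illustrative names, not Tactus APIs.

```lua
-- Mount table hard-coded to mirror Pattern 4 (illustrative only).
local mounts = {
    { container = "/workspace",        mode = "ro" },
    { container = "/workspace/output", mode = "rw" },
}

-- Returns the mode of the deepest mount that contains `path`,
-- or nil when the path is not under any mount.
local function effective_mode(path, mount_list)
    local best
    for _, m in ipairs(mount_list) do
        local prefix = m.container
        -- Match the mount point itself or anything below it; the "/" check
        -- keeps "/workspace/outputs" from matching "/workspace/output".
        local inside = path == prefix
            or path:sub(1, #prefix + 1) == prefix .. "/"
        if inside and (not best or #prefix > #best.container) then
            best = m
        end
    end
    return best and best.mode or nil
end

print(effective_mode("/workspace/src/main.tac", mounts))
print(effective_mode("/workspace/output/report.txt", mounts))
```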
23.2.6 Disabling the Default Mount
For procedures that should have limited filesystem access:
# procedure.tac.yml
sandbox:
  mount_current_dir: false  # Disable automatic current directory mount
  volumes:
    - "./output:/workspace/output:rw"  # Only mount what's needed
When to disable:
- Running untrusted procedures from unknown sources
- Output-only workflows that don’t need source access
- Production deployments with strict permission requirements
- Multi-tenant systems where procedures share a runtime
23.2.7 Cross-Platform Path Considerations
Path separators:
- Always use forward slashes (/) in Lua code
- Docker handles path translation automatically
- Works consistently across Windows, macOS, and Linux
Windows host paths:
# On Windows, use forward slashes or escaped backslashes
sandbox:
  volumes:
    - "C:/data:/data:ro"  # Preferred (forward slashes)
    - "C:\\data:/data:ro"  # Also works (escaped backslashes)
macOS specific:
sandbox:
  volumes:
    - "~/Library/Application Support/MyApp:/config:ro"  # Spaces in paths OK
Linux specific:
sandbox:
  volumes:
    - "/mnt/nas/shared:/data:ro"  # Network mounts
    - "/home/$USER/data:/data:ro"  # Env vars NOT expanded (use absolute paths)
23.3 Data Format Operations
23.3.1 CSV/TSV Operations
The Csv module provides CSV and TSV parsing:
-- Read CSV file
local csv_data = Csv.read("/workspace/data.csv")
-- Returns: {headers = {"col1", "col2"}, rows = {{...}, {...}}}
-- Access data
for _, row in ipairs(csv_data.rows) do
    print(row.col1, row.col2)
end
-- Write CSV file
Csv.write("/workspace/output.csv", {
    headers = {"name", "score"},
    rows = {
        {name = "Alice", score = 95},
        {name = "Bob", score = 87}
    }
})
-- TSV (tab-separated) works the same way
local tsv_data = Csv.read("/workspace/data.tsv", {delimiter = "\t"})
23.3.2 JSON Files
JSON files are covered in Section 23.1.2, “JSON Operations”.
23.3.3 Parquet Files
Use Python modules via MCP tools or host-side tools for Parquet:
# procedure.tac.yml
mcp_servers:
  parquet:
    command: "python"
    args: ["-m", "tactus_parquet_tool"]
-- Read Parquet file via tool
local result = call_tool("parquet.read", {
    path = "/workspace/data.parquet"
})
23.3.4 HDF5 Files
Use Python modules via MCP tools for HDF5:
# procedure.tac.yml
mcp_servers:
  hdf5:
    command: "python"
    args: ["-m", "tactus_hdf5_tool"]
-- Read HDF5 dataset via tool
local result = call_tool("hdf5.read", {
    path = "/workspace/data.h5",
    dataset = "/data/measurements"
})
23.3.5 Excel Files
Use Python modules via MCP tools for Excel:
# procedure.tac.yml
mcp_servers:
  excel:
    command: "python"
    args: ["-m", "tactus_excel_tool"]
-- Read Excel sheet via tool
local result = call_tool("excel.read", {
    path = "/workspace/report.xlsx",
    sheet = "Summary"
})
23.4 Security and Sandboxing
23.4.1 Trust Boundary of Sidecar Files
Important: Sidecar YAML files (.tac.yml) are NOT sandboxed like .tac procedure files.
Trust model:
- .tac files: Sandboxed Lua code - safe for user contributions, AI generation, public sharing
- .yml files: Trusted configuration - can mount arbitrary paths, configure network, reference Docker images
Best practice: If accepting user-contributed procedures, accept only .tac files. Review .yml configurations carefully before use.
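One way to apply this best practice is to sort incoming files by extension before anything is run. The sketch below is a hypothetical intake filter in plain Lua. The function name and the three outcome labels are assumptions made for illustration, not part of Tactus.

```lua
-- Hypothetical intake filter for user-contributed files.
local function classify_contribution(filename)
    local name = filename:lower()
    if name:match("%.ya?ml$") then
        return "review"   -- sidecar YAML is trusted configuration: inspect by hand
    elseif name:match("%.tac$") then
        return "accept"   -- sandboxed Lua procedure
    end
    return "reject"       -- anything else falls outside the trust model above
end

for _, f in ipairs({ "analyze.tac", "analyze.tac.yml", "run.sh" }) do
    print(f, classify_contribution(f))
end
```

The YAML check comes first so that analyze.tac.yml is never mistaken for a procedure file.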
23.4.2 When to Use Read-Only Mounts
Use :ro (read-only) mounts when:
- Accessing reference data that shouldn’t be modified
- Reading configuration files
- Mounting external repositories for analysis
- Accessing shared team datasets
- Compliance requires immutable inputs
Example - Preventing accidental modifications:
sandbox:
  volumes:
    - "/data/production-db-export:/data:ro"  # Cannot accidentally modify prod data
    - "./analysis:/workspace/analysis:rw"  # Can write analysis results
23.4.3 Path Traversal Prevention
Container isolation confines path traversal to the container’s own filesystem:
-- This CANNOT escape the container to access host files outside mounts
-- (the path resolves to /etc/passwd inside the container, never on the host)
File.read("/../../../etc/passwd") -- Host file NOT reachable (container isolation)
-- Host files are reachable only through mounted paths
File.read("/workspace/data.txt") -- OK (if current dir mounted)
File.read("/data/file.csv") -- OK (if /data mounted in config)
File.read("/etc/passwd") -- Host file NOT reachable (not mounted)
Container isolation guarantees:
- Procedures can reach host files only through explicitly mounted volumes
- Cannot traverse from a mounted directory into its parent on the host
- Cannot access host filesystem beyond mounts
- Cannot access other containers’ filesystems
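The reason a path such as /../../../etc/passwd goes nowhere is that ".." at the root of a filesystem refers back to the root itself, and inside a container that root is the container's, not the host's. The sketch below reproduces that resolution rule in plain Lua. The normalize function is illustrative only and ignores symbolic links.

```lua
-- Illustrative path normalizer: shows why ".." cannot climb above "/".
-- Not part of the Tactus API.
local function normalize(path)
    local parts = {}
    for segment in path:gmatch("[^/]+") do
        if segment == ".." then
            parts[#parts] = nil           -- pop one level; a no-op at the root
        elseif segment ~= "." then
            parts[#parts + 1] = segment
        end
    end
    return "/" .. table.concat(parts, "/")
end

print(normalize("/../../../etc/passwd"))
print(normalize("/workspace/../../etc/passwd"))
```

Both calls yield /etc/passwd, which names a location in the container's own filesystem, never the host's.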
23.4.4 Security Checklist
For development:
- ✅ Default current directory mount is fine with Git version control
- ✅ Review changes with git diff before committing
- ✅ Use :ro for reference data when possible
For production:
- ✅ Consider disabling mount_current_dir for untrusted procedures
- ✅ Use explicit volume mounts with minimal necessary permissions
- ✅ Use :ro for all input data
- ✅ Limit :rw mounts to specific output directories
- ✅ Review all .tac.yml sidecar files before deployment
- ✅ Never commit secrets in sidecar files (use host-side config instead)
For multi-tenant systems:
- ✅ Disable mount_current_dir by default
- ✅ Use per-tenant volume isolation
- ✅ Implement volume quota limits
- ✅ Audit all filesystem access
- ✅ Consider read-only mounts for shared resources
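Part of this checklist can be automated. The sketch below is a hypothetical review aid in plain Lua. It takes the volume strings from a sidecar file, already extracted into a Lua table, and reports every entry that is writable or omits its mode. The name audit_volumes is an assumption; Tactus does not ship such a function.

```lua
-- Hypothetical pre-deployment check for a list of volume strings.
local function audit_volumes(volumes)
    local warnings = {}
    for _, spec in ipairs(volumes) do
        if spec:match(":ro$") then
            -- read-only mount: nothing to report
        elseif spec:match(":rw$") then
            warnings[#warnings + 1] = "writable mount: " .. spec
        else
            warnings[#warnings + 1] = "mode omitted (defaults to :rw): " .. spec
        end
    end
    return warnings
end

local warnings = audit_volumes({
    "/data/sensitive:/data:ro",
    "./processed:/workspace/output:rw",
    "./cache:/cache",
})
for _, w in ipairs(warnings) do
    print(w)
end
```

A writable mount is not an error in itself; the list is meant to show a reviewer exactly where a procedure can change host files.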
23.5 Examples
23.5.1 Example 1: Simple Report Generation
Generate a report from project data:
-- read_and_report.tac
Procedure {
    function()
        -- Read input data
        local data_json = File.read("/workspace/sales_data.json")
        local data = Json.decode(data_json)
        -- Process data
        local total = 0
        for _, sale in ipairs(data.sales) do
            total = total + sale.amount
        end
        -- Generate report
        local report = string.format(
            "Sales Report\n" ..
            "Total Sales: $%.2f\n" ..
            "Number of Transactions: %d\n",
            total, #data.sales
        )
        -- Write report to project directory
        File.write("/workspace/sales_report.txt", report)
        return {status = "success", total = total}
    end
}
No sidecar file needed - uses default current directory mount.
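The arithmetic and formatting in this example can be exercised without the sandbox by moving them into a pure function. The sketch below does that in plain Lua. The name build_report and the sample amounts are invented for illustration; inside a procedure you would pass data.sales from the decoded JSON.

```lua
-- Pure helper: no File or Json calls, so it runs in any Lua interpreter.
local function build_report(sales)
    local total = 0
    for _, sale in ipairs(sales) do
        total = total + sale.amount
    end
    local report = string.format(
        "Sales Report\n" ..
        "Total Sales: $%.2f\n" ..
        "Number of Transactions: %d\n",
        total, #sales
    )
    return report, total
end

-- Sample data invented for this sketch
local report, total = build_report({ { amount = 10.5 }, { amount = 4.5 } })
print(report)
```

Keeping the computation separate from File.read and File.write makes the logic easy to test before the procedure touches any mounted volume.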
23.5.2 Example 2: Cross-Repository Analysis
Analyze multiple repositories:
# analyze_repos.tac.yml
sandbox:
  volumes:
    - "../repo1:/workspace/repo1:ro"
    - "../repo2:/workspace/repo2:ro"
    - "./analysis_output:/workspace/output:rw"
-- analyze_repos.tac
Procedure {
    function()
        -- Read files from multiple repos
        local repo1_readme = File.read("/workspace/repo1/README.md")
        local repo2_readme = File.read("/workspace/repo2/README.md")
        -- Analyze (simplified example): # on a string gives its length in bytes
        local analysis = {
            repo1_bytes = #repo1_readme,
            repo2_bytes = #repo2_readme
        }
        -- Write results
        local output_json = Json.encode(analysis)
        File.write("/workspace/output/analysis.json", output_json)
        return analysis
    end
}
23.5.3 Example 3: Secure Data Processing
Process sensitive data with read-only inputs:
# process_secure.tac.yml
sandbox:
  mount_current_dir: false  # Don't mount current directory
  volumes:
    - "/data/sensitive:/data:ro"  # Read-only sensitive data
    - "./processed:/workspace/output:rw"  # Read-write output
-- process_secure.tac
Procedure {
    function()
        -- Can read from /data (read-only)
        local input = File.read("/data/patient_records.csv")
        -- Process and anonymize (anonymize_data is a placeholder for your own function)
        local anonymized = anonymize_data(input)
        -- Can write to /workspace/output (read-write)
        File.write("/workspace/output/anonymized.csv", anonymized)
        -- CANNOT modify source data
        -- File.write("/data/patient_records.csv", "hacked") -- BLOCKED (read-only)
        -- CANNOT write to current directory
        -- File.write("/workspace/malicious.txt", "data") -- BLOCKED (not mounted)
        return {status = "success"}
    end
}
23.6 Summary
Key takeaways:
- Current directory is mounted at /workspace in read-write mode (:rw) by default for convenience
- Container isolation provides security even with the default mount
- Additional volumes are configured via .tac.yml sidecar files
- Use :ro for inputs, :rw only where needed
- Disable mount_current_dir for untrusted procedures or stricter security
- Path traversal cannot reach host files outside mounts, thanks to container isolation
- Sidecar .yml files are trusted configuration, not sandboxed code
Best practices:
- Keep inputs read-only when possible
- Use Git to review filesystem changes before committing
- Separate input and output directories in production
- Review all sidecar configurations in untrusted scenarios
- Test volume mounts with simple procedures before complex workflows
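The last best practice can be turned into a small smoke test. The sketch below takes the file API as a parameter so it can be tried with a stub; inside a procedure you would pass the File module. It rests on two assumptions that this chapter does not confirm: that File.exists returns true for a mounted directory, and that File.write either raises an error or returns false on a read-only mount.

```lua
-- Sketch of a mount smoke test. `file_api` must offer exists(path) and
-- write(path, text), as the File module in this chapter does.
-- ASSUMPTION: a failed write raises an error or returns false.
local function check_mounts(file_api, expectations)
    local problems = {}
    for _, e in ipairs(expectations) do
        if not file_api.exists(e.path) then
            problems[#problems + 1] = e.path .. ": not mounted"
        elseif e.writable then
            local ok, result = pcall(file_api.write, e.path .. "/.mount_probe", "probe")
            if not ok or result == false then
                problems[#problems + 1] = e.path .. ": expected writable"
            end
        end
    end
    return problems
end

-- Stub standing in for File, so the sketch runs outside Tactus
local stub = {
    exists = function(path) return path ~= "/missing" end,
    write = function(path)
        if path:sub(1, 5) == "/data" then error("read-only") end
        return true
    end,
}
for _, p in ipairs(check_mounts(stub, {
    { path = "/workspace/output", writable = true },
    { path = "/data", writable = true },
    { path = "/missing" },
})) do
    print(p)
end
```

The probe leaves a .mount_probe file in each writable mount; in a real procedure it could be deleted afterwards with fs.remove.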