Bulk Data Export
OctoFHIR implements the FHIR Bulk Data Access IG for exporting large datasets in NDJSON format. This is useful for analytics, data warehousing, and system migrations.
Overview
Bulk export follows the FHIR Asynchronous Request Pattern:
- Client initiates the export with the $export operation
- Server returns 202 Accepted with a status URL
- Client polls the status endpoint until the export completes
- Client downloads the NDJSON files listed in the manifest
Export Levels
System Export
Export all resources on the server:

    GET /$export
    Prefer: respond-async
    Accept: application/fhir+json

Patient Export
Export all patient compartment data:

    GET /Patient/$export
    Prefer: respond-async
    Accept: application/fhir+json

Group Export
Export data for members of a specific group:

    GET /Group/{id}/$export
    Prefer: respond-async
    Accept: application/fhir+json

Parameters
| Parameter | Type | Description |
|---|---|---|
| _outputFormat | string | Must be application/fhir+ndjson (default) |
| _since | instant | Only resources updated since this timestamp |
| _type | string | Comma-separated list of resource types to export |
| _typeFilter | string | FHIR search queries per type |
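The parameters in the table can be combined into one request URL. The following is a minimal sketch, not part of OctoFHIR: `build_export_url` and its argument names are hypothetical, and it assumes the server applies standard URL decoding to query values, so the `?` and `=` inside a `_typeFilter` value are percent-encoded.

```python
from urllib.parse import urlencode


def build_export_url(base_url, types=None, since=None, type_filter=None):
    """Build a system-level $export URL (hypothetical helper for illustration)."""
    params = []
    if types:
        # _type is a comma-separated list of resource types.
        params.append(("_type", ",".join(types)))
    if since:
        # _since is a FHIR instant, e.g. "2024-01-01T00:00:00Z".
        params.append(("_since", since))
    if type_filter:
        # _typeFilter is a search query such as "Observation?code=8867-4".
        params.append(("_typeFilter", type_filter))
    url = base_url.rstrip("/") + "/$export"
    if not params:
        return url
    # safe="," keeps the _type list readable; other reserved characters
    # ("?", "=", ":") are percent-encoded.
    return url + "?" + urlencode(params, safe=",")


print(build_export_url(
    "http://server/fhir",
    types=["Observation"],
    type_filter="Observation?code=8867-4",
))
```

The request still needs the Prefer: respond-async and Accept: application/fhir+json headers shown for each export level.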
Examples
Export only Patient and Observation resources:

    GET /$export?_type=Patient,Observation

Export resources modified after a specific date:

    GET /$export?_since=2024-01-01T00:00:00Z

Export with type-specific filters:

    GET /$export?_type=Observation&_typeFilter=Observation?code=8867-4

Response Flow
1. Initiate Export

    GET /$export?_type=Patient,Observation
    Prefer: respond-async
    Accept: application/fhir+json

Response: 202 Accepted

    HTTP/1.1 202 Accepted
    Content-Location: http://server/fhir/_async-status/550e8400-e29b-41d4-a716-446655440000

2. Check Status
Poll the status URL until the export completes:

    GET /_async-status/550e8400-e29b-41d4-a716-446655440000

Response (In Progress): 202 Accepted

    {
      "status": "in_progress",
      "progress": 0.45,
      "message": "Exporting Observation resources..."
    }

Response (Complete): 200 OK

    {
      "transactionTime": "2024-01-15T14:30:00Z",
      "request": "http://server/fhir/$export?_type=Patient,Observation",
      "requiresAccessToken": true,
      "output": [
        {
          "type": "Patient",
          "url": "http://server/fhir/_bulk-files/550e8400.../Patient.ndjson",
          "count": 1500
        },
        {
          "type": "Observation",
          "url": "http://server/fhir/_bulk-files/550e8400.../Observation.ndjson",
          "count": 45000
        }
      ],
      "error": []
    }

3. Download Files
Download each NDJSON file from the manifest:

    GET /_bulk-files/550e8400.../Patient.ndjson
    Authorization: Bearer {token}

Response:

    {"resourceType":"Patient","id":"1","name":[{"family":"Smith"}]}
    {"resourceType":"Patient","id":"2","name":[{"family":"Jones"}]}
    {"resourceType":"Patient","id":"3","name":[{"family":"Williams"}]}

4. Cancel Export (Optional)
To cancel an in-progress export:

    DELETE /_async-status/550e8400-e29b-41d4-a716-446655440000

Response: 204 No Content
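Steps 2 and 3 above can be sketched as a small polling loop plus an NDJSON parser. This is an illustrative sketch under stated assumptions, not OctoFHIR client code: `wait_for_manifest`, `parse_ndjson`, and the `fetch(url) -> (http_status, body_text)` callable are hypothetical names, and the canned responses stand in for a real server so the example runs offline.

```python
import json
import time


def wait_for_manifest(fetch, status_url, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll status_url until the export completes, then return the manifest.

    fetch is a caller-supplied (hypothetical) function:
    fetch(url) -> (http_status, body_text).
    """
    for _ in range(max_polls):
        status, body = fetch(status_url)
        if status == 200:
            # Export complete: the body is the manifest JSON.
            return json.loads(body)
        if status != 202:
            # Anything other than 202 (in progress) ends the job.
            raise RuntimeError(f"export ended with HTTP {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"export still running after {max_polls} polls")


def parse_ndjson(text):
    """Turn NDJSON text into a list of resources, one per non-empty line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Canned responses stand in for a real server so the sketch runs offline.
_responses = iter([
    (202, '{"status": "in_progress", "progress": 0.45}'),
    (200, '{"output": [{"type": "Patient", "url": "Patient.ndjson", "count": 2}], "error": []}'),
])
manifest = wait_for_manifest(lambda url: next(_responses), "status-url", sleep=lambda s: None)
patients = parse_ndjson('{"resourceType":"Patient","id":"1"}\n{"resourceType":"Patient","id":"2"}\n')
print(manifest["output"][0]["type"], len(patients))
```

In a real client, `fetch` would issue an authenticated GET against the Content-Location URL from step 1, and each output url from the manifest would be downloaded with the bearer token before parsing.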
Configuration
Configure bulk export in octofhir.toml:

    [bulk_export]
    # Enable/disable bulk export (default: true)
    enabled = true

    # Directory for export files (default: "./exports")
    export_path = "./exports"

    # Maximum concurrent export jobs (default: 5)
    max_concurrent_jobs = 5

    # File retention before cleanup, in hours (default: 24)
    retention_hours = 24

    # Split files after this many resources (default: 100000)
    max_resources_per_file = 100000

    # Database query batch size (default: 1000)
    batch_size = 1000

    # Default resource types if _type not specified (empty = all types)
    default_resource_types = []

NDJSON Format
Each output file contains one JSON resource per line (Newline Delimited JSON):

    {"resourceType":"Patient","id":"1","name":[{"family":"Smith","given":["John"]}]}
    {"resourceType":"Patient","id":"2","name":[{"family":"Jones","given":["Jane"]}]}

File Splitting
Large exports are automatically split into multiple files when max_resources_per_file is exceeded:

    Patient.ndjson      # First 100,000 patients
    Patient.1.ndjson    # Next 100,000 patients
    Patient.2.ndjson    # Remaining patients

Supported Resource Types
By default, bulk export includes these common resource types:
- Clinical: Patient, Observation, Condition, Procedure, DiagnosticReport
- Medications: MedicationRequest, Medication
- Encounters: Encounter, CarePlan, CareTeam
- Allergies/Immunizations: AllergyIntolerance, Immunization
- Administrative: Organization, Practitioner, PractitionerRole, Location
- Documents: DocumentReference, Provenance
Use the _type parameter to limit the export to specific types.
Error Handling
Export Errors
If errors occur during export, they appear in the error array of the manifest:

    {
      "output": [...],
      "error": [
        {
          "type": "OperationOutcome",
          "url": "http://server/fhir/_bulk-files/550e8400.../errors.ndjson"
        }
      ]
    }

Common Error Codes
| HTTP Status | Meaning |
|---|---|
| 202 | Export accepted/in progress |
| 200 | Export complete |
| 400 | Invalid parameters |
| 404 | Job not found |
| 410 | Job cancelled |
| 500 | Export failed |
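A polling client can turn the status codes in the table into a next step. The mapping below is only a sketch: the action names and the `next_action` helper are illustrative assumptions, not something OctoFHIR defines, and unknown codes are treated as failures.

```python
# Status codes from the table above, mapped to what a polling client does next.
# The action names are illustrative and not part of OctoFHIR.
ACTIONS = {
    202: "keep_polling",  # export accepted or still in progress
    200: "download",      # export complete, manifest is in the body
    400: "fix_request",   # invalid parameters
    404: "give_up",       # job not found
    410: "give_up",       # job cancelled
    500: "give_up",       # export failed
}


def next_action(http_status):
    """Return the next step for a status code; unknown codes count as failures."""
    return ACTIONS.get(http_status, "give_up")


for code in (202, 200, 410):
    print(code, next_action(code))
```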
Best Practices
For Large Exports
- Use the _type parameter to limit resource types
- Use the _since parameter for incremental exports
- Monitor progress by polling the status endpoint
- Download files promptly, before the retention period expires
For Incremental Exports
Track the transactionTime from each export manifest and use it as the _since parameter for the next export:

    # First export
    GET /$export?_type=Patient
    # Returns transactionTime: 2024-01-15T14:30:00Z

    # Next incremental export
    GET /$export?_type=Patient&_since=2024-01-15T14:30:00Z

Security Considerations
- Export files require authentication to download
- Files are automatically cleaned up after retention_hours
- Consider network bandwidth for large exports
- Use _type to avoid exporting sensitive data
Limitations
Current implementation notes:
- Only the application/fhir+ndjson output format is supported
- Group export requires the Group resource to exist
- S3/cloud storage backend not yet implemented
- Compression (gzip) not yet supported