
Bulk Data Export

OctoFHIR implements the FHIR Bulk Data Access IG for exporting large datasets in NDJSON format. This is useful for analytics, data warehousing, and system migrations.

Bulk export follows the FHIR Asynchronous Request Pattern:

  1. Client initiates export with $export operation
  2. Server returns 202 Accepted with status URL
  3. Client polls status endpoint until complete
  4. Client downloads NDJSON files from manifest
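
The four steps above can be sketched as a small polling loop. This is a minimal sketch rather than an official client: `run_export`, the `send` callable, and its `(status, headers, body)` return shape are hypothetical names chosen so that any HTTP library can be plugged in. It also assumes the transport returns the `Content-Location` header under exactly that name.

```python
import json
import time
from typing import Callable, Dict, Tuple

# Hypothetical transport signature: (method, url, headers) -> (status, headers, body)
Response = Tuple[int, Dict[str, str], str]
Send = Callable[[str, str, Dict[str, str]], Response]


def run_export(send: Send, base_url: str, poll_seconds: float = 5.0) -> dict:
    """Kick off a system-level export and poll until the manifest is ready."""
    # Steps 1-2: initiate the export and read the status URL from Content-Location.
    status, headers, _ = send(
        "GET",
        f"{base_url}/$export",
        {"Prefer": "respond-async", "Accept": "application/fhir+json"},
    )
    if status != 202:
        raise RuntimeError(f"export was not accepted: HTTP {status}")
    status_url = headers["Content-Location"]

    # Step 3: keep polling while the server answers 202 Accepted.
    while True:
        status, _, body = send("GET", status_url, {})
        if status == 200:
            # Step 4 starts from here: the manifest lists the NDJSON files.
            return json.loads(body)
        if status != 202:
            raise RuntimeError(f"export failed: HTTP {status}")
        time.sleep(poll_seconds)
```

Keeping the transport behind `send` lets the loop be exercised without a server. A production client would likely add an overall timeout and back off between polls.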

Export all resources on the server:

GET /$export
Prefer: respond-async
Accept: application/fhir+json

Export all patient compartment data:

GET /Patient/$export
Prefer: respond-async
Accept: application/fhir+json

Export data for members of a specific group:

GET /Group/{id}/$export
Prefer: respond-async
Accept: application/fhir+json

The export operation accepts these query parameters:

Parameter      Type     Description
_outputFormat  string   Must be application/fhir+ndjson (default)
_since         instant  Only resources updated since this timestamp
_type          string   Comma-separated list of resource types to export
_typeFilter    string   FHIR search queries per type

Export only Patient and Observation resources:

GET /$export?_type=Patient,Observation

Export resources modified after a specific date:

GET /$export?_since=2024-01-01T00:00:00Z

Export with type-specific filters:

GET /$export?_type=Observation&_typeFilter=Observation%3Fcode%3D8867-4
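
The filter parameters can also be assembled programmatically. The following is a minimal sketch using only the Python standard library. `build_export_url` and its argument names are hypothetical. The sketch percent-encodes the `?` and `=` inside a `_typeFilter` value, which is the conservative choice for a query-string value, and leaves `,` and `:` readable.

```python
from urllib.parse import urlencode


def build_export_url(base_url, types=None, since=None, type_filters=None):
    """Build a system-level $export URL from optional filter parameters."""
    params = []
    if types:
        params.append(("_type", ",".join(types)))
    if since:
        params.append(("_since", since))
    for type_filter in type_filters or []:
        params.append(("_typeFilter", type_filter))
    # Leave "," and ":" as-is; other reserved characters are percent-encoded.
    query = urlencode(params, safe=",:")
    return f"{base_url}/$export" + (f"?{query}" if query else "")


print(build_export_url("http://server/fhir", types=["Patient", "Observation"]))
# http://server/fhir/$export?_type=Patient,Observation
print(build_export_url(
    "http://server/fhir",
    types=["Observation"],
    type_filters=["Observation?code=8867-4"],
))
# http://server/fhir/$export?_type=Observation&_typeFilter=Observation%3Fcode%3D8867-4
```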

To run a complete export, start with the kick-off request:

GET /$export?_type=Patient,Observation
Prefer: respond-async
Accept: application/fhir+json

Response: 202 Accepted

HTTP/1.1 202 Accepted
Content-Location: http://server/fhir/_async-status/550e8400-e29b-41d4-a716-446655440000

Poll the status URL until the export completes:

GET /_async-status/550e8400-e29b-41d4-a716-446655440000

Response (In Progress): 202 Accepted

{
  "status": "in_progress",
  "progress": 0.45,
  "message": "Exporting Observation resources..."
}

Response (Complete): 200 OK

{
  "transactionTime": "2024-01-15T14:30:00Z",
  "request": "http://server/fhir/$export?_type=Patient,Observation",
  "requiresAccessToken": true,
  "output": [
    {
      "type": "Patient",
      "url": "http://server/fhir/_bulk-files/550e8400.../Patient.ndjson",
      "count": 1500
    },
    {
      "type": "Observation",
      "url": "http://server/fhir/_bulk-files/550e8400.../Observation.ndjson",
      "count": 45000
    }
  ],
  "error": []
}
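
Once the manifest arrives, a client typically walks the `output` array to find out what to download. A minimal sketch, assuming the manifest shape shown above; `list_output_files` is a hypothetical helper, and the elided `550e8400...` URLs are copied from the example as-is:

```python
import json


def list_output_files(manifest: dict) -> list:
    """Return (resource type, URL, count) for every file in the manifest."""
    return [
        (entry["type"], entry["url"], entry.get("count"))
        for entry in manifest.get("output", [])
    ]


manifest = json.loads("""
{
  "transactionTime": "2024-01-15T14:30:00Z",
  "requiresAccessToken": true,
  "output": [
    {"type": "Patient",
     "url": "http://server/fhir/_bulk-files/550e8400.../Patient.ndjson",
     "count": 1500},
    {"type": "Observation",
     "url": "http://server/fhir/_bulk-files/550e8400.../Observation.ndjson",
     "count": 45000}
  ],
  "error": []
}
""")

files = list_output_files(manifest)
# Treat a missing count as zero when totalling.
total = sum(count or 0 for _, _, count in files)
print(total)  # 46500
```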

Download each NDJSON file from the manifest:

GET /_bulk-files/550e8400.../Patient.ndjson
Authorization: Bearer {token}

Response:

{"resourceType":"Patient","id":"1","name":[{"family":"Smith"}]}
{"resourceType":"Patient","id":"2","name":[{"family":"Jones"}]}
{"resourceType":"Patient","id":"3","name":[{"family":"Williams"}]}
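
Because each line is a complete JSON document, a downloaded file can be processed one line at a time without loading it fully into memory. A minimal sketch; `iter_ndjson` is a hypothetical helper, and the sample text is the response shown above:

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed resource per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


sample = '''{"resourceType":"Patient","id":"1","name":[{"family":"Smith"}]}
{"resourceType":"Patient","id":"2","name":[{"family":"Jones"}]}
{"resourceType":"Patient","id":"3","name":[{"family":"Williams"}]}
'''

families = [p["name"][0]["family"] for p in iter_ndjson(sample.splitlines())]
print(families)  # ['Smith', 'Jones', 'Williams']
```

The same generator accepts an open file object, since iterating over a text file also yields one line at a time.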

To cancel an in-progress export:

DELETE /_async-status/550e8400-e29b-41d4-a716-446655440000

Response: 204 No Content

Configure bulk export in octofhir.toml:

[bulk_export]
# Enable/disable bulk export (default: true)
enabled = true
# Directory for export files (default: "./exports")
export_path = "./exports"
# Maximum concurrent export jobs (default: 5)
max_concurrent_jobs = 5
# File retention before cleanup, in hours (default: 24)
retention_hours = 24
# Split files after this many resources (default: 100000)
max_resources_per_file = 100000
# Database query batch size (default: 1000)
batch_size = 1000
# Default resource types if _type not specified (empty = all types)
default_resource_types = []

Each output file contains one JSON resource per line (Newline Delimited JSON):

{"resourceType":"Patient","id":"1","name":[{"family":"Smith","given":["John"]}]}
{"resourceType":"Patient","id":"2","name":[{"family":"Jones","given":["Jane"]}]}

Large exports are automatically split into multiple files when max_resources_per_file is exceeded:

Patient.ndjson # First 100,000 patients
Patient.1.ndjson # Next 100,000 patients
Patient.2.ndjson # Remaining patients
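
A client can estimate how many files to expect for a given resource count. The sketch below infers the naming pattern from the listing above (`Patient.ndjson`, then `Patient.1.ndjson`, and so on). This is an assumption about the pattern, not a documented guarantee, and `split_file_names` is a hypothetical helper.

```python
def split_file_names(resource_type: str, total: int, max_per_file: int = 100000) -> list:
    """Return the expected file names for `total` resources of one type."""
    if max_per_file <= 0:
        raise ValueError("max_per_file must be positive")
    # Ceiling division: a partially filled last file still counts as a file.
    file_count = -(-total // max_per_file)
    names = []
    for index in range(file_count):
        # The first file has no numeric suffix; later files use .1, .2, ...
        suffix = "" if index == 0 else f".{index}"
        names.append(f"{resource_type}{suffix}.ndjson")
    return names


print(split_file_names("Patient", 250000))
# ['Patient.ndjson', 'Patient.1.ndjson', 'Patient.2.ndjson']
```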

By default, bulk export includes these common resource types:

  • Clinical: Patient, Observation, Condition, Procedure, DiagnosticReport
  • Medications: MedicationRequest, Medication
  • Encounters: Encounter, CarePlan, CareTeam
  • Allergies/Immunizations: AllergyIntolerance, Immunization
  • Administrative: Organization, Practitioner, PractitionerRole, Location
  • Documents: DocumentReference, Provenance

Use the _type parameter to limit to specific types.

If errors occur during export, they appear in the error array of the manifest:

{
  "output": [...],
  "error": [
    {
      "type": "OperationOutcome",
      "url": "http://server/fhir/_bulk-files/550e8400.../errors.ndjson"
    }
  ]
}
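
A client should check the `error` array before treating an export as clean. A minimal sketch, assuming the manifest shape shown above; `error_file_urls` is a hypothetical helper:

```python
def error_file_urls(manifest: dict) -> list:
    """Return the URLs of OperationOutcome files listed under "error"."""
    return [
        entry["url"]
        for entry in manifest.get("error", [])
        if entry.get("type") == "OperationOutcome"
    ]


manifest = {
    "output": [],
    "error": [
        {
            "type": "OperationOutcome",
            "url": "http://server/fhir/_bulk-files/550e8400.../errors.ndjson",
        }
    ],
}

urls = error_file_urls(manifest)
if urls:
    print(f"export finished with {len(urls)} error file(s)")
```

Each returned URL points to an NDJSON file of OperationOutcome resources, which can be downloaded and read like any other output file.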

HTTP Status  Meaning
202          Export accepted/in progress
200          Export complete
400          Invalid parameters
404          Job not found
410          Job cancelled
500          Export failed

To keep exports manageable:

  1. Use the _type parameter to limit resource types
  2. Use the _since parameter for incremental exports
  3. Monitor progress by polling the status endpoint
  4. Download files promptly, before the retention period expires

Track the transactionTime from each export manifest and use it as the _since parameter for the next export:

# First export
GET /$export?_type=Patient
# Returns transactionTime: 2024-01-15T14:30:00Z
# Next incremental export
GET /$export?_type=Patient&_since=2024-01-15T14:30:00Z
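
The bookkeeping between the two requests above can be sketched as follows. This is a minimal sketch: `next_export_url`, `record_export`, and the JSON state file are hypothetical choices for persisting the last `transactionTime`, not part of OctoFHIR.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urlencode


def next_export_url(base_url: str, resource_type: str, state_file: Path) -> str:
    """Build the next export URL, adding _since if a previous run was recorded."""
    params = {"_type": resource_type}
    if state_file.exists():
        params["_since"] = json.loads(state_file.read_text())["transactionTime"]
    return f"{base_url}/$export?" + urlencode(params, safe=",:")


def record_export(manifest: dict, state_file: Path) -> None:
    """Persist the manifest's transactionTime for the next incremental run."""
    state_file.write_text(json.dumps({"transactionTime": manifest["transactionTime"]}))


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "patient-export.json"
    first = next_export_url("http://server/fhir", "Patient", state)
    # After the first export completes, store its transactionTime.
    record_export({"transactionTime": "2024-01-15T14:30:00Z"}, state)
    second = next_export_url("http://server/fhir", "Patient", state)

print(first)   # http://server/fhir/$export?_type=Patient
print(second)  # http://server/fhir/$export?_type=Patient&_since=2024-01-15T14:30:00Z
```

Recording the server-supplied `transactionTime` rather than the client's clock avoids missing resources when the two clocks differ.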

Security and operational considerations:

  • Export files require authentication to download
  • Files are deleted automatically once retention_hours has elapsed
  • Consider network bandwidth for large exports
  • Use _type to exclude resource types that contain sensitive data you do not need

Current implementation notes:

  • Only application/fhir+ndjson output format is supported
  • Group export requires the Group resource to exist
  • S3/cloud storage backend not yet implemented
  • Compression (gzip) not yet supported