Implementation Details

This page provides a detailed look at the internal implementation of the OctoFHIR FHIRPath engine, covering the technical specifics that make it fast, reliable, and compliant with the FHIRPath specification.

Lexical Analysis Implementation

Token Types

The lexer recognizes and categorizes input into specific token types:

Identifiers: Field names, function names, and keywords
Literals: Strings, numbers, booleans, and dates
Operators: Arithmetic, comparison, and logical operators
Delimiters: Parentheses, brackets, and dots
Whitespace: Spaces, tabs, and newlines (typically ignored)

Tokenization Strategy

The lexer uses a state machine approach for efficient tokenization:

Single-pass scanning: Process input character by character
Lookahead buffering: Minimal lookahead for disambiguation
Error recovery: Continue tokenizing after encountering errors
Position tracking: Maintain line and column information for error reporting

String Handling

Special attention is paid to string literal processing:

Escape sequence handling: Support for standard escape sequences
Unicode support: Full Unicode character support
Quote handling: Both single and double quotes supported
Interpolation: Future support for string interpolation

Parser Implementation

Grammar Structure

The parser implements the FHIRPath grammar using recursive descent parsing:

Expression precedence: Proper operator precedence handling
Left-associativity: Correct associativity for operators
Function calls: Support for function invocation syntax
Path navigation: Dot notation and bracket notation

AST Node Types

The Abstract Syntax Tree uses an enum-based representation:

Expression:
  - Literal(value)
  - Identifier(name)
  - FunctionCall(name, args)
  - BinaryOp(left, op, right)
  - UnaryOp(op, expr)
  - Path(base, field)
  - Index(base, index)
  - Filter(base, condition)

Error Recovery

The parser implements sophisticated error recovery:

Synchronization points: Resume parsing at statement boundaries
Error cascading prevention: Avoid reporting multiple errors for single issues
Suggestion generation: Provide helpful suggestions for common mistakes
Partial AST construction: Build partial ASTs even with errors

Evaluation Engine

Value Representation

FHIRPath values are represented using a tagged union approach:

Primitive types: String, Integer, Decimal, Boolean, Date, DateTime
Complex types: Objects, Arrays, and custom FHIR types
Special values: Empty collections and null values
Type coercion: Automatic type conversion where appropriate

Context Management

The evaluator maintains evaluation context throughout execution:

Variable bindings: Support for variable assignment and lookup
Function scope: Proper scoping for function parameters
Resource context: Current FHIR resource being evaluated
Path context: Current position in the resource hierarchy

Function Library

Built-in functions are implemented as native Rust functions:

Collection functions: where(), select(), all(), any()
String functions: substring(), length(), matches()
Math functions: abs(), ceiling(), floor(), round()
Date functions: today(), now(), date arithmetic
Type functions: is(), as(), type checking

Lazy Evaluation

The engine implements lazy evaluation for performance:

Short-circuit evaluation: Boolean operations stop early when possible
Deferred computation: Only compute values when actually needed
Memoization: Cache expensive computations
Stream processing: Process large collections without full materialization

Memory Management

Allocation Strategy

The implementation uses several memory management techniques:

Stack allocation: Prefer stack allocation for temporary values
Arena allocation: Use arenas for AST nodes and temporary objects
Reference counting: Share immutable data using Rc<T>
Copy-on-write: Efficient string handling with Cow<str>

Garbage Collection Avoidance

Rust’s ownership system eliminates the need for garbage collection:

Deterministic cleanup: Objects are cleaned up when they go out of scope
No GC pauses: Predictable performance without garbage collection
Memory safety: Prevent use-after-free and double-free errors
Leak prevention: Automatic detection of potential memory leaks

Error Handling

Error Types

The implementation defines specific error types for different failure modes:

SyntaxError: Parsing errors with position information
TypeError: Type mismatch errors during evaluation
RuntimeError: Runtime errors like division by zero
ResourceError: FHIR resource format errors

Error Propagation

Errors are propagated using Rust’s Result type:

Early return: Use ? operator for clean error propagation
Error chaining: Chain errors to preserve context
Error conversion: Automatic conversion between error types
Error recovery: Attempt to continue processing when possible

Diagnostic Information

Rich diagnostic information is provided for all errors:

Position information: Line and column numbers for syntax errors
Context information: Show the problematic expression or value
Suggestion generation: Provide helpful suggestions when possible
Stack traces: Full stack traces for runtime errors

Performance Optimizations

Parsing Optimizations

Several optimizations are applied during parsing:

Operator precedence climbing: Efficient precedence parsing
Left-recursion elimination: Convert left-recursive rules
Memoization: Cache parsing results for repeated subexpressions
Incremental parsing: Reparse only changed portions (future)

Evaluation Optimizations

The evaluator includes numerous performance optimizations:

Constant folding: Evaluate constant expressions at parse time
Dead code elimination: Remove unreachable code paths
Inline expansion: Inline simple function calls
Loop optimization: Optimize common loop patterns

Data Structure Optimizations

Careful attention is paid to data structure efficiency:

Compact representations: Minimize memory footprint of data structures
Cache-friendly layouts: Arrange data for good cache locality
SIMD utilization: Use SIMD instructions where beneficial
Vectorization: Process multiple values simultaneously

Concurrency Implementation

Thread Safety

The implementation ensures thread safety through several mechanisms:

Immutable data structures: Most data is immutable after creation
Atomic operations: Use atomic operations for shared counters
Lock-free algorithms: Avoid locks where possible
Message passing: Use channels for communication between threads

Parallel Evaluation

Support for parallel evaluation of expressions:

Work stealing: Distribute work efficiently across threads
Parallel iterators: Use Rayon for parallel collection processing
Task decomposition: Break large tasks into smaller parallel tasks
Load balancing: Ensure even distribution of work

Language Binding Implementation

WebAssembly Bindings

The WASM bindings are implemented using wasm-bindgen:

Type marshalling: Efficient conversion between Rust and JavaScript types
Memory management: Proper cleanup of WASM memory
Error handling: Translate Rust errors to JavaScript exceptions
Async support: Support for asynchronous operations

Node.js Bindings

The Node.js bindings use NAPI-RS for native integration:

Zero-copy operations: Minimize copying between Rust and JavaScript
Async integration: Integrate with Node.js event loop
Buffer handling: Efficient handling of binary data
Error propagation: Proper error handling across language boundaries

CLI Implementation

The command-line interface is built using the clap crate:

Argument parsing: Comprehensive command-line argument handling
Shell completion: Generate completion scripts for popular shells
Streaming I/O: Process large files without loading into memory
Signal handling: Proper handling of interrupt signals

Testing Implementation

Unit Testing Strategy

Comprehensive unit testing covers all components:

Property-based testing: Use proptest for property-based testing
Fuzzing: Use cargo-fuzz for automated fuzzing
Benchmark testing: Use criterion for performance benchmarking
Coverage analysis: Track code coverage with tarpaulin

Integration Testing

Integration tests verify end-to-end functionality:

Official test suite: Run against the official FHIRPath test suite
Cross-platform testing: Test on multiple operating systems
Language binding testing: Test all language bindings
Performance regression testing: Detect performance regressions

Continuous Integration

Automated testing runs on every change:

GitHub Actions: Automated CI/CD pipeline
Multiple platforms: Test on Linux, macOS, and Windows
Multiple Rust versions: Test with stable, beta, and nightly Rust
Security scanning: Automated security vulnerability scanning

Debugging and Profiling

Debug Support

The implementation includes comprehensive debugging support:

Debug logging: Structured logging with multiple levels
AST visualization: Tools to visualize the parsed AST
Execution tracing: Trace expression evaluation step by step
Memory profiling: Tools to analyze memory usage patterns

Performance Profiling

Built-in support for performance analysis:

CPU profiling: Integration with standard profiling tools
Memory profiling: Track memory allocations and deallocations
Benchmark suite: Comprehensive benchmark suite for performance testing
Flame graphs: Generate flame graphs for performance analysis

Future Implementation Plans

Planned Optimizations

Several optimizations are planned for future releases:

JIT compilation: Compile frequently used expressions to native code
Query optimization: Apply database-style query optimization techniques
Vectorization: Use SIMD instructions for bulk operations
GPU acceleration: Leverage GPU for parallel processing

Language Support

Additional language bindings are planned:

Python bindings: Native Python integration using PyO3
Java bindings: JNI-based Java bindings
C bindings: C-compatible API for broader language support
Go bindings: CGO-based Go bindings

This implementation provides a solid foundation for high-performance FHIRPath evaluation while maintaining code quality, safety, and maintainability.