Implementation Details

This page provides a detailed look at the internal implementation of the OctoFHIR FHIRPath engine, covering the technical specifics that make it fast, reliable, and compliant with the FHIRPath specification.

The lexer recognizes and categorizes input into specific token types:

  • Identifiers: Field names, function names, and keywords
  • Literals: Strings, numbers, booleans, and dates
  • Operators: Arithmetic, comparison, and logical operators
  • Delimiters: Parentheses, brackets, and dots
  • Whitespace: Spaces, tabs, and newlines (typically ignored)

The lexer uses a state machine approach for efficient tokenization:

  • Single-pass scanning: Process input character by character
  • Lookahead buffering: Minimal lookahead for disambiguation
  • Error recovery: Continue tokenizing after encountering errors
  • Position tracking: Maintain line and column information for error reporting
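A minimal sketch of such a lexer follows. It keeps one character of lookahead, tracks 1-based line and column numbers, and records errors instead of stopping. The type names (`Lexer`, `Token`, `TokenKind`), the set of operator characters, and the omission of string and date literals are all assumptions made for illustration; this is not the engine's actual lexer.

```rust
/// Token categories, simplified from the list above (string and date
/// literals are omitted to keep the sketch short).
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Identifier(String),
    Number(String),
    Operator(String),
    Delimiter(char),
}

/// A token plus the 1-based position where it starts.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub line: usize,
    pub column: usize,
}

pub struct Lexer {
    chars: Vec<char>,
    pos: usize,
    line: usize,
    column: usize,
}

impl Lexer {
    pub fn new(input: &str) -> Self {
        Lexer { chars: input.chars().collect(), pos: 0, line: 1, column: 1 }
    }

    /// One character of lookahead, without consuming it.
    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// Consume one character and update line/column tracking.
    fn bump(&mut self) {
        if let Some(c) = self.peek() {
            self.pos += 1;
            if c == '\n' {
                self.line += 1;
                self.column = 1;
            } else {
                self.column += 1;
            }
        }
    }

    /// Consume characters while `pred` holds and return them as a String.
    fn take_while(&mut self, pred: impl Fn(char) -> bool) -> String {
        let mut s = String::new();
        while let Some(c) = self.peek().filter(|&c| pred(c)) {
            s.push(c);
            self.bump();
        }
        s
    }

    /// Single pass over the input. Errors are collected rather than
    /// returned early, so tokenizing continues after a bad character.
    pub fn tokenize(mut self) -> (Vec<Token>, Vec<String>) {
        let (mut tokens, mut errors) = (Vec::new(), Vec::new());
        while let Some(c) = self.peek() {
            let (line, column) = (self.line, self.column);
            let kind = if c.is_whitespace() {
                self.bump(); // whitespace separates tokens but is not emitted
                continue;
            } else if c.is_alphabetic() || c == '_' {
                TokenKind::Identifier(self.take_while(|c| c.is_alphanumeric() || c == '_'))
            } else if c.is_ascii_digit() {
                TokenKind::Number(self.take_while(|c| c.is_ascii_digit() || c == '.'))
            } else if "().[],".contains(c) {
                self.bump();
                TokenKind::Delimiter(c)
            } else if "=!<>+-*/|&".contains(c) {
                self.bump();
                let mut op = c.to_string();
                // Lookahead distinguishes `<` from `<=`, `!` from `!=`, etc.
                if "!<>".contains(c) && self.peek() == Some('=') {
                    self.bump();
                    op.push('=');
                }
                TokenKind::Operator(op)
            } else {
                self.bump();
                errors.push(format!("unexpected character '{}' at {}:{}", c, line, column));
                continue;
            };
            tokens.push(Token { kind, line, column });
        }
        (tokens, errors)
    }
}
```

For `name.given <= 2` followed by a newline and a stray `#`, this yields five tokens and one error reported at line 2, column 1.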

Special attention is paid to string literal processing:

  • Escape sequence handling: Support for standard escape sequences
  • Unicode support: Full Unicode character support
  • Quote handling: Both single and double quotes supported
  • Interpolation: String interpolation is planned for a future release
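Escape and Unicode handling could be sketched as a decoder for the body of a string literal. The function name `unescape` is hypothetical, and the accepted escape set is our reading of the FHIRPath grammar, so it should be checked against the specification before being relied on. Surrogate pairs written as two `\uXXXX` escapes are rejected rather than combined in this sketch.

```rust
/// Decode the body of a FHIRPath string literal (surrounding quotes
/// already removed). The escape set here is an assumption based on the
/// FHIRPath grammar.
pub fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c); // any Unicode character passes through unchanged
            continue;
        }
        let decoded = match chars.next() {
            Some('\'') => '\'',
            Some('"') => '"',
            Some('`') => '`',
            Some('\\') => '\\',
            Some('/') => '/',
            Some('f') => '\u{000C}',
            Some('n') => '\n',
            Some('r') => '\r',
            Some('t') => '\t',
            Some('u') => {
                // \uXXXX: exactly four hex digits. Surrogate halves fail
                // char::from_u32 and are reported as errors.
                let hex: String = chars.by_ref().take(4).collect();
                if hex.chars().count() != 4 || !hex.chars().all(|h| h.is_ascii_hexdigit()) {
                    return Err(format!("invalid \\u escape: \\u{}", hex));
                }
                let code = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
                char::from_u32(code).ok_or_else(|| format!("invalid code point U+{:04X}", code))?
            }
            Some(other) => return Err(format!("unknown escape sequence \\{}", other)),
            None => return Err("string ends with a lone backslash".to_string()),
        };
        out.push(decoded);
    }
    Ok(out)
}
```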

The parser implements the FHIRPath grammar using recursive descent parsing:

  • Expression precedence: Proper operator precedence handling
  • Left-associativity: Correct associativity for operators
  • Function calls: Support for function invocation syntax
  • Path navigation: Dot notation and bracket notation
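Precedence and left-associativity are commonly handled with precedence climbing, the technique the parsing-optimization list below also names. The sketch below is illustrative only: it assumes whitespace-separated tokens so it can skip the lexer, covers just integers, identifiers, parentheses, and the operators `*` `/` `+` `-` `=` `!=`, omits function calls and path navigation, and returns a parenthesized string instead of an AST. `Parser` and `parse` are hypothetical names.

```rust
/// Binding power per binary operator; higher binds tighter.
/// Relative order follows FHIRPath: `*` `/` above `+` `-` above `=` `!=`.
fn precedence(op: &str) -> Option<u8> {
    match op {
        "*" | "/" => Some(3),
        "+" | "-" => Some(2),
        "=" | "!=" => Some(1),
        _ => None,
    }
}

pub struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(input: &'a str) -> Self {
        Parser { tokens: input.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    /// primary := integer | identifier | '(' expr ')'
    fn primary(&mut self) -> Result<String, String> {
        let tok = self.peek().ok_or("unexpected end of input")?;
        self.pos += 1;
        if tok == "(" {
            let inner = self.expr(0)?;
            if self.peek() != Some(")") {
                return Err("expected ')'".to_string());
            }
            self.pos += 1;
            Ok(inner)
        } else if tok.chars().all(|c| c.is_alphanumeric()) {
            Ok(tok.to_string())
        } else {
            Err(format!("unexpected token '{}'", tok))
        }
    }

    /// Parse a sequence of operators whose precedence is >= `min_prec`.
    pub fn expr(&mut self, min_prec: u8) -> Result<String, String> {
        let mut lhs = self.primary()?;
        while let Some(op) = self.peek() {
            let prec = match precedence(op) {
                Some(p) if p >= min_prec => p,
                _ => break,
            };
            self.pos += 1;
            // Recursing with prec + 1 makes equal-precedence operators
            // group to the left: a - b - c parses as ((a - b) - c).
            let rhs = self.expr(prec + 1)?;
            lhs = format!("({} {} {})", lhs, op, rhs);
        }
        Ok(lhs)
    }
}

pub fn parse(input: &str) -> Result<String, String> {
    let mut parser = Parser::new(input);
    let out = parser.expr(0)?;
    match parser.peek() {
        None => Ok(out),
        Some(t) => Err(format!("unexpected trailing token '{}'", t)),
    }
}
```

Raising the minimum precedence by one for the right operand is what produces left-associativity; using the same precedence instead would make operators group to the right.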

The Abstract Syntax Tree uses an enum-based representation:

Expression:
- Literal(value)
- Identifier(name)
- FunctionCall(name, args)
- BinaryOp(left, op, right)
- UnaryOp(op, expr)
- Path(base, field)
- Index(base, index)
- Filter(base, condition)
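In Rust, that node list maps naturally onto a recursive enum with boxed children. The sketch below is an assumption about how the variants might be laid out, with small hypothetical `Literal`, `BinOp`, and `UnOp` types; it also assumes `Filter` is the node produced for `.where(...)`. The `render` helper exists only to make the tree's shape visible.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Integer(i64),
    String(String),
    Boolean(bool),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BinOp { Add, Sub, Eq, And }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum UnOp { Neg, Pos }

/// One variant per node kind in the list above; child nodes are boxed
/// so the recursive enum has a finite size.
#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
    Literal(Literal),
    Identifier(String),
    FunctionCall { name: String, args: Vec<Expression> },
    BinaryOp { left: Box<Expression>, op: BinOp, right: Box<Expression> },
    UnaryOp { op: UnOp, expr: Box<Expression> },
    Path { base: Box<Expression>, field: String },
    Index { base: Box<Expression>, index: Box<Expression> },
    Filter { base: Box<Expression>, condition: Box<Expression> },
}

impl Expression {
    /// Render the tree back to FHIRPath-like text. Binary operations are
    /// fully parenthesized and string literals are not re-escaped.
    pub fn render(&self) -> String {
        match self {
            Expression::Literal(Literal::Integer(i)) => i.to_string(),
            Expression::Literal(Literal::String(s)) => format!("'{}'", s),
            Expression::Literal(Literal::Boolean(b)) => b.to_string(),
            Expression::Identifier(name) => name.clone(),
            Expression::FunctionCall { name, args } => {
                let args: Vec<String> = args.iter().map(Expression::render).collect();
                format!("{}({})", name, args.join(", "))
            }
            Expression::BinaryOp { left, op, right } => {
                let op = match op {
                    BinOp::Add => "+",
                    BinOp::Sub => "-",
                    BinOp::Eq => "=",
                    BinOp::And => "and",
                };
                format!("({} {} {})", left.render(), op, right.render())
            }
            Expression::UnaryOp { op, expr } => {
                let sign = match op { UnOp::Neg => "-", UnOp::Pos => "+" };
                format!("{}{}", sign, expr.render())
            }
            Expression::Path { base, field } => format!("{}.{}", base.render(), field),
            Expression::Index { base, index } => format!("{}[{}]", base.render(), index.render()),
            // Assumption: Filter is the node produced for `.where(...)`.
            Expression::Filter { base, condition } => {
                format!("{}.where({})", base.render(), condition.render())
            }
        }
    }
}
```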

The parser recovers from syntax errors rather than stopping at the first one:

  • Synchronization points: Resume parsing at statement boundaries
  • Error cascading prevention: Avoid reporting multiple errors for single issues
  • Suggestion generation: Provide helpful suggestions for common mistakes
  • Partial AST construction: Build partial ASTs even with errors

FHIRPath values are represented using a tagged union approach:

  • Primitive types: String, Integer, Decimal, Boolean, Date, DateTime
  • Complex types: Objects, Arrays, and custom FHIR types
  • Special values: Empty collections and null values
  • Type coercion: Implicit conversion where the FHIRPath specification allows it, such as Integer to Decimal
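A tagged union for these values might look like the sketch below. The variant layout is an assumption: Decimal is stored as `f64` and Date/DateTime keep their ISO 8601 text purely for brevity, whereas an engine aiming at specification compliance would more likely use an exact decimal type and parsed temporal values. The helpers `as_decimal` and `numeric_eq` are hypothetical and show Integer-to-Decimal promotion before comparison.

```rust
use std::collections::BTreeMap;

/// A single FHIRPath item as a tagged union. Decimal as f64 and dates as
/// text are simplifications made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Integer(i64),
    Decimal(f64),
    Boolean(bool),
    Date(String),
    DateTime(String),
    /// A complex FHIR element: field name to the collection it holds.
    Object(BTreeMap<String, Collection>),
}

/// Every FHIRPath expression yields an ordered collection; an empty
/// collection is how "no value" is represented.
pub type Collection = Vec<Value>;

impl Value {
    /// Integer widens implicitly to Decimal; no other type converts here.
    pub fn as_decimal(&self) -> Option<f64> {
        match self {
            Value::Integer(i) => Some(*i as f64),
            Value::Decimal(d) => Some(*d),
            _ => None,
        }
    }
}

/// Numeric equality after Integer-to-Decimal promotion; `None` signals
/// that the operands are not both numeric.
pub fn numeric_eq(a: &Value, b: &Value) -> Option<bool> {
    Some(a.as_decimal()? == b.as_decimal()?)
}
```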

The evaluator maintains evaluation context throughout execution:

  • Variable bindings: Support for variable assignment and lookup
  • Function scope: Proper scoping for function parameters
  • Resource context: Current FHIR resource being evaluated
  • Path context: Current position in the resource hierarchy
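One way to hold this state is a context struct with a stack of variable scopes, as sketched below. `Context`, `bind`, `lookup`, and the push/pop scoping discipline are hypothetical; the resource is kept as raw JSON text only to keep the example self-contained.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Integer(i64),
    String(String),
}

/// Evaluation state threaded through the evaluator.
pub struct Context {
    /// The resource being evaluated (raw JSON text in this sketch).
    pub resource: String,
    /// Field names from the resource root to the current node.
    pub path: Vec<String>,
    /// Variable scopes, innermost last.
    scopes: Vec<HashMap<String, Value>>,
}

impl Context {
    pub fn new(resource: impl Into<String>) -> Self {
        Context { resource: resource.into(), path: Vec::new(), scopes: vec![HashMap::new()] }
    }

    /// Open a scope, e.g. when evaluating a function argument such as the
    /// criteria of where(); bindings made inside it shadow outer ones.
    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Close the innermost scope; the outermost scope is never removed.
    pub fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    pub fn bind(&mut self, name: &str, value: Value) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.insert(name.to_string(), value);
        }
    }

    /// Search from the innermost scope outward.
    pub fn lookup(&self, name: &str) -> Option<&Value> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }
}
```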

Built-in functions are implemented as native Rust functions:

  • Collection functions: where(), select(), all(), any()
  • String functions: substring(), length(), matches()
  • Math functions: abs(), ceiling(), floor(), round()
  • Date functions: today(), now(), date arithmetic
  • Type functions: is(), as(), type checking
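Native built-ins can share one function-pointer signature and be looked up by name. In the sketch below, `NativeFn`, `registry`, and `call` are hypothetical names, and the table is rebuilt on every call only for brevity. The single-item rule (empty input gives empty output, more than one input item is an error) follows the FHIRPath specification for functions such as `abs()` and `length()`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Integer(i64),
    Decimal(f64),
    String(String),
}

/// Signature shared by every native built-in: the input collection and
/// the already-evaluated arguments in, a result collection out.
pub type NativeFn = fn(&[Value], &[Value]) -> Result<Vec<Value>, String>;

/// Empty input yields empty output; more than one input item is an error.
fn single(input: &[Value]) -> Result<Option<&Value>, String> {
    match input {
        [] => Ok(None),
        [v] => Ok(Some(v)),
        _ => Err("expected at most one input item".to_string()),
    }
}

fn abs(input: &[Value], _args: &[Value]) -> Result<Vec<Value>, String> {
    Ok(match single(input)? {
        None => vec![],
        Some(Value::Integer(i)) => vec![Value::Integer(i.checked_abs().ok_or("integer overflow")?)],
        Some(Value::Decimal(d)) => vec![Value::Decimal(d.abs())],
        Some(other) => return Err(format!("abs() is not defined for {:?}", other)),
    })
}

fn length(input: &[Value], _args: &[Value]) -> Result<Vec<Value>, String> {
    Ok(match single(input)? {
        None => vec![],
        // Count Unicode characters rather than UTF-8 bytes.
        Some(Value::String(s)) => vec![Value::Integer(s.chars().count() as i64)],
        Some(other) => return Err(format!("length() is not defined for {:?}", other)),
    })
}

/// Name-to-function table consulted when a FunctionCall node is evaluated.
pub fn registry() -> HashMap<&'static str, NativeFn> {
    let mut table: HashMap<&'static str, NativeFn> = HashMap::new();
    table.insert("abs", abs);
    table.insert("length", length);
    table
}

pub fn call(name: &str, input: &[Value], args: &[Value]) -> Result<Vec<Value>, String> {
    let f = registry()
        .get(name)
        .copied()
        .ok_or_else(|| format!("unknown function {}()", name))?;
    f(input, args)
}
```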

The engine implements lazy evaluation for performance:

  • Short-circuit evaluation: Boolean operations stop early when possible
  • Deferred computation: Only compute values when actually needed
  • Memoization: Cache expensive computations
  • Stream processing: Process large collections without full materialization
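Short-circuiting in FHIRPath has to respect its three-valued Boolean logic, where an operand can be true, false, or empty. The sketch below models empty as `None` and passes the right operand as a closure so it is evaluated only when the left operand does not already decide the result. The truth tables follow the FHIRPath specification; the names `tri_and` and `tri_or` are hypothetical.

```rust
/// A FHIRPath Boolean operand: Some(true), Some(false), or empty (None).
pub type Tri = Option<bool>;

/// `and`: false on the left decides the result without evaluating the right.
pub fn tri_and(left: Tri, right: impl FnOnce() -> Tri) -> Tri {
    match left {
        Some(false) => Some(false), // false and X = false
        Some(true) => right(),      // true and X = X
        None => match right() {
            Some(false) => Some(false), // empty and false = false
            _ => None,                  // empty and (true | empty) = empty
        },
    }
}

/// `or`: true on the left decides the result without evaluating the right.
pub fn tri_or(left: Tri, right: impl FnOnce() -> Tri) -> Tri {
    match left {
        Some(true) => Some(true), // true or X = true
        Some(false) => right(),   // false or X = X
        None => match right() {
            Some(true) => Some(true), // empty or true = true
            _ => None,                // empty or (false | empty) = empty
        },
    }
}
```

Passing the right operand as a closure is what turns this into deferred computation: an expensive subexpression such as a nested `where()` never runs when the left side settles the answer.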

The implementation uses several memory management techniques:

  • Stack allocation: Prefer stack allocation for temporary values
  • Arena allocation: Use arenas for AST nodes and temporary objects
  • Reference counting: Share immutable data using Rc<T>
  • Copy-on-write: Efficient string handling with Cow<str>
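The copy-on-write and reference-counting bullets can be illustrated with the standard library alone. `decode_literal` is a hypothetical helper that borrows the input when no escapes are present and allocates only otherwise; it treats every escaped character literally, which is correct only for `\'` and `\\`. `Resource` and `share` are likewise hypothetical and show several evaluations holding one parsed resource through `Rc<T>` without copying it.

```rust
use std::borrow::Cow;
use std::rc::Rc;

/// Borrow the literal body unchanged (no allocation) unless it contains a
/// backslash; only then build an owned copy. Simplified: the character
/// after a backslash is taken literally, which is right for \' and \\ only.
pub fn decode_literal(body: &str) -> Cow<'_, str> {
    if !body.contains('\\') {
        return Cow::Borrowed(body);
    }
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}

/// A parsed resource that several evaluations may read at once.
pub struct Resource {
    pub json: String,
}

/// Hand out `n` shared handles; each clone bumps a counter, not the data.
pub fn share(resource: Resource, n: usize) -> Vec<Rc<Resource>> {
    let rc = Rc::new(resource);
    (0..n).map(|_| Rc::clone(&rc)).collect()
}
```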

Rust’s ownership system eliminates the need for garbage collection:

  • Deterministic cleanup: Objects are cleaned up when they go out of scope
  • No GC pauses: Predictable performance without garbage collection
  • Memory safety: Prevent use-after-free and double-free errors
  • Leak mitigation: Ownership frees most memory automatically, but Rust does not detect leaks; reference cycles between Rc<T> values, for example, are never freed

The implementation defines specific error types for different failure modes:

  • SyntaxError: Parsing errors with position information
  • TypeError: Type mismatch errors during evaluation
  • RuntimeError: Runtime errors like division by zero
  • ResourceError: FHIR resource format errors
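These failure modes could be modeled as variants of a single error enum that implements `Display` and `std::error::Error`, as sketched below. Using one enum named `FhirPathError` and the exact message formats are assumptions; the source names only the four categories.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub enum FhirPathError {
    /// Parsing failure, with the 1-based position of the offending token.
    SyntaxError { message: String, line: usize, column: usize },
    TypeError { expected: String, found: String },
    RuntimeError(String),
    ResourceError(String),
}

impl fmt::Display for FhirPathError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FhirPathError::SyntaxError { message, line, column } => {
                write!(f, "syntax error at {}:{}: {}", line, column, message)
            }
            FhirPathError::TypeError { expected, found } => {
                write!(f, "type error: expected {}, found {}", expected, found)
            }
            FhirPathError::RuntimeError(msg) => write!(f, "runtime error: {}", msg),
            FhirPathError::ResourceError(msg) => write!(f, "resource error: {}", msg),
        }
    }
}

// Implementing the standard trait lets callers box or chain these errors.
impl std::error::Error for FhirPathError {}
```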

Errors are propagated using Rust’s Result type:

  • Early return: Use ? operator for clean error propagation
  • Error chaining: Chain errors to preserve context
  • Error conversion: From implementations let the ? operator convert between error types automatically
  • Error recovery: Attempt to continue processing when possible
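The first three techniques work together in ordinary Rust, as the sketch below shows: a `From` implementation lets `?` convert a lower-level `ParseIntError` on early return, and `source()` keeps that cause reachable for chaining. `EvalError` and `add_literals` are hypothetical names chosen for the example.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug, PartialEq)]
pub enum EvalError {
    /// Wraps the lower-level cause instead of discarding it.
    InvalidInteger(ParseIntError),
    Overflow,
}

impl fmt::Display for EvalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EvalError::InvalidInteger(_) => write!(f, "invalid integer literal"),
            EvalError::Overflow => write!(f, "integer overflow"),
        }
    }
}

impl Error for EvalError {
    /// Expose the wrapped cause so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            EvalError::InvalidInteger(e) => Some(e),
            EvalError::Overflow => None,
        }
    }
}

/// Lets `?` turn a ParseIntError into an EvalError automatically.
impl From<ParseIntError> for EvalError {
    fn from(e: ParseIntError) -> Self {
        EvalError::InvalidInteger(e)
    }
}

pub fn add_literals(a: &str, b: &str) -> Result<i64, EvalError> {
    let x: i64 = a.parse()?; // early return on failure, converted via From
    let y: i64 = b.parse()?;
    x.checked_add(y).ok_or(EvalError::Overflow)
}
```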

Rich diagnostic information is provided for all errors:

  • Position information: Line and column numbers for syntax errors
  • Context information: Show the problematic expression or value
  • Suggestion generation: Provide helpful suggestions when possible
  • Stack traces: Full stack traces for runtime errors
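Position, context, and suggestion information can be combined into a compiler-style message. `render_diagnostic` and its output layout are hypothetical; the sketch assumes columns count characters and that every character occupies one terminal cell, so wide or combining characters would misplace the caret.

```rust
/// Format an error with the offending source line, a caret under the
/// 1-based `column`, and an optional suggestion.
pub fn render_diagnostic(
    source: &str,
    line: usize,
    column: usize,
    message: &str,
    suggestion: Option<&str>,
) -> String {
    let text = source.lines().nth(line.saturating_sub(1)).unwrap_or("");
    // One space per preceding character puts the caret under `column`.
    let pad = " ".repeat(column.saturating_sub(1));
    let mut out = format!(
        "error: {}\n --> {}:{}\n  | {}\n  | {}^",
        message, line, column, text, pad
    );
    if let Some(hint) = suggestion {
        out.push_str(&format!("\n  = help: {}", hint));
    }
    out
}
```

For a misspelled function name, the caret lands under the first character of the unknown identifier and the help line carries the suggested correction.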

Several optimizations are applied during parsing:

  • Operator precedence climbing: Efficient precedence parsing
  • Left-recursion elimination: Convert left-recursive rules
  • Memoization: Cache parsing results for repeated subexpressions
  • Incremental parsing: Reparse only changed portions (future)

The evaluator includes numerous performance optimizations:

  • Constant folding: Evaluate constant expressions at parse time
  • Dead code elimination: Remove unreachable code paths
  • Inline expansion: Inline simple function calls
  • Loop optimization: Optimize common loop patterns
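Constant folding can be sketched as a bottom-up rewrite over a small expression tree. The `Expr` type, the `i64` integer width, and the choice to leave overflowing folds in place (so the evaluator can report them at run time) are assumptions for illustration only.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Fold children first, then replace an operator whose operands are both
/// literals with its value. Folds that would overflow are left unchanged.
pub fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Int(a), Expr::Int(b)) if a.checked_add(b).is_some() => Expr::Int(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Int(a), Expr::Int(b)) if a.checked_mul(b).is_some() => Expr::Int(a * b),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Literals and identifiers are already as simple as they get.
        other => other,
    }
}
```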

Careful attention is paid to data structure efficiency:

  • Compact representations: Minimize memory footprint of data structures
  • Cache-friendly layouts: Arrange data for good cache locality
  • SIMD utilization: Use SIMD instructions where beneficial
  • Vectorization: Process multiple values simultaneously

The implementation ensures thread safety through several mechanisms:

  • Immutable data structures: Most data is immutable after creation
  • Atomic operations: Use atomic operations for shared counters
  • Lock-free algorithms: Avoid locks where possible
  • Message passing: Use channels for communication between threads
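Three of these mechanisms can be demonstrated together with the standard library: an expression that is immutable apart from an atomic counter is shared by reference across scoped threads, and each worker reports its partial result over a channel. `Compiled`, `count_matches`, and the substring check standing in for real evaluation are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::thread;

/// A compiled expression. Nothing changes after construction except the
/// atomic counter, so `&Compiled` is shared across threads without a lock.
pub struct Compiled {
    field: String,
    pub evaluations: AtomicUsize,
}

impl Compiled {
    pub fn new(field: &str) -> Self {
        Compiled { field: field.to_string(), evaluations: AtomicUsize::new(0) }
    }

    /// Toy stand-in for evaluation: does the resource JSON contain the key?
    pub fn matches(&self, resource: &str) -> bool {
        // Lock-free shared counter; Relaxed ordering suffices for a statistic.
        self.evaluations.fetch_add(1, Ordering::Relaxed);
        resource.contains(&format!("\"{}\":", self.field))
    }
}

/// Split `resources` across up to `workers` scoped threads; each worker
/// sends its partial count back over a channel.
pub fn count_matches(expr: &Compiled, resources: &[String], workers: usize) -> usize {
    let workers = workers.max(1);
    let chunk_size = ((resources.len() + workers - 1) / workers).max(1);
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for part in resources.chunks(chunk_size) {
            let tx = tx.clone();
            s.spawn(move || {
                let n = part.iter().filter(|r| expr.matches(r)).count();
                tx.send(n).expect("receiver is still alive");
            });
        }
    });
    drop(tx); // all senders gone, so the receiver's iterator can finish
    rx.iter().sum()
}
```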

Support for parallel evaluation of expressions:

  • Work stealing: Distribute work efficiently across threads
  • Parallel iterators: Use Rayon for parallel collection processing
  • Task decomposition: Break large tasks into smaller parallel tasks
  • Load balancing: Ensure even distribution of work

The WASM bindings are implemented using wasm-bindgen:

  • Type marshalling: Efficient conversion between Rust and JavaScript types
  • Memory management: Proper cleanup of WASM memory
  • Error handling: Translate Rust errors to JavaScript exceptions
  • Async support: Support for asynchronous operations

The Node.js bindings use NAPI-RS for native integration:

  • Zero-copy operations: Minimize copying between Rust and JavaScript
  • Async integration: Integrate with Node.js event loop
  • Buffer handling: Efficient handling of binary data
  • Error propagation: Proper error handling across language boundaries

The command-line interface is built using the clap crate:

  • Argument parsing: Comprehensive command-line argument handling
  • Shell completion: Generate completion scripts for popular shells
  • Streaming I/O: Process large files without loading into memory
  • Signal handling: Proper handling of interrupt signals
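The streaming bullet can be sketched with `std::io::BufRead`: records are read and evaluated one line at a time, so memory use is bounded by the largest record rather than the whole file. This assumes newline-delimited input (one resource per line); `process_stream` is a hypothetical name, and argument parsing with clap and signal handling are not shown. A real CLI could pass `std::io::stdin().lock()` as the reader.

```rust
use std::io::{self, BufRead, Write};

/// Apply `eval` to each non-blank line of `input`, writing one result line
/// per record to `output`. Returns the number of records processed.
pub fn process_stream<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    mut eval: impl FnMut(&str) -> String,
) -> io::Result<usize> {
    let mut count = 0;
    for line in input.lines() {
        let line = line?; // propagate read errors
        if line.trim().is_empty() {
            continue; // skip blank lines between records
        }
        writeln!(output, "{}", eval(&line))?;
        count += 1;
    }
    output.flush()?;
    Ok(count)
}
```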

Component-level testing combines several complementary techniques:

  • Property-based testing: Check invariants against randomized inputs with proptest
  • Fuzzing: Use cargo-fuzz for automated fuzzing
  • Benchmark testing: Use criterion for performance benchmarking
  • Coverage analysis: Track code coverage with tarpaulin

Integration tests verify end-to-end functionality:

  • Official test suite: Run against the official FHIRPath test suite
  • Cross-platform testing: Test on multiple operating systems
  • Language binding testing: Test all language bindings
  • Performance regression testing: Detect performance regressions

Automated testing runs on every change:

  • GitHub Actions: Automated CI/CD pipeline
  • Multiple platforms: Test on Linux, macOS, and Windows
  • Multiple Rust versions: Test with stable, beta, and nightly Rust
  • Security scanning: Automated security vulnerability scanning

The implementation includes comprehensive debugging support:

  • Debug logging: Structured logging with multiple levels
  • AST visualization: Tools to visualize the parsed AST
  • Execution tracing: Trace expression evaluation step by step
  • Memory profiling: Tools to analyze memory usage patterns

Built-in support for performance analysis:

  • CPU profiling: Integration with standard profiling tools
  • Memory profiling: Track memory allocations and deallocations
  • Benchmark suite: Comprehensive benchmark suite for performance testing
  • Flame graphs: Generate flame graphs for performance analysis

Several optimizations are planned for future releases:

  • JIT compilation: Compile frequently used expressions to native code
  • Query optimization: Apply database-style query optimization techniques
  • Vectorization: Use SIMD instructions for bulk operations
  • GPU acceleration: Leverage GPU for parallel processing

Additional language bindings are planned:

  • Python bindings: Native Python integration using PyO3
  • Java bindings: JNI-based Java bindings
  • C bindings: C-compatible API for broader language support
  • Go bindings: CGO-based Go bindings

This implementation provides a solid foundation for high-performance FHIRPath evaluation while maintaining code quality, safety, and maintainability.