Bidi and RTL: A Practical Guide to Bidirectional Text Handling

Samir Haddad
2025-07-04
12 min read

Handling bidirectional (Bidi) text is tricky. This guide gives developers practical rules, examples, and pitfalls when working with mixed LTR and RTL content.

Many modern applications must handle languages that are written right-to-left (RTL), such as Arabic, Hebrew, and Persian, alongside left-to-right (LTR) languages like English. Mixed-direction text introduces unique challenges for rendering, cursor movement, selection, and string processing. The Unicode Bidirectional Algorithm (UBA) provides the technical foundation, but real-world implementation requires practical rules and testing. This guide turns the UBA into actionable steps that developers and designers can apply.

Fundamental concepts

The Unicode Bidirectional Algorithm assigns directional classes to characters. These include:

  • Strong types: L for left-to-right scripts, R and AL for right-to-left scripts.
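  • Weak types: European and Arabic digits (EN, AN), number separators and terminators such as commas and currency signs, and combining marks. Their direction depends on the characters next to them.
  • Neutral types: whitespace and most other punctuation, such as parentheses and quotation marks, which take their direction from the surrounding strong characters.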

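When text contains runs of different directionalities, the UBA resolves an embedding level for each character and reorders the characters for display so that the visual appearance matches reading direction expectations. The stored (logical) order of the string does not change; only the display order does.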
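To make the classes concrete, here is a minimal sketch that prints the bidi class of each character using Python's standard unicodedata module. The helper name bidi_classes and the sample string are illustrative assumptions, and the sketch only reports classes; it does not resolve embedding levels or reorder anything, which is what a full UBA implementation such as ICU does.

```python
import unicodedata


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode Bidi_Class short name."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Latin letter, space, Hebrew letter, digit, comma, Arabic letter
sample = "a \u05d0 1, \u0627"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")

# Classes you should see in the output:
#   'a'      -> L   (strong LTR)
#   U+05D0   -> R   (strong RTL, Hebrew)
#   U+0627   -> AL  (strong RTL, Arabic)
#   '1'      -> EN  (weak, European number)
#   ','      -> CS  (weak, common separator)
#   ' '      -> WS  (neutral, whitespace)
```

Printing code points rather than the characters themselves keeps the output readable in terminals that apply their own bidi reordering.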

Common scenarios and how to handle them

1. Displaying mixed text (e.g., English inside Arabic)

By default, browsers and modern text engines apply the UBA. However, problems arise when directionality is ambiguous, for example when a paragraph starts with a neutral character or with a word in the other script. To avoid surprises, set directionality explicitly. In HTML, prefer the dir attribute over the CSS direction property, because direction is part of the content rather than its styling:

  • Use dir="rtl" on container elements where the default reading direction is RTL.
  • For inline LTR fragments inside an RTL block, wrap them in an element with dir="ltr".
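As a rough illustration of the two rules above, the following sketch builds an RTL paragraph on the server side and wraps an inline LTR fragment in a span that carries the HTML dir attribute. The function names wrap_dir and rtl_paragraph are hypothetical helpers, not part of any framework, and the lang value is an assumption about the content.

```python
from html import escape


def wrap_dir(fragment: str, direction: str = "ltr") -> str:
    """Escape a text fragment and wrap it in a span with an explicit dir."""
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError(f"unsupported direction: {direction!r}")
    return f'<span dir="{direction}">{escape(fragment)}</span>'


def rtl_paragraph(inner_html: str, lang: str = "ar") -> str:
    """Wrap already-escaped HTML in an RTL paragraph (lang is an assumption)."""
    return f'<p dir="rtl" lang="{lang}">{inner_html}</p>'


product = wrap_dir("Node.js & npm")
print(product)  # <span dir="ltr">Node.js &amp; npm</span>

# Arabic text is written with escapes so the source stays readable in any editor.
arabic_prefix = escape("\u0646\u0633\u062a\u062e\u062f\u0645 ")
print(rtl_paragraph(arabic_prefix + product))
```

Escaping the fragment before wrapping it keeps user-supplied text from injecting markup, which matters because mixed-direction strings often come from user input.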

2. Embedding codes and overrides

Unicode provides control characters such as RLE, LRE, PDF, RLO, and LRO to start and end embeddings and overrides. These are powerful but risky: they are invisible, they persist when text is copied or stored, and a missing terminator can affect the rendering of everything that follows. Prefer markup-level directionality when it is available.
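Because these characters are invisible, it helps to be able to locate them. The sketch below reports the position and name of each explicit directional formatting character in a string. The table goes slightly beyond the five controls named above by also listing the isolate controls (LRI, RLI, FSI, PDI) that Unicode defines; the function name find_bidi_controls is an illustrative assumption.

```python
# Explicit directional formatting characters defined by Unicode.
BIDI_FORMATTING = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for every bidi formatting character in text."""
    return [
        (index, BIDI_FORMATTING[ch])
        for index, ch in enumerate(text)
        if ch in BIDI_FORMATTING
    ]


print(find_bidi_controls("invoice\u202e fdp.exe\u202c"))
# [(7, 'RLO'), (16, 'PDF')]
```

A report like this is useful in logs and validation errors, where silently dropping the characters would hide the fact that they were present.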

3. Numbers and punctuation

The digits of a number are displayed left to right even in an RTL context, while the surrounding content flows RTL. This leads to visually mixed runs, and adjacent separators or currency signs can land in unexpected positions. Libraries and rendering engines provide number shaping and locale-aware formatting. Keep numeric data in separate elements where possible so it can be styled and given a direction independently.
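As a small illustration of digit substitution, one part of what number shaping involves, the sketch below maps ASCII digits to Arabic-Indic digits and shows that the two sets carry different bidi classes. The helper to_arabic_indic is a hypothetical name, and this is not locale-aware formatting; grouping separators, decimal marks, and locale preferences are better left to a library such as ICU.

```python
import unicodedata

# U+0660..U+0669 are the Arabic-Indic digits zero through nine.
_ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
_TO_ARABIC_INDIC = str.maketrans("0123456789", _ARABIC_INDIC_DIGITS)


def to_arabic_indic(text: str) -> str:
    """Replace ASCII digits with Arabic-Indic digits; other characters pass through."""
    return text.translate(_TO_ARABIC_INDIC)


converted = to_arabic_indic("2025")
print([f"U+{ord(ch):04X}" for ch in converted])
# ['U+0662', 'U+0660', 'U+0662', 'U+0665']

# The logical order of the digits is unchanged; only the glyph set differs.
print(unicodedata.bidirectional("2"))       # EN (European number)
print(unicodedata.bidirectional("\u0662"))  # AN (Arabic number)
```

The differing classes matter because the UBA resolves EN and AN slightly differently next to separators and letters, so converted numbers deserve their own test cases.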

4. Cursor movement and selection

Caret movement in a text box may follow visual order or logical order depending on the platform, and the two diverge at direction boundaries. When building custom editors, rely on the platform's text APIs where possible. If you must implement custom caret logic, work with grapheme clusters and the reordering information provided by Unicode libraries.
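To show why caret logic should step over clusters rather than code points, here is a deliberately simplified sketch that attaches combining marks to the preceding base character. It is an approximation and not an implementation of Unicode's grapheme cluster rules (UAX #29): it does not handle emoji sequences, Hangul syllables, or zero-width joiners, so a production editor should use a library's break iterator instead. The name approx_clusters is an illustrative assumption.

```python
import unicodedata


def approx_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Approximation only: uses the canonical combining class, not UAX #29.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the previous base
        else:
            clusters.append(ch)
    return clusters


# Arabic beh + fatha (a vowel mark), followed by a second beh.
word = "\u0628\u064e\u0628"
print(len(word))                   # 3 code points
print(len(approx_clusters(word)))  # 2 caret stops
```

A caret that stops between the base letter and its vowel mark lets the user delete half of what they perceive as one character, which is the bug this grouping is meant to avoid.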

Practical recommendations

  • Set a clear document or container direction: Use lang and dir attributes. This gives the rendering system context for ambiguous characters.
  • Use markup to separate logical regions: Wrap different direction runs in elements with explicit directionality rather than sprinkling control characters.
  • Test with real content: Use samples that include punctuation, parentheses, numbers, and neutral characters to catch edge cases.
  • Sanitize invisible controls: Remove or normalize embedding and override control codes unless you explicitly need them.
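When the direction of a piece of content is not known in advance, for example a user-supplied comment, one common heuristic is to take the direction of the first strong character. The sketch below is a simplified version of that idea; it ignores text inside isolates, which the full UBA paragraph rule skips, and the function name and default are assumptions. The result could be used as the value of a dir attribute.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess base direction from the first strong character.

    Returns "ltr" or "rtl"; falls back to `default` when the text has
    no strong characters (digits and punctuation only, or empty).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default


print(first_strong_direction("123 \u05e9\u05dc\u05d5\u05dd abc"))  # rtl
print(first_strong_direction("Hello \u0645\u0631\u062d\u0628\u0627"))  # ltr
print(first_strong_direction("42 (!)", default="rtl"))  # rtl
```

The heuristic guesses wrong for RTL sentences that begin with a Latin brand name, so a stored, author-supplied direction is preferable whenever one is available.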

Accessibility considerations

Screen readers depend on correct language and direction information. Ensure that elements with RTL text carry a lang attribute so that screen readers apply the right pronunciation rules. Also verify that the readout order matches the visual order users see.

Troubleshooting common bugs

Here are a few common bugs and how to debug them:

  • Flipped punctuation: Parentheses and quotes may appear mirrored, and trailing punctuation may jump to the wrong side of a phrase. Wrap the punctuation together with the phrase it belongs to in a span with the explicit direction of that phrase.
  • Broken alignment: Layout engines sometimes mix logical and visual ordering when aligning elements. Use CSS text-align and direction consistently on containers.
  • Invisible control characters in copy-paste: They can propagate through text. Offer a "clean text" option that strips bidi controls and normalizes spaces.
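A "clean text" option along the lines of the last item could be sketched as follows. This is one possible design, not a standard routine: it removes embedding, override, and isolate controls, leaves the directional marks (LRM, RLM, ALM) in place unless asked to strip them, and collapses all runs of Unicode whitespace, including line breaks and no-break spaces, into single spaces. The name clean_text and the strip_marks parameter are assumptions.

```python
import re

# Embeddings and overrides (U+202A..U+202E) and isolates (U+2066..U+2069).
_BIDI_FORMATTING = re.compile("[\u202a-\u202e\u2066-\u2069]")
# Directional marks: LRM, RLM, and the Arabic letter mark.
_BIDI_MARKS = re.compile("[\u200e\u200f\u061c]")
# For str patterns, \s matches Unicode whitespace such as U+00A0.
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str, strip_marks: bool = False) -> str:
    """Strip bidi formatting controls and normalize whitespace."""
    text = _BIDI_FORMATTING.sub("", text)
    if strip_marks:
        text = _BIDI_MARKS.sub("", text)
    # Note: this also collapses newlines and tabs into single spaces.
    return _WHITESPACE.sub(" ", text).strip()


pasted = "\u202eabc\u202c\u00a0\u00a0def "
print(repr(clean_text(pasted)))  # 'abc def'
```

Keeping the marks by default is a cautious choice, since authors sometimes insert them on purpose to fix punctuation placement; removing any of these characters can change how the text renders, so the cleaned result is worth previewing.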

Libraries and tooling

Leverage robust libraries rather than rolling your own:

  • ICU includes bidi utilities.
  • Platform APIs in browsers expose directionality features and line breaking behavior.
  • High-level frameworks often implement bidi-aware rendering but always verify in the target environment.

Testing checklist

  1. Test simple RTL-only and LTR-only strings.
  2. Test mixed strings with embedded numbers and punctuation.
  3. Test UI components like dropdowns, date pickers, and code editors.
  4. Test copy-paste between systems, including Windows, macOS, and mobile.
  5. Test with assistive technologies like screen readers.

Conclusion

Bidi and RTL support is nuanced but manageable with the right strategy. Set direction explicitly at the container level, avoid invisible bidi control characters unless required, and test with realistic multilingual content. Most importantly, treat bidi as a first-class requirement when your application targets global audiences.

Final note: Correct bidi handling is a combination of standards knowledge (the UBA) and engineering pragmatism: use platform features and validate with real content.


Samir Haddad

Frontend engineer & accessibility advocate
