From 9cd0a33ff9f9e188659e3ee3d6b922ebc98dfafb Mon Sep 17 00:00:00 2001
From: Rodin
Date: Thu, 30 Apr 2026 15:08:25 -0700
Subject: [PATCH] docs: unsafe patterns from rust-lang/rust

10 patterns, 624 lines. Full spec compliance.

Patterns: // SAFETY: comments, unsafe fn contracts, safe wrappers,
MaybeUninit, transmute, raw pointers, unsafe impl Send/Sync,
NonNull/PhantomData, extern "C" FFI, type-encoded invariants.
---
 patterns/unsafe-patterns.md | 624 ++++++++++++++++++++++++++++++++++++
 1 file changed, 624 insertions(+)
 create mode 100644 patterns/unsafe-patterns.md

diff --git a/patterns/unsafe-patterns.md b/patterns/unsafe-patterns.md
new file mode 100644
index 0000000..c89c79b
--- /dev/null
+++ b/patterns/unsafe-patterns.md
@@ -0,0 +1,624 @@
+# Rust Unsafe Patterns
+
+Patterns for using unsafe code correctly in Rust, extracted from
+the standard library source.
+
+**Source:** [rust-lang/rust](https://github.com/rust-lang/rust) at commit
+[`f53b654`](https://github.com/rust-lang/rust/tree/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6)
+
+**Stats:** 31,244 unsafe blocks, 7,091 unsafe fn declarations,
+2,463 `// SAFETY:` comments, 9,061 transmute usages, 928 MaybeUninit
+usages, 710 ptr::read/write/copy calls, 489 extern "C" blocks.
+
+---
+
+## 1. // SAFETY: Comment on Every Unsafe Block
+
+### Source:
+
+[library/core/src/slice/mod.rs](https://github.com/rust-lang/rust/blob/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6/library/core/src/slice/mod.rs)
+
+2,463 `// SAFETY:` comments in library/.
+
+```rust
+// library/core/src/slice/mod.rs
+pub fn split_at(&self, mid: usize) -> (&[T], &[T]) {
+    assert!(mid <= self.len());
+    // SAFETY: `[ptr; mid]` and `[mid; len]` are inside `self`, which
+    // fulfills the requirements of `split_at_unchecked`.
+    unsafe { self.split_at_unchecked(mid) }
+}
+```
+
+### Why
+
+Every unsafe block must prove soundness at the point of use. The
+comment is a proof obligation: "I assert that the following
+invariants hold HERE because..." This is how unsafe code gets
+audited — reviewers check the comment against the requirements.
+
+### When to Use
+
+**Triggers:**
+- Every `unsafe { }` block (no exceptions)
+- Explain WHY it's safe, not WHAT the code does
+- Reference specific invariants from the unsafe fn's `# Safety` docs
+
+**Example — before:**
+```rust
+unsafe {
+    ptr::copy_nonoverlapping(src, dst, len);
+}
+// No comment — reviewer has no idea if this is actually safe
+```
+
+**Example — after:**
+```rust
+// SAFETY: `src` and `dst` are both derived from `self.buf` which
+// is a contiguous allocation. `src` points to index `self.head` and
+// `dst` points to index 0. They don't overlap because head > 0
+// (checked by the if-guard above). `len` is bounded by capacity
+// minus head, ensuring we don't read past the allocation.
+unsafe {
+    ptr::copy_nonoverlapping(src, dst, len);
+}
+```
+
+### When NOT to Use
+
+**This pattern is ALWAYS required.** There is no "when not to use."
+If you have an unsafe block without a SAFETY comment, it's
+incomplete.
+
+---
+
+## 2. unsafe fn for Precondition Contracts
+
+### Source:
+
+[library/core/src/slice/mod.rs](https://github.com/rust-lang/rust/blob/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6/library/core/src/slice/mod.rs) (get_unchecked)
+
+7,091 unsafe fn declarations in library/.
+
+```rust
+/// Returns a reference to an element, without doing bounds checking.
+///
+/// # Safety
+///
+/// Calling this method with an out-of-bounds index is
+/// *[undefined behavior]* even if the resulting reference is not used.
+pub unsafe fn get_unchecked<I>(&self, index: I) -> &I::Output
+where
+    I: SliceIndex<Self>,
+{ ... }
+```
+
+### Why
+
+`unsafe fn` shifts the proof obligation to the CALLER. The function
+says "I'm correct IF you uphold these preconditions." The `# Safety`
+doc section is the contract. Without `unsafe`, Rust guarantees
+safety; with it, YOU guarantee safety.
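A minimal, self-contained sketch of both sides of this contract. The names `ascii_to_str` and `ascii_prefix` are hypothetical and do not come from the standard library; only `std::str::from_utf8_unchecked` is a real API. The `unsafe fn` states its precondition under `# Safety`, and the safe caller discharges it with a check right before the call.

```rust
/// Reinterprets `bytes` as a string slice without validating it.
///
/// # Safety
///
/// Every byte in `bytes` must be ASCII (less than 0x80).
unsafe fn ascii_to_str(bytes: &[u8]) -> &str {
    // SAFETY: the caller guarantees ASCII input, and ASCII is valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(bytes) }
}

/// Returns the longest all-ASCII prefix of `bytes` as a `&str`.
fn ascii_prefix(bytes: &[u8]) -> &str {
    // Find the first non-ASCII byte, or take the whole slice.
    let end = bytes
        .iter()
        .position(|b| !b.is_ascii())
        .unwrap_or(bytes.len());
    // SAFETY: every byte in `bytes[..end]` was just checked to be ASCII,
    // which is exactly the precondition of `ascii_to_str`.
    unsafe { ascii_to_str(&bytes[..end]) }
}
```

With these definitions, `ascii_prefix(b"abc\xffdef")` returns `"abc"`. The caller never hands unchecked bytes to the `unsafe fn`, so the proof sits next to the call that needs it.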
+
+### When to Use
+
+**Triggers:**
+- The function has preconditions that can't be checked at runtime
+  (or checking would be too expensive)
+- Performance-critical inner loops where bounds checking matters
+- The function wraps raw pointer operations
+
+**Example — before:**
+```rust
+// Safe version — always checks (correct but slower in hot paths)
+pub fn get(&self, index: usize) -> Option<&T> {
+    if index < self.len() {
+        // SAFETY: index < len was checked on the line above
+        Some(unsafe { &*self.ptr.add(index) })
+    } else {
+        None
+    }
+}
+```
+
+**Example — after:**
+```rust
+// Unsafe version — skips the check (caller's responsibility)
+/// # Safety
+///
+/// `index` must be less than `self.len()`.
+pub unsafe fn get_unchecked(&self, index: usize) -> &T {
+    // SAFETY: caller guarantees index < len
+    unsafe { &*self.ptr.add(index) }
+}
+```
+
+### When NOT to Use
+
+**Don't use this when:**
+- You can validate inputs cheaply (just check and panic/return Err)
+- The function is public API that regular users will call
+- Performance isn't critical (safe version is always preferred)
+
+---
+
+## 3. Safe Wrapper Around Unsafe Core
+
+### Source:
+
+This is THE fundamental pattern of Rust's stdlib. Many safe public
+APIs are thin wrappers that validate inputs and then call unsafe
+internals.
+
+```rust
+// The pattern: safe API → validate → unsafe impl
+pub fn split_at(&self, mid: usize) -> (&[T], &[T]) {
+    assert!(mid <= self.len()); // ← validation
+    // SAFETY: assertion above guarantees mid is in bounds
+    unsafe { self.split_at_unchecked(mid) } // ← unsafe core
+}
+```
+
+### Why
+
+This is how Rust achieves both safety AND performance. The safe
+wrapper provides the guarantee. The unsafe core provides the speed.
+Users get safety by default; experts opt into `_unchecked` when they
+can prove the preconditions themselves.
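The wrapper-plus-core split can be sketched as a compilable pair of free functions. `swap_unchecked` and `swap_checked` are hypothetical names chosen for illustration (the standard library's own slice `swap` is not implemented this way line for line); `std::ptr::swap` is the real API doing the work.

```rust
use std::ptr;

/// Swaps two elements without bounds checks.
///
/// # Safety
///
/// `a` and `b` must both be less than `v.len()`.
unsafe fn swap_unchecked(v: &mut [i32], a: usize, b: usize) {
    let base = v.as_mut_ptr();
    // SAFETY: the caller guarantees both indices are in bounds, so both
    // pointers are valid and aligned. `ptr::swap` permits `a == b`.
    unsafe { ptr::swap(base.add(a), base.add(b)) }
}

/// Safe wrapper: validates the indices, then delegates to the unsafe core.
fn swap_checked(v: &mut [i32], a: usize, b: usize) -> bool {
    if a >= v.len() || b >= v.len() {
        return false; // reject instead of risking an out-of-bounds access
    }
    // SAFETY: both indices were just checked against `v.len()`.
    unsafe { swap_unchecked(v, a, b) };
    true
}
```

Returning `bool` instead of panicking is a design choice for this sketch; an `assert!` in the wrapper, as in `split_at`, works equally well.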
+
+### When to Use
+
+**Triggers:**
+- You have an operation that's unsafe in general but can be made
+  safe with runtime checks
+- You want to offer both safe and unsafe versions
+- The safe version is the default; unsafe is the opt-in optimization
+
+**Example — before:**
+```rust
+// Only unsafe — forces ALL callers to use unsafe
+pub unsafe fn index(&self, i: usize) -> &T {
+    &*self.ptr.add(i)
+}
+```
+
+**Example — after:**
+```rust
+// Safe default (what most users call):
+pub fn index(&self, i: usize) -> &T {
+    assert!(i < self.len(), "index {i} out of bounds (len {})", self.len());
+    // SAFETY: we just verified i < len
+    unsafe { self.index_unchecked(i) }
+}
+
+// Unsafe escape hatch (for performance-critical code):
+/// # Safety
+/// `i` must be less than `self.len()`.
+pub unsafe fn index_unchecked(&self, i: usize) -> &T {
+    // SAFETY: caller guarantees i < len
+    unsafe { &*self.ptr.add(i) }
+}
+```
+
+### When NOT to Use
+
+**Don't use this when:**
+- The safe version has no overhead worth avoiding (just be safe)
+- The precondition can't be expressed as a simple check
+- Only internal code will ever call the unsafe version
+
+---
+
+## 4. MaybeUninit for Uninitialized Memory
+
+### Source:
+
+[library/core/src/mem/maybe_uninit.rs](https://github.com/rust-lang/rust/blob/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6/library/core/src/mem/maybe_uninit.rs)
+
+928 MaybeUninit usages in the stdlib.
+
+```rust
+use std::mem::MaybeUninit;
+
+// `read_into` is a hypothetical helper that initializes a prefix of
+// `buf` and returns the length of that prefix.
+let mut buf = [MaybeUninit::<u8>::uninit(); 1024];
+let len = read_into(&mut buf)?;
+// SAFETY: read_into guarantees buf[..len] is initialized, and
+// MaybeUninit<u8> has the same layout as u8.
+let initialized: &[u8] =
+    unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<u8>(), len) };
+```
+
+### Why
+
+Rust requires every value to be initialized before it is used.
+`MaybeUninit<T>` opts out of this requirement for performance
+(avoiding zeroing large buffers).
+It tells the compiler "this might not be initialized yet — don't
+assume anything."
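A small compilable sketch of the element-by-element case, using only stable APIs (`MaybeUninit::uninit`, `MaybeUninit::write`, `mem::transmute`). The function name `squares` and the fixed length 4 are assumptions made for illustration.

```rust
use std::mem::MaybeUninit;

/// Builds `[0, 1, 4, 9]` one element at a time without zeroing first.
fn squares() -> [u32; 4] {
    // An array of `MaybeUninit` needs no initialization of its own.
    let mut slots = [MaybeUninit::<u32>::uninit(); 4];
    for (i, slot) in slots.iter_mut().enumerate() {
        // `write` stores a value without reading or dropping old contents.
        slot.write((i * i) as u32);
    }
    // SAFETY: the loop above wrote all 4 slots, and
    // `[MaybeUninit<u32>; 4]` has the same layout as `[u32; 4]`.
    unsafe { std::mem::transmute::<[MaybeUninit<u32>; 4], [u32; 4]>(slots) }
}
```

The only `unsafe` step is the final conversion, and its proof depends on nothing outside the function: the loop bound and the array length are the same constant.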
+
+### When to Use
+
+**Triggers:**
+- Buffer allocation without initialization overhead
+- FFI where C code fills in the data
+- Building arrays element-by-element without Default requirement
+- Performance-critical allocation hot paths
+
+**Example — before:**
+```rust
+// Zeroing 1MB for no reason — the OS will fill it immediately
+let mut buf = vec![0u8; 1_000_000];
+let n = file.read(&mut buf)?; // overwrites the first n zeros anyway
+```
+
+**Example — after:**
+```rust
+let mut buf: Vec<u8> = Vec::with_capacity(1_000_000);
+// `read_uninit` is a hypothetical helper (for example a thin wrapper
+// over an OS read call) that fills a prefix of the spare capacity and
+// returns how many bytes it initialized. `Read::read` itself cannot be
+// used here because it takes `&mut [u8]`, not `&mut [MaybeUninit<u8>]`.
+let n = read_uninit(&mut file, buf.spare_capacity_mut())?;
+// SAFETY: read_uninit initialized exactly the first `n` spare bytes,
+// and `n` cannot exceed the spare capacity.
+unsafe { buf.set_len(n) };
+```
+
+### When NOT to Use
+
+**Don't use this when:**
+- Default/zeroed memory is fine (clarity > micro-optimization)
+- You're not sure how many bytes will be initialized
+- The type has drop glue (forgetting to call `assume_init` leaks)
+
+---
+
+## 5. transmute for Type Reinterpretation
+
+### Source:
+
+[library/core/src/mem/mod.rs](https://github.com/rust-lang/rust/blob/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6/library/core/src/mem/mod.rs) (transmute)
+
+9,061 transmute usages (many in generated code/architecture intrinsics).
+
+```rust
+// SAFETY: u8 and i8 have the same size and any bit pattern is valid
+let signed: i8 = unsafe { std::mem::transmute::<u8, i8>(byte) };
+```
+
+### Why
+
+`transmute` reinterprets the bits of one type as another type. It is
+among the most dangerous unsafe operations: beyond a compile-time
+size check, it bypasses type checking entirely.
+The stdlib uses it for zero-cost conversions between types with
+identical bit representations.
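To make the trade-off concrete, the sketch below puts a `transmute` next to the safe standard-library call that does the same job. The three function names are hypothetical; `f32::from_bits` and the `as` cast are real, stable alternatives.

```rust
/// Reinterprets the bits of a `u32` as an `f32` with `transmute`.
fn bits_to_f32_transmute(bits: u32) -> f32 {
    // SAFETY: `u32` and `f32` are both 4 bytes, and every 32-bit
    // pattern is a valid `f32` value.
    unsafe { std::mem::transmute::<u32, f32>(bits) }
}

/// The same conversion through the safe standard-library API.
fn bits_to_f32_safe(bits: u32) -> f32 {
    f32::from_bits(bits)
}

/// `u8` to `i8` needs no `unsafe` at all: `as` keeps the bit pattern.
fn byte_to_signed(byte: u8) -> i8 {
    byte as i8
}
```

Both float conversions map `0x3f80_0000` to `1.0`, and `byte_to_signed(0xff)` is `-1`. When a safe equivalent like these exists, it is usually the better choice, and recent compilers may emit a lint suggesting it.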
+
+### When to Use
+
+**Triggers:**
+- Converting between types with identical memory layout
+- Enum discriminant inspection
+- FFI type conversions
+
+### When NOT to Use
+
+**Don't use this when:**
+- `From`/`Into` can do the conversion safely
+- `as` casting works (numeric conversions)
+- The types might have different sizes (`transmute` rejects this at
+  compile time; `transmute_copy` and pointer casts do not)
+- There are invalid bit patterns for the target type
+
+### Anti-pattern
+
+```rust
+// DON'T: transmute between types with different validity
+let x: u8 = 255;
+let b: bool = unsafe { std::mem::transmute(x) };
+// UB! bool can only be 0 or 1
+
+// DO: use safe conversion
+let b: bool = x != 0;
+```
+
+---
+
+## 6. Raw Pointers (ptr::read, ptr::write, ptr::copy)
+
+### Source:
+
+710 ptr operations (read/write/copy/drop_in_place) in library/.
+
+```rust
+use std::ptr;
+
+// SAFETY: src is valid for reads of `count` values of T, aligned,
+// and initialized. dst is valid for writes of `count` values of T
+// and aligned. The two regions don't overlap.
+unsafe {
+    ptr::copy_nonoverlapping(src, dst, count);
+}
+```
+
+### Why
+
+Raw pointers bypass the borrow checker. They're needed for:
+implementing data structures, FFI, and performance-critical code.
+The `ptr` module provides the building blocks for common operations,
+mostly as `unsafe fn`s with documented contracts.
+
+### When to Use
+
+**Triggers:**
+- Implementing custom collections (Vec, LinkedList)
+- Moving values without running Drop
+- FFI (C gives you raw pointers)
+- Pointer arithmetic for buffer management
+
+### When NOT to Use
+
+**Don't use this when:**
+- References (&T, &mut T) work (almost always)
+- You can use safe abstractions (Vec, Box, slice methods)
+- You're using raw pointers to "work around" the borrow checker
+  (fix the design instead)
+
+---
+
+## 7. unsafe impl Send/Sync
+
+### Source:
+
+[library/core/src/marker.rs](https://github.com/rust-lang/rust/blob/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6/library/core/src/marker.rs)
+
+274 unsafe impl Send/Sync in the stdlib.
+
+```rust
+// library/alloc/src/sync.rs (allocator parameter omitted for brevity)
+unsafe impl<T: ?Sized + Sync + Send> Send for Arc<T> {}
+unsafe impl<T: ?Sized + Sync + Send> Sync for Arc<T> {}
+```
+
+### Why
+
+Types that contain raw pointers are `!Send` and `!Sync` by default,
+which is the conservative choice.
+If you've built a type that IS safe to share across threads (e.g.,
+using atomic operations internally), you must explicitly opt in
+with `unsafe impl`.
+
+### When to Use
+
+**Triggers:**
+- Your type contains raw pointers but IS thread-safe
+- You use atomic operations for all shared access
+- The type wraps a C library that's documented as thread-safe
+
+**Example — before:**
+```rust
+struct SharedData {
+    ptr: *mut u8, // raw pointer → auto !Send, !Sync
+}
+// Can't use in thread::spawn — even if it's actually safe
+```
+
+**Example — after:**
+```rust
+struct SharedData {
+    ptr: *mut u8,
+    // internally uses atomic operations for all access
+}
+
+// SAFETY: SharedData uses atomic operations for all mutations
+// and the underlying data is never accessed without synchronization.
+unsafe impl Send for SharedData {}
+unsafe impl Sync for SharedData {}
+```
+
+### When NOT to Use
+
+**Don't use this when:**
+- You're not 100% certain the type is thread-safe
+- The type uses non-atomic interior mutability (Cell, RefCell)
+- You haven't proven that no data races are possible
+
+---
+
+## 8. NonNull and PhantomData for Safe Abstractions
+
+### Source:
+
+[library/core/src/ptr/non_null.rs](https://github.com/rust-lang/rust/blob/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6/library/core/src/ptr/non_null.rs)
+
+```rust
+// NonNull is used instead of *mut T to encode the "never null" invariant.
+// Simplified layout; the real Vec stores its pointer inside RawVec.
+pub struct Vec<T> {
+    ptr: NonNull<T>,         // never null — can use niche optimization
+    len: usize,
+    cap: usize,
+    _marker: PhantomData<T>, // tells compiler Vec "owns" T values
+}
+```
+
+### Why
+
+`NonNull<T>` wraps a raw pointer with a "not null" invariant.
+This lets the compiler use the null bit pattern as the `None` value,
+so `Option<NonNull<T>>` is the same size as a raw pointer.
+`PhantomData<T>` tells the compiler about ownership/variance
+without storing T.
+
+### When to Use
+
+**Triggers:**
+- You have a raw pointer that's never null by construction
+- You want `Option<YourType>` to be pointer-sized
+- You need correct drop checking behavior (PhantomData)
+
+### When NOT to Use
+
+**Don't use this when:**
+- The pointer CAN be null (use `Option<NonNull<T>>` or `*mut T`)
+- You don't need the niche optimization
+- A reference (&T, &mut T) would work
+
+---
+
+## 9. extern "C" for FFI
+
+### Source:
+
+489 `extern "C"` blocks in the stdlib.
+
+```rust
+use std::ffi::c_char;
+
+extern "C" {
+    fn strlen(s: *const c_char) -> usize;
+    fn memcpy(dst: *mut u8, src: *const u8, n: usize) -> *mut u8;
+}
+```
+
+### Why
+
+`extern "C"` declares functions using the C calling convention.
+This is how Rust calls into C libraries. Functions declared in an
+`extern` block are `unsafe` to call by default because Rust can't
+verify C's behavior. (The 2024 edition requires writing the block as
+`unsafe extern "C"`.)
+
+### When to Use
+
+**Triggers:**
+- Calling C/C++ libraries from Rust
+- Providing Rust functions callable from C
+- OS system calls
+
+**Example — before:**
+```rust
+// Re-implementing in Rust what already exists in C:
+fn my_strlen(s: &[u8]) -> usize {
+    s.iter().position(|&b| b == 0).unwrap_or(s.len())
+}
+```
+
+**Example — after:**
+```rust
+use std::ffi::{CStr, c_char};
+
+extern "C" {
+    fn strlen(s: *const c_char) -> usize;
+}
+
+fn safe_strlen(s: &CStr) -> usize {
+    // SAFETY: CStr is nul-terminated, which strlen requires
+    unsafe { strlen(s.as_ptr()) }
+}
+```
+
+### When NOT to Use
+
+**Don't use this when:**
+- A safe Rust equivalent exists (prefer pure Rust)
+- The C library isn't well-documented (you can't prove safety)
+- You only need it on one platform (consider cfg + fallback)
+
+---
+
+## 10. Invariant Encoding in Types (Making Invalid States Unrepresentable)
+
+### Source:
+
+The stdlib encodes invariants in the type system, reducing the
+surface area where unsafe is needed:
+
+```rust
+// NonZero<T> — can never be zero (compiler enforces this)
+pub struct NonZero<T>(T); // where T is a primitive integer (simplified)
+
+// str — ALWAYS valid UTF-8 (unsafe to construct from arbitrary bytes)
+// &str methods never need to re-validate
+
+// Pin<Ptr> — the pointed-to value will never move
+pub struct Pin<Ptr> { pointer: Ptr }
+```
+
+### Why
+
+The safest unsafe code is code that doesn't exist. By encoding
+invariants in types, you push the unsafe boundary to construction
+and then never need unsafe again. `str` is ALWAYS valid UTF-8 —
+every `&str` method can assume this without checking.
+
+### When to Use
+
+**Triggers:**
+- You have an invariant that many functions depend on
+- Validating the invariant is expensive (do it once at construction)
+- The invariant can be expressed as a type distinction
+
+**Example — before:**
+```rust
+// Every function must check the invariant
+fn process(data: &[u8]) -> Result<Output, Error> {
+    if !is_valid_utf8(data) {
+        return Err(Error::InvalidUtf8);
+    }
+    // ... 10 more functions all repeat this check
+}
+```
+
+**Example — after:**
+```rust
+// Validate once at the boundary, then it's always true
+struct ValidatedInput(String); // String is always valid UTF-8
+
+impl ValidatedInput {
+    pub fn new(data: &[u8]) -> Result<Self, std::str::Utf8Error> {
+        let s = std::str::from_utf8(data)?;
+        Ok(Self(s.to_owned()))
+    }
+}
+
+fn process(input: &ValidatedInput) -> Output {
+    // No validation needed — the type guarantees it
+}
+```
+
+### When NOT to Use
+
+**Don't use this when:**
+- The invariant is trivial to check (just check it)
+- The type would make the API confusing
+- Creating the validated type requires unsafe (might not be worth it)
+
+---
+
+## Summary: Unsafe Decision Tree
+
+```
+Do you need unsafe?
+├── Can you use a safe API? → NO UNSAFE (always prefer this)
+├── Performance-critical inner loop → Safe wrapper + unsafe core
+├── FFI (calling C) → extern "C" + safe wrapper
+├── Custom data structure → raw pointers + NonNull + PhantomData
+└── Thread safety assertion → unsafe impl Send/Sync
+
+Writing an unsafe block?
+├── Add // SAFETY: comment (MANDATORY)
+├── What invariants does the unsafe op require?
+├── How are those invariants guaranteed HERE?
+└── Would a reviewer agree with your proof?
+
+Designing an unsafe fn?
+├── Document # Safety section (contract with caller)
+├── What must the caller guarantee?
+├── Can you offer a safe alternative? (almost always yes)
+└── Name it with _unchecked suffix
+```
+
+| Pattern | Use when |
+|---|---|
+| `// SAFETY:` comment | Every `unsafe {}` block |
+| `unsafe fn` | Preconditions callers must guarantee |
+| Safe wrapper + unsafe core | Public API with bounds/validity checks |
+| `MaybeUninit` | Avoiding unnecessary initialization |
+| `transmute` | Zero-cost type reinterpretation |
+| `ptr::read`/`write`/`copy` | Custom data structure internals |
+| `unsafe impl Send/Sync` | Asserting thread safety for raw-pointer types |
+| `NonNull` + `PhantomData` | Encoding invariants in pointer wrappers |
+| `extern "C"` | FFI (calling C libraries) |
+| Type-encoded invariants | Make invalid states unrepresentable |
+
+See also:
+- [documentation.md](documentation.md) — # Safety doc sections
+- [concurrency.md](concurrency.md) — Send/Sync auto traits
+- [ownership.md](ownership.md) — Raw pointers vs references
+- [traits.md](traits.md) — Marker traits and sealed traits
+