Files

T

Rodin f270fc68c1 docs: add smell files (anti-patterns + common mistakes)

anti-patterns.md: 10 entries, 490 lines — what the stdlib avoids
common-mistakes.md: 10 entries, 508 lines — language transition smells

Covers: unwrap in library code, String errors, unsafe without SAFETY,
clone as borrow-checker escape, Deref as inheritance, panic for
recoverable errors, god structs, Box<dyn Any>, static mut, stringly-typed APIs,
Arc<Mutex> everywhere, collect then iterate, index loops, try/catch
mentality, null sentinels, OOP hierarchies, global state, impl vs dyn.

2026-04-30 16:01:52 -07:00

15 KiB

Raw Permalink Blame History

Anti-Patterns: What Rust's Stdlib Avoids (and Why)

Patterns the Rust standard library team actively avoids, extracted from studying what they DON'T do in their source code.

Source: rust-lang/rust at commit f53b654

1. unwrap() in Library Code

What they avoid: Using .unwrap() in non-test, non-example code without an invariant proof.

Source evidence: 1,790 unwrap/expect calls in library/ (non-test). Concentrated in: (1) proven invariants with expect() messages, (2) compile-time constants, (3) doc examples. Zero bare .unwrap() calls where the value could legitimately be None/Err at runtime.

Why it's bad: unwrap() panics. In library code, a panic kills the caller's thread without giving them a chance to handle the error. Libraries should never make that decision for their users.

What to do instead:

// BAD — library code that panics on user input
pub fn parse_config(input: &str) -> Config {
    let value: serde_json::Value = serde_json::from_str(input).unwrap();
    Config { name: value["name"].as_str().unwrap().to_string() }
}

// GOOD — returns Result, caller decides how to handle
pub fn parse_config(input: &str) -> Result<Config, ConfigError> {
    let value: serde_json::Value = serde_json::from_str(input)?;
    let name = value["name"].as_str()
        .ok_or(ConfigError::MissingField("name"))?;
    Ok(Config { name: name.to_string() })
}

When to Apply This Rule

Triggers:

.unwrap() in any pub fn that takes external input
.unwrap() without a comment explaining why it can't fail
.unwrap() in a library (as opposed to a binary/application)

Exceptions

It's OK when:

Proven invariant: self.items.last().expect("items is never empty after init")
Compile-time constant: Regex::new(r"^\d+$").unwrap() (pattern is always valid)
Test code: tests should panic on unexpected errors
Application code where you WANT to crash on bad state

2. String as Error Type

What they avoid: Using String or &str as the error type in Result<T, String>.

Source evidence: 0 public functions in library/ return Result<T, String>. Every error is a named struct or enum that implements the Error trait.

Why it's bad: Can't match on it programmatically. Callers must string-compare to determine what went wrong. No type safety. No structured data. Breaks the entire Error trait ecosystem.

What to do instead:

// BAD — stringly-typed error
fn connect(addr: &str) -> Result<Connection, String> {
    if !valid(addr) { return Err(format!("invalid address: {addr}")); }
    if timeout() { return Err("connection timed out".to_string()); }
    Ok(conn)
}
// Caller: if err.contains("timeout") { ... } — fragile!

// GOOD — typed error enum
#[derive(Debug)]
enum ConnectError {
    InvalidAddress(String),
    Timeout(Duration),
    Refused,
}
impl fmt::Display for ConnectError { ... }
impl std::error::Error for ConnectError {}

fn connect(addr: &str) -> Result<Connection, ConnectError> { ... }
// Caller: match err { ConnectError::Timeout(d) => retry(d), ... }

When to Apply This Rule

Triggers:

Result<T, String> in any function signature
Error handling that does string matching
Multiple different failure modes lumped into one String

Exceptions

It's OK when:

Quick scripts/prototypes that will be thrown away
Internal one-off tools where the only consumer is a human reading logs

3. Unsafe Without SAFETY Comment

What they avoid: Any unsafe { } block without an accompanying // SAFETY: comment explaining the soundness proof.

Source evidence: 2,463 // SAFETY: comments for 31,244 unsafe blocks. The density is lower than 1:1 because many unsafe blocks are one-liners inside already-documented unsafe functions. But every PUBLIC-facing unsafe block has a comment.

Why it's bad: Without the comment, no one (including future-you) can audit whether the unsafe code is actually sound. It might be correct today but break after a refactor that invalidates an assumption no one documented.

What to do instead:

// BAD — unsafe with no justification
unsafe {
    ptr::copy_nonoverlapping(src, dst, len);
}

// GOOD — proves soundness at the call site
// SAFETY: src points to self.buf[0..self.len] and dst points to
// out[0..self.len]. They're from different allocations so they
// can't overlap. len is bounded by self.len which we checked above.
unsafe {
    ptr::copy_nonoverlapping(src, dst, len);
}

When to Apply This Rule

Triggers:

Every unsafe { } block (no exceptions)
The comment should explain WHY, not just WHAT

Exceptions

None. This rule has no exceptions in the stdlib.

4. Clone to Satisfy the Borrow Checker

What they avoid: Sprinkling .clone() calls to make the borrow checker happy instead of restructuring the code.

Source evidence: The stdlib uses clone deliberately for explicit copies, never as a borrow-checker escape hatch. Clone calls are concentrated in: (1) actual need for independent copies, (2) Cow clone-on-write, (3) test setup.

Why it's bad: Hides the real problem (ownership structure is wrong). Adds unnecessary allocations. Makes performance unpredictable. The borrow checker error is telling you your design has a flaw.

What to do instead:

// BAD — cloning because the borrow checker complains
fn process(data: &mut Vec<String>) {
    for item in data.clone().iter() {  // CLONE to avoid borrow conflict
        if item.is_empty() {
            data.retain(|s| !s.is_empty());
        }
    }
}

// GOOD — restructure to avoid the conflict
fn process(data: &mut Vec<String>) {
    data.retain(|s| !s.is_empty());  // just do what you actually want
}

When to Apply This Rule

Triggers:

.clone() adjacent to a borrow checker error comment
Clone inside a loop (O(n) allocations)
Cloning a large struct just to use one field

Exceptions

It's OK when:

You genuinely need two independent copies (fork a value)
The type is cheap to clone (small Copy types, Arc)
Prototyping to get the logic right before optimizing

5. Deref as Inheritance

What they avoid: Using the Deref trait to simulate inheritance from OOP languages.

Source evidence: Every Deref impl in the stdlib follows the "smart pointer" pattern: Box→T, Vec→[T], String→str, Arc→T. Zero cases of "struct A derefs to struct B" where B isn't the logical inner content.

Why it's bad: Deref coercion is implicit — the compiler inserts it silently. If Dog derefs to Animal, users can accidentally call Animal methods on Dog without realizing they're crossing an abstraction boundary. Method resolution becomes unpredictable.

What to do instead:

// BAD — using Deref as inheritance
struct Animal { name: String }
struct Dog { animal: Animal, tricks: Vec<String> }

impl Deref for Dog {
    type Target = Animal;
    fn deref(&self) -> &Animal { &self.animal }
}
// dog.name works — but it's confusing, not actually inheritance

// GOOD — explicit delegation
struct Dog { name: String, tricks: Vec<String> }

impl Dog {
    fn name(&self) -> &str { &self.name }
}

When to Apply This Rule

Triggers:

Deref where the target type isn't "contained" content
Using Deref to avoid writing delegation methods
Deref between two types at the same abstraction level

Exceptions

It's OK when:

Your type IS a smart pointer (wraps and owns one inner value)
The target is clearly "the thing inside" (Vec→[T], Box→T)
Deref target is at a lower abstraction level (concrete → abstract)

6. Panic for Recoverable Errors

What they avoid: Using panic!() for conditions that callers could reasonably handle.

Source evidence: 514 panic! calls in library/ (non-test). ALL are for: (1) invariant violations (bugs), (2) index out of bounds where the contract says "index must be valid", (3) unimplemented/unreachable. Zero panics for "file not found" or "parse failed" type errors.

Why it's bad: Panic unwinds the stack. The caller can't recover gracefully. In servers, it kills the request. In libraries, it's a contract violation — you're deciding for the user that this error is fatal.

What to do instead:

// BAD — panicking on user-facing errors
pub fn load_config(path: &str) -> Config {
    let text = std::fs::read_to_string(path)
        .expect("failed to read config");  // KILLS caller's thread
    toml::from_str(&text).expect("invalid config")
}

// GOOD — let the caller decide
pub fn load_config(path: &str) -> Result<Config, ConfigError> {
    let text = std::fs::read_to_string(path)?;
    let config = toml::from_str(&text)?;
    Ok(config)
}

When to Apply This Rule

Triggers:

panic!/expect/unwrap on external input (files, network, user data)
panic in library code (not application code)
"Failed to X" in expect messages where X is an I/O operation

Exceptions

It's OK when:

Programmer bug: index out of bounds on internal array (invariant violated)
Impossible state: match arm that can't be reached by construction
Application main(): crashing on startup config failure is reasonable

7. God Struct (All State in One Place)

What they avoid: A single struct that holds the entire application state with dozens of fields.

Source evidence: The stdlib separates concerns into distinct types. io::Error has 2 fields. Vec has 3 fields. The largest public struct is OsString at 1 field (the inner platform string). No struct in the stdlib has more than 6-7 fields.

Why it's bad: Can't borrow parts independently (borrowing one field borrows the whole struct). Hard to test in isolation. Methods grow unboundedly. Violates single responsibility.

What to do instead:

// BAD — god struct
struct App {
    db: Database,
    cache: Cache,
    config: Config,
    logger: Logger,
    metrics: Metrics,
    users: HashMap<UserId, User>,
    sessions: HashMap<SessionId, Session>,
    pending_jobs: VecDeque<Job>,
}
// Every method has access to everything. Can't test cache without db.

// GOOD — separated concerns
struct App { db: Database, cache: CacheLayer, jobs: JobQueue }
struct CacheLayer { inner: Cache, config: CacheConfig }
struct JobQueue { pending: VecDeque<Job>, metrics: Metrics }

When to Apply This Rule

Triggers:

Struct with 8+ fields
Methods that only use 2-3 of the struct's fields
Can't write a unit test without constructing the entire struct
Borrow checker fights when trying to use two parts simultaneously

Exceptions

It's OK when:

Configuration structs (many fields, all read-only, constructed once)
DTOs for serialization (matching an external schema)
Builder intermediate state (temporarily holds many optional fields)

8. Box Instead of Generics

What they avoid: Using Box<dyn Any> with downcasting to erase types instead of using proper generics.

Source evidence: Any is used in the stdlib only for: panic payloads (Box<dyn Any + Send>) and the reflection system. Zero uses of dyn Any as a general-purpose container or map value.

Why it's bad: Loses all type information. Downcasting can fail at runtime. You've recreated dynamic typing inside a statically-typed language — the compiler can't help you anymore.

What to do instead:

// BAD — type erasure via Any (Java's Object approach)
struct Registry {
    items: HashMap<String, Box<dyn Any>>,
}
impl Registry {
    fn get<T: 'static>(&self, key: &str) -> Option<&T> {
        self.items.get(key)?.downcast_ref()  // runtime failure if wrong type!
    }
}

// GOOD — use generics or typed enums
enum ConfigValue { String(String), Int(i64), Bool(bool) }
struct Config { values: HashMap<String, ConfigValue> }
// Or: use generics to make the container type-safe
struct TypedMap(HashMap<TypeId, Box<dyn Any>>);  // at least TypeId keyed

When to Apply This Rule

Triggers:

Box<dyn Any> in a non-error, non-panic context
.downcast_ref::<T>() calls scattered through the code
"I don't know what type this will be" — you probably do

Exceptions

It's OK when:

Panic handling (that's what catch_unwind returns)
Plugin systems where types genuinely aren't known at compile time
Thin FFI wrappers bridging to dynamic languages

9. Shared Mutable State Without Synchronization

What they avoid: Any static mut or mutable global state without proper synchronization primitives.

Source evidence: Zero static mut in public library APIs. All shared mutable state uses: AtomicUsize (783 usages), Mutex (135 usages), OnceLock (254 usages), or thread_local!. The compiler prevents data races at compile time via Send/Sync.

Why it's bad: static mut is always unsafe and the unsoundest construct in Rust. Multiple threads accessing it simultaneously is undefined behavior. Even single-threaded access requires unsafe.

What to do instead:

// BAD — global mutable state (UB waiting to happen)
static mut COUNTER: u64 = 0;
fn increment() {
    unsafe { COUNTER += 1; }  // DATA RACE if called from multiple threads
}

// GOOD — atomic for simple counters
static COUNTER: AtomicU64 = AtomicU64::new(0);
fn increment() {
    COUNTER.fetch_add(1, Ordering::Relaxed);  // thread-safe, no unsafe
}

// GOOD — OnceLock for lazy initialization
static CONFIG: OnceLock<Config> = OnceLock::new();
fn get_config() -> &'static Config {
    CONFIG.get_or_init(|| load_config())
}

When to Apply This Rule

Triggers:

static mut anywhere
unsafe blocks that access global state
Global state without atomic/mutex/rwlock protection

Exceptions

Literally none for static mut. There is no safe use case that can't be expressed with atomics, OnceLock, or Mutex. The Rust team has discussed deprecating static mut entirely.

10. Stringly-Typed APIs (Magic Constants)

What they avoid: Using string constants where enums would provide compile-time checking.

Source evidence: The stdlib uses enums extensively: ErrorKind (30+ variants), Ordering (3 variants), SeekFrom (3 variants). String parameters are only for genuinely dynamic data (file paths, user input, format strings).

Why it's bad: Typos compile fine. "relaexd" won't be caught. No autocomplete. No exhaustive matching. Refactoring is grep-based instead of compiler-based.

What to do instead:

// BAD — string constants (invisible typos)
fn set_log_level(level: &str) { ... }
set_log_level("wraning");  // typo compiles fine — bug at runtime

// GOOD — enum (typos are compile errors)
enum LogLevel { Error, Warning, Info, Debug, Trace }
fn set_log_level(level: LogLevel) { ... }
set_log_level(LogLevel::Wraning);  // COMPILE ERROR — caught immediately

When to Apply This Rule

Triggers:

String parameters with a fixed set of valid values
Match/if-else chains comparing strings
Documentation that says "valid values are: X, Y, Z"

Exceptions

It's OK when:

Values are genuinely dynamic (user input, file paths, URLs)
Interop with external systems that use strings (HTTP headers, env vars)
The set of values changes frequently (but consider #[non_exhaustive] enum)

15 KiB Raw Permalink Blame History

Anti-Patterns: What Rust's Stdlib Avoids (and Why)

1. unwrap() in Library Code

When to Apply This Rule

Exceptions

2. String as Error Type

When to Apply This Rule

Exceptions

3. Unsafe Without SAFETY Comment

When to Apply This Rule

Exceptions

4. Clone to Satisfy the Borrow Checker

When to Apply This Rule

Exceptions

5. Deref as Inheritance

When to Apply This Rule

Exceptions

6. Panic for Recoverable Errors

When to Apply This Rule

Exceptions

7. God Struct (All State in One Place)

When to Apply This Rule

Exceptions

8. Box Instead of Generics

When to Apply This Rule

Exceptions

9. Shared Mutable State Without Synchronization

When to Apply This Rule

Exceptions

10. Stringly-Typed APIs (Magic Constants)

When to Apply This Rule

Exceptions

15 KiB

Raw Permalink Blame History