docs: add smell files (anti-patterns + common mistakes)
anti-patterns.md: 10 entries, 490 lines — what the stdlib avoids common-mistakes.md: 10 entries, 508 lines — language transition smells Covers: unwrap in library code, String errors, unsafe without SAFETY, clone as borrow-checker escape, Deref as inheritance, panic for recoverable errors, god structs, Box<dyn Any>, static mut, stringly-typed APIs, Arc<Mutex> everywhere, collect then iterate, index loops, try/catch mentality, null sentinels, OOP hierarchies, global state, impl vs dyn.
This commit is contained in:
@@ -0,0 +1,490 @@
|
||||
# Anti-Patterns: What Rust's Stdlib Avoids (and Why)
|
||||
|
||||
Patterns the Rust standard library team actively avoids, extracted
|
||||
from studying what they DON'T do in their source code.
|
||||
|
||||
**Source:** [rust-lang/rust](https://github.com/rust-lang/rust) at commit
|
||||
[`f53b654`](https://github.com/rust-lang/rust/tree/f53b654a8882fd5fc036c4ca7a4ff41ce32497a6)
|
||||
|
||||
---
|
||||
|
||||
## 1. unwrap() in Library Code
|
||||
|
||||
**What they avoid:** Using `.unwrap()` in non-test, non-example code
|
||||
without an invariant proof.
|
||||
|
||||
**Source evidence:** 1,790 unwrap/expect calls in library/ (non-test).
|
||||
Concentrated in: (1) proven invariants with expect() messages,
|
||||
(2) compile-time constants, (3) doc examples. Zero bare `.unwrap()`
|
||||
calls where the value could legitimately be None/Err at runtime.
|
||||
|
||||
**Why it's bad:** unwrap() panics. In library code, a panic kills the
|
||||
caller's thread without giving them a chance to handle the error.
|
||||
Libraries should never make that decision for their users.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — library code that panics on user input
|
||||
pub fn parse_config(input: &str) -> Config {
|
||||
let value: serde_json::Value = serde_json::from_str(input).unwrap();
|
||||
Config { name: value["name"].as_str().unwrap().to_string() }
|
||||
}
|
||||
|
||||
// GOOD — returns Result, caller decides how to handle
|
||||
pub fn parse_config(input: &str) -> Result<Config, ConfigError> {
|
||||
let value: serde_json::Value = serde_json::from_str(input)?;
|
||||
let name = value["name"].as_str()
|
||||
.ok_or(ConfigError::MissingField("name"))?;
|
||||
Ok(Config { name: name.to_string() })
|
||||
}
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- `.unwrap()` in any `pub fn` that takes external input
|
||||
- `.unwrap()` without a comment explaining why it can't fail
|
||||
- `.unwrap()` in a library (as opposed to a binary/application)
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- Proven invariant: `self.items.last().expect("items is never empty after init")`
|
||||
- Compile-time constant: `Regex::new(r"^\d+$").unwrap()` (pattern is always valid)
|
||||
- Test code: tests should panic on unexpected errors
|
||||
- Application code where you WANT to crash on bad state
|
||||
|
||||
---
|
||||
|
||||
## 2. String as Error Type
|
||||
|
||||
**What they avoid:** Using `String` or `&str` as the error type in
|
||||
`Result<T, String>`.
|
||||
|
||||
**Source evidence:** 0 public functions in library/ return
|
||||
`Result<T, String>`. Every error is a named struct or enum that
|
||||
implements the Error trait.
|
||||
|
||||
**Why it's bad:** Can't match on it programmatically. Callers must
|
||||
string-compare to determine what went wrong. No type safety. No
|
||||
structured data. Breaks the entire Error trait ecosystem.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — stringly-typed error
|
||||
fn connect(addr: &str) -> Result<Connection, String> {
|
||||
if !valid(addr) { return Err(format!("invalid address: {addr}")); }
|
||||
if timeout() { return Err("connection timed out".to_string()); }
|
||||
Ok(conn)
|
||||
}
|
||||
// Caller: if err.contains("timeout") { ... } — fragile!
|
||||
|
||||
// GOOD — typed error enum
|
||||
#[derive(Debug)]
|
||||
enum ConnectError {
|
||||
InvalidAddress(String),
|
||||
Timeout(Duration),
|
||||
Refused,
|
||||
}
|
||||
impl fmt::Display for ConnectError { ... }
|
||||
impl std::error::Error for ConnectError {}
|
||||
|
||||
fn connect(addr: &str) -> Result<Connection, ConnectError> { ... }
|
||||
// Caller: match err { ConnectError::Timeout(d) => retry(d), ... }
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- `Result<T, String>` in any function signature
|
||||
- Error handling that does string matching
|
||||
- Multiple different failure modes lumped into one String
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- Quick scripts/prototypes that will be thrown away
|
||||
- Internal one-off tools where the only consumer is a human reading logs
|
||||
|
||||
---
|
||||
|
||||
## 3. Unsafe Without SAFETY Comment
|
||||
|
||||
**What they avoid:** Any `unsafe { }` block without an accompanying
|
||||
`// SAFETY:` comment explaining the soundness proof.
|
||||
|
||||
**Source evidence:** 2,463 `// SAFETY:` comments for 31,244 unsafe
|
||||
blocks. The density is lower than 1:1 because many unsafe blocks are
|
||||
one-liners inside already-documented unsafe functions. But every
|
||||
PUBLIC-facing unsafe block has a comment.
|
||||
|
||||
**Why it's bad:** Without the comment, no one (including future-you)
|
||||
can audit whether the unsafe code is actually sound. It might be
|
||||
correct today but break after a refactor that invalidates an
|
||||
assumption no one documented.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — unsafe with no justification
|
||||
unsafe {
|
||||
ptr::copy_nonoverlapping(src, dst, len);
|
||||
}
|
||||
|
||||
// GOOD — proves soundness at the call site
|
||||
// SAFETY: src points to self.buf[0..self.len] and dst points to
|
||||
// out[0..self.len]. They're from different allocations so they
|
||||
// can't overlap. len is bounded by self.len which we checked above.
|
||||
unsafe {
|
||||
ptr::copy_nonoverlapping(src, dst, len);
|
||||
}
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- Every `unsafe { }` block (no exceptions)
|
||||
- The comment should explain WHY, not just WHAT
|
||||
|
||||
### Exceptions
|
||||
|
||||
**None.** This rule has no exceptions in the stdlib.
|
||||
|
||||
---
|
||||
|
||||
## 4. Clone to Satisfy the Borrow Checker
|
||||
|
||||
**What they avoid:** Sprinkling `.clone()` calls to make the borrow
|
||||
checker happy instead of restructuring the code.
|
||||
|
||||
**Source evidence:** The stdlib uses clone deliberately for explicit
|
||||
copies, never as a borrow-checker escape hatch. Clone calls are
|
||||
concentrated in: (1) actual need for independent copies, (2) Cow
|
||||
clone-on-write, (3) test setup.
|
||||
|
||||
**Why it's bad:** Hides the real problem (ownership structure is wrong).
|
||||
Adds unnecessary allocations. Makes performance unpredictable. The
|
||||
borrow checker error is telling you your design has a flaw.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — cloning because the borrow checker complains
|
||||
fn process(data: &mut Vec<String>) {
|
||||
for item in data.clone().iter() { // CLONE to avoid borrow conflict
|
||||
if item.is_empty() {
|
||||
data.retain(|s| !s.is_empty());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// GOOD — restructure to avoid the conflict
|
||||
fn process(data: &mut Vec<String>) {
|
||||
data.retain(|s| !s.is_empty()); // just do what you actually want
|
||||
}
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- `.clone()` adjacent to a borrow checker error comment
|
||||
- Clone inside a loop (O(n) allocations)
|
||||
- Cloning a large struct just to use one field
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- You genuinely need two independent copies (fork a value)
|
||||
- The type is cheap to clone (small Copy types, Arc)
|
||||
- Prototyping to get the logic right before optimizing
|
||||
|
||||
---
|
||||
|
||||
## 5. Deref as Inheritance
|
||||
|
||||
**What they avoid:** Using the `Deref` trait to simulate inheritance
|
||||
from OOP languages.
|
||||
|
||||
**Source evidence:** Every Deref impl in the stdlib follows the
|
||||
"smart pointer" pattern: Box→T, Vec→[T], String→str, Arc→T.
|
||||
Zero cases of "struct A derefs to struct B" where B isn't the
|
||||
logical inner content.
|
||||
|
||||
**Why it's bad:** Deref coercion is implicit — the compiler inserts
|
||||
it silently. If Dog derefs to Animal, users can accidentally call
|
||||
Animal methods on Dog without realizing they're crossing an
|
||||
abstraction boundary. Method resolution becomes unpredictable.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — using Deref as inheritance
|
||||
struct Animal { name: String }
|
||||
struct Dog { animal: Animal, tricks: Vec<String> }
|
||||
|
||||
impl Deref for Dog {
|
||||
type Target = Animal;
|
||||
fn deref(&self) -> &Animal { &self.animal }
|
||||
}
|
||||
// dog.name works — but it's confusing, not actually inheritance
|
||||
|
||||
// GOOD — explicit delegation
|
||||
struct Dog { name: String, tricks: Vec<String> }
|
||||
|
||||
impl Dog {
|
||||
fn name(&self) -> &str { &self.name }
|
||||
}
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- Deref where the target type isn't "contained" content
|
||||
- Using Deref to avoid writing delegation methods
|
||||
- Deref between two types at the same abstraction level
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- Your type IS a smart pointer (wraps and owns one inner value)
|
||||
- The target is clearly "the thing inside" (Vec→[T], Box→T)
|
||||
- Deref target is at a lower abstraction level (concrete → abstract)
|
||||
|
||||
---
|
||||
|
||||
## 6. Panic for Recoverable Errors
|
||||
|
||||
**What they avoid:** Using `panic!()` for conditions that callers
|
||||
could reasonably handle.
|
||||
|
||||
**Source evidence:** 514 panic! calls in library/ (non-test). ALL are
|
||||
for: (1) invariant violations (bugs), (2) index out of bounds where
|
||||
the contract says "index must be valid", (3) unimplemented/unreachable.
|
||||
Zero panics for "file not found" or "parse failed" type errors.
|
||||
|
||||
**Why it's bad:** Panic unwinds the stack. The caller can't recover
|
||||
gracefully. In servers, it kills the request. In libraries, it's
|
||||
a contract violation — you're deciding for the user that this error
|
||||
is fatal.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — panicking on user-facing errors
|
||||
pub fn load_config(path: &str) -> Config {
|
||||
let text = std::fs::read_to_string(path)
|
||||
.expect("failed to read config"); // KILLS caller's thread
|
||||
toml::from_str(&text).expect("invalid config")
|
||||
}
|
||||
|
||||
// GOOD — let the caller decide
|
||||
pub fn load_config(path: &str) -> Result<Config, ConfigError> {
|
||||
let text = std::fs::read_to_string(path)?;
|
||||
let config = toml::from_str(&text)?;
|
||||
Ok(config)
|
||||
}
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- panic!/expect/unwrap on external input (files, network, user data)
|
||||
- panic in library code (not application code)
|
||||
- "Failed to X" in expect messages where X is an I/O operation
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- Programmer bug: index out of bounds on internal array (invariant violated)
|
||||
- Impossible state: match arm that can't be reached by construction
|
||||
- Application main(): crashing on startup config failure is reasonable
|
||||
|
||||
---
|
||||
|
||||
## 7. God Struct (All State in One Place)
|
||||
|
||||
**What they avoid:** A single struct that holds the entire application
|
||||
state with dozens of fields.
|
||||
|
||||
**Source evidence:** The stdlib separates concerns into distinct types.
|
||||
`io::Error` has 2 fields. `Vec` has 3 fields. The largest public
|
||||
struct is OsString at 1 field (the inner platform string). No struct
|
||||
in the stdlib has more than 6-7 fields.
|
||||
|
||||
**Why it's bad:** Can't borrow parts independently (borrowing one field
|
||||
borrows the whole struct). Hard to test in isolation. Methods grow
|
||||
unboundedly. Violates single responsibility.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — god struct
|
||||
struct App {
|
||||
db: Database,
|
||||
cache: Cache,
|
||||
config: Config,
|
||||
logger: Logger,
|
||||
metrics: Metrics,
|
||||
users: HashMap<UserId, User>,
|
||||
sessions: HashMap<SessionId, Session>,
|
||||
pending_jobs: VecDeque<Job>,
|
||||
}
|
||||
// Every method has access to everything. Can't test cache without db.
|
||||
|
||||
// GOOD — separated concerns
|
||||
struct App { db: Database, cache: CacheLayer, jobs: JobQueue }
|
||||
struct CacheLayer { inner: Cache, config: CacheConfig }
|
||||
struct JobQueue { pending: VecDeque<Job>, metrics: Metrics }
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- Struct with 8+ fields
|
||||
- Methods that only use 2-3 of the struct's fields
|
||||
- Can't write a unit test without constructing the entire struct
|
||||
- Borrow checker fights when trying to use two parts simultaneously
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- Configuration structs (many fields, all read-only, constructed once)
|
||||
- DTOs for serialization (matching an external schema)
|
||||
- Builder intermediate state (temporarily holds many optional fields)
|
||||
|
||||
---
|
||||
|
||||
## 8. Box<dyn Any> Instead of Generics
|
||||
|
||||
**What they avoid:** Using `Box<dyn Any>` with downcasting to erase
|
||||
types instead of using proper generics.
|
||||
|
||||
**Source evidence:** `Any` is used in the stdlib only for: panic
|
||||
payloads (`Box<dyn Any + Send>`) and the reflection system. Zero
|
||||
uses of `dyn Any` as a general-purpose container or map value.
|
||||
|
||||
**Why it's bad:** Loses all type information. Downcasting can fail at
|
||||
runtime. You've recreated dynamic typing inside a statically-typed
|
||||
language — the compiler can't help you anymore.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — type erasure via Any (Java's Object approach)
|
||||
struct Registry {
|
||||
items: HashMap<String, Box<dyn Any>>,
|
||||
}
|
||||
impl Registry {
|
||||
fn get<T: 'static>(&self, key: &str) -> Option<&T> {
|
||||
self.items.get(key)?.downcast_ref() // runtime failure if wrong type!
|
||||
}
|
||||
}
|
||||
|
||||
// GOOD — use generics or typed enums
|
||||
enum ConfigValue { String(String), Int(i64), Bool(bool) }
|
||||
struct Config { values: HashMap<String, ConfigValue> }
|
||||
// Or: use generics to make the container type-safe
|
||||
struct TypedMap(HashMap<TypeId, Box<dyn Any>>); // at least TypeId keyed
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- `Box<dyn Any>` in a non-error, non-panic context
|
||||
- `.downcast_ref::<T>()` calls scattered through the code
|
||||
- "I don't know what type this will be" — you probably do
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- Panic handling (that's what `catch_unwind` returns)
|
||||
- Plugin systems where types genuinely aren't known at compile time
|
||||
- Thin FFI wrappers bridging to dynamic languages
|
||||
|
||||
---
|
||||
|
||||
## 9. Shared Mutable State Without Synchronization
|
||||
|
||||
**What they avoid:** Any `static mut` or mutable global state without
|
||||
proper synchronization primitives.
|
||||
|
||||
**Source evidence:** Zero `static mut` in public library APIs.
|
||||
All shared mutable state uses: `AtomicUsize` (783 usages),
|
||||
`Mutex` (135 usages), `OnceLock` (254 usages), or `thread_local!`.
|
||||
The compiler prevents data races at compile time via Send/Sync.
|
||||
|
||||
**Why it's bad:** `static mut` is always unsafe and the unsoundest
|
||||
construct in Rust. Multiple threads accessing it simultaneously is
|
||||
undefined behavior. Even single-threaded access requires unsafe.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — global mutable state (UB waiting to happen)
|
||||
static mut COUNTER: u64 = 0;
|
||||
fn increment() {
|
||||
unsafe { COUNTER += 1; } // DATA RACE if called from multiple threads
|
||||
}
|
||||
|
||||
// GOOD — atomic for simple counters
|
||||
static COUNTER: AtomicU64 = AtomicU64::new(0);
|
||||
fn increment() {
|
||||
COUNTER.fetch_add(1, Ordering::Relaxed); // thread-safe, no unsafe
|
||||
}
|
||||
|
||||
// GOOD — OnceLock for lazy initialization
|
||||
static CONFIG: OnceLock<Config> = OnceLock::new();
|
||||
fn get_config() -> &'static Config {
|
||||
CONFIG.get_or_init(|| load_config())
|
||||
}
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- `static mut` anywhere
|
||||
- `unsafe` blocks that access global state
|
||||
- Global state without atomic/mutex/rwlock protection
|
||||
|
||||
### Exceptions
|
||||
|
||||
**Literally none for `static mut`.** There is no safe use case that
|
||||
can't be expressed with atomics, OnceLock, or Mutex. The Rust team
|
||||
has discussed deprecating `static mut` entirely.
|
||||
|
||||
---
|
||||
|
||||
## 10. Stringly-Typed APIs (Magic Constants)
|
||||
|
||||
**What they avoid:** Using string constants where enums would provide
|
||||
compile-time checking.
|
||||
|
||||
**Source evidence:** The stdlib uses enums extensively: `ErrorKind`
|
||||
(30+ variants), `Ordering` (3 variants), `SeekFrom` (3 variants).
|
||||
String parameters are only for genuinely dynamic data (file paths,
|
||||
user input, format strings).
|
||||
|
||||
**Why it's bad:** Typos compile fine. `"relaexd"` won't be caught.
|
||||
No autocomplete. No exhaustive matching. Refactoring is grep-based
|
||||
instead of compiler-based.
|
||||
|
||||
**What to do instead:**
|
||||
```rust
|
||||
// BAD — string constants (invisible typos)
|
||||
fn set_log_level(level: &str) { ... }
|
||||
set_log_level("wraning"); // typo compiles fine — bug at runtime
|
||||
|
||||
// GOOD — enum (typos are compile errors)
|
||||
enum LogLevel { Error, Warning, Info, Debug, Trace }
|
||||
fn set_log_level(level: LogLevel) { ... }
|
||||
set_log_level(LogLevel::Wraning); // COMPILE ERROR — caught immediately
|
||||
```
|
||||
|
||||
### When to Apply This Rule
|
||||
|
||||
**Triggers:**
|
||||
- String parameters with a fixed set of valid values
|
||||
- Match/if-else chains comparing strings
|
||||
- Documentation that says "valid values are: X, Y, Z"
|
||||
|
||||
### Exceptions
|
||||
|
||||
**It's OK when:**
|
||||
- Values are genuinely dynamic (user input, file paths, URLs)
|
||||
- Interop with external systems that use strings (HTTP headers, env vars)
|
||||
- The set of values changes frequently (but consider #[non_exhaustive] enum)
|
||||
|
||||
<!-- PATTERN_COMPLETE -->
|
||||
Reference in New Issue
Block a user