Building #[flatten] for a trait-object serialization framework: a Rust ownership puzzle
Last year we ported vim_rs — a VMware vSphere SDK with roughly 7,000 serializable types — from serde to miniserde. The results were hard to ignore: release build time on Windows AMD64 dropped from 19 minutes to 6, and the release binary shrank from 40 MB to 12 MB.
The savings came from several directions. The biggest was eliminating serde’s monomorphization — every Serialize and Deserialize impl stamps out generic code for every combination of serializer and data type, and at 7,000 types the cost compounds fast. But the migration also overhauled how vim_rs models the vSphere type hierarchy. The WSDL schema uses inheritance: VirtualE1000 extends VirtualEthernetCard, which extends VirtualDevice — sometimes four or five levels deep. Under the old serde-based architecture, vim_rs handled this by brute-force flattening at the struct level: a child struct explicitly duplicated every field from all of its ancestors alongside its own. To maintain polymorphism, the code generator created a trait for every parent type with accessor methods for each inherited field, and wired up impl blocks for every child. Since the child struct physically contained all the fields, the generated serde code just serialized them directly. But the cost was staggering: thousands of duplicated fields, thousands of boilerplate trait implementations, and a massive monomorphized serde footprint for every distinct type.
The miniserde migration replaced this with a composition-based design: each child struct holds its parent as a field (virtual_device_: VirtualDevice) and exposes it through Deref/DerefMut for ergonomic access. The code generator, vim_build, still inlines all ancestor fields into each child’s serializer and deserializer — VirtualEthernetCardSerializer emits "key" by reaching through self.data.virtual_device_.key; VirtualEthernetCardFields has a flat f0..f16 covering parent and child fields alike — but the accessor-trait scaffolding is gone. Combined with replacing derive macros on enums with direct generated code, these changes added up to the dramatic build-time and binary-size improvements.
The serialization side of the inline-flatten pattern works well — it is fast, zero-allocation, and zero-overhead — but it is a closed solution. It requires a code generator with full prior knowledge of every type it will ever encounter. You cannot use it for hand-written structs, for types from external crates, or for any scenario where the child does not know the parent’s fields at code-generation time.
We wanted a general-purpose #[mini(flatten)] for midiserde, our companion crate that brings serde-like ergonomics to miniserde. One that works the way serde’s #[serde(flatten)] does — you annotate a field, and the macro figures out the rest — but without serde’s deserialization buffering. Serde’s flatten carries a hidden cost: on deserialization, it buffers every key-value pair into a Vec<Option<(Content, Content)>> (serde’s internal Content enum, essentially a Value), then replays them to each flattened field via FlatMapDeserializer. Serialization is better designed — FlatMapSerializer writes directly into the parent SerializeMap without buffering — but the deserialization tax is real and grows with payload size.
It took three iterations to get there, and the obstacles were not algorithmic. They were ownership problems baked into the trait signatures themselves — the kind where the borrow checker is not wrong, it is protecting you from a real aliasing hazard, and you have to redesign your way out.
This article walks through those iterations. If you have ever fought E0506 (cannot assign to a value because it is borrowed) inside a loop, or wondered why Box<dyn Any> sometimes feels like the only escape hatch from Rust’s type system, the patterns here might save you a few hours.
Background: how miniserde works
Miniserde is dtolnay’s proof of concept for a serialization framework that avoids monomorphization. Where serde’s Serializer and Deserializer traits are generic over data types (producing a monomorphized code path for every struct you serialize), miniserde works entirely through dynamic dispatch.
Deserialization: the “place” pattern
Miniserde’s deserialization is built around three traits:
```rust
trait Deserialize: Sized {
    fn begin(out: &mut Option<Self>) -> &mut dyn Visitor;
}

trait Visitor {
    fn map(&mut self) -> Result<Box<dyn Map + '_>>;
    // ... null, boolean, string, seq, etc.
}

trait Map {
    fn key(&mut self, k: &str) -> Result<&mut dyn Visitor>;
    fn finish(&mut self) -> Result<()>;
}
```
The key insight: begin takes &mut Option<Self> — a pre-allocated “place” where the result will be written. The derived Visitor holds a reference to that place. The Map returned by visitor.map() holds Option<FieldType> slots for each field, routes key() calls to the right slot’s Visitor, and in finish() assembles the struct and writes it into the place.
For a simple struct like Pagination { limit: u32, offset: u32, total: u32 }, the generated code looks like:
```rust
struct __State<'a> {
    __limit: Option<u32>,
    __offset: Option<u32>,
    __total: Option<u32>,
    __out: &'a mut Option<Pagination>,
}

impl Map for __State<'_> {
    fn key(&mut self, k: &str) -> Result<&mut dyn Visitor> {
        match k {
            "limit" => Ok(Deserialize::begin(&mut self.__limit)),
            "offset" => Ok(Deserialize::begin(&mut self.__offset)),
            "total" => Ok(Deserialize::begin(&mut self.__total)),
            _ => Ok(<dyn Visitor>::ignore()),
        }
    }

    fn finish(&mut self) -> Result<()> {
        let limit = self.__limit.take().ok_or(Error)?;
        let offset = self.__offset.take().ok_or(Error)?;
        let total = self.__total.take().ok_or(Error)?;
        *self.__out = Some(Pagination { limit, offset, total });
        Ok(())
    }
}
```
This is elegant and zero-allocation (beyond the Box<dyn Map> returned by map()). Each field gets its own Option slot, key() routes by string match, and finish() moves everything into the output place.
Serialization: the streaming Map trait
Serialization mirrors this with a different set of traits:
```rust
trait Serialize {
    fn begin(&self) -> Fragment;
}

enum Fragment<'a> {
    Null, Bool(bool), Str(Cow<'a, str>), U64(u64), I64(i64), F64(f64),
    Seq(Box<dyn Seq + 'a>),
    Map(Box<dyn Map + 'a>),
}

trait Map {
    fn next(&mut self) -> Option<(Cow<str>, &dyn Serialize)>;
}
```
A struct’s Serialize::begin() returns Fragment::Map(box) wrapping a state machine that yields key-value pairs one at a time. The JSON writer calls next() in a loop until it returns None.
For Pagination:
```rust
struct __SerMap<'a> {
    data: &'a Pagination,
    state: usize,
}

impl Map for __SerMap<'_> {
    fn next(&mut self) -> Option<(Cow<str>, &dyn Serialize)> {
        let s = self.state;
        self.state = s + 1;
        match s {
            0 => Some((Cow::Borrowed("limit"), &self.data.limit)),
            1 => Some((Cow::Borrowed("offset"), &self.data.offset)),
            2 => Some((Cow::Borrowed("total"), &self.data.total)),
            _ => None,
        }
    }
}
```
Simple, zero-copy. The &dyn Serialize reference points directly into the original struct; no cloning, no intermediate representation.
The flatten problem
Now consider:
```rust
struct UsersResponse {
    users: Vec<String>,
    #[mini(flatten)]
    pagination: Pagination,
}
```
The JSON looks like {"users":["a","b"],"limit":10,"offset":0,"total":100} — all fields at the same level. Flatten means the parent needs to share the JSON key stream with a child struct it does not know the fields of (at least, not at the source level — the derive macro does know, but we want the generated code to work through trait objects, not hard-coded field lists).
Deserialization challenge: who owns the keys?
The parent’s Map::key() gets called with "users", "limit", "offset", "total" in some order. It knows about "users" but not the others. It needs to delegate those to Pagination’s deserialization logic.
But miniserde’s Deserialize::begin(out: &mut Option<Self>) requires a pre-allocated Option<Pagination> — the parent would need to own one. And begin() returns &mut dyn Visitor, not &mut dyn Map — the parent receives a Visitor, but the JSON parser already told the parent it is a map. The parent needs a Map for the child, not a Visitor.
Serialization challenge: who yields the entries?
The parent’s ser::Map::next() must yield ("users", &self.data.users), then somehow yield ("limit", &self.data.pagination.limit), ("offset", ...), ("total", ...) from the child.
The parent cannot hardcode the child’s field names (that defeats the purpose of flatten). It needs to call into the child’s ser::Map::next() and forward the results.
Iteration 1: buffer everything (the serde approach)
Our first midiserde implementation used the same strategy as serde: buffer unknown keys as Vec<(String, Value)>, then replay them to the child in finish().
```rust
// Parent's Map during deserialization
fn key(&mut self, k: &str) -> Result<&mut dyn Visitor> {
    match k {
        "users" => Ok(Deserialize::begin(&mut self.users)),
        _ => {
            // Buffer unknown keys as Value
            self.current_key = Some(k.to_string());
            Ok(Deserialize::begin(&mut self.current_value))
        }
    }
}

fn finish(&mut self) -> Result<()> {
    // Build a Value::Object from buffered pairs, then deserialize Pagination from it
    let obj = Value::Object(self.buffer.drain(..).collect());
    self.pagination = Some(from_value(&obj)?);
    // ...
}
```
This works, but has several costs:
- Double parsing: every flattened key-value is parsed once into `Value`, then deserialized again from `Value` into the target type.
- Cloning: `Value` must be cloned or moved for each field.
- Ordering dependency: with multiple struct flattens, we needed the user to list which fields belong to which struct: `#[mini(flatten = "limit, offset, total")]`. Without that, we could not tell which buffered keys go where.
- One-flatten limit: supporting two struct flattens plus a `HashMap` catchall was painful — the user had to annotate everything.
For serialization, the initial approach was similarly blunt: convert the flattened field to Value via to_value(), then iterate the Value::Object’s entries:
```rust
fn next(&mut self) -> Option<(Cow<str>, &dyn Serialize)> {
    // ... direct fields via match ...
    // For flatten field:
    let val = to_value(&self.data.pagination);
    if let Value::Object(obj) = val {
        self.buffer = Some(obj.into_iter().collect());
    }
    // then drain buffer entries one by one
}
```
Same problem: struct to Value to JSON, two allocation layers.
Iteration 2: Box<dyn Any> and the escape hatch
To avoid buffering, we needed the parent’s Map to directly delegate key() calls to the child’s Map. But Deserialize::begin() requires &mut Option<Self>, producing a Visitor — not a Map. We needed a way to create a child Map without the place-based API.
First attempt: a MapBuilder trait that returns Box<dyn Any>:
```rust
trait MapBuilder: de::Map {
    fn build(&mut self) -> Result<Box<dyn Any>>;
}

trait DeserializeBuilder {
    fn create_map() -> Box<dyn MapBuilder>;
}
```
Now the parent can:
```rust
struct __State<'a> {
    users: Option<Vec<String>>,
    pagination_builder: Box<dyn MapBuilder>,
    out: &'a mut Option<UsersResponse>,
}

impl Map for __State<'_> {
    fn key(&mut self, k: &str) -> Result<&mut dyn Visitor> {
        match k {
            "users" => Ok(Deserialize::begin(&mut self.users)),
            _ => self.pagination_builder.key(k), // delegate directly!
        }
    }

    fn finish(&mut self) -> Result<()> {
        self.pagination_builder.finish()?;
        let any = self.pagination_builder.build()?;
        let pagination = *any.downcast::<Pagination>().map_err(|_| Error)?;
        // ...
    }
}
```
This eliminates buffering: JSON keys flow directly to the child’s Map, no intermediate Value. But Box<dyn Any> has its own costs:
- Two heap allocations: one for `Box<dyn MapBuilder>`, one for `Box<dyn Any>` from `build()`.
- Runtime downcast: `downcast::<Pagination>()` is a `TypeId` comparison. Cheap, but it is a runtime check for something the compiler already knows.
- Not composable: `MapBuilder` cannot express “I accept keys X, Y, Z but not others” — it takes all unknown keys, so you still cannot have two struct flattens without field lists.
Iteration 3: typed FlattenMap<T> with accepts_key
The breakthrough was realizing we needed two things:
- A typed alternative to `Box<dyn Any>`: if the trait is generic over the output type, no downcast is needed.
- Key routing at the trait level: each child should declare which keys it handles, so the parent can route keys to the right child at parse time.
```rust
trait FlattenMap<T>: de::Map {
    fn accepts_key(&self, k: &str) -> bool;
    fn build(&mut self) -> Result<T>;
}

trait FlattenDeserialize: Sized {
    fn create_flatten_map() -> Box<dyn FlattenMap<Self>>;
}
```
Now the parent’s generated key() does this:
```rust
fn key(&mut self, k: &str) -> Result<&mut dyn Visitor> {
    match k {
        "users" => Ok(Deserialize::begin(&mut self.users)),
        k if self.pagination_builder.accepts_key(k) => self.pagination_builder.key(k),
        k if self.metadata_builder.accepts_key(k) => self.metadata_builder.key(k),
        _ => self.extras_builder.key(k), // HashMap catchall
    }
}
```
Multiple struct flattens, no field lists, no buffering, no Box<dyn Any>, no downcast. The accepts_key method is generated by the derive macro — it is just a match against the struct’s field names:
```rust
fn accepts_key(&self, k: &str) -> bool {
    matches!(k, "limit" | "offset" | "total")
}
```
For HashMap flatten fields, accepts_key always returns true — they are the catchall that absorbs everything no other struct claims.
Why FlattenMap<T> is not FlattenMap (the object-safety issue)
A natural question: why not make FlattenMap non-generic, with build(&mut self) -> Result<Self> returning the implementing type? Because a method that returns Self cannot be called through a trait object — the trait would not be object-safe, and Box<dyn FlattenMap> would not compile.
We need Box<dyn FlattenMap<Pagination>> because the parent holds multiple children of different types. The parent’s state struct looks like:
```rust
struct __KeyMap {
    users: Option<Vec<String>>,
    __flatten_pagination: Box<dyn FlattenMap<Pagination>>,
    __flatten_metadata: Box<dyn FlattenMap<Metadata>>,
    __flatten_extra: Box<dyn FlattenMap<HashMap<String, Value>>>,
}
```
Each box erases the concrete builder type but preserves the output type — exactly the right level of abstraction for the parent’s finish() to call build() and get a typed result.
Monomorphization concern
Does FlattenMap<T> re-introduce serde’s monomorphization problem? No. Each FlattenMap<Pagination> is instantiated exactly once — for Pagination’s own builder. The parent references it through Box<dyn FlattenMap<Pagination>>, a trait object call. There is one vtable per flatten-able struct, and the parent’s code is fully monomorphization-free.
Compare with serde, where Deserialize<'de> is monomorphized for every combination of deserializer + data type, and flatten adds _serde::__private::de::Content buffering on top.
The serialization battle
With deserialization solved elegantly, we expected serialization to be straightforward. We designed a parallel trait:
```rust
trait SerializeMapBuilder {
    fn create_ser_map(&self) -> Box<dyn ser::Map + '_>;
}
```
A parent struct with #[mini(flatten)] pagination: Pagination would:
- Yield its own direct fields normally.
- When it reaches the flatten field, call `pagination.create_ser_map()` to get a `Box<dyn ser::Map>`.
- Drain entries from that inner map, forwarding them as its own.
Simple in theory. The first implementation:
```rust
struct __Map<'a> {
    data: &'a UsersResponse,
    state: usize,
    flatten_ser: Option<Box<dyn ser::Map + 'a>>,
}

impl ser::Map for __Map<'_> {
    // The returned &dyn Serialize has the lifetime of &mut self.
    //      vvvvvvvvv                       vvvvvvvvvvvvvv
    fn next(&mut self) -> Option<(Cow<str>, &dyn Serialize)> {
        loop {
            if let Some(ref mut fm) = self.flatten_ser {
                //       ^^^^^^^^^^ BORROW STARTS: mutable borrow of self.flatten_ser
                if let Some(entry) = fm.next() {
                    return Some(entry);
                    // ^^^ On this path, the borrow escapes to the caller
                    // via the return value's lifetime (tied to &mut self).
                    // Rust must assume it lives until the next call to next().
                }
            }
            // The compiler cannot prove the borrow from `ref mut fm` above has ended,
            // because `return Some(entry)` on another branch extended it to the caller.
            if self.flatten_ser.is_some() {
                self.flatten_ser = None;
                // ^^^ CONFLICT: assigns to self.flatten_ser while the
                //     borrow from `ref mut fm` may still be live.
                //     E0506: cannot assign to `self.flatten_ser`
                //            because it is borrowed
                self.state += 1;
                continue;
            }
            // ... state machine for direct fields ...
        }
    }
}
```
This does not compile. The borrow checker reports:
```
error[E0506]: cannot assign to `self.flatten_ser` because it is borrowed
```
Why the borrow checker is right
The returned &dyn Serialize borrows from &mut self. When the inner map’s fm.next() returns Some(entry), entry contains a reference whose lifetime is tied to &mut self. Within a loop body, Rust sees the following conflict:
- Iteration N: `ref mut fm` borrows `self.flatten_ser`. If `fm.next()` returns `Some(entry)`, the `return` extends that borrow to the caller — it must live until the next call to `next()`.
- Iteration N+1: `self.flatten_ser = None` tries to write to the same memory.
The return on one path and the = None on another are mutually exclusive at runtime. But the borrow checker analyzes the loop body as a single region and cannot prove non-aliasing — it must conservatively assume the returned reference could still be live when the loop re-enters.
Failed fixes
Recursion instead of loop: replacing continue with return self.next() hits the same problem — the compiler sees that the recursive call could re-borrow self.flatten_ser while the previous call’s return value is still live.
Two borrow scopes: splitting the if let into two consecutive blocks (one to try next(), one to clear) does not help in a loop because the compiler treats the entire loop body as one borrow region.
The solution: one field per flatten, no loop
The fix came from the prototype code in our exploration crate, where a single flatten field worked fine:
```rust
fn next(&mut self) -> Option<(Cow<str>, &dyn Serialize)> {
    if self.state == 0 {
        self.state = 1;
        return Some((Cow::Borrowed("users"), &self.data.users));
    }
    // Initialize inner map on first visit
    if self.flatten_map.is_none() {
        self.flatten_map = Some(
            SerializeMapBuilder::create_ser_map(&self.data.pagination),
        );
    }
    // Drain it
    if let Some(ref mut m) = self.flatten_map {
        return m.next();
    }
    None
}
```
This compiles because there is no loop and no mutation of self.flatten_map while a reference from m.next() is live — the return m.next() is a tail return, so the reference escapes to the caller, not back into the same function.
But this only works for one flatten field. With two — pagination and metadata — we need to detect when the first inner map is exhausted and switch to the second.
The generalized solution uses one Option<Box<dyn Map>> per flatten field and a sequential if-chain instead of a loop:
```rust
struct __Map<'a> {
    data: &'a FullResponse,
    state: usize,
    __flatten_0: Option<Box<dyn ser::Map + 'a>>, // pagination
    __flatten_1: Option<Box<dyn ser::Map + 'a>>, // metadata
}

impl ser::Map for __Map<'_> {
    fn next(&mut self) -> Option<(Cow<str>, &dyn Serialize)> {
        // State 0: direct field "id"
        if self.state == 0 {
            self.state = 1;
            return Some((Cow::Borrowed("id"), &self.data.id));
        }
        // State 1: drain pagination
        if self.state == 1 {
            if self.__flatten_0.is_none() {
                self.__flatten_0 = Some(
                    SerializeMapBuilder::create_ser_map(&self.data.pagination)
                );
            }
            if let Some(ref mut m) = self.__flatten_0 {
                if let Some(entry) = m.next() {
                    return Some(entry);
                }
            }
            self.state = 2; // OK: m is no longer borrowed
        }
        // State 2: drain metadata
        if self.state == 2 {
            if self.__flatten_1.is_none() {
                self.__flatten_1 = Some(
                    SerializeMapBuilder::create_ser_map(&self.data.metadata)
                );
            }
            if let Some(ref mut m) = self.__flatten_1 {
                if let Some(entry) = m.next() {
                    return Some(entry);
                }
            }
            self.state = 3;
        }
        None
    }
}
```
Why this works: Rust’s partial borrow analysis can see that self.__flatten_0, self.__flatten_1, and self.state are disjoint struct fields. Within each if self.state == N block, only one __flatten_N is borrowed. When m.next() returns None, the ref mut m borrow ends at the closing brace of the if let, and self.state = N+1 touches a different field. The function returns None at the bottom, after all if-blocks have fallen through — no loop needed.
The return Some(entry) exits the function, extending the borrow to the caller (which is fine: Map::next’s signature says the returned reference borrows from &mut self). On the next call, self.state has already advanced past the exhausted flatten field, so we never re-enter that block.
HashMap flatten: reusing existing Serialize impls
Initially, HashMap flatten used a different path: to_value() followed by buffer iteration. But miniserde already implements Serialize for HashMap — its begin() returns Fragment::Map(box) wrapping a streaming iterator over the map’s entries. We added blanket SerializeMapBuilder impls:
```rust
impl<K, V, H> SerializeMapBuilder for HashMap<K, V, H>
where
    K: Hash + Eq + ToString,
    V: Serialize,
    H: BuildHasher,
{
    fn create_ser_map(&self) -> Box<dyn ser::Map + '_> {
        match Serialize::begin(self) {
            Fragment::Map(m) => m,
            _ => unreachable!(),
        }
    }
}
```
With this, the derive macro treats all flatten fields identically — struct or map — using the same create_ser_map + if-chain pattern. The to_value buffering is gone entirely from serialization.
The full trait landscape
Here is the final set of traits that midiserde adds on top of miniserde:
| Trait | Purpose | Generated for |
|---|---|---|
| `FlattenMap<T>` | Extends `de::Map` with `accepts_key` + typed `build()` | Every derived struct |
| `FlattenDeserialize` | Factory: `create_flatten_map() -> Box<dyn FlattenMap<Self>>` | Every derived struct, plus `HashMap`/`BTreeMap` |
| `SerializeMapBuilder` | Factory: `create_ser_map(&self) -> Box<dyn ser::Map + '_>` | Every derived struct, plus `HashMap`/`BTreeMap` |
Three traits, all operating through trait objects, all generated by the derive macro. No monomorphization explosion. No Box<dyn Any>. No buffering for struct flattens.
Benchmark results
Comparing the streaming implementation against the previous buffer-based approach:
| Benchmark | Before (buffered) | After (streaming) | Change |
|---|---|---|---|
| deser/flatten_struct/small | ~325 ns | ~183 ns | -43% |
| deser/flatten_struct/medium | ~737 ns | ~424 ns | -42% |
| ser/flatten_struct/small | ~385 ns | ~204 ns | -47% |
| ser/flatten_struct/medium | ~467 ns | ~304 ns | -35% |
| ser/flatten_combo (struct + HashMap) | ~448 ns | ~326 ns | -27% |
The struct flatten improvements come from eliminating the Value intermediate. The combo benchmark (struct flatten + HashMap flatten) shows the HashMap SerializeMapBuilder uplift — switching from to_value + buffer to direct streaming.
Deserialization and serialization of non-flatten structs are unchanged — the streaming traits add zero overhead when #[mini(flatten)] is not used.
Head to head: midiserde vs serde_json
The before/after numbers show the streaming approach is faster than our own buffered baseline. But how does it compare to serde_json — the production-grade, monomorphized, battle-tested library? We ran identical payloads through equivalent types derived with #[derive(serde::Serialize, serde::Deserialize)] and #[serde(flatten)].
Raw throughput (median, lower is better):
| Benchmark | midiserde | serde_json | ratio |
|---|---|---|---|
| deser/plain (no flatten) | 161 ns | 102 ns | 1.58x |
| deser/flatten_struct/small | 187 ns | 162 ns | 1.15x |
| deser/flatten_struct/medium | 431 ns | 406 ns | 1.06x |
| deser/flatten_combo (struct + HashMap) | 466 ns | 362 ns | 1.29x |
| ser/plain (no flatten) | 187 ns | 52 ns | 3.6x |
| ser/flatten_struct/small | 210 ns | 53 ns | 3.96x |
| ser/flatten_struct/medium | 298 ns | 96 ns | 3.1x |
| ser/flatten_combo (struct + HashMap) | 321 ns | 73 ns | 4.4x |
Serde_json is faster in absolute terms — substantially so for serialization. That is the monomorphization trade-off in action: serde_json’s Serializer is a concrete type, its serialize_str is a direct function call that writes bytes into a Vec<u8>, and the compiler inlines aggressively. Midiserde routes everything through &dyn Serialize → Fragment → Box<dyn ser::Map>, paying for dynamic dispatch and heap allocation on every nested value. That overhead is the price of a 6-minute build and a 12 MB binary instead of 19 minutes and 40 MB.
The more revealing metric is the flatten tax — how much overhead #[flatten] adds relative to each framework’s own non-flatten baseline:
| Benchmark | midiserde | serde_json |
|---|---|---|
| deser/flatten_struct (small) | +16% | +59% |
| deser/flatten_struct (medium) | +167% | +298% |
| deser/flatten_combo | +189% | +255% |
| ser/flatten_struct (small) | +12% | +2.5% |
| ser/flatten_struct (medium) | +59% | +86% |
| ser/flatten_combo | +71% | +40% |
On deserialization, midiserde’s streaming FlattenMap with accepts_key routing consistently adds less relative overhead than serde’s Content buffering. Serde pays a 59% penalty for a small struct flatten; midiserde pays 16%. At medium payloads the gap widens: serde’s +298% vs midiserde’s +167%. This is the direct result of eliminating the buffer-and-replay cycle — midiserde routes each key to the right builder as it arrives, touching the data once.
On serialization, serde’s FlatMapSerializer is cleverly designed — it writes directly into the parent SerializeMap without any intermediate representation, so its flatten overhead is near-zero for small payloads. Midiserde’s trait-object-based ser::Map chaining pays more per entry due to dynamic dispatch. The structural advantage of midiserde’s approach shows up more clearly at medium payloads where serde’s overhead (+86%) actually exceeds midiserde’s (+59%), likely because serde_json must still resolve the FlatMapSerializer → SerializeMap indirection for every key-value pair and the cost scales with entry count.
The takeaway: if your bottleneck is raw JSON throughput on a small number of types, serde_json wins on speed. If your bottleneck is 7,000 types that take 19 minutes to compile and you use #[flatten] extensively, midiserde gives you dramatically faster builds, smaller binaries, and a flatten implementation that scales better with payload size — especially on the deserialization side where serde’s buffering hurts most.
Where to find serde’s flatten buffering in the source:
- Deserialization: `serde/src/private/de.rs:3184` — `FlatMapDeserializer` wraps `&mut Vec<Option<(Content, Content)>>` and scans the buffer per field via `flat_map_take_entry`.
- Serialization: `serde/src/private/ser.rs:1003` — `FlatMapSerializer` wraps `&'a mut M` where `M: SerializeMap`, writing directly without buffering.
- The `Content` enum: `serde_core/src/private/content.rs` — serde’s internal `Value` type used as the intermediate buffer.
Beyond flatten: midiserde’s feature set
While flatten was the most technically challenging feature, midiserde provides a broader set of serde-compatible attributes:
#[mini(rename = "json_key")] — Field and variant renaming for camelCase APIs or unconventional JSON naming.
#[mini(default)] / #[mini(default = "path")] — Missing-field handling. The plain form uses Default::default(), the path form calls a custom function fn() -> T. This avoids serde’s requirement for the entire struct to be Default.
#[mini(with = "module")] — Custom serialization/deserialization. The module provides begin(out: &mut Option<T>) -> &mut dyn Visitor and serialize(value: &T) -> Fragment. Midiserde ships adapters for base64 (Vec<u8> as base64 strings), rfc3339 and timestamp (chrono DateTime<Utc>), and time_delta (chrono TimeDelta as [seconds, nanoseconds] arrays).
#[mini(skip)], #[mini(skip_serializing)], #[mini(skip_deserializing)] — Field visibility control, matching serde’s behavior.
#[mini(skip_serializing_if = "path")] — Conditional serialization. Option::is_none and Vec::is_empty are common predicates.
from_value / to_value — miniserde does not ship these. Midiserde adds them, enabling the two-stage deserialization pattern that serde_json users rely on for polymorphic dispatch, config merging, and partial deserialization.
All of these are implemented as proc-macro codegen producing trait-object-based code — no monomorphization, no generic Serializer<S> or Deserializer<D> bounds. The derive macro generates the same compact code regardless of how many types use it.
Lessons learned
Trait signatures constrain architecture. ser::Map::next(&mut self) -> Option<(Cow<str>, &dyn Serialize)> — that &dyn Serialize with lifetime tied to &mut self is what made streaming serialization of flatten fields a puzzle. The signature is correct (the values are borrowed from the map’s state), but it meant we could not use a loop to drain an inner map and then mutate the field holding it. The workaround — per-field storage with sequential if-chains — works precisely because Rust can do partial borrow analysis on distinct struct fields.
Box<dyn Any> is a code smell, not a solution. Our first deserialization approach used Box<dyn Any> and downcast because we could not express a typed builder through miniserde’s existing traits. Adding FlattenMap<T> — a trait generic in its output but used as Box<dyn FlattenMap<Pagination>> — gave us type safety, fewer allocations, and better performance, all while remaining object-safe.
Key routing belongs at the trait level. accepts_key(&self, k: &str) -> bool turned out to be the single most important method in the design. It eliminated field lists, enabled multiple struct flattens, and removed the buffering that both serde and our initial implementation relied on. The cost — one string-match per key per flattened struct — is negligible compared to the JSON parsing itself.
The borrow checker rewards structural thinking. Every borrow error we hit had a structural fix. The loop-based serializer failed not because of a compiler limitation, but because a shared Option<Box<dyn Map>> genuinely created an aliasing hazard across loop iterations. Giving each flatten its own field resolved it at the type level, and the resulting code is both clearer and faster.
Try it, break it, tell us
midiserde is available on crates.io and Codeberg. If your project uses serde primarily for JSON and you are tired of watching cargo build crawl through monomorphization — especially with hundreds or thousands of types — give it a try. The migration path is mechanical: swap serde derives for midiserde derives, replace #[serde(...)] with #[mini(...)], and see what breaks.
We are actively looking for edge cases. The flatten macro generates non-trivial code — nested flattens, flattens inside generic structs, flattens combined with skip_serializing_if and with adapters — and we want to know where it falls over. File issues, send patches, or just tell us what serde attribute you reached for that was not there.
On the roadmap: tagged enum deserialization (serde’s #[serde(tag = "type")] discriminator pattern) is next. The ownership puzzle there is different — you need to peek at one key to decide which variant to deserialize, then replay the rest — but we suspect the FlattenMap + accepts_key machinery gives us a head start.
There is also a nice circle to close. vim_rs currently inlines all parent fields into every child’s serializer and deserializer because vim_build has total type knowledge. With midiserde’s FlattenMap and SerializeMapBuilder traits, that is no longer necessary — vim_build could emit #[mini(flatten)] on the parent field and let the derive macro handle composition through trait objects. The generated code would shrink dramatically (every VirtualDevice child currently duplicates 9 field slots and match arms), and vim_build would no longer need to walk the inheritance chain during code generation. It trades a small amount of runtime overhead (one Box<dyn FlattenMap> per ancestor) for a large reduction in generated code and generator complexity. Whether that trade-off is worth it depends on the project — but the option now exists.
If the vim_rs numbers resonate with your codebase, the flatten story in this article is the hardest part of the migration. The rest is just implementation.
Appendix: applying flatten traits to generated code — the vim_rs experiment
The closing paragraph above hypothesized that vim_build could use FlattenMap and SerializeMapBuilder to delegate parent field handling to parent types, shrinking generated code and simplifying the generator. We ran the experiment. The results were instructive — and negative.
Setup
We created a branch (topic/midiserde) where vim_build’s code generator was modified to:
- Emit SerializeMapBuilder for every struct. Each type's create_ser_map() returns a Box<dyn ser::Map> that delegates to the parent's create_ser_map() via the if-chain pattern (section "The solution: one field per flatten, no loop"), then yields only its own fields. A TypeNameWrapper struct prepends _typeName before delegating to the data map.
- Emit FlattenDeserialize and FlattenMap<T> for every struct. Each type's Fields struct holds a Box<dyn FlattenMap<ParentType>> instead of duplicating parent field slots. key() checks own fields first, then delegates to parent_map.accepts_key() / parent_map.key(). build() calls parent_map.build() to obtain the parent struct and composes the result.
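Stripped of vim_rs specifics, the deserialization delegation has this shape. The following is a self-contained mock, not the generated code — the real FlattenMap trait carries more machinery — but it shows the pattern: each level answers for its own keys and forwards everything else to a boxed parent map.

```rust
// Minimal mock of the delegation chain the experiment generated.
trait FlattenMap<T> {
    fn accepts_key(&self, key: &str) -> bool;
    fn key(&mut self, key: &str, value: &str);
    fn build(self: Box<Self>) -> T;
}

#[derive(Debug)]
struct VirtualDevice { key: Option<String> }

#[derive(Default)]
struct VirtualDeviceFields { key: Option<String> }

impl FlattenMap<VirtualDevice> for VirtualDeviceFields {
    fn accepts_key(&self, key: &str) -> bool { key == "key" }
    fn key(&mut self, _key: &str, value: &str) { self.key = Some(value.to_string()); }
    fn build(self: Box<Self>) -> VirtualDevice { VirtualDevice { key: self.key } }
}

#[derive(Debug)]
struct VirtualEthernetCard { parent: VirtualDevice, mac_address: Option<String> }

struct VirtualEthernetCardFields {
    // One box per ancestor: this is the runtime cost the benchmarks measure.
    parent_map: Box<dyn FlattenMap<VirtualDevice>>,
    mac_address: Option<String>,
}

impl FlattenMap<VirtualEthernetCard> for VirtualEthernetCardFields {
    fn accepts_key(&self, key: &str) -> bool {
        key == "macAddress" || self.parent_map.accepts_key(key) // vtable call
    }
    fn key(&mut self, key: &str, value: &str) {
        if key == "macAddress" {
            self.mac_address = Some(value.to_string());
        } else {
            self.parent_map.key(key, value); // second vtable call per inherited key
        }
    }
    fn build(self: Box<Self>) -> VirtualEthernetCard {
        VirtualEthernetCard {
            parent: self.parent_map.build(),
            mac_address: self.mac_address,
        }
    }
}

fn main() {
    let mut fields = Box::new(VirtualEthernetCardFields {
        parent_map: Box::new(VirtualDeviceFields::default()),
        mac_address: None,
    });
    for (k, v) in [("key", "4000"), ("macAddress", "00:50:56:aa")] {
        if fields.accepts_key(k) { fields.key(k, v); }
    }
    let card = fields.build();
    assert_eq!(card.parent.key.as_deref(), Some("4000"));
    assert_eq!(card.mac_address.as_deref(), Some("00:50:56:aa"));
}
```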
The baseline branch (topic/miniserde) uses the original inline expansion where every child’s serializer and deserializer explicitly lists all inherited fields.
Build time and binary size
| Branch | Build | OS | Time (m:ss) | Size (MB) |
|---|---|---|---|---|
| midiserde | Debug | macOS | 1:35 | 137 |
| miniserde | Debug | macOS | 1:33 | 136 |
| midiserde | Release | macOS | 2:53 | 9.0 |
| miniserde | Release | macOS | 2:52 | 9.3 |
| midiserde | Debug | Windows | 2:45 | 60 |
| miniserde | Debug | Windows | 2:43 | 61 |
| midiserde | Release | Windows | 5:02 | 11.2 |
| miniserde | Release | Windows | 4:55 | 11.5 |
Build times are within noise. The ~300 KB release binary savings (~3%) comes from deduplicating the per-field match arms and field slot definitions across the inheritance hierarchy. But the new FlattenDeserialize, FlattenMap, SerializeMapBuilder, and accepts_key implementations added to every type largely offset the removed code, leaving the generated structs.rs at roughly the same line count (466K vs 474K).
The hypothesis that removing inherited field expansion would save 10–15% in compile time was wrong. The compiler’s bottleneck on a crate this size is not the number of match arms in a key() function — those are simple string-comparison patterns that compile efficiently. The cost is in type checking, trait resolution, and code generation for the overall type graph. Reshuffling what each type contains does not change how many types the compiler must process.
Runtime performance
Benchmarks were run on the same hardware, same payloads, using Criterion:
| Benchmark | miniserde (ns) | midiserde (ns) | Overhead |
|---|---|---|---|
| e1000 serialize | 269 | 355 | +32% |
| e1000 deserialize | 165 | 206 | +25% |
| vapp_property_fault serialize | 712 | 804 | +13% |
| vapp_property_fault deserialize | 1250 | 1275 | +2% |
| polymorphic e1000 serialize | 258 | 333 | +29% |
| polymorphic e1000 deserialize | 225 | 276 | +23% |
| roundtrip e1000 | 463 | 585 | +26% |
| roundtrip method_fault | 1861 | 1994 | +7% |
| roundtrip polymorphic e1000 | 524 | 658 | +26% |
| array of virtual_ethernet_card deserialize | 1714 | 1880 | +10% |
The 15–30% overhead is consistent and directional. The cost comes from dynamic dispatch through the delegation chain: VirtualE1000 extends VirtualEthernetCard extends VirtualDevice — three levels. Each serialized field traverses the chain via Box<dyn ser::Map> vtable calls. Each deserialized key routes through accepts_key() (virtual dispatch) then key() (virtual dispatch) at each level. Against miniserde's baseline — a flat match over the incoming key with zero indirection — these vtable jumps are proportionally expensive.
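The serialization half of that chain can be mocked in a few lines. This is a self-contained sketch, not the generated code: each next() on the child map first drains the parent through a vtable call, then yields its own fields, so every inherited field costs one extra dynamic dispatch.

```rust
// Mock of the chained Box<dyn Map> iteration behind the serialize numbers.
trait Map {
    fn next(&mut self) -> Option<(&'static str, String)>;
}

struct DeviceMap { emitted: bool, key: i64 }
impl Map for DeviceMap {
    fn next(&mut self) -> Option<(&'static str, String)> {
        if self.emitted { return None; }
        self.emitted = true;
        Some(("key", self.key.to_string()))
    }
}

struct CardMap { parent: Box<dyn Map>, mac: Option<String> }
impl Map for CardMap {
    fn next(&mut self) -> Option<(&'static str, String)> {
        if let Some(entry) = self.parent.next() { // vtable jump per inherited field
            return Some(entry);
        }
        self.mac.take().map(|m| ("macAddress", m))
    }
}

fn main() {
    let mut map: Box<dyn Map> = Box::new(CardMap {
        parent: Box::new(DeviceMap { emitted: false, key: 4000 }),
        mac: Some("00:50:56:aa".to_string()),
    });
    let mut out = Vec::new();
    while let Some((k, v)) = map.next() { out.push((k, v)); }
    assert_eq!(out[0].0, "key");       // parent fields come out first
    assert_eq!(out[1].0, "macAddress"); // then the child's own
}
```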
Types with deeper hierarchies or more inherited fields show larger overhead (e1000: +32% serialize, most fields inherited). Types with more own fields relative to inherited ones show smaller overhead (vapp_property_fault: +13% serialize).
Why inline expansion wins for generated code
The flatten traits solve a real problem for hand-written code: avoiding manual duplication of parent fields across child structs. The derive macro generates the delegation machinery, and the programmer never sees or maintains the expanded field lists.
For generated code the calculus is different. The code generator already has full type knowledge — walking the inheritance chain to emit all fields is trivial and costs nothing at runtime. The inline expansion produces a flat, zero-indirection state machine per type. The compiler processes these efficiently because the patterns are simple and repetitive.
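For contrast, a mock of the inline-expanded shape (again illustrative, not the actual vim_build output): one flat Fields struct and a single match covering inherited and own keys alike, with no boxes and no vtables.

```rust
// Mock of the inline-expanded deserializer state: flat slots, flat match.
#[derive(Default)]
struct VirtualEthernetCardFields {
    f0: Option<String>, // key        (inherited from VirtualDevice)
    f1: Option<String>, // macAddress (own)
}

impl VirtualEthernetCardFields {
    // Returns true when the key was recognized and its slot filled.
    fn key(&mut self, key: &str, value: &str) -> bool {
        let slot = match key {
            "key" => &mut self.f0,
            "macAddress" => &mut self.f1,
            _ => return false,
        };
        *slot = Some(value.to_string());
        true
    }
}

fn main() {
    let mut fields = VirtualEthernetCardFields::default();
    assert!(fields.key("key", "4000"));
    assert!(fields.key("macAddress", "00:50:56:aa"));
    assert!(!fields.key("unknown", "x"));
    assert_eq!(fields.f0.as_deref(), Some("4000"));
}
```

The string patterns compile to straightforward comparisons, which is why the appendix's compile-time measurements barely moved when they were removed.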
The delegation approach trades that zero-cost abstraction for runtime indirection. Each level in the hierarchy adds a heap allocation (Box<dyn FlattenMap<T>> or Box<dyn ser::Map>), a vtable, and per-key dispatch overhead. For a framework like miniserde where the baseline operations are already lightweight — string matching and Option slot assignment — the relative cost of adding vtable jumps is significant.
Conclusion
The experiment confirmed that midiserde’s flatten traits are well-suited for their intended purpose: hand-authored structs where compositional (de)serialization avoids code duplication and manual field routing. For machine-generated code with full type knowledge, the inline expansion remains the better choice — it produces faster runtime code, and the expected compile-time savings from reducing generated code volume did not materialize at scale.