the unprotected left
there's a left turn at San Antonio and El Camino Real that i drive three times a week. unprotected. four lanes of oncoming traffic. pedestrians who step off the curb before the walk signal. a cyclist who materializes from behind the Safeway truck every time like it's scripted.
the model handles it differently every time. not wrong, exactly. just differently. the gap acceptance threshold shifts. sometimes it's aggressive and i'm fine with it. sometimes it waits for a gap i would have taken. last Tuesday it started to go, stopped, then went. i didn't intervene but i thought about it.
gap acceptance is a real term from traffic engineering. it's the minimum time gap in oncoming traffic that a driver will accept to complete a turn. for humans, it varies by age, experience, visibility, and how late you are. studies put the average around 5-6 seconds for an unprotected left. but it's not a fixed number. it shifts with context.
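if you wrote gap acceptance down the way a traffic engineer models it, it would look something like this. a toy sketch, nothing from any real stack — the constant and the lateness adjustment are made-up numbers, chosen only to sit in the 5-6 second range mentioned above:

```python
# hypothetical critical gap, seconds. explicit, inspectable, tunable.
CRITICAL_GAP_S = 5.5

def accept_gap(gap_s: float, running_late: bool = False) -> bool:
    """return True if a driver with this threshold would take the turn.
    context (here: lateness, as a stand-in) shifts the threshold,
    the way it does for humans."""
    threshold = CRITICAL_GAP_S - (1.0 if running_late else 0.0)
    return gap_s > threshold
```

the whole point of writing it this way is that the threshold is a line of code you can point at. that's exactly what the model below doesn't have.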
the model doesn't have a gap acceptance parameter. there's no line of code that says "if gap > 5.2 seconds, go." that's the whole point of end-to-end. there's a learned behavior that emerged from millions of examples of other cars making left turns, and what it learned is not a rule. it's something closer to intuition. or maybe "intuition" is too generous. it's a function that maps pixels to steering commands, and somewhere in the middle there's a representation of gap size and risk and "am i blocking traffic" that we don't have direct access to.
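to make that concrete: even in a toy learned policy, the threshold disappears into the weights. the sketch below is an illustration only — a two-feature logistic model with made-up weights, nowhere near a real pixels-to-steering network — but it shows the structure of the point. no line of code states a gap threshold; one exists anyway, implicit in the numbers, and it moves with context:

```python
import math

# hypothetical "learned" weights. in a real end-to-end network these
# would come from training on millions of turns, not from a human.
W_GAP, W_PED, BIAS = 1.4, -2.0, -7.0

def learned_go_probability(gap_s: float, ped_signal_on: float) -> float:
    """map features to a go-probability. the effective gap threshold
    is implicit in the weights; it's written nowhere as a rule."""
    z = W_GAP * gap_s + W_PED * ped_signal_on + BIAS
    return 1.0 / (1.0 + math.exp(-z))
```

with these made-up weights, the implicit go/no-go boundary works out to a 5-second gap when the pedestrian signal is off and about 6.4 seconds when it's on — the "threshold" shifts with context even though no threshold appears in the code. now scale the two features up to raw pixels and the two weights up to millions, and you lose the ability to do that arithmetic at all.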
the model teaches me things sometimes. there's a moment in the turn where i usually commit. i'm about 20% into the intersection, i can see the oncoming lane clearly, and i go. the model consistently waits about 200ms longer than i would. after a few weeks of noticing this, i realized it was checking something i wasn't: the pedestrian crossing signal on the far side. it factors in whether someone might step off the curb mid-turn. i was not doing that reliably.
whether the model "knows" it's checking the pedestrian signal or whether that's just a statistical pattern that fell out of the training data, i genuinely cannot tell. and that's the thing. i optimize this network for a living. i can tell you exactly how many milliseconds each layer takes. i can tell you the memory access pattern of the attention mechanism. i cannot tell you why it hesitates at that intersection.
sitting in the driver's seat watching intuition happen in silicon at 30fps is either beautiful or terrifying depending on how your day is going.