The Matchmaking Engine That Failed First

The first version of my matching engine demoed beautifully and fell apart in practice. It was built to connect startups with the right investors, and on paper the approach was clean: embed everything, compare with cosine similarity, return the closest matches. In a demo that looks like magic. In production it produced confident nonsense.
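
For concreteness, the naive version amounts to something like the sketch below. It is an illustration, not the original code: the function names and the idea of holding candidates in a plain dictionary are assumptions, and a real system would get its vectors from an embedding model rather than hand-written lists.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors instead of dividing by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, candidates, k=3):
    """Rank candidates (hypothetical dict of name -> vector) by similarity only."""
    scored = [(cosine(query_vec, vec), name) for name, vec in candidates.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]
```

Nothing in this ranking knows about stage, geography, or cheque size, and the only output is a number.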

How it failed

Pure embedding similarity gave me three problems at once. The results were full of near-duplicates that inflated the list without adding value. Many matches were semantically close but practically wrong: the right sector but the wrong stage, the right thesis but the wrong geography or cheque size. And worst of all, there was no way to explain why any given match had been made, so I could not tell a good result from a lucky one.

What fixed it

The fixes came from the structure I built around the model, not from the model itself.

Hybrid retrieval. Hard filters first (stage, geography, cheque size), then embedding similarity, then a deterministic scoring layer I could actually reason about.
An evaluation harness. I measured precision against labelled examples, so changes were judged by the numbers rather than by how the demo felt that morning.
Deduplication. Collapsing near-duplicate records stopped the result set from looking busy while saying little.
Match reasons. Every match had to come with a why. That improved quality on its own, because failures became visible instead of hiding inside a similarity score.
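
The four pieces above can be sketched together in a few dozen lines. This is a minimal sketch under stated assumptions, not the production code: the `Startup` and `Investor` fields, the 0.7/0.3 weights, the cheque-fit heuristic, the 0.98 duplicate threshold, and the choice to detect duplicates by near-identical embeddings are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Investor:  # hypothetical record shape
    name: str
    stages: set
    regions: set
    min_cheque: int
    max_cheque: int
    embedding: list


@dataclass
class Startup:  # hypothetical record shape
    stage: str
    region: str
    raise_amount: int
    embedding: list


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def passes_filters(s, inv):
    """Hard filters: stage, geography, cheque size. No similarity involved."""
    return (s.stage in inv.stages and s.region in inv.regions
            and inv.min_cheque <= s.raise_amount <= inv.max_cheque)


def dedupe(investors, threshold=0.98):
    """Keep the first of any group whose embeddings are near-identical (assumed rule)."""
    kept = []
    for inv in investors:
        if all(cosine(inv.embedding, k.embedding) < threshold for k in kept):
            kept.append(inv)
    return kept


def cheque_fit(amount, lo, hi):
    """1.0 at the middle of the investor's range, falling to 0.0 at the edges."""
    half = (hi - lo) / 2
    return 1.0 if half == 0 else max(0.0, 1 - abs(amount - (lo + hi) / 2) / half)


def match(startup, investors, k=5):
    eligible = dedupe([inv for inv in investors if passes_filters(startup, inv)])
    results = []
    for inv in eligible:
        sim = cosine(startup.embedding, inv.embedding)
        fit = cheque_fit(startup.raise_amount, inv.min_cheque, inv.max_cheque)
        score = 0.7 * sim + 0.3 * fit  # illustrative weights, fixed and inspectable
        reasons = [
            f"invests at stage {startup.stage}",
            f"covers region {startup.region}",
            f"cheque fit {fit:.2f}",
            f"thesis similarity {sim:.2f}",
        ]
        results.append({"investor": inv.name, "score": score, "reasons": reasons})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:k]


def precision_at_k(predicted, relevant, k):
    """Share of the top-k predicted names that appear in the labelled relevant set."""
    top = predicted[:k]
    return sum(1 for p in top if p in relevant) / len(top) if top else 0.0
```

Running `match` over a labelled set of startups and feeding the returned names into `precision_at_k` gives one number per change, and the `reasons` list shows which signal was responsible when that number drops.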

The lesson

Embedding similarity is a brilliant first 80% and a treacherous last 20%. The wins came from treating it as one signal among several, and from building the evaluation loop that let me see, quickly and honestly, when I was wrong.