Startup Claims Breakthrough in LLM Speed and Efficiency

A Miami AI Startup Says It Solved One of the Biggest Problems in AI — And It Has the Data to Back It Up

For nearly a decade, researchers and engineers building large language models have wrestled with a stubborn mathematical problem: the bigger the document you feed an AI, the more brutally expensive and slow its processing becomes. It's a fundamental constraint that has shaped every major design decision in modern AI — from how models are trained to how much it costs to run them. Now, a Miami-based startup called Subquadratic is claiming it has finally cracked that problem, and it's starting to bring outside evidence to support its extraordinary claims.

What Is the LLM Bottleneck, Exactly?

To understand why Subquadratic's announcement matters, it helps to understand what the bottleneck actually is. Most large language models today rely on a mechanism called "attention," which was introduced in the landmark 2017 paper that gave us the Transformer architecture — the foundation of tools like ChatGPT and Claude. Attention works by comparing every word (or token) in a document against every other word to figure out which parts of the text are relevant to each other. The problem is how that comparison scales.

If you double the length of a document, the number of comparisons doesn't double; it quadruples. This relationship, known as quadratic scaling, means that processing long documents becomes eye-wateringly expensive very quickly. For a document ten times longer than average, the attention step doesn't cost ten times more; it costs roughly a hundred times more. This is why context windows in AI models have historically been limited, and why running LLMs at scale demands enormous amounts of computing power and energy.
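The arithmetic behind that scaling is easy to check. The short Python sketch below counts the pairwise comparisons full attention would make for a document of a given length. The function name and the 1,000-token baseline are illustrative assumptions, and the count ignores every other cost of running a model.

```python
def attention_comparisons(num_tokens: int) -> int:
    """Pairwise comparisons in full attention: every token against every token."""
    return num_tokens * num_tokens


# Hypothetical baseline document length, chosen only for illustration.
base_tokens = 1_000

for multiplier in (1, 2, 10):
    n = base_tokens * multiplier
    ratio = attention_comparisons(n) // attention_comparisons(base_tokens)
    print(f"{n:>6} tokens: {attention_comparisons(n):>12,} comparisons "
          f"({ratio}x the baseline)")
```

Doubling the length gives four times the comparisons, and a tenfold increase gives a hundred times as many.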

It's a well-known pain point in the industry, and no one has fully solved it, though not for lack of trying. Many research teams have proposed alternatives, including various forms of "sparse attention," which attempts to skip comparisons between words that are unlikely to be related. The idea is appealing on paper but has proven remarkably difficult to implement without degrading model quality so much that the speed gains aren't worth it.
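To make the idea concrete, here is a minimal sketch of one common sparse pattern, a sliding window in which each token is compared only with its nearby neighbors. This is an assumption chosen for illustration and not a description of any particular company's method; the function names and the window size of 128 are hypothetical.

```python
def full_comparisons(num_tokens: int) -> int:
    """Full attention: every token is compared with every token."""
    return num_tokens * num_tokens


def windowed_comparisons(num_tokens: int, window: int) -> int:
    """Sliding-window attention: each token is compared only with tokens
    at most `window` positions away (itself included)."""
    total = 0
    for i in range(num_tokens):
        first = max(0, i - window)
        last = min(num_tokens - 1, i + window)
        total += last - first + 1
    return total


# Hypothetical window size, chosen only for illustration.
window = 128

for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens: full={full_comparisons(n):>12,}  "
          f"windowed={windowed_comparisons(n, window):>10,}")
```

With a fixed window, the comparison count grows roughly in step with document length instead of with its square. The catch is that two tokens farther apart than the window are never compared directly, which is one way quality can suffer.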

Enter Subquadratic and Its Model, SubQ

Subquadratic came out of stealth mode last month with a striking announcement: it claimed to have built a new model called SubQ that solves the scaling problem using sparse attention — and does so without sacrificing the performance that makes LLMs useful in the first place. According to the company, SubQ is not only faster and cheaper than competing models, but also consumes significantly less energy, which carries real implications for the environmental footprint of AI infrastructure.

The initial reception was, understandably, skeptical. Bold claims in the AI space are common, and the details Subquadratic shared at launch were thin. The field is littered with announcements that haven't held up to scrutiny. But the company has since taken a meaningful step toward credibility: it commissioned an independent evaluation of SubQ by Appen, a well-known third-party AI data and evaluation firm.

What the Independent Evaluation Found

The results of Appen's testing are notable. According to Subquadratic, the evaluation found that SubQ ran 56 times faster than rival approaches on comparable tasks. On a key benchmark measuring long-document retrieval — essentially testing how well the model can find and use information buried deep in a lengthy text — SubQ scored 98%. These are impressive numbers by any standard, and the involvement of an outside evaluator adds a layer of credibility that self-reported benchmarks simply cannot provide.

Long-document retrieval is a particularly telling test because it directly targets the weakness that quadratic scaling creates. Many current models struggle to accurately recall information buried in the middle of a long document, a phenomenon sometimes called the "lost in the middle" problem. A model that genuinely solves quadratic scaling should, in theory, handle this well, which is exactly what the Appen results suggest SubQ does.

Reasons for Cautious Optimism (and Continued Skepticism)

Despite the promising numbers, there are legitimate reasons to approach Subquadratic's claims with measured caution rather than uncritical enthusiasm.

The model isn't widely available yet. SubQ has not been released for broad public testing, which means independent researchers have not had the opportunity to stress-test it against the full range of real-world use cases. Controlled evaluations, however rigorous, are not the same as deployment at scale.
It was built on borrowed weights. Critics have pointed out that SubQ was developed using weights from an existing Chinese open-source model, rather than being trained entirely from scratch. This raises questions about how much of SubQ's architecture is truly novel and whether the sparse attention implementation is the primary driver of the performance gains.
One evaluation firm is not the final word. Appen is a reputable organization, but reproducibility across multiple independent teams is the gold standard in science. Until other researchers can replicate these findings, the results should be treated as highly suggestive rather than definitively proven.

Why This Still Matters for the Future of AI

Even setting aside the unresolved questions, the direction Subquadratic is pointing in is important. The computational cost of running LLMs is one of the most significant barriers to broader AI adoption. Smaller companies, research institutions, and developers in resource-constrained environments are effectively priced out of the most capable models. A genuine, durable solution to quadratic scaling would democratize access to powerful AI in a meaningful way — not just making existing applications cheaper, but opening the door to entirely new ones that require processing very long documents reliably and affordably.

The energy dimension matters too. As AI workloads have grown, so has concern about the electricity consumption of data centers running these models. A model that is 56 times faster on relevant tasks would, all else being equal, require a fraction of the energy to accomplish the same work.
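The reasoning is simple arithmetic: energy is power multiplied by time, so if the hardware draws the same power and the job finishes 56 times sooner, it uses one fifty-sixth of the energy. The sketch below works through that with made-up numbers; the 700-watt draw and the 560-second baseline are hypothetical, and real power draw may differ between models.

```python
def energy_wh(power_watts: float, seconds: float) -> float:
    """Energy in watt-hours for a constant power draw over a duration."""
    return power_watts * seconds / 3600.0


# Hypothetical figures for illustration only; not measurements of any system.
power_watts = 700.0        # assumed constant draw for both runs
baseline_seconds = 560.0   # assumed time for the slower approach
speedup = 56               # the speedup figure reported in the article

baseline = energy_wh(power_watts, baseline_seconds)
faster = energy_wh(power_watts, baseline_seconds / speedup)

print(f"baseline: {baseline:.2f} Wh")
print(f"faster:   {faster:.2f} Wh")
print(f"share of baseline energy: {faster / baseline:.1%}")
```

The "all else being equal" caveat carries real weight here: if the faster model needs more or different hardware, the saving shrinks accordingly.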

What to Watch For Next

The next meaningful milestone for Subquadratic will be broader access. When SubQ becomes available for independent testing, the research community will quickly determine whether the Appen results hold up across a wider variety of benchmarks and conditions. The company's willingness to pursue third-party validation rather than relying solely on its own claims is an encouraging sign, but the real test is still ahead.

For now, Subquadratic occupies an intriguing middle ground — too credible to dismiss, but not yet proven enough to declare a revolution. That's not a comfortable place to sit, but in a field moving as fast as AI, it's where genuinely interesting breakthroughs tend to live before the rest of the world catches up.