HAAM Systems / From the room

July 1, 2026 · 13 min read · By Kris Haamer

Before ChatGPT, AI Had a Different Interface

I attended AI Frontiers in Santa Clara in 2017. Its programme imagined the future through Siri, Alexa, video understanding, robots, games, and self-driving cars. The capabilities arrived, but not through the interface the room expected.

The linguistic shift

The technology changed. So did the words used to contain it.

2017: Deep learning · 2026: Foundation and multimodal models

The unit of progress shifted from a specialist technique inside one product to models that can operate across language, images, audio, video, and software.

2017: Personal assistants · 2026: Agents and copilots

The assistant moved from answering a spoken request to working across tools, files, interfaces, and longer sequences of action.

2017: Video understanding · 2026: Generative and editable video

Systems first had to represent objects, motion, time, and actions before those representations could become creative controls.

2017: AI deployment · 2026: AI-native workflows

The question is no longer only where to add a model. It is how research, design, engineering, operations, and responsibility reorganize around it.

The programme was a map of what 2017 thought AI would become

In November 2017, I attended AI Frontiers at the Santa Clara Convention Center. The event described itself plainly as a conference about applied deep learning. Its programme divided the future into personal assistants, robots, video analysis, autonomous driving, games, and major algorithmic breakthroughs.

That list now feels both accurate and incomplete. AI did spread through those categories, but the interface that eventually made the technology feel general-purpose was not yet central to the room. Five years before ChatGPT, artificial intelligence was still presented mainly as a collection of specialist capabilities embedded inside separate products.

Returning to the programme is useful because old technology conferences preserve more than predictions. They reveal the categories people used to make a new technology understandable before its dominant interface had appeared.

AI as infrastructure, before it became an interface

Andrew Ng opened the first day with a talk titled AI Is the New Electricity. His argument was that AI would not remain a visible novelty: it would become a foundational capability that transformed major industries, and organizations would need to learn how to identify opportunities, deploy systems, and change the way they worked.

That infrastructure metaphor aged well. AI now sits inside search, office software, creative tools, customer support, analytics, and development environments. But infrastructure alone does not explain why the current wave reached so many people so quickly.

Electricity became useful through appliances, switches, sockets, standards, and systems built around human activity. AI also needed an accessible interface. The chat box supplied one: a familiar surface through which people could reach many capabilities without first understanding the architecture underneath them.

The model mattered, but the interaction model changed the market. A general-purpose technology became legible through a general-purpose conversation.

The assistant was already an interaction design problem

One session brought together Alex Acero from Apple Siri, Ruhi Sarikaya from Amazon Alexa, and Dilek Hakkani-Tür from Google. Their talks described advances in speech recognition, synthesis, translation, natural-language understanding, dialogue state, and task completion.

The most revealing problems were not purely computational. Sarikaya described users who did not know which skills existed, what those skills could do, or how to phrase a request. Hakkani-Tür described dialogue as a collaborative process in which the system had to infer requirements and help a person reach a goal.

A graphical interface exposes possibilities through menus, buttons, fields, and spatial hierarchy. A conversational interface can hide almost everything it knows. Its apparent simplicity creates a difficult design obligation: the system must reveal its capabilities, limits, context, confidence, and next possible actions without turning the conversation into a manual.

The same tension now appears in AI copilots and agents. A system can technically perform dozens of actions while leaving the user unsure about what it understood, what it changed, which information it used, and how to recover from a mistake. Better models do not remove the need for interaction design. They raise the cost of getting it wrong.

Before generative video, machines had to understand video

The 2017 programme devoted a full track to video understanding. Researchers from Google, Facebook, Alibaba, and Twenty Billion Neurons discussed annotation, action recognition, temporal aggregation, multimodal information, summarization, search, logistics, shopping, and common-sense scenes.

The central question was how a machine could understand what happened across time inside an existing video. Today, the visible product question has expanded: how can a person create, edit, search, translate, restyle, and direct video through language?

The newer interface depends on the older research problem. A video model cannot follow a camera instruction or preserve an action across frames without some representation of objects, movement, relationships, sequence, and physical continuity.

Generative video looks like a sudden creative breakthrough because the control surface is new. Underneath it sits a longer history of teaching machines to perceive events rather than isolated images.

One model was already trying to cross the categories

Google Brain researcher Łukasz Kaiser presented a session titled One Model to Learn It All. The system was trained across image recognition, translation, image captioning, speech recognition, and language parsing instead of using a separately designed architecture for every task.

The conference programme still organized AI into product categories, but the research direction was already pushing against those boundaries. Earlier that year, Kaiser and seven co-authors had published Attention Is All You Need, introducing the Transformer architecture through translation tasks.

The historical twist is that a conference divided into assistants, video, games, cars, and robots contained evidence that the technical future might be less divided. The model could become a shared layer across many interfaces and media types.

That convergence is now visible in multimodal systems. Language is no longer only an output. It has become a control layer for images, sound, video, code, search, and external tools.

Self-driving cars exposed the gap between a demonstration and a system

Autonomous driving occupied a prominent place in the event. Speakers from Nvidia and Uber described deep learning as essential to perception, simulation, vehicle computing, and the eventual transformation of transportation and cities.

The optimism was understandable. Improvements in perception made a difficult physical problem look as though it might yield to continued model progress. Deployment revealed a wider system: rare events, safety cases, regulation, mapping, sensors, infrastructure, liability, public trust, and the consequences of failure.

This is where physical AI and software AI diverge. A conversational product can be released, observed, restricted, and updated quickly, although its mistakes can still cause real harm. A vehicle must act inside an open environment where a convincing average performance is not enough.

The lesson extends beyond cars. Intelligence is only one component of reliability. The rest comes from engineering, operations, governance, interface design, institutional accountability, and the ability to handle situations the model has barely seen.

Capability dominated the room

The published agenda concentrated on what deep learning could make possible and how companies could deploy it. Governance, copyright, environmental cost, labour displacement, platform power, and model safety were not prominent organizing themes in the programme.

That does not make AI Frontiers unusual. It makes the event representative of its moment. In 2017, the field still had to prove that deep learning could produce commercially useful systems across industries.

By 2026, capability is no longer the only credibility problem. AI systems operate inside education, work, media, public information, creative production, and decision-making. The question has widened from whether a system can perform a task to who can inspect it, contest it, afford it, control it, and absorb the cost when it fails.

The missing topics are part of the historical evidence. A programme shows not only what a field sees, but what it has not yet learned to treat as central.

What aged well

Several claims embedded in the conference have become more important with time. AI did become a general-purpose layer across industries. Interaction remained a bottleneck. Models began crossing domains. Deployment required organizational change. Research progress did not automatically become a trustworthy product.

What changed most was the distance between those ideas and ordinary users. In 2017, many capabilities were demonstrated through specialist sessions and separate product categories. Today, a person can encounter several of them in one conversation and expect the system to move between writing, images, data, code, search, voice, and action.

That compression makes AI feel simpler while making the product system more complex. Every apparently effortless request may involve model selection, context assembly, retrieval, permissions, external tools, latency, evaluation, safety constraints, and a user interface that must explain just enough of this machinery to preserve agency.

Why return to an old conference?

Technology conferences are designed to make the future feel close. Years later, their real value is different. They let us compare what people could already see with what remained outside their frame.

AI Frontiers took place during a threshold year. The Transformer had been introduced. Researchers were already building systems that crossed tasks. Assistants were confronting context and discoverability. Video systems were learning time and action. Companies were trying to turn research into products.

What had not yet arrived was the interface that would gather many of those capabilities into one public object. Once it did, AI stopped feeling like a collection of industry applications and began to feel ambient.

The unresolved edge is no longer whether AI will become infrastructure. It is whether the interfaces placed on top of that infrastructure will help people understand what is acting on their behalf, what it knows, whose incentives shaped it, and when they should refuse its help.

Event: AI Frontiers: Applied Deep Learning

Date: November 3 to 5, 2017

Place: Santa Clara Convention Center, California
