What are latency and tokens per second, and why should you care if you want to learn about Artificial Intelligence? – Brain and code tech

In recent months, numerous open-source artificial intelligence models have emerged that users can run on their own computers. However, while the idea of having local AI sounds appealing, in practice it is inefficient for most users. The main reason is the speed of text generation, measured in tokens per second.

What is a token?

A token is the basic unit of processing in a language model. On average, a token is equivalent to about 0.75 words, or approximately 4 characters. This means that to generate a 10-word sentence, an AI model has to process around 13-14 tokens.
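The rule of thumb above can be turned into a quick estimator. This is only a sketch: the function names are made up for illustration, and the two ratios are the averages quoted in this article, not the output of a real tokenizer, so actual counts will vary by model and language.

```python
# Rough averages quoted in this article (assumptions, not tokenizer output).
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4


def tokens_from_words(word_count: int) -> float:
    """Estimate how many tokens a text of `word_count` words needs."""
    return word_count / WORDS_PER_TOKEN


def tokens_from_text(text: str) -> float:
    """Estimate tokens from the raw character length of `text`."""
    return len(text) / CHARS_PER_TOKEN


# A 10-word sentence lands between 13 and 14 tokens.
print(f"10 words ~ {tokens_from_words(10):.1f} tokens")
```

The two estimates will not always agree with each other, since they rely on different averages; either one is only good enough for back-of-the-envelope planning.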

Current AI models generate text at a speed of:

Gemini (Google): 168 tokens/second.
ChatGPT-4 (OpenAI): 119 tokens/second.
DeepSeek (free AI model): 21 tokens/second.

This means that ChatGPT-4 can generate around 90 words per second, while DeepSeek reaches only about 16 words per second. These speeds are possible because the models run on servers with enormous computing power, designed to handle millions of simultaneous requests.
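The conversion from tokens per second to words per second is a single multiplication. The sketch below applies the 0.75 words-per-token average to the speeds listed above; the dictionary and function names are illustrative, and the speeds are simply the figures quoted in this article.

```python
WORDS_PER_TOKEN = 0.75  # average quoted in this article

# Speeds in tokens/second, as listed in this article.
MODEL_SPEEDS = {
    "Gemini": 168,
    "ChatGPT-4": 119,
    "DeepSeek": 21,
}


def words_per_second(tokens_per_second: float) -> float:
    """Convert a generation speed from tokens/s to approximate words/s."""
    return tokens_per_second * WORDS_PER_TOKEN


for name, tps in MODEL_SPEEDS.items():
    print(f"{name}: {tps} tokens/s ~ {words_per_second(tps):.2f} words/s")
```

The exact results are 89.25 words per second for ChatGPT-4 and 15.75 for DeepSeek, which the text rounds.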

Running an AI on your computer

If you decide to install an AI on your computer, there are three ways to do it, but none are truly efficient for the average user:

Run it on the CPU of a normal computer
- Estimated speed: 1-2 tokens/second.
- For a 500-word response (approximately 700 tokens), the AI would take between 6 and 12 minutes.
- Practically unsuitable for everyday use.
Run it with a graphics card (GPU) from a gaming computer
- Estimated speed: 5 tokens/second.
- For the same 500-word response, the waiting time would be around 2 minutes.
- It's still too slow for smooth use.
Run it with an NVIDIA H100 graphics card (€25,000 or more)
- Estimated speed: 60 tokens/second.
- Generation time for 500 words: about 12 seconds.
- However, not even the most dedicated gamers own this type of graphics card, which is designed for data centers and large corporations.
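The waiting times for the three options follow from dividing the token count by the generation speed. Here is a minimal sketch of that arithmetic, assuming the 0.75 words-per-token average and the estimated speeds from this article; the names are illustrative. It uses the unrounded token count (about 667) rather than the 700 used in the text, so the results come out slightly lower.

```python
WORDS_PER_TOKEN = 0.75  # average quoted in this article

# Estimated speeds in tokens/second, taken from the options above.
HARDWARE_SPEEDS = {
    "CPU (slow end)": 1,
    "CPU (fast end)": 2,
    "Gaming GPU": 5,
    "NVIDIA H100": 60,
}


def generation_seconds(word_count: int, tokens_per_second: float) -> float:
    """Approximate time in seconds to generate `word_count` words."""
    tokens = word_count / WORDS_PER_TOKEN
    return tokens / tokens_per_second


for name, tps in HARDWARE_SPEEDS.items():
    secs = generation_seconds(500, tps)
    print(f"{name}: {secs:.0f} s ({secs / 60:.1f} min)")
```

Changing the word count or plugging in a speed measured on your own machine gives a quick sense of whether a local setup would be tolerable for your use.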

Conclusion: Is it worth installing an AI on your computer?

The short answer is no. While technically possible, the generation speed is too slow for practical use. Most current AI models are designed to run on specialized servers with hardware optimized for high-speed inference.

Installing AI on your computer might serve as a technical exercise or experiment, but for real-world use, the best option remains accessing cloud services that offer these models with optimized response times. In short, if speed and efficiency are your priorities, it's better to let AI run on professional servers rather than trying to run it on your personal computer.
