Some people believe that DeepSeek is both powerful and low-cost because it uses distillation to extract the essence of other large models. What's your opinion?
While the idea that distillation drives DeepSeek's capabilities is interesting, it's important to clarify that DeepSeek's efficiency and performance come primarily from its own architecture and optimization strategies, not simply from extracting knowledge from other large models. Distillation is indeed an established AI technique for transferring knowledge from a larger "teacher" model to a smaller "student" model. DeepSeek's strength, however, lies in a design that balances capability, cost-effectiveness, and scalability; for example, its Mixture-of-Experts (MoE) architecture activates only a subset of its parameters for each token, which reduces the compute needed per token. This lets it deliver high performance without relying heavily on external models. Ultimately, DeepSeek's success is the result of research and engineering work, not distillation alone.
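For readers unfamiliar with the term, here is a minimal sketch of classic knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution, blended with ordinary cross-entropy on the true label. This is a generic textbook illustration, not DeepSeek's training method. The function names, the temperature of 2.0, and the weighting `alpha` of 0.5 are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: blend of soft-target loss (match the teacher)
    and hard-label cross-entropy (match the ground truth)."""
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft-target term comparable in size across temperatures
    soft_loss = kl_divergence(teacher_soft, student_soft) * temperature ** 2
    # Standard cross-entropy against the true class at temperature 1
    student_probs = softmax(student_logits, 1.0)
    hard_loss = -math.log(student_probs[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # made-up teacher logits for 3 classes
    student = [2.5, 1.2, 0.3]   # made-up student logits
    print(distillation_loss(student, teacher, label=0))
```

The soft targets carry more information than a single hard label, because they also encode which wrong answers the teacher considers "almost right". That is what lets a smaller student learn efficiently from a larger model.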
@luoxi_hua1 Yes, building one step further on what came before is consistent with how things naturally develop.
DeepSeek's greatest value is that it achieves top-tier inference results with modest computing power. As for the data, it is not the crux of the issue.