This analysis provides a comparison of open-source and proprietary Large Language Models (LLMs), with special attention to emerging Chinese models like DeepSeek and Qwen.
The timing of DeepSeek's and Qwen's announcements coincided with U.S. tech companies' earnings week, creating significant market impact and contributing to volatility in Nvidia's stock price. The timing appears deliberate, maximizing media attention and market influence.
DeepSeek made major waves with an announcement claiming that its model was trained for $5.6M on older-generation Nvidia GPUs. The announcement led the market to believe that a high-performing model could be built in China for under $6M, while in the West it costs billions to achieve comparable results. Subsequent analysis suggests that the hedge fund behind DeepSeek had spent closer to $500M on R&D before launching, and there are open questions about the origin of the training dataset. Regardless, the DeepSeek launch has set the cost bar low for others: it is becoming clear that the winners in the AI arms race must do 10X more for 1/10th of the cost.
While DeepSeek has brought open-source LLMs top of mind again, it is neither the first nor the only one on the market today. Meta, Google, and Mistral have released open models, as did MosaicML (creator of the MPT models), which was acquired by Databricks.
Open Source Models: The Rise of Efficient Alternatives
Open source has historically delivered great innovations such as Linux and numerous Apache Software Foundation projects (search, NoSQL databases, messaging, cluster computing, etc.). It has solidified its place in tech stacks across the world and offers a solid alternative to commercial products. Many open-source solutions have a commercial counterpart offering enterprise-class features and support on top of the community-driven innovation.
Strengths of Open Source Models
General Limitations
Proprietary Models
OpenAI brought large language models to the masses with the launch of ChatGPT. Despite the company's name, its models are not open access; they have been commercialized, particularly through the partnership with Microsoft via Azure OpenAI and Copilot in Microsoft 365. Anthropic has been innovating rapidly with multilingual and reasoning support. One area where commercial models excel is multi-modal access, including vision and voice.
Strengths of Commercial Models
Limitations
China has several advantages over its Western peers. In the absence of high-end GPUs, engineering teams must rely on ingenuity to find alternative ways to build and train their models. They also benefit from access to cheap electricity for the data centers running the chips required for training and fine-tuning. GPT-4 reportedly required about 50 GWh of power to train, and it is estimated that GPT-5 will need 1,500 GWh. DeepSeek stated in its research paper that it spent only $5.6M to train its model.
The paper did not cover other costs related to the project, and some analyst firms estimate that DeepSeek's infrastructure spend was closer to $500M once development efforts are taken into account.
DeepSeek
DeepSeek's research is bankrolled by the hedge fund High-Flyer Capital, founded in 2015. Its AI research originally supported the fund's investment strategies, and the firm amassed more than 10,000 of Nvidia's earlier-generation GPUs. The U.S. embargo on high-end chips put DeepSeek at a disadvantage relative to its Western peers. The company is also aggressively pursuing AI talent in China, offering nearly double the salaries of other leading companies there.
Key Aspects
Potential Use Cases
Qwen
Alibaba announced Qwen 2.5-Max, the latest in its LLM series. The unveiling of this open-source model can easily be read as a direct challenge to DeepSeek and other domestic rivals. The timing of the launch, during Chinese New Year, was probably forced by the DeepSeek announcement.
Key Aspects
Likely Use Cases
When to Choose Open Source
Organizations that have strong in-house technical skills and are cost constrained with specific vertical needs will benefit most from open-source models. For instance, vertical AI solutions with specific domain needs and strict compliance requirements are great candidates, as developers can train the models on proprietary content and run them on-premise or in a virtual private cloud (VPC). This gives the organization full control over application data and infrastructure.
There is also a case for open source in cost-constrained applications where heavy token consumption could make proprietary models like Claude too expensive to operate as usage scales.
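The break-even point is easy to sketch. The following back-of-the-envelope calculation uses purely illustrative prices (real per-token rates vary by vendor and change frequently):

```python
# Back-of-the-envelope token cost comparison. All prices are illustrative
# assumptions, not actual vendor rates.

def monthly_cost(requests_per_day, tokens_per_request, price_per_million_tokens):
    """Estimate monthly spend given a per-million-token price."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Hypothetical workload: 100K requests/day, ~2K tokens per request.
proprietary = monthly_cost(100_000, 2_000, 15.00)  # premium hosted model
self_hosted = monthly_cost(100_000, 2_000, 1.00)   # amortized open-source serving

print(f"Proprietary: ${proprietary:,.0f}/month")   # -> Proprietary: $90,000/month
print(f"Self-hosted: ${self_hosted:,.0f}/month")   # -> Self-hosted: $6,000/month
```

At this hypothetical scale, the per-token price difference alone dominates any fixed hosting cost, which is exactly when self-hosting an open-source model starts to pay off.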
The open-source approach generally shines in research-and-development-focused initiatives where the model itself is being analyzed, fine-tuned, or customized for specific use cases.
When to Choose Proprietary
Many organizations have benefited from the OpenAI and Microsoft partnership, which made GPT highly accessible through an easy-to-use API. AWS launched its Bedrock framework, with Google following suit, making various proprietary models easy to integrate and deploy within applications; this has led many major software companies, such as ServiceNow, Salesforce, and HubSpot, to integrate these capabilities.
Proprietary models are excellent for horizontal applications requiring general intelligence and large context windows, such as automated customer service, image and video generation, and translating content across multiple languages. For instance, Claude 3.5 Sonnet offers a context window of 200K tokens (Anthropic has not disclosed its parameter count).
These models can be deployed rapidly through well documented APIs on AWS Bedrock, GCP or Azure making the integration fairly seamless. Due to ease of deployment, the technical resources can be leveraged for application of the technology instead of hosting and infrastructure needs.
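As a sketch of how light that integration can be, the snippet below calls a hosted model through Bedrock's Converse API via boto3. It assumes AWS credentials and Bedrock model access are already configured; the model ID shown is just an example:

```python
# Hedged sketch: invoking a hosted model via AWS Bedrock's Converse API.
# Assumes AWS credentials and Bedrock model access are already set up.

def build_messages(prompt: str) -> list:
    """Shape a user prompt into the Converse API message format."""
    return [{"role": "user", "content": [{"text": prompt}]}]

def ask(prompt: str,
        model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> str:
    import boto3  # imported here so the sketch loads without the AWS SDK installed
    client = boto3.client("bedrock-runtime")
    response = client.converse(modelId=model_id, messages=build_messages(prompt))
    return response["output"]["message"]["content"][0]["text"]
```

Swapping in a different hosted model is then a one-line change to `model_id`, which is much of the appeal of these managed platforms.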
Commercial LLM platforms also offer consistent SLAs for performance, as is evident in ongoing public model benchmarks. There is also a simpler audit path for regulatory compliance, as commercial models come with accepted terms of use and support security standards such as ISO 27001 and SOC 2.
Future Outlook
With the new models, a battle is forming between China and the rest of the world. While DeepSeek's approach requires fewer GPUs in a data center for training, the existing leading LLMs from Anthropic and OpenAI require massive data centers.
China has invested heavily in electrical infrastructure and is launching 2-10 GW renewable and coal- and oil-fired power stations while it develops small nuclear plants capable of running data centers. In the world of AI data centers, access to energy below 5 cents per kWh is a critical factor, and there is a risk that Europe and the United States will not be able to compete.
Market Trends
Competition among open-source models will increase, with latency becoming a key differentiator as capabilities become generally "good enough". For instance, Mistral Small 3 offers significantly improved latency compared to many other reasoning engines.
Model performance will continue to improve, with more options spanning "small" models of 24B parameters or fewer as well as large models. With the expansion of use cases, there will be a growing emphasis on efficiency and deployment costs. Open source offers a great mechanism to manage costs and maintain predictability.
While Apache-style licensing is broadly accepted, licensing and commercial terms are likely to evolve, particularly for models coming from China.
Specialized models for specific domains will emerge; for instance, EXL Services built its own model for the insurance industry, optimized for claims and underwriting processes.
Strategic Implications
The ideal strategy for companies deploying large language models in their AI initiatives is to ensure that the underlying LLM can be swapped easily. Model-selection flexibility is growing in importance as new capabilities arrive constantly that can better meet application needs. Once models are considered "good enough", lower-cost options will likely emerge offering the same or better performance.
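One common way to preserve that option is a thin abstraction layer, so the active model becomes a configuration choice rather than a code change. A minimal sketch (class and provider names are illustrative):

```python
# Sketch of a thin abstraction layer that lets the underlying LLM be swapped
# without touching application code. Provider names are illustrative.
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Application code depends only on this interface."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class EchoModel(ChatModel):
    """Stand-in provider used here for demonstration."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

# A registry makes the active model a configuration value, not a code change.
REGISTRY = {"echo": EchoModel}

def get_model(name: str) -> ChatModel:
    return REGISTRY[name]()

model = get_model("echo")       # swap models by changing one config value
print(model.complete("hello"))  # -> echo: hello
```

In practice each real provider (hosted API or self-hosted open-source model) gets its own adapter class behind the same interface, so benchmarking and migrating between models costs little.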
There is also a need for hybrid approaches combining multiple models, as some use cases can only be solved with several models working together. For instance, multilingual support with complex reasoning for an industrial company's support system might require two or more models. Likewise, a sophisticated recommendation engine for art in the context of interior decoration would require a computer-vision model to identify objects and the color scheme, plus a conversational agent to suggest pieces that fit the decor.
As use cases become more common and scalable, there is an increasing focus on model efficiency and cost optimization. As mentioned in the intro, there is a need to do 10X more at 1/10th of the cost; if the models allow good unit economics on token consumption, more mundane tasks can and will be automated as cost becomes less of a constraint.
Deployment and infrastructure strategies continue to evolve, highlighting the importance of technical adaptability and general maintainability of the infrastructure and models. Organizations must develop operational skills (AIOps) to monitor models, especially if they are continuously fine-tuned on user inputs. Just as with classical machine-learning models, LLMs are susceptible to model drift.
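A minimal sketch of what such monitoring might look like: track a rolling quality score (for example, from automated evals) and flag sustained degradation. The window size and threshold here are arbitrary illustrations:

```python
# Minimal drift-monitoring sketch for a deployed LLM: keep a rolling window of
# quality scores and flag when the average falls below a threshold.
from collections import deque

class DriftMonitor:
    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.scores = deque(maxlen=window)  # most recent eval scores in [0, 1]
        self.threshold = threshold

    def record(self, score: float) -> None:
        self.scores.append(score)

    def drifting(self) -> bool:
        """True when the rolling average falls below the quality threshold."""
        if not self.scores:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold

monitor = DriftMonitor(window=3, threshold=0.8)
for s in (0.9, 0.7, 0.6):   # quality degrading over time
    monitor.record(s)
print(monitor.drifting())   # -> True (rolling mean ~0.73 < 0.8)
```

A production setup would feed the monitor from automated evals or user feedback and route alerts into the same AIOps tooling used for the rest of the stack.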
The choice between open source and proprietary LLMs should be based on specific use cases, technical requirements, and strategic considerations. While open source models offer greater control and customization potential, proprietary models currently lead in general performance and ease of deployment. The emergence of powerful Chinese models like DeepSeek and Qwen adds new options to consider, particularly for specialized applications and multilingual requirements.