Trying to figure out MCP by actually building an app from scratch with open source and SLMs
1. Trying to figure out MCP by actually building an app from scratch with open source and SLMs
Julien Simon, Chief Evangelist
julien@arcee.ai
https://www.julien.org
https://github.com/juliensimon/smolagents-mcp-demo
2. Just another remote procedure call protocol 👴
1982: UNIX RPC – First mainstream RPC, used in NFS.
1994: CORBA – Early cross-language object middleware.
2000: REST – Dominant HTTP API style.
2000: SOAP – Enterprise XML-based APIs.
2007: Thrift – Multi-language RPC.
2010: WebSockets – Real-time, bidirectional communication.
2014: GraphQL – Client-driven data queries.
2015: gRPC – Protobuf + HTTP/2, high-performance.
2024: MCP – Standardizes AI data integration ⬅ YOU ARE HERE
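MCP sits squarely in this lineage: under the hood it is JSON-RPC 2.0 carried over stdio or HTTP. A minimal sketch of what a tool invocation looks like on the wire (the `tools/call` method name comes from the MCP specification; the tool name and arguments are invented for illustration):

```python
import json

# MCP messages are JSON-RPC 2.0, just like several of the protocols above.
# This request asks an MCP server to invoke one of its tools. The tool
# "get_stock_price" and its arguments are hypothetical placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_stock_price",        # hypothetical tool name
        "arguments": {"ticker": "ACME"},  # hypothetical arguments
    },
}

# Serialize to the JSON string actually sent to the server.
wire_message = json.dumps(request)
print(wire_message)
```

The client/server framing, request IDs, and error objects all follow standard JSON-RPC 2.0; what MCP adds is the agreed-upon vocabulary of methods (`tools/list`, `tools/call`, and so on) and the schemas of their payloads.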
3. Food for thought
Security, Trust, Auditing - CRITICAL for enterprise adoption
• What measures are in place to authenticate and validate the servers your application communicates with?
• How do you ensure each function is accessible only to authorized users under appropriate conditions?
• How do you establish clear identity management to attribute actions accurately within MCP systems?
Discoverability & Routing
• How does your application discover and connect to remote servers dynamically?
• How do you determine the most suitable server and function(s) for each task within your application?
Versioning & Compatibility
• How can you ensure that updates do not disrupt existing functionalities for users?
• How does MCP live alongside other protocols (REST, OpenAI-style function calling, etc.)?
Documentation & Usability
• How do you ensure that function descriptions are detailed and understandable for models?
Performance & Cost Management
• How do you optimize latency and token consumption in agentic systems?
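The documentation point is very concrete in MCP: servers advertise their tools via `tools/list`, and the model chooses among them based solely on the published name, description, and input schema. A sketch contrasting a vague and a precise tool description (the field names `name`/`description`/`inputSchema` follow the MCP tool listing format; the tools themselves are invented):

```python
# A model routing between tools sees only this metadata, so the
# description must say what the tool does, when to use it, and what
# input it expects. The tools below are made-up examples.
vague_tool = {
    "name": "lookup",
    "description": "Looks stuff up.",  # too vague: causes wrong/missed calls
    "inputSchema": {"type": "object", "properties": {}},
}

precise_tool = {
    "name": "get_industry_peers",
    "description": (
        "Return the S&P 500 companies in the same GICS industry as the "
        "given ticker. Use for questions about competitors or sector "
        "peers. Input: a valid US ticker symbol, e.g. 'AAPL'."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "ticker": {
                "type": "string",
                "description": "US stock ticker symbol",
            }
        },
        "required": ["ticker"],
    },
}
print(precise_tool["name"])
```

A precise description also helps with the performance question: fewer retries and clarification turns mean fewer tokens spent per task.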
4. Arcee AI - Post-trained models
State-of-the-art tech stack based on open-source libraries
Spectrum (continuous pre-training), MergeKit (merging), DistilKit (distillation), EvolKit (dataset improvement)
Best-in-class models based on open-source architectures
Hugging Face OpenLLM Leaderboard benchmarks
Llama 3.1 70B 🥇 Best 70B model
Qwen2 1.5B 🥇 Best 1.5B model
Llama 3.1 8B 🥇 Best 8B model
Qwen2.5 14B 🥇 Best 14B model
Qwen2 72B 🥇 Best Arabic model
7. AFM-4.5B-Preview vs. Qwen3-4B
8/10, tie on Industrials, loss on Communication Services.
200 questions generated by Claude 3.7 Sonnet
20 questions for each one of the top 10 industries in the S&P 500
Judge: DeepSeek-R1 (670B)
https://github.com/juliensimon/radar-evaluator
8. AFM-4.5B-Preview vs. Google Gemma-3n-E4B-it
8/10, tie on Healthcare, loss on IT
9. AFM-4.5B-Preview vs. Llama-3.2-8B
10/10 😃
10. AFM-4.5B-Preview vs. Mixtral-8x7B-Instruct
Almost tied (4/10) with 8% of Mixtral’s size
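The "X/10" scores above can be read as a per-industry pairwise tally: the judge model picks a winner (or a tie) for each question, and the model winning more questions takes the industry. A toy sketch of that aggregation, with fabricated verdicts rather than real evaluation data:

```python
from collections import Counter

def industry_winner(verdicts):
    """verdicts: list of 'A', 'B', or 'tie' judge decisions for one industry."""
    counts = Counter(verdicts)
    if counts["A"] > counts["B"]:
        return "A"
    if counts["B"] > counts["A"]:
        return "B"
    return "tie"

# Two toy industries with 5 questions each (the real setup used 20
# questions per industry across the top 10 S&P 500 industries).
results = {
    "Industrials": ["A", "B", "tie", "A", "B"],  # 2-2 on decided questions
    "IT": ["A", "A", "B", "A", "A"],             # 4-1 for model A
}
winners = {industry: industry_winner(v) for industry, v in results.items()}
print(winners)  # {'Industrials': 'tie', 'IT': 'A'}
```

Counting industry wins out of 10 then gives the headline score; a tie on decided questions leaves that industry out of both columns.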
11. Julien Simon, Chief Evangelist
Models on Hugging Face, OpenRouter, and Together AI
Chat with AFM
AFM blog post