1. Autothrottle: Resource Management for SLO-Targeted Microservices (usenix.org)
   22 points by mlerner 6 months ago | 1 comment

2. Resiliency at Scale: Managing Google's TPUv4 Machine Learning Supercomputer (micahlerner.com)
   1 point by mlerner 6 months ago

3. How the anti-cheat is anti-cheating so far (leagueoflegends.com)
   3 points by mlerner 11 months ago

4. ServiceRouter: Hyperscale and Minimal Cost Service Mesh at Meta (micahlerner.com)
   1 point by mlerner on March 31, 2024

5. A Cloud-Scale Characterization of Remote Procedure Calls (micahlerner.com)
   2 points by mlerner on March 5, 2024

6. A Cloud-Scale Characterization of Google's Remote Procedure Calls (micahlerner.com)
   1 point by mlerner on March 4, 2024

7. Gemini, Amazon's system for fast failure recovery in distributed model training (micahlerner.com)
   2 points by mlerner on March 2, 2024

8. Defcon: Preventing overload with graceful feature degradation (2023) (micahlerner.com)
   237 points by mlerner on Feb 29, 2024 | 95 comments

9. MotherDuck: DuckDB in the Cloud and in the Client [pdf] (cidrdb.org)
   5 points by mlerner on Feb 4, 2024

10. Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints (micahlerner.com)
    3 points by mlerner on Feb 4, 2024

11. Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints (micahlerner.com)
    4 points by mlerner on Jan 31, 2024

12. XFaaS: Hyperscale and Low Cost Serverless Functions at Meta (micahlerner.com)
    3 points by mlerner on Jan 27, 2024

13. XFaaS: Hyperscale and Low Cost Serverless Functions at Meta (micahlerner.com)
    3 points by mlerner on Jan 24, 2024 | 1 comment

14. Blueprint: A Toolchain for Highly-Reconfigurable Microservice Applications (micahlerner.com)
    2 points by mlerner on Jan 21, 2024

15. Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints [pdf] (rice.edu)
    50 points by mlerner on Jan 21, 2024 | 13 comments

16. XFaaS: Hyperscale and Low Cost Serverless Functions at Meta [pdf] (upenn.edu)
    4 points by mlerner on Jan 20, 2024

17. Efficient Memory Management for Large Language Model Serving with PagedAttention (micahlerner.com)
    3 points by mlerner on Jan 20, 2024

18. Efficient Memory Management for Large Language Model Serving with PagedAttention (micahlerner.com)
    1 point by mlerner on Jan 11, 2024

19. Blueprint: A Toolchain for Highly-Reconfigurable Microservice Applications (micahlerner.com)
    1 point by mlerner on Jan 4, 2024

20. Blueprint: A Toolchain for Highly-Reconfigurable Microservice Applications (micahlerner.com)
    5 points by mlerner on Jan 2, 2024 | 2 comments

21. Defcon: Preventing Overload with Graceful Feature Degradation (micahlerner.com)
    4 points by mlerner on July 29, 2023 | 1 comment

22. Defcon: Preventing Overload with Graceful Feature Degradation (micahlerner.com)
    4 points by mlerner on July 25, 2023

23. Ask HN: Streaming reading academic CS papers?
    2 points by mlerner on July 5, 2023

24. Towards an adaptable systems architecture for memory tiering at warehouse-scale (micahlerner.com)
    28 points by mlerner on June 29, 2023 | 4 comments

25. Sundial: Fault-Tolerant Clock Synchronization for Datacenters (micahlerner.com)
    2 points by mlerner on June 27, 2023 | 1 comment

26. Empowering Azure Storage with RDMA (usenix.org)
    2 points by mlerner on June 25, 2023

27. Sundial: Fault-Tolerant Clock Synchronization for Datacenters (micahlerner.com)
    3 points by mlerner on June 25, 2023

28. Automatic Reliability Testing for Cluster Management Controllers (micahlerner.com)
    1 point by mlerner on June 18, 2023

29. TelaMalloc: Efficient On-Chip Memory Allocation for Production ML Accelerators (micahlerner.com)
    52 points by mlerner on June 7, 2023 | 1 comment

30. Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems (micahlerner.com)
    2 points by mlerner on April 17, 2023