Serdar Yegulalp
Senior Writer

Yandex open sources CatBoost machine learning library

Jul 18, 2017
Data Science, Machine Learning, Open Source

The Russian search giant has released its own system for machine learning, with trained results that can be used directly in Apple's Core ML system


Russian search engine creator Yandex has joined the ranks of Google, Amazon, and Microsoft by releasing its own open source machine learning library, CatBoost.

The Apache-licensed CatBoost is for “open-source gradient boosting on decision trees,” according to its GitHub repository’s README. It provides a way to perform classifications and rankings of data by using a collection of decision-making mechanisms, or “learners,” rather than a single one. The learners are built in sequence, with each new decision tree trained to correct the errors left by the ones before it, and their outputs are combined into a single prediction. By combining many learners, CatBoost can yield better results than decision-making systems that rely on individual learners.
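To make the idea of combining many weak learners concrete, here is a minimal sketch of gradient boosting in plain Python. It is an illustration under simplifying assumptions (one numeric feature, squared-error loss, one-split "stump" trees), not CatBoost's implementation or API; every function name and the toy data are made up for this example.

```python
# Toy gradient boosting with decision stumps (squared-error loss).
# Illustrative only: this is NOT CatBoost's algorithm or API.

def fit_stump(xs, residuals):
    """Pick the single threshold split that best fits the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_learners=50, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current errors."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict_boosted(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: a curve no single stump can fit well.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]

single = fit_boosted(xs, ys, n_learners=1)
ensemble = fit_boosted(xs, ys, n_learners=50)
print("one learner MSE:", mean_squared_error(single, xs, ys))
print("50 learners MSE:", mean_squared_error(ensemble, xs, ys))
```

On this toy data the 50-stump ensemble fits the training points more closely than a single stump does, which is the effect the article describes. Production libraries add much more on top, such as deeper trees, regularization, and handling of categorical features.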

CatBoost comes with support for Python and R, as well as a command-line interface to drive the machine learning library. The Python packages for CatBoost also include data visualization tools for plotting statistics of the training process. The resulting plots can be viewed in a Jupyter notebook or in CatBoost’s own data viewer application.

Many machine learning libraries already implement some manner of gradient boosting algorithm. Python’s Scikit-learn package has one version; XGBoost is available for multiple languages and data platforms; and Microsoft has the LightGBM library as part of its Distributed Machine Learning Toolkit project.

CatBoost is meant to stand apart from those projects, according to Yandex, by being pre-tuned to perform at scale for Yandex’s own services. Yandex noted that it uses CatBoost to deliver predictions for its weather services, and that CatBoost has been deployed at the European Organization for Nuclear Research (CERN) to refine results from the particle experiments conducted there. 

Trained models created in CatBoost can be deployed in Apple’s Core ML format, for use in macOS, iOS, tvOS, and watchOS apps backed by machine learning.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune in to his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
