Introducing Tiny BPE Trainer
Most modern NLP models today, from GPT to RoBERTa, rely on subword tokenization using Byte Pair Encoding (BPE). But what if you want to train your own vocabulary in pure C++? Meet Tiny BPE Trainer: a blazing-fast, header-only BPE trainer written in modern C++17/20, with zero dependencies,...

Introducing Modern Text Tokenizer
Modern natural language processing (NLP) models like BERT, DistilBERT, and other transformer-based architectures rely heavily on effective tokenization. But C++ developers often face limited options, held back by bloated dependencies, poor Unicode support, or lack of compatibility with...

Embarking on a journey into the world of programming can be both exciting and overwhelming. With countless programming languages to choose from, it’s essential to pick the right one that aligns with your goals and aspirations. In this article, we will explore the factors to consider when choosing your first programming...

If you are playing around with .NET and you come from a C/C++ background like myself, you will quickly notice that the data types are different. For your convenience, this post maps the most common C/C++ data types to their .NET (C#) equivalents. To find a specific data type, just press Ctrl+F and the browser will help you...

If you are reading this post, you are probably looking for a way to convert DOS and/or NT paths in your software. Rest assured, that is exactly what you will learn here today! The problem with Windows paths is clear: they are confusing. Let me repeat that: so confusing. This becomes a problem when you are...