Introducing Tiny BPE Trainer
Most modern NLP models, from GPT to RoBERTa, rely on subword tokenization using Byte Pair Encoding (BPE). But what if you want to train your own vocabulary in pure C++? Meet Tiny BPE Trainer - a blazing-fast, header-only BPE trainer written in modern C++17/20, with zero dependencies,...

Introducing Modern Text Tokenizer
Modern natural language processing (NLP) models like BERT, DistilBERT, and other transformer-based architectures rely heavily on effective tokenization. But C++ developers often face limited options that suffer from bloated dependencies, poor Unicode support, or a lack of compatibility with...

If you are playing around with .NET and you come from a C/C++ background like myself, you will quickly notice that the data types are different. This post maps the most common C/C++ data types to their .NET (C#) equivalents for your convenience during development. When you search for a data type, just hit CTRL + F and the browser will help you...

Runtime Encrypted Strings
Today we will go through the basics of runtime encrypted strings, why we need to encrypt our strings, and how to create our own. In this article you will understand and learn: what runtime encryption and decryption are; why you need to encrypt your strings; how anybody can see your...