In this post we will compare ChatGPT vs Gemini vs Grok vs DeepSeek vs Claude across several use cases, using the available free models, to understand which one best suits your needs.
What we will test are the following things:
- Code generation
- Content generation
- Problem solving
What we will compare are the following aspects:
- Generation Speed
- Code/Content Quality/Plagiarism
- Limitations
- Robustness
- Readability
- Bugs/Issues
As a reminder, all the tests are executed using free models only.
AI Code Generation
I decided to use a short and simple, yet challenging, prompt for code generation, with Python as the target language since it's quite popular.
The prompt asks for a CSV parser script that uses no external libraries, so it should be straightforward.
Prompt for all models:
Create a python script that can parse CSV's, without using an external library.
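To set a baseline before looking at the models' output, here is a rough idea of what such a parser involves. This is my own minimal sketch (not one of the generated scripts), covering quoted fields, escaped quotes, and newlines inside quotes:

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text into a list of rows (lists of strings).

    Handles quoted fields, escaped quotes ("") and newlines inside quotes.
    """
    rows, row, field = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes is an escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)       # keep any character, even newlines
        elif ch == '"':
            in_quotes = True           # opening quote
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch == "\n":
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        elif ch != "\r":               # tolerate \r\n line endings
            field.append(ch)
        i += 1
    if field or row:                   # flush a final line without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, `parse_csv('a,b\n1,"x,y"')` returns `[['a', 'b'], ['1', 'x,y']]`. Keep in mind this is a baseline sketch, not a benchmark entry; the models' actual scripts are linked further below.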
Results/Comparison
Some models returned a script very quickly, while others took as long as five minutes to “think” 🥴 about it. All scripts executed without errors, however, which is nice.
Model | Generation Speed | Code Quality | Limitations | Robustness | Readability | Bugs/Issues |
---|---|---|---|---|---|---|
ChatGPT-4o | 19 seconds | Good, basic | No custom delimiter; only lists (not by column name) | Handles tricky CSVs with newlines | Clear, straightforward | May fail with broken CSV quotes |
ChatGPT-4o Think | 25 seconds | Excellent | Doesn’t handle newlines inside quotes | Great for normal CSVs | Very clear, well-documented | None obvious, warns about limits |
Claude Sonnet 4 | 30 seconds | Good, friendly | Doesn’t support newlines in quotes; no dict output | Works for standard files | Beginner-friendly | Warns if row lengths don’t match |
Claude Sonnet 4 Think | 37 seconds | Good, detailed | No newlines in quotes; basic output | Works for most files | Detailed, lots of examples | Warns about data mismatches |
DeepSeek | 62 seconds | Very good | No column name access; uses more memory for big files | Handles complex cases, newlines OK | Clean, simple | None obvious; may be slower on large files |
DeepSeek Think | 653 seconds | Simple | Basic only; no headers; can’t handle newlines in quotes | Works for simple CSVs only | Minimal, easy to follow | May break on complex CSVs |
Gemini 1.5 Pro | 29 seconds | Very good | Doesn’t handle newlines in quotes | Warns/skips broken rows | Friendly, tidy | Skips malformed lines |
Grok-3 | 9 seconds | Basic | No headers; can’t handle newlines in quotes | OK for basic CSVs | Simple, short | No error handling for broken files |
Grok-3 Think | 87 seconds | Basic | Very simple; no headers or complex cases | Fine for small/simple files | Short, readable | Minimal errors shown |
Overall Best
In my opinion, the winner among these models is ChatGPT-4o (Think): it strikes a good balance between generation time and output quality.
- Most practical, especially for “normal” CSV files (no weird newlines inside cells).
- Offers both dictionary (column names) and list output.
- Customizable delimiter, good error handling, and clear feedback.
- Very easy to read, extend, and integrate.
- Works from command line or as an imported function.
If you want clean code to learn from or to use in your own scripts, ChatGPT-4o (Think) is the best choice. For most people and most files, it's the winner!
The scripts are available to download and view on my repository here.
AI Content generation
Code generation is far from the only thing AIs are used for nowadays, as more and more people use AI to generate content. This ranges from blog posts and documents to emails and more.
Below we will test two categories: Email and Academic Style Writing.
Email Generation
For our Email content generation test, I will ask for a simple email pitch with the following prompt:
Create an email pitch about my new floral shop. I sell cut flowers, often arranged into bouquets or floral designs. I also offer custom arrangements, provides daily or weekly flower deliveries, and may offer services like wedding or events styling.
Results/Comparison
Every model generated the content ridiculously fast, so I will not include a Generation Speed column in the table below.
Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
---|---|---|---|---|---|
ChatGPT-4o | Very natural, clean | Slightly generic | Very strong | Excellent | None |
ChatGPT-4o Think | Creative, clear, human | Safe, but adds subtle marketing | Excellent | Very high | None |
Claude Sonnet 4 | Professional, warm | Slightly formal, a bit long | Very strong | Very high | None |
Claude Sonnet 4 Think | Detailed, sectioned | Overly verbose, too “website-like” | Robust | High | None, but too long for a pitch email |
DeepSeek | Friendly, clear | Adds “P.S.”, slight template feel | Strong | Very high | None, but slightly generic |
DeepSeek Think | Multiple subject/body options | Multiple full emails in one file | Good | Good | Did NOT follow “one email” rule; too many choices |
Gemini 1.5 Pro | Polished, professional | Three emails in one (for diff. clients) | Good | High | Ignored “one email” rule; too much per file |
Grok 3 | Warm, direct | Slightly repetitive language | Good | Good | None, but a bit formulaic |
Grok 3 Think | Friendly, clear, sectioned | Long intro, slightly “chunky” format | Good | Good | None, just a bit segmented |
Overall Best
In my opinion ChatGPT-4o (both versions) wins because of:
- Producing a single, ready to use, natural email per file.
- No formatting oddities, no AI artifacts, no excessive length, and high readability.
It wrote the most natural, easy-to-read, and professional-sounding email, and it followed my instructions exactly (one email per file, no extra formatting or AI artifacts), so you can use its email pitch right away with just a few personal details added.
The contents generated are available to download and view on my repository here.
Academic Style Writing
It’s no surprise that more and more people are using AIs to generate academic-style writing; however, much of the generated content will fail plagiarism checks, either because the AI does not produce unique content or because of the prompt input.
For our second test, I will ask for a short essay with the following prompt:
Create a short essay (max 1000 words) about the evolution of CPU's, using academic style writing and unique content. Do not use content from already existing essays or sources. Include references where appropriate.
Results/Comparison
Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
---|---|---|---|---|---|
ChatGPT-4o | Structured, academic, concise | Slightly formulaic; no narrative flair | Strong, up-to-date | High (for technical readers) | None |
Claude Sonnet 4 | Encyclopedic, narrative | Verbose, minor repetition | Comprehensive | Smooth, accessible | None; slightly wordy |
DeepSeek | Concise, factual, survey-like | Less context, abrupt transitions | Focused, accurate | Moderate (technical) | None |
Grok-3 | Engaging, thematic | Occasional generalization | Broad, accessible | Very high | None; minor cliché |
Gemini 2.5 Pro | Technical, rigorous | Dense, expects technical background | Very robust | Lower (non-technical) | None |
Overall Best
The winner is Grok 3 in my opinion for academic style writing.
- Uses storytelling and accessible metaphors (“technological odyssey”), making it pleasant to read for non-specialists.
- Walks the reader chronologically through CPU history while covering modern themes (multi-core, specialization, the future).
- Explains key concepts without overwhelming the reader with jargon or dry technicalities.
- You don’t need a deep technical background to follow and enjoy it.
Plagiarism Check
I used the Plagiarism Checker from Grammarly to check the content and see where each AI stands.
You can see the results in the following table:
Model/Variant | Writing Issues | Grammar | Spelling | Punctuation | Conciseness | Readability |
---|---|---|---|---|---|---|
ChatGPT-4o | 8 writing issues | OK | FAIL | FAIL | FAIL | OK |
Claude Sonnet 4 | 8 writing issues | FAIL | OK | OK | FAIL | OK |
DeepSeek | 2 writing issues | OK | OK | FAIL | OK | OK |
Grok-3 | 8 writing issues | OK | OK | OK | FAIL | OK |
Gemini 2.5 Pro | 22 writing issues | FAIL | FAIL | OK | FAIL | OK |
The winner is clearly Grok 3. Even though it’s not quite perfect, you can fix the few small issues yourself and have an “award-winning” essay 😊 (lol)
AI Problem Solving
A + B Integral Problem
It’s well known by now that AIs have vast amounts of computing power and knowledge, but how do they compare to one another?
Let’s use this popular math quiz given in some high schools:
A = Integral from 0 to 1 of e^(2x) dx
B = Integral from 1 to e^2 of ln(√x) dx
Find A + B
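For reference: if the integrands are read as e^(2x) and ln(√x), the two functions are inverses of each other and the exact answer works out to A + B = e^2, since A = (e^2 − 1)/2 and B = (e^2 + 1)/2. Here is a quick numeric sanity check of that reading (my own script, not any model's output):

```python
import math

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

A = integrate(lambda x: math.exp(2 * x), 0, 1)                 # exact: (e^2 - 1) / 2
B = integrate(lambda x: math.log(math.sqrt(x)), 1, math.e**2)  # exact: (e^2 + 1) / 2

print(round(A + B, 4), round(math.e**2, 4))  # both print 7.3891
```

The clean result is an instance of the identity ∫ from a to b of f(x) dx + ∫ from f(a) to f(b) of f⁻¹(y) dy = b·f(b) − a·f(a) for an increasing f, which is likely the trick this quiz is testing.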
Results/Comparison
This is where every AI’s issues start to show. Every single AI struggled to provide a ready copy/paste solution, which should be the simplest part of the “problem”. The math was done correctly, but the output wasn’t actually copyable, so I had to ask several times in several formats until I could save the solutions as .txt files for you.
Model/Variant | Solution Quality | Limitations | Robustness | Readability | Copy-Paste Friendliness |
---|---|---|---|---|---|
Claude 4 Sonnet | Excellent | None | Very High | Excellent | Best (easy, markdown, stepwise) |
DeepSeek | Excellent | Slight header overuse | Very High | Excellent | Excellent |
ChatGPT-4o | Excellent | None | Very High | Excellent | Excellent |
Grok 3 | Good | Verbose, slightly cluttered | High | Good | Good |
Gemini 1.5 Pro | Adequate | No exact symbolic answer | High | Excellent | Good (but summary only) |
The one that stands out here is Claude Sonnet 4, which is the winner in my opinion. For a quick numeric result, Gemini is fastest, but for complete clarity and reusability, stick with Claude, DeepSeek, or ChatGPT-4o.
Broken Code Problem
Let’s kick things up a notch and see how smart the AIs really are by asking them to fix a piece of broken plain C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* copy_string(const char* src) {
    char* dest;
    strcpy(dest, src);
    return dest;
}

int main() {
    char* original = "Hello, world!";
    char* copy = copy_string(original);

    printf("Copied string: %s\n", copy);

    return 0;
}
What’s wrong with the above code? Let me explain:
- dest in copy_string is used uninitialized; no memory is allocated for it.
- Calling strcpy(dest, src) with an uninitialized pointer causes undefined behavior and likely a crash.
- The memory for the copy (if it had been allocated) is never freed: a potential memory leak.
- The code prints the copy without checking for success.
With the above code, we formulate the prompt as follows:
Fix the following code for me and provide summary of fixes:
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* copy_string(const char* src) {
    char* dest;
    strcpy(dest, src);
    return dest;
}

int main() {
    char* original = "Hello, world!";
    char* copy = copy_string(original);

    printf("Copied string: %s\n", copy);

    return 0;
}
```
Results/Comparison
Honestly, every model produced a correct and professional fix.
Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
---|---|---|---|---|---|
Claude 4 Sonnet | Excellent (edge-case handling, clear) | Slightly verbose summary | Checks for NULL input and allocation; frees memory | Very clear, neat | None |
DeepSeek | Excellent (succinct, correct) | Slightly less verbose on input validation | Checks allocation; error handling; frees memory | Clear, concise | None |
ChatGPT-4o | Excellent (concise, covers all) | No NULL input check (for src) | Checks allocation; error handling; frees memory | Very readable | None |
Grok 3 | Excellent (thorough, professional) | Exits on alloc fail (not best for libraries); no NULL input check | Handles allocation error; frees memory | Slightly verbose | None |
Gemini 1.5 Pro | Excellent (professional, extra detail) | No explicit input NULL check; lots of comments | Handles alloc errors, sets pointer NULL after free | Very readable | None |
Claude 4 Sonnet went above and beyond with its edge-case handling and explanation, but every answer is solid and suitable for copy/paste into a C project. No model introduced any new errors.
The codes generated are available to download and view on my repository here.
Conclusion
After testing the latest generation of AI models on three very different tasks: academic essay writing, business email marketing, and hands-on coding, I found that no single AI rules them all. Instead, each model brings its own strengths, quirks, and ideal use cases.
But, Who Wins Overall?
It depends on what you need:
- Ready to automate or build something serious? Go with ChatGPT-4o or Claude 4 Sonnet for code.
- Need friendly, customer focused communication? Grok-3.
- Want to inform and delight readers? Grok-3 is your best friend.
There is no single “best” AI, just the right tool for the right job.
The smartest way to use AI is to match the model to your mission, because as this experiment shows, even the most advanced bots have their own personalities and strengths.
Thank you for taking the time to read my article and please feel free to share it with friends.