Let’s say you want to build a trigram from the first and last name of a person.
The trigram consists of the first letter of the first name followed by the first 2 letters of the last name of the person.
The characters of the trigram are usually capitalized.
For example:
- “Jéremie Litzler” outputs “JLI”
- “John Doe” outputs “JDO”
- “Maxime Fèvre” outputs “MFE”
- “Carlos Di Montis” outputs “CDI”.
Let’s review the code in Python for this.
AI First Attempt
Today, I’ll share how AI can speed up the coding process, and where it shows its limitations. I asked it for a draft of the code for the above specifications.
It provided me with this (I edited the comments):
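As a rough idea of what such a first draft looks like, here is a minimal sketch, not the exact AI output. The function name build_trigram and the cleaning step that keeps only ASCII letters and spaces are assumptions, chosen so that the sketch behaves like the results described in this article.

```python
import re


def build_trigram(full_name: str) -> str:
    """Hypothetical first draft: first initial + first 2 letters of the last word."""
    # Assumed cleaning step: drop everything except ASCII letters and spaces.
    cleaned = re.sub(r"[^A-Za-z ]", "", full_name)
    # split() without arguments cuts at every run of whitespace.
    parts = cleaned.split()
    first_name = parts[0]
    last_name = parts[-1]  # the last word is taken as the last name
    return (first_name[0] + last_name[:2]).upper()
```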
Let’s Test This
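A minimal way to check the four examples, assuming the function under test takes a full name and returns the trigram. The names CASES and failing_cases are hypothetical helpers for this sketch.

```python
# Expected trigram for each full name, taken from the examples in this article.
CASES = {
    "Jéremie Litzler": "JLI",
    "John Doe": "JDO",
    "Maxime Fèvre": "MFE",
    "Carlos Di Montis": "CDI",
}


def failing_cases(build):
    """Return {full_name: (expected, actual)} for each case that `build` gets wrong."""
    failures = {}
    for full_name, expected in CASES.items():
        actual = build(full_name)
        if actual != expected:
            failures[full_name] = (expected, actual)
    return failures
```

Calling failing_cases with your trigram function returns an empty dictionary when all four examples pass. Otherwise, it lists what was expected and what came out.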
Did you get a passing test? I didn’t! The code succeeds on 2 out of 4 cases, failing on “Carlos Di Montis”.
Let’s not blame the AI completely: the specification didn’t mention that the first name comes first and that everything after the first space is the last name, no matter the number of parts in it.
What Went Wrong
First, with full_name.split(), you get a list of 3 strings: “Carlos”, “Di”, “Montis”.
Next, why take the last element of the list as the actual last name? That gives “Montis”, while the last name is “Di Montis”.
Of course, I could have written clearer specifications, but that’s what you call iterative programming!
Let’s Fix The Specification and The Code
Firstly, our new requirement is to take the first 2 characters of the last name, which may contain spaces. So we need to split the full name at the first space, instead of “at each space”.
But how do you fix the split issue?
The split method in Python can take two arguments: the first one is the delimiter and the second one, maxsplit, is the maximum number of splits to perform.
The delimiter is a space and we need to stop after one occurrence (assuming that the full name excludes multiple first names, of course).
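Here is a short sketch of the difference between the two calls, using one of the example names. The variable names are only illustrative.

```python
full_name = "Carlos Di Montis"

# Without arguments: one cut at every run of whitespace.
print(full_name.split())        # ['Carlos', 'Di', 'Montis']

# With a delimiter and maxsplit=1: a single cut, at the first space.
print(full_name.split(" ", 1))  # ['Carlos', 'Di Montis']

first_name, last_name = full_name.split(" ", 1)
print(last_name[:2].upper())    # DI
```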
So, we update the call full_name.split() into full_name.split(" ", 1).
Let’s run the test. And again, it fails…
What Is The Next Issue
The code doesn’t evaluate “Maxime Fèvre” to “MFE” but to “MFV”. Why?
The accent, of course! The “è” was dropped because it’s a special character, as if it weren’t present, so the “v” took its place.
Luckily, there is a solution for that: it’s called “Unicode normalization” and it comes in 4 forms (NFC, NFD, NFKC and NFKD). For details, you can read this detailed article.
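To make the four forms (NFC, NFD, NFKC and NFKD) a bit more concrete, here is a small sketch that normalizes two characters with each of them. The sample characters are my own choice: “è” as a single code point and the “ﬁ” ligature, which is a compatibility character.

```python
import unicodedata

e_grave = "\u00e8"   # "è" stored as one precomposed code point
ligature = "\ufb01"  # "ﬁ" ligature, a compatibility character

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, e_grave + ligature)
    # Show the code points so that invisible combining marks become visible.
    print(form, [hex(ord(character)) for character in normalized])
```

The “D” forms split “è” into “e” plus a combining grave accent (U+0300), and the “K” forms also replace the ligature with the plain letters “f” and “i”.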
In “Maxime Fèvre”, the accent is in the last name.
To remove it and keep the unaccented “e”, we’ll use the NFKD normalization form in the following code:
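A minimal sketch of such a function, built from the calls discussed in this article. The function name remove_accents is an assumption; input_str and nfkd_form are the names used in the explanation.

```python
import unicodedata


def remove_accents(input_str: str) -> str:
    """Return input_str without combining marks such as accents."""
    # Decompose each character into its base character plus combining marks.
    nfkd_form = unicodedata.normalize("NFKD", input_str)
    # Keep only the characters that are not combining marks, then rejoin them.
    return "".join(
        [character for character in nfkd_form if not unicodedata.combining(character)]
    )
```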
This function uses the unicodedata module to handle Unicode characters. Here’s how it works:
- unicodedata.normalize('NFKD', input_str):
  - We use the normalize function to convert the string to a normalized form.
  - “NFKD” stands for “Normalization Form Compatibility Decomposition”.
  - This decomposition separates the base characters from their diacritical marks (accents).
  - For example, it decomposes “è” into “e” and the accent mark.
- [character for character in nfkd_form if not unicodedata.combining(character)]:
  - This is a list comprehension that iterates through each character in the normalized form.
  - unicodedata.combining(character) returns a non-zero value, which Python treats as true, for characters that are combining marks (like accents), and 0 for all other characters.
  - The not inverts this, so we keep only characters that aren’t combining marks.
- ''.join([...]):
  - This joins all the kept characters back into a string.
So, essentially, the function works by:
- Splitting each character into its base form and any accent marks.
- Keeping only the base characters and discarding the accent marks.
- Rejoining the remaining characters into a string.
For example, with the last name “Fèvre”:
- It’s normalized to something like ['F', 'e', '\u0300', 'v', 'r', 'e']
- The accent '\u0300' (the combining grave accent) is discarded because it’s a combining character
- It joins the remaining characters into a string again, resulting in “Fevre”
This method is particularly effective because it works for many accented characters and other diacritical marks across many languages, not just French accents.
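To get a feel for how far this goes, here is a small sketch that applies the same technique to a few names from other languages. The helper name remove_accents and the sample names are assumptions. Note that some letters, such as “Ł” or “ø”, are standalone letters in Unicode rather than a base letter plus a mark, so they come out unchanged.

```python
import unicodedata


def remove_accents(input_str: str) -> str:
    """Return input_str without combining marks such as accents."""
    nfkd_form = unicodedata.normalize("NFKD", input_str)
    return "".join(c for c in nfkd_form if not unicodedata.combining(c))


for name in ["São", "Dvořák", "Ångström", "Łukasz", "Søren"]:
    print(name, "->", remove_accents(name))
# São -> Sao
# Dvořák -> Dvorak
# Ångström -> Angstrom
# Łukasz -> Łukasz   (unchanged: "Ł" has no decomposition)
# Søren -> Søren     (unchanged: "ø" has no decomposition)
```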
Now you can use it in the method that builds the trigram, before extracting the letters.
Hooray! The test is passing. 🎇
Final Solution (Don’t Cheat By Clicking The Table Of Contents Too Soon!)
So let’s put it together.
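Putting the pieces described in this article together, a complete version could look like the sketch below. It is an illustration under assumptions, not the exact final listing: the names remove_accents and build_trigram are mine, and it assumes the full name contains a single first name followed by a space.

```python
import unicodedata


def remove_accents(input_str: str) -> str:
    """Return input_str without combining marks such as accents."""
    nfkd_form = unicodedata.normalize("NFKD", input_str)
    return "".join(c for c in nfkd_form if not unicodedata.combining(c))


def build_trigram(full_name: str) -> str:
    """First letter of the first name + first 2 letters of the last name, uppercased."""
    # Remove the accents first, then split at the first space only:
    # everything after that space is the last name.
    # Raises ValueError if the name contains no space.
    first_name, last_name = remove_accents(full_name.strip()).split(" ", 1)
    return (first_name[0] + last_name[:2]).upper()


for name in ["Jéremie Litzler", "John Doe", "Maxime Fèvre", "Carlos Di Montis"]:
    print(name, "->", build_trigram(name))
```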
Conclusion
OK, I have lessons to share:
- First: don’t trust the AI on the first shot. Always review the code.
- Second: Always test the code and find the edge cases 😊.
Yes, you’ll need to think thoroughly to code a complete solution. The AI didn’t guess the last name or the accents issue.
You can, because you pick good test sets and, in the end, good tests lead to a complete solution. Who said AI would replace software engineers? 😋
And beware of accents in general!