
RegEx Corner: Restore trailing characters in a file


Regular expressions are an invaluable development tool, and also extremely handy for non-developers who need to comb through plain text in an editor. In this article, we'll look at a simple regex problem and dissect a possible solution.

The Problem

We have a large CSV file that has been corrupted in a particular way: every line should end with a double quote character, but for some reason many, though not all, of the lines are missing it. We'd like a way to add a quote to the end of each line that is missing one.

The Text

1,A,"apple"
2,B,"banana"
3,C,"cherry
4,D,"durian"
5,E,"elderberry
6,F,"Fanta™"
7,G,"grape
8,H,"honeydew"

The Regex

Plain regular expressions are very good at finding something, but not at finding the absence of something. Some regex flavors add that capability through extensions such as negative assertions, which we will cover in another article. Today, though, we'll solve this problem with just the basic toolset by reframing it.

Instead of looking for a line that does not have a quote at the end, look for a line that has something other than a quote at the end. Regular expressions offer character classes for this task:

[^"]

Square brackets say "match one character from the set given here." For example, [abc] matches a, b, or c. If the character class starts with a caret, as ours does here, it means "match one character that isn't in the set given here." So this will match a single non-quote character.
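To see both kinds of character class in isolation, here is a small sketch using Python's standard `re` module, where `re.findall` returns every single character the class matches. The sample strings are made up for illustration.

```python
import re

# [abc] matches exactly one character that is a, b, or c.
in_set = re.findall(r"[abc]", "cab driver")

# [^"] matches exactly one character that is NOT a double quote.
not_quote = re.findall(r'[^"]', '"ab"')

print(in_set)     # ['c', 'a', 'b']
print(not_quote)  # ['a', 'b']
```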

Next, we want to make sure this character is at the end of the line, so we use the special end-of-line marker $ for that:

[^"]$
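Most text editors treat $ as the end of every line automatically, but in Python's `re` module (as in many other regex engines) $ matches only at the end of the whole string unless multiline mode is switched on. This small sketch runs [^"]$ over the sample rows both ways; the rows are hard-coded here for illustration.

```python
import re

# The sample rows, joined without a trailing newline.
text = "\n".join([
    '1,A,"apple"',
    '2,B,"banana"',
    '3,C,"cherry',
    '4,D,"durian"',
    '5,E,"elderberry',
    '6,F,"Fanta™"',
    '7,G,"grape',
    '8,H,"honeydew"',
])

# Default mode: $ only matches at the end of the whole string, and the
# last row ends with a quote, so nothing is found.
default_matches = re.findall(r'[^"]$', text)

# Multiline mode: $ also matches just before each newline.
multiline_matches = re.findall(r'[^"]$', text, flags=re.MULTILINE)

print(default_matches)    # []
print(multiline_matches)  # ['y', 'y', 'e']
```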

This now matches the final character of every line that lacks its closing quote. We don't want to change that character, only to add something after it. So our replacement string must put the matched text back, followed by the new quote.

$0"

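The token $0 means "everything that was matched," so this replacement gives us the final non-quote character we found, followed by the new quote we're adding. The exact spelling of this token varies between tools: JavaScript, for example, uses $&, and Python uses \g<0>.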
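In Python, the same search and replace can be sketched with `re.sub`. Two details differ from the editor version: Python writes the whole-match token as \g<0> rather than $0, and re.MULTILINE is needed so that $ matches at the end of every line rather than only at the end of the string. The sample rows are hard-coded for illustration.

```python
import re

# The sample rows from the corrupted file, joined without a trailing newline.
text = "\n".join([
    '1,A,"apple"',
    '2,B,"banana"',
    '3,C,"cherry',
    '4,D,"durian"',
    '5,E,"elderberry',
    '6,F,"Fanta™"',
    '7,G,"grape',
    '8,H,"honeydew"',
])

# Put the matched final character back (\g<0>) and append a quote after it.
fixed = re.sub(r'[^"]$', r'\g<0>"', text, flags=re.MULTILINE)
print(fixed)
```

One caveat: a negated class like [^"] also matches the newline character itself. If the text ends with a newline or contains blank lines, this multiline substitution would also insert a quote on the blank lines and after the final line break. A class that excludes the newline, such as [^"\n], avoids that.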

The Result

1,A,"apple"
2,B,"banana"
3,C,"cherry"
4,D,"durian"
5,E,"elderberry"
6,F,"Fanta™"
7,G,"grape"
8,H,"honeydew"

Try It Yourself

At Regex 101, you can freely play with regular expressions and generate code specific to your programming language.

Experiment with this regular expression at Regex 101
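If you'd rather apply the fix from a script than in an editor, here is one possible Python sketch. The function name `restore_trailing_quotes`, the UTF-8 encoding, and the assumption that the file fits in memory are illustrative choices, not part of the original problem. It applies the same [^"]$ pattern to one line at a time. Because [^"] also matches a newline character, running the pattern over the whole file in multiline mode could add a stray quote on blank lines or after a trailing newline; matching each line without its line break avoids that.

```python
import re

# One non-quote character at the very end of the string being searched.
MISSING_QUOTE = re.compile(r'[^"]$')


def restore_trailing_quotes(path: str) -> None:
    """Hypothetical helper: add a closing '"' to each line of `path` that lacks one.

    Assumes a UTF-8 text file small enough to read into memory.
    Overwrites the file in place.
    """
    # Text mode with default newline handling converts \r\n to \n when
    # reading, so splitlines() leaves no stray \r for [^"] to match.
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

    # Each line is searched on its own, without its line break, so $ means
    # "end of this line" and empty lines are left unchanged.
    fixed = [MISSING_QUOTE.sub(r'\g<0>"', line) for line in lines]

    # Write every line back with a "\n" terminator; Python translates it
    # to the platform's default line ending.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in fixed)
```

Because the function rewrites the file in place, it's worth keeping a backup copy before running it. Note also that every line in the rewritten file ends with the platform's default line ending, whatever the original file used.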

See you next time!