CSV parser by Regular Expression

Each column in a CSV file is separated by a comma (,). If a field itself contains a comma, the field has to be enclosed in a pair of double quotes ("). Here is an example:

0566000,1005660003,UG,MEDUU,MEDU_PGF,,MED,MBCHB,F,PGM,MEDUU,F,MBCHB,"M.B., CH.B.",UG

Therefore, we cannot simply parse the line with string.Split(new char[] {','}): the comma inside the quoted field would be treated as a separator too. In this post, I will use a regular expression to separate the columns instead. First, we define the regular expression:

(((\"[^\"]*\")|([^\,\"]*))\,{1})|(\"[^\"]*\")|([^\,\"]*)
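To make the pattern easier to read, here is a sketch that spells the same three alternatives out one per line, using RegexOptions.IgnorePatternWhitespace so that the pattern can carry its own comments. The class name CsvPatternDemo and the field name Annotated are hypothetical, chosen only for illustration. The redundant backslashes and the {1} are dropped here, which should not change what the pattern matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original code.
public static class CsvPatternDemo
{
    // Verbatim string: "" stands for a single double quote.
    // IgnorePatternWhitespace ignores the layout spaces and allows # comments.
    public static readonly Regex Annotated = new Regex(@"
        ( ( ""[^""]*"" ) | ( [^,""]* ) ) ,   # a quoted or unquoted field followed by a comma
      | ( ""[^""]*"" )                       # a quoted field with no comma after it
      | ( [^,""]* )                          # an unquoted field with no comma after it
    ", RegexOptions.IgnorePatternWhitespace);
}
```

For example, CsvPatternDemo.Annotated.Matches("UG,\"M.B., CH.B.\",MED") starts with the matches UG, (with its comma), then the whole quoted field with its comma, then MED.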

The source code would be:

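// Requires: using System.IO; using System.Text.RegularExpressions;
// sr is an open StreamReader over the CSV file.
string s = sr.ReadLine();
// In a C# string literal, \\\" yields the two characters \" in the pattern.
Regex reg = new Regex("(((\\\"[^\\\"]*\\\")|([^\\,\\\"]*))\\,{1})|(\\\"[^\\\"]*\\\")|([^\\,\\\"]*)");
MatchCollection matches = reg.Matches(s);
string[] columns = new string[matches.Count];
for (int j = 0; j < matches.Count; j++)
{
    // Each value still carries its trailing comma and any surrounding quotes.
    columns[j] = matches[j].Value;
}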

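Each element of columns now holds one matched field.

Of course, you will still need to remove the trailing comma and the surrounding quotes from each value if you do not want them. Note also that the last alternative can match an empty string, so Matches returns one extra empty match at the end of a line that does not end with a comma. But at least the commas inside quoted fields no longer split a field.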
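The cleanup step could be sketched as follows. This is only a minimal sketch: the class CsvLineSplitter and its Split method are hypothetical names, and it assumes well-formed lines with no escaped quotes ("") and no line breaks inside a field. It uses the same pattern, written as a verbatim string without the redundant backslashes, strips the separator comma and one pair of surrounding quotes, and skips an empty match that does not follow a comma, since the pattern can also match the empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class CsvLineSplitter
{
    // Verbatim string: "" stands for a single double quote.
    private static readonly Regex FieldPattern = new Regex(
        @"(((""[^""]*"")|([^,""]*)),{1})|(""[^""]*"")|([^,""]*)");

    public static string[] Split(string line)
    {
        var fields = new List<string>();
        // The start of the line counts as "after a separator".
        bool afterSeparator = true;

        foreach (Match m in FieldPattern.Matches(line))
        {
            string value = m.Value;

            // An empty match that does not follow a separator is not a field
            // (for example the extra match at the end of "a,b").
            if (value.Length == 0 && !afterSeparator)
            {
                continue;
            }

            // A trailing comma is always the separator: a comma inside
            // quotes is followed by the closing quote, so it is never last.
            afterSeparator = value.EndsWith(",", StringComparison.Ordinal);
            if (afterSeparator)
            {
                value = value.Substring(0, value.Length - 1);
            }

            // Drop one pair of surrounding quotes, if present.
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                value = value.Substring(1, value.Length - 2);
            }

            fields.Add(value);
        }

        return fields.ToArray();
    }
}
```

Calling CsvLineSplitter.Split on the example line from the beginning of this post should give 15 fields, with the quoted one coming back as M.B., CH.B. without its quotes.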