Freitag, 25. März 2011

Most recent marks

Problem We have two files with the grades of the students. One file representing the first exam -  marks1.txt - the second file the second exam (for mending ones mark - thus overwriting the first mark) - marks2.txt.

The second exam though is optional.

The exam file syntax is:


For sorting we have to be able to identify from which file the entry stems from - so we prepend a "file-id" column - with value 2 for 1st exam and 1 for 2nd exam.

We sort now both files, firstly on the student-id (column 2) and secondly on the file-id (column 1).

i.e.:
marks1.txt:
012  3             -->   2 012 3


marks2.txt
012  4            -->    1 012 4

sort -k 2,2 -k 1,1 
       <(
         sed -e 's/^/2 /' marks1.txt
        ) 
 
       <(
         sed -e 's/^/1 /' marks2.txt
        )

After sorting we obtain:
1 012 4
2 012 3

If we have duplicates we only want to retain the first line (the most recent mark).
So lets first remove the file-id column again with a
cut --delimiter=' ' --fields=2- 

Then remove duplicates, checking on the student-id column (1st column 3 bytes wide):
uniq -w 3

Finally, we wan to link the student-ids to the student's names.
Let's say we have a file id_names.txt with the format:


All we have to do is make a join the student-id column.

The whole code:
join 
  <(
    sort -k 2,2 -k 1,1 
       <(
         sed -e 's/^/2 /' marks1.txt
        ) 
 
       <(
         sed -e 's/^/1 /' marks2.txt
        )
    |
     cut --delimiter=' ' --fields=2- 
    |  
     uniq -w 3
   )

  <(
    sort id_names.txt
   )

  > marks.txt
d