Wednesday, August 11, 2010

Using hash for counting

When we have an array or a list of items and want to count the occurrences of a particular item, we generally use logic like the following:
    my $count = 0;
    for (@list) {
        $count++ if $_ eq "apple";
    }
This can be written more concisely with the grep function:
    $count = grep $_ eq "apple", @list;
Here grep is evaluated in scalar context, where it returns the number of elements that matched rather than the matching elements themselves. However, if we need the count for more than one item in the list, this approach forces us to write repetitive statements like:
    $count_apples = grep $_ eq "apple", @list;
    $count_pears = grep $_ eq "pear", @list;
A better approach is to build a hash of counts:
    my %histogram;
    $histogram{$_}++ for @list;
This hash holds the count of every distinct item in the list, and building it traverses the list only once. Moreover, to find the number of unique elements in the list, all we have to do is:
    $unique = keys %histogram;
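Putting the two snippets above together, here is a minimal, self-contained sketch. The sample @list is made up for illustration and is not from any real data set:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @list = qw(apple pear apple fig pear apple);

# One pass over the list: each item becomes a key, its value is the count.
my %histogram;
$histogram{$_}++ for @list;

# keys in scalar context gives the number of distinct items.
my $unique = keys %histogram;

# Print the counts in alphabetical order of the items.
print "$_: $histogram{$_}\n" for sort keys %histogram;
print "unique: $unique\n";
```

With this sample list the script should report apple 3, fig 1, pear 2, and 3 unique items.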
In order to find the five most popular items in the list we can do something like:
    @popular = (sort { $histogram{$b} <=> $histogram{$a} } keys %histogram)[0..4];
This sorts the unique elements of the list, i.e. the keys of the hash, in descending order of their counts and pulls out only the top five.
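One thing to watch for: if the list has fewer than five distinct items, a fixed [0..4] slice pads the result with undef values. The sketch below is one possible way to guard against that. The sample data, the $top variable, and the alphabetical tie-break are assumptions added for illustration, not part of the original one-liner:

```perl
use strict;
use warnings;

# Hypothetical sample data with only three distinct items.
my @list = qw(apple pear apple fig pear apple);

my %histogram;
$histogram{$_}++ for @list;

# Sort by count, highest first; break ties alphabetically so the order is repeatable.
my @ranked = sort { $histogram{$b} <=> $histogram{$a} || $a cmp $b } keys %histogram;

# Take at most five items, or fewer if the list has fewer distinct items.
my $top = 5;
$top = @ranked if @ranked < $top;
my @popular = @ranked[0 .. $top - 1];

print "$_ ($histogram{$_})\n" for @popular;
```

Because hash keys come back in no particular order, items with equal counts can otherwise appear in a different order from run to run; the extra `|| $a cmp $b` comparison is just one way to make the result deterministic.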
