Monday, 17 January 2011

The C Programming Language (K&R) 01x0E—Histogram of Chars—Exercise 1-14


I am reading “The C Programming Language” by Brian Kernighan and Dennis Ritchie aka K&R. Making notes. Running their sample code. Writing code, programs, based on their exercises. Blogging about it here. I wouldn’t normally do these exercises, but it is beneficial and it’s satisfying to see the code work.

Exercise 1-14.

Write a program to print a histogram of the frequencies of different characters in its input.

A histogram is a graph that shows the frequency or count of some variable over a range. In this exercise it’s the frequency of characters entered by the user at the keyboard. Since the authors say “different characters,” it’s ambiguous. If I wrote code to account of every possible character, the graph would be too big. So, I will create groups of characters: whitespace, lowercase alphabet, uppercase alphabet, digits, other. This means every character typed by the user will be counted and analyzed, by group, not individually.

The Pseudocode.

While not EOF
            get input from user through keyboard
            increment counter for character by grouping
                        1. lowercase alpha
                        2. whitespace
                        3. uppercase alpha
                        4. digit
                        5. other
Display histogram

We don’t need to keep the actual data, just counts of them.

In an ASCII environment the asterisk (*) is often used to graph the count. A one-to-one ratio seems as if it would be too big. Try five-to-one.

In testing the input to see what group it falls into, I will use a series of if-else if-else statements. This code tests until it finds a true condition. Once true is found, it bypasses all the remaining tests. This means to make the code more efficient, the most likely character input should be tested first (lowercase) following in order with the last being the least common.

Time to write the code.

*    *    *

No surprises in coding. The only question is the ratio to use to condense the count. 5 to 1 is good for a lot of data, but with a small sample I changed it to 2 to 1 and used a variable to hold this value.

I also used functions for the tests because I already had a IsWhitespace function and reused it. I find it easier to read using functions. It makes it clear what’s being tested.

In testing a character it is possible to use a literal number or a character in single quotes. For example, to test if c is a space you can write: c = 32 or c = ‘ ’. Either work and they both work with other logical operators like >= or <=.

In looking at the code, it would be possible to use an array for all counts and one for loop to display the histogram with a cascading if statement to display the category headings.


Sample Code.

I am using Visual C++ 2010 and created the sample code as a console application.

// Function prototypes
bool IsLowercase(char);
bool IsWhitespace(char);
bool IsUppercase(char);
bool IsDigit(char);

// The standard library includes the system function.
#include <cstdlib>

// C++ standard I/O library.
#include <cstdio>

int main()
{
     // Character input.
     char c;
     // Counters.
     int WhitespaceCount = 0;
     int DigitCount = 0;
     int LowercaseCount = 0;
     int UppercaseCount = 0;
     int OtherCount = 0;

     // Get user input and count.
     while ((c = getchar()) != EOF) {
           if (IsLowercase(c))
                ++LowercaseCount;
           else if (IsWhitespace(c))
                ++WhitespaceCount;
           else if (IsUppercase(c))
                ++UppercaseCount;
           else if (IsDigit(c))
                ++DigitCount;
           else
                ++OtherCount;
     }
     // Display histogram.
     printf("\n\nHISTOGRAM\n");

     // Ratio.
     // Compact count: n to 1.
     int ratio = 2;

     printf("\n Lowercase:\t");
     for (c = 1; c < LowercaseCount; c += ratio)
           printf("*");
     printf("\nWhitespace:\t");
     for (c = 1; c < WhitespaceCount; c += ratio)
           printf("*");
     printf("\n Uppercase:\t");
     for (c = 1; c < UppercaseCount; c += ratio)
           printf("*");
     printf("\n    Digits:\t");
     for (c = 1; c < DigitCount; c += ratio)
           printf("*");
     printf("\n     Other:\t");
     for (c = 1; c < OtherCount; c += ratio)
           printf("*");

     printf("\n\n");

     // Keep console window open.
     system("pause");

     // Return some value.
     return 0;

} // end main

bool IsLowercase(char t)
{
     // Return true for lowercase alpha chars.

     // a to z
     // ASCII 97 to 122
     if (t >= 97 && t <= 122)
           return true;

     // If we get here, not lowercase.
     return false;
} // IsLowercase

bool IsWhitespace(char t)
{
     // Return true for whitespace chars.

     // Space.
     if (t == 32)
     return true;

     // Newline.
     if (t == 10)
           return true;

     // Tab.
     if (t == 9)
           return true;

     // If we get here, no whitespace.
     return false;
} // IsWhitespace

bool IsUppercase(char t)
{
     // Return true for uppercase chars.

     // A to Z
     // ASCII 65 to 90
     if (t >= 65 && t <= 90)
     return true;

     // If we get here, not uppercase.
     return false;
} // IsUppercase

bool IsDigit(char t)
{
     // Return true for whitespace chars.

     // 0 to 9
     // ASCII 48 to 57
     if (t >= 48 && t <= 57)
           return true;

     // If we get here, no whitespace.
     return false;
} // IsDigit


Output.



No comments:

Post a Comment