Tuesday 18 January 2011

The C Programming Language (K&R) 01x10—Histogram of Words—Exercise 1-13


I’m making my way through “The C Programming Language” by Brian Kernighan and Dennis Ritchie aka K&R.

Exercise 1-13.

Write a program to print a histogram of the lengths of words in its input. It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.

My previous entry (here) was one answer to this exercise. This entry takes the previous code and makes two changes.
 
1. Display the histogram vertically instead of horizontally.
2. Fix a bug in the word parsing.

Displaying the histogram vertically is more difficult. In the previous code, you only looked at one length of word at a time. Start with the count for 1 letter words and print as many asterisks as needed. Then go to the 2 letter words and so on. And there were 14 such categories so you loop 14 times for each length. Straightforward. Not so with a vertical display.

Here you have to add each row for all the word lengths which may have a value or not. That means choosing to print an asterisk or space. So row by row you check each word length to see if there’s an asterisk to add. But how long do you keep this up? It’s not the size of the array. It’s the maximum value in the any one of the elements. There may be 30 three-letter words and 20 four-letter words. You don’t stop at 20. To figure that out I introduced a variable, maxrows, and looped through the array to find the largest count value. If that’s 20, then there has to be one column with 20 asterisks.

As for the bug, the previous code focused on whitespace. Anything that wasn’t whitespace formed a word or part of a word. It shouldn’t. I wanted to change the code so all punctuation and the like were also ignored. So my IsWhitespace function became IsNoneWordChar. But instead of testing for exclusions like say, a period, I tested for valid word characters (i.e., a-z, A-Z, 0-9). Same effect, but different approach.

One of the toughest parts was getting the histogram to line up in the right columns. It took some doing, but I finally figured what was needed.


Sample Code.

I am using Visual C++ 2010 and created the sample code as a console application.

// Function prototypes
bool IsNoneWordChar(char);

// The standard library includes the system function.
#include <cstdlib>

// C++ standard I/O library.
#include <cstdio>

int main()
{
     // Character input.
     char c = 0;
     // Previous character.
     char prevChar = 0;
     // Word Length.
     int WordLength = 0;
     int WordLengthCount[15];

     // Initialize array with zero.
     for (c = 0; c < 15; ++c)
           WordLengthCount[c] = 0;

     // Process user input from keyboard.
     while ((c = getchar()) != EOF) {
           if (IsNoneWordChar(c))
                if (IsNoneWordChar(prevChar))
                     // Ignore extra non-word char.
                     ; // Null stmt.
                else {
                     // Prev char word char.
                     // Means end of word.
                     // Store length in array.
                     if (WordLength < 14)
                           ++WordLengthCount[WordLength];
                     else
                           // Words 14 chars or longer.
                           ++WordLengthCount[14];
                     // Reset for new word.
                     WordLength = 0;
                }
           else {
                // Is a word char, therefore not end of word.
                // Increment counter for word length.
                ++WordLength;
           }

           // Set prev char to current.
           prevChar = c;

     } // end while

     // Display histogram.
     printf("\n\nHISTOGRAM - Word length - Bottom to Top\n\n");

     // Counters.
     int i, col;
     int maxrows = 0;

     // Find max rows.
     for (i = 1; i < 15; ++i) {
           if (WordLengthCount[i] > maxrows)
                maxrows = WordLengthCount[i];
     }

     // Bottom to top approach.
     // Loop through the 14 word length categories
     // and display the number as label.
     for (i = 1; i < 15; ++i) {
           // Label
           if (i < 10)
                printf("  %d", i);
           else
                printf(" %d", i);
     }
     printf("\n\n");

     // Loop row by row, count 1, 2, 3...max
     // One column for each length of word.
     for (i= 1; i <= maxrows; ++i) {
           for (col = 1; col < 15; ++col) {
                if (WordLengthCount[col] >= i)
                     printf("  *");
                else
                     printf("   ");
           }
           printf("\n");
     }

     printf("\n\n");

     // Top to bottom approach.

     // Header.
     printf("\n\nHISTOGRAM - Word length - Top to Bottom\n\n");

     // Loop row by row, count max to 1.
     // One column for each length of word.
     for (i= maxrows; i > 0; --i) {
           for (col = 1; col < 15; ++col) {
                if (WordLengthCount[col] >= i)
                     printf("  *");
                else
                     printf("   ");
           }
           printf("\n");
     }

     // Loop through the 14 word length categories
     // and display the number as label.
     for (i = 1; i < 15; ++i) {
           // Label
           if (i < 10)
                printf("  %d", i);
           else
                printf(" %d", i);
     }
     printf("\n\n");

     // Keep console window open.
     system("pause");

     // Return some value.
     return 0;

} // end main


Output.


To show the word parsing has been fixed.


 
I wrote the code starting from the top down because it was easier to see if it was working. I then reversed the print loop to display it in a more traditional way. You can compare each method.






No comments:

Post a Comment