The code and output belowdemonstrates some of RinRuby’s features. Ruby code counts the number ofoccurences of each word in Lincoln’s Gettysburg Address and filtersout those occurring less than three times or shorter than four letters.R code — through the RinRuby library — produces a bar plot of the mostfrequent words and computes the correlation between the length of aword and the usage frequency. Finally, the computed correlation isprinted by Ruby. The text for the "gettysburg.txt" file is here.Code:tally = Hash.new(0)File.open('gettysburg.txt').each_line do |line| line.downcase.split(/\W+/).each { |w| tally[w] += 1 } end total = tally.values.inject { |sum,count| sum + count } tally.delete_if { |key,count| count < 3 || key.length < 4 } require "rinruby" R.keys, R.counts = tally.keys, tally.values R.eval <<EOF names(counts) <- keys barplot(rev(sort(counts)),main="Frequency of Non-Trivial Words",las=2) mtext("Among the #{total} words in the Gettysburg Address",3,0.45) rho <- round(cor(nchar(keys),counts),4) EOF puts "The correlation between word length and frequency is #{R.rho}." Output: The correlation between word length and frequency is -0.2779. |
Example: Gettysburg
Subpages (1):gettysburg.txt