ePolyglot: Examination and development of multilanguage programming using Eiffel, Python, and Haskell | ||
---|---|---|
Prev | Chapter 13. Haskell from Eiffel | Next |
We'll walk through our sample program in somewhat more detail this time, because the process of getting Haskell to play nicely with Eiffel was a slight bit more complex than the Eiffel/Python interactivity earlier. First, the problem.
The Software Engineering Institute's Personal Software Process training program contains several exercises, most having to do with statistical analysis and prediction. The program we're going to do as a sample is that of linear regression, program 4a in the PSP curriculum:
Write a program to calculate the linear regression size-estimating parameters for a set of n programs where historical object LOC and new and changed LOC data are available. | ||
--Humphrey95 |
Essentially, this just requires reading pairs of numbers from a file and doing a regression analysis to find the Beta-0 and Beta-1 parameters for linear regression. In terms of multilanguage programming, we'll divide the problem thusly: Eiffel, with its strong typing and program control, will handle the logic and flow; Python, with its powerful string and file manipulation facilities, will handle the reading of data from the data file, and Haskell, with its superior functional approach, will handle the actual calculation.
Note: This is, of course, a demonstration program. When we finish it in the next section, the final executable will be 2.1 megabytes before stripping, and 875k afterward--a far cry from the 30k or 21k from the pure Eiffel or C++ versions of this program!
To show why Haskell is a good language for this sort of analysis, it's interesting to compare similar code in C++ and Eiffel. In C++, the code to calculate the regression parameters looks something like this, where m_xs and m_ys are linked lists holding the data for analysis, and supporting functions (such as the x_mean and y_mean functions) are not listed for clarity:
Example 13-1. C++ code for the regression parameters
double paired_number_list::beta_1_numerator( void ) const { double Result = 0; list<double>::const_iterator x_iter; list<double>::const_iterator y_iter; for( x_iter = m_xs.begin(), y_iter = m_ys.begin(); (x_iter != m_xs.end()) && (y_iter != m_ys.end()); ++x_iter, ++y_iter ) { Result += (*x_iter)*(*y_iter); } Result -= static_cast<double>(entry_count())*x_mean()*y_mean(); return Result; } double paired_number_list::beta_1_denominator( void ) const { double Result = 0; list<double>::const_iterator x_iter; list<double>::const_iterator y_iter; for( x_iter = m_xs.begin(), y_iter = m_ys.begin(); (x_iter != m_xs.end()) && (y_iter != m_ys.end()); ++x_iter, ++y_iter ) { Result += (*x_iter)*(*x_iter); } Result -= static_cast<double>(entry_count())*x_mean()*x_mean(); return Result; } double paired_number_list::beta_1( void ) const { return beta_1_numerator() / beta_1_denominator(); } double paired_number_list::beta_0( void ) const { return y_mean() - beta_1() * x_mean(); } |
The equivalent code in Eiffel looks something like this:
Example 13-2. Eiffel code for the linear regression analysis
beta_1_numerator : DOUBLE is local index : INTEGER do Result := 0 from index := xs.lower until not ( xs.valid_index( index ) and ys.valid_index( index ) ) loop Result := Result + xs.item(index) * ys.item( index ) index := index + 1 end Result := Result - entry_count.to_double * x_mean * y_mean end beta_1_denominator : DOUBLE is local index : INTEGER do Result := 0 from index := xs.lower until not ( xs.valid_index( index ) and ys.valid_index( index ) ) loop Result := Result + xs.item(index) * xs.item( index ) index := index + 1 end Result := Result - entry_count.to_double * x_mean * x_mean end beta_1 : DOUBLE is do Result := beta_1_numerator / beta_1_denominator end beta_0 : DOUBLE is do Result := y_mean - beta_1 * x_mean end |
In contrast, the Haskell code for the same is not only much more terse, but also surprisingly readable and understandable once you get past a few small hurdles:
Example 13-3. Equivalent code in Haskell
beta_1 :: [Double] -> [Double] -> Double beta_1 xs ys = top / bottom where top = sum( mult_terms xs ys ) - dcount xs * average xs * average ys bottom = sum ( map square xs) - dcount xs * square ( average xs ) beta_0 :: [Double] -> [Double] -> Double beta_0 xs ys = average ys - beta_1 xs ys * average xs |
The above beta_1 and beta_0 functions are those we'd like to use from Haskell--they are clean and straightforward, each taking two lists of double-precision values ("[Double]" in the function signature) and returning a double-precision value as a result. Our sample program in this chapter will use static arrays of numbers in order to avoid the extra hassle of reading the numbers from a file with Python; that happens next chapter.