Our sample problem: Linear Regression

We'll walk through our sample program in somewhat more detail this time, because the process of getting Haskell to play nicely with Eiffel was a slight bit more complex than the Eiffel/Python interactivity earlier. First, the problem.

The Software Engineering Institute's Personal Software Process training program contains several exercises, most having to do with statistical analysis and prediction. The program we're going to do as a sample is that of linear regression, program 4a in the PSP curriculum:

 

Write a program to calculate the linear regression size-estimating parameters for a set of n programs where historical object LOC and new and changed LOC data are available.

 
--Humphrey95 

Essentially, this just requires reading pairs of numbers from a file and doing a regression analysis to find the Beta-0 and Beta-1 parameters for linear regression. In terms of multilanguage programming, we'll divide the problem thusly: Eiffel, with its strong typing and program control, will handle the logic and flow; Python, with its powerful string and file manipulation facilities, will handle the reading of data from the data file, and Haskell, with its superior functional approach, will handle the actual calculation.

Note: This is, of course, a demonstration program. When we finish it in the next section, the final executable will be 2.1 megabytes before stripping, and 875k afterward--a far cry from the 30k or 21k from the pure Eiffel or C++ versions of this program!

To show why Haskell is a good language for this sort of analysis, it's interesting to compare similar code in C++ and Eiffel. In C++, the code to calculate the regression parameters looks something like this, where m_xs and m_ys are linked lists holding the data for analysis, and supporting functions (such as the x_mean and y_mean functions) are not listed for clarity:

Example 13-1. C++ code for the regression parameters


double
paired_number_list::beta_1_numerator( void ) const
{
  double Result = 0;
  list<double>::const_iterator x_iter;
  list<double>::const_iterator y_iter;

  for( x_iter = m_xs.begin(), y_iter = m_ys.begin();
       (x_iter != m_xs.end()) && (y_iter != m_ys.end());
       ++x_iter, ++y_iter )
    {
      Result += (*x_iter)*(*y_iter);
    }
  Result -= static_cast<double>(entry_count())*x_mean()*y_mean();
  return Result;
}

double
paired_number_list::beta_1_denominator( void ) const
{
  double Result = 0;
  list<double>::const_iterator x_iter;
  list<double>::const_iterator y_iter;
  for( x_iter = m_xs.begin(), y_iter = m_ys.begin();
       (x_iter != m_xs.end()) && (y_iter != m_ys.end());
       ++x_iter, ++y_iter )
    {
      Result += (*x_iter)*(*x_iter);
    }
  Result -=  static_cast<double>(entry_count())*x_mean()*x_mean();
  return Result;
}

double
paired_number_list::beta_1( void ) const
{
  return beta_1_numerator() / beta_1_denominator();
}

double
paired_number_list::beta_0( void ) const
{
  return y_mean() - beta_1() * x_mean();
}

The equivalent code in Eiffel looks something like this:

Example 13-2. Eiffel code for the linear regression analysis

   beta_1_numerator : DOUBLE is
      local
         index : INTEGER
      do
         Result := 0
         from
            index := xs.lower
         until
            not ( xs.valid_index( index ) and ys.valid_index( index ) )
         loop
            Result := Result + xs.item(index) * ys.item( index )
            index := index + 1
         end
         Result := Result - entry_count.to_double * x_mean * y_mean
      end
   
   beta_1_denominator : DOUBLE is
      local
         index : INTEGER
      do
         Result := 0
         from
            index := xs.lower
         until
            not ( xs.valid_index( index ) and ys.valid_index( index ) )
         loop
            Result := Result + xs.item(index) * xs.item( index )
            index := index + 1
         end
         Result := Result - entry_count.to_double * x_mean * x_mean
      end

   beta_1 : DOUBLE is
      do
         Result := beta_1_numerator / beta_1_denominator
      end
   
   beta_0 : DOUBLE is
      do
         Result := y_mean - beta_1 * x_mean
      end
   

In contrast, the Haskell code for the same is not only much more terse, but also surprisingly readable and understandable once you get past a few small hurdles:

Example 13-3. Equivalent code in Haskell

beta_1 :: [Double] -> [Double] -> Double
beta_1 xs ys = top / bottom
         where
           top = sum( mult_terms xs ys ) - dcount xs * average xs * average ys
           bottom = sum ( map square xs) - dcount xs * square ( average xs )

beta_0 :: [Double] -> [Double] -> Double
beta_0 xs ys = average ys - beta_1 xs ys * average xs

The above beta_1 and beta_0 functions are those we'd like to use from Haskell--they are clean and straightforward, each taking two lists of double-precision values ("[Double]" in the function signature) and returning a double-precision value as a result. Our sample program in this chapter will use static arrays of numbers in order to avoid the extra hassle of reading the numbers from a file with Python; that happens next chapter.