Skip to content

libsvm File Format

This file format is used by rumale.

We're choosing the "Adj Close" column as the one that we want to predict.

The libsvm file format is simple. All values are numberic.

The first entry on a line is the thing that we want to predict. In this case it is the adjusted closing price. This is followed by a space.

What follows is a series of data pairs seperated by spaces in the form:

  • index:value

where index is the column number and value is the value for that item.

require 'csv'

# Read CSV file
data = CSV.read('input.csv', headers: true)

# Open output file
output_file = File.open('output.txt', 'w')

# Convert data into libsvm format and write to output file
data.each do |row|
  # Get the label (the "close" value)
  label = row['Adj Close']

  # Start building the libsvm formatted line
  libsvm_line = "#{label} "

  # Add feature indices and values
  row.each_with_index do |(column, value), index|
    next if column == 'Date' || column == 'Adj Close' # Skip irrelevant columns
    libsvm_line += "#{index}:#{value} "
  end

  # Write the libsvm formatted line to the output file
  output_file.puts(libsvm_line)
end

# Close files
output_file.close