Welcome to svmloader’s documentation!

svmloader is a very fast Python module (written in Cython) for loading sparse data stored in the libsvm format.

It is not fully equivalent to sklearn.datasets.load_svmlight_file: in particular, query_id is not supported and dtype is restricted.

Data and labels are typed separately. Labels can be int or float (int by default), and data can be parsed as numpy.float64 or numpy.float32 (float64 by default).

Compressed data in .gz or .bz2 format is supported as well.
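As an illustration of what reading compressed input involves, here is a minimal sketch that picks a reader from the file extension using only the Python standard library. It is not svmloader's own code (which is written in Cython), and the helper name open_text is hypothetical.

```python
import bz2
import gzip


def open_text(filename):
    """Open a plain, .gz or .bz2 file for reading text, chosen by extension.

    Hypothetical helper for illustration only; svmloader's internal
    handling may differ.
    """
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    if filename.endswith(".bz2"):
        return bz2.open(filename, "rt")
    return open(filename, "r")
```

With svmloader itself no such helper should be needed: according to the description above, a name ending in .gz or .bz2 can be passed to the loading functions directly.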

API

svmloader.load_svmfile(filename, dtype='d', ltype='i', nfeatures=None, zero_based=True, multilabels=False)

Load a sparse CSR matrix from filename in libsvm format.

Files in .gz or .bz2 format will be uncompressed on the fly.

Parameters:
  • filename (str) – the file name
  • dtype (str) – type of data, must be either ‘d’ (double) or ‘f’ (float)
  • ltype (str) – type of labels, must be either ‘i’ (int) or ‘d’ (double)
  • nfeatures (int) – the number of columns (inferred from the file if None)
  • zero_based (bool) – indicates whether column indexes are zero-based or one-based
  • multilabels (bool) – indicates whether the file uses multiple labels per row
Returns:

(labels, sparse_matrix) tuple

Return type:

(numpy.ndarray, scipy.sparse.csr_matrix)
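A short usage sketch of load_svmfile, based only on the signature documented above. The sample data, the file name sample.svm, and the helper names write_sample and load_sample are hypothetical. svmloader is imported inside load_sample so that the file-writing helper can be used on its own.

```python
import os

# Three rows in the usual "label index:value ..." layout, with zero-based
# column indexes (an assumption matching the default zero_based=True).
SAMPLE = (
    "1 0:0.5 3:1.25\n"
    "-1 1:2.0\n"
    "1 2:0.75 3:4.0\n"
)


def write_sample(directory):
    """Write the sample rows to <directory>/sample.svm and return the path."""
    path = os.path.join(directory, "sample.svm")
    with open(path, "w") as handle:
        handle.write(SAMPLE)
    return path


def load_sample(path):
    """Load the sample with float data ('f') and integer labels ('i')."""
    import svmloader  # imported lazily; requires svmloader to be installed

    labels, matrix = svmloader.load_svmfile(
        path, dtype="f", ltype="i", zero_based=True
    )
    return labels, matrix
```

Calling load_sample(write_sample(some_directory)) should return the (labels, sparse_matrix) tuple described above. Because nfeatures is left at None, the column count is taken from the file.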

svmloader.load_svmfiles(filenames, dtype='d', ltype='i', zero_based=True, multilabels=False)

Load a list of sparse CSR matrices from a list of filenames in libsvm format.

Files in .gz or .bz2 format will be uncompressed on the fly.

The number of features is inferred from the maximum index found across all files.
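The inference can be pictured as a scan for the largest column index over every file, so that all returned matrices share one column count. The pure-Python sketch below illustrates the idea and is not svmloader's implementation. The function names are hypothetical, and the conversion from largest index to column count (plus one for zero-based files) is an assumption.

```python
def max_column_index(lines):
    """Return the largest column index among 'label index:value ...' lines.

    Returns -1 when no feature is present.
    """
    largest = -1
    for line in lines:
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is the label
            index, _, _value = token.partition(":")
            largest = max(largest, int(index))
    return largest


def infer_nfeatures(files, zero_based=True):
    """Infer one shared column count for several files (iterables of lines).

    Assumption: a zero-based file whose largest index is k has k + 1
    columns, while a one-based file has k columns.
    """
    largest = max((max_column_index(lines) for lines in files), default=-1)
    if largest < 0:
        return 0
    return largest + 1 if zero_based else largest
```

For instance, if a training file only reaches index 3 but a test file reaches index 6, both matrices would be given the width implied by index 6, which keeps them compatible with the same model.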

Parameters:
  • filenames (list) – the list of file names
  • dtype (str) – type of data, must be either ‘d’ (double) or ‘f’ (float)
  • ltype (str) – type of labels, must be either ‘i’ (int) or ‘d’ (double)
  • zero_based (bool) – indicates whether column indexes are zero-based or one-based
  • multilabels (bool) – indicates whether each file uses multiple labels per row
Returns:

a list [labels_0, matrix_0, ..., labels_n, matrix_n]
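Since the result is a flat list that alternates labels and matrices, it can be convenient to regroup it into (labels, matrix) pairs. The sketch below shows one way to do that. The helper names pair_up and load_splits are hypothetical, and svmloader is imported lazily so that pair_up can be used without it.

```python
def pair_up(flat):
    """Group [labels_0, matrix_0, labels_1, matrix_1, ...] into pairs."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of items")
    return list(zip(flat[0::2], flat[1::2]))


def load_splits(filenames):
    """Load several files at once and return one (labels, matrix) pair per file."""
    import svmloader  # imported lazily; requires svmloader to be installed

    flat = svmloader.load_svmfiles(filenames, dtype="d", ltype="i")
    return pair_up(flat)
```

For a fixed number of files, plain unpacking works as well, for example y_train, X_train, y_test, X_test = svmloader.load_svmfiles([train_path, test_path]), where train_path and test_path are placeholder names.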