Welcome to svmloader’s documentation!

svmloader is a simple but very fast Python module (written in Cython) for loading sparse data stored in libsvm format.

It is not functionally equivalent to sklearn.datasets.load_svmlight_file, and handles only the simplest cases.

The supported label types are int and float (int by default), and data can be parsed as numpy.float64 or numpy.float32 (float64 by default). Multiple labels are not currently supported.

svmloader.load_svmfile(filename, dtype='d', ltype='l', nfeatures=None, zero_based=True)

Load a sparse matrix from filename in libsvm format.

Files in .gz or .bz2 format are decompressed on the fly.
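To illustrate what decompressing on the fly by file extension can look like, here is a minimal sketch using only the Python standard library. It is not svmloader's implementation (which is written in Cython); the helper name open_maybe_compressed and the file name sample.svm.gz are hypothetical.

```python
import bz2
import gzip
import os
import tempfile


def open_maybe_compressed(path):
    """Open a text file, decompressing .gz or .bz2 transparently.

    Hypothetical helper for illustration; not part of svmloader.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


# Round trip: write a small gzip-compressed libsvm file, then read it back.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample.svm.gz")
with gzip.open(path, "wt") as fh:
    fh.write("1 0:0.5 3:2.0\n0 1:1.5\n")

with open_maybe_compressed(path) as fh:
    lines = fh.read().splitlines()
```

With svmloader itself, the compressed file name would simply be passed as the filename argument.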

Parameters:
  • filename (str) – the file name
  • dtype (str) – type of data, must be either ‘d’ (double) or ‘f’ (float)
  • ltype (str) – type of labels, must be either ‘l’ (int) or ‘d’ (double)
  • nfeatures (int) – the number of columns (inferred from the file if None)
  • zero_based (bool) – indicates whether column indices are zero-based or one-based
Returns:

(labels, sparse_matrix) tuple

Return type:

(numpy.ndarray, scipy.sparse.csr_matrix)
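The following pure-Python sketch shows how libsvm-style lines map onto labels and the three arrays of a CSR matrix (data, indices, indptr). It is an illustration of the format under simple assumptions (one label per line, index:value pairs separated by whitespace), not svmloader's actual parser; the function name parse_svm_lines and the sample rows are hypothetical.

```python
def parse_svm_lines(lines, zero_based=True, label_type=int):
    """Parse libsvm-style lines into labels plus CSR arrays.

    Hypothetical helper for illustration; not part of svmloader.
    """
    labels, data, indices, indptr = [], [], [], [0]
    offset = 0 if zero_based else 1  # shift one-based columns down to zero-based
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        labels.append(label_type(fields[0]))
        for item in fields[1:]:
            col, value = item.split(":")
            indices.append(int(col) - offset)
            data.append(float(value))
        indptr.append(len(data))  # marks where this row ends in data/indices
    # Number of columns inferred from the largest index seen (assumed rule)
    nfeatures = max(indices) + 1 if indices else 0
    return labels, (data, indices, indptr), nfeatures


sample = ["1 0:0.5 3:2.0", "0 1:1.5", "1 2:4.0 3:1.0"]
labels, (data, indices, indptr), nfeatures = parse_svm_lines(sample)
```

With svmloader, the equivalent call would be of the form labels, matrix = svmloader.load_svmfile(filename), where matrix is a scipy.sparse.csr_matrix holding the same three arrays.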

svmloader.load_svmfiles(filenames, dtype='d', ltype='l', zero_based=True)

Load a list of sparse matrices from a list of file names in libsvm format.

Files in .gz or .bz2 format are decompressed on the fly.

The number of features is inferred from the maximum index found across all files.

Parameters:
  • filenames (list) – the list of file names
  • dtype (str) – type of data, must be either ‘d’ (double) or ‘f’ (float)
  • ltype (str) – type of labels, must be either ‘l’ (int) or ‘d’ (double)
  • zero_based (bool) – indicates whether column indices are zero-based or one-based
Returns:

a list [labels_0, matrix_0, ..., labels_n, matrix_n]
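Two points about load_svmfiles can be sketched in plain Python: how a shared column count follows from the maximum index across all files, and how the flat return list can be regrouped into (labels, matrix) pairs. This is a sketch under assumptions, not svmloader's code: the helper names are hypothetical, the rule "maximum index plus one for zero-based input" is assumed, and the strings in flat are placeholders standing in for real arrays and matrices.

```python
def max_column_index(lines):
    """Return the largest column index in libsvm-style lines, or -1 if none."""
    largest = -1
    for line in lines:
        for item in line.split()[1:]:
            largest = max(largest, int(item.split(":")[0]))
    return largest


def shared_nfeatures(files, zero_based=True):
    """Column count that lets every file share the same matrix width."""
    largest = max(max_column_index(lines) for lines in files)
    if largest < 0:
        return 0  # no features found in any file
    return largest + 1 if zero_based else largest


def pairs(flat):
    """Regroup [labels_0, matrix_0, ..., labels_n, matrix_n] into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


train = ["1 0:0.5 2:1.0", "0 1:2.0"]
test = ["1 5:3.0"]
n = shared_nfeatures([train, test])

# Placeholder strings stand in for the arrays and matrices in the returned list.
flat = ["labels_train", "matrix_train", "labels_test", "matrix_test"]
grouped = pairs(flat)
```

Sharing one column count matters because matrices loaded separately could otherwise end up with different widths, for example when a test file never mentions the highest-numbered feature.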