python - Why is cffi so much quicker than numpy?
I have been playing around with writing cffi modules in Python, and their speed is making me wonder if I'm using standard Python correctly. It's making me want to switch to C completely! Truthfully, there are some great Python libraries I could never reimplement myself in C, so this is more hypothetical than anything, really.
This example shows the sum function in Python being used with a numpy array, and how slow it is in comparison with a C function. Is there a quicker, Pythonic way of computing the sum of a numpy array?
import numpy as np
from cffi import FFI

def cast_matrix(matrix, ffi):
    # Build an array of row pointers (double**) into the numpy buffer.
    # Assumes a C-contiguous float64 array.
    ap = ffi.new("double* [%d]" % (matrix.shape[0]))
    ptr = ffi.cast("double *", matrix.ctypes.data)
    for i in range(matrix.shape[0]):
        ap[i] = ptr + i*matrix.shape[1]
    return ap

ffi = FFI()
ffi.cdef("""
double sum(double**, int, int);
""")
c = ffi.verify("""
double sum(double** matrix, int x, int y){
    int i, j;
    double sum = 0.0;
    for (i=0; i<x; i++){
        for (j=0; j<y; j++){
            sum = sum + matrix[i][j];
        }
    }
    return(sum);
}
""")

m = np.ones(shape=(10,10))
print('numpy says', m.sum())

m_p = cast_matrix(m, ffi)
sm = c.sum(m_p, m.shape[0], m.shape[1])
print('cffi says', sm)
Just to show that the function works:
numpy says 100.0
cffi says 100.0
Now if I time this simple function, I find that numpy is surprisingly slow! Am I using numpy in the correct way? Is there a faster way to calculate the sum in Python?
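import time

n = 1000000

# Time one million calls of the cffi function
t0 = time.time()
for i in range(n):
    c.sum(m_p, m.shape[0], m.shape[1])
t1 = time.time()
print('cffi', t1 - t0)

# Time one million calls of numpy's sum
t0 = time.time()
for i in range(n):
    m.sum()
t1 = time.time()
print('numpy', t1 - t0)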
Times:
cffi 0.818415880203
numpy 5.61657714844
Numpy is slower than C for two reasons: the Python overhead (probably similar to cffi's) and generality. Numpy is designed to deal with arrays of arbitrary dimensions, in a bunch of different data types. Your cffi example was made for a 2D array of doubles. The cost was writing several lines of code instead of .sum(), 6 characters, to save less than 5 microseconds per call. (But of course, you already knew this.) I just want to emphasize that CPU time is cheap, much cheaper than developer time.
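One way to see the fixed per-call overhead described above is to compare the time per element of .sum() on a tiny array and on a large one. The following is only a rough sketch: the helper name seconds_per_call and the array sizes are made up for illustration, and the actual numbers depend on the machine and the numpy version.

```python
import timeit

import numpy as np


def seconds_per_call(arr, repeats=2000):
    """Average wall-clock time of a single arr.sum() call."""
    return timeit.timeit(arr.sum, number=repeats) / repeats


small = np.ones((10, 10))       # 100 elements, as in the question
large = np.ones((1000, 1000))   # 1,000,000 elements

t_small = seconds_per_call(small)
t_large = seconds_per_call(large, repeats=50)

# If the fixed per-call overhead dominates, the small array should show a
# much higher cost per element than the large one.
print("small: %.2e s per call, %.2e s per element" % (t_small, t_small / small.size))
print("large: %.2e s per call, %.2e s per element" % (t_large, t_large / large.size))
```

On a 10x10 array the call itself is most of the cost, which is why a hand-written C loop specialised for one shape and dtype can win there.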
Now, if you want to stick with numpy and you want better performance, your best option is to use Bottleneck. It provides a few functions optimised for 1D and 2D arrays of floats and doubles, and they are blazing fast. In your case it is 16 times faster, which would put the execution time at about 0.35 s, or about twice as fast as cffi.
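A minimal sketch of what that could look like, assuming the bottleneck package is installed. It uses bn.nansum, which treats NaN as zero, so it only matches m.sum() when the array contains no NaN. The name fast_sum and the fallback to np.nansum are illustrative choices, not part of the original answer.

```python
import numpy as np

try:
    import bottleneck as bn
    fast_sum = bn.nansum      # NaN-aware sum from Bottleneck
except ImportError:
    fast_sum = np.nansum      # fallback so the sketch still runs without Bottleneck

m = np.ones(shape=(10, 10))

# With no NaN values in m, this gives the same result as m.sum().
total = fast_sum(m)
print("sum:", total)
```

Timing fast_sum(m) in the same loop as in the question is the simplest way to check whether the speed-up holds on your own setup.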
For other functions that Bottleneck does not have, you can use Cython. It helps you write C code with a more Pythonic syntax. Or, if you prefer, you can progressively convert Python into C until you are happy with the speed.