python - Why is cffi so much quicker than numpy?


I have been playing around with writing cffi modules in Python, and their speed is making me wonder whether I'm using standard Python correctly. It's making me want to switch to C completely! Truthfully, there are great Python libraries I could never reimplement myself in C, so this is more hypothetical than anything.

This example shows the sum function in Python being used on a numpy array, and how slow it is in comparison to a C function. Is there a quicker, Pythonic way of computing the sum of a numpy array?

import numpy as np
from cffi import FFI

def cast_matrix(matrix, ffi):
    # Build an array of row pointers into the numpy array's buffer
    ap = ffi.new("double* [%d]" % (matrix.shape[0]))
    ptr = ffi.cast("double *", matrix.ctypes.data)
    for i in range(matrix.shape[0]):
        ap[i] = ptr + i*matrix.shape[1]
    return ap

ffi = FFI()
ffi.cdef("""
double sum(double**, int, int);
""")
c = ffi.verify("""
double sum(double** matrix, int x, int y){
    int i, j;
    double sum = 0.0;
    for (i=0; i<x; i++){
        for (j=0; j<y; j++){
            sum = sum + matrix[i][j];
        }
    }
    return(sum);
}
""")

m = np.ones(shape=(10,10))
print 'numpy says', m.sum()

m_p = cast_matrix(m, ffi)

sm = c.sum(m_p, m.shape[0], m.shape[1])
print 'cffi says', sm

Just to show that the function works:

numpy says 100.0
cffi says 100.0

Now if I time this simple function, I find that numpy is really slow! Am I using numpy in the correct way? Is there a faster way to calculate the sum in Python?

import time

n = 1000000

t0 = time.time()
for i in range(n):
    c.sum(m_p, m.shape[0], m.shape[1])
t1 = time.time()

print 'cffi', t1-t0

t0 = time.time()
for i in range(n):
    m.sum()
t1 = time.time()

print 'numpy', t1-t0

The times:

cffi 0.818415880203
numpy 5.61657714844

numpy is slower than C for two reasons: the Python overhead (probably similar to cffi's) and generality. numpy is designed to deal with arrays of arbitrary dimensions, in a bunch of different data types. Your cffi example was made for a 2D array of floats. The cost was writing several lines of code vs the 6 characters of .sum(), to save less than 5 microseconds per call. (But of course, you already knew this.) I just want to emphasize that CPU time is cheap, much cheaper than developer time.
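A minimal sketch of why the gap is mostly fixed per-call overhead rather than the summation itself: on a larger array the dispatch cost is amortised and numpy's inner loop is essentially as fast as hand-written C (the exact timings below are illustrative and will vary by machine).

import time
import numpy as np

for side in (10, 1000):
    a = np.ones(shape=(side, side))
    n = 10000
    t0 = time.time()
    for _ in range(n):
        a.sum()   # same call, fixed overhead, very different work per call
    t1 = time.time()
    print '%dx%d: %.2e s per call' % (side, side, (t1 - t0) / n)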

Now, if you want to stick with numpy and you want better performance, your best option is to use bottleneck. It provides a few functions optimised for 1D and 2D arrays of floats and doubles, and they are blazing fast. In this case, 16 times faster, which would put the execution time at about 0.35 s, or roughly twice as fast as cffi.
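A minimal usage sketch, assuming bottleneck is installed (its nansum treats NaNs as zero, so for an all-ones array it matches m.sum()):

import numpy as np
import bottleneck as bn

m = np.ones(shape=(10, 10))
print 'bottleneck says', bn.nansum(m)   # optimised C loop for float/double arrays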

For other functions that bottleneck does not have, you can use Cython. It helps you write C code with a more Pythonic syntax. Or, if you like, convert your Python progressively into C until you are happy with the speed.
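As a rough sketch of what the Cython route looks like (the function name and build setup are up to you), a typed memoryview lets Cython compile the double loop down to plain C:

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def cy_sum(double[:, :] matrix):
    # Plain C loops over a 2D double buffer, no Python objects inside
    cdef int i, j
    cdef double total = 0.0
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            total += matrix[i, j]
    return total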

