Processing images using Hadoop
I'm new to Hadoop, and I'm going to develop an application that processes multiple images using Hadoop and shows users the results live, while the computation is in progress. The basic approach is to distribute an executable and a bunch of images, then gather the results.
Can I get results interactively while the computing process is in progress?
Are there any alternatives to Hadoop Streaming for such a use case?
How can I feed the executable with images? I can't find any examples other than feeding it through stdin.
For processing images on Hadoop, the best way to organize the computations would be:
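- Store the images in a sequence file, where the key is the image name or ID and the value is the image's binary data. This way you have a single file with all the images you need to process. If images are added dynamically to your system, consider aggregating them into daily sequence files. I don't think you should use compression for this sequence file, because general-purpose compression algorithms do not work well on images (formats such as JPEG and PNG are already compressed).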
- Process the images. Here you have a number of options to choose from. The first is to use Hadoop MapReduce and write the program in Java: with Java you can read the sequence file and directly obtain the "value" on each map step, where the "value" is the binary file data. Given this, you can run any processing logic. The second option is Hadoop Streaming. It has the limitation that all data goes to the stdin of your application and the result is read from its stdout. You can overcome this by writing your own InputFormat in Java that serializes the image binary data from the sequence file as a Base64 string and passes it to your generic application. The third option is to use Spark to process the data, but again you are limited in the choice of programming languages: Scala, Java or Python.
- Hadoop was developed to simplify batch processing over large amounts of data. Spark is essentially similar: it is a batch tool. This means you cannot get any result before all the data has been processed. Spark Streaming is a bit different: there you work with micro-batches of 1-10 seconds and process each of them separately, so in general you can make it work for your case.
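To illustrate the Hadoop Streaming option, here is a minimal sketch of a Python mapper. It assumes (this is my assumption, not something Hadoop gives you out of the box) that your custom InputFormat emits one record per line in the form `image_id<TAB>base64_data`. `process_image` is a hypothetical placeholder that only reports the byte size and a SHA-1 digest; you would replace it with your real image logic.

```python
import base64
import hashlib
import sys


def process_image(image_bytes):
    """Hypothetical stand-in for real image logic: byte size and SHA-1 digest."""
    return "%d\t%s" % (len(image_bytes), hashlib.sha1(image_bytes).hexdigest())


def map_lines(lines):
    """Turn 'image_id<TAB>base64' records into 'image_id<TAB>result' records."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumed record layout: key, a tab, then the Base64-encoded image.
        image_id, _, encoded = line.partition("\t")
        image_bytes = base64.b64decode(encoded)
        yield "%s\t%s" % (image_id, process_image(image_bytes))


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Hadoop Streaming pipes records to stdin and collects stdout."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

To use it as the mapper script, call `main()` at the bottom of the file. Keeping the logic in `map_lines` means you can test it locally on a list of strings without a cluster.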
I don't know your complete use case, but one possible solution is to use Kafka + Spark Streaming. Your application would put the images in binary format into a Kafka queue, while Spark consumes and processes them in micro-batches on the cluster, updating the users through some third component (at the very least by putting the image processing status back into Kafka for another application to process).
But in general, the information you provided is not complete enough to recommend a good architecture for your specific case.
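To make the micro-batch idea concrete, here is a plain-Python sketch. It is not the Spark Streaming or Kafka API, only an illustration of why micro-batching lets you show progress before everything is finished. It groups by item count, whereas Spark Streaming groups by time interval; the names `micro_batches`, `process_stream` and `report` are hypothetical, and the "work" is just measuring each image's size.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of at most batch_size items from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_stream(images, batch_size, report):
    """Process (image_id, data) pairs batch by batch, reporting after each one."""
    done = 0
    for batch in micro_batches(images, batch_size):
        # Placeholder work: record the byte size of every image in the batch.
        results = {image_id: len(data) for image_id, data in batch}
        done += len(batch)
        # In a real system this could publish a status message for the UI.
        report(done, results)
    return done
```

For example, `process_stream(images, 2, lambda done, results: print(done, results))` calls the callback once per batch, so the user sees partial results as they arrive.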