Reusing HDFS storage across several Hadoop installations


Is it possible to reuse HDFS storage across two or more Hadoop installations? Or, to put it another way, to replicate the namenode state?

I want to build a small showcase Hadoop cluster (3-5 nodes), and I'd like to be able to play around with several Hadoop distributions (Hortonworks and Cloudera at least). I have not yet decided how to have them installed simultaneously, and that seems like a challenge in itself. Before I decide, I'd like to know: is it possible to reuse the data stored in HDFS from different clusters (physically using the same hard disks)?

For simplicity, I'll be happy if this works for any combination of Hadoop distros, and I'm ready to lose the data at any point, because it's just an experiment.

Update: I want to use HDFS exclusively with one chosen Hadoop installation at a time. Let's say one day I use Cloudera, another day Hortonworks, but both use the same data in HDFS.

The one caveat is that you would need to have these on separate machines, since you are not able to bind multiple namenodes to the same port 8020.
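The conflict is easy to see with a plain TCP socket: once one process owns an address and port, a second bind on the same pair fails. This is a generic illustration, not Hadoop code; 8020 is merely the default namenode RPC port, and the script below lets the OS pick an arbitrary free port instead.

```python
import socket

# First "namenode": bind and listen on a free localhost port.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
port = first.getsockname()[1]
first.listen(1)

# A second process trying to bind the same host:port gets an error,
# which is why two namenodes cannot share port 8020 on one machine.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    print("bound twice (unexpected)")
except OSError:
    print("address already in use")
finally:
    second.close()
    first.close()
```

Running it prints "address already in use", the socket-level equivalent of the second namenode failing to start.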

Having said that, Cloudera and Hortonworks use the same Hadoop binaries and the same configuration options as if you built them yourself. The difference is in each of the management consoles, which do not come with the base open-source Hadoop releases. My suggestion is to configure a single Hadoop group and user base that have access to the same HDFS namenodes/datanodes, jobtrackers, etc. You should then be able to bind the namenodes to the same HDFS file system. You will have to set up each user's SSH permissions as well.
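As a sketch of that shared configuration, both installations could name the same filesystem and on-disk directories in `core-site.xml` and `hdfs-site.xml`. The hostname and paths below are placeholders, not values from this answer:

```xml
<!-- core-site.xml: both installs name the same filesystem -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>  <!-- placeholder hostname -->
</property>

<!-- hdfs-site.xml: same on-disk directories, so the data is shared -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/hdfs/name</value>            <!-- placeholder path -->
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hdfs/data</value>            <!-- placeholder path -->
</property>
```

With identical values in both distros' config directories, whichever installation is running at the moment reads and writes the same namenode metadata and block storage.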

There are limitations though, such as HDFS supporting exclusive writes only. When the first client contacts the name-node to open a file for writing, the name-node grants a lease to that client to create the file. When a second client tries to open the same file for writing, the name-node sees that a lease for the file has already been granted to another client, and rejects the open request from the second client.
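The lease behaviour described above can be sketched as a toy model. This is a simplified illustration of the single-writer rule, not the actual namenode implementation; the class and method names are invented for the example:

```python
class ToyNameNode:
    """Minimal sketch of HDFS's single-writer lease rule."""

    def __init__(self):
        self.leases = {}  # path -> client currently holding the write lease

    def open_for_write(self, path, client):
        holder = self.leases.get(path)
        if holder is not None and holder != client:
            # A lease is already granted to another client: reject.
            raise PermissionError(f"{path} is leased to {holder}")
        self.leases[path] = client
        return "lease granted"

    def close_file(self, path, client):
        # Releasing the lease lets the next writer in.
        if self.leases.get(path) == client:
            del self.leases[path]


nn = ToyNameNode()
print(nn.open_for_write("/logs/app.log", "client-1"))  # lease granted
try:
    nn.open_for_write("/logs/app.log", "client-2")
except PermissionError as e:
    print("rejected:", e)
nn.close_file("/logs/app.log", "client-1")
print(nn.open_for_write("/logs/app.log", "client-2"))  # lease granted
```

The second open is rejected until the first client closes the file and the lease is released.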

I would configure the HDFS directories accordingly in order to preserve some level of organization.

I did this with Hadoop 0.23 and 2.2.0 in VMware / Ubuntu.

Lastly, take a look at the official Hadoop wiki and FAQ.

Good luck, Pat

