Reusing HDFS storage by several Hadoop installations
Is it possible to reuse HDFS storage from two or more Hadoop installations? Or, to say it in other words, to replicate the namenode state?
I want to build a small showcase Hadoop cluster (3-5 nodes), and I'd like to be able to play around with several Hadoop distributions (Hortonworks and Cloudera at least). I have not decided yet how to have them installed simultaneously, and since that seems to be a challenge, I'd like to decide first: is it possible to reuse the data stored in HDFS from different clusters (physically using the same hard disks)?
For simplicity, I'll be happy if this works for any combination of Hadoop distros, and I'm ready to lose the data at any point, because it's just an experiment.
Update: I want to use HDFS exclusively from one chosen Hadoop installation at a time. Let's say one day I use Cloudera, another day Hortonworks, and both use the same data in HDFS.
The one caveat is that you would need to have these on separate machines, since you will not be able to bind multiple namenodes to the same port 8020.
Having said that, Cloudera and Hortonworks use the same Hadoop binaries and the same configuration options as if you had built Hadoop yourself; the difference is in each vendor's management console, which does not come with the base open-source Hadoop releases. My suggestion is to configure a single hadoop group and userbase that have access to the same HDFS namenodes/datanodes and jobtrackers, etc. You should then be able to bind both namenodes to the same HDFS file system. You will have to set up each user's SSH permissions as well.
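As a rough sketch of what that shared configuration could look like (the hostname and paths here are just assumptions for the experiment, not values shipped by either distro), both installations would point at the same namenode address and the same on-disk directories:

    <!-- core-site.xml, identical in both the Cloudera and Hortonworks configs -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>   <!-- assumed namenode host -->
      </property>
    </configuration>

    <!-- hdfs-site.xml, also identical in both configs -->
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hdfs/namenode</value>   <!-- assumed shared disk path -->
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hdfs/datanode</value>   <!-- assumed shared disk path -->
      </property>
    </configuration>

Since (per your update) only one distro's daemons run at a time, whichever namenode starts up will read the same fsimage and edit log, and the datanodes will serve the same block files.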
There are limitations, though, such as HDFS supporting exclusive writes only. When the first client contacts the name-node to open a file for writing, the name-node grants a lease to that client to create the file. When a second client tries to open the same file for writing, the name-node sees that the lease for the file has already been granted to another client, and rejects the open request from the second client.
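To see this behavior from the client side, here is a minimal Java sketch against the FileSystem API (the namenode address and file path are assumptions for the experiment): the second create() on the same path is rejected while the first client still holds the lease.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.IOException;
    import java.net.URI;

    public class LeaseDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed namenode address for this experiment.
            URI hdfs = URI.create("hdfs://master:8020");
            Path file = new Path("/tmp/lease-demo.txt");

            // First client: the name-node grants it a lease on the file.
            FileSystem client1 = FileSystem.newInstance(hdfs, conf);
            FSDataOutputStream out = client1.create(file, /* overwrite */ true);
            out.writeBytes("client 1 holds the lease\n");

            // Second client (newInstance avoids the cached FileSystem, so this
            // really is a distinct client): opening the same file for writing
            // while the lease is held gets rejected by the name-node.
            FileSystem client2 = FileSystem.newInstance(hdfs, conf);
            try {
                client2.create(file, true);
            } catch (IOException e) {
                // Typically an AlreadyBeingCreatedException wrapped in a RemoteException.
                System.out.println("Second writer rejected: " + e.getMessage());
            } finally {
                out.close();       // Closing the stream releases the lease.
                client1.close();
                client2.close();
            }
        }
    }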
I would configure the HDFS directories accordingly in order to preserve some level of organization.
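For example (the user names and paths below are placeholders, not anything from your setup), a per-user layout only needs to be created once and survives switching distros, since it lives in HDFS itself:

    # Run from whichever distro is currently active; the layout
    # persists in HDFS when you switch to the other distro.
    hdfs dfs -mkdir -p /user/alice /user/bob /shared/datasets
    hdfs dfs -chown alice /user/alice
    hdfs dfs -chown bob /user/bob
    hdfs dfs -chmod 775 /shared/datasets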
I did this with Hadoop 0.23 and 2.2.0 in VMware / Ubuntu.
Lastly, take a look at the official Hadoop wiki and FAQ.
Good luck, Pat