hadoop

hadoop

 英

  • 網絡分布式計算;分布式計算平臺;分布式文件系統

例句

If I'm a developer using Hadoop and want to look at a bit of data, it will let me run some reports against the file system.

如果使用Hadoop開發者想要查看一些數據那么可以通過文件系統報表達成

We're trying to follow the path taken by the Hadoop project concentrating on robustness, scaling, correctness, and community-building first.

我們追隨Hadoop項目采取路線首先精力集中健壯性擴展正確性以及社區建立

As the hadoop-0. 20 is one of your primary interfaces to the Hadoop cluster, you'll see this utility used quite a bit through this article.

因為hadoop-0.20Hadoop集群主要接口之一看到本文多次使用這個實用程序

Now that I've coded my map and reduce implementations, all that's left to do is link everything up into a Hadoop Job.

現在已經mapreduce實現進行編碼接下來所有一切鏈接到一個HadoopJob

From this article, it's easy to see how Hadoop makes distributed computing simple for processing large datasets.

通過本文容易看出Hadoop顯著簡化處理大型數據集分布式計算

All that's needed is a representation of the data in a vector form that the Hadoop infrastructure can use.

所有一切需要就是矢量格式表達Hadoop基礎設施可以使用數據

Well, as you've probably guessed, Hadoop makes that easy to do.

當然已經猜到Hadoop可以輕松做到

But from the previous discussion, it's easy to see how Hadoop provides parallel processing of work.

但是通過前面討論容易看出Hadoop如何提供并行處理

Not to be outdone, commercial Hadoop pioneer Cloudera announced an HDFS partnership of its own yesterday.

商業Hadoop先驅Cloudera不甘示弱昨天發布自己HDFS合作伙伴計劃

A key part of the announcement was that Yahoo would make available a Hadoop enabled super computing data center named M45.

聲明關鍵Yahoo建立一個使用Hadoop超級計算數據中心M45。

Alas, there are several things that Hadoop does not do, at least when accessed through the MapReduce interface.

幾件事情Hadoop至少通過MapReduce訪問接口

Now you have set up the Hadoop Cluster on the cloud, and it's ready to run the MapReduce applications.

現在已經設置Hadoop集群運行MapReduce應用程序

Since we are going to be connecting to the hadoop file system, we might as well test that as well.

因為我們連接hadoop文件系統我們不妨測試

One particularly handy aspect of Hadoop is that it handles the raw parsing of an input file, so that you can deal with one line at a time.

Hadoop可以輸入文件進行原始解析一點特別有用這樣可以每次處理一行

For all the other settings, keep the defaults or choose the same values as you did for the Hadoop Master node.

對于所有其他設置保留默認或者選擇HadoopMaster節點相同

It is assumed that the Hadoop slave node has been configured a priori in such a manner that it registers with the Hadoop master node.

這里假設Hadoop節點已經之前配置完成也就是已經注冊Hadoop節點

Now that you have installed Hadoop and tested the basic interface to its file system, it's time to test Hadoop in a real application.

既然已經安裝Hadoop測試文件系統基本接口現在真實應用程序測試Hadoop

This article introduces you to the important configurable parameters of Hadoop and the method for analyzing and tuning performance.

本文介紹重要Hadoop配置參數以及分析調優性能方法

That magically seems to work, indicating that we can, indeed, connect to another machine and run the hadoop commands.

魔法似乎工作表明我們可以事實上連接另一臺機器運行hadoop命令

The Hadoop runtime will split up the data (log files) that needs to be processed and give each node in your cluster a chunk of data.

Hadoop運行時分割需要處理數據一些日志文件集群每個節點分配一個數據

data format designed to support data-intensive applications, and provides support for this format in a variety of programming languages.

Avro[1]最近加入ApacheHadoop家族項目之一支持數據密集型應用定義一種數據格式多種編程語言支持這種格式

One irony of this code and the Hadoop framework is that the input files do not have to be in the same format.

一個諷刺代碼Hadoop框架輸入文件需要相同格式

Hadoop is really designed to run in a distributed manner where it handles the coordination of various nodes running map and reduce.

Hadoop設計旨在一種分布式方式運行處理運行mapreduce各個節點之間協調性

You can perform a couple of tests to ensure that Hadoop is up and running normally (at least the namenode).

可以通過幾個檢查確認Hadoop至少namenode已經啟動正常運行

Thanks to the cloud and Hadoop, it is now possible to handle large amounts of structured or unstructured data in a timely manner.

由于Hadoop出現及時處理大量結構結構數據目前成為可能

So over the past 2 weekends, I've worked on a hobby project, which lets you turn your Hudson cluster into a Hadoop cluster.

所以過去周末一直從事一個業余愛好項目這個項目可以Hudson集群轉化Hadoop集群

Run the clustering algorithm of choice using one of the many Hadoop-ready driver programs available in Mahout.

使用Mahout可用Hadoop就緒驅動程序運行集群算法

The two core components are the Hadoop Distributed File System for storing data and Hadoop MapReduce for writing parallel-processing jobs.

其中兩個核心組件用于存儲數據HadoopDistributedFileSystemHadoop分布式文件系統用于寫入并行處理任務HadoopMapReduce

The company employs many of the core Hadoop contributors and intends to provide support and training.

公司雇傭了眾多Hadoop項目核心人員提供相應支持培訓

Open source software designed by IBM to help students develop programs for clusters running Hadoop.

IBM設計開源軟件幫助學生們運行Hadoop集群開發程序

You could just use the raw output from Hadoop (a name and value on each line, separated by a space).

可以只是使用來自Hadoop原始輸出一個名稱空格分隔)。

As a distributed framework, Hadoop enables many applications that benefit from parallelization of data processing.

作為分布式框架Hadoop許多應用程序能夠受益并行數據處理

Standalone Mode: By default, Hadoop is configured to run in a non-distributed standalone mode.

單獨模式默認情況Hadoop分布單獨模式運行

If not what is the plan in terms of moving it from an experimental technology to a core infrastructure component.

如果沒有什么計劃Hadoop一個實驗產品核心基礎組件遷移

developed Hadoop, permits AI systems to run data and algorithms across multiple servers simultaneously.

結合可以AI系統多個服務器同時運行數據算法

This flexibility can open new opportunities for Hadoop in a richer set of applications.

更加豐富應用程序集中靈活性可以Hadoop創造機會

feel this would be a big boost to both performance and utility, and it would leverage the power already provided by the Hadoop framework.

覺得一個巨大鼓舞作用表現用途影響作用力量已經提供Hadoop框架

Those log files can be huge, but the work will be split up among the machines (nodes) in your Hadoop cluster.

那些日志文件可能但是挖掘工作Hadoop集群多個機器節點之間分配

Instead, Hadoop can be viewed as a way to distribute both data and algorithms to hosts for faster parallel processing.

相反Hadoop可以視為一種可以同時數據算法分配主機獲得快速并行處理速度方法

The next article in this series will explore how to configure Hadoop in a multi-node cluster with additional examples. See you then!

系列篇文章通過更多示例討論如何節點集群配置Hadoop