hadoop


Networked distributed computing; distributed computing platform; distributed file system

Example sentences

If I'm a developer using Hadoop and want to look at a bit of data, it will let me run some reports against the file system.

如果我是個使用Hadoop的開發者，想要查看一些數據，那么就可以通過文件系統報表達成所愿。

We're trying to follow the path taken by the Hadoop project concentrating on robustness, scaling, correctness, and community-building first.

我們將追隨Hadoop項目所采取的路線，首先把精力集中在健壯性、擴展性、正確性以及社區建立上。

As hadoop-0.20 is one of your primary interfaces to the Hadoop cluster, you'll see this utility used quite a bit throughout this article.

因為hadoop-0.20是Hadoop集群的主要接口之一，您會看到本文中多次使用這個實用程序。

Now that I've coded my map and reduce implementations, all that's left to do is link everything up into a Hadoop Job.

現在我已經對我的map和reduce實現進行了編碼，接下來所要做的是將所有這一切鏈接到一個HadoopJob。

From this article, it's easy to see how Hadoop makes distributed computing simple for processing large datasets.

通過本文很容易看出Hadoop顯著簡化了處理大型數據集的分布式計算。

All that's needed is a representation of the data in a vector form that the Hadoop infrastructure can use.

所有這一切的需要就是用矢量格式表達Hadoop基礎設施可以使用的數據。

Well, as you've probably guessed, Hadoop makes that easy to do.

當然，您已經猜到了，Hadoop可以輕松地做到。

But from the previous discussion, it's easy to see how Hadoop provides parallel processing of work.

但是，通過前面的討論很容易看出Hadoop如何提供并行處理。

Not to be outdone, commercial Hadoop pioneer Cloudera announced an HDFS partnership of its own yesterday.

商業Hadoop的先驅Cloudera也不甘示弱，于昨天發布了自己的HDFS合作伙伴計劃。

A key part of the announcement was that Yahoo would make available a Hadoop-enabled supercomputing data center named M45.

該聲明的關鍵是Yahoo將建立一個使用Hadoop的超級計算數據中心，名為M45。

Alas, there are several things that Hadoop does not do, at least when accessed through the MapReduce interface.

唉，有幾件事情Hadoop也不做，至少在通過MapReduce訪問接口。

Now you have set up the Hadoop cluster on the cloud, and it's ready to run MapReduce applications.

現在，已經在云中設置了Hadoop集群，該運行MapReduce應用程序了。

Since we are going to be connecting to the hadoop file system, we might as well test that too.

因為我們要連接到hadoop文件系統，我們不妨測試。

One particularly handy aspect of Hadoop is that it handles the raw parsing of an input file, so that you can deal with one line at a time.

Hadoop可以對輸入文件進行原始解析，這一點特別有用，這樣您就可以每次處理一行。

For all the other settings, keep the defaults or choose the same values as you did for the Hadoop Master node.

對于所有其他設置，保留其默認值或者選擇與HadoopMaster節點相同的值。

It is assumed that the Hadoop slave node has been configured a priori in such a manner that it registers with the Hadoop master node.

這里假設Hadoop從節點已經在之前配置完成，也就是它已經注冊到Hadoop主節點中。

Now that you have installed Hadoop and tested the basic interface to its file system, it's time to test Hadoop in a real application.

既然已經安裝了Hadoop并測試了文件系統的基本接口，現在就該在真實的應用程序中測試Hadoop了。

This article introduces you to the important configurable parameters of Hadoop and the method for analyzing and tuning performance.

本文介紹重要的Hadoop可配置參數以及分析和調優性能的方法。

That magically seems to work, indicating that we can, indeed, connect to another machine and run the hadoop commands.

魔法般的似乎工作，表明我們可以，事實上，連接到另一臺機器上，運行hadoop命令。

The Hadoop runtime will split up the data (log files) that needs to be processed and give each node in your cluster a chunk of data.

Hadoop運行時將分割需要處理的數據（一些日志文件）并向您的集群中的每個節點分配一個數據塊。

Avro is one of the most recent additions to Apache's Hadoop family of projects. It defines a data format designed to support data-intensive applications, and provides support for this format in a variety of programming languages.

Avro[1]是最近加入到Apache的Hadoop家族的項目之一。為支持數據密集型應用，它定義了一種數據格式并在多種編程語言中支持這種格式。

One irony of this code and the Hadoop framework is that the input files do not have to be in the same format.

一個諷刺，這段代碼和Hadoop框架是輸入文件不需要在相同的格式。

Hadoop is really designed to run in a distributed manner where it handles the coordination of various nodes running map and reduce.

Hadoop的設計旨在以一種分布式方式運行，處理運行map和reduce的各個節點之間的協調性。

You can perform a couple of tests to ensure that Hadoop is up and running normally (at least the namenode).

可以通過幾個檢查確認Hadoop（至少是namenode）已經啟動并正常運行。

Thanks to the cloud and Hadoop, it is now possible to handle large amounts of structured or unstructured data in a timely manner.

由于云和Hadoop的出現，及時處理大量的結構化或非結構化數據目前已成為可能。

So over the past two weekends, I've worked on a hobby project that lets you turn your Hudson cluster into a Hadoop cluster.

所以在過去的兩個周末里，我一直在從事一個業余愛好項目，這個項目可以把Hudson集群轉化成Hadoop集群。

Run the clustering algorithm of choice using one of the many Hadoop-ready driver programs available in Mahout.

使用Mahout中可用的Hadoop就緒的驅動程序運行所選集群算法。

The two core components are the Hadoop Distributed File System for storing data and Hadoop MapReduce for writing parallel-processing jobs.

其中兩個核心組件是用于存儲數據的HadoopDistributedFileSystem（Hadoop分布式文件系統）和用于寫入并行處理任務的HadoopMapReduce。

The company employs many of the core Hadoop contributors and intends to provide support and training.

該公司雇傭了眾多Hadoop項目的核心人員欲以提供相應的支持和培訓。

Open source software designed by IBM to help students develop programs for clusters running Hadoop.

IBM設計了開源軟件去幫助學生們為運行Hadoop的集群開發程序。

You could just use the raw output from Hadoop (a name and value on each line, separated by a space).

您可以只是使用來自Hadoop的原始輸出（每行上有一個名稱和值，用空格分隔）。

As a distributed framework, Hadoop enables many applications that benefit from parallelization of data processing.

作為分布式框架，Hadoop讓許多應用程序能夠受益于并行數據處理。

Standalone Mode: By default, Hadoop is configured to run in a non-distributed standalone mode.

單獨模式：在默認情況下，Hadoop以非分布的單獨模式運行。

If not, what is the plan in terms of moving it from an experimental technology to a core infrastructure component?

如果還沒有，有什么計劃讓Hadoop從一個實驗性的產品向核心基礎組件遷移？

developed Hadoop, permits AI systems to run data and algorithms across multiple servers simultaneously.

的結合，可以讓AI系統在多個服務器上同時的運行數據和算法。

This flexibility can open new opportunities for Hadoop in a richer set of applications.

在更加豐富的應用程序集中此靈活性可以為Hadoop創造新的機會。

I feel this would be a big boost to both performance and utility, and it would leverage the power already provided by the Hadoop framework.

我覺得這將是一個巨大的鼓舞作用及表現的用途上，而它將影響作用的力量已經提供Hadoop框架。

Those log files can be huge, but the work will be split up among the machines (nodes) in your Hadoop cluster.

那些日志文件可能很大，但是挖掘工作將在您的Hadoop集群中的多個機器（節點）之間分配。

Instead, Hadoop can be viewed as a way to distribute both data and algorithms to hosts for faster parallel processing.

相反地，Hadoop可以被視為一種可以同時將數據和算法分配到主機以獲得更快速的并行處理速度的方法。

The next article in this series will explore how to configure Hadoop in a multi-node cluster with additional examples. See you then!

本系列中的下一篇文章通過更多示例討論如何在多節點集群中配置Hadoop。
