\section{Introduction}
The term ``datacenter'' refers to facilities used to house computer systems and associated components [1]. Many services that require high-performance computing or large storage volumes, for example web search (Google, Bing), social networks (Facebook, Twitter), cloud computing platforms (Amazon EMR and EC2), and cloud storage services (Amazon S3), are all supported by large-scale datacenters. Depending on usage, the number of nodes in a datacenter can range from several hundred to tens of thousands. \\
Nodes in a datacenter are connected via routers and switches arranged in multiple levels, as illustrated in Figure 1. As in any other network infrastructure, issues such as insufficient bandwidth, congestion, and long latency all occur in datacenter networks. However, because of the characteristics of datacenters, some network issues cause more trouble there than in other kinds of network infrastructure. \\
Incast: Incast is a many-to-one communication pattern commonly found in cloud datacenters [2]. When incast happens, multiple nodes respond to a single node simultaneously, causing switch/router buffer overflow. When buffer overflow happens, the standard TCP protocol tries to solve the problem by reducing the size of its sliding window. However, this does not work well in datacenters because of the many-to-one pattern: when buffer overflow is detected, all of the responding nodes shrink and then re-grow their sliding windows simultaneously, which results in poor performance and does not really solve the issue. \\
Queue buildup and buffer pressure: Data flows in datacenters can be categorized into short flows and long flows. A short flow consists of few packets, while a long flow may consist of far more. When a long flow is transmitted through the datacenter, router/switch buffers can fill up; this does not affect overall throughput, but it adds significant delay to the responses of short flows.
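The synchronized shrink-and-regrow behavior described above can be illustrated with a toy simulation. This is a rough sketch under simplifying assumptions (all parameter values are invented for illustration, and the loop models only AIMD window dynamics, not a real TCP stack): many senders share one switch output port, every sender that sees a drop halves its window at the same time, and all windows then grow back in lockstep.

```python
# Toy model of TCP incast (an illustrative sketch, not a faithful TCP
# implementation; all numbers are hypothetical).
# N senders share one switch output port with a buffer of B packets.
# Each round, every sender offers cwnd packets; if the total exceeds the
# buffer, every sender sees loss and halves its window (multiplicative
# decrease), otherwise all windows grow by one packet (additive increase).
# Because losses hit every sender at once, the windows shrink and re-grow
# in lockstep, so the shared link oscillates between overload and idle.

def simulate_incast(n_senders=32, buffer_pkts=64, link_pkts_per_round=64,
                    rounds=20):
    cwnd = [16] * n_senders           # per-sender congestion window, in packets
    delivered = []                    # packets the link actually carries per round
    for _ in range(rounds):
        offered = sum(cwnd)
        accepted = min(offered, buffer_pkts)
        delivered.append(min(accepted, link_pkts_per_round))
        if offered > buffer_pkts:     # overflow: every sender detects loss
            cwnd = [max(1, w // 2) for w in cwnd]
        else:                         # no loss: everyone grows together
            cwnd = [w + 1 for w in cwnd]
    return delivered

if __name__ == "__main__":
    per_round = simulate_incast()
    utilization = sum(per_round) / (20 * 64)
    print(f"average link utilization: {utilization:.0%}")
```

After the synchronized window collapse, there are rounds in which the senders together offer less than the link can carry, so the link sits partly idle even though all senders still have data to send, which is the throughput collapse the text describes.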
\\
In the typical usage scenario of a datacenter, short flows are usually more time-sensitive than long flows. Short flows are more likely to be generated by interactive user operations, for example submitting a search query or pulling up an order list, while a long flow could be a large file download or a disk backup. While users may not be irritated if their download tasks last five seconds longer, they expect instant responses to their short-flow requests. \\
All of the issues discussed above cause response delays and impair a datacenter's functionality. Statistics show that Amazon's revenue decreases by 1 percent for every 100\,ms of added latency, and that Walmart users who see 0--1\,s load times convert at twice the rate of those who see 1--2\,s. Hence a mechanism that addresses such problems in datacenters is essential.
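The cost that queue buildup imposes on short flows can be made concrete with a back-of-envelope calculation. This is a sketch with hypothetical numbers (buffer size, packet size, and link rate below are illustrative assumptions, not figures from the text): a short-flow packet arriving behind a full buffer must wait for every queued packet to drain first.

```python
# Back-of-envelope queueing-delay estimate (illustrative numbers only):
# a packet from a short flow that arrives behind a full switch buffer
# must wait for all queued packets to be transmitted before it goes out.

def queueing_delay_ms(buffer_packets, packet_bytes, link_gbps):
    """Time for `buffer_packets` packets of `packet_bytes` bytes each to
    drain over a link of `link_gbps` gigabits per second, in milliseconds."""
    bits_queued = buffer_packets * packet_bytes * 8
    return bits_queued / (link_gbps * 1e9) * 1e3

# Example: 1000 queued 1500-byte packets on a 1 Gbps link add
# 1000 * 1500 * 8 bits / 1e9 bps = 12 ms to every short-flow packet,
# which dwarfs the sub-millisecond propagation delays inside a datacenter.
print(f"{queueing_delay_ms(1000, 1500, 1):.0f} ms")
```

Even a modest standing queue therefore adds latency on the order of the 100\,ms increments that the revenue statistics above penalize, once a request traverses several congested hops.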