wendylyn edited introduction.tex  over 10 years ago

Commit id: 84c0ff21204af840fb4a04a59c7eda62f92112a2

Incast: Incast is a many-to-one communication pattern common in cloud datacenters [2]. When incast occurs, many nodes respond to a single node simultaneously, overflowing the switch/router buffer. Standard TCP reacts to the resulting packet loss by shrinking the sender's congestion window, but this works poorly in the many-to-one setting: once overflow is detected, all the responding nodes shrink and then re-grow their windows in lockstep, so the overload repeats, throughput collapses, and the underlying problem is never really solved.

Queue buildup and buffer pressure: Data flows in datacenters can be divided into short flows, which consist of only a few packets, and long flows, which may consist of many more. While a long flow is being transmitted, it can fill the switch/router buffer; this does not hurt overall throughput, but it adds significant delay to the responses of short flows.

In typical datacenter workloads, short flows are more time-sensitive than long flows. Short flows are usually generated by interactive user operations, for example submitting a search query or pulling an order list; long flows come from bulk transfers such as downloading a large file or committing a disk backup. Users may not mind a download taking five seconds longer, but they expect near-instant responses to their short-flow requests.

All of the issues above add response delay and impair a datacenter's usefulness. Reported statistics show that Amazon loses 1 percent of revenue for every 100 ms of added latency, and that Walmart users who see a 0-1 s page load time convert at twice the rate of those who see a 1-2 s load time. A mechanism that addresses these problems in the datacenter is therefore essential.
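The synchronized backoff behind incast can be illustrated with a minimal, round-based toy model. Everything here is an illustrative assumption, not from the text: windows grow by one packet per round, a shared buffer of 100 packets, and a synchronized loss forces every sender into a retransmission-timeout pause that lasts many round-trips (the usual cause of incast throughput collapse).

```python
def simulate_incast(n_senders=32, buffer_pkts=100, rto_rounds=10, rounds=140):
    """Toy round-based model of incast (all parameters are illustrative).

    Every sender grows its window in lockstep; when their combined load
    overflows the shared buffer, all of them lose packets at once, reset
    to a window of 1, and sit out an RTO pause together -- the
    synchronized reduce-and-regrow behavior described above.
    Returns link utilization (delivered / capacity).
    """
    cwnd = [1] * n_senders
    timeout = 0              # rounds remaining in the shared RTO pause
    delivered = []
    for _ in range(rounds):
        if timeout > 0:
            timeout -= 1
            delivered.append(0)          # link idle: every sender is in RTO
            continue
        offered = sum(cwnd)
        if offered > buffer_pkts:        # buffer overflow at the switch
            delivered.append(buffer_pkts)
            cwnd = [1] * n_senders       # synchronized reset...
            timeout = rto_rounds         # ...and synchronized timeout
        else:
            delivered.append(offered)
            cwnd = [w + 1 for w in cwnd]  # all windows re-grow in lockstep
    return sum(delivered) / (buffer_pkts * rounds)
```

With 32 synchronized senders the link spends most of its time either overflowing or idle in RTO, so utilization stays low; a single sender on the same toy link does markedly better, which is the "poor performance" the paragraph describes.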
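The extra latency that queue buildup imposes on a short flow is simply the long-flow backlog drained at line rate. A one-line calculation makes the scale concrete; the link speed and backlog size below are assumed figures for illustration only.

```python
def queueing_delay_ms(backlog_kb, link_gbps):
    """FIFO queueing delay (ms) added to a short flow that arrives behind
    `backlog_kb` kilobytes of long-flow data on a `link_gbps` link.
    Both inputs are illustrative assumptions, not values from the text."""
    bits = backlog_kb * 1000 * 8          # backlog in bits
    return bits / (link_gbps * 1e9) * 1e3  # drain time at line rate, in ms

# e.g. 500 KB of long-flow backlog on a 1 Gbps link adds 4 ms per
# congested hop -- large compared to datacenter RTTs of ~100 us.
```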