A word to the wise

This article just popped onto my radar: Google Wave as a Tool for Hacking. theharmonyguy, over at Social Hacking, built a gadget for Google Wave that demonstrates the lack of security protections and sanitization in the Wave platform, which allows a user’s computer to be compromised simply by viewing an incoming wave. From the article:

All of these demonstrations about security and Google Wave point to four general weaknesses in Wave’s current structure:

  1. Allowing scripts and iframes in gadgets with no limits apart from sandboxing
  2. Lack of control over what content or users can be added to a wave
  3. No simple mechanism for verifying gadget sources or features
  4. Automatically loading gadgets when a wave is viewed

This is just another example of where the rush to release cool new technology gets in the way of even a cursory check to make sure you’re not presenting your users and their data or identities with undue risk.

Cloud failures and failures that could be fixed with clouds.

[Note: I use the words "could" and "may" carefully here. As shown below, "cloud" and "cloud-like" can be highly divergent ideas.]

In this article I will introduce two failure models for large-scale systems, one classic and one new. Through examples I will show that server stability through “cloudification” has not yet been achieved. This is illustrated by IBM failing to cloudify the critical infrastructure of Air New Zealand, and by Amazon’s cloud-like solution failing in a manner that geographically dispersed clouds should not. The examples each fit one of two system failure types. The first type is large-scale computer system failures that could have been avoided or mitigated had the system designers used a cloud architecture or cloud principles. The second type is software and system deployments that, despite being implemented in a cloud-like environment, have failed to live up to the promises made around the technology. Both of these issues are important to understand fully, as a great number of people are either completely in the dark as to what clouds are and what benefits they provide, or believe putting their applications “on the cloud” is a silver bullet that will magically slay their uptime, scaling, and security demons.

Put it on the cloud

The first type of failure is exemplified by The Register’s report on the IBM mainframe crash that took down Air New Zealand’s check-in desks, online bookings and call centers. In this case, the issue was definitely not a cloud problem. There was no cloud involved in any way on the systems side. According to the original report from Australia’s The Age, Air New Zealand outsourced management of its mainframe and mid-range systems to IBM, which then dropped the ball, and the crash occurred. Theoretically, had those systems been deployed in a private cloud that embodied the core values of geographical dispersion, multi-network homing, and self-healing of resources, this issue could have been avoided or, at a minimum, greatly reduced. Such a design would tolerate multiple failures at the software and hardware level: they might cause a service slowdown or even a loss of real-time access to lower-priority services, but the system as a whole would remain online and able to serve customers.
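To put a rough number on the benefit of geographical dispersion, here is a minimal sketch in Python. It assumes that sites fail independently of one another and that the service stays up as long as at least one site is up. Both are simplifying assumptions of mine, not facts about Air New Zealand’s or IBM’s systems, and the 99% figure is purely illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` sites is up.

    Assumes every site has the same availability and that failures are
    independent, which real deployments only approximate.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The service is down only when every site is down at the same time.
    return 1.0 - (1.0 - site_availability) ** sites


def expected_downtime_hours(site_availability: float, sites: int) -> float:
    """Expected hours per year with no site available."""
    return (1.0 - combined_availability(site_availability, sites)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical 99%-available sites, from one site up to three.
    for n in (1, 2, 3):
        print(n, round(expected_downtime_hours(0.99, n), 3))
```

Under these assumptions each additional site cuts the expected downtime by a factor of 100. Correlated failures, such as a shared network path or a shared operator error, erode that gain, which is why multi-network homing matters as much as the number of sites.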

If the cloud doesn’t hold, service evaporates

Spinning this back around to see what happens when a service is deployed on a cloud-like service and fails despite the promise, we have the Bitbucket DDoS attack that occurred early this month. The most detailed account of the affair comes from the blog of Jesper Nøhr, the developer who runs the code repository hosting service. To summarize, the attack was an extended, multi-phase botnet attack launched against Bitbucket’s website as part of a dispute between warring factions of developers over a project source repository hosted there. Most of the news coverage focused heavily on Amazon’s incredible lack of immediate reaction to a customer under attack: Amazon ignored and downplayed the analysis and recommendations from Jesper’s network administrators, yielding a full 19 hours of downtime before initial service was resumed. This is alarming from a customer service point of view, but it is not the most concerning element. What the news reports largely ignored is just how stunning it is that a seemingly large-scale, established, distributed “cloud” service could be taken offline so easily. A related, secondary question is why it was so difficult for Amazon’s technicians to trace the issue back to its source and implement a fix. No one outside Amazon can say exactly why the DDoS attack was so successful, as Amazon’s cloud service is a black box. Still, a service implemented according to true cloud fundamentals should be able to withstand the beating Amazon took without the complete loss of site and service availability that Bitbucket experienced.

Denial of Service, the cloud way

Now, while the Bitbucket attack was definitely a traditional DDoS, this is a perfect place to mention a variation on the attack which has been dubbed “EDoS,” or Economic Denial of Service. The twist comes from the subtle difference in goals. In a DDoS, the intent is, through brute force measures, to completely wipe a service or presence off the Internet, making it wholly unavailable to users for the duration and fallout periods of the attack. In EDoS, the intent is to slowly suffocate a service to death without the service provider realizing the problem until it’s too late. Chris Hoff (@beaker) wrote a great intro article on the topic in which he says:

EDoS attacks are death by 1000 cuts. EDoS can also utilize distributed $evil_doers as well as single entities, but works by making legitimate web requests at volumes that may appear to be “normal” but are done so to drive compute, network and storage utility billings in a cloud model abnormally high. Example: a botnet is activated to visit a website whose income results from ecommerce purchases. The requests are all legitimate but the purchases never made. The vendor has to pay the cloud provider for increased elastic use of resources where revenue was never recognized to offset them.
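The arithmetic behind Hoff’s example can be sketched in a few lines of Python. Every figure below (request volumes, conversion rate, order value, per-request cost) is a hypothetical number chosen for illustration, and a flat per-request cost is a simplification of how a cloud provider actually bills for compute, network and storage.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TrafficProfile:
    """One source of traffic hitting the site (all fields hypothetical)."""
    requests_per_day: int
    conversion_rate: float       # fraction of requests that end in a purchase
    revenue_per_purchase: float  # revenue from each completed purchase


def daily_margin(profiles: Iterable[TrafficProfile], cost_per_request: float) -> float:
    """Revenue minus usage-based hosting cost for one day of traffic."""
    profiles = list(profiles)
    revenue = sum(
        p.requests_per_day * p.conversion_rate * p.revenue_per_purchase
        for p in profiles
    )
    # Every request is billed, whether or not it leads to a purchase.
    cost = sum(p.requests_per_day for p in profiles) * cost_per_request
    return revenue - cost


if __name__ == "__main__":
    customers = TrafficProfile(100_000, 0.01, 20.0)
    # Botnet requests look legitimate but never convert.
    botnet = TrafficProfile(2_000_000, 0.0, 0.0)
    print(daily_margin([customers], cost_per_request=0.01))
    print(daily_margin([customers, botnet], cost_per_request=0.01))
```

With these made-up numbers the site earns a healthy margin from real customers alone and runs at a loss once the botnet traffic is added, even though no single request looks abnormal and the site never goes down.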

Full circle

So, what should be made of all of this? Cloud deployments and systems, if properly set up and maintained, can help in a number of cases to maintain and improve uptime and availability while mitigating or sidestepping losses and outages from [D]DoS attacks and system failure. However, as with most technologies, “Cloud” is not the panacea for all service woes. It requires diligence from both the service provider and the application deployer to deliver on its promise. I will be going into depth on these and other issues in future posts. Stay tuned.

The perils of a single external provider

The Register just posted a piece on the Bitbucket outages over the weekend with some bits from Jesper Nøhr, who runs the site. A series of very large DDoS attacks were pointed squarely at Bitbucket’s storage hosted with Amazon’s cloud data store EBS and were successful in taking it down for an extended period. The attacks suggest that Amazon’s black-box innards may not be built as well as they could have been, especially as it took so long to pinpoint the root cause and stop the first attack.

At the time of this posting, Bitbucket is still down. Earlier it returned a bad gateway error straight from the load balancer, and more recently it has shown a page that explains the latest outage:

Sorry for this downtime, again, again

In what seems to be wave four, we’re trying to get the service back in action.
We’re aware of the problem, and we’re working as fast as we can to remedy the issue.
Initial analysis indicates that this is most likely fallout from the first 3 attacks, and some recovery needs to be done.

Feel free to have a look at madssj or jespern on twitter.

For extended reading, here is the blog post by Jesper with a blow-by-blow of the attacks, service level by Amazon, initial resolution, second wave, and so forth.

From the catch-all OS to the service container

James Urquhart recently posted another installment in his “Cloud computing and the big rethink” series, which begins to outline an idea I’ve been working out for a while now. The general premise is that as virtualization and clouds become more available, both in the data center and as services, vendors will begin trimming down the underlying OS to include only what a given service needs. Away will go the large OS that is a catch-all for any and every situation, and in will come stripped kernels and micro-installs.

I like where James is going and it’s fantastic to see this idea starting to spread.

First of the $1.4B in carbon sequestration projects

Secretary Chu has announced the first awards under the Industrial Carbon Capture and Storage Projects grant. The twelve projects, whose DOE share totals $21.6M, range from demonstration to live practice, all aiming at removing, lowering or counteracting CO2 emissions. The awardees are:

  • Air Products and Chemicals Inc. (Allentown, Pa.)
  • Archer Daniels Midland Corporation (Decatur, Ill.)
  • Battelle Memorial Institute, Pacific Northwest Division (Richland, Wash.)
  • C6 Resources (Salno, California)
  • CEMEX Inc. (Houston, Texas)
  • ConocoPhillips (Houston, Texas)
  • Leucadia Energy LLC (New York, N.Y.) – Two projects
  • Praxair Inc. (Danbury, Conn.)
  • Shell Chemical Capital Company (Houston, Texas)
  • University of Utah (Salt Lake City, Utah)
  • Wolverine Power Supply Cooperative Inc. (Cadillac, Mich.)

“This is a major step forward in the fight to reduce carbon dioxide emissions from industrial plants. These new technologies will not only help fight climate change, they will create jobs now and help position the United States to lead the world in carbon dioxide capture technologies, which will only increase in demand in the years ahead,” said Secretary Chu.

Singapore goes for cyber security agency

In light of recent attacks across the globe by cyber ne’er-do-wells, the Singaporean government has created a cyber-security agency to bring the private sector into the loop, hoping to “improve Singapore’s response against cyber-attacks through holding regular simulations that test the country’s ability to respond and recover from online attacks.”

Yet another credit card breach…

Network Solutions Suffers Large Data Breach

Why don’t these guys get it? You can’t just slap half-baked security over your most important data and whistle while looking the other way. You have to be constantly vigilant, updating your measures as threats change. If you stay still, you become a sitting duck for all the people who want your data. From the article:

Incidents such as Network Solutions just go to show that whenever you accept and hold credit card information, you paint a huge target on your network. It then becomes incumbent upon the organization to measure its threat level and practice appropriate risk management practices to reduce the probability of a data breach.

Yes.

Physical in-home power management devices

There is an article over at SmartGridNews.com talking about Control4’s new in-home energy management device. Overall it looks pretty slick and has all the bells and whistles one would expect. But, ultimately, why should consumers need yet another screen in their home when they already have several that would do the job just fine?

Most obviously, homes that would pick up one of these management devices also have computers and a steady Internet connection. The only thing in Control4’s offering that a computer can’t do today with the right software is connect to the bundled thermostat. Everything else should be accessible through the power provider and its smart meter. Beyond the computer monitor, many households have televisions and game consoles. A network-connected console with third-party software written for it can perform the same tasks.

The advantage of both of these alternatives is that the software can easily be designed to fit with a usage paradigm the user is already familiar with on that device.

Broadband over Powerlines

There is an interesting article over at IEEE Spectrum about BPL (Broadband over Powerline). The interesting point is that BPL carries the data only over the last stretch to the consumer. From the article:

According to Daniel Sangines, a communications engineer who until recently worked on SmartGridCity, the data flows along the power lines for about a kilometer before it’s siphoned off the line and into an optical fiber or cellular-based backhaul system. Attempting greater BPL distances would require multiple repeaters to deal with signal attenuation, reducing the bandwidth unacceptably, he explains.

Not a bad compromise: install a fat pipe for all the combined data, then use existing infrastructure to spider to the home.

Geothermal drilling can cause quakes

This one is a bit old, but it’s worth reposting: New York Times – Deep in Bedrock, Clean Energy and Quake Fears

From the article:

All seemed to be going well — until Dec. 8, 2006, when the project set off an earthquake, shaking and damaging buildings and terrifying many in a city that, as every schoolchild here learns, had been devastated exactly 650 years before by a quake that sent two steeples of the Münster Cathedral tumbling into the Rhine.

Hastily shut down, Mr. Häring’s project was soon forgotten by nearly everyone outside Switzerland. As early as this week, though, an American start-up company, AltaRock Energy, will begin using nearly the same method to drill deep into ground laced with fault lines in an area two hours’ drive north of San Francisco.