
Web Operations

Author: John Allspaw / Jesse Robbins
Subtitle: Keeping the Data On Time
Publisher: O'Reilly Media
Publication date: 2010-06
ISBN: 9781449377441
Category: Other

About the Book

A web application involves many specialists, but it takes people in web ops to ensure that everything works together throughout an application's lifetime. It's the expertise you need when your start-up gets an unexpected spike in web traffic, or when a new feature causes your mature application to fail. In this collection of essays and interviews, web veterans such as Theo Schlossnagle, Baron Schwartz, and Alistair Croll offer insights into this evolving field. You'll learn stories from the trenches--from builders of some of the biggest sites on the Web--on what's necessary to help a site thrive.

Learn the skills needed in web operations, and why they're gained through experience rather than schooling

Understand why it's important to gather metrics from both your application and infrastructure

Consider common approaches to database architectures and the pitfalls that come with increasing scale

Learn how to handle the human side of outages and degradations

Find out how one company avoided disaster after a huge traffic deluge

Discover what went wrong after a problem occurs, and how to prevent it from happening again

Contributors include:

John Allspaw

Heather Champ

Michael Christian

Richard Cook

Alistair Croll

Patrick Debois

Eric Florenzano

Paul Hammond

Justin Huff

Adam Jacob

Jacob Loomis

Matt Massie

Brian Moon

Anoop Nagwani

Sean Power

Eric Ries

Theo Schlossnagle

Baron Schwartz

Andrew Shafer


About the Authors

John Allspaw is currently Operations Engineering Manager at Flickr, the popular photo site. He has had extensive experience working with growing web sites since 1999. These include online news magazines (Salon.com, InfoWorld.com, Macworld.com) and social networking sites that experienced extreme growth (Friendster and Flickr). During his time at Friendster, traffic increased 5X. He was responsible for their transition from a couple dozen servers in a failing data center to over 400 machines across two data centers, and the complete redesign of the backing infrastructure. When he joined Flickr, they had 10 servers in a tiny data center in Vancouver; they are now located in multiple data centers across the US. Prior to his web experience, Allspaw worked in modeling and simulation as a mechanical engineer doing car crash simulations for the NHTSA.

Jesse Robbins is passionate about infrastructure, emergency management, and technology that helps people be safe, happy, and free. He serves as co-chair of the Velocity Performance & Operations Conference and is part of the O'Reilly Radar. Jesse currently advises companies in Seattle and San Francisco. He previously worked at Amazon.com where his title was "Master of Disaster" and where he was responsible for Website Availability. Jesse is a volunteer Firefighter/EMT & Emergency Manager, and led a task force deployed in Operation Hurricane Katrina.


Table of Contents

Chapter 1 Web Operations: The Career

Why Does Web Operations Have It Tough?

From Apprentice to Master

Conclusion

Chapter 2 How Picnik Uses Cloud Computing: Lessons Learned

Where the Cloud Fits (and Why!)

Where the Cloud Doesn't Fit (for Picnik)

Conclusion

Chapter 3 Infrastructure and Application Metrics

Time Resolution and Retention Concerns

Locality of Metrics Collection and Storage

Layers of Metrics

Providing Context for Anomaly Detection and Alerts

Log Lines Are Metrics, Too

Correlation with Change Management and Incident Timelines

Making Metrics Available to Your Alerting Mechanisms

Using Metrics to Guide Load-Feedback Mechanisms

A Metrics Collection System, Illustrated: Ganglia

Conclusion

Chapter 4 Continuous Deployment

Small Batches Mean Faster Feedback

Small Batches Mean Problems Are Instantly Localized

Small Batches Reduce Risk

Small Batches Reduce Overhead

The Quality Defenders' Lament

Getting Started

Continuous Deployment Is for Mission-Critical Applications

Conclusion

Chapter 5 Infrastructure As Code

Service-Oriented Architecture

Conclusion

Chapter 6 Monitoring

Story: "The Start of a Journey"

Step 1: Understand What You Are Monitoring

Step 2: Understand Normal Behavior

Step 3: Be Prepared and Learn

Conclusion

Chapter 7 How Complex Systems Fail

How Complex Systems Fail

Further Reading

Chapter 8 Community Management and Web Operations

Chapter 9 Dealing with Unexpected Traffic Spikes

How It All Started

Alarms Abound

Putting Out the Fire

Surviving the Weekend

Preparing for the Future

CDN to the Rescue

Proxy Servers

Corralling the Stampede

Streamlining the Codebase

How Do We Know It Works?

The Real Test

Lessons Learned

Improvements Since Then

Chapter 10 Dev and Ops Collaboration and Cooperation

Deployment

Shared, Open Infrastructure

Trust

On-call Developers

Avoiding Blame

Conclusion

Chapter 11 How Your Visitors Feel: User-Facing Metrics

Why Collect User-Facing Metrics?

What Makes a Site Slow?

Measuring Delay

Building an SLA

Visitor Outcomes: Analytics

Other Metrics Marketing Cares About

How User Experience Affects Web Ops

The Future of Web Monitoring

Conclusion

Chapter 12 Relational Database Strategy and Tactics for the Web

Requirements for Web Databases

How Typical Web Databases Grow

The Yearning for a Cluster

Database Strategy

Database Tactics

Conclusion

Chapter 13 How to Make Failure Beautiful: The Art and Science of Postmortems

The Worst Postmortem

What Is a Postmortem?

When to Conduct a Postmortem

Who to Invite to a Postmortem

Running a Postmortem

Postmortem Follow-Up

Conclusion

Chapter 14 Storage

Data Asset Inventory

Data Protection

Capacity Planning

Storage Sizing

Operations

Conclusion

Chapter 15 Nonrelational Databases

NoSQL Database Overview

Some Systems in Detail

Conclusion

Chapter 16 Agile Infrastructure

Agile Infrastructure

So, What's the Problem?

Communities of Interest and Practice

Trading Zones and Apologies

Conclusion

Chapter 17 Things That Go Bump in the Night (and How to Sleep Through Them)

Definitions

How Many 9s?

Impact Duration Versus Incident Duration

Datacenter Footprint

Gradual Failures

Trust Nobody

Failover Testing

Monitoring and History of Patterns

Getting a Good Night's Sleep

Appendix Contributors

Colophon
