Web Operations: Keeping the Data on Time
edited by John Allspaw and Jesse Robbins
O'Reilly Media, Incorporated, Beijing, 2010
English [en] · PDF · 13.0 MB · 2010 · Nonfiction book · lgli/upload/zlib
Description
A web application involves many specialists, but it takes people in web ops to ensure that everything works together throughout an application's lifetime. It's the expertise you need when your start-up gets an unexpected spike in web traffic, or when a new feature causes your mature application to fail. In this collection of essays and interviews, web veterans such as Theo Schlossnagle, Baron Schwartz, and Alistair Croll offer insights into this evolving field. You'll learn stories from the trenches--from builders of some of the biggest sites on the Web--on what's necessary to help a site thrive.
Learn the skills needed in web operations, and why they're gained through experience rather than schooling
Understand why it's important to gather metrics from both your application and infrastructure
Consider common approaches to database architectures and the pitfalls that come with increasing scale
Learn how to handle the human side of outages and degradations
Find out how one company avoided disaster after a huge traffic deluge
Discover what went wrong after a problem occurs, and how to prevent it from happening again
Contributors include:
John Allspaw
Heather Champ
Michael Christian
Richard Cook
Alistair Croll
Patrick Debois
Eric Florenzano
Paul Hammond
Justin Huff
Adam Jacob
Jacob Loomis
Matt Massie
Brian Moon
Anoop Nagwani
Sean Power
Eric Ries
Theo Schlossnagle
Baron Schwartz
Andrew Shafer
About the Authors
John Allspaw is currently Operations Engineering Manager at Flickr, the popular photo site. He has worked with growing websites since 1999. These include the online news magazines Salon.com, InfoWorld.com, and Macworld.com, as well as social networking sites that experienced extreme growth (Friendster and Flickr). During his time at Friendster, traffic increased fivefold. He was responsible for Friendster's transition from a couple dozen servers in a failing data center to more than 400 machines across two data centers, and for the complete redesign of the backing infrastructure. When he joined Flickr, it had 10 servers in a tiny data center in Vancouver; it now runs in multiple data centers across the US. Before his web experience, Allspaw worked in modeling and simulation as a mechanical engineer, running car crash simulations for the NHTSA.
Jesse Robbins (@jesserobbins) is CEO of Opscode (makers of Chef) and a recognized expert in Infrastructure, Web Operations, and Emergency Management.
He serves as co-chair of the Velocity Web Performance & Operations Conference and contributes to O'Reilly Radar. Before co-founding Opscode, he worked at Amazon.com with the title "Master of Disaster," where he was responsible for website availability for every property bearing the Amazon brand.
Robbins is a volunteer Firefighter/EMT and Emergency Manager, and led a task force deployed in Operation Hurricane Katrina. His experiences in the fire service profoundly influence his efforts in technology, and he strives to distill his knowledge from these two worlds and apply it in service of both.
Foreword 13
Preface 15
1 Web Operations: The Career 21
Theo Schlossnagle 21
Why Does Web Operations Have It Tough? 22
From Apprentice to Master 24
Conclusion 29
2 How Picnik Uses Cloud Computing: Lessons Learned 31
Justin Huff 31
Where the Cloud Fits (and Why!) 32
Where the Cloud Doesn’t Fit (for Picnik) 40
Conclusion 40
3 Infrastructure and Application Metrics 41
John Allspaw, with Matt Massie 41
Time Resolution and Retention Concerns 42
Locality of Metrics Collection and Storage 43
Layers of Metrics 44
Providing Context for Anomaly Detection and Alerts 47
Log Lines Are Metrics, Too 48
Correlation with Change Management and Incident Timelines 50
Making Metrics Available to Your Alerting Mechanisms 51
Using Metrics to Guide Load-Feedback Mechanisms 52
A Metrics Collection System, Illustrated: Ganglia 56
Conclusion 67
4 Continuous Deployment 69
Eric Ries 69
Small Batches Mean Faster Feedback 69
Small Batches Mean Problems Are Instantly Localized 70
Small Batches Reduce Risk 70
Small Batches Reduce Overhead 71
The Quality Defenders’ Lament 72
Getting Started 76
Continuous Deployment Is for Mission-Critical Applications 80
Conclusion 83
5 Infrastructure As Code 85
Adam Jacob 85
Service-Oriented Architecture 87
Conclusion 99
6 Monitoring 101
Patrick Debois 101
Story: “The Start of a Journey” 101
Step 1: Understand What You Are Monitoring 105
Step 2: Understand Normal Behavior 115
Step 3: Be Prepared and Learn 122
Conclusion 126
7 How Complex Systems Fail 127
John Allspaw and Richard Cook 127
How Complex Systems Fail 128
Further Reading 136
8 Community Management and Web Operations 137
Heather Champ and John Allspaw 137
9 Dealing with Unexpected Traffic Spikes 147
Brian Moon 147
How It All Started 147
Alarms Abound 148
Putting Out the Fire 149
Surviving the Weekend 150
Preparing for the Future 151
CDN to the Rescue 151
Proxy Servers 152
Corralling the Stampede 152
Streamlining the Codebase 153
How Do We Know It Works? 154
The Real Test 155
Lessons Learned 156
Improvements Since Then 156
10 Dev and Ops Collaboration and Cooperation 159
Paul Hammond 159
Deployment 160
Shared, Open Infrastructure 164
Trust 166
On-call Developers 168
Avoiding Blame 173
Conclusion 175
11 How Your Visitors Feel: User-Facing Metrics 177
Alistair Croll and Sean Power 177
Why Collect User-Facing Metrics? 179
What Makes a Site Slow? 183
Measuring Delay 185
Building an SLA 191
Visitor Outcomes: Analytics 193
Other Metrics Marketing Cares About 198
How User Experience Affects Web Ops 199
The Future of Web Monitoring 200
Conclusion 205
12 Relational Database Strategy and Tactics for the Web 207
Baron Schwartz 207
Requirements for Web Databases 208
How Typical Web Databases Grow 213
The Yearning for a Cluster 220
Database Strategy 225
Database Tactics 232
Conclusion 238
13 How to Make Failure Beautiful: The Art and Science of Postmortems 239
Jake Loomis 239
The Worst Postmortem 240
What Is a Postmortem? 241
When to Conduct a Postmortem 242
Who to Invite to a Postmortem 243
Running a Postmortem 243
Postmortem Follow-Up 244
Conclusion 246
14 Storage 247
Anoop Nagwani 247
Data Asset Inventory 247
Data Protection 251
Capacity Planning 260
Storage Sizing 262
Operations 264
Conclusion 265
15 Nonrelational Databases 267
Eric Florenzano 267
NoSQL Database Overview 268
Some Systems in Detail 272
Conclusion 281
16 Agile Infrastructure 283
Andrew Clay Shafer 283
Agile Infrastructure 285
So, What’s the Problem? 289
Communities of Interest and Practice 299
Trading Zones and Apologies 299
Conclusion 302
17 Things That Go Bump in the Night (and How to Sleep Through Them) 305
Mike Christian 305
Definitions 307
How Many 9s? 308
Impact Duration Versus Incident Duration 309
Datacenter Footprint 310
Gradual Failures 311
Trust Nobody 312
Failover Testing 312
Monitoring and History of Patterns 313
Getting a Good Night’s Sleep 314
Contributors 317
Index 323
Programming
Alternative filename
motw/Web Operations_ Keeping the Data on Time - John Allspaw.pdf
Alternative filename
lgli/John Allspaw & Jesse Robbins - Web Operations: Keeping the Data on Time (2010, O'Reilly).pdf
Alternative filename
zlib/Computers/Networking/John Allspaw & Jesse Robbins/Web Operations: Keeping the Data on Time_21446032.pdf
Alternative author
Robbins, Jesse; Allspaw, John
Alternative author
Allspaw, John, Robbins, Jesse
Alternative edition
1st ed., Beijing [China], Sebastopol, CA, China, 2010
Alternative edition
United States, United States of America
Alternative edition
1. Aufl, Beijing ; Köln, 2010
Alternative edition
1, PS, 2010
Metadata comments
producers:
Adobe PDF Library 9.0
Metadata comments
Memory of the World Librarian: marcell mars
Metadata comments
"Learn the skills needed in web operations, and why they're gained through experience rather than schooling; understand why it's important to gather metrics from both your application and your infrastructure; consider common approaches to database architectures and the pitfalls that come with increasing scale; learn how to handle the human side of outages and degradation; find out how one company avoided disaster after a huge traffic deluge; discover--after a problem occurs--what went wrong and how to prevent it from happening again"--P. [4] of cover.
Includes index.
Alternative description
Learn what it takes to build and maintain high-traffic websites with Web Operations. Featuring essays from today's top web veterans, this insightful book shows you how to run your web ops as reliably and effectively as Google, Microsoft, and Yahoo run theirs. Even if your site never gets that big, you'll profit from the experience and knowledge of the people who created sites for these and other industry giants. http://oreilly.com/catalog/0636920000136
Alternative description
Featuring essays from today's top web engineers, this insightful book shows you how to run your web operations as reliably and effectively as Google, Microsoft, and Yahoo! run theirs. Even if your site never gets that big, you'll profit from the experience and knowledge of the people who created sites for these and other industry giants.
Open-sourced date
2022-05-01