Scale Fail (part 1)
Let me tell you a secret. I don't fix databases. I fix applications.
Companies hire me to "fix the database" because they think it's the source of their performance and downtime problems. This is very rarely the case. Failure to scale is almost always the result of poor management decisions — often a series of them. In fact, these anti-scaling decisions are so often repeated that they have become anti-patterns.
I did a little talk about these anti-patterns at the last MySQL Conference and Expo. Go watch it and then come on back. Now that you've seen the five-minute version (and hopefully laughed at it), you're ready for some less sarcastic detail which explains how to recognize these anti-patterns and how to avoid them.
Trendiness
"Well ... our CTO is the only one at the weekly CTO's lunch who uses PostgreSQL. The other CTOs have been teasing him about it."
Does this sound like your CTO? It's a real conversation I had. It also describes more technical executives than I care to think about: more concerned with their personal image and career than they are with whether or not the site stays up or the company stays in business. If you start hearing any of the following words in your infrastructure meetings, you know you're in for some serious overtime: "hip", "hot", "cutting-edge", "latest tech", or "cool kids". References to magazine surveys or industry trends articles are also a bad sign.
Scaling an application is all about management of resources and administrative repeatability. This means using technology which your staff is extremely familiar with and which has been tested and proven to be reliable — and is designed to do the thing you want it to do. Hot new features are less important than consistent uptime without constant attention. More importantly, web technology usually makes big news while it's still brand new, which also means poorly documented, unstable, unable to integrate with other components, and full of bugs.
There's another kind of trendiness to watch out for: the kind which says, "If Google or Facebook does it, it must be the right choice." First, what's the right choice for them may not be the right choice for you, unless your applications and platform are very similar to theirs.
Second, not everything Google and Facebook did with their infrastructure is something they would do again if they had to start over. Like everyone else, the top internet companies make bad decisions and get stuck with technology which is painful to use, but even more painful to migrate away from. So if you're going to copy something "the big boys" do, make sure you ask their staff what they think of that technology first.
No metrics
"I'm sure the problem is HBase."
"Yes, but have we checked?"
"I told you, we don't need to check. The problem is always HBase."
"Humor me."
"Whatever. Hmmmmmm ... oh! I think something's wrong with the network ..."
Scaling an application is an arithmetic exercise. If one user consumes X amount of CPU time on the web server, how many web servers do you need to support 100,000 simultaneous users? If the database is growing at Y per day, and Z% of the data is "active", how long until the active data outgrows RAM?
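That arithmetic is simple enough to sketch. The numbers below (CPU cost per request, growth rate, active fraction, server sizes) are purely illustrative stand-ins for values you would have to measure:

```python
# Back-of-the-envelope capacity math. Every input here is a hypothetical
# placeholder; in practice the values come from your metrics system.
import math

def web_servers_needed(cpu_sec_per_request, requests_per_sec, cores_per_server):
    """Servers required to sustain the given load at full CPU (no headroom)."""
    cores_needed = cpu_sec_per_request * requests_per_sec
    return math.ceil(cores_needed / cores_per_server)

def days_until_active_data_outgrows_ram(active_gb_now, growth_gb_per_day,
                                        active_fraction, ram_gb):
    """Days until the 'active' slice of the data no longer fits in memory."""
    daily_active_growth = growth_gb_per_day * active_fraction
    return (ram_gb - active_gb_now) / daily_active_growth

# X = 5 ms of CPU per request, 100,000 requests/sec, 16 cores per box:
print(web_servers_needed(0.005, 100_000, 16))                 # → 32
# 40 GB active now, growing 2 GB/day with 25% of new data active, 64 GB RAM:
print(days_until_active_data_outgrows_ram(40, 2, 0.25, 64))   # → 48.0
```

A real plan would add headroom and peak factors, but the point stands: without measured values for X, Y, and Z, even this crude version is impossible.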
Clearly, you cannot do any of this kind of estimation without at least approximate values for X, Y, and Z. If you're planning to scale, you should be instrumenting every piece of your application stack, from the storage hardware to the JavaScript. The thing you forget to monitor is the one which will most likely bring down your whole site. Most software these days has some way to monitor its performance, and software that doesn't is software you should probably avoid.
Despite this common-sense idea, a surprising number of our clients have done nothing more sophisticated than Nagios alerts on their hardware. This means that when a response-time problem or outage occurs, they have no way to diagnose what caused it, and usually end up fixing the wrong component.
Worse, if you don't have the math for what resources your application is actually consuming, then you have no idea how many servers, and of what kind, you need in order to scale up your site. That means you will be massively overbuilding some components, while starving others, and spending twice as much money as you need to.
Given how many companies lack metrics, or ignore them, how do they make decisions? Well ...
Barn door decision making
"Dan, you were an ad sales manager at Amazon."
In the absence of data, staff tend to troubleshoot problems according to their experience, which is usually wrong. Especially when an emergency occurs, there's a tendency to run to fix whatever broke last time. Of course, if they fixed the thing which broke last time, it's unlikely to be the cause of the current outage.
This sort of thinking gets worse when it comes time to plan for growth. I've seen plenty of IT staff purchase equipment, provision servers, configure hardware and software, and lay out networks according to what they did on their last project or even on their previous job. This means that the resources available for the current application are not at all matched to what that application needs, and either you over-provision dramatically or you go down.
Certainly you should learn from your experience. But you should learn appropriate lessons, like "don't depend on VPNs being constantly up". Don't misapply knowledge, like copying the caching strategy from a picture site to an online bank. Learning the wrong lesson is generally heralded by announcements in one or all of the following forms:
- "when I was at name_of_previous_employer ..."
- "when we encountered not_very_similar_problem before, we used random_software_or_technique ..."
- "name_of_very_different_project is using random_software_or_technique, so that's what we should use."
(For non-native English speakers: "barn door" refers to the expression "closing the barn door after the horses have run away".)
Now, it's time to actually get into application design.
Single-threaded programming
"Instantly! It's like magic."
The parallel processing frame of mind is a challenge for most developers. Here's a story I've seen a hundred times: a developer writes his code single-threaded, he tests it with a single user and single process on his own laptop, then he deploys it to 200 servers, and the site goes down.
Single-threading is the enemy of scalability. Any portion of your application which blocks concurrent execution of the same code at the same time is going to limit you to the throughput of a single core on a single machine. I'm not just talking here about application code which takes a mutex, although that can be bad too. I'm talking about designs which block the entire application around waiting on one exclusively locked component.
For example, a popular beginning developer mistake is to put every single asynchronous task in a single non-forwarded queue, limiting the pace of the whole application to the rate at which messages can be pulled off that queue. Other popular mistakes are the frequently updated single-row "status" table, explicit locking of common resources, and total ignorance of which actions in one's programming language, framework, or database require exclusive locks on pages in memory.
One application I'm currently working on has a distributed data-processing cloud of 240 servers. However, assignment of chunks of data to servers for processing is done by a single-process daemon running on a single dispatch server, rate-limiting the whole cloud to 4,000 jobs/minute and leaving it 75% idle.
An even worse example was a popular sports web site we worked on. The site would update sports statistics by holding an exclusive lock on transactional database tables while waiting for a remote data service over the internet to respond. The client couldn't understand why adding more application servers to their infrastructure made the timeouts worse instead of better.
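One common way out of the single-queue mistake described above is to shard work across independent queues by a stable key, so that no single exclusively-locked structure caps throughput. A minimal sketch (the shard count and task keys are invented for the example):

```python
# Shard tasks across independent queues by a stable hash of the task key.
# Each shard gets its own worker process; aggregate throughput scales with
# the number of shards instead of being capped by one locked queue.
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # illustrative; in practice, sized to your worker fleet

def shard_for(task_key: str) -> int:
    """Stable mapping: the same key always lands on the same shard."""
    digest = hashlib.sha1(task_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

queues = defaultdict(list)
for key in ("user:17", "user:42", "report:9", "user:17"):
    queues[shard_for(key)].append(key)

# Related tasks stay ordered within their shard, but shards never contend:
for shard, tasks in sorted(queues.items()):
    print(shard, tasks)
```

The same idea applies to the single-row "status" table: split the hot row into one row per shard, and aggregate when reading.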
Any time you design anything for your application which is supposed to scale, ask yourself "how would this work if 100 users were doing it simultaneously? 1000? 1,000,000?" And learn a functional language or map/reduce. They're good training for parallel thinking.
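For practicing that map/reduce habit of mind, even the standard library is enough. A toy sketch (the workload is deliberately trivial):

```python
# Toy map/reduce: the same code path serves 100 inputs or 1,000,000,
# because the "map" step is a pure function holding no shared lock.
from multiprocessing import Pool

def work(n):
    return n * n          # map: no shared state, trivially parallel

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        mapped = pool.map(work, range(1000))
    print(sum(mapped))    # reduce → 332833500
```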
Coming in part 2
I'm sure you recognized at least one of the anti-patterns above in your own company, as most of the audience at the Ignite talk did. In part two of this article, I will cover component scaling, caching, and SPoFs, as well as the problem with The Cloud.
[ Note about the author: to support his habit of hacking on the PostgreSQL database, Josh Berkus is CEO of PostgreSQL Experts Inc., a database and applications consulting company which helps clients make their PostgreSQL applications more scalable, reliable, and secure. ]
Single-threading not considered harmful
Posted May 7, 2011 2:10 UTC (Sat) by quotemstr (subscriber, #45331) [Link]
When the author admonishes us for using "single-threading", we naturally suppose that we should use "multi-threading" instead, but because a "multi-threaded program" is commonly understood to be composed of many shared-memory lightweight processes, following this advice will in fact tempt us to create programs that become expensive to scale. In a sense, multi-threading, not single-threading, is the true enemy of scalability.

In any massively parallel system, it's the communication between processing nodes that ultimately limits the size and performance of the system. When we express concurrency using multiple threads, we naturally use the memory shared by these threads as the communication medium. But because shared memory scales poorly, the cost of using ever-larger coherent-memory systems quickly overwhelms any possible benefit.
Having run into this wall, we transition to a communication medium that scales much better, although (or because) it offers fewer features and less coherency compared to shared memory; examples include databases, clustered filesystems, and specialized message queues. After this expensive and painful process, costs again increase linearly with capacity: processing nodes can be spread across multiple machines instead of having to share a single increasingly powerful machine. Because the communication medium is no longer shared memory, the possibility of multiple threads sharing a single process becomes irrelevant, and we see that the work we invested in using this kind of threading was wasted.
So to avoid these ends, let's avoid these beginnings: avoid multi-threading. Use single-threaded programs, which are easier to design, write, and debug than their shared-memory counterparts. Instead, use multiple processes to extract concurrency from the hardware. Choose a communication medium that works just as well on a single machine as it does between machines, and make sure the individual processes comprising the system are blind to the difference. This way, deployment becomes flexible and scaling becomes simpler. Because communication between processes by necessity has to be explicitly designed and specified, modularity almost happens by itself.
I believe the author had these points in mind when he wrote his article, but by denouncing "single-threading", he risks sending some readers down an unproductive path. Concurrency is the ultimate goal, and it's usually achieved best by a set of cooperating single-threaded programs. The word "thread" refers to a concept that resides at a level of abstraction not appropriate for this discussion, and its use can only muddle our thinking.
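A minimal sketch of the design this comment argues for: cooperating single-threaded processes, no shared memory, all communication through an explicit channel. Here `multiprocessing.Queue` stands in for what would be a network message queue (e.g. a broker) in a multi-machine deployment:

```python
# Several single-threaded worker processes communicating only by message
# passing. Each worker neither knows nor cares whether its peers run on
# the same machine; swap the Queue for a network transport and nothing
# else about the workers changes.
from multiprocessing import Process, Queue

def worker(inbox: Queue, outbox: Queue):
    # An ordinary single-threaded loop: read a message, act, reply.
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel: shut down cleanly
            break
        outbox.put(msg.upper())  # stand-in for real work

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    workers = [Process(target=worker, args=(inbox, outbox)) for _ in range(3)]
    for w in workers:
        w.start()
    for word in ["scale", "fail"]:
        inbox.put(word)
    results = sorted(outbox.get() for _ in range(2))
    for _ in workers:
        inbox.put(None)
    for w in workers:
        w.join()
    print(results)   # → ['FAIL', 'SCALE']
```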
Single-threading not considered harmful
Posted Jul 4, 2011 11:40 UTC (Mon) by csamuel (✭ supporter ✭, #2624) [Link]
Or MPI which lets you span nodes and is frequently used for highly parallel High Performance Computing (HPC) codes.
It also works well within a single system; I've seen a particular HPC crash simulation code which came in both SMP and MPI variants and even within a single multi core system the MPI version scaled better than the SMP version.
Single-threading not considered harmful
Posted May 7, 2011 16:58 UTC (Sat) by Aliasundercover (subscriber, #69009) [Link]
Performance scaling? Sure, shared memory doesn't scale to the largest jobs. For that you need methods that cross machines. Those methods don't scale to the smallest latencies. For that you need shared memory.
Still, I agree. Threading is too much the method of fashion, more because everyone is doing it than on technical merit. Of course, everyone doing it means you get to use libraries other people wrote and spend less time swimming upstream.
Single-threading not considered harmful
Posted May 19, 2011 9:11 UTC (Thu) by renox (guest, #23785) [Link]
Maybe multi/single-tasking would be better terms?
Single-threading not considered harmful
Posted May 22, 2011 10:47 UTC (Sun) by kplcjl (guest, #75098) [Link]
The problem is recognizing when a dedicated single process is the best answer to your problem, or when multiple threads on the same server would be best. Generally, when your problem is data-driven, the multiple-thread solution is better; when your problem is CPU-intensive, a single thread dedicated to solving the problem, combined with a data-storage mechanism so multiple servers can attack several individual problems concurrently, may be the way to go.
This article seems to cover the situation where one of those solutions fit and therefore it becomes the "best" solution for every situation. It doesn't seem to apply to dumb coding mistakes that make good design go bad. For instance, I saw one case where it should have used "-" instead of "+" in one place of a mathematical equation. It took me a week and a half to convince the manager there was a mistake in the equation.
Single-threading not considered harmful
Posted May 20, 2011 19:48 UTC (Fri) by mikernet (guest, #75071) [Link]
Your application should be multi-threaded so that all the processor cores and idle time on any one machine are fully utilized. It should also be able to scale across machines so you can increase total processing capacity easily.
Single-threading not considered harmful
Posted May 20, 2011 22:44 UTC (Fri) by raven667 (subscriber, #5198) [Link]
On a modern multi-socket multi-core machine each socket is its own largely independent computer and the whole machine is a NUMA cluster. That means that each process is assigned to a particular node and that's where its memory lives, splitting memory between nodes or bouncing a process between different nodes reduces performance. My guess is that threading will scale weirdly when you get beyond what can be handled by all the cores in one socket whereas a multi-process model can keep more memory local to the socket the process is running on.
I would suggest starting with a multi-process model, because you get better fault tolerance with memory protection, then consider threading if that doesn't test out for concurrent performance.
No Metrics
Posted May 7, 2011 3:09 UTC (Sat) by mlawren (guest, #10136) [Link]
Great article! Sadly everything you wrote rings very true. I've seen the No Metrics conversation go the other way as well:
"app: The network is broken."
"net: Hmmm... let me check. Nope, my measurements indicate everything is ok"
"app: Are you sure?"
"net: Yes, here is the data, see for yourself."
"app: But must be the network!!!"
"net: We haven't made any changes. Look at these historical graphs of usage. That spike from last week was the file-sharer who had that unfortunate accident in the carpark on Monday. Nothing different since then."
"net: Have you checked your application?"
"app: Of course, it's running fine."
"net: How do you know?"
"app: I just know."
"net: When was the last time it ran fine?"
"app: Tuesday."
"net: What have you changed since then?"
"app: Only re-worked the dispatcher, and migrated the cache location to the other campus. But we checked, the code runs fine!"
"net: wtf??!?"
"net: Say, you drive that green Ford Focus don't you?"
"app: Yeah, why?"
"net: No reason. Your problem will be solved by tomorrow."
No Metrics
Posted May 9, 2011 4:36 UTC (Mon) by gdt (subscriber, #6284) [Link]
Particularly the lack of instrumentation, especially of problematic middleboxes such as application (de)accelerators and firewalls. Even basic monitoring is poor; links with application-performance-killing high error rates often creep under the radar of monitoring tools like Nagios.
It's rare to see routing designed with good choices and configured correctly. There's a simple tell-tale test: try an unassigned IP address on the corporate network. Does it error immediately, or time out?
The poor state of corporate networks isn't helped by networking equipment vendors, who often ship equipment with near-essential settings off for "backward compatibility".
Finally, many sysadmins and applications are their own worst enemy. Using IP addresses rather than DNS names (they're going to regret that, come IPv6). Disabling ethernet autonegotiation. Assuming link layer connectivity for high-availability schemes. Refusing to deal with authentication and authorisation issues within the application, but pushing that into VLANs and VPNs, thus turning the corporate network into a flat layer two network, with resulting poor behaviour under fault conditions.
No Metrics
Posted May 9, 2011 8:39 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
I have seen many times internal services using HTTP with plain-text auth on the local networks - because administrators think it's 'secure'. Hell, probably everyone here is guilty of that.
Fortunately, the situation is changing. With IPv6 it's possible to do end-to-end IPsec (which is not possible right now due to #(*$&(@$& NATs), and with DNSSEC it's possible to reliably store host certs in reverse DNS.
Universal end-to-end nightmare
Posted May 19, 2011 18:39 UTC (Thu) by oelewapperke (guest, #74309) [Link]
Given how many security problems we have, and how quickly they get fixed ... this is sadly a good thing.
No Metrics
Posted May 9, 2011 18:03 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
With IPv6 it's exactly backwards - it's a struggle NOT to make your computers globally addressable.
No Metrics
Posted May 10, 2011 8:18 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
The changes will take years, so there'll be plenty of time for security to evolve. But we now have foundation for it.
No Metrics
Posted May 10, 2011 11:50 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
And corporate networks will benefit from end-to-end security most, so I expect that they'll migrate to IPsec even before home users.
No Metrics
Posted May 10, 2011 17:41 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
Besides, it's not like I can't make an HTTPS tunnel which can pierce all but the most paranoid firewalls right now. Skype does this, for example.
No Metrics
Posted May 10, 2011 21:37 UTC (Tue) by Tobu (subscriber, #24111) [Link]
Nitpick: that depends on the key exchange. Sniffing after a Diffie-Hellman exchange requires the cooperation of one of the parties, and I don't think Wireshark has support for this at the moment.
No Metrics
Posted May 19, 2011 18:40 UTC (Thu) by oelewapperke (guest, #74309) [Link]
And it's perfectly secure.
No Metrics
Posted May 11, 2011 18:36 UTC (Wed) by Baylink (guest, #755) [Link]
The problem with utopias is that it only takes *one* Bad Guy to fuck things up for the rest of us.
"That's not a feature, that's a bug."
No Metrics
Posted May 19, 2011 18:46 UTC (Thu) by oelewapperke (guest, #74309) [Link]
It kinda does solve a lot of problems.
I mean, I hate NAT just like the next guy. But you won't get anywhere by declaring it doesn't solve anything. You'll be just like the gaia idiots screaming before the capitol to get America off oil, not realizing they're basically asking America to cut its economy by 95% or more. Not going to happen (and it's a *good* thing we don't honor such requests).
NAT is a beautifully simple solution. And it is possible to modify just about any protocol to work with nat. I fear nat and ipv4 may be here to stay.
Certainly converting RIPE, APNIC and AFRINIC over to ARIN rules would give us another 10 years easily. Saying "an IP will cost you $0.01 per year" will get us another 100 years.
No Metrics
Posted May 19, 2011 19:11 UTC (Thu) by nybble41 (subscriber, #55106) [Link]
Anyway, most home routers aren't much more secure with NAT, since they allow ports to be forwarded via UPnP requests. If you're running a server and opening forwarding ports with UPnP you might as well permit direct access; if not, blocking the connection at the server (because the port is closed) is just as effective as blocking it at the firewall. An effective firewall must be configured by the network administrator to accept or reject specific traffic, not simply permit incoming connections to any local server that asks politely while blocking the ones which would have been rejected anyway.
Blame the network
Posted May 9, 2011 5:35 UTC (Mon) by ringerc (subscriber, #3071) [Link]
Suuuure, MYOB.
Scale Fail (part 1)
Posted May 7, 2011 3:27 UTC (Sat) by samth (guest, #1290) [Link]
Performance Analysis of Idle Programs
Erik Altman, Matthew Arnold, Stephen Fink, Nick Mitchell
https://researcher.ibm.com/researcher/view_project.php?id...
Scale Fail (part 1)
Posted May 9, 2011 7:56 UTC (Mon) by sgros (guest, #36440) [Link]
1. Did they patent it? Can they patent it if they didn't already do so?
2. Is there some open source code that does something similar or that re-implements this functionality?
Scale Fail (part 1)
Posted May 18, 2011 7:46 UTC (Wed) by incase (guest, #37115) [Link]
I had no trouble finding this link: https://researcher.ibm.com/researcher/files/us-sjfink/res...
josh's clients aren't stupid, they're smart
Posted May 7, 2011 6:16 UTC (Sat) by b7j0c (subscriber, #27559) [Link]
most small ideas will fail. most entrepreneurs know this. its stupid to try to build a service for a hundred million users when you can't attract ten thousand, and since small development groups are just learning about their domain, trying to scale out early will probably just result in throw-away code. don't worry about problems that aren't problems.
i think josh's clients are doing the right thing. first get something built that people want to use. push your stack as far as you can. if in the end you're successful enough to afford the time to do a rearchitect yourself or pay someone else to do it, thats called success and its a good thing.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 9:11 UTC (Sat) by tialaramex (subscriber, #21167) [Link]
You see, there are actually three ways things can go from a startup. It can crash and burn (doesn't matter what you planned, every dollar spent making it scale is now worthless), it can explode with success (everything that doesn't scale means incredible pain and expense) but it can also trundle along steadily growing by increments.
The latter is boring, not many books about it, no amazing slide show presentations with violent asymptotic charts. But it's probably reality for way more engineers. In this scenario there isn't enough money to solve problems by throwing money at them, you will run out.
And it's for this reason that you ought to do a little bit of thinking about the things Josh mentions on a new project. You don't have to solve every problem now, that leads to paralysis, but you need some awareness of which bits of the system will need work in six weeks, or in six months with the kind of steady growth that means there's no risk of unemployment but you also won't all be rich.
Smart clients would be coming to Josh because they actually do have a specific scalability problem in PostgreSQL, and they expect him to fix it so that they can avoid spending a lot of money on faster servers as a workaround. Paying a PostgreSQL consultant to tell you that you've got a switch port running at 10 Mbit half-duplex is not smart.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 12:53 UTC (Sat) by aggelos (guest, #41752) [Link]
Josh's examples make it quite clear that, those clients at least, use duct tape just because that's the limit of their abilities. I can't count the times I've heard "no time to do it right now, we'll deal with it when we make it" as a euphemism for "we're grossly incompetent and in any case I'd like to change the subject now". It is especially telling when a forced cut-all-corners implementation could be fixed in a couple of days, yet it never seems to make it into the TODO list until after it's caused downtime and/or major customer issues for a couple of months. Spending money based on What Idiotic Idea of the Week magazine suggested and ignoring blatantly obvious needs like reliable and verifiable backups also speaks volumes about whether such startups are actually making rational choices.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 14:27 UTC (Sat) by b7j0c (subscriber, #27559) [Link]
everyone wins here. josh gets paid and spreads some wisdom. the startup gets problems solved and learns something along the way.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 15:33 UTC (Sat) by aggelos (guest, #41752) [Link]
Unfortunately, even a thing as simple as building a scalable news site or web shop is not something so well understood that you can just set up web framework X and let it run, treating it like a black box, tweaking its parameters until it no longer fails and randomly buying more expensive hardware until the problems become less evident.
Management that a) does not know how to hire competent technical people, b) actively avoids hiring competent (and probably more expensive) technical people, and c) consistently rewards people for finding a hacky workaround after an all-nighter instead of investigating and potentially fixing the actual problem, just deserves to fail. Bringing in 3rd-party consultants is not going to somehow fix their *actual* problems (which, as Josh notes, are managerial in nature). They will just keep "handling" issues the exact same way, spend more frustrating hours, lose even more viewers/customers and, of course, money. What's worse, they'll keep stressing their employees to do the impossible when things fail. If you make sure you have no idea how (let alone why) your system works, you cannot fix any but the most trivial issues and definitely cannot outperform a competitor that has a clue.
Now, obviously, successful (not necessarily smart) business decisions might keep you going despite that, but that in no way vindicates your recklessness.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 17:40 UTC (Sat) by b7j0c (subscriber, #27559) [Link]
you've just described 95% of the www, including the inauspicious starts of many of the top ten leaders in traffic.
having built part of a top-ten site (yahoo, and my contribution, yahoo news, is still number one in its category, worldwide), i'm going to arrogantly state that most people here have a wholly unrealistic view of the forethought, planning, architecture and staffing of a rapidly growing service. i am 100% confident that if you poll individuals who built very high traffic sites from an early stage, you will find a group of individuals who are utterly honest about their inability to do it right the first time, or even know how to.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 22:38 UTC (Sat) by aggelos (guest, #41752) [Link]
Our discussion was on whether management in these companies is making an informed trade-off or if they feel safer stabbing in the dark, rather than going for the light switch. I don't think you've brought anything new to the table, other than your degree of confidence in your convictions which, I'm sure you'll agree, is not something one can argue against :-)
Besides, I'm pretty sure no one contested that it takes a lot of effort to build a highly scalable site, even if you do know what you're doing. Still not relevant to the discussion in this subthread, though.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 23:03 UTC (Sat) by cmccabe (guest, #60281) [Link]
However, there are more sites out there that failed because of bad engineering than you think. For example, Friendster could have been what Facebook is today if they had been able to scale the site properly.
Over-planning and over-architecting are often problems, but I think most organizations tend to under-measure rather than over-measure. If you don't know what "the problem" is, it's a lot harder to throw money at it to make it go away. And if you can't tell the good engineers and the good consultants from the bad, then you really are doomed, no matter how much you have in the bank.
P.S.
It is funny to hear Josh complain about stupid clients. After all, if they were smart, he wouldn't have a job.
josh's clients aren't stupid, they're smart
Posted May 7, 2011 23:51 UTC (Sat) by wahern (subscriber, #37304) [Link]
The secret ingredient to success is pure luck. It's hard to swallow. If you disagree, then show me your billion dollar stock portfolio. The accuracy and reliability would only need to be moderately significant in order to extract obscene amounts of wealth from the market.
People who seem to be relatively good predictors of success, such as Warren Buffett, are incapable of articulating their thinking process. If they could articulate it, then there'd be many more copy cats. This suggests the possibility that the emergence of people like Warren Buffett is also a chance occurrence, and that whatever "it factor" they possess is entirely anomalous.
Hard work and fair play don't lead to success, they lead to marginal improvements in wealth and increasingly tolerable living conditions over long spans of time. That they lead to individual success is a noble lie we tell ourselves in order for society to extract those meager gains.
Now, hard work and fair play may lead to personal fulfillment. Maybe that's more important than any kind of economic wealth.
josh's clients aren't stupid, they're smart
Posted May 8, 2011 0:26 UTC (Sun) by cmccabe (guest, #60281) [Link]
> good predictors of success. If they were, and if those things were
> accurately and reliably quantifiable (which they must be to have any
> substance to them) then predicting success would be incredibly more common
First of all, there is a middle ground between things being completely random, and completely predetermined by known factors. Second of all, a lot of the most important predictors of success are things that we haven't learned to quantify yet.
Warren Buffett has articulated his stock market strategy many times: do your research; only invest in things that you understand; invest for the long term.
> Now, hard work and fair play may lead to personal fulfillment. Maybe
> that's more important than any kind of economic wealth.
Is personal fulfillment "accurately and reliably quantifiable"? If not, by your own argument, it has no "substance" to it. Or maybe, just maybe, science hasn't learned to quantify a lot of the most important things in life.
josh's clients aren't stupid, they're smart
Posted May 8, 2011 1:13 UTC (Sun) by tialaramex (subscriber, #21167) [Link]
And yes, luck is a massive factor. Warren's strategy sounds good, but it's the same strategy lots of people have, without getting his success. We might as well listen to the anomalous 110 year old who tells us he puts it down to a long walk every afternoon. No doubt walking doesn't hurt, but it's not why he's 110, that's just blind luck.
Survivorship bias is a huge problem. If 500 people all pick one of twenty strategies at random, and all but one of the 500 fails, we would be wrong to assume that therefore the strategy chosen by that one person works and the other nineteen do not. But that's exactly what survivorship bias causes us to assume.
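The survivorship-bias argument above is easy to demonstrate with a small Monte Carlo sketch (the population size, strategy count, and failure rate are arbitrary assumed figures, in Python): success is assigned completely at random, independent of strategy, yet every survivor still "chose" some strategy.

```python
import random

def simulate(people=500, strategies=20, p_fail=0.99, seed=1):
    """Each person picks a strategy at random; survival is pure chance,
    independent of the strategy chosen.  Return the strategies that the
    survivors happened to use."""
    random.seed(seed)
    survivors = []
    for _ in range(people):
        strategy = random.randrange(strategies)
        if random.random() > p_fail:        # ~1% survive, regardless of strategy
            survivors.append(strategy)
    return survivors

winners = simulate()
# The survivors' strategies look "special" only because we never hear
# from the hundreds of failures who picked the very same strategies.
print(len(winners), sorted(set(winners)))
```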
josh's clients aren't stupid, they're smart
Posted May 9, 2011 7:55 UTC (Mon) by cmccabe (guest, #60281) [Link]
Are these digits random or not? And if not, what is the pattern?
69804177583220909702029165734725158290463091035903784297757265172087724
josh's clients aren't stupid, they're smart
Posted May 9, 2011 16:02 UTC (Mon) by dskoll (subscriber, #1630) [Link]
Of course those digits are random. So are these digits:
1111111111111111111111111111111111111111111111111111111111111111
(In other words: Your question is meaningless.)
josh's clients aren't stupid, they're smart
Posted May 10, 2011 6:17 UTC (Tue) by cmccabe (guest, #60281) [Link]
josh's clients aren't stupid, they're smart
Posted May 10, 2011 10:45 UTC (Tue) by dskoll (subscriber, #1630) [Link]
Of course it's meaningless. There's no such thing as a set of digits that are "random" or "non-random". You can take a sequence generator and run some statistical tests, but that doesn't prove anything. You can test a sequence of digits for compressibility, but that also doesn't prove anything. The digits in the decimal expansion of pi pass all kinds of statistical tests for randomness, but they are assuredly not "random".
As for a pattern, given a finite sequence of digits, you can construct any pattern you like. I could construct a degree-71 polynomial that fits the 71 digits you posted and say "Yes, that's the generator!"
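The polynomial point above is easy to make concrete: Lagrange interpolation over exact rationals will fit any finite digit sequence (here just the first ten digits of the posted number, which need only degree 9), which is exactly why "finding a generator" proves nothing about randomness. A sketch in Python:

```python
from fractions import Fraction

def lagrange_fit(values):
    """Return a function p(x) with p(i) == values[i] for each index i,
    via Lagrange interpolation over exact rationals.  Any finite digit
    sequence is 'generated' by some such polynomial."""
    n = len(values)
    def p(x):
        total = Fraction(0)
        for i, yi in enumerate(values):
            term = Fraction(yi)
            for j in range(n):
                if j != i:
                    term *= Fraction(x - j, i - j)
            total += term
        return total
    return p

digits = [6, 9, 8, 0, 4, 1, 7, 7, 5, 8]   # first ten digits of the posted number
p = lagrange_fit(digits)
assert [p(i) for i in range(10)] == digits   # "Yes, that's the generator!"
```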
josh's clients aren't stupid, they're smart
Posted May 10, 2011 14:20 UTC (Tue) by bronson (subscriber, #4806) [Link]
josh's clients aren't stupid, they're smart
Posted May 12, 2011 1:33 UTC (Thu) by dskoll (subscriber, #1630) [Link]
Well, at the parties I go to, people don't usually open the conversation with "69804177583220909702029165734725158290463091035903784297757265172087724".
josh's clients aren't stupid, they're smart
Posted May 12, 2011 4:43 UTC (Thu) by bronson (subscriber, #4806) [Link]
josh's clients aren't stupid, they're smart
Posted May 12, 2011 16:12 UTC (Thu) by dskoll (subscriber, #1630) [Link]
Chill out.
josh's clients aren't stupid, they're smart
Posted May 10, 2011 19:05 UTC (Tue) by cmccabe (guest, #60281) [Link]
> posted and say "Yes, that's the generator!"
That would be a pattern, but not an interesting one.
josh's clients aren't stupid, they're smart
Posted May 10, 2011 19:07 UTC (Tue) by cmccabe (guest, #60281) [Link]
A non-interesting pattern would be something like the observation "this is a number!" or the observation that all digits from 0 to 9 occur.
josh's clients aren't stupid, they're smart
Posted May 12, 2011 11:39 UTC (Thu) by etienne (guest, #25256) [Link]
http://kasmana.people.cofc.edu/MATHFICT/mf55-spoiler.html
josh's clients aren't stupid, they're smart
Posted May 19, 2011 11:03 UTC (Thu) by yeti-dn (guest, #46560) [Link]
Just answering so I don't miss the answer
Posted May 10, 2011 21:25 UTC (Tue) by man_ls (guest, #15091) [Link]
It looks pretty random, but there are too few repeated consecutive digits: only three 77 and one 22. I give up, what is the pattern?
Just answering so I don't miss the answer
Posted May 10, 2011 23:05 UTC (Tue) by neilbrown (subscriber, #359) [Link]
If you consider just the first 4 digits (0,1,2,3), then even digits occur 10 times each, odd digits 5 times each. However this pattern does not continue (4 and 8 also occur 5 times, but are even).
The longest gap between repeats is 24 between 2 '8's.
'6' and '4' see gaps of 23.
'6' is the least frequent digit, '7' is the most frequent (three times as frequent)
Taken as a decimal integer the factors less than 100 are
2,2,3,3,23,23
Yet the whole number is not a perfect square.
My conclusion is that this is probably the "least uninteresting number" - very interesting....
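The tallies above can be rechecked with a few lines of Python (a quick sketch; the frequency and gap figures are simply recomputed from the posted digits):

```python
from collections import Counter

N = "69804177583220909702029165734725158290463091035903784297757265172087724"

counts = Counter(N)                 # how often each digit occurs

def max_gap(d):
    """Longest gap between consecutive occurrences of digit d, or None
    if d occurs fewer than twice."""
    pos = [i for i, c in enumerate(N) if c == d]
    return max((b - a for a, b in zip(pos, pos[1:])), default=None)

for d in sorted(counts):
    print(d, counts[d], max_gap(d))
```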
josh's clients aren't stupid, they're smart
Posted May 11, 2011 6:00 UTC (Wed) by cmccabe (guest, #60281) [Link]
Every digit is completely predetermined, but it's pretty hard to spot if you don't know what you're looking for!
josh's clients aren't stupid, they're smart
Posted May 11, 2011 8:07 UTC (Wed) by ballombe (subscriber, #9523) [Link]
josh's clients aren't stupid, they're smart
Posted May 11, 2011 16:56 UTC (Wed) by tialaramex (subscriber, #21167) [Link]
We suspect (as far as I remember no-one has proved) that all possible sequences of digits occur in Pi. Assuming this is so, the fact that a particular sequence occurs in Pi is not interesting at all.
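Whether every sequence really occurs in pi is indeed unproven (that would follow from pi being normal), but for short sequences the search is easy to try. A sketch in Python using Gibbons's unbounded spigot algorithm; the digit budget is an arbitrary assumption:

```python
def pi_digits():
    """Unbounded spigot for the decimal digits of pi (Gibbons, 2006)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def find_in_pi(seq, limit=1000):
    """Offset of the digit string seq within the first `limit` digits of
    pi (counting the leading 3 as offset 0), or -1 if not found."""
    gen = pi_digits()
    digits = "".join(str(next(gen)) for _ in range(limit))
    return digits.find(seq)

print(find_in_pi("14159"))   # offset 1: right after the leading 3
```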
josh's clients aren't stupid, they're smart
Posted May 11, 2011 20:08 UTC (Wed) by cmccabe (guest, #60281) [Link]
josh's clients aren't stupid, they're smart
Posted May 12, 2011 8:27 UTC (Thu) by ekj (guest, #1524) [Link]
josh's clients aren't stupid, they're smart
Posted May 12, 2011 16:56 UTC (Thu) by fuhchee (guest, #40059) [Link]
What Do These Digits Mean?
Posted May 19, 2011 12:35 UTC (Thu) by ldo (guest, #40946) [Link]
#!/usr/bin/python
import sys
charset = ' Wadefhlnost'
modulo = 13
s = 13682311570832829480888979137834570837851469148689544502986
num = iter(range(2, 9999))
while s != 1 :
    n = num.next()
    if s % n == 0 :
        sys.stdout.write("%s" % charset[(n - 1) % modulo])
        s /= n
    #end if
#end while
sys.stdout.write("\n")
What Do These Digits Mean?
Posted May 26, 2011 12:20 UTC (Thu) by net_benji (subscriber, #75195) [Link]
http://www.wolframalpha.com/input/?i=factor+1368231157083...
Here are more working combinations:
charset = "hetfn adlsoW"
s = 7062883793966047784250125868403644804557392731274644684033611
charset = "tslofWe dnha"
s = 1029201132023087388381452825147240668180384494509643451351
...
and here's a different one, just for the fun of it:
charset = "wfyonirvmD s?ecutdpah"
modulo = 23
s = 79 * 211 * 241 * 463 * 487 * 499 * 563 * 571 * 673 * 787 * 911 * 977 * 991 * 1039 * 1249 * 1483 * 1489 * 1493 * 1601 * 1621 * 1697 * 1699 * 1889 * 2243 * 2311 * 2347 * 2459 * 2521 * 2719 * 2909 * 2953 * 3119 * 3271 * 3323 * 3359 * 3533 * 3733 * 3947 * 3967 * 4057 * 4177 * 4283 * 4289 * 4597 * 4651 * 4733 * 4933 * 4969 * 5021 * 5087 * 5261 * 5281 * 5347 * 5399 * 5449 * 5557 * 5641 * 5711 * 5807 * 5869 * 5981 * 6353 * 6359 * 6389 * 6569 * 6701 * 6791 * 6823 * 6983 * 7187 * 7309 * 7321 * 7481 * 7529 * 7673 * 7873 * 7901 * 8017
The source for the generator is there: https://github.com/benthaman/lwn-digits/blob/master/encod...
Hope Jon won't mind posting this nonsense here ;)
josh's clients aren't stupid, they're smart
Posted May 8, 2011 1:37 UTC (Sun) by wahern (subscriber, #37304) [Link]
> Do your research; only invest in things that you understand;
> invest for the long term.
If you do the things in his books, at best you'll approximate the market and not lose your shirt. And the model that best reproduces historical market behavior is a random walk. Overall market efficiency is an emergent effect of random processes, not the result of smart people being exceptional.
Look at the long line of successors Buffett has tried to groom. None could ever match his acumen by exercising his supposed methodology. Some have come close but inevitably make one or two wrong bets and their long-term gains fall far short of Buffett's. It's like gamblers who would have walked away with the bank but "bad luck" ruined their "streak". They disown the bad luck but claim the winning streak as the product of some personal attribute.
That's not to say that there aren't predictable ways to make money in the market, but they involve placing yourself in certain structural niches within the market. How you rise to those positions also seems to have little to do with intelligence or acumen by comparison with your peers. Where or how those structural market niches arise are unpredictable. Facebook didn't do anything different except chance upon the emergence of a niche at the right time.
The problem with defining success is that people look at the big winners and obvious failures and compare and contrast. If you see an impoverished alcoholic bum on the street and compare him to Buffett, you might conclude that success entails not letting alcohol control your life.
But once you have all of those DON'T DOs (i.e. how not to fail) you need to find the DO DOs. But even after you've enumerated all the DOs, you can find thousands or even millions of people who DO all the same things yet never succeed. There's a difference between not being a failure, and being successful, unless swimming instead of sinking is worthy of special praise. Success as flowing from individualized merit, therefore, is an illusion. In other words, just because you're successful doesn't mean you've done anything different than millions of other people. You're not special; you're just lucky by comparison. So there's little to be gained by analyzing and investigating what's "unique" about very successful people or companies. Anything you think is unique is either anomalous or not reproducible, and so liable to lead you astray from focusing on not failing. Focusing on not failing typically means making your product as good as necessary under the circumstances, not as good as possible.
You can certainly write a book about how to live a middle-class life, or how not to run a company into the ground. But no book will ever be able to describe how to become a millionaire, how to find the next great concept, or how to ensure that great idea will come to spectacular fruition by proper technical execution. Just as was stated earlier, many--perhaps most--of "successful" websites are ugly as hell behind the curtains. They were doing just enough to stay afloat, just like everybody else (on average), and serendipitously found themselves in charmed waters.
josh's clients aren't stupid, they're smart
Posted May 8, 2011 8:12 UTC (Sun) by cmccabe (guest, #60281) [Link]
Obviously, markets have a lot of problems and inefficiencies. A lot of mutual fund managers are overpaid for the small amount of research that they do. Sometimes the market over-allocates resources to one sector, like housing. But just because the market has inefficiencies, or Warren Buffet is having trouble finding a successor, doesn't somehow indicate that the market is completely random.
Despite your professed love for quantifiable data, you haven't really presented that much hard evidence for me to argue against. What I would like to argue against is your fatalistic attitude. You seem to view yourself as a helpless pawn of all-powerful external forces.
Ironically, optimism-- or at any rate, the will to keep going-- is one of the best predictors of success I know of-- and not just in business. If you look at any open source leader, any entrepreneur, any successful politician-- whatever-- you will find that one thing they have in common is that they believe in themselves and their abilities.
One book you might be interested in is "The Drunkard's Walk" by Leonard Mlodinow, a physicist at Caltech. In it, he puts forth some pretty convincing arguments that life is more random than we think. Mlodinow has a statistician's focus on hard data and it's pretty interesting. The performance of mutual fund managers, the outcome of the World Series, and the decisions of Hollywood producers are all revealed as being more random than we might think. He also takes a swing at Bill Gates, claiming that he was just in the right place at the right time. Personally, I disagree-- I think Gates would have been at least moderately successful in any place and time. Anyway, you might enjoy the book.
C.
josh's clients aren't stupid, they're smart
Posted May 8, 2011 22:37 UTC (Sun) by ibukanov (subscriber, #3942) [Link]
This is not what I have observed. The most successful business people I know personally are pessimists. In one particular case the pessimism made the guy prepare for the worst. That allowed him to survive bad times and gain disproportionately from the good times.
Another example is a very optimistic guy who quit his job as an oil engineer and went to Thailand to start a shrimp farm. It was rather successful initially. But one day a drunken worker from the factory got into a conflict with local mobs and they burned the farm down... Now the guy is back in his engineering position.
josh's clients aren't stupid, they're smart
Posted May 9, 2011 16:08 UTC (Mon) by raven667 (subscriber, #5198) [Link]
Success is just like cancer: you can get (un)lucky, but if you keep hitting those carcinogens you improve your chances of catching it, though it's not guaranteed either way.
josh's clients aren't stupid, they're smart
Posted May 9, 2011 21:19 UTC (Mon) by ibukanov (subscriber, #3942) [Link]
In the first case it was not the will but rather the cash that the company saved during the good times, under the assumption that those would not last. With that it was not problematic to carry on during the bad times. And when competitors that had hired too many engineers as consultants for the oil industry went bust, it was trivial to pick up their clients. (In Norway it is rather hard to fire a person in a permanent position, so a bankruptcy is often the simplest solution for a small company.)
> the person was not successful when they quit after the first setback
Only a very optimistic person would invest a few years of savings into something he had only a vague idea about. And that fire on the shrimp farm was not a setback, it was a total disaster. As far as I know the farm was not insured against that type of damage and the guy simply had no money to try one more time.
Note that I am not arguing that pessimism is required for a business success. I just want to say that even if optimism could be a necessary trait for some kinds of entrepreneurship, it may lead to failure just as well as to success.
josh's clients aren't stupid, they're smart
Posted May 14, 2011 14:31 UTC (Sat) by jjs (guest, #10315) [Link]
Another good book is "The Black Swan" by Nassim Nicholas Taleb.
josh's clients aren't stupid, they're smart
Posted May 14, 2011 14:40 UTC (Sat) by jjs (guest, #10315) [Link]
josh's clients aren't stupid, they're smart
Posted May 12, 2011 9:30 UTC (Thu) by ekj (guest, #1524) [Link]
If you mean doing a lot better than most people who pursue the same activity, then it's almost tautological that there's no recipe for it. If there were a simple and unambiguous way of outperforming the general market as an investor, for example, then all investors would be using that technique, thus negating its effectiveness. It's mathematically obvious that investors as a group will not be able to beat the market. Thus one person can only beat the market to PRECISELY the same degree as another underperforms the market.
But there's other ways of defining success.
There's things that on the average *do* lead to goals that people desire. It's just that it's hard to do them. If you can manage to do them though, then achieving those goals become likely. I'd call that success.
Saving 10% of your income in a low-cost index-fund will, with high probability, give you a very solid financial position in a decade or two. (but not make you a billionaire)
You're just saying, it seems to me, that there's no simple, predictable, repeatable way of doing the exceptional. That's almost tautological, because if there were, it'd no longer -be- exceptional.
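The index-fund claim above is simple compounding arithmetic; a sketch in Python, where the income, savings rate, and annual return are purely illustrative assumptions:

```python
def final_balance(income=50000, save_rate=0.10, annual_return=0.06, years=20):
    """Balance after saving a fixed fraction of income each year into a
    fund with a steady annual return.  Illustrative figures, not advice:
    compounding makes this solid, but nowhere near billionaire money."""
    balance = 0.0
    for _ in range(years):
        balance = balance * (1 + annual_return) + income * save_rate
    return balance

print(round(final_balance()))   # roughly 184,000 under these assumed figures
```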
josh's clients aren't stupid, they're smart
Posted May 14, 2011 1:41 UTC (Sat) by rgmoore (✭ supporter ✭, #75) [Link]
Warren Buffett has articulated his stock market strategy many times. Do your research; only invest in things that you understand; invest for the long term.
That isn't an investment strategy; it's a short list of how to avoid common investment mistakes. Yes, you need to put your money into businesses you know and understand, but the sticking point is understanding business and knowing what to research when you're doing your homework. It's the knowledge necessary to do those things that makes Buffett an investment genius, and that detailed knowledge can't be encapsulated into a few pithy rules.
As investment advice, those rules are about as useful as Babe Ruth's advice on hitting a baseball: wait for a good pitch and hit it. It's good advice as far as it goes, but it's useless without the talent and skill to put it into practice. Similarly, it's great to come up with a list of best coding practices, but they're no replacement for judgment, skill, and experience in programming.
josh's clients aren't stupid, they're smart
Posted May 12, 2011 0:43 UTC (Thu) by jberkus (guest, #55561) [Link]
> It is funny to hear Josh complain about stupid clients. After all, if they were smart, he wouldn't have a job.
I'd still rather get paid to solve interesting problems than stupid ones.
And what's really frustrating to me as a consultant is to put dozens of hours into a real solution for a client's scalability issues, only to have my work discarded because of management problems. Whether I get paid or not.
josh's clients aren't stupid, they're smart
Posted May 12, 2011 8:17 UTC (Thu) by ekj (guest, #1524) [Link]
In fact, it's -still- not done anywhere near "right"; instead the limitations are worked around.
josh's clients aren't stupid, they're smart
Posted May 12, 2011 8:55 UTC (Thu) by ssmith32 (subscriber, #72404) [Link]
Sounds like a great recipe for 100% Pure San Francisco/Peninsula VC Kool-Aid. Yum! :P
More seriously, do you have, at the very least, a list of notable successful business cases that illustrates how your principles work to create profitable, sustainable, and growing businesses?
josh's clients aren't stupid, they're smart
Posted May 22, 2011 22:32 UTC (Sun) by rodgerd (guest, #58896) [Link]
Scale Fail (part 1)
Posted May 7, 2011 15:30 UTC (Sat) by RogerOdle (subscriber, #60791) [Link]
Scale Fail (part 1)
Posted May 7, 2011 17:05 UTC (Sat) by huayra (guest, #58915) [Link]
Scale Fail (part 1)
Posted May 8, 2011 1:27 UTC (Sun) by dps (guest, #5725) [Link]
I am one of the very few people who have formally analysed parallel systems in anger. The difficulty of doing this indicates that it is worth sacrificing some concurrency for a simpler design. Shared data is expensive to implement on distributed systems and good for writing 10+ pages of mathematics about a trivial 5-line program.
If a task can be broken down into pieces that are more or less independent then threads are probably a good idea: pools of I/O slaves are effective for some I/O intensive applications and data parallel programming works well for large numerical problems.
There are good mathematical reasons for a job list to work well, based on systems which fail at random intervals. This analysis assumes the cost of accessing the list is negligible compared to the jobs. If that is not true then it is an inappropriate design.
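The job-list design described above can be sketched minimally in Python; the worker count and the jobs themselves are hypothetical, and the point stands only when each job dwarfs the cost of the queue operations:

```python
import queue
import threading

def run_jobs(jobs, workers=4):
    """Workers pull independent jobs from a shared list (a thread-safe
    queue).  This only pays off when the jobs are expensive relative to
    the queue accesses, per the analysis above."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return                  # list drained: this worker is done
            r = job()                   # the (supposedly) expensive part
            with lock:
                results.append(r)

    for j in jobs:
        q.put(j)
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

out = run_jobs([lambda i=i: i * i for i in range(10)])
print(sorted(out))   # the ten squares, in whatever completion order
```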
I think the idea that functional programming languages are good training for parallel thinking is completely wrong. These languages, like ML, do not feature any control flow whatsoever. Anybody who attempts to analyse the evaluation strategy will be punished (and probably wrong).
Scale Fail (part 1)
Posted May 11, 2011 14:27 UTC (Wed) by oct (subscriber, #71481) [Link]
Bullshit. All MLs have a strict evaluation strategy by default.
A niggle about linearity
Posted May 9, 2011 18:24 UTC (Mon) by davecb (subscriber, #1574) [Link]
You write "Scaling an application is an arithmetic exercise. If one user consumes X amount of CPU time on the web server, how many web servers do you need to support 100,000 simultaneous users?"
And it is arithmetic (linear), until it isn't.
When you hit a saturation point, the response time stops growing linearly with load, and increases insanely. The response time heads skyward like a homesick angel, and even the sysadmin thinks it's hung. If you plot it, you get a curve that looks a lot like a hockey-stick, with a short horizontal blade, a nice gentle bend... and a long straight handle that just keeps going up.
You add linear amounts of resources to get it back under control, of course, so the cure is arithmetic. The response time and the customer response, however, are hyperbolic (;-))
--dave
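The hockey-stick curve described above falls out of even the simplest single-queue approximation, R = S / (1 - utilization); a sketch in Python with an assumed 10 ms service time:

```python
def response_time(service_ms, utilization):
    """Single-queue approximation R = S / (1 - rho): nearly flat at low
    load, then the long straight handle as utilization approaches 1."""
    if utilization >= 1.0:
        return float("inf")             # saturated: the queue grows without bound
    return service_ms / (1.0 - utilization)

# The gentle bend and then the skyward handle:
for rho in (0.1, 0.5, 0.8, 0.9, 0.95, 0.99):
    print(rho, round(response_time(10, rho), 1))
```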
A niggle about linearity
Posted May 12, 2011 0:45 UTC (Thu) by jberkus (guest, #55561) [Link]
However, you can get an estimate of when you're going to hit those thresholds with some fairly simple arithmetic. It's a shame more people don't try.
A niggle about linearity
Posted May 13, 2011 2:59 UTC (Fri) by raven667 (subscriber, #5198) [Link]
I've struggled with this kind of issue in the past when trying to understand the performance issues and needs of a large in-house application that I supported for many years. I got it wrong many times, and the simple estimations that might have helped only look simple in retrospect. There is a lot of pressure to treat databases and storage as a black box, until you ask more from it than it can give.
A niggle about linearity
Posted May 13, 2011 13:13 UTC (Fri) by andrewt (guest, #5703) [Link]
A niggle about linearity
Posted May 13, 2011 20:05 UTC (Fri) by dlang (guest, #313) [Link]
In many cases they are not where you expect them to be (hyperthreaded CPU utilisation is one example, locking overhead with multiple processors is another).
Frequently there are factors in play that you don't know about, and the result is that until you test it at a particular load, you have no way of knowing if the system will reach that load.
Interpolation (guessing how things work between measured points) is fairly reliable.
Extrapolation (guessing how things will work beyond measured points) is only reliable until some new factor shows up.
A niggle about linearity
Posted May 22, 2011 22:29 UTC (Sun) by rodgerd (guest, #58896) [Link]
A: Because when I run the new query you put in from a simple script, it uses a whole CPU and takes a second. Your performance limit is $NUMCPU/second, at best.
Q: [goes away to redo query]
Scale Fail (part 1)
Posted May 14, 2011 13:01 UTC (Sat) by vachi (guest, #67512) [Link]
(disclaimer: I'm app dev. I always have an axe to grind about admins :-)
A few months back, our app abruptly slowed down, and sometimes seemed to hang from the user's point of view. The app team asked the admins whether there was any strange sign on the app server or DB server box. The admins defiantly declared that the servers were all green. CPU went over threshold a few times, but that was normal on a peak day. A lot of free memory. It must be the lousy app that was the problem.
After a lot of frustration and investigation, the app team found out that one of the resources inside the DB was configured way too low, and on that fateful day (4 years after setup) the resource was used up and things blew apart. The admins said they did not even know how to monitor that resource's consumption...
Latency and optimizing SQL queries
Posted May 25, 2011 4:52 UTC (Wed) by swmike (guest, #57335) [Link]
Now, put the database server further away from the client doing the queries, and you have a huge problem. Spending a bit more time on the SQL query and doing everything in a single query would solve this problem, but judging from the network traffic in most client/server applications, that seems to be a lost art.
In your dev testing, insert a box that induces 500ms latency between client and server and check that the application still runs fine. If it doesn't, redesign.
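The round-trip arithmetic behind this advice is worth spelling out; a sketch in Python where the query count, RTT, and per-query server time are all assumed figures:

```python
def total_time_ms(queries, rtt_ms, server_ms=1.0):
    """Rough cost of a chatty client: every query pays the network round
    trip, so N small queries cost N * (RTT + server time), while one
    combined query pays the RTT only once."""
    return queries * (rtt_ms + server_ms)

# 200 tiny queries at 0.1 ms RTT feel instant; the same 200 queries
# through a 500 ms latency box take over a minute and a half.
print(round(total_time_ms(200, 0.1), 1))   # 220.0 ms on a local network
print(round(total_time_ms(200, 500), 1))   # 100200.0 ms, i.e. ~100 seconds
```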