Scale Fail (part 1)

LWN.net needs you!

Without subscribers, LWN would simply not exist. Please consider signing up for a subscription and helping to keep LWN publishing

May 6, 2011

This article was contributed by Josh Berkus

Let me tell you a secret. I don't fix databases. I fix applications.

Companies hire me to "fix the database" because they think it's the source of their performance and downtime problems. This is very rarely the case. Failure to scale is almost always the result of poor management decisions — often a series of them. In fact, these anti-scaling decisions are so often repeated that they have become anti-patterns.

I did a little talk about these anti-patterns at the last MySQL Conference and Expo. Go watch it and then come on back. Now that you've seen the five-minute version (and hopefully laughed at it), you're ready for some less sarcastic detail which explains how to recognize these anti-patterns and how to avoid them.

Trendiness

"Now, why are you migrating databases? You haven't had a downtime in three months, and we have a plan for the next two years of growth. A migration will cause outages and chaos."

"Well ... our CTO is the only one at the weekly CTO's lunch who uses PostgreSQL. The other CTOs have been teasing him about it."

Does this sound like your CTO? It's a real conversation I had. It also describes more technical executives than I care to think about: more concerned with their personal image and career than they are with whether or not the site stays up or the company stays in business. If you start hearing any of the following words in your infrastructure meetings, you know you're in for some serious overtime: "hip", "hot", "cutting-edge", "latest tech", or "cool kids". References to magazine surveys or industry trends articles are also a bad sign.

Scaling an application is all about management of resources and administrative repeatability. This means using technology which your staff is extremely familiar with and which has been tested and proven to be reliable — and is designed to do the thing you want it to do. Hot new features are less important than consistent uptime without constant attention. More importantly, web technology usually makes big news while it's still brand new, which also means poorly documented, unstable, unable to integrate with other components, and full of bugs.

There's also another kind of trendiness to watch out for, it's the one which says, "If Google or Facebook does it, it must be the right choice." First, what's the right choice for them may not be the right choice for you, unless your applications and platform are very similar to theirs.

Second, not everything that Google and Facebook did with their infrastructures are things they would do again if they had to start over. Like everyone else, the top internet companies make bad decisions and get stuck with technology which is painful to use, but even more painful to migrate away from. So if you're going to copy something "the big boys" do, make sure you ask their staff what they think of that technology first.

No metrics

"Have we actually checked the network latency?"
"I'm sure the problem is HBase."
"Yes, but have we checked?"
"I told you, we don't need to check. The problem is always HBase."
"Humor me."
"Whatever. Hmmmmmm ... oh! I think something's wrong with the network ..."

Scaling an application is an arithmetic exercise. If one user consumes X amount of CPU time on the web server, how many web servers do you need to support 100,000 simultaneous users? If the database is growing at Y per day, and Z% of the data is "active" how long until the active data outgrows RAM?

Clearly, you cannot do any of this kind of estimation without at least approximate values for X, Y, and Z. If you're planning to scale, you should be instrumenting every piece of your application stack, from the storage hardware to the JavaScript. The thing you forget to monitor is the one which will most likely bring down your whole site. Most software these days has some way to monitor its performance, and software that doesn't is software you should probably avoid.

Despite this common-sense idea, a surprising number of our clients were doing nothing more sophisticated than Nagios alerts on their hardware. This means that when a response time problem or outage occurs, they had no way to diagnose what caused it, and usually ended up fixing the wrong component.

Worse, if you don't have the math for what resources your application is actually consuming, then you have no idea how many servers, and of what kind, you need in order to scale up your site. That means you will be massively overbuilding some components, while starving others, and spending twice as much money as you need to.

Given how many companies lack metrics, or ignore them, how do they make decisions? Well ...

Barn door decision making

"When I was at Amazon, we used a squid reverse proxy ..."

"Dan, you were an ad sales manager at Amazon."

In the absence of data, staff tend to troubleshoot problems according to their experience, which is usually wrong. Especially when an emergency occurs, there's a tendency to run to fix whatever broke last time. Of course, if they fixed the thing which broke last time, it's unlikely to be the cause of the current outage.

This sort of thinking gets worse when it comes time to plan for growth. I've seen plenty of IT staff purchase equipment, provision servers, configure hardware and software, and lay out networks according to what they did on their last project or even on their previous job. This means that the resources available for the current application are not at all matched to what that application needs, and either you over-provision dramatically or you go down.

Certainly you should learn from your experience. But you should learn appropriate lessons, like "don't depend on VPNs being constantly up". Don't misapply knowledge, like copying the caching strategy from a picture site to an online bank. Learning the wrong lesson is generally heralded by announcements in one or all of the following forms:

"when I was at name_of_previous_employer ..."
"when we encountered not_very_similar_problem before, we used random_software_or_technique ..."
"name_of_very_different_project is using random_software_or_technique, so that's what we should use."

(For non-native English speakers, "barn door" refers to the expression "closing the barn door after the horses have run away")

Now, it's time to actually get into application design.

Single-threaded programming

"So, if I monkey-patch a common class in Rails, when do the changes affect concurrently running processes?"

"Instantly! It's like magic."

The parallel processing frame of mind is a challenge for most developers. Here's a story I've seen a hundred times: a developer writes his code single-threaded, he tests it with a single user and single process on his own laptop, then he deploys it to 200 servers, and the site goes down.

Single-threading is the enemy of scalability. Any portion of your application which blocks concurrent execution of the same code at the same time is going to limit you to the throughput of a single core on a single machine. I'm not just talking here about application code which takes a mutex, although that can be bad too. I'm talking about designs which block the entire application around waiting on one exclusively locked component.

For example, a popular beginning developer mistake is to put every single asynchronous task in a single non-forwarded queue, limiting the pace of the whole application to the rate at which messages can be pulled off that queue. Other popular mistakes are the frequently updated single-row "status" table, explicit locking of common resources, and total ignorance of which actions in one's programming language, framework, or database require exclusive locks on pages in memory.

One application I'm currently working on has a distributed data-processing cloud of 240 servers. However, assignment of chunks of data to servers for processing is done by a single-process daemon running on a single dispatch server, rate limiting the whole cloud to 4000 jobs/minute and 75% idle.

An even worse example was a popular sports web site we worked on. The site would update sports statistics by holding an exclusive lock on transactional database tables while waiting for a remote data service over the internet to respond. The client couldn't understand why adding more application servers to their infrastructure made the timeouts worse instead of better.

Any time you design anything for your application which is supposed to scale, ask yourself "how would this work if 100 users were doing it simultaneously? 1000? 1,000,000?" And learn a functional language or map/reduce. They're good training for parallel thinking.

Coming in part 2

I'm sure you recognized at least one of the anti-patterns above in your own company, as most of the audience at the Ignite talk did. In part two of this article, I will cover component scaling, caching, and SPoFs, as well as the problem with The Cloud.

[ Note about the author: to support his habit of hacking on the PostgreSQL database, Josh Berkus is CEO of PostgreSQL Experts Inc., a database and applications consulting company which helps clients make their PostgreSQL applications more scalable, reliable, and secure. ]

Index entries for this article
GuestArticles	Berkus, Josh

(Log in to post comments)

Scale Fail (part 1)

Posted May 6, 2011 22:47 UTC (Fri) by glikely (subscriber, #39601) [Link]

Good article Josh, thanks. It's also fun that most of your observations apply to kernel hacking too. :-)

Single-threading not considered harmful

Posted May 7, 2011 2:10 UTC (Sat) by quotemstr (subscriber, #45331) [Link]

When the author admonishes us for using "single-threading", we naturally suppose that we should use "multi-threading" instead, but because a "multi-threaded program" is commonly understood to be composed of many shared-memory lightweight processes, following this advice will in fact tempt us to create programs that become expensive to scale. In a sense, multi-threading, not single-threading, is the true enemy of scalability.

In any massively parallel system, it's the communication between processing nodes that ultimately limits the size and performance of the system. When we express concurrency using multiple threads, we naturally use use the memory shared by these threads as the communication medium. But because shared memory scales poorly, the cost of using ever-larger coherent-memory systems quickly overwhelms any possible benefit.

Having run into this wall, we transition to a communication medium that scales much better, although (or because) it offers fewer features and less coherency compared to shared memory; examples include databases, clustered filesystems, and specialized message queues. After this expensive and painful process, costs again increase linearly with capacity: processing nodes can be spread across multiple machines instead of having to share a single increasingly powerful machine. Because the communication medium is no longer shared memory, the possibility of multiple threads sharing a single process becomes irrelevant, and we see that the work we invested in using this kind of threading was wasted.

So to avoid these ends, let's avoid these beginnings: avoid multi-threading. Use single-threaded programs, which are easier to design, write, and debug than their shared-memory counterparts. Instead, use multiple processes to extract concurrency from the hardware. Choose a communication medium that works just as well on a single machine as it does between machines, and make sure the individual processes comprising the system are blind to the difference. This way, deployment becomes flexible and scaling becomes simpler. Because communication between processes by necessity has to be explicitly designed and specified, modularity almost happens by itself.

I believe the author had these points in mind when he wrote his article, but by denouncing "single-threading", he risks sending some readers down an unproductive path. Concurrency is the ultimate goal, and it's usually achieved best by a set of cooperating single-threaded programs. The word "thread" refers to a concept that resides at a level of abstraction not appropriate for this discussion, and its use can only muddle our thinking.

Single-threading not considered harmful

Posted May 7, 2011 8:10 UTC (Sat) by NAR (subscriber, #1313) [Link]

Welcome to Erlang.

Single-threading not considered harmful

Posted Jul 4, 2011 11:40 UTC (Mon) by csamuel (✭ supporter ✭, #2624) [Link]

Or MPI which lets you span nodes and is frequently used for highly parallel High Performance Computing (HPC) codes.

It also works well within a single system; I've seen a particular HPC crash simulation code which came in both SMP and MPI variants and even within a single multi core system the MPI version scaled better than the SMP version.

Single-threading not considered harmful

Posted May 7, 2011 16:58 UTC (Sat) by Aliasundercover (subscriber, #69009) [Link]

My complaint with threading is correctness more than performance. With all resources shared between all threads the scope of an error is the whole application. Use more explicit sharing and the really dangerous errors get confined to the smaller portions of the program actually dealing with the sharing.

Performance scaling? Sure, shared memory doesn't scale to the largest jobs. For that you need methods that cross machines. Those methods don't scale to the smallest latencies. For that you need shared memory.

Still, I agree. Threading is too much the method of fashion more because everyone is doing it than technical merit. Of course everyone doing it means you get to use libraries other people wrote and spend less time swimming up stream.

Single-threading not considered harmful

Posted May 12, 2011 0:34 UTC (Thu) by jberkus (guest, #55561) [Link]

Yes, sorry for the confusion of terminology. "Parallel programming" is what I want to encourage. I'm a little unclear on what the official term for non-parallel programming is (is there one?) so I used "single-threaded".

Single-threading not considered harmful

Posted May 12, 2011 1:57 UTC (Thu) by Trelane (subscriber, #56877) [Link]

serial programming.

Single-threading not considered harmful

Posted May 12, 2011 15:47 UTC (Thu) by kamil (subscriber, #3802) [Link]

No, it's sequential programming.

Single-threading not considered harmful

Posted May 12, 2011 15:52 UTC (Thu) by Trelane (subscriber, #56877) [Link]

maybe it's both?
http://software.intel.com/en-us/blogs//2008/07/02/serial-...

Single-threading not considered harmful

Posted May 19, 2011 9:11 UTC (Thu) by renox (guest, #23785) [Link]

I think that 'single-threading' is correct the issue is that when we think about multi-threading we only think about multi-threading within one process but multi-processing is also multi-threading (because each process contains at least one thread).

Maybe multi/single-tasking would be better terms?

Single-threading not considered harmful

Posted May 22, 2011 10:47 UTC (Sun) by kplcjl (guest, #75098) [Link]

Actually, I don't see multi-threading as anything but handling single processes multiple concurrent times. It doesn't matter if you are talking about 5 services doing a single different thing in each or if a single service is setting up multiple threads to do the same process several times at once.

The problem is recognizing when a dedicated single process is the best answer to your problem or when multiple threads on the same server would be best. Generally, when your problem is data driven, the multiple thread solution would be better and when your problem is cpu intensive, a single thread dedicated to solving the problem combined with a data storage mechanizm so multiple servers can attack several individual problems concurrently may be the way to go.

This article seems to cover the situation where one of those solutions fit and therefore it becomes the "best" solution for every situation. It doesn't seem to apply to dumb coding mistakes that make good design go bad. For instance, I saw one case where it should have used "-" instead of "+" in one place of a mathematical equation. It took me a week and a half to convince the manager there was a mistake in the equation.

Single-threading not considered harmful

Posted May 22, 2011 11:08 UTC (Sun) by kplcjl (guest, #75098) [Link]

I misspoke, when I said "multiple concurrent times". On a single core nothing works that way. I was thinking of the OS scheduling the core's use to make it appear it is doing concurrent processing to the human onlooker as well as multiple cores doing true concurrent work.

Single-threading not considered harmful

Posted May 20, 2011 19:48 UTC (Fri) by mikernet (guest, #75071) [Link]

You can achieve both multi-threading and multi-process/multi-machine scalability, and should for a large application. A single thread serving many concurrent users will be very latent. Spinning up a seperate process for each user is incredibly resource heavy, processes are much heavier than threads.

Your application should be multi-threaded so that all the processor cores and idle time on any one machine are fully utilized. It should also be able to scale across machines so you can increase total processing capacity easily.

Single-threading not considered harmful

Posted May 20, 2011 22:44 UTC (Fri) by raven667 (subscriber, #5198) [Link]

I don't have any test data in front of me but I'm not sure your statement is entirely true. On Linux threads are not significantly lighter than processes because processes are extremely light weight. Both are scheduled the same in the kernel and with copy on write memory management there isn't that much difference for the in-kernel housekeeping between creating a thread or forking (cloning) a process.

On a modern multi-socket multi-core machine each socket is its own largely independent computer and the whole machine is a NUMA cluster. That means that each process is assigned to a particular node and that's where its memory lives, splitting memory between nodes or bouncing a process between different nodes reduces performance. My guess is that threading will scale weirdly when you get beyond what can be handled by all the cores in one socket whereas a multi-process model can keep more memory local to the socket the process is running on.

I would suggest starting with a multi-process model because you get better fault tolerance with memory protection then consider threading if that doesn't test out for concurrent performance

No Metrics

Posted May 7, 2011 3:09 UTC (Sat) by mlawren (guest, #10136) [Link]

Great article! Sadly everything you wrote rings very true. I've seen the No Metrics conversation go the other way as well:

"app: The network is broken."
"net: Hmmm... let me check. Nope, my measurements indicate everything is ok"
"app: Are you sure?"
"net: Yes, here is the data, see for yourself.
"app: But must be the network!!!"
"net: We haven't made any changes. Look at these historical graphs of usage. That spike from last week was the file-sharer who had that unfortunate accident in the carpark on Monday. Nothing difference since then."
"net: Have you checked your application?"
"app: Of course, it's running fine."
"net: How do you know?"
"app: I just know."
"net: When was the last time it ran fine?"
"app: Tuesday."
"net: What have you changed since then?"
"app: Only re-worked the dispatcher, and migrated the cache location to the other campus. But we checked, the code runs fine!"
"net: wtf??!?"
"net: Say, you drive that green Ford Focus don't you?"
"app: Yeah, why?."
"net: No reason. Your problem will be solved by tomorrow."

No Metrics

Posted May 7, 2011 19:20 UTC (Sat) by mstefani (guest, #31644) [Link]

Hah! "Blame the network" is industry best practice...

No Metrics

Posted May 9, 2011 4:36 UTC (Mon) by gdt (subscriber, #6284) [Link]

Whilst I've seen far too many database administrators blame "the network", it is also true that a lot of the points made by this article apply as equally to networks as to databases.

Particularly the lack of instrumentation, especially of problematic middleboxes such as application (de)accelerators and firewalls. Even basic monitoring is poor, link with application-performance-killing high errors rates often creeping under the radar of monitoring tools like Nagios.

It's rare to see routing designed with good choices and configured correctly. There's a simple tell-tale test: type in an unassigned IP address in the corporate network, does it error immediately or time out?

The poor state of corporate networks isn't helped by networking equipment vendors, who often ship equipment with near-essential settings off for "backward compatibility".

Finally, many sysadmins and applications are their own worst enemy. Using IP addresses rather than DNS names (they're going to regret that, come IPv6). Disabling ethernet autonegotiation. Assuming link layer connectivity for high-availability schemes. Refusing to deal with authentication and authorisation issues within the application, but pushing that into VLANs and VPNs, thus turning the corporate network into a flat layer two network, with resulting poor behaviour under fault conditions.

No Metrics

Posted May 9, 2011 8:39 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Yes. In my opinion, the lack of good IP-layer security is stunning. And treating local wired networks as 'trusted' is even more stunning.

I have seen many times internal services using HTTP with plain-text auth on the local networks - because administrators think it's 'secure'. Hell, probably everyone here is guilty of that.

Fortunately, situation is changing. With IPv6 it's possible to do end-to-end IPSec (which is not possible due to #(*$&(@$& NATs right now) and with DNSSEC it's possible to reliably store host certs in RDNS.

No Metrics

Posted May 9, 2011 12:43 UTC (Mon) by paulj (subscriber, #341) [Link]

You're assuming any coming IPv6 world will have near-universal end-to-end connectivity. Sadly, that's unlikely to be true.

No Metrics

Posted May 9, 2011 16:13 UTC (Mon) by raven667 (subscriber, #5198) [Link]

This may be off-topic but why do you say that? While it's true that we will still have just as many firewalls as before it should be easier to enable end-to-end connectivity and such connectivity is the default and harder to work around than not doing so.

Universal end-to-end nightmare

Posted May 19, 2011 18:39 UTC (Thu) by oelewapperke (guest, #74309) [Link]

This sadly is a good reason *not* to implement ipv6. Security is a problem, and NAT means universal firewalling.

Given how many security problems we have, and how quickly they get fixed ... this is sadly a good thing.

No Metrics

Posted May 9, 2011 18:03 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Probably not. With IPv4 it's a struggle to keep things globally adressable. Hey, I've just got my IPv4 PI allocation and happily renumbered my networks - now I can reach every host from every host. It's a heaven. But getting it was certainly not trivial.

With IPv6 it's exactly backwards - it's a struggle NOT to make your computers globally addressable.

No Metrics

Posted May 9, 2011 21:48 UTC (Mon) by paulj (subscriber, #341) [Link]

Globally addressable != global end-end connectivity, sadly. Lots (most, even) of people will still have restrictive firewalls between their network and the internet, and some people even want NAT for IPv6..

No Metrics

Posted May 10, 2011 8:18 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Firewalls can be disabled or reconfigured. NAT can't be disabled (there's not enough global IPv4 addresses) even in principle.

The changes will take years, so there'll be plenty of time for security to evolve. But we now have foundation for it.

No Metrics

Posted May 10, 2011 9:55 UTC (Tue) by paulj (subscriber, #341) [Link]

Many users will lack either the administrative or technical capability (or both) to reconfigure firewalls, sadly.

No Metrics

Posted May 10, 2011 11:50 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

That's the beauty of IPv6 it's open by default.

And corporate networks will benefit from end-to-end security most, so I expect that they'll migrate to IPsec even before home users.

No Metrics

Posted May 10, 2011 17:35 UTC (Tue) by dlang (guest, #313) [Link]

corporate users will suffer from open end-to-end connectivity most, including tunnels that make it impossible to see what's happening.

No Metrics

Posted May 10, 2011 17:41 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Not really. It's possible to sniff encrypted tunnels if you know the private key of one host (and enterprise admins probably would know it), it's even supported in Wireshark.

Besides, it's not like I can't make an HTTPS tunnel which can pierce all but the most paranoid firewalls right now. Skype does this, for example.

No Metrics

Posted May 10, 2011 21:37 UTC (Tue) by Tobu (subscriber, #24111) [Link]

Nitpick: that depends on the key exchange. Sniffing after a Diffie Helman requires the cooperation of one of the parties, and I don't think wireshark has support for this at the moment.

No Metrics

Posted May 10, 2011 21:45 UTC (Tue) by raven667 (subscriber, #5198) [Link]

*sigh* that is one thing that is probably true, some network operators will break their networks in the name of security making life difficult for the people who use them and that won't really protect anything because so much traffic is tunneled over 80/443 which is almost universally allowed.

No Metrics

Posted May 19, 2011 18:40 UTC (Thu) by oelewapperke (guest, #74309) [Link]

That's because it requires the "end" party to take the initiative. That's the beauty of NAT. Put an ancient totally unsupported bug-riddled system that every grandmother knows how to exploit remotely behind a nat firewall ...

And it's perfectly secure.

No Metrics

Posted May 23, 2011 4:24 UTC (Mon) by RobertBrockway (guest, #48927) [Link]

No it isn't. If that was true then most successful attacks today wouldn't even occur. For some time now the bulk of attacks have occurred over connections initiated by the end user system.

No Metrics

Posted May 11, 2011 18:36 UTC (Wed) by Baylink (guest, #755) [Link]

> That's the beauty of IPv6 it's open by default.

The problem with utopias is that it only takes *one* Bad Guy to fuck things up for the rest of us.

"That's not a feature, that's a bug."

No Metrics

Posted May 12, 2011 13:42 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

And closing down everything like we already do now does not help at all.

No Metrics

Posted May 19, 2011 18:46 UTC (Thu) by oelewapperke (guest, #74309) [Link]

Actually, given the amount of security holes ... and how secure a company is by default from remote (ie. they only get infected by surfing porn sites or opening suspicious mail) ...

It kinda does solve a lot of problems.

I mean, I hate nat just like the next guy. But you won't get anywhere by declaring it doesn't solve anything. You'll be just like gaia idiots screaming before the capitol to get America off oil, not realizing they're basically asking America to cut it's economy by 95% or more. Not going to happen (and it's a *good* thing we don't honor such requests)

NAT is a beautifully simple solution. And it is possible to modify just about any protocol to work with nat. I fear nat and ipv4 may be here to stay.

Certainly converting RIPE, APNIC and AFRINIC over to ARIN rules would give us another 10 years easily. Saying "an IP will cost you $0.01 per year" will get us another 100 years.

No Metrics

Posted May 19, 2011 19:11 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

A stateful firewall which simply blocks all incoming connections (i.e. a NAT setup minus the actual address and/or port translation) gets you all the security benefits of NAT without most of the hassle. As a bonus, if you want to run the same services on two or more servers they can each use their own addresses rather than competing for the standard port numbers.

Anyway, most home routers aren't much more secure with NAT, since they allow ports to be forwarded via UPnP requests. If you're running a server and opening forwarding ports with UPnP you might as well permit direct access; if not, blocking the connection at the server (because the port is closed) is just as effective as blocking it at the firewall. An effective firewall must be configured by the network administrator to accept or reject specific traffic, not simply permit incoming connections to any local server that asks politely while blocking the ones which would have been rejected anyway.

Blame the network

Posted May 9, 2011 5:35 UTC (Mon) by ringerc (subscriber, #3071) [Link]

"accounting-package-x doesn't perform terribly using a shared pseudo-ISAM file over a SMB network, it's great! If you're having problems, there's something wrong with your network..."

Suuuure, MYOB.

Scale Fail (part 1)

Posted May 7, 2011 3:27 UTC (Sat) by samth (guest, #1290) [Link]

Along similar lines, I saw the following excellent paper presented at SPLASH 2010, about a tool developed at IBM Research for diagnosing scaling problems.

Performance Analysis of Idle Programs
Erik Altman, Matthew Arnold, Stephen Fink, Nick Mitchell
https://researcher.ibm.com/researcher/view_project.php?id...

Scale Fail (part 1)

Posted May 9, 2011 7:56 UTC (Mon) by sgros (guest, #36440) [Link]

Interesting. Now I have two questions regarding this work:

1. Did they patented it? Can they patent it if they didn't already do so?

2. Is there some open source code that does something similar or that re-implements this functionality?

Scale Fail (part 1)

Posted May 15, 2011 11:51 UTC (Sun) by ramon (guest, #21845) [Link]

That might be a very interesting paper to read, but given that it is hidden behind a stupid paywall I don't think many people in the industry will notice it....

Scale Fail (part 1)

Posted May 18, 2011 7:46 UTC (Wed) by incase (guest, #37115) [Link]

Hidden behind a paywall?
I had no trouble finding this link: https://researcher.ibm.com/researcher/files/us-sjfink/res...

josh's clients aren't stupid, they're smart

Posted May 7, 2011 6:16 UTC (Sat) by b7j0c (subscriber, #27559) [Link]

(over)planning, (over)measuring, (over)architecting....to the "scale" consultant these are no brainers, but they have a huge cost, are hard to do right, and keep developers away from building features. a very-small (y-combinator style) startup, is much better served building out features first, and throw money and duct-tape at the problem until they absolutely cannot squeeze more performance out of their initial stack.

most small ideas will fail. most entrepreneurs know this. its stupid to try to build a service for a hundred million users when you can't attract ten thousand, and since small development groups are just learning about their domain, trying to scale out early will probably just result in throw-away code. don't worry about problems that aren't problems.

i think josh's clients are doing the right thing. first get something built that people want to use. push your stack as far as you can. if in the end you're succesful enough to afford the time to do a rearchitect yourself or pay someone else to do it, thats called success and its a good thing.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 9:11 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

The original boss of my current company used to say "Nice problem to have" about such things. But there are definitely places where this attitude caused us to make needlessly bad decisions.

You see, there are actually three ways things can go from a startup. It can crash and burn (doesn't matter what you planned, every dollar spent making it scale is now worthless), it can explode with success (everything that doesn't scale means incredible pain and expense) but it can also trundle along steadily growing by increments.

The latter is boring, not many books about it, no amazing slide show presentations with violent asymptotic charts. But it's probably reality for way more engineers. In this scenario there isn't enough money to solve problems by throwing money at them, you will run out.

And it's for this reason that you ought to do a little bit of thinking about the things Josh mentions on a new project. You don't have to solve every problem now, that leads to paralysis, but you need some awareness of which bits of the system will need work in six weeks, or in six months with the kind of steady growth that means there's no risk of unemployment but you also won't all be rich.

Smart clients would be coming to Josh because they actually do have a specific scalability problem in PostgreSQL, and they expect him to fix it so that they can avoid spending a lot of money on faster servers as a workaround. Paying a PostgreSQL consultant to tell you that you'd got a switch port running at 10 Half duplex is not smart.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 14:47 UTC (Sat) by b7j0c (subscriber, #27559) [Link]

josh should be so lucky as to have clients with problems as obvious as a misconfigured port switch, he shouldn't balk at the easy money. chances are most startups now are hosting on rackspace or ec2 or joyent and none of them ever even sees their servers or networking gear.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 12:53 UTC (Sat) by aggelos (guest, #41752) [Link]

While it is possible for someone to take such calculated risks, that is almost always /not/ the case. In order to do the calculation, you need to have at least some clue about the nature of technical problems than plague you.

Josh's examples make it quite clear that, those clients at least, use duct tape just because that's the limit of their abilities. I can't count the times I've heard "no time to do it right now, we'll deal with it when we make it" as a euphemism for "we're grossly incompetent and in any case I'd like to change the subject now". It is especially telling when a forced cut-all-corners implementation could be fixed in a couple of days, yet it never seems to make it into the TODO list until after it's caused downtime and/or major customer issues for a couple of months. Spending money based on What Idiotic Idea of the Week magazine suggested and ignoring blatantly obvious needs like reliable and verifiable backups also speaks volumes about whether such startups are actually making rational choices.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 14:27 UTC (Sat) by b7j0c (subscriber, #27559) [Link]

but if these people are truly incompetent or limited in their abilities as you say, then they're even smarter for recognizing the limits in their skills and bringing in someone smart like josh when they hit the wall. no one knows everything, but many of them will learn. calling for help is not stupid, and everyone misses obvious gotchas, including everyone here.

everyone wins here. josh gets paid and spreads some wisdom. the startup gets problems solved and learns something along the way.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 15:33 UTC (Sat) by aggelos (guest, #41752) [Link]

When you "hit the wall" it is not because you made a mistake here and there. It is because you've been piling workaround on top of workaround and have trained your tech staff to never, ever, actually investigate a problem, but fiddle with knobs and kick it until it "works". While still having no idea why it works. Calling for help when all that you know to do fails is not smart, at least not any smarter than shouting for help when you're drowning.

Unfortunately, even a thing as simple as building a scalable news site or web shop is not something so well understood that you can just set up web framework X and let it run, treating it like black box, tweaking it's parameters until it no longer fails and randomly buying more expensive hardware until the problems become less evident.

Management that a) does not know how to hire competent technical people b) actively avoids hiring competent (and probably more expensive) technical people c) consistently rewards people for finding a hacky workaround after an all-nighter instead of investigating and potentially fixing the actual problem, just deserves to fail. Bringing in 3rd-party consultants is not going to somehow fix their *actual* problems (which, as Josh notes, are managerial in nature). They will just keep "handling" issues the exact same way, spend more frustrating hours, lose even more viewers/customers and, of course, money. What's worse, they'll keep stressing their employees to do the impossible when things fail. If you make sure you have know idea how (let alone why) your system works, you can not fix any but the most trivial issues and definitely cannot outperform a competitor that has a clue.

Now, obviously, sucessful (not necessarily smart) business decisions might keep you going despite that, but that in no way vindicates your recklessness.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 17:40 UTC (Sat) by b7j0c (subscriber, #27559) [Link]

> Management that a) does not know how to hire competent technical people b) actively avoids hiring competent (and probably more expensive) technical people c) consistently rewards people for finding a hacky workaround after an all-nighter instead of investigating and potentially fixing the actual problem, just deserves to fail

you've just described 95% of the www, including the inauspicious starts of many of the top ten leaders in traffic.

having built part of a top-ten site (yahoo, and my contribution, yahoo news is still number one in its category, worldwide), i'm going to arrogantly state that most people here have a wholly unrealistic view of the forethought, planning, architecture and staffing of a rapidly growing service. i am 100% confident that if you poll individuals who built very high traffic sites from an early stage, you will find a group of individuals who are utterly honest about their inability to do it right the first time, or even know how to

josh's clients aren't stupid, they're smart

Posted May 7, 2011 22:38 UTC (Sat) by aggelos (guest, #41752) [Link]

It might be so, I'm not even interested on arguing on that since it's beside the point of this discussion. FWIW, you seem to imply something that is completely unbased, i.e. that ignorance and unwillingness to hire and take advantage of people with technical know-how is somehow related to success. Don't think I should bother to refute that.

Our discussion was on whether management in these companies is making an informed trade-off or if they feel safer stabbing in the dark, rather than going for the light switch. I don't think you've brought anything new to the table, other than your degree of confidence in your convictions which, I'm sure you'll agree, is not something one can argue against :-)

Besides, I'm pretty sure noone contested that it takes a lot of effort to build a highly scalable site, even if you do know what you're doing. Still not relevant to the discussion in this subthread though.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 23:03 UTC (Sat) by cmccabe (guest, #60281) [Link]

You bring up a good point. As engineers, we naturally tend to overestimate the importance of engineering and underestimate the value of things like marketing, good management, and intelligent business deals.

However, there are more sites out there that failed because of bad engineering than you think. For example, Friendster could have been what Facebook is today if they had been able to scale the site properly.

Over-planning and over-architecting are often problems, but I think most organizations tend to under-measure rather than over-measure. If you don't know what "the problem" is, it's lot harder to throw money at it to make it go away. And if you can't tell the good engineers and the good consultants from the bad, then you really are doomed, no matter how much you have in the bank.

P.S.
It is funny to hear Josh complain about stupid clients. After all, if they were smart, he wouldn't have a job.

josh's clients aren't stupid, they're smart

Posted May 7, 2011 23:51 UTC (Sat) by wahern (subscriber, #37304) [Link]

Technical merit, good management, and intelligent business deals aren't good predictors of success. If they were, and if those things were accurately and reliably quantifiable (which they must be to have any substance to them) then predicting success would be incredibly more common than it is.

The secret ingredient to success is pure luck. It's hard to swallow. If you disagree, then show me your billion dollar stock portfolio. The accuracy and reliability would only need to be moderately significant in order to extract obscene amounts of wealth from the market.

People who seem to be relatively good predictors of success, such as Warren Buffett, are incapable of articulating their thinking process. If they could articulate it, then there'd be many more copy cats. This suggests the possibility that the emergence of people like Warren Buffett is also a chance occurrence, and that whatever "it factor" they possess is entirely anomalous.

Hard work and fair play don't lead to success, they lead to marginal improvements in wealth and increasingly tolerable living conditions over long spans of time. That they lead to individual success is a noble lie we tell ourselves in order for society to extract those meager gains.

Now, hard work and fair play may lead to personal fulfillment. Maybe that's more important than any kind of economic wealth.

josh's clients aren't stupid, they're smart

Posted May 8, 2011 0:26 UTC (Sun) by cmccabe (guest, #60281) [Link]

> Technical merit, good management, and intelligent business deals aren't
> good predictors of success. If they were, and if those things were
> accurately and reliably quantifiable (which they must be to have any
> substance to them) then predicting success would be incredibly more common

First of all, there is a middle ground between things being completely random, and completely predetermined by known factors. Second of all, a lot of the most important predictors of success are things that we haven't learned to quantify yet.

Warren Buffet has articulated his stock market strategy many times. Do your research; only invest in things that you understand; invest for the long term.

> Now, hard work and fair play may lead to personal fulfillment. Maybe
> that's more important than any kind of economic wealth.

Is personal fulfillment "accurately and reliably quantifiable"? If not, by your own argument, it has no "substance" to it. Or maybe, just maybe, science hasn't learned to quantify a lot of the most important things in life.

josh's clients aren't stupid, they're smart

Posted May 8, 2011 1:13 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

We have psychometric tests for this stuff. So yes, we can tell whether people believe they are fulfilled, and for this purpose that's the same as knowing whether they are in fact fulfilled.

And yes, luck is a massive factor. Warren's strategy sounds good, but it's the same strategy lots of people have, without getting his success. We might as well listen to the anomalous 110 year old who tells us he puts it down to a long walk every afternoon. No doubt walking doesn't hurt, but it's not why he's 110, that's just blind luck.

Survivorship bias is a huge problem. If 500 people all pick one of twenty strategies at random, and all but one of the 500 fails, we would be wrong to assume that therefore the strategy chosen by that one person works and the other nineteen do not. But that's exactly what survivorship bias causes us to assume.

josh's clients aren't stupid, they're smart

Posted May 9, 2011 7:55 UTC (Mon) by cmccabe (guest, #60281) [Link]

Just for fun, here's a puzzle.

Are these digits random or not? And if not, what is the pattern?
69804177583220909702029165734725158290463091035903784297757265172087724

josh's clients aren't stupid, they're smart

Posted May 9, 2011 16:02 UTC (Mon) by dskoll (subscriber, #1630) [Link]

Of course those digits are random. So are these digits:

1111111111111111111111111111111111111111111111111111111111111111

(In other words: Your question is meaningless.)

josh's clients aren't stupid, they're smart

Posted May 10, 2011 6:17 UTC (Tue) by cmccabe (guest, #60281) [Link]

I'm just asking if there is an interesting pattern in the digits or not. That question isn't meaningless.

josh's clients aren't stupid, they're smart

Posted May 10, 2011 10:45 UTC (Tue) by dskoll (subscriber, #1630) [Link]

Of course it's meaningless. There's no such thing as a set of digits that are "random" or "non-random". You can take a sequence generator and run some statistical tests, but that doesn't prove anything. You can test a sequence of digits for compressibility, but that also doesn't prove anything. The digits in the decimal expansion of pi pass all kinds of statistical tests for randomness, but they are assuredly not "random".

As for a pattern, given a finite sequence of digits, you can construct any pattern you like. I could construct a degree-71 polynomial that fits the 71 digits you posted and say "Yes, that's the generator!"

josh's clients aren't stupid, they're smart

Posted May 10, 2011 14:20 UTC (Tue) by bronson (subscriber, #4806) [Link]

Wow. Did you miss the "just for fun" in cmccabe's original post? You must be a riot at parties.

josh's clients aren't stupid, they're smart

Posted May 12, 2011 1:33 UTC (Thu) by dskoll (subscriber, #1630) [Link]

Well, at the parties I go to, people don't usually open the conversation with "69804177583220909702029165734725158290463091035903784297757265172087724".

josh's clients aren't stupid, they're smart

Posted May 12, 2011 4:43 UTC (Thu) by bronson (subscriber, #4806) [Link]

That doesn't explain why why you feel a need to shout someone down here. Must everybody conform to your narrow idea of fun?

josh's clients aren't stupid, they're smart

Posted May 12, 2011 16:12 UTC (Thu) by dskoll (subscriber, #1630) [Link]

Chill out.

josh's clients aren't stupid, they're smart

Posted May 10, 2011 19:05 UTC (Tue) by cmccabe (guest, #60281) [Link]

> I could construct a degree-71 polynomial that fits the 71 digits you
> posted and say "Yes, that's the generator!"

That would be a pattern, but not an interesting one.

josh's clients aren't stupid, they're smart

Posted May 10, 2011 19:07 UTC (Tue) by cmccabe (guest, #60281) [Link]

Sorry, what I meant to say was, that polynomial would not represent a pattern in the sequence itself.

A non-interesting pattern would be something like the observation "this is a number!" or the observation that all digits from 0 to 9 occur.

josh's clients aren't stupid, they're smart

Posted May 12, 2011 11:39 UTC (Thu) by etienne (guest, #25256) [Link]

A more interesting pattern in PI would be this one:
http://kasmana.people.cofc.edu/MATHFICT/mf55-spoiler.html

josh's clients aren't stupid, they're smart

Posted May 19, 2011 11:03 UTC (Thu) by yeti-dn (guest, #46560) [Link]

Please go and learn about Kolmogorov complexity. The question is meaningful but your post, I am afraid, is not.

Just answering so I don't miss the answer

Posted May 10, 2011 21:25 UTC (Tue) by man_ls (guest, #15091) [Link]

It looks pretty random, but there are too few repeated consecutive digits: only three 77 and one 22. I give up, what is the pattern?

Just answering so I don't miss the answer

Posted May 10, 2011 23:05 UTC (Tue) by neilbrown (subscriber, #359) [Link]

71 digits, which is prime so grouping them is unlikely to help.

If you consider just the first 4 digits (0,1,2,3), then even digits occur 10 times each, odd digits 5 time each. However this pattern does not continue (4 and 8 also occur 5 times, but are even).

The longest gap between repeats is 24 between 2 '8's.
'6' and '4' see gaps of 23.

'6' is the least frequent digit, 7 is the most frequent (3 times as frequent)

Taken as a decimal integer the factors less than 100 are
2,2,3,3,23,23

Yet the whole number is not a perfect square.

My conclusion is that this is probably the "least uninteresting number" - very interesting....

josh's clients aren't stupid, they're smart

Posted May 11, 2011 6:00 UTC (Wed) by cmccabe (guest, #60281) [Link]

It's digits 99768 through 99838 from the base 10 decimal expansion of pi.

Every digit is completely predetermined, but it's pretty hard to spot if you don't know what you're looking for!

josh's clients aren't stupid, they're smart

Posted May 11, 2011 8:07 UTC (Wed) by ballombe (subscriber, #9523) [Link]

The only thing that is remarkable is that Google did not find it.

josh's clients aren't stupid, they're smart

Posted May 11, 2011 16:56 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

So in other words the answer was "No" ?

We suspect (as far as I remember no-one has proved) that all possible sequences of digits occur in Pi. Assuming this is so, the fact that a particular sequence occurs in Pi is not interesting at all.

josh's clients aren't stupid, they're smart

Posted May 11, 2011 20:08 UTC (Wed) by cmccabe (guest, #60281) [Link]

Even if all possible sequences of digits occur in Pi, the fact that a particular sequence occurs at a particular position can be "interesting". If nothing else, it would make for a pretty strange compression algorithm (although probably not a practical one for most input data, given how large the indices are likely to be.)

josh's clients aren't stupid, they're smart

Posted May 12, 2011 8:27 UTC (Thu) by ekj (guest, #1524) [Link]

On the average, the index of where in pi a certain sequence occurs, is as long as the sequence itself, thus the average savings of this compression-scheme is zero.

josh's clients aren't stupid, they're smart

Posted May 12, 2011 16:56 UTC (Thu) by fuhchee (guest, #40059) [Link]

I'm still missing the "fun" part.

What Do These Digits Mean?

Posted May 19, 2011 12:35 UTC (Thu) by ldo (guest, #40946) [Link]

#!/usr/bin/python

import sys

charset = ' Wadefhlnost'
modulo = 13
s = 13682311570832829480888979137834570837851469148689544502986
num = iter(range(2, 9999))
while s != 1 :
        n = num.next()
        if s % n == 0 :
                sys.stdout.write("%s" % charset[(n - 1) % modulo])
                s /= n
        #end if
#end while
sys.stdout.write("\n")

What Do These Digits Mean?

Posted May 26, 2011 12:20 UTC (Thu) by net_benji (subscriber, #75195) [Link]

All 23 prime factors of s are of order 1, I think that's the gist of it.
http://www.wolframalpha.com/input/?i=factor+1368231157083...

Here are more working combinations:
charset = "hetfn adlsoW"
s = 7062883793966047784250125868403644804557392731274644684033611
charset = "tslofWe dnha"
s = 1029201132023087388381452825147240668180384494509643451351
...
and here's a different one, just for the fun of it:
charset = "wfyonirvmD s?ecutdpah"
modulo = 23
s = 79 * 211 * 241 * 463 * 487 * 499 * 563 * 571 * 673 * 787 * 911 * 977 * 991 * 1039 * 1249 * 1483 * 1489 * 1493 * 1601 * 1621 * 1697 * 1699 * 1889 * 2243 * 2311 * 2347 * 2459 * 2521 * 2719 * 2909 * 2953 * 3119 * 3271 * 3323 * 3359 * 3533 * 3733 * 3947 * 3967 * 4057 * 4177 * 4283 * 4289 * 4597 * 4651 * 4733 * 4933 * 4969 * 5021 * 5087 * 5261 * 5281 * 5347 * 5399 * 5449 * 5557 * 5641 * 5711 * 5807 * 5869 * 5981 * 6353 * 6359 * 6389 * 6569 * 6701 * 6791 * 6823 * 6983 * 7187 * 7309 * 7321 * 7481 * 7529 * 7673 * 7873 * 7901 * 8017

The source for the generator is there: https://github.com/benthaman/lwn-digits/blob/master/encod...

Hope Jon won't mind posting this nonsense here ;)

josh's clients aren't stupid, they're smart

Posted May 8, 2011 1:37 UTC (Sun) by wahern (subscriber, #37304) [Link]

> Warren Buffet has articulated his stock market strategy many times.
> Do your research; only invest in things that you understand;
> invest for the long term.

If you do those things in his books at best you'll approximate the market and not lose your shirt. And the model that best reproduces historical market behavior is a random walk. Overall market efficiency is an emergent effect from random processes, not the result of smart people being exceptional.

Look at the long line of successors Buffett has tried to groom. None could ever match his acumen by exercising his supposed methodology. Some have come close but inevitably make one or two wrong bets and their long-term gains fall far short of Buffett's. It's like gamblers who would have walked away with the bank but "bad luck" ruined their "streak". They disown the bad luck but claim the winning streak as the product of some personal attribute.

That's not to say that there aren't predictable ways to make money in the market, but they involve placing yourself in certain structural niches within the market. How you rise to those positions also seems to have little to do with intelligence or acumen by comparison with your peers. Where or how those structural market niches arise are unpredictable. Facebook didn't do anything different except chance upon the emergence of a niche at the right time.

The problem with defining success is that people look at the big winners and obvious failures and compare and contrast. If you see an impoverished alcoholic bum on the street and compare him to Buffett, you might conclude that success entails not letting alcohol control your life.

But once you have all of those DON'T DOs (i.e. how not to fail) you need to find the DO DOs. But even after you've enumerated all the DOs, you can find thousands or even millions of people who DO all the same things yet never succeed. There's a difference between not being a failure, and being successful, unless swimming instead of sinking is worthy of special praise. Success as flowing from individualized merit, therefore, is an illusion. In other words, just because you're successful doesn't mean you've done anything different than millions of other people. You're not special; you're just lucky by comparison. So there's little to be gained by analyzing and investigating what's "unique" about very successful people or companies. Anything you think is unique is either anomalous or not reproducible, and so liable to lead you astray from focusing on not failing. Focusing on not failing typically means making your product as good as necessary under the circumstances, not as good as possible.

You can certainly write a book about how to live a middle-class life, or how not to run a company into the ground. But no book will ever be able to describe how to become a millionaire, how to find the next great concept, or how to ensure that great idea will come to spectacular fruition by proper technical execution. Just as was stated earlier, many--perhaps most--of "successful" websites are ugly as hell behind the curtains. They were doing just enough to stay afloat, just like everybody else (on average), and serendipitously found themselves in charmed waters.

josh's clients aren't stupid, they're smart

Posted May 8, 2011 8:12 UTC (Sun) by cmccabe (guest, #60281) [Link]

If the stock market were completely random, it would be completely useless. The whole reason for having a stock market is to allocate money to companies that will use that money in productive ways. We've already tried replacing markets with central planning committees-- it was called communism. It didn't work.

Obviously, markets have a lot of problems and inefficiencies. A lot of mutual fund managers are overpaid for the small amount of research that they do. Sometimes the market over-allocates resources to one sector, like housing. But just because the market has inefficiencies, or Warren Buffet is having trouble finding a successor, doesn't somehow indicate that the market is completely random.

Despite your professed love for quantifiable data, you haven't really presented that much hard evidence for me to argue against. What I would like to argue against is your fatalistic attidue. You seem to view yourself as a helpless pawn of all-powerful external forces.

Ironically, optimism-- or at any rate, the will to keep on going-- is one of the best predictors I know of success-- and not just in business. If you look at any open source leader, any entrepreneur, successful politician-- whatever-- you will find that one thing they have in common is that they believe in themselves and their abilities.

One book you might be interested in is "The Drunkard's Walk" by Leonard Mlodinow, a physicist at Caltech. In it, he puts forth some pretty convincing arguments that life is more random than we think. Mlodinow has a statistician's focus on hard data and it's pretty interesting. The performance of mutual fund managers, the outcome of the world series, and the decisions of Hollywood producers are all revealed as being more random than we might think. He also takes a swing a Bill Gates, claiming that he was just in the right place in the right time. Personally, I disagree-- I think Gates would have been at least moderately successful in any place and time. Anyway, you might enjoy the book.

josh's clients aren't stupid, they're smart

Posted May 8, 2011 22:37 UTC (Sun) by ibukanov (subscriber, #3942) [Link]

> Ironically, optimism-- or at any rate, the will to keep on going-- is one of the best predictors I know of success-- and not just in business.

This is not what I have observed. The most successful in business people I know personally are pessimists. In one particular case the pessimism made the guy to prepare for the worst. That allowed for him to survive bad times and gain from the good times disproportionally.

Another example is a very optimistic guy who quieted his job as an oil engineer and went to Thailand to start a shrimp farm. It was rather successful initially. But one day a drunken worker from the factory got into a conflict with local mobs and they burn the farm down... Now the guy is back into his engineering position.

josh's clients aren't stupid, they're smart

Posted May 9, 2011 16:08 UTC (Mon) by raven667 (subscriber, #5198) [Link]

Are you sure you aren't arguing against your point? In the examples you gave you have one person who is successful because they had the will to keep going and were prepared, which you call pessimism and one where the person was not successful when they quit after the first setback which you call optimistic.

Success is just like cancer, you can get (un)lucky but if you keep hitting those carcinogens you improve your chances of catching it, although not guaranteed either way.

josh's clients aren't stupid, they're smart

Posted May 9, 2011 21:19 UTC (Mon) by ibukanov (subscriber, #3942) [Link]

> successful because they had the will to keep going

In the first case it was not the will but rather the cash that the company saved during the good times under assumption that those would not last. With that it was not problematic to carry on during the bad times. And when competitors that hired too many engineers as consultants for the oil industry went burst it was trivial to pick up their clients. (In Norway it is rather hard to fire a person on a permanent position so a bankruptcy often is the simplest solution for a small company.)

> the person was not successful when they quit after the first setback

Only a very optimistic person would invest a few years of savings into something he had only vague idea about. And that fire on the shrimp farm was not a setback, it was a total disaster. As far as I know the farm was not insured against that type of damaged and the guy simply had no money to try one more time.

Note that I am not arguing that pessimism is required for a business success. I just want to say that even if optimism could be a necessary trait for some kinds of entrepreneurship, it may lead to failure just as well as to success.

josh's clients aren't stupid, they're smart

Posted May 14, 2011 14:31 UTC (Sat) by jjs (guest, #10315) [Link]

I highly recommend John Bogle's books, including "Enough" and "Don't Count on It". He's been a professional investor since the 1950's, founded Vanguard, and those books I've read include graphs & numbers illustrating his points. Main point - trying to outguess the market is a loser's game (at best over the long term you match the market, but when disintermediation is taken into account, you lose money). Over the long term, most money managers make lots of money on transaction fees, but their customers lose.

Another good book is "The Black Swan" by Nassim Nicholas Taleb.

josh's clients aren't stupid, they're smart

Posted May 14, 2011 14:40 UTC (Sat) by jjs (guest, #10315) [Link]

One other key point - the separation of the stock market into the economic and emotional sides - and overall economy grows with the economic part, bubbles and crashes are mainly caused by the emotional investment overinvesting/underinvesting as people chase this week's numbers, not the long term health.

josh's clients aren't stupid, they're smart

Posted May 12, 2011 9:30 UTC (Thu) by ekj (guest, #1524) [Link]

It depends on how you define "success".

If you mean, do a lot better than most people who peruse the same activity, then it's almost tautological that there's no recipee for it. If there was a simple and unambigious way of outperforming the general market as an investor, for example, then all investors would be using technique, thus negatings it's effectiveness. It's matemathically obvious that investors as a group, will not be able to beat the market. Thus one person can only beat the market to PRECISELY the same degree as another underperforms the market.

But there's other ways of defining success.

There's things that on the average *do* lead to goals that people desire. It's just that it's hard to do them. If you can manage to do them though, then achieving those goals become likely. I'd call that success.

Saving 10% of your income in a low-cost index-fund will, with high probability, give you a very solid financial position in a decade or two. (but not make you a billionaire)

You're just saying, it seems to me, there's no simple, predictable, repeatable way of doing the exceptional. That's almost tautological, because if there where, it'd no longer -be- exceptional.

josh's clients aren't stupid, they're smart

Posted May 14, 2011 1:41 UTC (Sat) by rgmoore (✭ supporter ✭, #75) [Link]

Warren Buffet has articulated his stock market strategy many times. Do your research; only invest in things that you understand; invest for the long term.

That isn't an investment strategy; it's a short list of how to avoid common investment mistakes. Yes, you need to put your money into businesses you know and understand, but the sticking point is understanding business and knowing what to research when you're doing your homework. It's the knowledge necessary to do those things that makes Buffet an investment genius, and that detailed knowledge can't be encapsulated into a few pithy rules.

As investment advice those rules about as useful as Babe Ruth's advice on hitting a baseball: wait for a good pitch and hit it. It's good advice as far as it goes, but it's useless without the talent and skill to put it into practice. Similarly, it's great to come up with a list of best coding practices, but they're no replacement for judgment, skill, and experience in programming.

josh's clients aren't stupid, they're smart

Posted May 12, 2011 0:43 UTC (Thu) by jberkus (guest, #55561) [Link]

<quote>P.S.
It is funny to hear Josh complain about stupid clients. After all, if they were smart, he wouldn't have a job.</quote>

I'd still rather get paid to solve interesting problems than stupid ones.

And what's really frustrating to me as a consultant is to put dozens of hours into a real solution for a client's scalability issues, only to have my work discarded because of management problems. Whether I get paid or not.

josh's clients aren't stupid, they're smart

Posted May 12, 2011 8:17 UTC (Thu) by ekj (guest, #1524) [Link]

True. I can state from my own personal experience that the first architectures of MediaWiki where -horribly- far from being done "right" for scaling to anything even remotely resembling the level of scaling needed today.

Infact, it's -still- not done anywhere near "right", instead the limitations are worked-around.

josh's clients aren't stupid, they're smart

Posted May 12, 2011 8:55 UTC (Thu) by ssmith32 (subscriber, #72404) [Link]

You mean run your company like Friendster? Or, as you mentioned, companies in the y combinator portfolio, i.e., at best, very trendy and not very profitable?

Sounds like a great recipe for 100% Pure San Francisco/Peninsula VC Kool-Aid. Yum! :P

More seriously, do you have, at the very least, of list of notable successful business cases that illustrate how your principles work to create profitable, sustainable, and growing businesses?

josh's clients aren't stupid, they're smart

Posted May 22, 2011 22:32 UTC (Sun) by rodgerd (guest, #58896) [Link]

All the world is not a San Fran startup. I realise this might be hard to believe, but it's true. A large part of the world is, in fact, businesses who have already solved the "having customers" problem, and need to solve the "rolling out enhancements and new products without doing so in a fashion that results in no customers" problem.

Scale Fail (part 1)

Posted May 7, 2011 15:30 UTC (Sat) by RogerOdle (subscriber, #60791) [Link]

Sometimes the bad decisions are unavoidable. A recurring problem I have seen is customers that see something cool in the movies and want it now. They talk to senior management that rose out of the sales/marketing group and have only a vague idea about technology or what they really know is at least 10 years old. Next thing you know, management says you must use trendy sexy useless X to build the product so that they can tell the customer that "We are on top of X". So you break your nice but not sexy application by trying to find a place for X in order to please management who is trying to please a favorite client.

Scale Fail (part 1)

Posted May 7, 2011 17:05 UTC (Sat) by huayra (guest, #58915) [Link]

Of course... Instead of Squid as reverse proxy, they have been using Varnish Cache ;-)

Scale Fail (part 1)

Posted May 8, 2011 1:27 UTC (Sun) by dps (guest, #5725) [Link]

Using vasy numbers of threads is almost always a bad idea. Writing the program is much harder and the performance is likely to poor due to excessive amounts of context switching.

I am one of the very few people that has formally analysed parallel systems in anger. The difficultly of doing this indicates that it worth sacrificing some concurrency for a simpler design. Shared data is expensive to implement on distributed systems and good for writing 10+ pages of mathematics about a trivial 5 line program.

If a task can be broken down into pieces that are more or less independent then threads are probably a good idea: pools of I/O slaves are effective for some I/O intensive applications and data parallel programming works well for large numerical problems.

There are good mathematical reasons for a job list to work well, based on systems which fail at random intervals. This analysis assumes the cost of accessing the list is negligible compared to the jobs. If that is not true then it is an inappropriate design.

I think the idea that functional programming language are good training for parallel thinking is completely wrong. These languages, like ML, do not feature any control flow whatever. Anybody that attempts to analyse the evaluation strategy will be punished (and probably wrong).

Scale Fail (part 1)

Posted May 11, 2011 14:27 UTC (Wed) by oct (subscriber, #71481) [Link]

> These languages, like ML, do not feature any control flow whatever.
Bullshit. All MLs have strict evaluation strategy by default.

A niggle about linearity

Posted May 9, 2011 18:24 UTC (Mon) by davecb (subscriber, #1574) [Link]

You write "Scaling an application is an arithmetic exercise. If one user consumes X amount of CPU time on the web server, how many web servers do you need to support 100,000 simultaneous users?"

And it is arithmetic (linear), until it isn't.

When you hit a saturation point, the response time stops growing linearly with load, and increases insanely. The response time heads skyward like a homesick angel, and even the sysadmin thinks it's hung. If you plot it, you get a curve that looks a lot like a hockey-stick, with a short horizontal blade, a nice gentle bend... and a long straight handle that just keeps going up.

You add linear amounts of resources to get it back under control, of course, so the cure is arithmetic. The response time and the customer response, however, are hyperbolic (;-))

--dave

A niggle about linearity

Posted May 12, 2011 0:45 UTC (Thu) by jberkus (guest, #55561) [Link]

Yeah, performance problems are generally thresholded. That is, they're linear until they hit a limit, and then things fall apart.

However, you can get an estimate of when you're going to hit those thresholds with some fairly simple arithmatic. It's a shame more people don't try.

A niggle about linearity

Posted May 13, 2011 2:59 UTC (Fri) by raven667 (subscriber, #5198) [Link]

I agree, I wish more people understood databases and storage IO at least as well as many understand network IO. The recent rediscovery of buffer bloat and latency for example hasn't seemed to happen for storage, people talk like MB/s is the only stat that matters when it is often the least interesting.

I've struggled with this kind of issue in the past when trying to understand the performance issues and needs of a large in-house that I supported for many years. I got it wrong many times and the simple estimations that might have helped only look simple in retrospect. There is a lot of pressure to treat databases and storage as a black box until you ask more from it than it can give.

A niggle about linearity

Posted May 13, 2011 13:13 UTC (Fri) by andrewt (guest, #5703) [Link]

Be careful, as CPU utilization does not always correlate to work done. In fact, CPUs with hyper-threading have quite a surprise when they cross the 50% utilization point -you might get 25% more transactions as the CPU goes to 100%, and not another 100% transactions as one would expect from simple arithmetic. Even without hyper-threading, there's enough other things to bust the whole linearity thing like cache warmth, etc.

A niggle about linearity

Posted May 13, 2011 20:05 UTC (Fri) by dlang (guest, #313) [Link]

that only works if you know what those thresholds are.

in many cases they are not where you expect them to be (the hyperthreaded cpu utilisation is one example, locking overhead with multiple processors is another)

frequently there are factors in play that you don't know about, and the result is that until you test it to a particular load, you have no way of knowing if the system will reach that load.

interpolation (guessing how things work between measured points) is fairly reliable

extrapolation (guessing how things will work beyond measured points) is only reliable until some new factor shows up.

A niggle about linearity

Posted May 22, 2011 22:29 UTC (Sun) by rodgerd (guest, #58896) [Link]

Q: Why is our new version of $APPLICATION using a ton more CPU on the database?

A: Because when I run the new query you put in from a simple script, it uses a whole CPU and takes a second. Your performance limit is $NUMCPU/second, at best.

Q: [goes away to redo query]

Scale Fail (part 1)

Posted May 14, 2011 13:01 UTC (Sat) by vachi (guest, #67512) [Link]

Don't feel so safe just because your admins have monitoring process... It is the thing you did not monitor that will come to bite you.
(disclaimer: I'm app dev. I always have an axe to grind about admins :-)

A few months back, our app abruptly slowed down, and sometimes seemed to hang from user point of view. App team talked to admins whether any strange sign in app server or DB server box. Admins defiantly declared that the servers are all green. CPU went over threshold a few times, but that was normal on peak day. A lot of free memory. It must be lousy app that was the problem.

After a lot of frustration and investigation, app team found out that one of the resources inside DB was configured way too low, and on that fateful day (4 years since setup) the resource was used up and things blow apart. Admin said they did not even know how to monitor that resouce consumption...

Latency and optimizing SQL queries

Posted May 25, 2011 4:52 UTC (Wed) by swmike (guest, #57335) [Link]

People generally don't seem to grasp latency as a problem. Most applications are developed on a gig-connected server in the basement which has virtually no latency to the database server. When you create an application that queries and gets a list, and you then do one query per item in the list, this is not a problem in dev testing.

Now, have the database server further away from the client doing the queries, and you now have a huge problem. Spending a bit more time on the SQL query and do everything in a single query would solve this problem, but from looking at the network traffic in most client/server applications, this seems like a lost art.

In your dev testing, insert a box that induces 500ms latency between client and server and check that the application still runs fine. If it doesn't, redesign.