Tuesday, May 15, 2012

Twitter on Scala

A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners

Summary
Three Twitter developers, Steve Jenson, Alex Payne, and Robey Pointer, talk with Bill Venners about their use of Scala in production at Twitter.
Twitter is a fast growing website that provides a micro-blogging service. It began its life as a Ruby on Rails application, and still uses Ruby on Rails to deliver most user-facing web pages. But about a year ago they started replacing some of the back-end Ruby services with applications running on the JVM and written in Scala. In this interview, three developers from Twitter—Steve Jenson, system engineer; Alex Payne, API lead; and Robey Pointer, member of the service team—sit down with Bill Venners to discuss Twitter's real-world use of Scala. They describe the production issues that led them to consider Scala in the first place, what issues they ran into using Scala in production, and how Scala affected their programming style.

A quick look at Twitter



Bill Venners: What is Twitter, and what in its technical history led you to consider Scala?
Alex Payne: Twitter is a communications service that allows people to share information in 140 characters or less. You can share information from your phone, from a web browser, or from one of many API clients that are out there, for just about every operating system, mobile platform, or web platform. Basically, if you want to share a short thought, one to many, Twitter is a transport-independent way to do that. In a broader technical sense, we see ourselves as a short messaging layer for the internet. We’ve been described as a “telegraph for web 2.0.”
One of the things that’s core to our business is providing open APIs for everything you can do on the website. So all the functionality that’s available there for users is also available for developers to access programmatically. That’s Twitter in a nutshell.
Twitter started as a hack project at a company called ODEO, which was focused on podcasting. As ODEO was having some troubles in its latter days as a company, they started experimenting, to keep engineers involved by letting them play around with ideas they had on the side. One of the engineers, Jack Dorsey, had been really interested in status. He was looking at his AIM buddy list, and seeing that all of these guys were saying, “I’m walking the dog,” “I’m working on this,” “I’m going to that.” He wondered if there was some way to make it easier for people to share that status. So he and a couple other engineers started prototyping what became Twitter on Ruby on Rails, which was the stack that ODEO was built on. And Twitter continues today to be primarily a Rails application, with a bunch of Ruby daemons doing asynchronous processing on the backend.
Over time we found that although Rails works great for doing front-end web development, for doing heavy weight back-end processing, Rails had some performance limitations at runtime. And I think that—and this is more my personal opinion—the Ruby language lacks some things that contribute to reliable, high performance code, which is something we’re very interested in as we’re growing as a business. We want the code we write to be correct and maintainable. We want to keep our costs down—all the things most businesses want out of their stack. So that’s why we started looking at Scala.
The other big reason we looked at Scala was that, although we’ve run into problems with Ruby, we like the flexibility of the language. We like that it’s such a full featured language, that it’s fun to code in. It’s the same reason so many Java people end up writing Ruby after they leave some big enterprise company. They want to have fun day to day. We didn’t want to leave that behind and go to a language with a very dry, businesslike community, like C++, for example. We know that people write super high performance code in C++, and engineers like Steve and Robey have had experience with that. But we wanted to be using a language that we’re really passionate about, and it seemed worth taking a gamble on Scala.

Reliable, high performance code



Bill Venners: I’m curious, and the Ruby folks will want it spelled out: Can you elaborate on what you felt the Ruby language lacked in the area of reliable, high performance code?
Steve Jenson: One of the things that I’ve found throughout my career is the need to have long-lived processes. And Ruby, like many scripting languages, has trouble being an environment for long-lived processes. But the JVM is very good at that, because it’s been optimized for that over the last ten years. So Scala provides a basis for writing long-lived servers, and that’s primarily what we use it for at Twitter right now. Another thing we really like about Scala is static typing that’s not painful. Sometimes it would be really nice in Ruby to say things like, here’s an optional type annotation. This is the type we really expect to see here. And we find that really useful in Scala, to be able to specify the type information.
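A minimal sketch of what "static typing that's not painful" can look like in Scala. The Status type and the shorten function are hypothetical names invented for illustration, not Twitter code: types are written once at the function boundary and inferred elsewhere.

```scala
object TypingSketch {
  // Hypothetical type, invented for this illustration.
  final case class Status(user: String, text: String)

  // Parameter and result types are declared once, at the boundary.
  def shorten(status: Status, limit: Int): Status =
    status.copy(text = status.text.take(limit))

  // No annotations here: the compiler infers List[Status] and List[Int].
  val statuses = List(Status("alice", "walking the dog"), Status("bob", "working"))
  val lengths = statuses.map(s => shorten(s, 7).text.length)

  // An annotation can still be added where it documents intent.
  val longest: Int = lengths.max
}
```

Passing anything other than a Status and an Int to shorten is rejected at compile time rather than discovered at run time.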
Robey Pointer: Also, Ruby doesn’t really have good thread support yet. It’s getting better, but when we were writing these servers, green threads were the only thing available. Green threads don't use the actual operating system’s kernel threads. They sort of emulate threads by periodically stopping what they are doing and checking whether another “thread” wants to run. So Ruby is emulating threads within a single core of a processor. We wanted to run on multi-core servers that don’t have an infinite amount of memory. And if you don’t have good threading support, you really need multiple processes. And because Ruby’s garbage collector is not quite as good as Java’s, each process uses up a lot of memory. We can’t really run very many Ruby daemon processes on a single machine without consuming large amounts of memory. Whereas with running things on the JVM we can run many threads in the same heap, and let that one process take all the machine’s memory for its playground.
Alex Payne: I’d definitely want to hammer home what Steve said about typing. As our system has grown, a lot of the logic in our Ruby system sort of replicates a type system, either in our unit tests or as validations on models. I think it may just be a property of large systems in dynamic languages, that eventually you end up rewriting your own type system, and you sort of do it badly. You’re checking for null values all over the place. There’s lots of calls to Ruby’s kind_of? method, which asks, “Is this a kind of User object? Because that’s what we’re expecting. If we don’t get that, this is going to explode.” It is a shame to have to write all that when there is a solution that has existed in the world of programming languages for decades now.
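To illustrate the contrast Payne draws, here is a small sketch in Scala. The User type and the lookup table are assumptions made up for this example. The parameter type replaces a kind_of?-style check, and Option replaces scattered null checks.

```scala
object UserLookupSketch {
  // Hypothetical model and data, invented for this illustration.
  final case class User(id: Long, screenName: String)
  private val users = Map(1L -> User(1L, "alice"))

  // Option makes "might be absent" part of the signature.
  def find(id: Long): Option[User] = users.get(id)

  // The parameter type guarantees a User arrives; no runtime type check needed.
  def greeting(user: User): String = s"Hello, @${user.screenName}"

  // The caller must handle the missing case explicitly.
  def greetingFor(id: Long): String =
    find(id).map(greeting).getOrElse("No such user")
}
```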

Complementing Ruby with Scala



Steve Jenson: We find Ruby and Scala are very complementary. We use Ruby, actually specifically Rails, for things that it is very strong at. All the front end stuff that it does very well.
Bill Venners: What do you use Scala for?
Robey Pointer: We had a Ruby-based queueing system that we used for communicating between the Rails front ends and the daemons, and we ended up replacing that with one written in Scala. The Ruby one actually worked pretty decently in a normal steady state, but the startup time and the crash behavior were undesirable. It was a little too slow and memory intensive. Sometimes our peak loads would knock it out. And when it got knocked out, it was very slow to recover, which is not what we wanted. We wanted something that could handle the edge cases and the high load, maybe not as easily as a regular load, but with relative ease.
Bill Venners: What did the daemons do?
Robey Pointer: A lot of our architecture is based on letting Rails do what it does best, which is the AJAX, the web front ends, the website—what the user sees. Anything we can offload out of the request/response cycle, we do. So we queue those tasks into a messaging system and have back-end daemons handle them.
Steve Jenson: For example, if you make a change to your social graph (i.e., you follow or unfollow someone on Twitter), all of that work and the associated cache invalidations are done asynchronously by a daemon.
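The general shape of offloading work from the request/response cycle can be sketched with an in-process queue and a worker thread. This is only an illustration under stated assumptions: Twitter's setup used a separate queueing system between Rails and the daemons, and the Job types below are hypothetical.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ListBuffer

object QueueSketch {
  // Hypothetical job types, invented for this illustration.
  sealed trait Job
  final case class Follow(follower: String, followed: String) extends Job
  case object Shutdown extends Job

  // Enqueue jobs as a request handler might, and let a worker drain them.
  def run(jobs: List[Job]): List[String] = {
    val queue = new LinkedBlockingQueue[Job]()
    val log = ListBuffer.empty[String] // written only by the worker thread

    val worker = new Thread(() => {
      var running = true
      while (running) {
        queue.take() match { // blocks until a job is available
          case Follow(a, b) => log += s"$a follows $b"
          case Shutdown     => running = false
        }
      }
    })
    worker.start()

    jobs.foreach(job => queue.put(job)) // the "front end" only enqueues
    queue.put(Shutdown)
    worker.join() // wait for the worker; join also makes its writes visible
    log.toList
  }
}
```

The enqueueing side never does the slow work itself, which is the point of taking tasks out of the request path.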
Bill Venners: Did you consider JRuby?
Alex Payne: We did. At the time we looked into it, we simply couldn't boot our Rails app on JRuby. Too many of the Ruby Gems we make use of require C extensions, and haven't been ported to JVM-friendly versions. The performance of JRuby was also not even on par with MRI (the C implementation of Ruby), much less a language like Scala. We're open to trying out JRuby again in the future, but we're also hoping that some Ruby patches will help in the meantime.

Tradeoffs with Scala



Bill Venners: You’ve had real experience with Scala, using it to solve real problems. What tradeoffs did you find with it? What were the problems? What were the good things? What were the bad things?
Steve Jenson: I think it worked remarkably well for us. A lot of us have had experience programming in languages that were more research-oriented, and those tend to have a lot of problems when trying to productionize systems. But we didn’t really run into a lot of those issues with Scala. We would run into a few issues, with newer parts of the system. I know we ran into some issues with actors and high scalability, but we were able to work around those. Generally, it’s been a very performant and stable system for us.
Robey Pointer: I would agree that the problems have been very minimal so far. Some of it was just the newness of the language and compiler. We occasionally ran into compiler errors that were mystifying at first, and it took a little time to figure out what the actual error was. Some of the core collection libraries in Scala are not quite up to snuff yet. And apparently they are working on that right now.
Bill Venners: Not up to snuff in what way? They don’t work? They’re not fast?
Robey Pointer: I never had a problem with them not working, but a couple of the methods were not written in a particularly performant way, or there were some gaps in the API. In some cases we just decided to burrow down and use the Java collections from Scala, which is a nice advantage of Scala, that we have that option.
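As a sketch of the option Pointer mentions, a Java collection can be used directly from Scala code. The countWords function is a hypothetical example. The converter import shown is the one in Scala 2.13 and later; the interop helpers available at the time of the interview were different.

```scala
import java.util.{HashMap => JHashMap}
import scala.jdk.CollectionConverters._

object JavaCollectionsSketch {
  // Counts words using java.util.HashMap, then converts back to a Scala Map.
  def countWords(words: Seq[String]): Map[String, Int] = {
    val counts = new JHashMap[String, Int]()
    words.foreach { w =>
      counts.put(w, counts.getOrDefault(w, 0) + 1)
    }
    counts.asScala.toMap // immutable Scala view of the result
  }
}
```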
Alex Payne: One of the first things I worked on in Scala here was a test harness for our APIs. It wraps the Apache Commons HTTP library and provides a set of objects that represents the restful resources on our system. The hardest part was just switching over from the Ruby mentality to the Scala mentality. Trying to think more functionally. Trying to think more immutably. Thinking about static typing for the first time in several years. So for people who may not have as much of a Java background and have more of a background in dynamic languages, the transition period might be a little bit longer for them, but having gotten to the other side of that, it’s great. Now I think in Scala by default as opposed to thinking in Ruby by default when I’m sketching out code.
Bill Venners: How did learning Scala change how you think about programming?
Robey Pointer: I had no functional background prior to learning Scala other than Python. I was pretty familiar with Python. As I’ve learned more Scala I’ve started thinking more functionally than I did before. When I first started I would use the for expression, which is very much like Python’s. Now more often I find myself invoking map or foreach directly on iterators.
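The two styles Pointer describes can be put side by side. This is a small sketch with made-up data; both forms produce the same result.

```scala
object IterationSketch {
  val words = List("tweet", "follow", "timeline")

  // The for expression, similar in feel to a Python comprehension.
  val viaFor: List[String] =
    for (w <- words if w.length > 5) yield w.toUpperCase

  // The same computation, calling filter and map directly.
  val viaMap: List[String] =
    words.filter(_.length > 5).map(_.toUpperCase)
}
```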
Alex Payne: I guess thinking about concurrency in terms of actors was definitely a switch. I’d programmed a little bit in the IO language. But I really like Scala’s actor implementation, which is a little bit closer to Erlang’s than IO’s. That’s been a positive change.
Steve Jenson: I came from a Java background, but I was also experienced in Common LISP and ML, and it was wonderful to use a runtime I was familiar with and be able to use functional combinators and closures and higher order functions, all these things that I’ve wanted to use more in production systems. I’ve been really pleased with how they work in Scala.

Concerns with Scala



Bill Venners: If I’m thinking about using Scala in a production system, what should I worry about? What are the things I need to make sure work? What should I be scared of?
Alex Payne: I’d be prepared for a few hours of tinkering with your IDE or your editor. It still seems like IDE and editor support is, perhaps not in its infancy, but in its awkward teenage years. Several of us are using IntelliJ, and IntelliJ 8.1 seems to be pretty good when it comes to Scala support. The Emacs mode, I know Steve uses that, but the indentation is a little quirky. Textmate support is pretty dreadful, but there was some discussion on the Scala tools mailing list on improving that. But that’s a barrier to entry.
Robey Pointer: If you’re not coming from the Java world, if you’re coming from the Ruby or Python world, the compile-deploy cycle can be a little irritating. It’s a very different world to set up a build environment and deploy jar files with large scripts than it is with Ruby or Python.
Alex Payne: JavaRebel kind of helps with that. Once you get that set up, it’s a little bit more of the write some code, hit save, run some tests again cycle. You can get closer to that, but there’s still some of the baggage of the Java world, where you have to do a whole bunch of up front setup on every project. But you have a set of good conventions, and it becomes very easy to bring in new libraries. You’ve got a lot of deploy stuff baked in. It’s just a tradeoff.
Steve Jenson: Making sure that you’re using mutability in the right places. Start with immutability, then use mutability where you find appropriate. That’s been a good lesson for us. The reason you should care about immutability is that if you’re using threads and your objects are immutable, you don’t have to worry about things changing underneath you. For us that’s been a big win. We really only ever go to mutability if we feel we need an extra performance gain.
Robey Pointer: And the JIT compiler can apparently give some important performance benefits to immutable objects.
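A minimal sketch of the "start with immutability" advice, using a hypothetical Timeline type. Updates return a new value, so a reference held by another thread never changes underneath it.

```scala
object ImmutabilitySketch {
  // Hypothetical type, invented for this illustration.
  final case class Timeline(owner: String, tweets: Vector[String]) {
    // Returns a new Timeline; the receiver is never modified.
    def post(text: String): Timeline = copy(tweets = tweets :+ text)
  }
}
```

A thread that holds the original Timeline keeps seeing the original tweets no matter how many times post is called elsewhere.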
Alex Payne: One other thing we’ve run into. It’s definitely a special case, and I don’t think it should throw people off, but we’ve been building a server, called Hosebird, to send the entire stream of public Tweets in near real time to a variety of partners over the internet. So it is a specialized system. We’ve built it in Scala wrapping Jetty, and initially we had a number of actors inside the system: one to pull messages off of our internal messaging queue, and a number of other actors that represented clients. And over time as we ran more and more system tests on it, we found that actors weren’t necessarily the ideal concurrency model for all parts of that system. Some parts of the concurrency model of that system are still actor based. For example, it uses a memcache library that Robey wrote, which is actor based. But for other parts we’ve just gone back to a traditional Java threading model. The engineer working on that, John Kalucki, just found it was a little bit easier to test, a bit more predictable. The nice thing was, it took minutes to switch code that was actor based over to something thread based. It was a couple of search and replaces. So it’s not so bad if actors fail you for whatever reason.
Steve Jenson: Just to clarify, you’re talking recently about moving from actors to more of the Java 5 concurrency model, like java.util.concurrent, executors, thread pools?
Alex Payne: I don’t even know if John was using thread pools on that necessarily. I think he was still doing some manual thread management. Basically it was just moving from actors to explicitly running new threads.
Robey Pointer: I ran into the same thing in Kestrel, the queueing system. I started off with an actor for every single queue. I found that the work is so fine grained there that it was actually better at that tiny level to just use Java locks. Actors work great for having client connections, where there’s a bit of work overhead to what the actor is doing, and the code for handling client requests is very simple and straightforward.
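For very fine-grained work, the lock-based alternative Pointer describes might look roughly like the sketch below. It is an assumption-laden illustration, not Kestrel's actual code: LockedQueue is a hypothetical class that guards a small in-memory queue with a java.util.concurrent lock instead of giving each queue its own actor.

```scala
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable

object LockedQueueSketch {
  // Hypothetical class, invented for this illustration.
  final class LockedQueue[A] {
    private val lock = new ReentrantLock()
    private val items = mutable.Queue.empty[A]

    def add(item: A): Unit = {
      lock.lock()
      try {
        items.enqueue(item)
      } finally {
        lock.unlock() // always release, even if enqueue throws
      }
    }

    def poll(): Option[A] = {
      lock.lock()
      try {
        if (items.isEmpty) None else Some(items.dequeue())
      } finally {
        lock.unlock()
      }
    }
  }
}
```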
Bill Venners: Anything else that someone considering using Scala in the real world should be aware of?
Alex Payne: I think programmers who’ve never worked with a language with pattern matching before should be prepared to have that change their perceptions about programming. I was talking to a group of mostly Mac programmers, largely Objective-C developers. I was trying to convey to them that once you start working with pattern matching, you’ll never want to use a language without it again. It’s such a common thing that a programmer does every day. I have a collection of stuff. Let me pick certain needles out of this haystack, whether it’s based on their class or their contents. It’s such a powerful tool. It’s so great.
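Picking needles out of a haystack by class or by contents can be sketched with collect and a few patterns. The Event types and the mentionsOf function are hypothetical, invented for this example.

```scala
object PatternMatchingSketch {
  // Hypothetical event types, invented for this illustration.
  sealed trait Event
  final case class Tweet(user: String, text: String) extends Event
  final case class Follow(follower: String, followed: String) extends Event

  // collect keeps only the elements that one of the cases matches.
  def mentionsOf(name: String, events: List[Event]): List[String] =
    events.collect {
      // Match on class, then on contents via a guard.
      case Tweet(user, text) if text.contains("@" + name) => user
      // Backticks match the existing value of `name` instead of binding a new one.
      case Follow(follower, `name`) => follower
    }
}
```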
Robey Pointer: I wanted to talk a bit more about starting to use Scala. It definitely wasn’t a flippant choice we made over a few beers one night. We actually agonized over it for quite a while. Maybe not agonized, but certainly discussed it for a long time. One of the biggest draws for us to Scala as opposed to another language, was that once you’d started writing in a really high level language like Ruby, it can be difficult and kind of annoying to go back to a medium level language like Java, where you have to type a lot of code to get the same effect. That was a really big draw for us. With Scala we could still write this really high level code, but be on the JVM.
Bill Venners: Could you clarify what you mean by high and medium level?
Robey Pointer: I think of it as, the higher the level of the programming language, the less you have to type to do more. To me, languages like Ruby, Scala, and Python are very high level, because you can write a few lines of code to do what might take ten or twenty lines in Java, or 250 lines in C.

Getting started with Scala



Bill Venners: How would you suggest people get started with Scala?
Steve Jenson: Just try it. Make a starter project. Go for it.
Alex Payne: There’s great code on GitHub. There’s a growing Scala community there. David Pollak and the rest of the Lift committers have put Lift on GitHub. That’s been sort of a catalyst. Jonas Bonér, who has been working on a bunch of transactional memory systems, very high concurrency, enterprise back end stuff, has been releasing some of that stuff on GitHub. And I think by the end of the year there will be something like five books.
Bill Venners: Last week at the JavaPosse roundup people were talking about how they tried something new out, and a theme emerged. The advice was to try it on something you care about, but not something necessarily business critical. Something that you care enough about that you’ll keep going, but not something that if it fails, you will go out of business.
Steve Jenson: We did that at Twitter. We started by doing a small experiment, where we served our public timeline out of Scala. And it worked very well. We learned a lot of lessons, found out what we liked and what we didn’t like. It still runs. It has been running for almost a year.
Alex Payne: Yes, the only time we have issues with it is when the underlying databases have replication lag. Other than that it just keeps on humming. And it has been such a success that our plan for the long run is to move more and more of our architecture into Scala. The vast majority of our traffic is API requests, and we want most of those to be served by Scala, either at an edge cache layer or a web application layer. Hopefully by the end of 2009 the majority of users’ interactions with Twitter are going to be Scala-powered.

Share your opinion



Have an opinion on the ideas presented in this article? You can discuss this article in the Articles Forum Topic, Twitter on Scala.

Resources



Ruby's home page:
http://www.ruby-lang.org
Scala's home page:
http://www.scala-lang.org
Alex Payne is coauthor with Dean Wampler of the book Programming Scala, due to be published in print form in August 2009, and available now as an O'Reilly "Rough Cut" PDF here:
http://oreilly.com/catalog/9780596157746/
The only Scala book available today in print form is Programming in Scala, coauthored by Martin Odersky (the designer of Scala), Lex Spoon, and Bill Venners:
http://www.artima.com/shop/programming_in_scala
For a good overview of what Scala programming is all about, watch The Feel of Scala video on Parleys.com:
http://tinyurl.com/dcfm4c

About the Author


Bill Venners is president of Artima, Inc., publisher of Artima Developer (www.artima.com). He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Active in the Jini Community since its inception, Bill led the Jini Community's ServiceUI project, whose ServiceUI API became the de facto standard way to associate user interfaces to Jini services. Bill is also the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and coauthor with Martin Odersky and Lex Spoon of the book, Programming in Scala.
