(think)

An online novel about the Source, the Force, the real life and everything in between...

The Elements of Style in Ruby #13: Length vs Size vs Count

One of the problems newcomers to Ruby experience is that there are often quite a few ways to do the same thing. For instance – you can obtain the number of items in Enumerable objects (instances of classes using the Enumerable mixin, which would often be collections like Array, Hash, Set, etc.) by using either Enumerable#count or the method length and its alias size, which such classes often provide.

arr = [1, 2, 3]

arr.length # => 3
arr.size # => 3
arr.count # => 3

h = { a: 1, b: 2 }

h.length # => 2
h.size # => 2
h.count # => 2

str = 'name'
str.length # => 4
str.size # => 4
# str.count raises ArgumentError: String does not include Enumerable,
# and String#count (which counts characters) needs at least one argument

Which one should you use? Let me help with this choice.

length is a method that’s not part of Enumerable – it’s part of a concrete class (like String or Array) and it usually runs in O(1) (constant) time. That’s as fast as it gets, which means that using it is probably a good idea.

Whether you should use length or size is mostly a matter of personal preference. Personally I use size for collections (hashes, arrays, etc) and length for strings, since for me objects like hashes and stacks don’t have a length, but a size (defined in terms of the elements they contain). Conversely, it’s perfectly normal to assume that some text has some length. Anyways, in the end you’re invoking the same method, so the semantic distinction is not important.

Enumerable#count, on the other hand, is a totally different beast. It’s usually meant to be used with a block or an argument and will return the number of matches in an Enumerable:

arr = [1, 1, 2, 3, 5, 6, 8]

arr.count(&:even?) # => 3
arr.count(1) # => 2

You can, however, invoke it without any arguments and it will return the size of the enumerable on which it was invoked:

arr.count # => 7

There’s a performance implication with this, though – to calculate the size of the enumerable the count method will traverse it, which is not particularly fast (especially for huge collections). Some classes (like Array) implement an optimized version of count in terms of length, but many don’t.
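To make the traversal visible, here is a minimal sketch. The Playlist class and its each_calls counter are hypothetical names invented for this illustration; the only assumption is that Enumerable#count is built on top of each, as it is in MRI.

```ruby
# Hypothetical collection, used only to illustrate the point above.
class Playlist
  include Enumerable

  attr_reader :each_calls

  def initialize(*songs)
    @songs = songs
    @each_calls = 0
  end

  # Enumerable only requires #each; #count is implemented on top of it.
  def each(&block)
    @each_calls += 1
    @songs.each(&block)
  end

  # Constant-time size that delegates to the underlying Array.
  def size
    @songs.size
  end
end

playlist = Playlist.new('a', 'b', 'c')

playlist.size        # => 3
playlist.each_calls  # => 0 (size never touched #each)

playlist.count       # => 3
playlist.each_calls  # => 1 (count had to walk the whole collection)
```

A class like this gets the fast path only if it defines size (or length) itself; Enumerable will not do it for you.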

The takeaway for you is that you should avoid using the count method if you can use length or size.

A note to Rails developers – ActiveRecord::Relation’s length, size and count methods have a totally different meaning, but that’s irrelevant to our current discussion. (Sean Griffin has written a comment regarding it).

That’s all for now, folks! As usual I’m looking forward to hearing your thoughts here and on Twitter!

A List of Deprecated Stuff in Ruby

As APIs evolve it’s inevitable that portions of them will be deprecated. Generally it’s fairly easy to find out what’s deprecated, but for several reasons that’s not the case in Ruby:

  • Deprecation is done through C functions such as rb_warn and rb_warning (as opposed to a more transparent mechanism like Java’s @Deprecated annotation). Messages emitted through rb_warning are shown only when you run Ruby with -w. Consider this example code:

string.lines do |line|
  puts line
end

ruby -w test.rb

../test.rb:1: warning: passing a block to String#lines is deprecated

  • Alternative Ruby implementations (like JRuby and Rubinius) generally don’t produce the same deprecation warnings. For instance – JRuby doesn’t produce any warnings for the code listed above. One can say that currently deprecations are an MRI implementation detail (although they shouldn’t be).

  • Deprecations are rarely mentioned in the API docs.

  • There’s no easy way to find out in which version of Ruby something got deprecated, as rb_warn is a generic mechanism for producing all sorts of warnings, as opposed to something created specifically to handle deprecations.

  • Some APIs are deprecated only informally (like Hash#has_key? and Hash#has_value?).

  • Some APIs are deprecated with Kernel#warn (like Digest::Digest).
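For comparison, a deprecation done in plain Ruby with Kernel#warn might look like the sketch below. Greeter, its methods and the capture_stderr helper are hypothetical names made up for this example; the point is only that such a warning is an ordinary message written to $stderr, with nothing marking it as a deprecation.

```ruby
require 'stringio'

# Hypothetical class, for illustration only.
class Greeter
  def greet(name)
    "Hello, #{name}!"
  end

  # Deprecated method that announces itself through Kernel#warn.
  def hello(name)
    warn 'Greeter#hello is deprecated; use Greeter#greet instead'
    greet(name)
  end
end

# Hypothetical helper: temporarily swap $stderr to capture what warn prints.
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  $stderr = original
end

message = capture_stderr { Greeter.new.hello('Matz') }
puts message
```

Note that Kernel#warn prints nothing when warnings are disabled entirely (for example with ruby -W0), so whether you see the message still depends on how Ruby was started.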

All of the above makes it fairly hard to compile a precise list of deprecations, so we’ll go only for a rough cut here. Let’s see what we can do…

Grepping in Ruby 2.1’s code base reveals the following:

dir.c
2174:    rb_warning("Dir.exists? is a deprecated name, use Dir.exist? instead");

enumerator.c
355:    rb_warn("Enumerator.new without a block is deprecated; use Object#to_enum");

ext/dbm/dbm.c
338:    rb_warn("DBM#index is deprecated; use DBM#key");

ext/gdbm/gdbm.c
453:    rb_warn("GDBM#index is deprecated; use GDBM#key");

ext/openssl/ossl_cipher.c
217:    rb_warn("arguments for %s#encrypt and %s#decrypt were deprecated; "

ext/sdbm/init.c
331:    rb_warn("SDBM#index is deprecated; use SDBM#key");

ext/stringio/stringio.c
656:    rb_warn("StringIO#bytes is deprecated; use #each_byte instead");
876:    rb_warn("StringIO#chars is deprecated; use #each_char instead");
920:    rb_warn("StringIO#codepoints is deprecated; use #each_codepoint instead");
1124:    rb_warn("StringIO#lines is deprecated; use #each_line instead");

ext/zlib/zlib.c
3892:    rb_warn("Zlib::GzipReader#bytes is deprecated; use #each_byte instead");
4174:    rb_warn("Zlib::GzipReader#lines is deprecated; use #each_line instead");

file.c
1413:    rb_warning("%sexists? is a deprecated name, use %sexist? instead", s, s);

hash.c
529:            rb_warn("ignoring wrong elements is deprecated, remove them explicitly");
934:    rb_warn("Hash#index is deprecated; use Hash#key");
3470:    rb_warn("ENV.index is deprecated; use ENV.key");

io.c
3385:    rb_warn("IO#lines is deprecated; use #each_line instead");
3436:    rb_warn("IO#bytes is deprecated; use #each_byte instead");
3590:    rb_warn("IO#chars is deprecated; use #each_char instead");
3697:    rb_warn("IO#codepoints is deprecated; use #each_codepoint instead");
11196:    rb_warn("ARGF#lines is deprecated; use #each_line instead");
11243:    rb_warn("ARGF#bytes is deprecated; use #each_byte instead");
11282:    rb_warn("ARGF#chars is deprecated; use #each_char instead");
11321:    rb_warn("ARGF#codepoints is deprecated; use #each_codepoint instead");

object.c
991:    rb_warning("untrusted? is deprecated and its behavior is same as tainted?");
1005:    rb_warning("untrust is deprecated and its behavior is same as taint");
1020:    rb_warning("trust is deprecated and its behavior is same as untaint");

proc.c
663:    rb_warn("rb_f_lambda() is deprecated; use rb_block_proc() instead");

string.c
6407:       rb_warning("passing a block to String#lines is deprecated");
6576:       rb_warning("passing a block to String#bytes is deprecated");
6665:       rb_warning("passing a block to String#chars is deprecated");
6769:       rb_warning("passing a block to String#codepoints is deprecated");

vm_method.c
54:    rb_warning("rb_clear_cache() is deprecated.");

Below is a cleaned-up version of the output shown above. I’ve removed everything that’s unlikely to be of general interest.

  • Dir.exists? is a deprecated name, use Dir.exist? instead
  • Enumerator.new without a block is deprecated; use Object#to_enum
  • StringIO#bytes is deprecated; use StringIO#each_byte instead
  • StringIO#chars is deprecated; use StringIO#each_char instead
  • StringIO#codepoints is deprecated; use StringIO#each_codepoint instead
  • StringIO#lines is deprecated; use StringIO#each_line instead
  • File.exists? is a deprecated name, use File.exist? instead
  • Hash#index is deprecated; use Hash#key
  • ENV.index is deprecated; use ENV.key
  • IO#lines is deprecated; use IO#each_line instead
  • IO#bytes is deprecated; use IO#each_byte instead
  • IO#chars is deprecated; use IO#each_char instead
  • IO#codepoints is deprecated; use IO#each_codepoint instead
  • ARGF#lines is deprecated; use ARGF#each_line instead
  • ARGF#bytes is deprecated; use ARGF#each_byte instead
  • ARGF#chars is deprecated; use ARGF#each_char instead
  • ARGF#codepoints is deprecated; use ARGF#each_codepoint instead
  • Object#untrusted? is deprecated and its behavior is same as Object#tainted?
  • Object#untrust is deprecated and its behavior is same as Object#taint
  • Object#trust is deprecated and its behavior is same as Object#untaint
  • passing a block to String#lines is deprecated
  • passing a block to String#bytes is deprecated
  • passing a block to String#chars is deprecated
  • passing a block to String#codepoints is deprecated
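A few of the suggested replacements from the list, shown as a short sketch. The sample data is made up for the example; the methods themselves are the ones named in the warnings above.

```ruby
require 'stringio'

# Dir.exist? and File.exist? instead of the deprecated exists? spellings.
Dir.exist?(Dir.pwd)   # => true
File.exist?(Dir.pwd)  # => true

# Hash#key instead of Hash#index.
h = { a: 1, b: 2 }
h.key(2) # => :b

# each_line instead of IO#lines / StringIO#lines.
io = StringIO.new("one\ntwo\n")
lines = []
io.each_line { |line| lines << line.chomp }
lines # => ["one", "two"]

# each_char (and friends) instead of passing a block to String#chars.
chars = []
'ab'.each_char { |c| chars << c }
chars # => ["a", "b"]

# Object#to_enum instead of Enumerator.new without a block.
pairs = [1, 2, 3].to_enum(:each_slice, 2)
pairs.to_a # => [[1, 2], [3]]
```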

Unfortunately there’s no way to know in which version of Ruby something got deprecated. Obviously most of the things on the list were deprecated before Ruby 2.1. Ideally in the future we’ll get a better deprecation mechanism that actually keeps track of such data.

Hopefully some of you will find this information useful!

We’re planning to get some deprecation tracking in RuboCop, but due to Ruby’s dynamic nature implementing such a feature reliably in a static code analyzer is an impossible task.

The Elements of Style in Ruby #12: Proc vs Proc.new

People are often confused about the fact that there are two ways to create procs in Ruby – via Kernel#proc and Proc.new. Let’s see them in action:

Proc.new { true }
# => #<Proc:0x007fe35440a058>

proc { true }
# => #<Proc:0x007fe35440a059>

Hmmm, it seems we get exactly the same results… While this is true on Ruby 1.9+, this was not always the case.

In Ruby 1.8, Kernel#proc was actually a synonym for Kernel#lambda, which was extremely confusing, since as we all know lambdas and procs differ in subtle ways. Luckily sanity prevailed and Ruby 1.9 made Kernel#proc a synonym for Proc.new instead.

At this point, however, people couldn’t use Kernel#proc anymore if they wanted to write code that behaved the same way on both Ruby 1.8 and Ruby 1.9, so the use of Kernel#proc was generally discouraged. Thankfully Ruby 1.8 is now dead and buried and there’s no reason to prefer Proc.new over Kernel#proc anymore. As a matter of fact – you should probably be using only Kernel#proc as it’s more concise and it’s symmetrical to Kernel#lambda.

lambda { true }
# => #<Proc:0x007fe35440a058 (lambda)>

proc { true }
# => #<Proc:0x007fe35440a059>

By the way, given proc’s fairly counter-intuitive behavior regarding return, you should probably use lambdas most of the time.
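Here is a small sketch of the differences in question. The method and variable names are made up for the example; the behavior shown is that of Ruby 1.9 and later.

```ruby
# return inside a lambda returns from the lambda only.
def run_lambda
  l = lambda { return :from_lambda }
  l.call
  :after_lambda
end

# return inside a proc returns from the enclosing method.
def run_proc
  pr = proc { return :from_proc }
  pr.call
  :after_proc # never reached
end

run_lambda # => :after_lambda
run_proc   # => :from_proc

# Procs are lenient about arguments, lambdas are strict.
lenient = proc { |a, b| [a, b] }
strict = lambda { |a, b| [a, b] }

lenient.call(1) # => [1, nil]

error = begin
  strict.call(1)
rescue ArgumentError => e
  e
end
error.class # => ArgumentError

lenient.lambda? # => false
strict.lambda?  # => true
```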

Looking Back at 2013

2013 was a good year for me in many respects. I’ll share here some of my programming-related achievements from the year that made me somewhat proud of myself.

Achievements

Ruby

RuboCop

I’ve made great headway with the RuboCop static code analyzer over the year. Working on a big non-Rails Ruby project reminded me why I fell in love with Ruby in the first place. I was also amazed by the great community that quickly formed around RuboCop and propelled it to popularity. I hope that in 2014 I’ll be able to deliver a 1.0 release.

Blogging

I finally did some Ruby-related blogging this year – mostly regarding good programming style. I enjoyed it, although I noticed it’s much harder for me to blog than it is to code. Good thing I became a programmer and not a writer, I guess!

Emacs

I spent a lot of time working on Emacs-related stuff over the year:

  • I started a new Emacs blog called EmacsRedux
  • I improved Prelude and Projectile a lot
  • I contributed code to Emacs for the first time
  • I took over the maintenance of CIDER
  • I was involved in a lesser role in the development of many cool Emacs extensions

Clojure

I wrote a Clojure Style Guide at the beginning of the year. I also hoped I’d be able to work on a static code analyzer for Clojure similar to RuboCop, but I got sidetracked and this did not happen. I did lots of CIDER-related work over the year and I hope I’ll be able to improve CIDER significantly in 2014 (I have so many great ideas about it!).

Clojure is still my favorite programming language and I hope that in 2014 I’ll be able to work on more Clojure projects.

Misc

I spent some time playing with algorithms and math for the first time in a long while (inspired by Coursera). This was lots of fun! I hope that in 2014 I’ll be able to allocate even more time to studying them properly.

Epilogue

In previous years I generally spent a lot of time studying/researching new programming-related stuff – new languages, new frameworks, new tools, new paradigms, etc. Conversely, I spent relatively little time hacking on open-source projects. 2013 was quite different for me – very little research, lots of open-source hacking. It was pretty tiresome at times, but also an extremely enjoyable and gratifying experience.

No idea how 2014 will turn out, but I hope it will be at least as fun as 2013 was!

Projectile 0.10 Is Out!

Projectile 0.10.0 is out!

This might come as a surprise for people tracking Projectile’s development, since recent snapshots were using 1.0 as the version number, so allow me to explain. I’ve been wanting to release Projectile 1.0 for a while now, but I felt that without the addition of per-project settings support and some refinements to the way ignoring of files & folders currently works, such a moniker would be unjustified. Unfortunately, lately I’ve been quite busy working on other projects like CIDER and RuboCop and I don’t have that much time to work on Projectile, so I kept delaying the 1.0 version.

Recently I decided to release version 0.10 instead, for the benefit of users of package repos like Marmalade. The minor bump in the version doesn’t mean that 0.10 is not a noteworthy update, though. Five months of development and more than a hundred commits from almost a dozen developers have really improved the Projectile experience. It’s easily the biggest update of Projectile since the project was conceived.

Some of the highlights include:

  • .projectile is always taken into account (previously it was consulted only when doing native indexing)
  • There’s now the ability to search for files in a specific directory
  • More project types are recognized
  • You can search for etags (ctags) in a project
  • The Commander (a really cool feature inspired by CIDER & SLIME that I’ll show in a bit)
  • Dozens of (mostly undocumented in the Changelog) bugfixes

Have a look at the changelog for more details.

And here’s the new Commander in action:

Basically it gives you a way to invoke many of the Projectile commands with a single key – f for find-file, s for switch-project, etc. It’s very handy when switching projects, since with this command you can always pick a different command to execute in the new project. By the way, projectile-switch-project will now run the commander when invoked with a prefix argument (C-u C-c p s).