RubyPants (Class)

In: rubypants.rb
Parent: String

RubyPants — SmartyPants ported to Ruby

Synopsis

RubyPants is a Ruby port of the smart-quotes library SmartyPants.

The original "SmartyPants" is a free web publishing plug-in for Movable Type, Blosxom, and BBEdit that easily translates plain ASCII punctuation characters into "smart" typographic punctuation HTML entities.

Description

RubyPants can perform the following transformations:

  • Straight quotes (" and ') into "curly" quote HTML entities
  • Backticks-style quotes (``like this'') into "curly" quote HTML entities
  • Dashes (-- and ---) into en- and em-dash entities
  • Three consecutive dots (... or . . .) into an ellipsis entity

This means you can write, edit, and save your posts using plain old ASCII straight quotes, plain dashes, and plain dots, but your published posts (and final HTML output) will appear with smart quotes, em-dashes, and proper ellipses.

RubyPants does not modify characters within <pre>, <code>, <kbd>, <math> or <script> tag blocks. Typically, these tags are used to display text where smart quotes and other "smart punctuation" would not be appropriate, such as source code or example markup.

Backslash Escapes

If you need to use literal straight quotes (or plain hyphens and periods), RubyPants accepts the following backslash escape sequences to force non-smart punctuation. It does so by transforming the escape sequence into a decimal-encoded HTML entity:

  \\    \"    \'    \.    \-    \`

This is useful, for example, when you want to use straight quotes as foot and inch marks: 6'2" tall; a 17" iMac. (Use 6\'2\" and 17\", respectively.)
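The escape table above can be sketched as a single lookup. This is a minimal standalone illustration, not the library's own code: the constant ESCAPES and the helper escape_literals are hypothetical names, and the single-pass substitution is an assumption chosen for clarity.

```ruby
# Hypothetical standalone helper mirroring the escape table above.
# Each backslash sequence maps to a decimal-encoded HTML entity.
ESCAPES = {
  '\\\\' => '&#92;',  # backslash backslash
  '\\"'  => '&#34;',  # backslash double quote
  "\\'"  => '&#39;',  # backslash single quote
  '\\.'  => '&#46;',  # backslash period
  '\\-'  => '&#45;',  # backslash hyphen
  '\\`'  => '&#96;'   # backslash backtick
}.freeze

def escape_literals(text)
  # Regexp.union escapes each key and tries them in insertion order,
  # so the double-backslash sequence is matched before the others.
  text.gsub(Regexp.union(ESCAPES.keys)) { |seq| ESCAPES[seq] }
end

puts escape_literals("6\\'2\\\" tall")  # prints 6&#39;2&#34; tall
```

Because the escaped characters leave the text as entities, later quote and dash processing has nothing left to curl.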

Algorithmic Shortcomings

One situation in which quotes will get curled the wrong way is when apostrophes are used at the start of leading contractions. For example:

  'Twas the night before Christmas.

In the case above, RubyPants will turn the apostrophe into an opening single-quote, when in fact it should be a closing one. I don’t think this problem can be solved in the general case; every word processor I’ve tried gets this wrong as well. In such cases, it’s best to use the proper HTML entity for closing single-quotes ("&#8217;") by hand.
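The failure can be reproduced with a simplified subset of the single-quote rules. This is a sketch only: curl_single_quotes is a hypothetical helper, and it omits several of the special cases that educate_quotes handles.

```ruby
# Hypothetical, simplified subset of the single-quote heuristics.
def curl_single_quotes(text)
  text
    .gsub(/(\s)'(?=\w)/, '\1&#8216;')      # opening quote after whitespace
    .gsub(/([^\s\[{(\-])'/, '\1&#8217;')   # closing quote after a non-space character
    .gsub(/'(\s|s\b|$)/, '&#8217;\1')      # closing quote before whitespace or end of text
    .gsub(/'/, '&#8216;')                  # anything left is assumed to be an opening quote
end

puts curl_single_quotes("'Twas the night")         # prints &#8216;Twas the night
puts curl_single_quotes("&#8217;Twas the night")   # prints &#8217;Twas the night
```

The leading apostrophe has no preceding character to give it context, so it falls through to the final rule and is treated as an opening quote. Writing the entity by hand sidesteps the heuristics entirely.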

Bugs

To file bug reports or feature requests (other than the shortcoming described above), please send email to: chneukirchen@gmail.com

If the bug involves quotes being curled the wrong way, please send example text to illustrate.

Authors

John Gruber did all of the hard work of writing this software in Perl for Movable Type and almost all of this useful documentation. Chad Miller ported it to Python to use with Pyblosxom.

Christian Neukirchen provided the Ruby port, as a general-purpose library that follows the *Cloth API.

Copyright and License

SmartyPants license:

Copyright © 2003 John Gruber (daringfireball.net) All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name "SmartyPants" nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

RubyPants license

RubyPants is a derivative work of SmartyPants and smartypants.py.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

Links

John Gruber: daringfireball.net
SmartyPants: daringfireball.net/projects/smartypants
Chad Miller: web.chad.org
Christian Neukirchen: kronavita.de/chris

Constants

VERSION = "0.2"

Public Class methods

Create a new RubyPants instance with the text in string.

Allowed elements in the options array:

   0   do nothing
   1   enable all, using only em-dash shortcuts
   2   enable all, using old school en- and em-dash shortcuts (default)
   3   enable all, using inverted old school en- and em-dash shortcuts
  -1   stupefy (translate HTML entities to their ASCII counterparts)

If you don’t like any of these defaults, you can pass symbols to change RubyPants’ behavior:

  :quotes          quotes
  :backticks       backtick quotes (``double'' only)
  :allbackticks    backtick quotes (``double'' and `single')
  :dashes          dashes
  :oldschool       old school dashes
  :inverted        inverted old school dashes
  :ellipses        ellipses
  :convertquotes   convert &quot; entities to " for Dreamweaver users
  :stupefy         translate RubyPants HTML entities to their ASCII counterparts
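How the numeric presets relate to the symbol options can be summarized in a small resolver. This is a sketch for illustration: resolve_options and the hash keys it returns are hypothetical, and the precedence (0, then 1, 2, 3, then -1, then symbols) is taken from the to_html listing on this page.

```ruby
# Hypothetical helper: turn an options array into a hash of switches.
def resolve_options(options)
  opts = Array(options)
  all_on = { quotes: true, backticks: true, ellipses: true }

  # Numeric presets win over symbols and are checked in this order.
  return {}                                if opts.include?(0)
  return all_on.merge(dashes: :normal)     if opts.include?(1)
  return all_on.merge(dashes: :oldschool)  if opts.include?(2)
  return all_on.merge(dashes: :inverted)   if opts.include?(3)
  return { stupefy: true }                 if opts.include?(-1)

  # Among the dash symbols, :inverted beats :oldschool, which beats :dashes.
  dashes = %i[inverted oldschool].find { |d| opts.include?(d) }
  dashes ||= :normal if opts.include?(:dashes)

  {
    quotes:         opts.include?(:quotes),
    backticks:      opts.include?(:allbackticks) ? :both : opts.include?(:backticks),
    dashes:         dashes,
    ellipses:       opts.include?(:ellipses),
    convert_quotes: opts.include?(:convertquotes),
    stupefy:        opts.include?(:stupefy)
  }
end

p resolve_options([2])[:dashes]                    # prints :oldschool
p resolve_options([:quotes, :dashes])[:dashes]     # prints :normal
```

Note that a numeric preset anywhere in the array causes the symbols to be ignored, so the two styles should not be mixed.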

[Source]

     # File rubypants.rb, line 207
     def initialize(string, options=[2])
       super string
       @options = [*options]
     end

Public Instance methods

Apply SmartyPants transformations.

[Source]

     # File rubypants.rb, line 213
     def to_html
       do_quotes = do_backticks = do_dashes = do_ellipses = do_stupefy = nil
       convert_quotes = false

       if @options.include? 0
         # Do nothing.
         return self
       elsif @options.include? 1
         # Do everything, turn all options on.
         do_quotes = do_backticks = do_ellipses = true
         do_dashes = :normal
       elsif @options.include? 2
         # Do everything, turn all options on, use old school dash shorthand.
         do_quotes = do_backticks = do_ellipses = true
         do_dashes = :oldschool
       elsif @options.include? 3
         # Do everything, turn all options on, use inverted old school
         # dash shorthand.
         do_quotes = do_backticks = do_ellipses = true
         do_dashes = :inverted
       elsif @options.include?(-1)
         do_stupefy = true
       else
         do_quotes =                @options.include? :quotes
         do_backticks =             @options.include? :backticks
         do_backticks = :both    if @options.include? :allbackticks
         do_dashes = :normal     if @options.include? :dashes
         do_dashes = :oldschool  if @options.include? :oldschool
         do_dashes = :inverted   if @options.include? :inverted
         do_ellipses =              @options.include? :ellipses
         convert_quotes =           @options.include? :convertquotes
         do_stupefy =               @options.include? :stupefy
       end

       # Parse the HTML
       tokens = tokenize

       # Keep track of when we're inside <pre>, <code>, <kbd>, <script>
       # or <math> tags.
       in_pre = false

       # The result is accumulated here.
       result = ""

       # This is a cheat, used to get some context for one-character
       # tokens that consist of just a quote char. What we do is remember
       # the last character of the previous text token, to use as context
       # to curl single-character quote tokens correctly.
       prev_token_last_char = nil

       tokens.each { |token|
         if token.first == :tag
           result << token[1]
           if token[1] =~ %!<(/?)(?:pre|code|kbd|script|math)[\s>]!
             in_pre = ($1 != "/")  # Opening or closing tag?
           end
         else
           t = token[1]

           # Remember last char of this token before processing.
           last_char = t[-1].chr

           unless in_pre
             t = process_escapes t

             t.gsub!(/&quot;/, '"')  if convert_quotes

             if do_dashes
               t = educate_dashes t            if do_dashes == :normal
               t = educate_dashes_oldschool t  if do_dashes == :oldschool
               t = educate_dashes_inverted t   if do_dashes == :inverted
             end

             t = educate_ellipses t  if do_ellipses

             # Note: backticks need to be processed before quotes.
             if do_backticks
               t = educate_backticks t
               t = educate_single_backticks t  if do_backticks == :both
             end

             if do_quotes
               if t == "'"
                 # Special case: single-character ' token
                 if prev_token_last_char =~ /\S/
                   t = "&#8217;"
                 else
                   t = "&#8216;"
                 end
               elsif t == '"'
                 # Special case: single-character " token
                 if prev_token_last_char =~ /\S/
                   t = "&#8221;"
                 else
                   t = "&#8220;"
                 end
               else
                 # Normal case:
                 t = educate_quotes t
               end
             end

             t = stupefy_entities t  if do_stupefy
           end

           prev_token_last_char = last_char
           result << t
         end
       }

       # Done
       result
     end
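The way to_html leaves protected tag blocks alone can be isolated into a few lines. This is a sketch under assumptions: transform_outside_protected and PROTECTED are hypothetical names, the token format is the [:tag, value] / [:text, value] pairs described for tokenize, and the transformation is passed in as a block rather than selected by options.

```ruby
# Same tag test as in to_html; the capture group is "/" for a closing tag.
PROTECTED = %r{<(/?)(?:pre|code|kbd|script|math)[\s>]}

# Hypothetical helper: apply the block to text tokens outside protected tags.
def transform_outside_protected(tokens)
  in_protected = false
  tokens.map do |type, value|
    if type == :tag
      # An opening tag switches protection on, a closing tag switches it off.
      in_protected = (Regexp.last_match(1) != '/') if value =~ PROTECTED
      value
    else
      in_protected ? value : yield(value)
    end
  end.join
end

tokens = [
  [:text, 'a -- b '], [:tag, '<code>'], [:text, 'x -- y'],
  [:tag, '</code>'], [:text, ' c -- d']
]
puts transform_outside_protected(tokens) { |t| t.gsub('--', '&#8212;') }
# prints a &#8212; b <code>x -- y</code> c &#8212; d
```

A single boolean is enough here because the flag is simply set or cleared on each matching tag; nested protected tags are not tracked.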

Protected Instance methods

Return the string, with "``backticks''"-style double quotes translated into HTML curly quote entities.

[Source]

     # File rubypants.rb, line 384
     def educate_backticks(str)
       str.gsub("``", '&#8220;').gsub("''", '&#8221;')
     end

Return the string, with each instance of "--" translated to an em-dash HTML entity.

[Source]

     # File rubypants.rb, line 347
     def educate_dashes(str)
       str.gsub(/--/, '&#8212;')
     end

Return the string, with each instance of "--" translated to an em-dash HTML entity, and each "---" translated to an en-dash HTML entity. Two reasons why: First, unlike the en- and em-dash syntax supported by educate_dashes_oldschool, it’s compatible with existing entries written before SmartyPants 1.1, back when "--" was only used for em-dashes. Second, em-dashes are more common than en-dashes, and so it sort of makes sense that the shortcut should be shorter to type. (Thanks to Aaron Swartz for the idea.)

[Source]

     # File rubypants.rb, line 369
     def educate_dashes_inverted(str)
       str.gsub(/---/, '&#8211;').gsub(/--/, '&#8212;')
     end

Return the string, with each instance of "--" translated to an en-dash HTML entity, and each "---" translated to an em-dash HTML entity.

[Source]

     # File rubypants.rb, line 355
     def educate_dashes_oldschool(str)
       str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
     end
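In both the old school and inverted variants the three-hyphen pattern is replaced before the two-hyphen one, and the order matters. The sketch below uses two hypothetical standalone functions (dashes_oldschool and dashes_wrong_order) to show what happens when the order is swapped.

```ruby
# Hypothetical standalone copy of the old school rule: longest pattern first.
def dashes_oldschool(text)
  text.gsub('---', '&#8212;').gsub('--', '&#8211;')
end

# Same substitutions in the wrong order, for comparison only.
def dashes_wrong_order(text)
  text.gsub('--', '&#8211;').gsub('---', '&#8212;')
end

sample = 'pp. 3--7 --- roughly'
puts dashes_oldschool(sample)    # prints pp. 3&#8211;7 &#8212; roughly
puts dashes_wrong_order(sample)  # prints pp. 3&#8211;7 &#8211;- roughly
```

With the short pattern first, the leading two hyphens of "---" are consumed as an en-dash and a stray hyphen is left behind, so the em-dash rule never fires.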

Return the string, with each instance of "..." translated to an ellipsis HTML entity. Also converts the case where there are spaces between the dots.

[Source]

     # File rubypants.rb, line 377
     def educate_ellipses(str)
       str.gsub('...', '&#8230;').gsub('. . .', '&#8230;')
     end

Return the string, with "educated" curly quote HTML entities.

[Source]

     # File rubypants.rb, line 397
     def educate_quotes(str)
       punct_class = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^^_`{|}~]'

       str = str.dup

       # Special case if the very first character is a quote followed by
       # punctuation at a non-word-break. Close the quotes by brute
       # force:
       str.gsub!(/^'(?=#{punct_class}\B)/, '&#8217;')
       str.gsub!(/^"(?=#{punct_class}\B)/, '&#8221;')

       # Special case for double sets of quotes, e.g.:
       #   <p>He said, "'Quoted' words in a larger quote."</p>
       str.gsub!(/"'(?=\w)/, '&#8220;&#8216;')
       str.gsub!(/'"(?=\w)/, '&#8216;&#8220;')

       # Special case for decade abbreviations (the '80s):
       str.gsub!(/'(?=\d\ds)/, '&#8217;')

       close_class = %![^\ \t\r\n\\[\{\(\-]!
       dec_dashes = '&#8211;|&#8212;'

       # Get most opening single quotes:
       str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)'(?=\w)/,
                '\1&#8216;')
       # Single closing quotes:
       str.gsub!(/(#{close_class})'/, '\1&#8217;')
       str.gsub!(/'(\s|s\b|$)/, '&#8217;\1')
       # Any remaining single quotes should be opening ones:
       str.gsub!(/'/, '&#8216;')

       # Get most opening double quotes:
       str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)"(?=\w)/,
                '\1&#8220;')
       # Double closing quotes:
       str.gsub!(/(#{close_class})"/, '\1&#8221;')
       str.gsub!(/"(\s|s\b|$)/, '&#8221;\1')
       # Any remaining quotes should be opening ones:
       str.gsub!(/"/, '&#8220;')

       str
     end

Return the string, with "`backticks'"-style single quotes translated into HTML curly quote entities.

[Source]

     # File rubypants.rb, line 391
     def educate_single_backticks(str)
       str.gsub("`", '&#8216;').gsub("'", '&#8217;')
     end

Return the string after processing the following backslash escape sequences. This is useful if you want to force a "dumb" quote or other character to appear.

Escaped are:

     \\    \"    \'    \.    \-    \`

[Source]

     # File rubypants.rb, line 335
     def process_escapes(str)
       str.gsub('\\\\', '&#92;').
         gsub('\"', '&#34;').
         gsub("\\\'", '&#39;').
         gsub('\.', '&#46;').
         gsub('\-', '&#45;').
         gsub('\`', '&#96;')
     end

Return the string, with each RubyPants HTML entity translated to its ASCII counterpart.

Note: This is not reversible (but exactly the same as in SmartyPants)

[Source]

     # File rubypants.rb, line 445
     def stupefy_entities(str)
       str.
         gsub(/&#8211;/, '-').      # en-dash
         gsub(/&#8212;/, '--').     # em-dash

         gsub(/&#8216;/, "'").      # open single quote
         gsub(/&#8217;/, "'").      # close single quote

         gsub(/&#8220;/, '"').      # open double quote
         gsub(/&#8221;/, '"').      # close double quote

         gsub(/&#8230;/, '...')     # ellipsis
     end
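Why the translation is not reversible is easiest to see with a table-driven version. This is a sketch, not the library code: STUPEFY and stupefy are hypothetical names, and the mapping is copied from the listing above.

```ruby
# Hypothetical table-driven equivalent of the substitutions above.
STUPEFY = {
  '&#8211;' => '-',    # en-dash
  '&#8212;' => '--',   # em-dash
  '&#8216;' => "'",    # open single quote
  '&#8217;' => "'",    # close single quote
  '&#8220;' => '"',    # open double quote
  '&#8221;' => '"',    # close double quote
  '&#8230;' => '...'   # ellipsis
}.freeze

def stupefy(text)
  text.gsub(Regexp.union(STUPEFY.keys)) { |entity| STUPEFY[entity] }
end

puts stupefy('&#8220;Hi&#8221; &#8212; bye&#8230;')  # prints "Hi" -- bye...
puts stupefy('&#8216;') == stupefy('&#8217;')        # prints true
```

Opening and closing quotes collapse to the same ASCII character, so the original direction is lost. Dashes are also ambiguous: an em-dash becomes "--", which the old school dash option would turn back into an en-dash.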

Return an array of the tokens comprising the string. Each token is either a tag (possibly with nested tags contained therein, such as <a href="<MTFoo>">) or a run of text between tags. Each element of the array is a two-element array; the first element is either :tag or :text; the second is the actual value.

Based on the _tokenize() subroutine from Brad Choate’s MTRegex plugin. <www.bradchoate.com/past/mtregex.php>

This is actually the easier variant using tag_soup, as used by Chad Miller in the Python port of SmartyPants.

[Source]

     # File rubypants.rb, line 471
     def tokenize
       tag_soup = /([^<]*)(<[^>]*>)/

       tokens = []

       prev_end = 0
       scan(tag_soup) {
         tokens << [:text, $1]  if $1 != ""
         tokens << [:tag, $2]

         prev_end = $~.end(0)
       }

       if prev_end < size
         tokens << [:text, self[prev_end..-1]]
       end

       tokens
     end
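To see the token format concretely, the same tag_soup approach can be run as a free-standing function. This is a sketch for experimentation: tokenize_html is a hypothetical name, and it takes the HTML as an argument instead of operating on self.

```ruby
# Hypothetical standalone version of the tag_soup tokenizer.
def tokenize_html(html)
  tokens = []
  pos = 0
  # Each match is a (possibly empty) run of text followed by one tag.
  html.scan(/([^<]*)(<[^>]*>)/) do |text, tag|
    tokens << [:text, text] unless text.empty?
    tokens << [:tag, tag]
    pos = Regexp.last_match.end(0)
  end
  # Whatever follows the last tag is a final text token.
  tokens << [:text, html[pos..-1]] if pos < html.size
  tokens
end

p tokenize_html('a<b>c')
# prints [[:text, "a"], [:tag, "<b>"], [:text, "c"]]
```

Since the pattern ends a tag at the first ">", a tag that itself contains ">" is split early. That is the trade-off of this simpler variant.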
