In: rubypants.rb
Parent: String
RubyPants is a Ruby port of the smart-quotes library SmartyPants.
The original "SmartyPants" is a free web publishing plug-in for Movable Type, Blosxom, and BBEdit that easily translates plain ASCII punctuation characters into "smart" typographic punctuation HTML entities.
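RubyPants can perform the following transformations:
* Straight quotes (" and ') into "curly" quote HTML entities
* Backticks-style quotes (``like this'') into "curly" quote HTML entities
* Dashes (-- and ---) into en- and em-dash entities
* Three consecutive dots (... or . . .) into an ellipsis entity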
This means you can write, edit, and save your posts using plain old ASCII straight quotes, plain dashes, and plain dots, but your published posts (and final HTML output) will appear with smart quotes, em-dashes, and proper ellipses.
RubyPants does not modify characters within <pre>, <code>, <kbd>, <math> or <script> tag blocks. Typically, these tags are used to display text where smart quotes and other "smart punctuation" would not be appropriate, such as source code or example markup.
If you need to use literal straight quotes (or plain hyphens and periods), RubyPants accepts the following backslash escape sequences to force non-smart punctuation. It does so by transforming the escape sequence into a decimal-encoded HTML entity:
Escape   Entity
\\       &#92;
\"       &#34;
\'       &#39;
\.       &#46;
\-       &#45;
\`       &#96;
This is useful, for example, when you want to use straight quotes as foot and inch marks: 6'2" tall; a 17" iMac. (Write 6\'2\" and 17\", respectively.)
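The effect of the escape sequences can be sketched in isolation. The snippet below is a minimal standalone approximation, not the library's own code: the `ESCAPES` table and the `escape_literals` helper are illustrative names. It shows how each sequence becomes a decimal entity, so that later quote-curling passes never see the raw character.

```ruby
# Hypothetical standalone sketch; ESCAPES and escape_literals are
# illustrative names and are not part of the RubyPants API.
ESCAPES = {
  '\\\\' => '&#92;',  # backslash followed by backslash
  '\\"'  => '&#34;',  # backslash followed by a double quote
  "\\'"  => '&#39;',  # backslash followed by a single quote
  '\\.'  => '&#46;',  # backslash followed by a period
  '\\-'  => '&#45;',  # backslash followed by a hyphen
  '\\`'  => '&#96;'   # backslash followed by a backtick
}.freeze

def escape_literals(text)
  # A single pass, so the output of one replacement is never rescanned.
  text.gsub(/\\[\\"'.\-`]/) { |seq| ESCAPES.fetch(seq) }
end

puts escape_literals(%q{6\'2\" tall; a 17\" iMac})
# prints: 6&#39;2&#34; tall; a 17&#34; iMac
```

Because the replacements are numeric entities rather than literal quote characters, a browser still renders them as straight quotes while the curling rules leave them alone.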
One situation in which quotes will get curled the wrong way is when apostrophes are used at the start of leading contractions. For example:
'Twas the night before Christmas.
In the case above, RubyPants will turn the apostrophe into an opening single-quote, when in fact it should be a closing one. I don’t think this problem can be solved in the general case; every word processor I’ve tried gets this wrong as well. In such cases, it’s best to use the proper HTML entity for closing single-quotes ("&#8217;") by hand.
To file bug reports or feature requests (except see above) please send email to: chneukirchen@gmail.com
If the bug involves quotes being curled the wrong way, please send example text to illustrate.
John Gruber did all of the hard work of writing this software in Perl for Movable Type and almost all of this useful documentation. Chad Miller ported it to Python to use with Pyblosxom.
Christian Neukirchen provided the Ruby port, as a general-purpose library that follows the *Cloth API.
Copyright © 2003 John Gruber (daringfireball.net) All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
RubyPants is a derivative work of SmartyPants and smartypants.py.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
John Gruber: daringfireball.net
SmartyPants: daringfireball.net/projects/smartypants
Chad Miller: web.chad.org
Christian Neukirchen: kronavita.de/chris
VERSION = "0.2"
Create a new RubyPants instance with the text in string.
Allowed elements in the options array:
 0 : do nothing
 1 : enable all, using only em-dash shortcuts
 2 : enable all, using old school en- and em-dash shortcuts (default)
 3 : enable all, using inverted old school en- and em-dash shortcuts
-1 : stupefy (translate HTML entities to their ASCII counterparts)
If you don’t like any of these defaults, you can pass symbols to change RubyPants’ behavior:
:quotes        : quotes
:backticks     : backtick quotes (``double'' only)
:allbackticks  : backtick quotes (``double'' and `single')
:dashes        : dashes
:oldschool     : old school dashes
:inverted      : inverted old school dashes
:ellipses      : ellipses
:convertquotes : convert &quot; entities to " for Dreamweaver users
:stupefy       : translate RubyPants HTML entities to their ASCII counterparts
# File rubypants.rb, line 207
def initialize(string, options=[2])
  super string
  @options = [*options]
end
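How the numeric presets and the symbols interact is easiest to see in a small sketch. The helper below is hypothetical (`resolve_flags` is not part of RubyPants); it assumes the same order of checks that `to_html` uses, where a numeric preset anywhere in the options wins over any symbols passed alongside it.

```ruby
# Hypothetical helper, not part of RubyPants: turns an options value into
# a hash of flags, checking numeric presets before symbols.
def resolve_flags(options)
  opts = [*options]   # accept a single value or an array
  all_on = { quotes: true, backticks: true, ellipses: true }

  return {}                               if opts.include?(0)
  return all_on.merge(dashes: :normal)    if opts.include?(1)
  return all_on.merge(dashes: :oldschool) if opts.include?(2)
  return all_on.merge(dashes: :inverted)  if opts.include?(3)
  return { stupefy: true }                if opts.include?(-1)

  # No preset given: build the flags from individual symbols.
  flags = {}
  flags[:quotes]         = true       if opts.include?(:quotes)
  flags[:backticks]      = true       if opts.include?(:backticks)
  flags[:backticks]      = :both      if opts.include?(:allbackticks)
  flags[:dashes]         = :normal    if opts.include?(:dashes)
  flags[:dashes]         = :oldschool if opts.include?(:oldschool)
  flags[:dashes]         = :inverted  if opts.include?(:inverted)
  flags[:ellipses]       = true       if opts.include?(:ellipses)
  flags[:convert_quotes] = true       if opts.include?(:convertquotes)
  flags[:stupefy]        = true       if opts.include?(:stupefy)
  flags
end

p resolve_flags([:quotes, :dashes])  # only quotes and plain em-dash handling
p resolve_flags([2, :inverted])      # the preset 2 wins; dashes stay :oldschool
```

Under this reading, mixing a preset with symbols has no effect; to customize behavior, pass symbols only.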
Apply SmartyPants transformations.
# File rubypants.rb, line 213
def to_html
  do_quotes = do_backticks = do_dashes = do_ellipses = do_stupefy = nil
  convert_quotes = false

  if @options.include? 0
    # Do nothing.
    return self
  elsif @options.include? 1
    # Do everything, turn all options on.
    do_quotes = do_backticks = do_ellipses = true
    do_dashes = :normal
  elsif @options.include? 2
    # Do everything, turn all options on, use old school dash shorthand.
    do_quotes = do_backticks = do_ellipses = true
    do_dashes = :oldschool
  elsif @options.include? 3
    # Do everything, turn all options on, use inverted old school
    # dash shorthand.
    do_quotes = do_backticks = do_ellipses = true
    do_dashes = :inverted
  elsif @options.include?(-1)
    do_stupefy = true
  else
    do_quotes = @options.include? :quotes
    do_backticks = @options.include? :backticks
    do_backticks = :both if @options.include? :allbackticks
    do_dashes = :normal if @options.include? :dashes
    do_dashes = :oldschool if @options.include? :oldschool
    do_dashes = :inverted if @options.include? :inverted
    do_ellipses = @options.include? :ellipses
    convert_quotes = @options.include? :convertquotes
    do_stupefy = @options.include? :stupefy
  end

  # Parse the HTML
  tokens = tokenize

  # Keep track of when we're inside <pre>, <code>, <kbd>, <script>
  # or <math> tags.
  in_pre = false

  # The result is accumulated here.
  result = ""

  # This is a cheat, used to get some context for one-character
  # tokens that consist of just a quote char. What we do is remember
  # the last character of the previous text token, to use as context
  # to curl single-character quote tokens correctly.
  prev_token_last_char = nil

  tokens.each { |token|
    if token.first == :tag
      result << token[1]
      if token[1] =~ %r!<(/?)(?:pre|code|kbd|script|math)[\s>]!
        in_pre = ($1 != "/")  # Opening or closing tag?
      end
    else
      t = token[1]

      # Remember last char of this token before processing.
      last_char = t[-1].chr

      unless in_pre
        t = process_escapes t

        t.gsub!(/&quot;/, '"') if convert_quotes

        if do_dashes
          t = educate_dashes t if do_dashes == :normal
          t = educate_dashes_oldschool t if do_dashes == :oldschool
          t = educate_dashes_inverted t if do_dashes == :inverted
        end

        t = educate_ellipses t if do_ellipses

        # Note: backticks need to be processed before quotes.
        if do_backticks
          t = educate_backticks t
          t = educate_single_backticks t if do_backticks == :both
        end

        if do_quotes
          if t == "'"
            # Special case: single-character ' token
            if prev_token_last_char =~ /\S/
              t = "&#8217;"
            else
              t = "&#8216;"
            end
          elsif t == '"'
            # Special case: single-character " token
            if prev_token_last_char =~ /\S/
              t = "&#8221;"
            else
              t = "&#8220;"
            end
          else
            # Normal case:
            t = educate_quotes t
          end
        end

        t = stupefy_entities t if do_stupefy
      end

      prev_token_last_char = last_char
      result << t
    end
  }

  # Done
  result
end
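The special case for one-character quote tokens is the least obvious part of `to_html`, so here is a minimal sketch of just that decision. `curl_lone_quote` is an illustrative name, not a RubyPants method, and the entities are assumed to be the decimal forms used elsewhere in this file.

```ruby
# Illustrative helper, not part of RubyPants: chooses the entity for a
# text token that consists of exactly one quote character, using the last
# character of the previous text token as context.
def curl_lone_quote(quote, prev_last_char)
  # A non-whitespace character before the quote suggests a closing quote;
  # whitespace or no previous text at all suggests an opening one.
  closing = prev_last_char.to_s.match?(/\S/)
  case quote
  when "'" then closing ? '&#8217;' : '&#8216;'
  when '"' then closing ? '&#8221;' : '&#8220;'
  else quote
  end
end

puts curl_lone_quote('"', 'd')  # previous text ended in a letter: &#8221;
puts curl_lone_quote('"', ' ')  # previous text ended in a space:  &#8220;
puts curl_lone_quote("'", nil)  # no previous text token:          &#8216;
```

This situation arises when markup isolates a quote, for example when a tag immediately precedes and follows it, so the quote has no neighbors inside its own token.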
Return the string, with ``backticks''-style double quotes translated into HTML curly quote entities.
# File rubypants.rb, line 384
def educate_backticks(str)
  str.gsub("``", '&#8220;').gsub("''", '&#8221;')
end
Return the string, with each instance of "--" translated to an em-dash HTML entity.
# File rubypants.rb, line 347
def educate_dashes(str)
  str.gsub(/--/, '&#8212;')
end
Return the string, with each instance of "--" translated to an em-dash HTML entity, and each "---" translated to an en-dash HTML entity. Two reasons why: First, unlike the en- and em-dash syntax supported by educate_dashes_oldschool, it’s compatible with existing entries written before SmartyPants 1.1, back when "--" was only used for em-dashes. Second, em-dashes are more common than en-dashes, and so it sort of makes sense that the shortcut should be shorter to type. (Thanks to Aaron Swartz for the idea.)
# File rubypants.rb, line 369
def educate_dashes_inverted(str)
  str.gsub(/---/, '&#8211;').gsub(/--/, '&#8212;')
end
Return the string, with each instance of "--" translated to an en-dash HTML entity, and each "---" translated to an em-dash HTML entity.
# File rubypants.rb, line 355
def educate_dashes_oldschool(str)
  str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
end
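Both old school variants replace three hyphens before two, and the order matters. The sketch below uses standalone functions with illustrative names (`oldschool_dashes`, `wrong_order`), assuming the decimal entities &#8211; (en-dash) and &#8212; (em-dash), to show what happens if the shorter pattern runs first.

```ruby
# Standalone sketch; the method names are illustrative only.
def oldschool_dashes(str)
  # Longest pattern first: "---" becomes an em-dash, then "--" an en-dash.
  str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
end

def wrong_order(str)
  # Shortest pattern first: "--" consumes two of the three hyphens,
  # so the "---" rule never finds a match.
  str.gsub(/--/, '&#8211;').gsub(/---/, '&#8212;')
end

puts oldschool_dashes("pages 3--5 --- roughly")
# prints: pages 3&#8211;5 &#8212; roughly
puts wrong_order("pages 3--5 --- roughly")
# prints: pages 3&#8211;5 &#8211;- roughly
```

With the wrong order, a triple hyphen degrades into an en-dash followed by a stray hyphen.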
Return the string, with each instance of "..." translated to an ellipsis HTML entity. Also converts the case where there are spaces between the dots (". . .").
# File rubypants.rb, line 377
def educate_ellipses(str)
  str.gsub('...', '&#8230;').gsub('. . .', '&#8230;')
end
Return the string, with "educated" curly quote HTML entities.
# File rubypants.rb, line 397
def educate_quotes(str)
  punct_class = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^^_`{|}~]'

  str = str.dup

  # Special case if the very first character is a quote followed by
  # punctuation at a non-word-break. Close the quotes by brute
  # force:
  str.gsub!(/^'(?=#{punct_class}\B)/, '&#8217;')
  str.gsub!(/^"(?=#{punct_class}\B)/, '&#8221;')

  # Special case for double sets of quotes, e.g.:
  #   <p>He said, "'Quoted' words in a larger quote."</p>
  str.gsub!(/"'(?=\w)/, '&#8220;&#8216;')
  str.gsub!(/'"(?=\w)/, '&#8216;&#8220;')

  # Special case for decade abbreviations (the '80s):
  str.gsub!(/'(?=\d\ds)/, '&#8217;')

  close_class = %![^\ \t\r\n\\[\{\(\-]!
  dec_dashes = '&#8211;|&#8212;'

  # Get most opening single quotes:
  str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)'(?=\w)/,
            '\1&#8216;')
  # Single closing quotes:
  str.gsub!(/(#{close_class})'/, '\1&#8217;')
  str.gsub!(/'(\s|s\b|$)/, '&#8217;\1')
  # Any remaining single quotes should be opening ones:
  str.gsub!(/'/, '&#8216;')

  # Get most opening double quotes:
  str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)"(?=\w)/,
            '\1&#8220;')
  # Double closing quotes:
  str.gsub!(/(#{close_class})"/, '\1&#8221;')
  str.gsub!(/"(\s|s\b|$)/, '&#8221;\1')
  # Any remaining quotes should be opening ones:
  str.gsub!(/"/, '&#8220;')

  str
end
Return the string, with `backticks'-style single quotes translated into HTML curly quote entities.
# File rubypants.rb, line 391
def educate_single_backticks(str)
  str.gsub("`", '&#8216;').gsub("'", '&#8217;')
end
Return the string after processing the following backslash escape sequences. This is useful if you want to force a "dumb" quote or other character to appear.
Escaped are:
\\ \" \' \. \- \`
# File rubypants.rb, line 335
def process_escapes(str)
  str.gsub('\\\\', '&#92;').
    gsub('\"', '&#34;').
    gsub("\\\'", '&#39;').
    gsub('\.', '&#46;').
    gsub('\-', '&#45;').
    gsub('\`', '&#96;')
end
Return the string, with each RubyPants HTML entity translated to its ASCII counterpart.
Note: This is not reversible (but exactly the same as in SmartyPants)
# File rubypants.rb, line 445
def stupefy_entities(str)
  str.
    gsub(/&#8211;/, '-').      # en-dash
    gsub(/&#8212;/, '--').     # em-dash

    gsub(/&#8216;/, "'").      # open single quote
    gsub(/&#8217;/, "'").      # close single quote

    gsub(/&#8220;/, '"').      # open double quote
    gsub(/&#8221;/, '"').      # close double quote

    gsub(/&#8230;/, '...')     # ellipsis
end
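The note about irreversibility follows directly from the mapping: several distinct entities collapse onto the same ASCII character. The sketch below is a standalone approximation with an illustrative name (`stupefy`), assuming the decimal entities &#8211; through &#8230; that the rest of this file produces.

```ruby
# Standalone approximation for illustration; not the library method itself.
def stupefy(str)
  str.gsub('&#8211;', '-')          # en-dash
     .gsub('&#8212;', '--')         # em-dash
     .gsub(/&#821[67];/, "'")       # opening and closing single quotes
     .gsub(/&#822[01];/, '"')       # opening and closing double quotes
     .gsub('&#8230;', '...')        # ellipsis
end

puts stupefy('&#8220;Hi&#8221; &#8211; well&#8230;')
# prints: "Hi" - well...

# Opening and closing quotes collapse to the same character, and an
# en-dash becomes indistinguishable from an ordinary hyphen.
puts stupefy('&#8216;') == stupefy('&#8217;')   # prints: true
puts stupefy('1&#8211;2') == '1-2'              # prints: true
```

Once the text is stupefied, nothing records which straight quote was an opener or which hyphen was an en-dash, so re-educating the output can yield different entities than the original.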
Return an array of the tokens comprising the string. Each token is either a tag (possibly with nested tags contained therein, such as <a href="<MTFoo>">), or a run of text between tags. Each element of the array is a two-element array; the first is either :tag or :text; the second is the actual value.
Based on the _tokenize() subroutine from Brad Choate’s MTRegex plugin. <www.bradchoate.com/past/mtregex.php>
This is actually the easier variant using tag_soup, as used by Chad Miller in the Python port of SmartyPants.
# File rubypants.rb, line 471
def tokenize
  tag_soup = /([^<]*)(<[^>]*>)/

  tokens = []

  prev_end = 0
  scan(tag_soup) {
    tokens << [:text, $1] if $1 != ""
    tokens << [:tag, $2]

    prev_end = $~.end(0)
  }

  if prev_end < size
    tokens << [:text, self[prev_end..-1]]
  end

  tokens
end
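To make the token format concrete, here is a standalone sketch of the same tag-soup approach. It is written as a plain function taking the HTML as an argument (an assumption made for illustration; the real method is defined on the string itself) and uses the same regular expression.

```ruby
# Standalone sketch of the tag-soup tokenizer; takes the HTML as an
# argument instead of being a method on a String subclass.
def tokenize(html)
  tokens = []
  prev_end = 0

  # Each match is a (possibly empty) run of text followed by one tag.
  html.scan(/([^<]*)(<[^>]*>)/) do |text, tag|
    tokens << [:text, text] unless text.empty?
    tokens << [:tag, tag]
    prev_end = Regexp.last_match.end(0)
  end

  # Whatever follows the last tag is a final text token.
  tokens << [:text, html[prev_end..-1]] if prev_end < html.size
  tokens
end

tokenize('<p>Say "hi" to <em>me</em>.</p>').each { |token| p token }
```

For this input the tokens alternate between tags and text: `<p>`, `Say "hi" to `, `<em>`, `me`, `</em>`, `.`, `</p>`. Note that with this simpler pattern a tag containing `>` inside an attribute, such as `<a href="<MTFoo>">`, is cut at the first `>`, and the remaining `">` is treated as text.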