3. Syntax Highlighting

3.1. Converting Text

Colorizing code was touched on briefly in the Introduction chapter (under “Quick Start”). To review, the process takes just three steps:

  1. Require the convertor class for the type of output you want (currently, only HTML is supported).
  2. Obtain an instance of the convertor for the syntax you wish to convert.
  3. Call #convert on that convertor, passing in the text you want to convert. The return value is the HTML representation of the colorized text.

For example:

Colorizing a Ruby script [ruby]
# Step 1: require the HTML convertor
require 'syntax/convertors/html'

# Step 2: get an instance of the HTML convertor for the Ruby syntax
convertor = Syntax::Convertors::HTML.for_syntax "ruby"

# Step 3: convert the text to HTML
puts convertor.convert( File.read( "program.rb" ) )

3.2. Custom Highlighters

To write your own custom highlighter module, you just need to:

  1. inherit from Syntax::Convertors::Abstract
  2. implement the convert method

You can use the syntax/convertors/html.rb file as an example:

syntax/convertors/html.rb [ruby]
require 'syntax/convertors/abstract'

module Syntax
  module Convertors

    # A simple class for converting a text into HTML.
    class HTML < Abstract

      # Converts the given text to HTML, using spans to represent token groups
      # of any type but <tt>:normal</tt> (which is always unhighlighted). If
      # +pre+ is +true+, the html is automatically wrapped in pre tags.
      def convert( text, pre=true )
        html = ""
        html << "<pre>" if pre
        regions = []
        @tokenizer.tokenize( text ) do |tok|
          value = html_escape(tok)
          case tok.instruction
            when :region_close then
              regions.pop
              html << "</span>"
            when :region_open then
              regions.push tok.group
              html << "<span class=\"#{tok.group}\">#{value}"
            else
              if tok.group == ( regions.last || :normal )
                html << value
              else
                html << "<span class=\"#{tok.group}\">#{value}</span>"
              end
          end
        end
        html << "</span>" while regions.pop
        html << "</pre>" if pre
        html
      end

      private

        # Replaces some characters with their corresponding HTML entities.
        def html_escape( string )
          string.gsub( /&/, "&amp;" ).
                 gsub( /</, "&lt;" ).
                 gsub( />/, "&gt;" ).
                 gsub( /"/, "&quot;" )
        end

    end

  end
end

Within the #convert method, you automatically have access to the @tokenizer instance variable, which the framework instantiates for you. The rest is up to you.
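
As a rough illustration of the two steps, the following sketch shows the shape of a very small custom convertor that wraps every non-normal token in brackets instead of HTML spans. So that it runs without the library installed, it defines two stand-ins, SketchToken and SketchTokenizer, which are hypothetical and not part of Syntax; they only mimic the interface the HTML convertor relies on (a #tokenize method that yields string-like tokens responding to #group and #instruction). BracketConvertor is likewise an invented name. With the real library you would presumably drop the stand-ins and the constructor, and inherit from Syntax::Convertors::Abstract instead. For brevity, the sketch ignores region instructions.

```ruby
# Hypothetical stand-in for a token: a String that also carries a group and
# an instruction, mirroring how html.rb uses tok, tok.group and tok.instruction.
class SketchToken < String
  attr_reader :group, :instruction

  def initialize(text, group, instruction = :none)
    super(text)
    @group = group
    @instruction = instruction
  end
end

# Hypothetical stand-in for a tokenizer: splits the text into runs of
# whitespace and non-whitespace, and tags a few words as :keyword.
class SketchTokenizer
  KEYWORDS = %w[def end puts].freeze

  def tokenize(text)
    text.scan(/\s+|\S+/) do |piece|
      group = KEYWORDS.include?(piece) ? :keyword : :normal
      yield SketchToken.new(piece, group)
    end
  end
end

# The custom convertor itself. Assumption: with the real library this class
# would inherit from Syntax::Convertors::Abstract and would not need its own
# constructor, since @tokenizer is set up by the framework.
class BracketConvertor
  def initialize(tokenizer)
    @tokenizer = tokenizer
  end

  # Wraps every token whose group is not :normal as "[group:text]".
  def convert(text)
    out = +""
    @tokenizer.tokenize(text) do |tok|
      if tok.group == :normal
        out << tok
      else
        out << "[#{tok.group}:#{tok}]"
      end
    end
    out
  end
end

convertor = BracketConvertor.new(SketchTokenizer.new)
result = convertor.convert("def hello\n  puts 1\nend")
puts result
```

The only part specific to the output format is the body of #convert; everything else about tokenizing stays with the tokenizer, which is the same division of labor the HTML convertor uses.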