- Revision:
- 15
- Log:
Attempt to update Typo to a Typo SVN HEAD release from around the
time the prototype installation was set up on the RISC OS Open Limited
web site. Timestamps place this at 04-Jul so a revision from 05-Jul or
earlier was pulled and copied over the 2.6.0 tarball stable code.
- Author:
- adh
- Date:
- Sat Jul 22 23:27:35 +0100 2006
- Size:
- 12173 Bytes
1 | <html> |
2 | <head> |
3 | <title>Syntax Manual :: Chapter 2: Lexical Analysis</title> |
4 | <link type="text/css" rel="stylesheet" href="stylesheets/manual.css" /> |
5 | </head> |
6 | |
7 | <body> |
8 | <div id="banner"> |
9 | <table border='0' cellpadding='0' cellspacing='0' width='100%'> |
10 | <tr><td valign='top' align='left'> |
11 | <div class="title"> |
12 | <span class="product">Syntax—</span><br /> |
13 | <span class="tagline">Lexical Analysis for Syntax Highlighting</span> |
14 | </div> |
15 | </td><td valign='middle' align='right'> |
16 | <div class="info"> |
17 | Syntax Version: <strong>1.0.0</strong><br /> |
18 | Manual Last Updated: <strong>2005-06-18 20:25 UTC</strong> |
19 | </div> |
20 | </td></tr> |
21 | </table> |
22 | </div> |
23 | |
24 | <table border='0' width='100%' cellpadding='0' cellspacing='0'> |
25 | <tr><td valign='top'> |
26 | |
27 | <div id="navigation"> |
28 | <h1>Syntax Manual</h1> |
29 | |
30 | <h2>Chapters</h2> |
31 | <ol type="I"> |
32 | |
33 | <li> |
34 | <a href="chapter-1.html"> |
35 | Introduction |
36 | </a> |
37 | |
38 | <ol type="1"> |
39 | |
40 | <li><a href="chapter-1.html#s1">What is Syntax?</a></li> |
41 | |
42 | <li><a href="chapter-1.html#s2">Quick Start</a></li> |
43 | |
44 | </ol> |
45 | </li> |
46 | |
47 | <li><strong> |
48 | <a href="chapter-2.html"> |
49 | Lexical Analysis |
50 | </a> |
51 | </strong> <big>←</big> |
52 | <ol type="1"> |
53 | |
54 | <li><a href="chapter-2.html#s1">Groups</a></li> |
55 | |
56 | <li><a href="chapter-2.html#s2">Instructions</a></li> |
57 | |
58 | <li><a href="chapter-2.html#s3">Analyzing</a></li> |
59 | |
60 | </ol> |
61 | </li> |
62 | |
63 | <li> |
64 | <a href="chapter-3.html"> |
65 | Syntax Highlighting |
66 | </a> |
67 | |
68 | <ol type="1"> |
69 | |
70 | <li><a href="chapter-3.html#s1">Converting Text</a></li> |
71 | |
72 | <li><a href="chapter-3.html#s2">Custom Highlighters</a></li> |
73 | |
74 | </ol> |
75 | </li> |
76 | |
77 | <li> |
78 | <a href="chapter-4.html"> |
79 | Extending Syntax |
80 | </a> |
81 | |
82 | <ol type="1"> |
83 | |
84 | <li><a href="chapter-4.html#s1">Introduction</a></li> |
85 | |
86 | <li><a href="chapter-4.html#s2">Interface</a></li> |
87 | |
88 | <li><a href="chapter-4.html#s3">Scanning <span class="caps">API</span></a></li> |
89 | |
90 | <li><a href="chapter-4.html#s4">Registering Your New Syntax</a></li> |
91 | |
92 | </ol> |
93 | </li> |
94 | |
95 | </ol> |
96 | |
97 | <h2>Other Documentation</h2> |
98 | |
99 | <ul> |
100 | <li><a href="http://net-ssh.rubyforge.org/api/index.html">Net::SSH API</a></li> |
101 | <li><a href="http://rubyforge.org/tracker/?atid=1842&group_id=274&func=browse">Net::SSH FAQ</a></li> |
102 | </ul> |
103 | |
104 | <h2>Tutorials</h2> |
105 | <ol> |
106 | |
107 | </ol> |
108 | |
109 | <p align="center"><strong>More To Come...</strong></p> |
110 | |
111 | <div class="license"> |
112 | <a href="http://creativecommons.org/licenses/by-sa/2.0/"><img alt="Creative Commons License" border="0" src="http://creativecommons.org/images/public/somerights" /></a><br /> |
113 | This manual is licensed under a <a href="http://creativecommons.org/licenses/by-sa/2.0/">Creative Commons License</a>. |
114 | </div> |
115 | </div> |
116 | |
117 | </td><td valign='top' width="100%"> |
118 | |
119 | <div id="content"> |
120 | |
121 | <div class="top"><div class="prevnext"> |
122 | |
123 | <a href="chapter-1.html">Previous (1. Introduction)</a> | |
124 | |
125 | <a href="index.html">Up</a> |
126 | |
127 | | <a href="chapter-3.html">Next (3. Syntax Highlighting)</a> |
128 | |
129 | </div></div> |
130 | |
131 | <h1>2. Lexical Analysis</h1> |
132 | |
133 | |
134 | |
135 | <h2> |
136 | <a name="s1"></a> |
137 | 2.1. Groups |
138 | </h2> |
139 | |
140 | |
141 | |
142 | <div class="section"> |
143 | <p>Lexical analysis is (at least in part) the process of converting a body of text into <em>tokens</em>. It is also the process of identifying the <em>class</em> of each token. The Syntax library refers to these classes as <em>groups</em>.</p> |
144 | |
145 | |
146 | <p>Each syntax module may define its own groups. The Ruby module, for instance, defines 18 different groups:</p> |
147 | |
148 | |
149 | <ol> |
150 | <li>normal: whitespace and the like. Basically, any text not grouped in any of the other groups.</li> |
151 | <li>comment: the delimiters and contents of a comment</li> |
152 | <li>keyword: any recognized keyword of the Ruby language</li> |
153 | <li>method: the name of a method when it is being declared</li> |
154 | <li>class: the name of a class when it is being declared</li> |
155 | <li>module: the name of a module when it is being declared</li> |
156 | <li>punct: any punctuation character</li> |
157 | <li>symbol: a Ruby symbol token</li> |
158 | <li>string: the contents (but not delimiters) of a string</li> |
159 | <li>char: a character literal (<code>?g</code>)</li> |
160 | <li>ident: an identifier, not otherwise recognized as a keyword</li> |
161 | <li>constant: a constant (beginning with an uppercase letter)</li> |
162 | <li>regex: the contents (but not delimiters) of a regular expression</li> |
163 | <li>number: a numeric literal</li> |
164 | <li>attribute: an instance variable</li> |
165 | <li>global: a global variable</li> |
166 | <li>expr: a nested (interpolated) expression within a string or regex</li> |
167 | <li>escape: an escape sequence within a string or regex</li> |
168 | </ol> |
169 | |
170 | |
171 | <p>The only group common to all modules is <code>normal</code>. (When converting text to <span class="caps">HTML</span>, the class name used in each span is the name of the corresponding group, which makes it straightforward to determine which <span class="caps">CSS</span> classes need to be defined.)</p> |
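<p>As a rough illustration of the point about CSS classes, the sketch below turns the group names listed above into empty rule stubs. The <code>css_skeleton</code> helper is hypothetical and not part of the Syntax library; the <code>.ruby</code> scope is assumed from the wrapper class used around the highlighted examples in this manual.</p>

```ruby
# Group names defined by the Ruby syntax module, as listed above.
RUBY_GROUPS = %w[
  normal comment keyword method class module punct symbol string
  char ident constant regex number attribute global expr escape
].freeze

# Hypothetical helper (not part of Syntax): builds one empty CSS rule per
# group, scoped under a wrapper class, so the rules can be filled in by hand.
def css_skeleton(groups, scope = '.ruby')
  groups.map { |group| "#{scope} .#{group} { }" }.join("\n")
end

puts css_skeleton(RUBY_GROUPS)
```

<p>Because every span class is a group name, the list of groups is the only input such a stylesheet skeleton needs.</p>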
172 | </div> |
173 | |
174 | |
175 | |
176 | <h2> |
177 | <a name="s2"></a> |
178 | 2.2. Instructions |
179 | </h2> |
180 | |
181 | |
182 | |
183 | <div class="section"> |
184 | <p>In addition to groups, each token has an associated <em>instruction</em>. For most tokens, this instruction is the symbol <code>:none</code>, meaning “do nothing special”. However, there are two other instructions defined by the framework:</p> |
185 | |
186 | |
187 | <ul> |
188 | <li><code>:region_open</code>: begin a “region”. This region is a sequence of tokens that are all nested inside the group of the current token. This is useful for strings and regular expressions, which may contain other kinds of tokens (like <code>expr</code> and <code>escape</code>, in Ruby’s case).</li> |
189 | <li><code>:region_close</code>: close the current region.</li> |
190 | </ul> |
191 | |
192 | |
193 | <p>The <span class="caps">HTML</span> convertor uses these instructions to decide whether to emit only an opening span tag, only a closing one, or both. Other convertors may use these instructions in similar ways.</p> |
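<p>The following is a minimal sketch of how a convertor might act on the three instructions. It does not use the Syntax library: the <code>Token</code> struct, the <code>to_html</code> and <code>escape_html</code> helpers, and the hand-written token list are all assumptions made for illustration. The token list is modelled on the markup of an interpolated string as it appears in this manual's own highlighted examples.</p>

```ruby
# Stand-in for a token: its text, its group, and its instruction.
# (Hypothetical struct, not the Syntax library's own token class.)
Token = Struct.new(:text, :group, :instruction)

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def escape_html(text)
  text.gsub(/[&<>]/, HTML_ESCAPES)
end

# Emits an opening tag, a closing tag, or both, depending on the instruction.
def to_html(tokens)
  tokens.map do |token|
    open_tag = %(<span class="#{token.group}">)
    body = escape_html(token.text)
    case token.instruction
    when :region_open  then open_tag + body             # leave the span open
    when :region_close then body + '</span>'            # close the open region
    else                    open_tag + body + '</span>' # :none, self-contained
    end
  end.join
end

# Hand-written tokens for a double-quoted string containing an interpolation.
SAMPLE_TOKENS = [
  Token.new('"', :punct, :none),
  Token.new(' group: ', :string, :region_open),
  Token.new('#{token.group}', :expr, :none),
  Token.new('', :string, :region_close),
  Token.new('"', :punct, :none)
].freeze

puts to_html(SAMPLE_TOKENS)
```

<p>The <code>expr</code> span ends up nested inside the <code>string</code> span because the string token opened a region that is closed only by the later <code>:region_close</code> token.</p>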
194 | </div> |
195 | |
196 | |
197 | |
198 | <h2> |
199 | <a name="s3"></a> |
200 | 2.3. Analyzing |
201 | </h2> |
202 | |
203 | |
204 | |
205 | <div class="section"> |
206 | <p>Lexical analysis is performed by obtaining a tokenizer of the appropriate class and calling <code>tokenize</code> on it, passing the text to be tokenized. Each token is yielded to the associated block as it is discovered.</p> |
207 | |
208 | |
209 | <div class='figure'> |
210 | <span class='caption'>Tokenizing a Ruby script [ruby]</span> |
211 | <div class='body'><table border='0' cellpadding='0' cellspacing='0'><tr><td class='lineno'>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br /></td><td width='100%'><link rel='stylesheet' type='text/css' href='stylesheets/ruby.css' /><div class='ruby'><pre><span class="ident">require</span> <span class="punct">'</span><span class="string">syntax</span><span class="punct">'</span> |
212 | |
213 | <span class="ident">tokenizer</span> <span class="punct">=</span> <span class="constant">Syntax</span><span class="punct">.</span><span class="ident">load</span> <span class="punct">"</span><span class="string">ruby</span><span class="punct">"</span> |
214 | <span class="ident">tokenizer</span><span class="punct">.</span><span class="ident">tokenize</span><span class="punct">(</span> <span class="constant">File</span><span class="punct">.</span><span class="ident">read</span><span class="punct">(</span> <span class="punct">"</span><span class="string">program.rb</span><span class="punct">"</span> <span class="punct">)</span> <span class="punct">)</span> <span class="keyword">do</span> <span class="punct">|</span><span class="ident">token</span><span class="punct">|</span> |
215 | <span class="ident">puts</span> <span class="ident">token</span> |
216 | <span class="ident">puts</span> <span class="punct">"</span><span class="string"> group: <span class="expr">#{token.group}</span></span><span class="punct">"</span> |
217 | <span class="ident">puts</span> <span class="punct">"</span><span class="string"> instruction: <span class="expr">#{token.instruction}</span></span><span class="punct">"</span> |
218 | <span class="keyword">end</span></pre></div></td></tr></table></div></div> |
219 | |
220 | |
221 | <p>If you need finer control over the process, you can use the lower-level <span class="caps">API</span>:</p> |
222 | |
223 | |
224 | <div class='figure'> |
225 | <span class='caption'>Tokenizing a Ruby script via step [ruby]</span> |
226 | <div class='body'><table border='0' cellpadding='0' cellspacing='0'><tr><td class='lineno'>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br /></td><td width='100%'><link rel='stylesheet' type='text/css' href='stylesheets/ruby.css' /><div class='ruby'><pre><span class="ident">require</span> <span class="punct">'</span><span class="string">syntax</span><span class="punct">'</span> |
227 | |
228 | <span class="ident">tokenizer</span> <span class="punct">=</span> <span class="constant">Syntax</span><span class="punct">.</span><span class="ident">load</span> <span class="punct">"</span><span class="string">ruby</span><span class="punct">"</span> |
229 | <span class="ident">tokenizer</span><span class="punct">.</span><span class="ident">start</span><span class="punct">(</span> <span class="constant">File</span><span class="punct">.</span><span class="ident">read</span><span class="punct">(</span> <span class="punct">"</span><span class="string">program.rb</span><span class="punct">"</span> <span class="punct">)</span> <span class="punct">)</span> <span class="keyword">do</span> <span class="punct">|</span><span class="ident">token</span><span class="punct">|</span> |
230 | <span class="ident">puts</span> <span class="ident">token</span> |
231 | <span class="ident">puts</span> <span class="punct">"</span><span class="string"> group: <span class="expr">#{token.group}</span></span><span class="punct">"</span> |
232 | <span class="ident">puts</span> <span class="punct">"</span><span class="string"> instruction: <span class="expr">#{token.instruction}</span></span><span class="punct">"</span> |
233 | <span class="keyword">end</span> |
234 | |
235 | <span class="ident">tokenizer</span><span class="punct">.</span><span class="ident">step</span> |
236 | <span class="ident">tokenizer</span><span class="punct">.</span><span class="ident">step</span> |
237 | <span class="punct">...</span> |
238 | <span class="ident">tokenizer</span><span class="punct">.</span><span class="ident">finish</span></pre></div></td></tr></table></div></div> |
239 | |
240 | |
241 | <p>In this case, each call to <code>#step</code> consumes some of the text and yields the resulting tokens to the block. A single step may detect and yield several tokens: there is no way to guarantee a single token at a time, unless the corresponding syntax module was written to work that way. For efficiency, the existing modules yield multiple tokens when processing (for instance) strings, regular expressions, and heredocs.</p> |
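<p>To make the stepping behaviour concrete, here is a toy tokenizer that imitates the <code>start</code>/<code>step</code>/<code>finish</code> protocol. It is a sketch only and is not how the Syntax modules are implemented: the <code>ToyTokenizer</code> class, its <code>eos?</code> method, and its three scanning rules are assumptions for illustration. It shows one step yielding several tokens when it meets a quoted string.</p>

```ruby
require 'strscan'

# Toy tokenizer (hypothetical, not part of Syntax) with a start/step/finish
# protocol. Tokens are yielded as [text, group] pairs.
class ToyTokenizer
  def start(text, &block)
    @scanner = StringScanner.new(text)
    @callback = block
  end

  # True once all of the text has been consumed.
  def eos?
    @scanner.eos?
  end

  # Consumes one lexical unit. A quoted string yields three tokens at once.
  def step
    if @scanner.scan(/"([^"]*)"/)
      contents = @scanner[1]
      emit('"', :punct)
      emit(contents, :string)
      emit('"', :punct)
    elsif @scanner.scan(/\w+/)
      emit(@scanner.matched, :ident)
    else
      emit(@scanner.getch, :normal)
    end
  end

  def finish
    @scanner = @callback = nil
  end

  private

  def emit(text, group)
    @callback.call([text, group])
  end
end

# Records how many tokens each call to step yields for the given text.
def tokens_per_step(text)
  seen = []
  tokenizer = ToyTokenizer.new
  tokenizer.start(text) { |token| seen << token }
  counts = []
  until tokenizer.eos?
    before = seen.size
    tokenizer.step
    counts << seen.size - before
  end
  tokenizer.finish
  [seen, counts]
end

seen, counts = tokens_per_step('puts "hi"')
p counts
p seen
```

<p>A caller that needs exactly one token per iteration would have to buffer the yielded tokens itself, as the <code>seen</code> array does here.</p>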
242 | </div> |
243 | |
244 | |
245 | |
246 | <div class="bottom"><div class="prevnext"> |
247 | |
248 | <a href="chapter-1.html">Previous (1. Introduction)</a> | |
249 | |
250 | <a href="index.html">Up</a> |
251 | |
252 | | <a href="chapter-3.html">Next (3. Syntax Highlighting)</a> |
253 | |
254 | </div></div> |
255 | |
256 | |
257 | </div> |
258 | |
259 | </td></tr> |
260 | </table> |
261 | </body> |
262 | </html> |