Yorick Peterse
e2b9fc75ca
Removed #include for malloc.h

Apparently some operating systems move this to malloc/malloc.h. Since it's not
needed, let's just get rid of it.

2014-05-11 21:06:02 +02:00

Yorick Peterse
ba3d96c819
Re-build lexers when base_lexer.rl changes.

Thanks to @avdi for explaining how to do this when using rule() blocks.

2014-05-10 00:28:23 +02:00

Yorick Peterse
19f04f98f7
Support for lexing/parsing inline doctypes.
2014-05-10 00:28:11 +02:00

Yorick Peterse
a92023fe94
Removed outdated paragraph from the README.

Ironically, Oga now uses native extensions for the lexer.

2014-05-09 00:34:25 +02:00

Yorick Peterse
a8bf6be00e
Added a contributing guide.
2014-05-09 00:32:44 +02:00

Yorick Peterse
2dd5d996c4
Travis: don't notify for every failure.
2014-05-08 10:20:35 +02:00

Yorick Peterse
c472ceac6f
Docs for the shared Ragel grammar.
2014-05-08 00:21:23 +02:00

Yorick Peterse
98db796205
Updated editor configuration.
2014-05-08 00:17:12 +02:00

Yorick Peterse
51c1f3c32d
Updated the README.
2014-05-08 00:15:54 +02:00

Yorick Peterse
fe74d60138
Manually bootstrap JRuby after all.

After discussing this with @headius, I've decided to do this the manual way
anyway. Apparently the basic load service stuff is deprecated and not very
reliable.

2014-05-07 22:32:34 +02:00

Yorick Peterse
90fabe3f21
Compile when running `rake generate`.
2014-05-07 20:07:31 +02:00

Yorick Peterse
3c621bf22e
Removed the manifest file + task.

Using Dir.glob() is much easier when dealing with a bunch of generated files.

2014-05-07 11:11:29 +02:00

Yorick Peterse
ee78b2c382
Don't redefine namespaces in C.

The Oga::XML namespace should be set up by Ruby, not by C.

2014-05-07 10:52:06 +02:00

Yorick Peterse
bbdc7966db
Documentation for the JRuby extension.
2014-05-07 10:24:24 +02:00

Yorick Peterse
3afef5f7cc
Lexer support for JRuby.

JRuby now passes all tests. Benchmark-wise, it completes the big XML benchmark
in about 500-600 milliseconds.

2014-05-07 09:40:22 +02:00

Yorick Peterse
b9a4038e42
Callback boilerplate for the Java lexer.
2014-05-07 01:01:24 +02:00

Yorick Peterse
e271298984
Use macros in the C lexer.
2014-05-07 00:57:25 +02:00

Yorick Peterse
f25f8a3d15
Break up the Ragel C grammar.

The grammar is now broken up into a base lexer and a C lexer. This allows the
same grammar to also be used in the Java code.

2014-05-07 00:50:34 +02:00

Yorick Peterse
49939fa687
Updated editor configuration.
2014-05-07 00:33:24 +02:00

Yorick Peterse
9abc5c1c92
Separated the Java and C ext codebases.
2014-05-07 00:29:10 +02:00

Yorick Peterse
b8efed5177
Renamed on_start_doctype to on_doctype_start.
2014-05-06 23:18:44 +02:00

Yorick Peterse
f39fe5d857
JRuby lexer boilerplate with actual input.

This doesn't actually lex anything just yet, but at least the input from Ruby is
in place.

2014-05-06 22:43:55 +02:00

Yorick Peterse
fea5ec7946
Removed the package line in LibogaService.java
2014-05-06 20:52:42 +02:00

Yorick Peterse
2053018d07
Slap JRuby so that it can load the .jar file.
2014-05-06 20:45:26 +02:00

Yorick Peterse
6e685378e0
Setup Ragel for JRuby and load things the hard way
2014-05-06 19:06:04 +02:00

Yorick Peterse
aea8378fbb
Removed Cliver from the parser task.
2014-05-06 15:25:57 +02:00

Yorick Peterse
00e778d0d9
Removed unused cliver require.

Dumbass.

2014-05-06 13:59:26 +02:00

Yorick Peterse
127aea5ca6
Remove the Java output when cleaning.
2014-05-06 10:24:57 +02:00

Yorick Peterse
64c9e18651
Setup for Java and Ragel.
2014-05-06 10:24:07 +02:00

Yorick Peterse
2652bc0103
Removed Cliver as a dependency.

Since I'm not using any Ragel version-specific features, there's no real need
to check for the version.

2014-05-06 10:18:52 +02:00

Yorick Peterse
eeeeb0efad
Don't track the generated Java lexer.
2014-05-06 10:11:19 +02:00

Yorick Peterse
d2742cfdde
Use 4 spaces for C/Java code.
2014-05-06 09:41:36 +02:00

Yorick Peterse
4e2dca2fd9
Updated the list of files to clean.
2014-05-06 09:29:02 +02:00

Yorick Peterse
b9cb7c2d7c
Corrected various extension paths.
2014-05-06 08:47:02 +02:00

Yorick Peterse
01a4a53a53
Merge branch 'native-ext' of github.com:YorickPeterse/oga into native-ext
2014-05-06 08:44:57 +02:00

Yorick Peterse
c30d3a7627
Half-assed JRuby boilerplate.

Blowing my brains out over getting this fat pig to do what I want, but we're
getting there.

2014-05-06 00:23:07 +02:00

Yorick Peterse
2b3a6be24d
Use liboga as a prefix in the C code.

Namespaces? What are those?

2014-05-05 21:19:50 +02:00

Yorick Peterse
ee756037e7
Removed unused YARD tag.
2014-05-05 09:45:10 +02:00

Yorick Peterse
aeab885a7f
Docs for the Ruby part of the XML lexer.
2014-05-05 09:44:35 +02:00

Yorick Peterse
57fd4dff64
Docs for the C lexer.
2014-05-05 09:40:08 +02:00

Yorick Peterse
335f3cc6d6
Use rb_enc_str_new instead of rb_enc_str_new_cstr.

The latter, in combination with strndup(), would leak large amounts of memory.

2014-05-05 00:34:19 +02:00

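The leak described above comes from an extra temporary allocation per token. The following standalone C sketch illustrates the pattern without Ruby's headers: `str_copy_n()` and `str_copy_cstr()` are hypothetical stand-ins for `rb_enc_str_new(ptr, len, enc)` and `rb_enc_str_new_cstr(cstr, enc)`, assuming (as with the real functions) that each copies its input into a newly owned buffer. The `ts`/`te` names follow Ragel's token start/end convention; this is an illustration, not Oga's actual lexer code.

```c
#define _POSIX_C_SOURCE 200809L /* exposes strndup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Ruby String: owns a private copy of its bytes. */
typedef struct {
    char  *data;
    size_t len;
} owned_str;

/* Stand-in for rb_enc_str_new(ptr, len, enc): copies exactly len bytes. */
static owned_str str_copy_n(const char *ptr, size_t len)
{
    owned_str s;

    assert(ptr != NULL);
    s.data = malloc(len + 1);
    assert(s.data != NULL);
    memcpy(s.data, ptr, len);
    s.data[len] = '\0';
    s.len = len;
    return s;
}

/* Stand-in for rb_enc_str_new_cstr(cstr, enc): copies up to the NUL byte. */
static owned_str str_copy_cstr(const char *cstr)
{
    assert(cstr != NULL);
    return str_copy_n(cstr, strlen(cstr));
}

/* Leaky pattern: strndup() makes a temporary NUL-terminated copy of the
 * token, str_copy_cstr() copies it again, and the temporary is never freed.
 * One allocation leaks per emitted token. */
static owned_str emit_token_leaky(const char *ts, const char *te)
{
    char *tmp = strndup(ts, (size_t)(te - ts));

    return str_copy_cstr(tmp); /* tmp is never freed */
}

/* Fixed pattern: hand pointer + length straight to the copying function,
 * so no temporary buffer is allocated at all. */
static owned_str emit_token(const char *ts, const char *te)
{
    return str_copy_n(ts, (size_t)(te - ts));
}
```

Passing the pointer and length straight through removes the temporary entirely, which is why the length-taking variant avoids the leak without needing an extra `free()`.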
Yorick Peterse
2689d3f65a
Initial setup using a C extension.

While I've tried to keep Oga pure Ruby for as long as possible, the performance
of Ragel's Ruby output was not worth the trouble. For example, lexing 10MB of
XML would take at least 5 to 6 seconds. Nokogiri, on the other hand, can parse
that same XML into a DOM document in about 300 milliseconds. Such a big
performance difference is not acceptable.

To work around this, the XML/HTML lexer will be implemented in C for
MRI/Rubinius and in Java for JRuby. For now there's only a C extension, as I
haven't yet read up on the JRuby API. The end goal is to provide some sort of
Ragel "template" that can be used to generate the corresponding C/Java
extension code. This would remove the need for duplicating the grammar and
associated code.

The native extension setup is a hybrid between native code and Ruby. The raw
Ragel code runs in C/Java, while the actual logic of the actions lives in Ruby.
This adds a small amount of overhead but makes the lexer much easier to
maintain. Even with this extra overhead the performance is much better than
pure Ruby: the 10MB of XML mentioned above is lexed in about 600 milliseconds.
In other words, it's roughly 10 times faster.

2014-05-05 00:31:28 +02:00

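To make the hybrid design described above concrete, here is a minimal C sketch under stated assumptions: a hand-written `scan()` loop stands in for the Ragel-generated state machine, and the `lexer_callback` function pointer stands in for calling a method on the Ruby lexer object (how Oga actually dispatches into Ruby is not shown in this log). The event names `on_element_start`, `on_text` and `on_element_end` and the `event_log`/`record` helpers are hypothetical. The native part only finds token boundaries; what happens with them is delegated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type. In the real extension this role would be played
 * by invoking a Ruby method on the lexer object; here it is plain C. */
typedef void (*lexer_callback)(const char *event, const char *ts,
                               const char *te, void *ctx);

/* Toy scanner standing in for the Ragel-generated state machine. It only
 * locates token boundaries (ts..te) and reports them through the callback. */
static void scan(const char *input, lexer_callback cb, void *ctx)
{
    const char *p = input;

    while (*p != '\0') {
        if (*p == '<') {
            const char *ts = p + 1;
            const char *te = strchr(ts, '>');

            if (te == NULL)
                return; /* unterminated tag: a real lexer would report an error */

            if (*ts == '/')
                cb("on_element_end", ts + 1, te, ctx);
            else
                cb("on_element_start", ts, te, ctx);

            p = te + 1;
        } else {
            const char *ts = p;

            while (*p != '\0' && *p != '<')
                p++;

            cb("on_text", ts, p, ctx);
        }
    }
}

/* The "action" side: here it just records events as text. In Oga this logic
 * lives in Ruby, which keeps the native part small. */
typedef struct {
    char   buf[256];
    size_t used;
} event_log;

static void record(const char *event, const char *ts, const char *te, void *ctx)
{
    event_log *events = ctx;
    size_t room;
    int n;

    assert(events != NULL && events->used < sizeof events->buf);
    room = sizeof events->buf - events->used;
    n = snprintf(events->buf + events->used, room, "%s(%.*s);",
                 event, (int)(te - ts), ts);

    if (n > 0 && (size_t)n < room)
        events->used += (size_t)n;          /* whole entry fit */
    else
        events->buf[events->used] = '\0';   /* drop a truncated entry */
}
```

Each token costs one callback dispatch; that per-token call is the kind of "small amount of overhead" the message accepts in exchange for keeping the action logic out of C/Java.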
Yorick Peterse
baaa24a760
Indentation fix in the lexer.
2014-05-04 18:06:43 +02:00

Yorick Peterse
f18e8893de
Removed the buffering crap from the lexer.
2014-05-04 17:39:08 +02:00

Yorick Peterse
57255012b7
Patch the Ragel lexer after generating it.

This further increases the throughput of the lexer. On MRI this seems to save
around one second. It now sits at ~6.8 seconds in the big XML benchmark.
On JRuby, combined with some JIT options and invokedynamic enabled, this can
reduce the average lexing time to around 3.5 seconds. Rubinius, also with a
few aggressive JIT options, seems to stay around 9 seconds.

2014-05-02 00:40:10 +02:00

Yorick Peterse
9dfdefee47
Removed XML::Lexer#buffering?

Instead of wrapping a predicate method around the ivar, we'll just access it
directly. This reduces average lexing times in the big XML benchmark from 7.5
to ~7 seconds.

2014-05-01 22:59:56 +02:00

Yorick Peterse
b854f737cd
Run memory profiling for 60 seconds.
2014-05-01 21:47:51 +02:00

Yorick Peterse
676a5333c0
Use a default gnuplot script.
2014-05-01 21:27:08 +02:00

Yorick Peterse
3344f373bd
Plot time offsets on X axes when profiling.
2014-05-01 21:26:05 +02:00

Yorick Peterse
f4a71d7f63
Use wx as a gnuplot terminal.

This allows users to zoom in and such, which doesn't work on the qt terminal
for some reason.

2014-05-01 21:01:25 +02:00