Add a wasm browser based playground #41
…o accept "'\''" literal.
… emit an error message.
…ype to store bitfields.
…w it only work on literals and only for ASCII
…st is failing and need review.
Thanks very much Domingo. There are lots of great changes here. I had a go with the playground and it's amazing. Great work! Do you mind if I share the playground link with a few people?
Having seen the railroad diagram generator at https://www.bottlecaps.de/rr/ui I'm convinced that's a useful addition to
It's much easier for me, and you're more likely to get a prompt response, if I can deal with these queries and changes in smaller chunks. If you email small queries directly, instead of commenting on #11, I'll be able to respond faster. With smaller PRs we can also get things merged, or feedback given, faster.
I've taken the changes that I could and merged them to main. I've rebased the remaining changes to the branch playground-2023-07-16 to hopefully make it convenient for you. But I'll also reply with my thoughts on the remaining commits here. Some of those changes I think are better placed in a separate repository with the playground itself rather than in lalr.
Thanks,
const ParserSymbol* symbol = reinterpret_cast<const ParserSymbol*>( lexer_.symbol() );
while ( parse(symbol, lexer_.lexeme(), lexer_.line(), lexer_.column()) )
{
    if(lexer_.full()) break;
Yes, there are some grammars that enter an endless loop because the lexer doesn't advance.
I don't know exactly which ones trigger the bug but you can try it with this script:
#!/bin/sh
basep=playground
checkGrammar() {
    echo "Now testing $1 $2"
    /usr/bin/time ./grammar_test-clang -g "$basep/$1" -i "$basep/$2"
}
checkGrammar json3.g test.json.txt
checkGrammar lua.g test.lua
checkGrammar carbon-lang.g prelude.carbon
checkGrammar postgresql-16.g test.sql
#checkGrammar cxx-parser.g test.cpp
checkGrammar lsl_ext.g test.lsl
checkGrammar bison.g carbon-lang.y
checkGrammar bison-bug.g carbon-lang.y
checkGrammar dparser.g test.dparser
checkGrammar parse_gen.g test.parse_gen
checkGrammar tameparser.g test.tameparser
checkGrammar javascript.g test.js
checkGrammar javascript-core.g test.js
checkGrammar cparser.g test.c
checkGrammar java11.g test.java
checkGrammar rust.g test.rs
checkGrammar go.g test.go
checkGrammar php-8.2.g test.php
checkGrammar gringo-ng.g test.clingo
checkGrammar ada-adayacc.g test.adb
Build script:
#!/bin/sh
umask 022
myflags="-O2 -g"
#myflags="-O2 -g -m32"
#myflags="-g"
clang-16-env clang++ \
-std=c++17 $myflags -Wall -Wextra -Wno-unused-function -pedantic \
-Isrc -DLALR_NO_THREADS \
src/lalr/ErrorPolicy.cpp \
src/lalr/Grammar.cpp \
src/lalr/GrammarCompiler.cpp \
src/lalr/GrammarGenerator.cpp \
src/lalr/GrammarParser.cpp \
src/lalr/GrammarState.cpp \
src/lalr/GrammarSymbol.cpp \
src/lalr/GrammarSymbolSet.cpp \
src/lalr/GrammarTransition.cpp \
src/lalr/RegexCompiler.cpp \
src/lalr/RegexGenerator.cpp \
src/lalr/RegexItem.cpp \
src/lalr/RegexNode.cpp \
src/lalr/RegexParser.cpp \
src/lalr/RegexState.cpp \
src/lalr/RegexSyntaxTree.cpp \
src/lalr/RegexToken.cpp \
src/lalr/lalr_examples/grammar_test.cpp \
-o grammar_test-clang
grammar_test.cpp:
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string>
#include <vector>
#include <lalr/GrammarCompiler.hpp>
#include <lalr/Parser.hpp>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>
static int errors_ = 0;
typedef unsigned char mychar_t;
static void show_error( const char* format, ... )
{
    ++errors_;
    va_list args;
    va_start( args, format );
    vfprintf( stderr, format, args );
    va_end( args );
}
int read_file(const char *fname, std::vector<mychar_t> &content)
{
    struct stat stat;
    int result = ::stat( fname, &stat );
    if ( result != 0 )
    {
        show_error( "Stat failed on '%s' - result=%d\n", fname, result );
        return EXIT_FAILURE;
    }
    FILE* file = fopen( fname, "rb" );
    if ( !file )
    {
        show_error( "Opening '%s' to read failed - errno=%d\n", fname, errno );
        return EXIT_FAILURE;
    }
    int size = stat.st_size;
    content.resize( size+1 );
    int read = int( fread(&content[0], sizeof(mychar_t), size, file) );
    fclose( file );
    file = nullptr;
    if ( read != size )
    {
        show_error( "Reading grammar from '%s' failed - read=%d\n", fname, int(read) );
        return EXIT_FAILURE;
    }
    content[size] = '\0';
    return EXIT_SUCCESS;
}
static clock_t start_time;
clock_t myShowDiffTime(const char *title)
{
    clock_t now = clock();
    clock_t diff = now - start_time;
    int msec = diff * 1000 / CLOCKS_PER_SEC;
    printf("%s: Time taken %d seconds %d milliseconds\n", title, msec/1000, msec%1000);
    start_time = now;
    return now;
}
struct C_MultLineCommentLexer
{
    static lalr::PositionIterator<const mychar_t*> string_lexer( const lalr::PositionIterator<const mychar_t*>& begin,
        const lalr::PositionIterator<const mychar_t*>& end,
        std::basic_string<mychar_t>* lexeme,
        const void** /*symbol*/ )
    {
        LALR_ASSERT( lexeme );
        lexeme->clear();
        //printf("C_MultLineCommentLexer : %s\n", lexeme->c_str());
        bool done = false;
        lalr::PositionIterator<const mychar_t*> i = begin;
        // Scan forward until the closing "*/" or the end of the input.
        while ( i != end && !done)
        {
            switch( *i )
            {
                case '*':
                    ++i;
                    if(i != end && *i == '/') done = true;
                    // Re-test the loop condition without skipping the current character.
                    continue;
            }
            ++i;
        }
        if ( i != end )
        {
            // Step past the '/' that closes the comment.
            LALR_ASSERT( *i == '/' );
            ++i;
        }
        return i;
    }
};
struct AstUserDataDbg {
    int index;
    int stack_index;
    static int next_index;
    static int total;
    AstUserDataDbg():index(total++), stack_index(next_index++) {}
};
int AstUserDataDbg::next_index = 0;
int AstUserDataDbg::total = 0;
static bool astMakerDbg( AstUserDataDbg& /*result*/, const AstUserDataDbg* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
{
    // //printf("astMaker: %s\n", nodes[0].lexeme().c_str());
    // const char *lexstr = (length > 0 ? (const char *)nodes[0].lexeme().c_str() : "::lnull");
    // const char *idstr = (length > 0 ? nodes[0].symbol()->identifier : "::inull");
    // int line = (length > 0 ? nodes[0].line() : 0);
    // int column = (length > 0 ? nodes[0].column() : 0);
    // //const char *stateLabel = (length > 0 ? nodes[0].state()->label : "::inull");
    // printf("astMaker: %p\t%zd:%d:%d\t%p\t%zd\t->\t%s : %s :%d:%d\n", start, length,
    //     length ? start->index : -1, length ? start->stack_index : -1,
    //     nodes, length, idstr, lexstr, line, column);
    printf("----\n");
    // One line per node on the right-hand side of the reduced production.
    // %zu is the printf conversion for size_t.
    for(size_t i=0; i< length; ++i)
        printf("%zu:%d\t%p\t%d:%d\t%p <:> %s <:> %s <:> %s <:> %d:%d\n", i, nodes[i].symbol()->type,
            start+i, start[i].index, start[i].stack_index, nodes+i,
            nodes[i].symbol()->identifier, nodes[i].symbol()->lexeme,
            nodes[i].lexeme().c_str(), nodes[i].line(), nodes[i].column());
    return true;
}
struct ParseTreeUserData {
    std::vector<ParseTreeUserData> children;
    const lalr::ParserSymbol *symbol;
    std::basic_string<mychar_t> lexeme; ///< The lexeme at this node (empty if this node's symbol is non-terminal).
    ParseTreeUserData():children(0),symbol(nullptr) {}
};
static bool parsetreeMaker( ParseTreeUserData& result, const ParseTreeUserData* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
{
    if(length == 0) return false;
    result.symbol = nodes[length-1].state()->transitions->reduced_symbol;
    for(size_t i_node = 0; i_node < length; ++i_node)
    {
        const lalr::ParserNode<mychar_t>& the_node = nodes[i_node];
        switch(the_node.symbol()->type)
        {
            case lalr::SymbolType::SYMBOL_TERMINAL:
            {
                ParseTreeUserData& udt = result.children.emplace_back();
                udt.symbol = the_node.symbol();
                udt.lexeme = the_node.lexeme();
                //printf("TERMINAL: %s : %s\n", udt.symbol->identifier, udt.lexeme.c_str());
            }
            break;
            case lalr::SymbolType::SYMBOL_NON_TERMINAL:
            {
                if(the_node.symbol() == result.symbol)
                {
                    // Same symbol as the one being reduced: splice its children in directly.
                    const ParseTreeUserData& startx = start[i_node];
                    for (std::vector<ParseTreeUserData>::const_iterator child = startx.children.begin(); child != startx.children.end(); ++child)
                    {
                        result.children.push_back( std::move(*child) );
                    }
                }
                else
                {
                    ParseTreeUserData& udt = result.children.emplace_back();
                    udt.symbol = the_node.symbol();
                    if(udt.symbol == start[i_node].symbol)
                    {
                        udt.children = start[i_node].children;
                    }
                    else
                        udt.children.push_back(std::move(start[i_node]));
                }
                //printf("NON_TERMINAL: %s\n", result.symbol->identifier);
            }
            break;
            default:
                //LALR_ASSERT( ?? );
                printf("Unexpected symbol %p\n", the_node.symbol());
        }
    }
    return true;
}
static void indent( int level )
{
    for ( int i = 0; i < level; ++i )
    {
        printf( " |" );
    }
}
static void print_parsetree( const ParseTreeUserData& ast, int level )
{
    if(ast.symbol)
    {
        indent( level );
        switch(ast.symbol->type)
        {
            case lalr::SymbolType::SYMBOL_TERMINAL:
                if(ast.lexeme.size())
                {
                    //indent( level -1);
                    printf("%s -> %s\n", ast.symbol->identifier, ast.lexeme.c_str());
                }
                break;
            case lalr::SymbolType::SYMBOL_NON_TERMINAL:
                //indent( level );
                printf("%s\n", ast.symbol->lexeme);
                break;
        }
    }
    for (std::vector<ParseTreeUserData>::const_iterator child = ast.children.begin(); child != ast.children.end(); ++child)
    {
        print_parsetree( *child, ast.symbol ? (level + 1) : level );
    }
}
#include <locale.h>
int main(int argc, char *argv[])
{
    const char *grammar_fn = nullptr;
    const char *input_fn = nullptr;
    bool dumpLexer = false;
    start_time = clock();
    setlocale(LC_NUMERIC, "");
    std::vector<char> grammar_txt;
    std::vector<mychar_t> input_txt;
    if ( argc < 2 )
    {
        printf( "%s -g|--grammar grammar_fname -i|--input input_fname -d|--dumpLex\n", argv[0] );
        printf( "\n" );
        return EXIT_FAILURE;
    }
    int argi = 1;
    while ( argi < argc )
    {
        if ( strcmp(argv[argi], "-g") == 0 || strcmp(argv[argi], "--grammar") == 0 )
        {
            grammar_fn = argv[argi + 1];
            argi += 2;
        }
        else if ( strcmp(argv[argi], "-i") == 0 || strcmp(argv[argi], "--input") == 0 )
        {
            input_fn = argv[argi + 1];
            argi += 2;
        }
        else if ( strcmp(argv[argi], "-d") == 0 || strcmp(argv[argi], "--dumpLex") == 0 )
        {
            dumpLexer = true;
            argi += 1;
        }
        else
        {
            // Unknown option: without this branch argi never advances and the loop never ends.
            show_error( "Unknown option '%s'\n", argv[argi] );
            return EXIT_FAILURE;
        }
    }
    if(grammar_fn != nullptr)
    {
        int rc = read_file(grammar_fn, (std::vector<mychar_t>&)grammar_txt);
        if(rc != EXIT_SUCCESS) return rc;
        size_t grammar_txt_size = grammar_txt.size()-1; //-1 to account for the '\0' terminator
        myShowDiffTime("read grammar");
        printf("Grammar size = %d\n", (int)grammar_txt_size);
        lalr::GrammarCompiler compiler;
        lalr::ErrorPolicy error_policy;
        int errors = compiler.compile( &grammar_txt[0], &grammar_txt[0] + grammar_txt_size, &error_policy );
        myShowDiffTime("compile grammar");
        if(errors != 0)
        {
            printf("Error count = %d\n", errors);
            return EXIT_FAILURE;
        }
        compiler.showStats();
        if(input_fn != nullptr)
        {
            rc = read_file(input_fn, input_txt);
            if(rc != EXIT_SUCCESS) return rc;
            size_t input_txt_size = input_txt.size()-1; //-1 to account for the '\0' terminator
            myShowDiffTime("read input");
            printf("Input size = %d\n", (int)input_txt_size);
            lalr::ErrorPolicy error_policy_input;
            lalr::Parser<const mychar_t*, ParseTreeUserData> parser( compiler.parser_state_machine(), &error_policy_input );
            parser.set_default_action_handler(parsetreeMaker);
            //lalr::Parser<const mychar_t*, AstUserDataDbg> parser( compiler.parser_state_machine(), &error_policy_input );
            //parser.set_default_action_handler(astMakerDbg);
            //lalr::Parser<const mychar_t*, int> parser( compiler.parser_state_machine(), &error_policy_input );
            parser.lexer_action_handlers()
                ( "C_MultilineComment", &C_MultLineCommentLexer::string_lexer )
            ;
            if(dumpLexer) parser.dumpLex( &input_txt[0], &input_txt[0] + input_txt_size );
            else parser.parse( &input_txt[0], &input_txt[0] + input_txt_size );
            myShowDiffTime("parse input");
            printf( "accepted = %d, full = %d\n", parser.accepted(), parser.full());
            if(parser.accepted() && parser.full())
            {
                print_parsetree( parser.user_data(), 0 );
            }
        }
    }
    return EXIT_SUCCESS;
}
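The endless loop mentioned at the start of this comment (a parse loop that keeps running because the lexer no longer advances) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyLexer`, `advance` and `run_guarded_loop` are hypothetical names and are not part of lalr, and the guard only mirrors the idea behind `if(lexer_.full()) break;`, not the library's actual code path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy lexer, NOT lalr's lexer: it consumes one
// space-separated word per call and stays put once the input is exhausted.
struct ToyLexer
{
    std::string input;
    std::size_t position;

    explicit ToyLexer( std::string text ) : input( std::move(text) ), position( 0 ) {}

    // True once every character of the input has been consumed.
    bool full() const { return position >= input.size(); }

    // Skips leading spaces, then consumes one word. At end of input this is
    // a no-op, which is exactly the "lexer doesn't advance" situation.
    void advance()
    {
        while ( position < input.size() && input[position] == ' ' ) ++position;
        while ( position < input.size() && input[position] != ' ' ) ++position;
        assert( position <= input.size() );
    }
};

// Models a parse loop whose parse step always reports "keep going".
// With use_guard the loop stops once the lexer is full; without it only the
// max_steps safety cap (added for this demo) ends the loop.
std::size_t run_guarded_loop( ToyLexer lexer, bool use_guard, std::size_t max_steps )
{
    std::size_t steps = 0;
    while ( steps < max_steps )
    {
        ++steps;
        if ( use_guard && lexer.full() ) break;
        lexer.advance();
    }
    return steps;
}
```

With the guard the loop ends shortly after the last word is consumed; without it the step count always reaches the cap, which is the toy equivalent of the hang seen with some grammars.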
Of course I don't mind you sharing the playground link; ideally it'll be moved to GitHub Pages. I'm glad that we can join efforts to build an amazing tool to facilitate writing, debugging and developing grammars. Thank you again for your great work!
Actually I can't comment on individual commits from the PR so I'll just do it here:
Fix to detect identifiers referenced in rules but not defined:
Make possible to accept associativity/precedence syntax like bison/byacc:
Check if '%whitespace' directive is present in the grammar and if not…:
Add code to allow generate an EBNF for railroad diagram generation:
Add method to dump the input from the lexer:
Add a method to show grammar compilation stats.:
Add a naive implementation of "%case_insensitive" directive, right no…:
Make trivial methods inline.:
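For readers unfamiliar with what a naive, ASCII-only, literals-only case-insensitive match can look like, here is one possible sketch. It is an assumption for illustration, not the code from the commit: `case_insensitive_pattern` is a hypothetical helper that rewrites a literal into a regex-style pattern with a two-character class per ASCII letter. Non-letter bytes, including non-ASCII bytes, pass through unchanged, and regex metacharacters are not escaped.

```cpp
#include <cassert> // for callers that want to check results with assert
#include <string>

// Hypothetical helper, not taken from lalr: expands a literal such as "if"
// into "[iI][fF]" so that a regex-based lexer matches it in any ASCII case.
std::string case_insensitive_pattern( const std::string& literal )
{
    std::string pattern;
    for ( char c : literal )
    {
        const bool lower = c >= 'a' && c <= 'z';
        const bool upper = c >= 'A' && c <= 'Z';
        if ( lower || upper )
        {
            // Emit a character class holding both cases of the letter.
            const char lo = lower ? c : static_cast<char>( c - 'A' + 'a' );
            const char up = upper ? c : static_cast<char>( c - 'a' + 'A' );
            pattern += '[';
            pattern += lo;
            pattern += up;
            pattern += ']';
        }
        else
        {
            // Digits, punctuation and non-ASCII bytes are copied as-is.
            pattern += c;
        }
    }
    return pattern;
}
```

For example, `case_insensitive_pattern("If")` yields `[iI][fF]`, while a literal such as `a1` becomes `[aA]1`. Because only the 26 ASCII letters are handled, accented or other non-ASCII letters would still match in one case only.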
Generally I think the playground directory should be a separate repository that uses a submodule or some dependency mechanism to bring in lalr. Then all of the output specific to the playground can go there too. I like that because that keeps lalr as a smaller, simpler C++ library. I think that also frees you up to not depend on me for PRs and feedback in a lot of cases.
Thanks heaps,
…ring like 'error' inside 'errors'
…ay I can generate a better parse tree.
…the line end is multi character like of '\r\n'
This is the first version of a wasm browser based playground for lalr.