Add a wasm browser based playground by mingodad · Pull Request #41 · cwbaker/lalr

mingodad · 2023-06-23T09:59:17Z

This is the first version of a wasm browser-based playground for lalr.

cwbaker · 2023-07-16T07:50:04Z

Thanks very much Domingo. There are lots of great changes here.

I had a go with the playground and it's amazing. Great work! Do you mind if I share the playground link with a few people?

Having seen the railroad diagram generator at https://www.bottlecaps.de/rr/ui I'm convinced that's a useful addition to lalrc. Cheers for pointing that out.

It's much easier for me, and you're more likely to get a prompt response, if I can deal with these queries and changes in smaller chunks. If you email small queries directly, instead of commenting on #11, I'll be able to respond faster. If you have smaller PRs then we can also get things merged, or feedback given, faster too.

I've taken the changes that I could and merged them to main. I've rebased the remaining changes to the branch playground-2023-07-16 to hopefully make it convenient for you. But I'll also reply with my thoughts to the remaining commits here. Some of those changes I think are better placed in a separate repository with the playground itself rather than in lalr.

Thanks,
Charles

cwbaker · 2023-07-16T08:01:04Z

        const ParserSymbol* symbol = reinterpret_cast<const ParserSymbol*>( lexer_.symbol() );
        while ( parse(symbol, lexer_.lexeme(), lexer_.line(), lexer_.column()) )
        {
+            if(lexer_.full()) break;


Is this fixing a bug?

Yes, there are some grammars that enter an endless loop because the lexer doesn't advance.
I don't know exactly which ones trigger the bug, but you can try it with this script:

    #!/bin/sh
    basep=playground

    checkGrammar() {
        echo Now testing $1 $2
        /usr/bin/time ./grammar_test-clang -g $basep/$1 -i $basep/$2
    }

    checkGrammar json3.g test.json.txt
    checkGrammar lua.g test.lua
    checkGrammar carbon-lang.g prelude.carbon
    checkGrammar postgresql-16.g test.sql
    #checkGrammar cxx-parser.g test.cpp
    checkGrammar lsl_ext.g test.lsl
    checkGrammar bison.g carbon-lang.y
    checkGrammar bison-bug.g carbon-lang.y
    checkGrammar dparser.g test.dparser
    checkGrammar parse_gen.g test.parse_gen
    checkGrammar tameparser.g test.tameparser
    checkGrammar javascript.g test.js
    checkGrammar javascript-core.g test.js
    checkGrammar cparser.g test.c
    checkGrammar java11.g test.java
    checkGrammar rust.g test.rs
    checkGrammar go.g test.go
    checkGrammar php-8.2.g test.php
    checkGrammar gringo-ng.g test.clingo
    checkGrammar ada-adayacc.g test.adb

Build script:

    #!/bin/sh
    umask 022

    myflags="-O2 -g"
    #myflags="-O2 -g -m32"
    #myflags="-g"

    clang-16-env clang++ \
        -std=c++17 $myflags -Wall -Wextra -Wno-unused-function -pedantic \
        -Isrc -DLALR_NO_THREADS \
        src/lalr/ErrorPolicy.cpp \
        src/lalr/Grammar.cpp \
        src/lalr/GrammarCompiler.cpp \
        src/lalr/GrammarGenerator.cpp \
        src/lalr/GrammarParser.cpp \
        src/lalr/GrammarState.cpp \
        src/lalr/GrammarSymbol.cpp \
        src/lalr/GrammarSymbolSet.cpp \
        src/lalr/GrammarTransition.cpp \
        src/lalr/RegexCompiler.cpp \
        src/lalr/RegexGenerator.cpp \
        src/lalr/RegexItem.cpp \
        src/lalr/RegexNode.cpp \
        src/lalr/RegexParser.cpp \
        src/lalr/RegexState.cpp \
        src/lalr/RegexSyntaxTree.cpp \
        src/lalr/RegexToken.cpp \
        src/lalr/lalr_examples/grammar_test.cpp \
        -o grammar_test-clang

grammar_test.cpp:

    #include <stdio.h>
    #include <stdarg.h>
    #include <lalr/GrammarCompiler.hpp>
    #include <lalr/Parser.hpp>
    #include <string.h>
    #include <errno.h>
    #include <sys/stat.h>
    #include <time.h>

    static int errors_ = 0;
    typedef unsigned char mychar_t;

    static void show_error( const char* format, ... )
    {
        ++errors_;
        va_list args;
        va_start( args, format );
        vfprintf( stderr, format, args );
        va_end( args );
    }

    int read_file(const char *fname, std::vector<mychar_t> &content)
    {
        struct stat stat;
        int result = ::stat( fname, &stat );
        if ( result != 0 )
        {
            show_error( "Stat failed on '%s' - result=%d\n", fname, result );
            return EXIT_FAILURE;
        }

        FILE* file = fopen( fname, "rb" );
        if ( !file )
        {
            show_error( "Opening '%s' to read failed - errno=%d\n", fname, errno );
            return EXIT_FAILURE;
        }

        int size = stat.st_size;
        content.resize( size+1 );
        int read = int( fread(&content[0], sizeof(mychar_t), size, file) );
        fclose( file );
        file = nullptr;
        if ( read != size )
        {
            show_error( "Reading grammar from '%s' failed - read=%d\n", fname, int(read) );
            return EXIT_FAILURE;
        }
        content[size] = '\0';
        return EXIT_SUCCESS;
    }

    static clock_t start_time;

    clock_t myShowDiffTime(const char *title)
    {
        clock_t now = clock();
        clock_t diff = now - start_time;
        int msec = diff * 1000 / CLOCKS_PER_SEC;
        printf("%s: Time taken %d seconds %d milliseconds\n", title, msec/1000, msec%1000);
        start_time = now;
        return now;
    }

    struct C_MultLineCommentLexer
    {
        static lalr::PositionIterator<const mychar_t*> string_lexer(
            const lalr::PositionIterator<const mychar_t*>& begin,
            const lalr::PositionIterator<const mychar_t*>& end,
            std::basic_string<mychar_t>* lexeme,
            const void** /*symbol*/ )
        {
            LALR_ASSERT( lexeme );
            lexeme->clear();
            //printf("C_MultLineCommentLexer : %s\n", lexeme->c_str());
            bool done = false;
            lalr::PositionIterator<const mychar_t*> i = begin;
            while ( i != end && !done)
            {
                switch( *i )
                {
                    case '*':
                        ++i;
                        if(i != end && *i == '/') done = true;
                        continue;
                        break;
                }
                ++i;
            }
            if ( i != end )
            {
                LALR_ASSERT( *i == '/' );
                ++i;
            }
            return i;
        }
    };

    struct AstUserDataDbg
    {
        int index;
        int stack_index;
        static int next_index;;
        static int total;
        AstUserDataDbg():index(total++), stack_index(next_index++) {};
    };
    int AstUserDataDbg::next_index = 0;
    int AstUserDataDbg::total = 0;

    static bool astMakerDbg( AstUserDataDbg& result, const AstUserDataDbg* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
    {
    //    //printf("astMaker: %s\n", nodes[0].lexeme().c_str());
    //    const char *lexstr = (length > 0 ? (const char *)nodes[0].lexeme().c_str() : "::lnull");
    //    const char *idstr = (length > 0 ? nodes[0].symbol()->identifier : "::inull");
    //    int line = (length > 0 ? nodes[0].line() : 0);
    //    int column = (length > 0 ? nodes[0].column() : 0);
    //    //const char *stateLabel = (length > 0 ? nodes[0].state()->label : "::inull");
    //    printf("astMaker: %p\t%zd:%d:%d\t%p\t%zd\t->\t%s : %s :%d:%d\n", start, length,
    //            length ? start->index : -1, length ? start->stack_index : -1,
    //            nodes, length, idstr, lexstr, line, column);
        printf("----\n");
        for(size_t i=0; i< length; ++i)
            printf("%zd:%d\t%p\t%d:%d\t%p <:> %s <:> %s <:> %s <:> %d:%d\n",
                i, nodes[i].symbol()->type, start+i, start[i].index, start[i].stack_index,
                nodes+i, nodes[i].symbol()->identifier, nodes[i].symbol()->lexeme,
                nodes[i].lexeme().c_str(), nodes[i].line(), nodes[i].column());
        return true;
    }

    struct ParseTreeUserData
    {
        std::vector<ParseTreeUserData> children;
        const lalr::ParserSymbol *symbol;
        std::basic_string<mychar_t> lexeme; ///< The lexeme at this node (empty if this node's symbol is non-terminal).
        ParseTreeUserData():children(0),symbol(nullptr) {};
    };

    static bool parsetreeMaker( ParseTreeUserData& result, const ParseTreeUserData* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
    {
        if(length == 0) return false;
        result.symbol = nodes[length-1].state()->transitions->reduced_symbol;
        for(size_t i_node = 0; i_node < length; ++i_node)
        {
            const lalr::ParserNode<mychar_t>& the_node = nodes[i_node];
            switch(the_node.symbol()->type)
            {
                case lalr::SymbolType::SYMBOL_TERMINAL:
                {
                    ParseTreeUserData& udt = result.children.emplace_back();
                    udt.symbol = the_node.symbol();
                    udt.lexeme = the_node.lexeme();
                    //printf("TERMINAL: %s : %s\n", udt.symbol->identifier, udt.lexeme.c_str());
                }
                break;
                case lalr::SymbolType::SYMBOL_NON_TERMINAL:
                {
                    if(the_node.symbol() == result.symbol)
                    {
                        const ParseTreeUserData& startx = start[i_node];
                        for (std::vector<ParseTreeUserData>::const_iterator child = startx.children.begin(); child != startx.children.end(); ++child)
                        {
                            result.children.push_back( std::move(*child) );
                        }
                    }
                    else
                    {
                        ParseTreeUserData& udt = result.children.emplace_back();
                        udt.symbol = the_node.symbol();
                        if(udt.symbol == start[i_node].symbol)
                        {
                            udt.children = start[i_node].children;
                        }
                        else udt.children.push_back(std::move(start[i_node]));
                    }
                    //printf("NON_TERMINAL: %s\n", result.symbol->identifier);
                }
                break;
                default:
                    //LALR_ASSERT( ?? );
                    printf("Unexpected symbol %p\n", the_node.symbol());
            }
        }
        return true;
    }

    static void indent( int level )
    {
        for ( int i = 0; i < level; ++i )
        {
            printf( " |" );
        }
    }

    static void print_parsetree( const ParseTreeUserData& ast, int level )
    {
        if(ast.symbol)
        {
            indent( level );
            switch(ast.symbol->type)
            {
                case lalr::SymbolType::SYMBOL_TERMINAL:
                    if(ast.lexeme.size())
                    {
                        //indent( level -1);
                        printf("%s -> %s\n", ast.symbol->identifier, ast.lexeme.c_str());
                    }
                    break;
                case lalr::SymbolType::SYMBOL_NON_TERMINAL:
                    //indent( level );
                    printf("%s\n", ast.symbol->lexeme);
                    break;
            }
        }

        for (std::vector<ParseTreeUserData>::const_iterator child = ast.children.begin(); child != ast.children.end(); ++child)
        {
            print_parsetree( *child, ast.symbol ? (level + 1) : level );
        }
    }

    #include <locale.h>

    int main(int argc, char *argv[])
    {
        const char *grammar_fn = nullptr;
        const char *input_fn = nullptr;
        bool dumpLexer = false;
        start_time = clock();
        setlocale(LC_NUMERIC, "");
        std::vector<char> grammar_txt;
        std::vector<mychar_t> input_txt;

        if ( argc < 2 )
        {
            printf( "%s -g|--grammar grammar_fname -i|--input input_fname -d|--dumpLex\n", argv[0] );
            printf( "\n" );
            return EXIT_FAILURE;
        }

        int argi = 1;
        while ( argi < argc )
        {
            if ( strcmp(argv[argi], "-g") == 0 || strcmp(argv[argi], "--grammar") == 0 )
            {
                grammar_fn = argv[argi + 1];
                argi += 2;
            }
            else if ( strcmp(argv[argi], "-i") == 0 || strcmp(argv[argi], "--input") == 0 )
            {
                input_fn = argv[argi + 1];
                argi += 2;
            }
            else if ( strcmp(argv[argi], "-d") == 0 || strcmp(argv[argi], "--dumpLex") == 0 )
            {
                dumpLexer = true;
                argi += 1;
            }
        }

        if(grammar_fn != nullptr)
        {
            int rc = read_file(grammar_fn, (std::vector<mychar_t>&)grammar_txt);
            if(rc != EXIT_SUCCESS) return rc;
            size_t grammar_txt_size = grammar_txt.size()-1; //-1 to account for the '\0' terminator
            myShowDiffTime("read grammar");
            printf("Grammar size = %d\n", (int)grammar_txt_size);

            lalr::GrammarCompiler compiler;
            lalr::ErrorPolicy error_policy;
            int errors = compiler.compile( &grammar_txt[0], &grammar_txt[0] + grammar_txt_size, &error_policy );
            myShowDiffTime("compile grammar");
            if(errors != 0)
            {
                printf("Error count = %d\n", errors);
                return EXIT_FAILURE;
            }
            compiler.showStats();

            if(input_fn != nullptr)
            {
                rc = read_file(input_fn, input_txt);
                if(rc != EXIT_SUCCESS) return rc;
                size_t input_txt_size = input_txt.size()-1; //-1 to account for the '\0' terminator
                myShowDiffTime("read input");
                printf("Input size = %d\n", (int)input_txt_size);

                lalr::ErrorPolicy error_policy_input;
                lalr::Parser<const mychar_t*, ParseTreeUserData> parser( compiler.parser_state_machine(), &error_policy_input );
                parser.set_default_action_handler(parsetreeMaker);
                //lalr::Parser<const mychar_t*, AstUserDataDbg> parser( compiler.parser_state_machine(), &error_policy_input );
                //parser.set_default_action_handler(astMakerDbg);
                //lalr::Parser<const mychar_t*, int> parser( compiler.parser_state_machine(), &error_policy_input );
                parser.lexer_action_handlers()
                    ( "C_MultilineComment", &C_MultLineCommentLexer::string_lexer )
                ;
                if(dumpLexer) parser.dumpLex( &input_txt[0], &input_txt[0] + input_txt_size );
                else parser.parse( &input_txt[0], &input_txt[0] + input_txt_size );
                myShowDiffTime("parse input");
                printf( "accepted = %d, full = %d\n", parser.accepted(), parser.full());
                if(parser.accepted() && parser.full())
                {
                    print_parsetree( parser.user_data(), 0 );
                }
            }
        }
        return EXIT_SUCCESS;
    }
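
The endless loop that the `lexer_.full()` check guards against can be sketched in isolation. The following is only a toy model under stated assumptions: `ToyLexer`, `parse_step()` and `run()` are hypothetical names invented for illustration and are not part of lalr, and `parse_step()` simply pretends that the parser always asks for another token. It shows how a driver loop spins once the lexer has consumed all input and `advance()` becomes a no-op, and how breaking out when the lexer is full ends the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a lexer; this is not lalr's Lexer class.
class ToyLexer
{
public:
    explicit ToyLexer( const std::string& input ) : input_( input ), position_( 0 ) {}

    // Consume one character if any remain. At end of input this does nothing,
    // which models the "lexer doesn't advance" situation.
    void advance()
    {
        if ( position_ < input_.size() )
        {
            ++position_;
        }
    }

    // True once all of the input has been consumed.
    bool full() const { return position_ == input_.size(); }

private:
    std::string input_;
    std::size_t position_;
};

// Hypothetical parse step that always asks for another token.
static bool parse_step() { return true; }

// Run the driver loop and return how many times the lexer was advanced
// inside it. `max_iterations` caps the loop so that the unguarded variant
// terminates in this demonstration.
static int run( const std::string& input, bool guard, int max_iterations )
{
    assert( max_iterations >= 0 );
    ToyLexer lexer( input );
    int iterations = 0;
    lexer.advance();
    while ( parse_step() && iterations < max_iterations )
    {
        if ( guard && lexer.full() )
        {
            break; // Same idea as the check added in the diff above.
        }
        lexer.advance();
        ++iterations;
    }
    return iterations;
}
```

With the guard enabled, `run( "abc", true, 1000 )` stops as soon as the input is exhausted; with it disabled, the loop only stops because of the artificial iteration cap.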

mingodad · 2023-07-16T08:01:28Z

Of course I don't mind you sharing the playground link; ideally it will be moved to GitHub Pages.
I've now got close to a good parse tree dump; have another look at the playground and my last commit, mingodad@eb7ff4c.

I'm glad that we can join efforts to build an amazing tool that makes it easier to write, debug, and develop grammars.

Thank you again for your great work!

cwbaker · 2023-07-16T08:05:11Z

Actually I can't comment on individual commits from the PR so I'll just do it here:

Fix to detect identifiers referenced in rules but not defined:
This error check existed until it became possible to use tokens for precedence only. Let me go over what is happening here and make sure there aren't two conflicting use cases.

Make possible to accept associativity/precedence syntax like bison/byacc:
Unless I've misunderstood, it's already possible to specify precedence but no associativity with the %none directive. I don't want the %prec and %nonassoc keywords from Bison/YACC in lalr. Lalr is supposed to be different, and hopefully better, rather than the same.

Check if '%whitespace' directive is present in the grammar and if not…:
I'm not sure that it's an error to leave out the whitespace directive. Let me think about it for a while.

Add code to allow generate an EBNF for railroad diagram generation:
The railroad diagrams are great but code to generate them belongs outside of the library. I believe there is enough information available from Grammar to do that.

Add method to dump the input from the lexer:
This should also be outside the library. I'd accept it as a debug feature to match what Parser::set_debug_enabled() does but not as its own method on Parser.

Add a method to show grammar compilation stats.:
This should also be outside of the library. I think the playground itself should implement this.

Add a naive implementation of "%case_insensitive" directive, right no…:
The case insensitive lexer will take some work to get in. I think I'm more interested in seeing a) being able to specify case (in)-sensitivity per token and b) what the syntax for that will look like in the grammar. I like the simplicity of lalr not having to deal with case sensitivity itself, i.e. "[Ss][Ee][Ll][Ee][Cc][Tt]".
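
The explicit character-class spelling mentioned above can also be generated mechanically outside the library. The helper below is a minimal sketch, not lalr code: `case_insensitive_pattern` is a hypothetical name, it handles ASCII letters only, and it copies every other character unchanged, so regex metacharacters in the keyword would still need escaping by the caller.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: expand each ASCII letter of `keyword` into a
// two-character class, so 's' becomes "[Ss]". Non-letters are copied as-is.
std::string case_insensitive_pattern( const std::string& keyword )
{
    std::string pattern;
    for ( char c : keyword )
    {
        const unsigned char ch = static_cast<unsigned char>( c );
        if ( ch < 128 && std::isalpha( ch ) )
        {
            pattern += '[';
            pattern += static_cast<char>( std::toupper( ch ) );
            pattern += static_cast<char>( std::tolower( ch ) );
            pattern += ']';
        }
        else
        {
            pattern += c;
        }
    }
    // Every input character produces at least one output character.
    assert( pattern.size() >= keyword.size() );
    return pattern;
}
```

For example, `case_insensitive_pattern( "select" )` yields the `[Ss][Ee][Ll][Ee][Cc][Tt]` form quoted above, which could then be pasted into a grammar by whatever tool drives the library.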

Make trivial methods inline.:
This is okay, but please define the functions outside of the class and preserve any documentation comments. I like the classes to provide a concise summary of the API, and that gets lost when functions are defined within the class definition. Also keep one class per file, e.g. RegexNodeLess should be in RegexNodeLess.hpp, not RegexNode.hpp.
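
The layout requested here might look roughly like the following sketch. `Counter` is a hypothetical class invented for illustration and is not taken from lalr; the point is only that the class body stays a short summary of the API while the trivial functions are still `inline`, defined after the class in the same header with their documentation comments kept.

```cpp
#include <cassert>

// Hypothetical class used only to illustrate the layout; not lalr code.
class Counter
{
    int value_; ///< The number of increments so far.

public:
    Counter();
    int value() const;
    void increment();
};

/// Construct a counter that starts at zero.
inline Counter::Counter()
: value_( 0 )
{
}

/// Get the current count.
///
/// @return The number of times increment() has been called.
inline int Counter::value() const
{
    return value_;
}

/// Add one to the count.
inline void Counter::increment()
{
    assert( value_ >= 0 );
    ++value_;
}
```

Because the definitions are marked `inline`, the header can still be included from several translation units without violating the one-definition rule.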

cwbaker · 2023-07-16T08:07:00Z

Generally I think the playground directory should be a separate repository that uses a submodule or some dependency mechanism to bring in lalr. Then all of the output specific to the playground can go there too.

I like that because that keeps lalr as a smaller, simpler C++ library. I think that also frees you up to not depend on me for PRs and feedback in a lot of cases.

Thanks heaps,
Charles


mingodad added 25 commits June 22, 2023 14:34

Add 'std::' in several places as suggested by clang

1633608

Fix to detect identifiers referenced in rules but not defined

4007497

Add preprocessor guards to allow build without threads/threadpool.

01e753c

Make possible to accept associativity/precedence syntax like bison/byacc

fe8da35

Add an error message for empty literal/regex declarations, also fix t…

98444cb

…o accept "'\''" literal.

Check if '%whitespace' directive is present in the grammar and if not…

82bfdf5

… emit an error message.

Add code to allow generate an EBNF for railroad diagram generation

0e05c13

Add method to dump the input from the lexer.

061bf24

Reuse result of already called function.

7a668bb

Add a method to show grammar compilation stats.

9243595

Reorder class member for better memory usage/alignment.

a64710d

Rename write output function to prevent clash with C lib ::write

642de0b

Use a typedef and macros to allow easy experimenting with different t…

a5387f3

…ype to store bitfields.

Simplify GrammaSymbolSet

d7cacbc

Only check for '%whitespace' directive if the grammar has no other er…

06dfb8e

…rors.

Fix examples/test that were missing '%whitespace' directive.

deb8e3e

Check if we are at the end and then stop

49af659

First working version of an wasm browser based playground

4e5acce

Add a naive implementation of "%case_insensitive" directive, right no…

41860c1

…w it only work on literals and only for ASCII

Add column info to error messages in ErrorPolicy

0aec408

Add special regex character escape for the naive case insensitive imp…

d9e8e15

…lementation

Make trivial methods inline.

9b72871

Add column info to GrammarSymbol and error messages

3aa9a4a

First implementation for outptut an parse tree. The MissingHeaders te…

f59b70c

…st is failing and need review.

Check if the input is accepted && full before print the parse tree

02178b9

Now I've got closer to a good parse tree dump

eb7ff4c

cwbaker reviewed Jul 16, 2023

Missing fixes for a better parse tree output

ccdca1a

mingodad force-pushed the playground branch from c155471 to cdcaa59 on July 16, 2023 11:59

Show an error message when associativity is assigned to a non-terminal.

9f907f6

mingodad force-pushed the playground branch from cdcaa59 to 9f907f6 on July 16, 2023 12:06

mingodad added 3 commits July 16, 2023 14:11

Undo a mistaken removing code for a better parse tree output.

1d09221

Fix generation of empty productions for genEBNF

8aa81d2

Add YACC generation from LALR grammars.

51ab70e

mingodad force-pushed the playground branch from 8539fed to 51ab70e on July 17, 2023 06:49

mingodad added 9 commits July 18, 2023 14:26

Fix to only match fully words, because before it was matching a subst…

208eda6

…ring like 'error' inside 'errors'

Add a missing case when generating a YACC file

fc40770

Add the reducing transition as a parameter to action handlers, this w…

08939ba

…ay I can generate a better parse tree.

Update examples to use the extra action handler parameter recently in…

ee28902

…troduced

Add 2 new grammar options to easy debug, also fix line counting when …

f5c1810

…the line end is multi character like of '\r\n'

Add code to detect user content changes and alert him/her

6f73f32

Added grammar examples

02946cc

Update code

d5231dc

Create static.yml

2e1ef74

mingodad wants to merge 40 commits into cwbaker:main from mingodad:playground
