Browse - Build expressive libraries with thunks
The Main Idea
Browse is a language for building powerful libraries while keeping the end-user experience as simple as possible (think bash). Browse achieves this by treating thunks as first-class citizens (think of a thunk as an intent to call a function). This lets library creators implement complex behavior with minimal changes to the end-user experience. To show off Browse, we built a library called “web” that aims to make web scraping, browser automation, and UI testing simple.
Rules and RuleSets
To implement first-class thunks, we use Rules and RuleSets. A Rule is an intent to execute some action, and a RuleSet is a collection of Rules.
Example
# The print rule being used
print "Hello World" # prints "Hello World" when evaluated
# A RuleSet
{
print "Hello"
print "World"
}
Every line of code in Browse begins with a rule name, so the bare RuleSet above can't be executed on its own. However, we can pass a RuleSet into a rule:
# Evaluate the rules in the RuleSet sequentially
eval {
print "Hello"
print "World"
}
# Evaluate the rules in the RuleSet sequentially, but in reverse
eval(reverse) {
print "Hello"
print "World"
}
RuleSets are what Browse uses to represent thunks.
At the top level of a Browse program, every Rule is evaluated sequentially, but higher-order rules (Rules that take RuleSets as arguments) can change that behavior.
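To make the idea concrete, here is a minimal Python sketch (not Browse's actual JavaScript implementation) in which a Rule is a deferred call and a RuleSet is a list of Rules. The names `Rule`, `RuleSet`, `eval_ruleset`, and its `reverse` flag are hypothetical; they only mirror the `eval` and `eval(reverse)` behavior described above, and the two `print` rules are modeled as appends to a list so the result is easy to check.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Rule:
    """An intent to call a function: nothing runs until evaluate() is called."""
    fn: Callable[..., Any]
    args: tuple = ()

    def evaluate(self) -> Any:
        return self.fn(*self.args)


# A RuleSet is an ordered collection of Rules, i.e. a thunk
RuleSet = List[Rule]


def eval_ruleset(rules: RuleSet, reverse: bool = False) -> Any:
    """Hypothetical higher-order rule: it, not the caller, decides the order."""
    ordered = reversed(rules) if reverse else rules
    result = None
    for rule in ordered:
        result = rule.evaluate()
    return result


log: List[str] = []
block: RuleSet = [Rule(log.append, ("Hello",)), Rule(log.append, ("World",))]

# Building the RuleSet above ran nothing; evaluation is deferred
assert log == []
eval_ruleset(block)                # like: eval { print "Hello"; print "World" }
eval_ruleset(block, reverse=True)  # like: eval(reverse) { ... }
print(log)  # ['Hello', 'World', 'World', 'Hello']
```

The caller only builds data; the higher-order rule owns the evaluation strategy, which is what lets `eval(reverse)` change behavior without any new syntax.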
Complete Documentation
👉 Check out this short wiki. We put a ton of work into it 👈
Some Design Decisions
Apart from the obvious decision to make RuleSets first-class citizens, there are a few other key decisions worth noting (roughly ranked in order of importance).
Implementing most language features as Rules
- `while`, `if`, and `for` are all rules
  - Typically these are built into the language
  - We demonstrated that higher-order Rules can look like they're part of the language
  - In fact, `while` is implemented entirely in Browse
  - Great for library developers 🙂
- To consume the arguments passed to a rule, we use a rule called `bind`
  - Typically, arguments to a function are consumed automatically
  - Argument handling is normally managed by the language (i.e. pass-by-{name|value|reference})
  - The `bind` rule lets the rule's author define when and how arguments are consumed
  - This even allows the author to customize the meaning of thunk composition
    - Normally, `f.g === f(g(...))`. But in Browse, `f` gets access to the thunk for `g` instead of just receiving a value
- Arrays and Dictionaries are implemented as RuleSets, using special rules to set elements
  - Typically these are built into the language
  - Special "subscript" syntax to access array and dictionary values can be implemented as syntactic sugar
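The Python sketch below (hypothetical names throughout, not Browse code) illustrates why receiving thunks matters. `while_rule` gets its condition and body as zero-argument callables and is built from a conditional plus recursion, with no built-in loop; this is only an assumption about how a library-level `while` could work. `twice` and `never` play the role of `f` in `f.g`: because they are handed the thunk for their argument rather than its value, they decide whether to evaluate it zero, one, or several times. The sketch does not model `bind`'s syntax.

```python
from typing import Any, Callable, List

Thunk = Callable[[], Any]  # an unevaluated computation


def while_rule(condition: Thunk, body: Thunk, result: Any = None) -> Any:
    """A loop as an ordinary function, built from a conditional and recursion.

    Python's default recursion limit (about 1000) caps the iteration count here.
    """
    if not condition():
        return result
    return while_rule(condition, body, body())


def twice(arg: Thunk) -> List[Any]:
    """Gets the thunk for its argument (like f in f.g) and evaluates it twice."""
    return [arg(), arg()]


def never(arg: Thunk) -> str:
    """Never forces its argument, so the argument's side effects never happen."""
    return "skipped"


counter = {"i": 0}


def step() -> int:
    counter["i"] += 1
    return counter["i"]


print(while_rule(lambda: counter["i"] < 3, step))  # 3
print(twice(step))   # [4, 5]
print(never(step))   # skipped
print(counter["i"])  # 5
```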
Expressions only (Every Rule returns a value)
Implicit return
- The `eval` rule evaluates the RuleSet passed to it as an argument
- Since every rule must return a value, `eval` returns the value returned by the last rule in the RuleSet
- This means that `return` is implicit. However, we do expose a rule called `return` that is an alias for `id` (the identity rule). This works nearly as well and makes the code easier to read
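Assuming `return` is a plain alias for `id` with no special handling in `eval`, the short Python sketch below shows what "nearly as well" means: `return x` produces `x` when it is the last rule in a RuleSet, but since it is only the identity function, it cannot end evaluation early. `return_rule` and `eval_ruleset` are hypothetical names.

```python
from typing import Any, Callable, List


def return_rule(value: Any) -> Any:
    """`return` modeled as the identity rule: it just hands back its argument."""
    return value


def eval_ruleset(rules: List[Callable[[], Any]]) -> Any:
    """Evaluate every rule in order; the RuleSet's value is the last rule's value."""
    result = None
    for rule in rules:
        result = rule()
    return result


# `return` as the last rule: behaves like an ordinary return
print(eval_ruleset([lambda: "setup", lambda: return_rule(1)]))  # 1

# `return` in the middle: it does not short-circuit, so the last rule still wins
print(eval_ruleset([lambda: return_rule(1), lambda: "after"]))  # after
```

The Fibonacci example below follows this pattern, with `return` as the final rule in each branch.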
Named Arguments/Options
- We love how most bash programs take optional arguments as flags (e.g. `-h` or `--force`)
- However, we didn't like how flags can be interspersed with positional arguments, which leads to ambiguity
- So we added a native syntax for passing options to rules. It looks like this:
my_rule(double !yell prefix="Ans: ") 2
# This sets `double` to true, `yell` to false, and `prefix` to "Ans: "
# Look at the wiki to see how rules can `bind` options and use them
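Here is a minimal Python sketch of how an option list like the one in `my_rule(double !yell prefix="Ans: ")` could be turned into a dictionary, following the three forms in the comment above: a bare name sets true, a `!` prefix sets false, and `name=value` assigns a value. The `parse_options` helper and its use of `shlex` for tokenizing are assumptions for illustration, not Browse's actual parser.

```python
import shlex
from typing import Dict, Union

OptionValue = Union[bool, str]


def parse_options(text: str) -> Dict[str, OptionValue]:
    """Parse a Browse-style option list such as: double !yell prefix="Ans: "."""
    options: Dict[str, OptionValue] = {}
    # shlex.split honors double quotes, so "Ans: " stays one token, space included
    for token in shlex.split(text):
        if "=" in token:
            name, value = token.split("=", 1)
            options[name] = value        # name=value -> string value
        elif token.startswith("!"):
            options[token[1:]] = False   # !name -> false
        else:
            options[token] = True        # bare name -> true
    return options


print(parse_options('double !yell prefix="Ans: "'))
# {'double': True, 'yell': False, 'prefix': 'Ans: '}
```

Because options live inside the parentheses, they can never be confused with the positional argument `2` that follows, which is the ambiguity the syntax was designed to avoid.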
Unquoted Strings
- Inspired by bash, Browse supports unquoted strings:
print hello world
Optional Semicolons
- Following bash, Browse doesn't require semicolons, and leaving them out generally results in cleaner code
- That being said, it’s not hard to dream up cases where semicolons come in handy, so they are optional
Examples
If you’re unsure what some of these rules do, check this out for the standard Browse rules, and this for the web rules.
Fibonacci
# A Fibonacci rule
rule fib {
# Bind "n" to the first argument
bind n
if ($n <= 1) then {
# Base case
return 1
} else {
# Recursion
return (fib $n - 1) + (fib $n - 2)
}
}
# Run the program
print (fib 5)
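As a cross-check of what `print (fib 5)` should output, here is the same recursion written in Python. This definition's base case returns 1 for both n = 0 and n = 1, so it is shifted by one from the fib(0) = 0 convention; the sketch assumes `(fib $n - 1)` means fib(n - 1).

```python
def fib(n: int) -> int:
    # Base case mirrors the Browse version: fib(0) = fib(1) = 1
    if n <= 1:
        return 1
    # Same double recursion as: (fib $n - 1) + (fib $n - 2)
    return fib(n - 1) + fib(n - 2)


print(fib(5))  # 8
```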
Higher-order rules
Conway's Game of Life
Web Examples
Here are some examples of web scraping and browser automation scripts written in Browse:
Wikipedia Scraper
# Pass `--web` to browse when running this example
# $ browse --web ./examples/web/wikipedia.browse
page https://en.wikipedia.org/wiki/:slug {
# Grab a string from the webpage and store it in a variable called 'title'
# Note: '@' is not a special symbol; this rule just happens to be named '@string'
@string title `#firstHeading`
# Grab an array of strings from the webpage and store them in a variable called 'paragraphs'
@arr(string) paragraphs `div.mw-parser-output`
out title paragraphs
# uncomment this to infinitely crawl through wikipedia
# crawl `a`
}
# Start the crawl
visit https://en.wikipedia.org/wiki/Kevin_Bacon
The Gazette Scraper
page https://www.thegazette.co.uk/notice/:issue {
print $url
config { set output "./notices/" + $issue + ".json" }
wait `.wrapperContent`
@string title `h1.title`
@string? date `dd time`
@string? notice `div[about="this:notifiableThing"]`
out title date notice
}
visit https://www.thegazette.co.uk/all-notices/notice?text=&categorycode-all=all&noticetypes=&location-postcode-1=&location-distance-1=1&location-local-authority-1=&numberOfLocationSearches=1&start-publish-date=01%2F01%2F2000&end-publish-date=12%2F08%2F2020&edition=&london-issue=&edinburgh-issue=&belfast-issue=&sort-by=&results-page-size=10
# Also fetch these
for { set i 2; test $i < 5; set i $i + 1 } {
visit https://www.thegazette.co.uk/London/issue/ + $i + "/page/2"
}
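In both scrapers, the URL given to `page` contains a `:name` segment (`:slug`, `:issue`), and the Gazette scraper later reads `$issue`, which suggests the segment is captured as a variable. The Python sketch below shows one common way such route patterns can be matched, by turning each `:name` segment into a named regex group. `compile_pattern` and `match_url` are hypothetical helpers, and this is an assumption about the behavior, not the web library's code.

```python
import re
from typing import Dict, Optional


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Turn a pattern like https://host/wiki/:slug into a regex with named groups."""
    # The capture group keeps the :name pieces in the split result; "https:" is
    # left alone because its colon is followed by "/", not a letter
    parts = re.split(r"(:[A-Za-z_]\w*)", pattern)
    regex = ""
    for part in parts:
        if part.startswith(":"):
            regex += f"(?P<{part[1:]}>[^/?#]+)"  # capture one path segment
        else:
            regex += re.escape(part)             # literal text, e.g. dots in the host
    return re.compile(regex + r"/?$")


def match_url(pattern: str, url: str) -> Optional[Dict[str, str]]:
    """Return the captured variables, or None if the URL doesn't match."""
    m = compile_pattern(pattern).match(url)
    return m.groupdict() if m else None


wiki = "https://en.wikipedia.org/wiki/:slug"
print(match_url(wiki, "https://en.wikipedia.org/wiki/Kevin_Bacon"))  # {'slug': 'Kevin_Bacon'}
print(match_url(wiki, "https://www.twitch.tv/"))                     # None
```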
Twitch Sign Up
set headless false
page https://www.twitch.tv {
# Click link for full code
set logIn ...
set username ...
set birthMonth ...
set birthDay ...
set birthYear ...
type 'RandomTwitchUser31415'
wait $logIn
click $logIn
wait $username
type 'RandomTwitchUser31415'
click '#password-input'
type 'jfnosenfjksef'
click '#password-input-confirmation'
type 'jfnosenfjksef'
click $birthMonth
type 'apr'
press Enter
click $birthDay
type '29'
click $birthYear
type '1997'
click '#email-input'
type '[email protected]'
sleep 1000 * 10 # 10s
}
visit https://www.twitch.tv/
More Examples
Here
Technical Trade-offs
Prototyping the language in JavaScript
- Pros
  - Faster development
  - Lots of libraries
- Cons
  - Slower performance
  - Single-threaded
  - Harder to implement memory optimizations
Implementing language features as rules
- Pros
  - Customizable behavior for language designers
  - Browse's grammar is relatively simple
- Cons
  - Slower execution times
  - Control flow is hard or impossible to derive from the AST alone (for instance, we don’t necessarily know what `if` does)
The Roadmap
- Error handling
  - Inspired by Rust, which handles errors with the `Result` type (an Either-style monad; Rust's Maybe equivalent is `Option`)
  - All rules can be suffixed with `?` and `!` to control error handling. Stay tuned
- Self-compilation
- Moving the language to Rust
- Subsequent performance optimizations
- Building out the standard library
- A static type system
  - Really hard because of `bind` and non-linear control flow. However, it can mostly be addressed with macros
Language Support
- VS Code extension
  - A syntax highlighter
  - Formatter
    - The formatter can be run via `browse format`
- BrowseDoc: a documentation framework
  - Generates documentation for Browse rules from comments in .js and .browse files
  - Inspired by JSDoc
Team @windsorio
@atfaust2 💻 📖 🖋 🤔
@pranaygp 💻 📖 🎨 🤔