Browse - Build expressive libraries with thunks
pranaygp

The Main Idea

Browse is a language for building powerful libraries while keeping the end-user experience as simple as possible (think bash). Browse achieves this by treating thunks as first-class citizens (think of a thunk as an intent to call a function). This lets library creators implement complex behavior with minimal changes to the end-user experience. To show off Browse, we built a library called “web” that aims to make web scraping, browser automation, and UI testing simple.

Rules and RuleSets

To implement first-class thunks, we use Rules and RuleSets. A Rule is an intent to execute some action, and a RuleSet is a collection of Rules.

Example

# The print rule being used
print "Hello World" # prints "Hello World" when evaluated
# A RuleSet
{
    print "Hello"
    print "World"
}

Every line of code in Browse must begin with a rule name, so the code above can't be executed as written: the bare RuleSet doesn't start with one. However, we can pass a RuleSet into a rule:

# Evaluate the rules in the RuleSet sequentially
eval {
    print "Hello"
    print "World"
}

# Evaluate the rules in the RuleSet sequentially, but in reverse
eval(reverse) {
    print "Hello"
    print "World"
}

RuleSets are how Browse represents thunks.

At the top level of a Browse program, every Rule is evaluated sequentially, but higher-order rules (Rules that take RuleSets as arguments) can change that behavior.
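To make the idea concrete, here is a minimal Python sketch. It is not Browse's actual JavaScript implementation. It models a Rule as a zero-argument function (a thunk) and a RuleSet as a list of them. The names `print_rule` and `eval_rule` and the `reverse` parameter are hypothetical stand-ins for Browse's print rule, the eval rule, and its reverse option.

```python
from typing import Any, Callable, List

# A Rule is modeled as a thunk: a zero-argument callable that does
# nothing until it is called.
Rule = Callable[[], Any]
# A RuleSet is modeled as a plain list of Rules.
RuleSet = List[Rule]

output: List[str] = []  # records "printed" text so it can be inspected


def print_rule(text: str) -> Rule:
    """Hypothetical stand-in for Browse's print rule: builds a thunk."""
    def thunk() -> str:
        output.append(text)
        return text
    return thunk


def eval_rule(rules: RuleSet, reverse: bool = False) -> Any:
    """Hypothetical stand-in for eval / eval(reverse).

    As a higher-order rule, it receives the RuleSet unevaluated and
    decides in which order to force each thunk.
    """
    ordered = list(reversed(rules)) if reverse else rules
    result = None
    for rule in ordered:
        result = rule()  # only now does the rule actually run
    return result


# Building the RuleSet runs nothing yet.
block: RuleSet = [print_rule("Hello"), print_rule("World")]
assert output == []

eval_rule(block)                # like: eval { ... }
eval_rule(block, reverse=True)  # like: eval(reverse) { ... }
print(output)  # ['Hello', 'World', 'World', 'Hello']
```

The important design point is that `eval_rule` receives the rules before they run. That is what allows a higher-order rule such as eval(reverse) to change the evaluation order.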

Complete Documentation

👉 Check out this short wiki. We put a ton of work into it! 👈

Some Design Decisions

Apart from the obvious decision to make RuleSets first-class citizens, a few other key decisions are worth noting (roughly ranked in order of importance).

Implementing most language features as Rules

  • while, if, and for are all rules
    • Typically these are built into the language
    • We showed that higher-order Rules can look like they're part of the language
    • In fact, while is implemented entirely in Browse
    • Great for library developers 🙂
  • To consume arguments to a function, we use a rule called bind
    • Typically arguments to a function are automatically consumed
    • Argument handling is normally managed by the language (e.g. pass-by-name, pass-by-value, or pass-by-reference)
    • The bind rule lets the rule's author define when and how arguments should be consumed
    • This even allows the author to customize the meaning of thunk composition
      • Normally, f.g === f(g(...)). In Browse, however, f gets access to the thunk for g instead of just receiving its value
  • Arrays and Dictionaries are implemented as RuleSets using special rules to set elements
    • Typically these are built into the language
    • Special "subscript" syntax to access array and dictionary values can be implemented as syntactic sugar
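The Python sketch below uses the same hypothetical model, with thunks as zero-argument functions. It shows why arguments that arrive as thunks let while live outside the language core. The rule receives its condition and body unevaluated and can force them as often as it likes. The sketch also mimics the composition point: f receives g's thunk and decides when, or whether, to force it. All function names are illustrative, and Browse's bind is only approximated here by the callee calling the thunk it was given.

```python
from typing import Any, Callable, Dict, List

Thunk = Callable[[], Any]


def while_rule(cond: Thunk, body: Thunk) -> Any:
    """A user-level while: both arguments arrive as unevaluated thunks."""
    result = None
    while cond():         # re-evaluate the condition thunk on every pass
        result = body()   # re-evaluate the body thunk on every pass
    return result


# A mutable environment standing in for Browse variables.
env: Dict[str, int] = {"i": 0}
seen: List[int] = []


def step() -> int:
    seen.append(env["i"])
    env["i"] += 1
    return env["i"]


while_rule(lambda: env["i"] < 3, step)
print(seen)  # [0, 1, 2]


def twice(arg: Thunk) -> List[Any]:
    """Like f in f.g: receives g's thunk and forces it twice."""
    return [arg(), arg()]


def never(arg: Thunk) -> str:
    """Another f that never forces its argument, so g never runs."""
    return "skipped"


calls: List[str] = []


def g() -> str:
    calls.append("g")
    return "g ran"


print(twice(g))  # ['g ran', 'g ran']
print(never(g))  # skipped
print(calls)     # ['g', 'g']
```

With plain pass-by-value, neither while_rule nor never could be written. The arguments would already have been evaluated exactly once before the function body ran.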

Expressions only (every Rule returns a value)

Implicit return

  • The eval rule evaluates the RuleSet passed as an argument.
  • Since every rule must return a value, eval returns the value returned from the last rule in the RuleSet
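  • This means return is implicit. However, we do expose a rule called return, which is an alias for id (the identity rule). Because it simply returns its argument, it doesn't exit early the way return does in most languages. Used as the last rule in a RuleSet, though, it works nearly as well and makes the code easier to read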
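Continuing the hypothetical Python model, an eval that returns the value of its last rule gives implicit return for free. A return that is just the identity function then reads like an explicit return. The names `id_rule`, `return_rule`, and `eval_rule` are illustrative, not Browse internals.

```python
from typing import Any, Callable, List

Thunk = Callable[[], Any]


def id_rule(value: Any) -> Any:
    """The identity rule: returns its argument unchanged."""
    return value


# In this model, return is simply another name for id.
return_rule = id_rule


def eval_rule(rules: List[Thunk]) -> Any:
    """Evaluate rules in order; the last rule's value is the result."""
    result = None
    for rule in rules:
        result = rule()
    return result


# Implicit return: the last rule's value is the RuleSet's value.
print(eval_rule([lambda: 1 + 1, lambda: "done"]))  # done

# An explicit-looking return as the last rule.
print(eval_rule([lambda: 1 + 1, lambda: return_rule(42)]))  # 42

# Because it is only id, a return in the middle does not exit early.
print(eval_rule([lambda: return_rule(1), lambda: 2]))  # 2
```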

Named Arguments/Options

  • We love how most bash programs take optional arguments as flags (such as -h or --force)
  • However, we didn't like that flags can be interspersed with positional arguments, which can lead to ambiguity
  • So we added a native syntax for passing options to rules. It looks like this:
    my_rule(double !yell prefix="Ans: ") 2
    # This sets `double` to true, `yell` to false, and `prefix` to "Ans: "
    # Look at the wiki to see how rules can `bind` options and use them
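Here is one way such an option list could be read, shown as a small Python sketch that turns the text inside the parentheses into a dictionary. A bare name means true, a ! prefix means false, and name=value assigns a value. The sketch uses `shlex.split` from the Python standard library to handle the quoted value. It is not Browse's real parser, which may behave differently in edge cases.

```python
import shlex
from typing import Dict, Union

OptionValue = Union[bool, str]


def parse_options(source: str) -> Dict[str, OptionValue]:
    """Parse an option list into a dict of option names to values."""
    options: Dict[str, OptionValue] = {}
    # shlex.split respects double quotes, so "Ans: " stays one token
    # (quotes removed, trailing space preserved).
    for token in shlex.split(source):
        if "=" in token:
            name, value = token.split("=", 1)
            options[name] = value          # prefix="Ans: " -> "Ans: "
        elif token.startswith("!"):
            options[token[1:]] = False     # !yell -> False
        else:
            options[token] = True          # double -> True
    return options


print(parse_options('double !yell prefix="Ans: "'))
# {'double': True, 'yell': False, 'prefix': 'Ans: '}
```

Keeping the options inside the parentheses, separate from the positional argument 2, is what removes the bash-style ambiguity. A parser never has to guess whether a token is a flag or an argument.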

Unquoted Strings

  • Inspired by bash, Browse supports unquoted strings:
    print hello world

Optional Semicolons

  • Also inspired by bash: leaving out semicolons generally results in cleaner code
  • That said, it's easy to dream up cases where semicolons come in handy, such as putting several rules on one line (see the for loop in the Gazette example below), so they are optional.

Examples

If you're unsure what some of these rules do, check out this reference for the standard Browse rules and this one for the web rules.

Fibonacci

# A Fibonacci rule

rule fib {
  # Bind "n" to the first argument
  bind n

  if ($n <= 1) then {
    # Base case
    return 1
  } else {
    # Recursion
    return (fib $n - 1) + (fib $n - 2)
  }
}

# Run the program

print (fib 5)
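Since if is a rule whose branches are passed as RuleSets (thunks), the rule can force only the branch it picks. That is what lets the recursion in fib terminate. Below is a rough Python equivalent with a hypothetical `if_rule` that takes its branches as thunks. With the base case returning 1 for n <= 1, fib 5 evaluates to 8.

```python
from typing import Any, Callable


def if_rule(cond: bool, then: Callable[[], Any],
            otherwise: Callable[[], Any]) -> Any:
    """A hypothetical if rule: only the selected branch thunk is forced."""
    return then() if cond else otherwise()


def fib(n: int) -> int:
    # Mirrors the Browse rule: the base case returns 1 when n <= 1.
    return if_rule(
        n <= 1,
        lambda: 1,                        # base case
        lambda: fib(n - 1) + fib(n - 2),  # recursion
    )


print(fib(5))  # 8
```

If both branches were passed as already-evaluated values, computing the recursive branch would require calling fib again before the condition was even checked. The recursion would never bottom out.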

Higher order rules

Conway's Game of Life

Web Examples

Here are some examples of web scraping scripts written in Browse:

Wikipedia Scraper

# Pass `--web` to browse when running this example
# $ browse --web ./examples/web/wikipedia.browse

page https://en.wikipedia.org/wiki/:slug {
  # Grab a string from the webpage and store it in a variable called 'title'
  # Note: '@' is not a special symbol; the rule just happens to be named '@string'
  @string title `#firstHeading`

  # Grab an array of strings from the webpage and store it in a variable called 'paragraphs'
  @arr(string) paragraphs `div.mw-parser-output`
  out title paragraphs

  # Uncomment this to crawl through Wikipedia indefinitely
  # crawl `a`
}

# Start the crawl
visit https://en.wikipedia.org/wiki/Kevin_Bacon
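The page rule takes a URL pattern in which :slug is a named placeholder. In the next example, :issue is read back as $issue. The Python sketch below shows roughly how such matching could work: it compiles a pattern into a regular expression and extracts the named segments. The function name `match_page` is hypothetical, and so are two rules it assumes: a placeholder matches a single path segment, and the whole URL must match. Neither is documented Browse behavior. The issue number 12345 is made up for illustration.

```python
import re
from typing import Dict, Optional


def match_page(pattern: str, url: str) -> Optional[Dict[str, str]]:
    """Match url against a pattern such as https://host/wiki/:slug.

    Assumption: each :name placeholder matches one path segment
    (anything up to the next '/', '?' or '#').
    """
    # The capturing group makes re.split alternate literal, name,
    # literal, ... The ':' in 'https:' is not followed by a letter,
    # so it is treated as literal text.
    parts = re.split(r":([A-Za-z_]\w*)", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)           # literal text
        else:
            regex += f"(?P<{part}>[^/?#]+)"    # named placeholder
    match = re.fullmatch(regex, url)
    return match.groupdict() if match else None


print(match_page("https://en.wikipedia.org/wiki/:slug",
                 "https://en.wikipedia.org/wiki/Kevin_Bacon"))
# {'slug': 'Kevin_Bacon'}
print(match_page("https://www.thegazette.co.uk/notice/:issue",
                 "https://www.thegazette.co.uk/notice/12345"))
# {'issue': '12345'}
```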

The Gazette Scraper

page https://www.thegazette.co.uk/notice/:issue {
  print $url
  config { set output "./notices/" + $issue + ".json" }

  wait `.wrapperContent`
  
  @string title `h1.title`
  @string? date `dd time`
  @string? notice `div[about="this:notifiableThing"]`

  out title date notice
}

visit https://www.thegazette.co.uk/all-notices/notice?text=&categorycode-all=all&noticetypes=&location-postcode-1=&location-distance-1=1&location-local-authority-1=&numberOfLocationSearches=1&start-publish-date=01%2F01%2F2000&end-publish-date=12%2F08%2F2020&edition=&london-issue=&edinburgh-issue=&belfast-issue=&sort-by=&results-page-size=10

# Also fetch these
for { set i 2; test $i < 5; set i $i + 1 } {
  visit https://www.thegazette.co.uk/London/issue/ + $i + "/page/2"
}
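The for loop above is another higher-order rule: it receives its header RuleSet and its body unevaluated. Using the same hypothetical Python model, a C-style for could be written as below. Two things are assumptions: the header is treated as exactly three rules (init, test, step), based on the example, and `visit` only records URLs instead of loading pages.

```python
from typing import Any, Callable, Dict, List

Thunk = Callable[[], Any]


def for_rule(header: List[Thunk], body: Thunk) -> None:
    """A C-style for loop written as an ordinary higher-order function.

    Assumption: the header RuleSet holds exactly three rules, in the
    order init; test; step (mirroring the example above).
    """
    init, test, step = header
    init()
    while test():
        body()
        step()


env: Dict[str, int] = {}
visited: List[str] = []


def set_var(name: str, value: int) -> int:
    """Stand-in for Browse's set rule."""
    env[name] = value
    return value


def visit(url: str) -> None:
    """Stand-in for the web library's visit rule: only records the URL."""
    visited.append(url)


for_rule(
    [
        lambda: set_var("i", 2),             # set i 2
        lambda: env["i"] < 5,                # test $i < 5
        lambda: set_var("i", env["i"] + 1),  # set i $i + 1
    ],
    lambda: visit("https://www.thegazette.co.uk/London/issue/"
                  + str(env["i"]) + "/page/2"),
)

for url in visited:
    print(url)
```

In this model the loop visits issues 2, 3 and 4, then stops once i reaches 5.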

Twitch Sign Up

set headless false

page https://www.twitch.tv {

  # Click link for full code
  set logIn ...
  set username ...
  set birthMonth ...
  set birthDay ...
  set birthYear ...

  type 'RandomTwitchUser31415'
  wait $logIn
  click $logIn
  wait $username
  type 'RandomTwitchUser31415'
  click '#password-input'
  type 'jfnosenfjksef'
  click '#password-input-confirmation'
  type 'jfnosenfjksef'
  click $birthMonth
  type 'apr'
  press Enter
  click $birthDay
  type '29'
  click $birthYear
  type '1997'
  click '#email-input'
  type '[email protected]'
  sleep 1000 * 10 # 10s
}

visit https://www.twitch.tv/

More Examples

Here

Technical Trade-offs

  • Prototyping the language in JavaScript

    • Pros
      • Faster development
      • Lots of libraries
    • Cons
      • Slower performance
      • Single-threaded
      • Harder to implement memory optimizations
  • Implementing language features as rules

    • Pros
      • Customizable behavior for language designers
      • The Browse grammar is relatively simple
    • Cons
      • Slower execution times
      • Control flow is hard or impossible to derive from the AST alone (for instance, we don't necessarily know what if does)
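To see why the AST alone says little about control flow, consider a hypothetical Python model in which rule names are looked up in a table at run time. The same parsed call (if with a condition and two branches) behaves differently once a library rebinds if. A static tool therefore cannot know which branches run without knowing which definition is in scope. The table and the `run` helper are illustrative assumptions, not Browse internals.

```python
from typing import Any, Callable, Dict, List, Tuple

Thunk = Callable[[], Any]
RuleFn = Callable[..., Any]

# An "AST" node: a rule name plus its (unevaluated) arguments.
Node = Tuple[str, List[Any]]


def run(rules: Dict[str, RuleFn], node: Node) -> Any:
    """Dispatch a node to whatever rule is currently bound to its name."""
    name, args = node
    return rules[name](*args)


def standard_if(cond: Thunk, then: Thunk, otherwise: Thunk) -> Any:
    return then() if cond() else otherwise()


def both_branches_if(cond: Thunk, then: Thunk, otherwise: Thunk) -> Any:
    # A (contrived) library override that runs both branches.
    return [then(), otherwise()]


trace: List[str] = []
node: Node = ("if", [lambda: True,
                     lambda: trace.append("then"),
                     lambda: trace.append("else")])

run({"if": standard_if}, node)
print(trace)  # ['then']

trace.clear()
run({"if": both_branches_if}, node)
print(trace)  # ['then', 'else']
```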

The Roadmap

  • Error handling
    • Inspired by Rust's Result type (an Either-style monad; Rust's Maybe equivalent is Option)
    • All rules can be suffixed with ? or ! to control error handling. Stay tuned!
  • Self Compilation
  • Moving the language to Rust
    • Subsequent performance optimizations
  • Building out the standard library
  • A static type system
    • Really hard because of bind and non-linear control flow
    • However, it can mostly be addressed with macros

Language Support

  • VS Code extension
    • A syntax highlighter
    • Formatter
  • The formatter can also be run via browse format
  • BrowseDoc: A Documentation Framework
    • Generates documentation for Browse rules from comments in .js and .browse files
    • Inspired by JSDoc

Team @windsorio

@atfaust2 💻 📖 🖋 🤔

@pranaygp 💻 📖 🎨 🤔
