WWW::Mechanize::Chrome

Max Maischein

Frankfurt.pm

Overview

  • Why WWW::Mechanize::Chrome?

  • What is WWW::Mechanize::Chrome?

  • Development of WWW::Mechanize::Chrome

  • Applications

Who am I

  • Max Maischein

  • DZ BANK Frankfurt

  • Deutsche Zentralgenossenschaftsbank

  • Data Solutions

Automation - My leitmotiv

  • If I can do it manually

  • ... the computer can repeat it

  • ... correctly every time

My tools

  • Perl (well, duh)

  • WWW::Mechanize

  • WWW::Mechanize::Shell (GPW 2002)

  • WWW::Mechanize::Firefox (YAPC::E 2010)

  • WWW::Mechanize::PhantomJS (YAPC::E 2014)

  • WWW::Mechanize::Chrome (today)

  • (also Chromium, v59+)

Browser Evolution

  • Web applications are still cool

  • Web Service Workers are another layer

  • PhantomJS has stopped development

  • Mozilla is phasing out legacy Firefox extensions

Automation Best Practices

  • Freeze your prerequisites

  • Disable auto-update

Development evolution

  • Local regression tests

  • Travis CI

Javascript

  • Recognized platform

  • Compatible platform

  • Interactive platform

  • WWW::Mechanize::Chrome

Interactivity is not everything

  • Chrome is a browser, with my cookies

  • Chrome wants a UI window

  • --headless

Control

  • Chrome

  • Chrome DevTools

  • WebSocket

  • AnyEvent or Mojolicious

  • WWW::Mechanize::Chrome

  • My program
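To make the stack above concrete, here is a minimal sketch of what travels over the WebSocket: each command to Chrome DevTools is a small JSON message with an id, a method name and parameters. `Page.navigate` is a DevTools protocol method; the helper `devtools_command` is a hypothetical name for this illustration and is not taken from the WWW::Mechanize::Chrome source.

```perl
use strict;
use warnings;
use JSON::PP;

# Counter for message ids; the reply from Chrome carries the same id,
# which lets the caller match a reply to its command.
my $next_id = 0;

# Hypothetical helper: serialize one DevTools command as JSON text.
sub devtools_command {
    my ($method, %params) = @_;
    return JSON::PP->new->canonical->encode({
        id     => ++$next_id,
        method => $method,
        params => \%params,
    });
}

# The message a client would send to ask the tab to load a page.
my $frame = devtools_command('Page.navigate', url => 'https://corion.net/talks/');
print "$frame\n";
```

The module hides this layer entirely; `$mech->get($url)` is all a program normally needs.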

Definition of WWW::Mechanize::Chrome

  • an extended API

  • of WWW::Mechanize

  • using Chrome as backend

WWW::Mechanize::Chrome

    use WWW::Mechanize::Chrome;

    my $mech = WWW::Mechanize::Chrome->new();
    $mech->get('http://act.yapc.eu/lpw2017/');
    my $png = $mech->content_as_png();   # PNG data of the rendered page

Features

  • Normal WWW::Mechanize API

  • Javascript

  • CSS Selectors (via HTML::Selector::XPath)

  • XPath Selectors

  • Javascript error messages
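A small sketch of the CSS selector feature listed above: HTML::Selector::XPath translates a CSS selector into an XPath expression, which is the form a browser query needs. This only shows the translation step and is not a copy of the module's internal code; the second selector is an arbitrary example.

```perl
use strict;
use warnings;
use HTML::Selector::XPath qw(selector_to_xpath);

# Translate CSS selectors to XPath and show both forms side by side.
for my $css ('a.download', 'div > p') {
    printf "%-12s => %s\n", $css, selector_to_xpath($css);
}
```

With a running browser, `$mech->selector('a.download')` and `$mech->xpath(...)` return the matching nodes directly.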

Using WWW::Mechanize::Chrome

  • Web site automation

  • Integrated JS Unit Tests

  • Sniffing Websockets

  • Android Chrome remote control

Live Demo

Automate Chrome

01-open-local.pl

    my $mech = WWW::Mechanize::Chrome->new();
    $mech->get_local('file.html');

Live demo

Test a website for usability

02-dump-links.pl

    my $mech = WWW::Mechanize::Chrome->new();
    $mech->get_local('links.html');

    print $_->get_attribute('href'),
          "\n\t-> ",
          $_->get_attribute('innerHTML'), "\n"
      for $mech->selector('a.download');

Live demo

Run Javascript

03-javascript.pl

    # Perl
    my $mech = WWW::Mechanize::Chrome->new(
        headless => 1,
    );

    # eval_in_page returns the value first; print only that
    print( ($mech->eval_in_page(<<'JS'))[0]);
        ["Just","another","Perl","Hacker"].join(" ");
    JS

Screenshots for documentation/logging

  • Google Keep Clone

  • Javascript+Perl

  • Service worker

  • Tests

05-screenshot-online.pl

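    my $mech = WWW::Mechanize::Chrome->new();
    my $url = 'https://corion.net/notes.psgi';
    print "Loading $url\n";
    $mech->get($url);
    my $page_png = $mech->content_as_png();

    # Save the PNG data; the file name is only an example
    open my $fh, '>:raw', 'screenshot.png' or die "Cannot write file: $!";
    print {$fh} $page_png;
    close $fh;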

End-to-end Test of JS app

06-create-note.pl

    $mech->get($url);

    $mech->sleep( 5 );
    # Create note
    $mech->eval_in_page(<<'JS', $name);
        var item = {
            title : "Hello London",
            text  : "Created with WWW::Mechanize::Chrome",
            ...
        };
        saveItem( item );
    JS
    sleep 1;

Convert HTML to PDF

PDF output is only available in headless Chrome

07-screenshot-pdf.pl

    my $mech = WWW::Mechanize::Chrome->new(
        headless => 1,
    );
    my $url = 'http://localhost:5000';
    print "Loading $url\n";
    $mech->get($url);

    $mech->render_content(
        format   => 'pdf',
        filename => 'screen.pdf'
    );

API-Extensions

  • Alerts (window.alert())

  • $mech->on_dialog(...)

  • $mech->handle_dialog( 1 );

  • Browser Console
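A sketch of how the dialog extensions above fit together: `on_dialog` registers a callback, and `handle_dialog(1)` accepts the dialog from inside it. The helper `make_dialog_handler`, the environment switch `DEMO_WITH_CHROME` and the file `alert.html` are assumptions made for this example; the browser part only runs when the switch is set.

```perl
use strict;
use warnings;

# Hypothetical helper: build a callback for $mech->on_dialog() that
# records the text of each Javascript dialog and then accepts it.
sub make_dialog_handler {
    my ($seen) = @_;                   # array ref collecting dialog messages
    return sub {
        my ($mech, $dialog) = @_;
        push @$seen, $dialog->{message};
        $mech->handle_dialog(1);       # a true value accepts the dialog ("OK")
    };
}

my @seen;
my $handler = make_dialog_handler(\@seen);

# Only talk to a real browser when explicitly asked to (assumed switch).
if ($ENV{DEMO_WITH_CHROME}) {
    require WWW::Mechanize::Chrome;
    my $mech = WWW::Mechanize::Chrome->new( headless => 1 );
    $mech->on_dialog($handler);
    $mech->get_local('alert.html');    # assumed page that calls alert()
    print "Dialog said: $_\n" for @seen;
}
```

Keeping the handler separate from the browser object makes it easy to exercise in a test without launching Chrome.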

Prerequisites of WWW::Mechanize::Chrome?

  • Chrome / Chromium

  • WWW::Mechanize::Chrome

  • AnyEvent or Mojolicious

What is missing with WWW::Mechanize::Chrome?

  • API Implementation (->post() , ...)

  • Documentation

Missing API Implementation

  • ->post()

Need-driven Development

NDD

  • Simple things first

  • ->post() half implemented

  • So far no need

  • HTTP::Cookies::Chrome for cookie management

Missing API

API for

  • Browser windows (open, close, popup)

  • Downloads

  • Event API? Callback API?

Missing documentation

  • Documentation for the API

  • WWW::Mechanize::Chrome

  • Documentation for FAQs

  • WWW::Mechanize::Chrome::Examples

Missing documentation

  • Rewrite the ::Firefox documentation

  • WWW::Mechanize::Chrome::Examples

Comparison of the modules

                          Chrome     PhantomJS   Firefox
    Display               Optional   No          Yes
    Persistent cookies    Yes        No          Yes
    Custom certificates   Ignore     Easy        Hard
    Dialogs               Yes        Possible    Hard
    Downloads             No         Yes?        Yes

Automation Best Practices

  • Freeze your prerequisites

  • Disable auto-update

  • Use whatever browser suits your profile

A look back on the development of WWW::Mechanize::Chrome

The Good

  • Testsuite of WWW::Mechanize::Firefox and ::PhantomJS

  • API of WWW::Mechanize

  • Experience with ::Firefox

  • 32bit App, 64bit Perl -> TCP!

  • Future is ideal, instead of callbacks

  • WebSocket trivial using AnyEvent or Mojolicious

  • API much more pleasant than Selenium
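To illustrate the point about Future versus callbacks, here is a minimal sketch with the Future module from CPAN. A command returns a Future that is resolved when the reply with the matching id arrives, and steps are chained with `then` instead of nesting callbacks. The helpers `send_command` and `on_reply` and the reply data are made up for this example; nothing is sent to a browser.

```perl
use strict;
use warnings;
use Future;

my %pending;        # request id => Future waiting for its reply
my $next_id = 0;

# Pretend to send a command: hand back a Future instead of taking a callback.
sub send_command {
    my ($method) = @_;
    my $id = ++$next_id;
    return $pending{$id} = Future->new;
}

# Called when a reply arrives: resolve the Future with the matching id.
sub on_reply {
    my ($id, $result) = @_;
    my $f = delete $pending{$id} or return;
    $f->done($result);
}

# Two dependent steps, written as a chain.
my $title = send_command('Page.navigate')
    ->then(sub { send_command('Runtime.evaluate') })
    ->then(sub { my ($reply) = @_; Future->done($reply->{value}) });

# Simulate the replies coming in from the WebSocket.
on_reply(1, { frameId => 'made-up-frame' });
on_reply(2, { value   => 'Just another Perl Hacker' });

print $title->get, "\n";
```

The chain reads top to bottom in the order the steps happen, which is the main advantage over nested callbacks.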

The Good, the Bad

  • Chrome / Chromium is a moving target

  • No PDF support, despite it being documented

  • Can't set the Referer header since Chrome v64

The Good, the Bad, the Ugly

  • API coverage through tests

  • Fine differences between ::Firefox , ::PhantomJS and ::Chrome

Automation Best Practices

  • Freeze your prerequisites

  • Disable auto-update

Sample code

The code is on CPAN as

WWW::Mechanize::Chrome::Examples

Thanks

Questions?

Slides are online:

https://corion.net/talks/

WWW::Mechanize::Chrome on CPAN

https://github.com/corion/www-mechanize-chrome on Github

Bonus Section

Outlook

  • Firefox --headless

  • only now on Windows

  • only Selenium (PhantomJS)

  • Firefox hates extensions and XUL

Outlook

  • Screencast mode

  • Automatic replay on HTTP errors

Javascript without a browser

  • JavaScript::SpiderMonkey ( Mike Schilli, Thomas Busch on CPAN )

  • JavaScript::Duktape ( Mahmoud A. Mehyar on CPAN )

  • Installable via CPAN, no header files needed

  • Windows needs patch

Javascript without a browser (continued)

  • JE, a Javascript engine ( Father Chrysostomos/SPROUT on CPAN )

  • Pure Perl, slooooow

  • JavaScript::Any ( PLICASE on CPAN )
