People use browsers to do all kinds of things: access information, test web apps, extract data, fill out forms, generate reports and much more. At times these tasks are repetitive, tedious and even huge in scope, and performing them by hand can be error-prone. With the right tools and approach, such tasks can be automated. Browser Automation is an unpredictable yet effective way to get such things done: we simulate a user's behaviour and actions as a set of steps performed in the browser by the system on our behalf.
Use Cases
I have created and deployed Browser Automation bots for various reasons such as:
- Report generation and download from third party dashboard.
- Automating account management & scraping LinkedIn.
- Automating Instagram for marketing & promotions.
- Google Adsense and DFP automation.
- Web Scraping.
- UI testing.
and much more...
Tips
- Use a stable and developer-friendly library for these tasks. These libraries should have good support and a strong ecosystem around them. The more powerful and robust the library, the better your chances of the task succeeding. I personally recommend Puppeteer or Nightmare.
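For illustration, here is a minimal Puppeteer sketch of the usual skeleton: launch a browser, open a page, perform an action. The URL and selector are placeholders for whatever your task needs.

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Navigate and wait until network activity settles
  await page.goto('https://example.com/login', { waitUntil: 'networkidle2' });

  // Placeholder selector for the element your task interacts with
  await page.click('#login-button');

  await browser.close();
})();
```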
- I am all in for Vanilla JavaScript when it comes to development. However, one needs to understand which tools are best for the job. Try to inject libraries like jQuery, Lodash and others into the page. Such libraries provide battle-tested APIs that speed up development and take care of many tasks so you don't reinvent the wheel.
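With Puppeteer, injecting such libraries is a one-liner per library via `page.addScriptTag`. A rough sketch (the CDN URLs are just examples):

```js
// Inside your async automation flow:
// inject jQuery and Lodash so they are available inside page.evaluate()
await page.addScriptTag({ url: 'https://code.jquery.com/jquery-3.6.0.min.js' });
await page.addScriptTag({ url: 'https://cdn.jsdelivr.net/npm/lodash@4.17.21/lodash.min.js' });
```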
- Most of the steps in a Browser Automation system are things like clicking a button, selecting text or filling out a form, and these steps depend on CSS selectors. So use short, unique and generic selectors: the longer a selector is, the more likely it breaks with any UI change. A library like jQuery can help here with selectors like `eq` which aren't natively available.
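Assuming jQuery was injected as above, a sketch of picking the n-th match with its `:eq()` selector inside `page.evaluate` (the selector itself is hypothetical):

```js
// Inside your async automation flow:
// grab the text of the third row's title using jQuery's :eq() selector
const title = await page.evaluate(() => $('.report-row:eq(2) .title').text());
```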
- Navigating a third-party dashboard is a series of steps: log in, find some content, click some buttons. We all know the UI can change and CSS selectors can break at any time. To get around this, try to navigate via URLs rather than clicking buttons/links. For example, in the Adsense automation I faced a similar situation where I needed to get into a particular user account based on its id. There are multiple ways to do it:
One way:
- Filter user
- Wait for loader animation to be over
- Click on the account button
- Handle the tab change
- Wait for the profile to be loaded
Better way:
Programming is full of patterns, and this holds everywhere, from data structures like arrays to Browser Automation. Understand the pattern and ace it! Rather than doing all of the above, we can navigate directly to that page with a URL like:
- Goto google.com/adsense/new/u/0/__ROOTPUBID__/account/__PUBID__/home
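As a sketch, the same jump as a small helper; the URL shape mirrors the example above and the id parameters are placeholders you would substitute:

```js
// Jump straight to a publisher's account page instead of clicking through filters and buttons
async function openPublisherAccount(page, rootPubId, pubId) {
  const url = `https://google.com/adsense/new/u/0/${rootPubId}/account/${pubId}/home`;
  await page.goto(url, { waitUntil: 'networkidle2' });
}
```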
- Let's consider a case where you want to generate a report and download it. Study how things happen: you might have to go to the reports page, apply some filters & dimensions, then click the download button, handle the save-file pop-up and so on.
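If you do end up clicking a download button, one common Puppeteer pattern is to route downloads to a known folder through the DevTools protocol before triggering the export. A rough sketch; the selector and path are placeholders:

```js
async function downloadReport(page) {
  // Tell Chromium where to put downloaded files instead of prompting
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: './downloads',
  });

  // Placeholder selector for the dashboard's export/download button
  await page.click('#export-report');
}
```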
- Always try to understand the URL pattern. You might be able to bypass all the goto-and-apply-filters steps by simply navigating to a specific URL, like abc.com/reports/?from=__FROM-DATE__&to=__TO-DATE__&dimensions=siteid,revenue,pageviews&metrics=adunits. Just replace the URL params and you are good to go!
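A sketch of building such a URL; the host and parameter names simply mirror the example above:

```js
// Build a report URL directly instead of clicking through filters
function buildReportUrl(fromDate, toDate) {
  const params = new URLSearchParams({
    from: fromDate,
    to: toDate,
    dimensions: 'siteid,revenue,pageviews',
    metrics: 'adunits',
  });
  return `https://abc.com/reports/?${params.toString()}`;
}

// Inside an async automation flow:
// await page.goto(buildReportUrl('2021-01-01', '2021-01-31'));
```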
- Always keep an eye on XHR requests. You might be able to skip the whole download-file handling by simulating the report-fetching XHR request and parsing its response.
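In Puppeteer this can be as simple as listening for responses and parsing the JSON. In this sketch the URL fragment and the shape of the payload are assumptions about what the dashboard returns:

```js
// page is the Puppeteer Page object from earlier.
// Capture the report-fetching XHR instead of downloading and parsing a file.
page.on('response', async (response) => {
  const isReportXhr =
    response.request().resourceType() === 'xhr' &&
    response.url().includes('/reports/data'); // placeholder URL fragment

  if (isReportXhr) {
    const report = await response.json();
    console.log('Report payload received:', Object.keys(report));
  }
});
```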
- Many websites like LinkedIn, Instagram and Google properties are very good at detecting automation. They rate limit requests or throw a captcha whenever they detect something fishy. In such cases:
- Have patience. Do not write a script that works like an assembly line, trying to finish tasks as soon as possible.
- Introduce artificial pauses between actions. For example, a bot that likes media on Instagram or follows/DMs people on LinkedIn should have a random amount of pause/sleep between actions. This can be done using packages like sleep or simply setting timeouts (sketched after this list).
- Try not to behave like a machine: perform seemingly random actions like scrolling, clicking buttons and other things that would occur in a normal user browsing session.
- Add request headers like cookies and UA. This is very important! You could also use packages like puppeteer-extra-plugin-stealth. This can make or break your bot!
- Try using proxies when it comes to making requests, because at the end of the day they might block your IP.
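A sketch putting these together: the stealth plugin via puppeteer-extra plus a random pause helper between actions. The target URL and timings are arbitrary choices, not recommendations:

```js
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Patch common headless fingerprints before launching
puppeteer.use(StealthPlugin());

// Random pause so actions don't fire like an assembly line
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const randomPause = () => sleep(3000 + Math.floor(Math.random() * 7000)); // 3-10 seconds

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto('https://www.instagram.com/');
  await randomPause();
  // ...like, follow, scroll, etc., pausing randomly between each step

  await browser.close();
})();
```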
- Always log what is happening and take screenshots when something goes wrong. This way you can keep track of what exactly is happening, and of what exactly happened if anything breaks. Tiny things like these help immensely while debugging.
```js
const moment = require('moment');

function logger(message) {
  console.log(`${message} | Current Time : ${moment().format('dddd, MMMM Do YYYY, h:mm:ss a')}`);
}

function findUser(id) {
  logger(`Logged in. Will find user based on id: ${id}`);
  // ...
}
```
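Building on the logger above, a hedged sketch of a hypothetical `safeStep` wrapper that logs each step and grabs a screenshot when one throws:

```js
// Wrap a step so failures are logged together with a full-page screenshot
async function safeStep(page, name, fn) {
  try {
    logger(`Starting step: ${name}`);
    await fn();
  } catch (err) {
    logger(`Step failed: ${name} | ${err.message}`);
    await page.screenshot({ path: `error-${name}-${Date.now()}.png`, fullPage: true });
    throw err;
  }
}

// Usage: await safeStep(page, 'find-user', () => findUser(42));
```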
- Try to save your progress. Consider a bot that likes Instagram media based on a list of hashtags: save which hashtag you are currently working on, so that if anything goes wrong and you need to restart, you can resume from the checkpoint rather than from the very first one. Or suppose you are sending a message to all your LinkedIn connections from a list of profiles: saving progress in terms of processed profile URLs helps you avoid sending the same message to the same users twice. This makes your approach more authentic, less spammy, and people are less likely to report you. :P
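A minimal sketch of such a checkpoint using a JSON file; the file name, the shape of the progress object and the `profileUrls` list are all placeholders:

```js
const fs = require('fs');

const CHECKPOINT_FILE = 'progress.json'; // placeholder location

function loadProgress() {
  try {
    return JSON.parse(fs.readFileSync(CHECKPOINT_FILE, 'utf8'));
  } catch (e) {
    return { processed: [] }; // fresh start when no checkpoint exists yet
  }
}

function saveProgress(progress) {
  fs.writeFileSync(CHECKPOINT_FILE, JSON.stringify(progress, null, 2));
}

// Skip profiles that were already handled on a previous run
const profileUrls = []; // placeholder: your list of LinkedIn profile URLs
const progress = loadProgress();
const pending = profileUrls.filter((url) => !progress.processed.includes(url));
```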
- Try to deploy this to a server like AWS, DigitalOcean or Heroku. They have a much more stable environment in terms of internet connection, high availability and so on. Having said that, you can take it a step further and register the bot as a service at the OS level. This adds a bit more perceived randomness to the flow: for example, the service starts at bootup and disconnects when you put your system on standby or shut it down. Of course this will hamper speed, but it lowers the chance of detection. Saving progress, as mentioned earlier, comes in handy here as your service restarts many times.
There are so many other things you can do. I might be missing some, so I would love to hear what you think can be done to write better bots!