HARdy HAR HAR HAR: HAR File Collection and Analysis for Malware

HAR files, or HTTP Archives, are a format for recording sessions between a browser and a web server. This type of file is a rarely used but powerful tool for analyzing malicious links. Using Selenium's set of web browser automation tools, one can automate visiting a continuous stream of links and save a HAR file record of each one. Because the HAR file is saved in JSON format, it is easy to work with and can be stored in Elasticsearch. Using Python and the haralyzer framework, network indicators of compromise (IOCs) can be extracted from the HAR file. In addition to network IOCs, the HAR file may also contain the payload binary in base64-encoded form. A HAR file can also be replayed repeatedly with slight variations to circumvent common anti-analysis techniques. The advantage of this method over a PCAP is that it gives the researcher visibility inside SSL/TLS connections, because the HAR file represents the data of the HTTP session before encryption occurs.

This talk will cover the details of how a HAR file is structured, highlighting the specific components that are of interest when analyzing a malicious link. It will also demonstrate how to set up a collection system based on Python, Selenium, Firefox, Firebug, and NetExport. Next, it will show step by step how to extract the network IOCs from the generated HAR file using a set of Python scripts and the open source haralyzer framework. Lastly, it will cover all the steps needed to extract and decode payload binaries so that they are ready for submission to an automated malware analysis system. The code for these scripts will be released at the end of the talk.
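
The IOC extraction and payload decoding steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the scripts released with the talk: it reads the HAR JSON directly with the standard library instead of haralyzer, it assumes the field names defined by the HAR 1.2 format (`log.entries`, `request.url`, `serverIPAddress`, `response.content.text`, `response.content.encoding`), and the function names and the sample data are hypothetical.

```python
import base64
import json
from urllib.parse import urlsplit


def extract_iocs(har):
    """Collect URLs, hostnames, and server IPs from a parsed HAR dict."""
    iocs = {"urls": set(), "hosts": set(), "ips": set()}
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        if url:
            iocs["urls"].add(url)
            host = urlsplit(url).hostname
            if host:
                iocs["hosts"].add(host)
        # serverIPAddress is optional in HAR 1.2, so it may be absent.
        ip = entry.get("serverIPAddress")
        if ip:
            iocs["ips"].add(ip)
    return iocs


def extract_payloads(har):
    """Yield (url, mime_type, raw_bytes) for bodies stored as base64."""
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        if content.get("encoding") == "base64" and content.get("text"):
            url = entry.get("request", {}).get("url", "")
            raw = base64.b64decode(content["text"])
            yield url, content.get("mimeType", ""), raw


# Hypothetical two-entry HAR: a landing page and a binary download.
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"url": "http://example.test/landing"},
                "serverIPAddress": "192.0.2.10",
                "response": {
                    "content": {"mimeType": "text/html", "text": "<html></html>"}
                },
            },
            {
                "request": {"url": "https://cdn.example.test/payload.bin"},
                "serverIPAddress": "192.0.2.20",
                "response": {
                    "content": {
                        "mimeType": "application/octet-stream",
                        "encoding": "base64",
                        "text": base64.b64encode(b"MZ\x90\x00").decode("ascii"),
                    }
                },
            },
        ]
    }
}

if __name__ == "__main__":
    # Round-trip through JSON to mimic loading a HAR file from disk.
    har = json.loads(json.dumps(SAMPLE_HAR))
    print(extract_iocs(har))
    for url, mime, raw in extract_payloads(har):
        print(url, mime, len(raw), "bytes")
```

In practice the decoded bytes would be written to disk or hashed before submission to an analysis system; which response bodies a given exporter stores as base64 depends on the tool, so the `encoding` check should be treated as a starting point.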

Presented by