Alternative Data
What is Alternative Data in Business Analytics?
Alternative data refers to information gathered from non-traditional sources, providing insights beyond what traditional data sources can offer. The definition of alternative data varies across industries, depending on the conventional data sources used by businesses and competitors.
Common Types of Alternative Data
Four data types come up most often in discussions of alternative data:
- Satellite data
- Mobile data
- Sensor data
- Web data
Additionally, alternative data can encompass:
- Geolocation (foot traffic)
- Credit card transactions
- Email receipts
- Point-of-sale transactions
- Social media posts
- Online browsing activity
- Shipping container receipts
- Product reviews
- Price trackers
- Weather and micro-climates
- Flight and shipping trackers
In recent years, the surge of data from mobile devices, satellites, sensors, and websites has produced vast amounts of structured, semi-structured, and unstructured data, collectively known as big data. Alternative data can give a business insights that its competitors lack, which may translate into a competitive edge and higher profits. By combining datasets from several sources, companies can build a more comprehensive view of their market landscape.
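As a rough illustration of combining datasets, the sketch below joins foot-traffic counts with point-of-sale totals on store and date. The records, field names, and the `join_on_store_date` helper are all hypothetical; real vendor feeds will differ in schema and granularity.

```python
# Hypothetical daily foot-traffic counts (e.g., from a geolocation provider).
foot_traffic = [
    {"store": "A", "date": "2024-01-01", "visits": 120},
    {"store": "A", "date": "2024-01-02", "visits": 90},
    {"store": "B", "date": "2024-01-01", "visits": 200},
]

# Hypothetical point-of-sale revenue totals for the same stores.
pos_sales = [
    {"store": "A", "date": "2024-01-01", "revenue": 2400.0},
    {"store": "B", "date": "2024-01-01", "revenue": 3000.0},
]


def join_on_store_date(traffic, sales):
    """Inner-join two lists of records on the (store, date) pair."""
    sales_by_key = {(row["store"], row["date"]): row for row in sales}
    joined = []
    for t in traffic:
        s = sales_by_key.get((t["store"], t["date"]))
        if s is None:
            continue  # no matching sales record for this store and day
        visits = t["visits"]
        joined.append({
            "store": t["store"],
            "date": t["date"],
            "visits": visits,
            "revenue": s["revenue"],
            # Guard against division by zero on days with no visits.
            "revenue_per_visit": s["revenue"] / visits if visits else None,
        })
    return joined


combined = join_on_store_date(foot_traffic, pos_sales)
for row in combined:
    print(row)
```

Only days present in both sources survive the join, so gaps in either feed shrink the combined view; an outer join may be preferable when coverage matters more than completeness of each row.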
There are three primary methods for accessing alternative data:
- Acquisition of raw data
- Third-party licensing
- Web scraping (also known as web harvesting or web data extraction)
Web Scraping Techniques and Tools
Web scrapers are programs that extract data from websites, providing crucial insights for businesses; some are exposed to other systems through Application Programming Interfaces (APIs). Newer web scraping methods involve listening to data feeds from web servers, with JSON commonly used as the transport format between the client and the server.
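When a site serves its data as JSON, a scraper can read the feed directly instead of parsing rendered HTML. The following is a minimal sketch using Python's standard `json` module; the payload, its field names, and the `extract_prices` helper are assumptions made up for illustration, and a real scraper would obtain the body from an HTTP response.

```python
import json

# Hypothetical response body, standing in for what a server's feed might return.
payload = """
{
  "products": [
    {"sku": "X1", "name": "Widget", "price": "19.99", "in_stock": true},
    {"sku": "X2", "name": "Gadget", "price": "5.50", "in_stock": false}
  ]
}
"""


def extract_prices(raw_json):
    """Parse a JSON document and map each SKU to its price as a float."""
    data = json.loads(raw_json)
    return {item["sku"]: float(item["price"]) for item in data.get("products", [])}


prices = extract_prices(payload)
print(prices)
```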
Automated scraping techniques include:
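- HTML Parsing: HTML parsing uses a script, often written in JavaScript, to target linear or nested HTML pages and pull out text, links, and other elements.
- DOM Parsing: The Document Object Model (DOM) represents the style, structure, and content of an HTML or XML document as a tree of nodes, which a scraper can traverse to reach the data it needs.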
- Vertical Aggregation: Vertical aggregation platforms are created by organizations with significant computing power, targeting specific verticals.
- XPath: XML Path Language (XPath) is a query language for selecting nodes in XML documents; it can also be applied to the parsed tree of an HTML page.
- Google Sheets: Spreadsheet functions such as IMPORTXML can fetch data from a page much as a scraper written in a programming language like Python or Ruby would, making Google Sheets a quick way to introduce the basics of certain types of scrapers.
- Text Pattern Matching: This technique matches regular expressions against page text, using the UNIX grep command or the regular-expression facilities of popular programming languages like Perl or Python.
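Three of the techniques above (HTML parsing, XPath, and text pattern matching) can be sketched with Python's standard library alone. The page fragment, class names, and the `ProductParser` helper are hypothetical. Note that `xml.etree.ElementTree` supports only a limited subset of XPath and requires well-formed markup, so production scrapers often rely on third-party parsers instead.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical, well-formed page fragment shared by all three approaches.
PAGE = """<html><body>
<ul id="reviews">
  <li class="review"><span class="product">Widget</span> costs $19.99</li>
  <li class="review"><span class="product">Gadget</span> costs $5.50</li>
</ul>
</body></html>"""


class ProductParser(HTMLParser):
    """HTML parsing: collect the text inside <span class="product"> tags."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "span" and ("class", "product") in attrs:
            self.in_product = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_product = False

    def handle_data(self, data):
        if self.in_product:
            self.products.append(data.strip())


parser = ProductParser()
parser.feed(PAGE)
html_products = parser.products

# XPath: select product spans nested under review items.
root = ET.fromstring(PAGE)
xpath_products = [
    el.text for el in root.findall(".//li[@class='review']/span[@class='product']")
]

# Text pattern matching: pull dollar amounts out of the raw text with a regex.
dollar_amounts = [float(m) for m in re.findall(r"\$(\d+\.\d{2})", PAGE)]

print(html_products, xpath_products, dollar_amounts)
```

The regular expression is the most brittle of the three, since it ignores document structure entirely, but it is also the quickest to write when the target text follows a predictable pattern.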