Integration Approaches

There are two ways to build Content Gateway integrations:

  • Using system APIs
  • Using web scrapers

Strategy 1: Using System APIs

System APIs offer a structured, programmatic way to retrieve data directly from the source system. This is the most robust approach for building Content Gateway integrations.

For this approach, you need to follow 3 steps:

  1. Conduct source system API discovery (including API endpoints and authentication).
  2. Create a server that can host Content Gateway APIs. You can use middleware tools or host your own server.
  3. Return content using source system APIs every time your Gateway APIs are invoked.
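The three steps above can be sketched roughly as follows. This is a minimal illustration, not the Content Gateway contract: the `/articles` endpoint, the bearer-token authentication, the `articles` and `next_page` response fields, and the `items` output shape are all hypothetical stand-ins for whatever your API discovery turns up and whatever your Gateway API is expected to return.

```python
import json
import urllib.request
from typing import Callable

# A fetcher takes a source-system path and returns the parsed JSON response.
FetchJson = Callable[[str], dict]


def make_fetcher(base_url: str, token: str) -> FetchJson:
    """Build a fetcher for a source system (assumes bearer-token auth)."""
    def fetch_json(path: str) -> dict:
        req = urllib.request.Request(
            base_url + path,
            headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/json",
            },
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)
    return fetch_json


def list_documents(fetch_json: FetchJson, page: int = 1) -> dict:
    """Serve one gateway call by querying the source API on demand.

    The endpoint and field names here are hypothetical.
    """
    raw = fetch_json(f"/articles?page={page}")
    items = [
        # Keep only the fields the gateway response needs.
        {"id": str(a["id"]), "title": a["title"], "body": a["body"]}
        for a in raw.get("articles", [])
    ]
    return {"items": items, "next_page": raw.get("next_page")}
```

Your server or middleware route would call something like `list_documents(make_fetcher(base_url, token), page=1)` on each request. Passing the fetcher in as a parameter keeps the mapping logic testable without a live source system.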

Strategy 2: Using Web Scrapers

Web scraping can be used when source system APIs are unavailable, though it comes with significant challenges.

For this approach, you need to follow 3 steps:

  1. Build a web scraper to crawl and retrieve content from source systems. You may need external libraries such as Beautiful Soup (for parsing static HTML) or Selenium (for pages that render content with JavaScript).
  2. Create a server that can host Content Gateway APIs. You can use middleware tools or host your own server.
  3. Return content by scraping the source systems every time your Gateway APIs are invoked.
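As a rough sketch of the scraping step, the example below extracts readable text from a page while skipping markup that usually carries no content. To stay dependency-free it uses Python's standard-library `html.parser` rather than Beautiful Soup or Selenium. The list of skipped tags and the `extract_text` helper are assumptions for illustration; a real scraper would need rules tuned to each source site.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as page chrome, not content (an assumption).
SKIP_TAGS = {"script", "style", "header", "footer", "nav"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's content text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Note that this depends on the page's structure: if the site moves its content inside a `<header>` element, or leaves a skipped tag unclosed, the extractor silently drops text.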

Comparison of Approaches

You can build gateway integrations using either source system APIs or web scrapers, but we highly recommend using source system APIs, since scrapers are unreliable and break easily.

Here is a detailed comparison of the two approaches:

Aspect | Web Scraping (Cons) | Why System APIs Are Better
--- | --- | ---
Reliability | Scrapers depend on the structure of the site, which can change without notice and easily break your integration. | APIs are designed for structured data access with stable endpoints.
Data Precision | Scraped data often includes unnecessary content such as HTML headers, footers, images, or JavaScript snippets. | APIs provide precise, well-defined data that ensures higher Copilot accuracy.
Performance | Scraping is much slower because it involves rendering and parsing HTML, JavaScript, and CSS. | APIs are optimized for performance, allowing efficient data retrieval.
Scalability | Requires additional effort to handle rate limits, paginated data, and large datasets efficiently. | APIs are designed with scalability in mind, including features like rate limits and pagination.
Access Permissions | Scraping may not support authenticated access, or may require complex workarounds to handle web login. | APIs have secure authentication methods like OAuth and offer robust permission management.