How to handle a large number of products in a sitemap for Opencart?

One of our clients had around 50k products. Because Opencart tried to render every product's sitemap URL on a single page, the sitemap could not load and always threw 500 errors. Our solution is to serve the sitemap URLs in chunks of 500 and tie the chunks together with a sitemap index.

Solution:

Although a single sitemap may hold up to 50,000 URLs or 50MB (uncompressed), many hosting providers or servers cannot generate all 50k URLs in one request. Our solution is to split the products into chunks of 500 URLs, serve each chunk as its own sitemap, list all of those sitemaps in a sitemapindex, and submit only that sitemap index to the search engines.

Here is an example of sitemapindex:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://webocreation.com.com/index.php?route=extension/feed/google_sitemap&amp;start=0&amp;end=500</loc>
        <lastmod>2023-02-19T18:23:17+00:00</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://webocreation.com.com/index.php?route=extension/feed/google_sitemap&amp;start=500&amp;end=500</loc>
        <lastmod>2023-02-19T18:23:17+00:00</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://webocreation.com.com/index.php?route=extension/feed/google_sitemap&amp;start=1000&amp;end=500</loc>
        <lastmod>2023-02-19T18:23:17+00:00</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://webocreation.com.com/index.php?route=extension/feed/google_sitemap&amp;start=1500&amp;end=500</loc>
        <lastmod>2023-02-19T18:23:17+00:00</lastmod>
    </sitemap>
    ...
    ...
    ...
</sitemapindex>
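
As a rough illustration of how such an index can be generated, here is a minimal standalone sketch. The function name buildSitemapIndex and the example.com base URL are hypothetical and not part of Opencart; the sketch also leaves out lastmod, which is optional in a sitemap index.

```php
<?php
// Hypothetical helper (not part of Opencart): builds a sitemap index that
// points at one child sitemap per chunk of products.
function buildSitemapIndex(string $baseUrl, int $totalProducts, int $chunkSize = 500): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>';
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

    // One <sitemap> entry per chunk: offsets 0, 500, 1000, ...
    for ($start = 0; $start < $totalProducts; $start += $chunkSize) {
        $url = $baseUrl . 'index.php?route=extension/feed/google_sitemap'
             . '&start=' . $start . '&end=' . $chunkSize;

        // Escape the URL so "&" becomes "&amp;" inside the XML document.
        $xml .= '<sitemap><loc>' . htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8') . '</loc></sitemap>';
    }

    return $xml . '</sitemapindex>';
}

// 1,200 products in chunks of 500 need three child sitemaps (start=0, 500, 1000).
echo substr_count(buildSitemapIndex('https://example.com/', 1200), '<sitemap>'), "\n";
```

The last chunk may hold fewer than 500 products; the loop still emits an entry for it because the condition only checks that the offset is below the total.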

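When you open one of the sitemap URLs, for example https://webocreation.com.com/index.php?route=extension/feed/google_sitemap&start=0&end=500, you will see a sitemap like the one below, listing up to 500 products. Note that, despite its name, the end parameter is not an ending offset: the code further down always returns up to 500 products beginning at the start offset.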

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://webocreation.com.com/walnut</loc>
        <changefreq>daily</changefreq>
        <lastmod>2023-02-20T00:00:00+00:00</lastmod>
        <priority>1.0</priority>
        <image:image>
            <image:loc>https://webocreation.com.com/image/cache/catalog/data/GKOF20SVE-1000x1000.jpg</image:loc>
            <image:caption> Walnuts</image:caption>
            <image:title> Walnuts</image:title>
        </image:image>
    </url>
    <url>
        <loc>https://webocreation.com.com/insurances</loc>
        <changefreq>daily</changefreq>
        <lastmod>2023-02-20T00:00:00+00:00</lastmod>
        <priority>1.0</priority>
        <image:image>
            <image:loc>https://webocreation.com.com/image/cache/catalog/data/GPT2A45410-1000x1000.jpg</image:loc>
            <image:caption>Insurances</image:caption>
            <image:title>Insurances</image:title>
        </image:image>
    </url>
   ...
   ...
   ...
</urlset>
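
Each url entry has to be valid XML, so values such as product names need escaping. The following is a minimal sketch of rendering one entry; buildUrlEntry is a hypothetical helper, not an Opencart function, and it assumes the values passed in are raw text that has not already been entity-encoded.

```php
<?php
// Hypothetical helper (not part of Opencart): renders one <url> entry.
// $imageUrl is optional, so a product without an image can still be listed.
function buildUrlEntry(string $loc, string $lastmod, string $title, ?string $imageUrl = null): string
{
    // Escape every value so characters such as & and < stay valid XML.
    $esc = function (string $value): string {
        return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };

    $xml  = '<url>';
    $xml .= '<loc>' . $esc($loc) . '</loc>';
    $xml .= '<lastmod>' . $esc($lastmod) . '</lastmod>';

    // Only emit the image block when an image URL was supplied.
    if ($imageUrl !== null && $imageUrl !== '') {
        $xml .= '<image:image>';
        $xml .= '<image:loc>' . $esc($imageUrl) . '</image:loc>';
        $xml .= '<image:title>' . $esc($title) . '</image:title>';
        $xml .= '</image:image>';
    }

    return $xml . '</url>';
}

echo buildUrlEntry('https://example.com/walnut', '2023-02-20T00:00:00+00:00', 'Walnuts & Almonds'), "\n";
```

The entry only parses correctly inside a urlset element that declares the image namespace, as in the example above.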

Code changes for the Google sitemap extension

Open the file catalog/controller/extension/feed/google_sitemap.php, remove all of its code, and paste in the following:

<?php
class ControllerExtensionFeedGoogleSitemap extends Controller {
	public function index() {
		if ($this->config->get('feed_google_sitemap_status')) {
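			if (isset($this->request->get['manufacturers']) && $this->request->get['manufacturers'] == 'active') {
				$output = $this->getManufactureresSiteMaps();
			} elseif (isset($this->request->get['start'])) {
				// Cast the offset to a non-negative integer; the page size is fixed at 500,
				// so a missing or tampered "end" parameter cannot cause a notice.
				$output = $this->getProductsSiteMaps(max(0, (int)$this->request->get['start']), 500);
			} else {
				// No parameters: output the sitemap index that links to every chunk.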
				$output ='<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

				$this->load->model('catalog/product');
				$totalProducts = $this->model_catalog_product->getTotalProducts();
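				// One <sitemap> entry per chunk of 500 products: start=0, 500, 1000, ...
				// Note: the lastmod value below is a fixed placeholder timestamp.
				for ($i = 0; $i < $totalProducts; $i += 500) {
					$output .= '<sitemap><loc>'.HTTPS_SERVER.'index.php?route=extension/feed/google_sitemap&amp;start=' . $i . '&amp;end=500</loc><lastmod>2023-02-19T18:23:17+00:00</lastmod></sitemap>';
				}
				// Categories, manufacturers, and information pages share one extra sitemap.
				$output .= '<sitemap><loc>'.HTTPS_SERVER.'index.php?route=extension/feed/google_sitemap&amp;manufacturers=active</loc><lastmod>2023-02-19T18:23:17+00:00</lastmod></sitemap>';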
				$output.='</sitemapindex>';
			}

			$this->response->addHeader('Content-Type: application/xml');
			$this->response->setOutput($output);
		}
	}

	protected function getCategories($parent_id, $current_path = '') {
		$output = '';

		$results = $this->model_catalog_category->getCategories($parent_id);

		foreach ($results as $result) {
			if (!$current_path) {
				$new_path = $result['category_id'];
			} else {
				$new_path = $current_path . '_' . $result['category_id'];
			}

			$output .= '<url>';
			$output .= '  <loc>' . $this->url->link('product/category', 'path=' . $new_path) . '</loc>';
			$output .= '  <changefreq>daily</changefreq>';
			$output .= '  <priority>0.7</priority>';
			$output .= '</url>';

			// $this->load->model('catalog/product');

			// $products = $this->model_catalog_product->getProducts(array('filter_category_id' => $result['category_id']));

			// foreach ($products as $product) {
			// 	$output .= '<url>';
			// 	$output .= '  <loc>' . $this->url->link('product/product', 'path=' . $new_path . '&product_id=' . $product['product_id']) . '</loc>';
			// 	$output .= '  <changefreq>daily</changefreq>';
			// 	$output .= '  <priority>1.0</priority>';
			// 	$output .= '</url>';
			// }

			$output .= $this->getCategories($result['category_id'], $new_path);
		}

		return $output;
	}

	protected function getProductsSiteMaps ($start, $end) {
		$output = '<?xml version="1.0" encoding="UTF-8"?>';
		$output .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">';

		$this->load->model('catalog/product');
		$this->load->model('tool/image');

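		// $start is the row offset; each sitemap holds at most 500 products.
		// $end is currently unused because the page size is fixed here.
		$filter_data = array(
			'start' => (int)$start,
			'limit' => 500,
		);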
		$products = $this->model_catalog_product->getProducts($filter_data);
		foreach ($products as $product) {
			if ($product['image']) {
				$output .= '<url>';
				$output .= '  <loc>' . $this->url->link('product/product', 'product_id=' . $product['product_id']) . '</loc>';
				$output .= '  <changefreq>daily</changefreq>';
				$output .= '  <lastmod>' . date('Y-m-d\TH:i:sP', strtotime($product['date_modified'])) . '</lastmod>';
				$output .= '  <priority>1.0</priority>';
				$output .= '  <image:image>';
				$output .= '  <image:loc>' . $this->model_tool_image->resize($product['image'], $this->config->get('theme_' . $this->config->get('config_theme') . '_image_popup_width'), $this->config->get('theme_' . $this->config->get('config_theme') . '_image_popup_height')) . '</image:loc>';
				$output .= '  <image:caption>' . (str_replace('&', 'and', $product['name'])) . '</image:caption>';
				$output .= '  <image:title>' . (str_replace('&', 'and', $product['name'])) . '</image:title>';
				$output .= '  </image:image>';
				$output .= '</url>';
			}
		}
		$output .= '</urlset>';

		return $output;
	}

	protected function getManufactureresSiteMaps(){
		$output = '<?xml version="1.0" encoding="UTF-8"?>';
		$output .= '<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

		$this->load->model('catalog/category');

		$output .= $this->getCategories(0);

		$this->load->model('catalog/manufacturer');

		$manufacturers = $this->model_catalog_manufacturer->getManufacturers();

		foreach ($manufacturers as $manufacturer) {
			$output .= '<url>';
			$output .= '  <loc>' . $this->url->link('product/manufacturer/info', 'manufacturer_id=' . $manufacturer['manufacturer_id']) . '</loc>';
			$output .= '  <changefreq>daily</changefreq>';
			$output .= '  <priority>0.7</priority>';
			$output .= '</url>';

			//$products = $this->model_catalog_product->getProducts(array('filter_manufacturer_id' => $manufacturer['manufacturer_id']));

			// foreach ($products as $product) {
			// 	$output .= '<url>';
			// 	$output .= '  <loc>' . $this->url->link('product/product', 'manufacturer_id=' . $manufacturer['manufacturer_id'] . '&product_id=' . $product['product_id']) . '</loc>';
			// 	$output .= '  <changefreq>daily</changefreq>';
			// 	$output .= '  <priority>1.0</priority>';
			// 	$output .= '</url>';
			// }
		}

		$this->load->model('catalog/information');

		$informations = $this->model_catalog_information->getInformations();

		foreach ($informations as $information) {
			$output .= '<url>';
			$output .= '  <loc>' . $this->url->link('information/information', 'information_id=' . $information['information_id']) . '</loc>';
			$output .= '  <changefreq>daily</changefreq>';
			$output .= '  <priority>0.5</priority>';
			$output .= '</url>';
		}
		$output .= '</urlset>';
		return $output;
	}
}

This is a rough way to do it for now. We plan to create an extension for it and share it with you; until then, this is a quick way to get the job done.

In this way, you can generate the sitemap for a large number of products and submit it to Google, Bing, or other search engines. If you have a project in mind, you can email us at webocreation.com@gmail.com. We hope you liked this tutorial. Please subscribe to our YouTube Channel to get more free Opencart extensions. You can also find us on Twitter and Facebook.
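
Before submitting, it can help to confirm that the index is well-formed and lists the number of sitemaps you expect. Below is a minimal sketch of such a check; listChildSitemaps is a hypothetical helper, the sample XML is made up for illustration, and the sketch assumes PHP's DOM extension is enabled.

```php
<?php
// Hypothetical helper: returns the <loc> values listed in a sitemap index.
function listChildSitemaps(string $indexXml): array
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($indexXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Sitemap index is not well-formed XML.');
    }

    $locs = [];
    foreach ($doc->getElementsByTagName('loc') as $node) {
        // textContent is already entity-decoded, so "&amp;" comes back as "&".
        $locs[] = $node->textContent;
    }

    return $locs;
}

// Made-up sample index with two child sitemaps.
$sample = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<sitemap><loc>https://example.com/index.php?route=extension/feed/google_sitemap&amp;start=0&amp;end=500</loc></sitemap>'
    . '<sitemap><loc>https://example.com/index.php?route=extension/feed/google_sitemap&amp;start=500&amp;end=500</loc></sitemap>'
    . '</sitemapindex>';

echo count(listChildSitemaps($sample)), "\n";
```

In practice you would pass in the response body of your own sitemap index URL and then open each returned URL to check that it loads.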

Author of three Opencart books. The most recent, on Opencart 4, is at https://amzn.to/4dOlbOR