HubSpot API: Get generated live content for a page to index it (without HubL tags)


#1

I want to index my HubSpot pages in an external Elasticsearch service that I have set up. When I fetch my pages through the API, the body HTML still contains the HubL tags ( {% … %} ), so it is not the live content that a visitor sees on the page.

Is there a way to get this generated content via the API? Or is there another way?


#2

Hi @Jelle_Vermeulen,

How are you currently getting your pages via API? Are you using the templates API or the pages API?


#3

Sorry for the late reply, I forgot about this question.

I have sorted it out. We fetch the pages (blog posts or site pages) via the API, keep only the fields we want, and filter the content to be indexed with the following code:

First get all the pages:

/*
 * GET HUBSPOT CONTENT
 */
// Tip: to filter on a date with greater-than (>), append e.g. '&publish_date__gt=0&deleted_at=0' to the URL
// Use https so the API key in the query string is not sent in clear text
if($type == 'blog') {
    $url = "https://api.hubapi.com/content/api/v2/blog-posts?hapikey=" . $settings['hapikey'];
} else {
    $url = "https://api.hubapi.com/content/api/v2/pages?hapikey=" . $settings['hapikey'];
}

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$result = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

// curl_exec() returns false on failure; json_decode() returns null on invalid JSON
$resultArray = ($result !== false) ? json_decode($result, true) : null;
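
The request above may return only the first page of results. Below is a minimal sketch of collecting every page or post. It assumes the endpoint accepts `limit` and `offset` query parameters and responds with `objects` and `total` keys, which is worth checking against the actual response. `fetchAllObjects` and `makeCurlFetcher` are hypothetical helper names, not part of the original code:

```php
<?php
/**
 * Collect every object from a paginated listing.
 *
 * $fetchPage is a hypothetical callable: given ($limit, $offset) it returns the
 * decoded JSON response as an array. The 'objects' and 'total' keys are
 * assumptions about the response shape.
 */
function fetchAllObjects(callable $fetchPage, $limit = 100)
{
    $limit = max(1, (int) $limit);
    $all = array();
    $offset = 0;

    do {
        $response = $fetchPage($limit, $offset);
        $objects = (isset($response['objects']) && is_array($response['objects']))
            ? $response['objects']
            : array();

        $all = array_merge($all, $objects);
        $offset += $limit;

        // Without a reported total, keep going until a page comes back short.
        $total = isset($response['total']) ? (int) $response['total'] : PHP_INT_MAX;
    } while (count($objects) === $limit && $offset < $total);

    return $all;
}

/**
 * Build a fetcher for fetchAllObjects() on top of curl.
 * The 'limit' and 'offset' parameter names are assumptions.
 */
function makeCurlFetcher($baseUrl, $hapikey)
{
    return function ($limit, $offset) use ($baseUrl, $hapikey) {
        $query = http_build_query(array(
            'hapikey' => $hapikey,
            'limit'   => $limit,
            'offset'  => $offset,
        ));

        $ch = curl_init($baseUrl . '?' . $query);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $body = curl_exec($ch);
        curl_close($ch);

        // Treat transport errors and invalid JSON as an empty page.
        $decoded = ($body === false) ? null : json_decode($body, true);
        return is_array($decoded) ? $decoded : array();
    };
}
```

Usage would look something like `$pages = fetchAllObjects(makeCurlFetcher('https://api.hubapi.com/content/api/v2/pages', $settings['hapikey']));`. Passing the fetcher in as a callable keeps the paging loop testable without network access.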

Then loop through the pages and, for each page, do the following
($type must be 'blog' or 'page', depending on which API endpoint the result came from):

function getContent($page, $type) {
    // Copy the metadata fields we want in the index
    $pageIndex = array();
    $pageIndex['id'] = $page['id'];
    $pageIndex['title'] = $page['title'];
    $pageIndex['absolute_url'] = $page['absolute_url'];
    $pageIndex['created'] = $page['created'];
    $pageIndex['updated'] = $page['updated'];
    $pageIndex['currently_published'] = $page['currently_published'];
    $pageIndex['publish_date'] = $page['publish_date'];
    $pageIndex['deleted_at'] = $page['deleted_at'];
    $pageIndex['featured_image'] = $page['featured_image'];

    if(isset($page['campaign'])) $pageIndex['campaign'] = $page['campaign'];
    if($type == 'blog') $pageIndex['content_group_id'] = $page['content_group_id'];

    $pageIndex['content'] = array();
    if($type == 'page') {
        // Loop widget_containers (an empty array is used when the key is missing)
        $containers = !empty($page['widget_containers']) ? $page['widget_containers'] : array();
        foreach($containers as $module){
            $containerWidgets = !empty($module['widgets']) ? $module['widgets'] : array();
            foreach($containerWidgets as $item) {
                if(!empty($item['body']) && !empty($item['body']['html']) && is_string($item['body']['html'])) {
                    $content = cleanupContent($item['body']['html']);
                    if($content) $pageIndex['content'][] = $content;
                }
                if(!empty($item['body']) && !empty($item['body']['value']) && is_string($item['body']['value'])) {
                    $content = cleanupContent($item['body']['value']);
                    if($content) $pageIndex['content'][] = $content;
                }
            }
        }
        // Loop widgets (an empty array is used when the key is missing)
        $widgets = !empty($page['widgets']) ? $page['widgets'] : array();
        foreach($widgets as $item){
            if(!empty($item['body']) && !empty($item['body']['html']) && is_string($item['body']['html'])) {
                $content = cleanupContent($item['body']['html']);
                if($content) $pageIndex['content'][] = $content;
            }
            if(!empty($item['body']) && !empty($item['body']['value']) && is_string($item['body']['value'])) {
                $content = cleanupContent($item['body']['value']);
                if($content) $pageIndex['content'][] = $content;
            }
        }
    } else {
        // Blog posts keep their content in post_body
        if(!empty($page['post_body'])) {
            $content = cleanupContent($page['post_body']);
            if($content) $pageIndex['content'][] = $content;
        }
    }

    return $pageIndex;
}

function cleanupContent($content){
    // Remove <script> and <style> blocks and HubL {{ }} / {% %} tags,
    // replace triple quotes (""") with a single quote, then strip the remaining HTML tags
    $content = strval($content);
    $content = preg_replace('#<script([\s\S]*?)(</script>)#is', '', $content);
    $content = preg_replace('#<style([\s\S]*?)(</style>)#is', '', $content);
    $content = preg_replace('#{{([\s\S]*?)(}})#is', '', $content);
    $content = preg_replace('#{%([\s\S]*?)(%})#is', '', $content);
    $content = preg_replace('#"""#is', '"', $content);
    return trim(strip_tags($content));
}
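
One thing to watch for when indexing: `strip_tags()` removes tags without leaving whitespace behind, so `<p>Hello</p><p>World</p>` becomes `HelloWorld`, which a search index treats as a single token. Below is a possible variant that inserts a space before each tag and then collapses whitespace. `cleanupContentSpaced` is a hypothetical name, and this is only a sketch of one way to handle it, not part of the original solution:

```php
<?php
/**
 * Variant of the cleanup step that keeps word boundaries between HTML blocks.
 * The function name is hypothetical; the removals mirror the original cleanup.
 */
function cleanupContentSpaced($content)
{
    $content = strval($content);

    // Drop <script> and <style> blocks and HubL {{ }} / {% %} tags.
    $content = preg_replace('#<script[\s\S]*?</script>#i', '', $content);
    $content = preg_replace('#<style[\s\S]*?</style>#i', '', $content);
    $content = preg_replace('#\{\{[\s\S]*?\}\}#', '', $content);
    $content = preg_replace('#\{%[\s\S]*?%\}#', '', $content);

    // Put a space in front of anything that looks like the start of a tag,
    // so adjacent blocks do not run together once the tags are stripped.
    $content = preg_replace('#<(?=[a-zA-Z/!])#', ' <', $content);
    $content = strip_tags($content);

    // Collapse whitespace runs; fall back to the uncollapsed text if the
    // input is not valid UTF-8 (preg_replace returns null in that case).
    $collapsed = preg_replace('/\s+/u', ' ', $content);
    return trim($collapsed === null ? $content : $collapsed);
}

echo cleanupContentSpaced('<p>Hello</p><p>World</p>{{ name }}'), "\n";
```

The trade-off is that inline tags such as `<b>` inside a word also introduce a space, which is usually acceptable for full-text search but may not be for exact-match fields.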