Extract URL Preview Content with PHP and jQuery
This post will guide you through extracting the contents of a URL the way sites like Facebook, Twitter, and Google do, retrieving the title, description, and preview image for any submitted URL.

We will be creating the following files:
- index.php, contains the HTML form that lets us submit a URL for extraction.
- extract-contents.php, contains the code that fetches the required data from the submitted URL.
- javascript.js, contains the code that sends the AJAX request to extract-contents.php.
- style.css, contains all the style formatting for our HTML page and the URL preview box.
To extract the URL preview content, extract-contents.php does the main job:
- Prepare a regular expression to validate the URL.
- Validate the URL and fetch its content (a minimal sketch of these first two steps follows this list).
- Create a new DOM document and load the fetched content into it.
- Search the content for the first image and for the title and description tags.
- Prepare the HTML preview container and return the response.
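Before looking at the full files, here is a minimal, self-contained sketch of those first two steps. It is only an illustration, not the code used later in this post: filter_var() stands in for the regular expression, and the example.com URL and the five-second timeout are placeholder values.

<?php
// A minimal sketch of validating and fetching a URL (placeholder URL and timeout values)
$url = 'https://www.example.com';

// filter_var() is a built-in alternative to the regular expression used in extract-contents.php
if (filter_var($url, FILTER_VALIDATE_URL) === false) {
    die('Invalid URL submitted.');
}

// Fetch the page with a timeout so a slow host cannot hang the request
$context = stream_context_create(['http' => ['timeout' => 5]]);
$content = @file_get_contents($url, false, $context);

if ($content === false) {
    die('Error fetching the submitted URL.');
}
echo strlen($content) . " bytes fetched\n";
?>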
index.php
<!DOCTYPE html>
<html>
<head>
    <title>Extract URL Contents with PHP and jQuery - Demo</title>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
    <script type="text/javascript" src="js/jquery-3.1.1.min.js"></script>
    <script type="text/javascript" src="js/javascript.js"></script>
    <link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/>
    <link rel="stylesheet" href="css/style.css" />
</head>
<body>
    <div class="container">
        <div class="extract-wrapper">
            <label>Enter an absolute URL like https://www.codestacked.info</label>
            <form class="url-extract-form">
                <div class="input-group">
                    <input type="url" class="form-control url-input" value="" required="required" placeholder="Enter a URL to extract contents" />
                    <button type="submit" class="btn btn-green">Extract</button>
                </div>
                <div class="loader">
                    <i class="fa fa-spinner fa-spin"></i>
                </div>
            </form>
            <div class="content-wrapper" id="content-wrapper"></div>
        </div>
    </div>
</body>
</html>
extract-contents.php
<?php
if ($_POST) {
    $post = $_POST;
    $url = strtolower($post['url']);
    $url = str_starts_with($url, 'http') ? $url : 'https://' . $url;
    // Regular expression to validate the URL
    $regex = '/^((https?|ftp):\/\/)(www\.)?[\w\-]+\.[a-z]{2,4}\/?[\w\/\-]*(\.[a-z]{2,4})?$/';
    // Check that the URL is valid; $hostname captures the match so it can be
    // used later as the base for relative image paths
    if (preg_match($regex, $url, $hostname)) {
        // Get the contents of the URL
        $content = @file_get_contents($url);
        // If fetching the contents failed, show an error
        if (!$content) {
            die('<div class="error">Error parsing the submitted URL.</div>');
        }
        $title = $description = $image = "";
        $images_arr = [];
        // Create a new DOM document object
        $dom = new DOMDocument('1.0', 'UTF-8');
        // Load the URL content into the DOM document
        @$dom->loadHTML($content);
        // Get all images from the DOM document
        $images = $dom->getElementsByTagName('img');
        // Loop through the images and push their sources to the images array
        foreach ($images as $img) {
            $src = parse_url($img->getAttribute('src'));
            if (!empty($src['path'])) {
                $images_arr[] = $img->getAttribute('src');
            }
        }
        // Create an XPath object for the current DOM document
        $xPath = new DOMXPath($dom);
        $og_title         = $xPath->query('//meta[@property="og:title"]');
        $og_description   = $xPath->query('//meta[@property="og:description"]');
        $og_image         = $xPath->query('//meta[@property="og:image"]');
        $meta_description = $xPath->query('//meta[@name="description"]');
        $meta_title       = $xPath->query('//title');
        // Prepare the title of the document
        if ($og_title->length) {
            $title = $og_title->item(0)->getAttribute('content');
        } elseif ($meta_title->length) {
            $title = $meta_title->item(0)->textContent;
        }
        // Prepare the description of the document
        if ($og_description->length) {
            $description = $og_description->item(0)->getAttribute('content');
        } elseif ($meta_description->length) {
            $description = $meta_description->item(0)->getAttribute('content');
        }
        // Prepare the image of the document, falling back to the first page image
        if ($og_image->length) {
            $image = $og_image->item(0)->getAttribute('content');
        } elseif (!empty($images_arr)) {
            $image = reset($images_arr);
        }
        ?>
        <div class="url-info-box">
            <?php
            if (!empty($image)) {
                // Keep absolute and protocol-relative image URLs; prefix relative paths with the submitted URL
                $image = (preg_match('/^(https?)/', $image)) || (preg_match('/^(\/\/)/', $image))
                    ? $image
                    : $hostname[0] . $image;
                $size   = @getimagesize($image);
                $width  = $size[0] ?? '';
                $height = $size[1] ?? '';
                ?>
                <div class="image">
                    <img src="<?=$image;?>" class="img-responsive" width="<?=$width?>" height="<?=$height?>" alt=""/>
                </div>
            <?php } ?>
            <div class="data">
                <div class="title">
                    <?=$title;?>
                </div>
                <div class="description"><?=$description;?></div>
            </div>
        </div>
        <?php
    } else {
        echo '<div class="error">Invalid URL submitted.</div>';
    }
}
?>
In extract-contents.php we first build a regular expression to validate the submitted URL. If the URL is valid, we fetch its contents, create a new DOM document, and load the fetched content into it as HTML. The title, description, and image start out empty. We also collect an array of the page's images so that, if no Open Graph image is set on the document, we can use the first image found on the submitted page.
After that we look for the three values we need. We check the Open Graph meta tags first; if they exist, we use them for the title, description, and image. Otherwise we fall back to the document's meta tags for the title and description, and to the first image on the submitted page for the image. A new DOMXPath object is used to access elements of the loaded DOM document with XPath queries.
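A side note on the @ operator used before loadHTML(): it merely hides parser warnings. If you prefer to keep the output clean without suppressing errors this way, libxml's internal error buffer can be used instead. Below is a minimal, self-contained sketch of that approach combined with the same kind of XPath lookup; the inline $html string is placeholder markup for illustration only.

<?php
// A minimal sketch (not part of the files above): read an Open Graph tag without
// the @ error-suppression operator by buffering libxml parse warnings instead.
libxml_use_internal_errors(true);

// Placeholder markup purely for illustration
$html = '<html><head><meta property="og:title" content="Example title"/></head><body></body></html>';

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html);   // malformed markup no longer prints warnings
libxml_clear_errors();   // discard whatever errors were collected

$xPath = new DOMXPath($dom);
$og_title = $xPath->query('//meta[@property="og:title"]');
echo $og_title->length ? $og_title->item(0)->getAttribute('content') : 'No og:title found';
?>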
javascript.js
$(document).ready(function(){
    $(".url-extract-form").on("submit", function(e){
        e.preventDefault();
        var url = $(".url-input").val();
        $(".content-wrapper").hide();
        if(url != ''){
            $(".loader").fadeIn();
            $.ajax({
                url: "extract-contents.php",
                type: "POST",
                data: {
                    url: url
                },
                success: function(data){
                    $(".content-wrapper").html(data).slideDown();
                    $(".loader").fadeOut();
                }
            });
        }
    });
});
style.css
* {
    box-sizing: border-box;
}
html, body {
    margin: 0;
    padding: 0;
}
body {
    background-color: #f6f6f6;
    font-family: "Segoe UI", "Roboto", "Helvetica", sans-serif;
    font-size: 15px;
    font-weight: normal;
    font-style: normal;
}
.container {
    max-width: 1024px;
    margin: 0 auto;
    padding-left: 15px;
    padding-right: 15px;
}
.url-extract-form {
    position: relative;
    margin-bottom: 1rem;
}
.extract-wrapper label {
    display: inline-block;
    margin-bottom: 0.25rem;
}
.input-group {
    position: relative;
    display: flex;
    flex-wrap: wrap;
    align-items: stretch;
    width: 100%;
}
.form-control {
    border: 1px solid #ddd;
    padding: 10px;
    position: relative;
    font-size: inherit;
    flex: 1 1 auto;
    width: 1%;
    min-width: 0;
}
.form-control:focus {
    border-color: #00c0ef;
    outline: 0;
}
.loader {
    position: absolute;
    inset: 0;
    font-size: 1.75rem;
    background: rgba(150, 150, 150, 0.5);
    z-index: 5;
    padding: 0px 10px;
    display: none;
    color: #006699;
    text-align: center;
}
.url-extract-form button {
    display: inline-block;
    padding: 5px 10px;
    cursor: pointer;
    font: inherit;
    background: #00a65a;
    border: 1px solid #009549;
    color: #fff;
    margin-left: -1px;
}
.content-wrapper .error {
    padding: 10px;
    background: #e95454;
    color: #fff;
}
.url-info-box {
    background: #fefefe;
    border: 1px solid #fefefe;
    overflow: hidden;
    font-size: 13px;
    max-width: 300px;
}
.img-responsive {
    max-width: 100%;
    height: auto;
    display: block;
    margin: 0 auto;
}
.url-info-box .data {
    padding: 15px;
    background: #efefef;
}
.url-info-box .title {
    font-weight: bold;
    max-height: 35px;
    overflow: hidden;
    color: #3778cd;
}