WP Content Crawler v1.13.0 - Get content from almost any site
Get content from almost any site to your WordPress blog, automatically!
WHAT IT CAN BE USED FOR
- Create a personal site which collects news, posts, etc. from your favorite sites to see them in one place
- Use it with WooCommerce to collect products from shopping sites
- Collect products from affiliate programs to make money
- Collect posts to create a test environment for your plugin/theme
- Collect plugins, themes, apps, images from other sites to create a collection of them
- Keep track of competitors
- Anything else you can imagine. The internet is full of content
Before you buy, make sure you do the following:
- Watch the quick start video and use the plugin in the demo. You can also watch the other video tutorials to learn how to use the plugin. There are also many guides explaining how to do certain things with the plugin.
- Make sure the plugin can retrieve the data from the site you want to crawl by following the instructions in the "Can I get content from X site?" FAQ.
- If you are still not sure if the plugin can retrieve content from a specific site, ask us in the comments section.
- You can check the FAQs if you have any questions. If the answer to your question is not there, you can always ask us in the comments section.
HOW IT WORKS
It’s all about CSS selectors, and you can learn how to use them in minutes by watching the introduction tutorial. The plugin’s Visual Inspector tool also helps you find CSS selectors easily: just click the elements on the target site.
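To give a rough idea of what a CSS selector does, here is a minimal Python sketch. It is not the plugin's code: the function name select_text, the class names entry-title and entry-content, and the sample HTML are all made up for illustration, and the sketch only understands "tag", ".class", and "tag.class" selectors, far fewer than a real CSS engine.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; ignored by this text-only sketch.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SimpleSelector(HTMLParser):
    """Collects the text of elements matching 'tag', '.class' or 'tag.class'."""

    def __init__(self, selector):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag or None   # None means "any tag"
        self.cls = cls or None   # None means "any class"
        self.depth = 0           # > 0 while inside a matching element
        self.results = []

    def _matches(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return self.tag in (None, tag) and (self.cls is None or self.cls in classes)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1      # nested element inside a match
        elif self._matches(tag, attrs):
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def select_text(html, selector):
    """Return the stripped text of every element matched by the selector."""
    parser = SimpleSelector(selector)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Hypothetical post page of a target site.
page = (
    '<article><h1 class="entry-title">Hello <em>world</em></h1>'
    '<div class="entry-content"><p>First.</p><br><p>Second.</p></div></article>'
)
print(select_text(page, "h1.entry-title"))  # ['Hello world']
print(select_text(page, "p"))               # ['First.', 'Second.']
```

In the plugin itself you only type the selector (or pick it with the Visual Inspector); the sketch merely shows the kind of lookup a selector stands for.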
WHAT WP CONTENT CRAWLER CAN DO
WP Content Crawler has many features. To learn about all of them, please see the features table below.
SEE IT IN ACTION, LEARN IN MINUTES
- WP Content Crawler introduction video (English)
- WP Content Crawler introduction video (Turkish)
VIDEO TUTORIALS
- Quick Start Guide
- Using CSS Selectors in WP Content Crawler
- HTML & CSS Selectors
- Using short codes to place any data anywhere in the post
- Save images as WooCommerce product gallery
- Save arcade games
MAIN FEATURES
Save every post detail | Title, excerpt, content, tags, categories, slug, date, custom meta, taxonomies, meta keywords, meta description, featured image, post images, status… Just everything. |
Visual Inspector | Just click an element to find its CSS selector. You can also get alternative CSS selectors that you might be interested in. There is no need to leave your admin panel anymore. |
Crawl (scrape, grab, save) posts | After the settings are configured, the plugin finds the URLs of the posts and crawls them automatically in the background. |
Recrawl (update) posts | Recrawl posts automatically to keep them up to date. You can limit how many times a post can be updated, set the update interval, and ignore old posts. |
Delete posts | Want to delete old crawled posts? The plugin can delete them automatically. |
Control scheduling | You can set how many times the URL collection and post crawling events should run each time for a site. For instance, you can save 3 posts every minute, or run URL collection 5 times every 2 minutes. |
Save categories | The target category does not exist on your site? No problem. The plugin can create the target categories for you. Just define the CSS selectors that find category names. They can even be created as subcategories. |
Save slugs (permalink) | You can define the permalink of the posts. You can get the permalink from the target site, enter custom text, and even create templates for the slugs by using short codes. |
Save taxonomies | Save taxonomy values by retrieving them from the target site or entering them manually. Saving details of custom post types is easier than ever. |
Save posts into custom categories | A custom post type has custom categories? No problem. You can define the custom category taxonomies used by the custom post type and select those categories when defining the categories of the post. The plugin can also create custom categories for you. |
Custom post meta | Save anything as custom post meta. You can use a CSS selector or just type the value. |
Content templates | Prepare post content, title, excerpt, list item, and gallery item templates using short codes. Moreover, you can define templates for the values of each CSS selector using the options box. |
Alternative selectors | You can write alternative selectors to get the data even if the post pages of the target site are designed differently from each other. |
Find and replace anything | You can use plain text or regular expressions to find and replace anything. You can modify the HTML of the page, create your own HTML elements, and write selectors to use them. You can even change image URLs. You have the power. |
Paginated posts | Target post has more than one page? No worries. You can save paginated posts as well. |
List type posts | Some sites create posts with a list inside. You can extract the list from the post, create a template that is applied to each list item, and even reverse the list. |
Remove unnecessary elements | Sometimes you need to get rid of some elements, such as advertisements, comments, you name it. Just write the element's CSS selector and it is removed. |
Automatically insert category URLs | Target site has hundreds of categories? Piece of cake. Just write the CSS selector and the plugin will insert the category URLs for you. |
Post types | Set the post type. It can be a post, a page, a product, or any other post type available in your WordPress installation. |
Remove links | You can remove links from the post. Just check the checkbox and the links are gone. That easy. |
Password protection | You can set a password for the posts to show them only to the users who have the password. |
Notes | You can add notes for yourself to remind you of things about the site. CSS selectors, TODO list, anything. |
Test everything on the fly | Test post crawling, URL collection, CSS selectors, regular expressions, find and replace options, and proxies on the fly. You can also enable caching to perform the tests much faster and reduce the requests sent to the target site. |
Test all the settings of a site at once | Using the tester, you can test all the options you configured in the site settings to make sure everything works as you want before enabling automatic crawling. |
Tools | Using the tools, you can save posts manually with their URLs, recrawl posts with their IDs, or delete already-saved URLs. |
Custom general settings for each site | You can provide custom general settings for each site to override the general settings and make them suitable for that site. |
Post status | You can directly publish the saved posts or keep them as drafts to check them before publishing. |
Save all images in post content | Saving all images in the content of the post is as easy as checking a single checkbox. |
Save images as gallery | You can save the images in the target page as a gallery and provide a template for each image to make it suitable for the gallery library that you use on the frontend. You can also save the images as a WooCommerce gallery by just checking one checkbox. |
Any data as short code | Get anything from the target page as a short code and use the short codes in the plugin’s templates to place any data anywhere you want. |
Proxy | Use a proxy or proxies to get content from the sites to which your IP does not have access. |
Cookies | Attach cookies, such as session cookies, to each request. This way, for example, you can crawl the target site as if you were logged in. |
Crawl as many posts as you want | You can set how many times post crawling or URL collection CRON events should run. This way, you can, e.g., save 100 posts every minute. Just be careful and consider your server’s capacity. |
Email notifications | Set CSS selectors whose values should not be empty for category and post pages. When an empty value is found using those selectors, you can get an email notification. |
Get data from JSON | When you enable JSON parsing for a CSS selector, you can get the values from the JSON easily. |
Advanced HTML manipulations | Find and replace in the response HTML, find and replace in element attributes, exchange element attributes, remove element attributes, manipulate the HTML of an element, remove HTML elements… |
Automatic translation | Use the artificial intelligence of DeepL Translate API, Google Cloud Translation API, Microsoft Translator Text API, Yandex Translate API, or Amazon Translate API to automatically translate the posts. Note that these are paid services. They generally offer the service for free for a limited amount of time. You can see their pricing pages to learn more. |
Automatic spinning | Use spinning to automatically rewrite the contents of crawled posts to improve search engine optimization. The plugin currently implements Spin Rewriter API and Turkce Spin API, which are paid services. You can visit their websites to learn the pricing details. |
Duplicate post check | Check for duplicate posts by URL, post title, and/or post content. If you are using WooCommerce, products whose SKU already exists are considered duplicates and will not be added to your site. |
Scheduled posts | You can add minutes to or subtract minutes from the post date. This way, you can schedule post publishing. |
Save WooCommerce products | Save price, inventory, shipping, attributes, and advanced options. You can save the product as a simple or an external product. You can also set downloadable file options and define the product as virtual. The options are available for WooCommerce 3.3 and later. |
Options box | You have the control! Define many options for the values found by a CSS selector. The options include find-replace, calculation, template, and JSON parsing settings. You can easily import and export the options defined in the options boxes as well. |
Handle files like a pro | Rename, copy, and move saved files easily. You can also define title, description, caption, and alt texts for the saved media files using templates in which you can use any short code. It is also possible to give random names to the saved files. |
Handle iframes and scripts like a pro | WordPress does not allow showing iframes and scripts since they pose a security risk. You can turn iframe and script HTML elements into short codes by just checking a checkbox. The short code will show iframes and scripts only from the allowed source domains that you define. |
Quick save | With the quick save button, you can save the settings much more quickly. No need to wait for the page to reload. |
Regular expressions | Define regular expressions in find-replace options to find and replace anything. You can also use delimiters and modifiers to match more precisely. |
Save “srcset” attributes | When alternative sizes of the saved images are available, the plugin assigns them to the srcset attribute of img elements so that your pages load faster on different screen sizes. |
Save “alt” and “title” attributes | When you save images, their “alt” and “title” attributes are automatically retrieved from the target site and assigned to the saved media. You can also define templates for them to apply your SEO strategies. |
Warnings | Learn when there is a problem. The plugin will show you the details of the error so that you can fix it right away. |
Handle character encoding problems | The plugin is able to handle different character encodings, even if the target site contains mixed encodings. You can convert the encoding by checking a single checkbox. |
Navigate between settings easily | Fix navigation to the top! The plugin stores where you were before switching to a new tab and restores your previous location when you activate that tab again. No more getting lost among the settings. |
Manual crawling tool | With the manual crawling tool, you can save multiple posts by entering their URLs. You can also enter category URLs so that the tool can get post URLs from there. Moreover, you can set it to crawl multiple posts at the same time. |
Add URLs to the database | The plugin collects URLs automatically. However, if you want it to crawl only certain URLs, you can add them to the database manually using the manual crawling tool. This way, the specified URLs will be crawled automatically according to your scheduling options. |
Enable/disable automatic crawling for a specific site | You can enable or disable automatic crawling for each site individually. |
Import/export | You can import and export site settings easily. Just copy and paste the code created by the plugin. |
Unlimited | Add unlimited sites to the plugin and activate as many of them as you want. |
Detailed dashboard | See what’s going on in the background. Active sites, number of posts crawled, number of posts updated, last crawled and updated posts, last added URLs, last and next run of CRON events, posts and URLs currently being saved… |
Get updates from your admin panel | You can update the plugin with just one click whenever an update is ready. Just go to the updates page in your admin panel. |
Use the most secure PHP | The plugin supports the latest versions of PHP. |
Use the most modern browsers | The plugin supports Chrome, Firefox, Safari, Opera, and Edge. |
Interactive guides | Interactive guides show you how to configure settings to achieve certain things, step by step, like living documentation. You can start these guides whenever you want. You can even start them from a specific step. |
Online documentation | You can check the online documentation whenever you need it. |
Quick guides right next to the settings | Each setting in the plugin has a quick guide that helps you understand what the setting does. |
Video tutorials | Watch video tutorials to easily learn how to use the plugin. |
Ready to translate | You can translate the plugin into your own language using Poedit. |
Filters | With filters, you can do things conditionally. For example, you can increase the price of a product if one of its attribute values contains a specific word. Filters contain many action commands. See the commands in the documentation. |
Use OpenAI GPT (ChatGPT) | You can use OpenAI GPT models to change the title, content, tags, file names, and more. You can use GPT-3.5 and GPT-4. With the advanced short code builder, you can use the chat, complete, edit, and insert modes. To learn more, watch this video! |
Requirements | PHP >= 7.3, json, mbstring, curl, dom, fileinfo, WP-Cron. These are already available in most hosts. Even if the extensions are not already active, most hosting sites let you enable these from their control panel. See the documentation for more information. |
Tested with WP versions | 6.2, 6.1, 6.0, 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.0 |
Tested with WooCommerce versions | 7.5, 7.4, 7.3, 7.2, 7.1, 7.0, 6.9, 6.5, 6.1, 5.2, 5.1, 4.9, 4.5 |
Languages | English, Türkçe |
Shortcomings | The plugin cannot retrieve content that is created by using JavaScript. For more information, please see Can I get content from X site?. |
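As an illustration of the "Find and replace anything" and "Regular expressions" rows in the table above, here is a minimal Python sketch of a find-replace rule that accepts either plain text or a delimited pattern with modifiers, such as /pattern/i. It is an assumption-laden sketch, not the plugin's implementation: the function name find_replace, the use_regex flag, and the subset of supported modifiers are all hypothetical. Note also that Python's replacement syntax differs from PHP's.

```python
import re

# Subset of PCRE-style modifier letters mapped to Python flags (illustrative).
MODIFIERS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}


def find_replace(text, find, replace, use_regex=False):
    """Apply a single find-replace rule to text."""
    if not use_regex:
        return text.replace(find, replace)

    pattern, flags = find, 0
    # Treat "/pattern/modifiers" as a delimited pattern; otherwise use it as-is.
    delimited = re.fullmatch(r"/(.*)/([a-zA-Z]*)", find, re.DOTALL)
    if delimited:
        pattern, modifiers = delimited.groups()
        for letter in modifiers:
            flags |= MODIFIERS[letter]  # KeyError for unsupported modifiers
    return re.sub(pattern, replace, text, flags=flags)


# Plain-text rule.
print(find_replace("Buy now at OldShop", "OldShop", "MyShop"))

# Regex rule with a delimiter and the case-insensitive modifier,
# used here to change an image URL from http to https.
html = '<img src="HTTP://example.com/a.jpg">'
print(find_replace(html, r"/http:\/\//i", "https://", use_regex=True))
```

Keeping plain-text rules separate from regex rules means special characters in plain text (dots, brackets, question marks) never need escaping.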
WHY WP CONTENT CRAWLER
Problems with crawling a website
- Not an easy task, requires advanced programming skills
- Every website is different and needs tailored crawling implementation
- Not only is every website different, but the pages of a single website can differ as well
- Pages and their source codes need to be investigated intensively to come up with a crawling plan
- Knowing how to save certain information in a specific place in WordPress requires knowledge about the internal structure of WordPress and how WordPress works
- If certain information should be saved into a specific field defined by a third-party plugin, one should modify the crawling implementation after researching for hours about how to save that information
- One should know about how HTML works and how to extract certain parts from HTML code
- One should handle all possible inconsistencies that might be in the source codes of websites to provide a robust solution that will keep working
- What if the posts need to be shared in regular time intervals?
- What if you want to crawl new posts added to a website after some time?
- What about translating the posts from one language to another?
- What if the posts need to be paraphrased for better search engine optimization of the website?
- What if some information should not be retrieved?
- What if certain information should be changed to make it suitable for your site?
- What if another site needs to be crawled, not just one?
- What if that other site needs a different crawling plan?
- What if you need to log in to the website to crawl it?
- What if the website changes its source code?
- What if you want to update the crawled posts by recrawling them from the original website?
- What if you want to make sure that the information is retrieved exactly as you want before the posts are automatically published on your website?
- What if you want to keep your site secure by making sure no malicious code ends up on it?
- And many more what-ifs that you might not even imagine unless you come across them
Our vision and mission
We believe that robust, reliable, and automated crawling capabilities should be available to anyone. We want to democratize this field by letting anyone have these capabilities, not just developers. With this purpose, we aim to provide a plugin that you will fall in love with and feel at home using. To make it accessible to anyone, we keep the plugin low-cost and easy to use. We do not implement features just to make sales. We plan and execute for the future. We always listen to your feedback and make the required changes accordingly. We think that WordPress plugins should be developed with enterprise-level care. So, before each release, we intensively test the plugin with automated end-to-end UI tests, currently over 1700 of them, that run in many different environments in the cloud for a total of over 40 hours. This ensures that the plugin is compatible with your server and WordPress environments and that you, our valuable customers, get the quality and reliability you deserve.
How we solve these problems
We have been developing WP Content Crawler for almost 4 years, and in that time we have come across almost all of these what-ifs. Working with our customers and listening to their needs, we provide robust and reliable solutions to these problems. We believe that you should only have to specify which site to retrieve information from and what information to retrieve, and then start crawling that site, without worrying about the complex behind-the-scenes operations.
To make the plugin available to anyone, we provide detailed online documentation that contains not just the descriptions of the settings but also how to use the settings to achieve your goals. Sometimes you might not feel like reading the documentation, so we also provide interactive step-by-step guides that are available in the plugin, just one click away. You can start an interactive guide at any time and from any step you want.
One of the most distinctive features of WP Content Crawler is the ability to test almost any configuration. This way, you will not come across any surprises after you enable automatic crawling. When testing, the errors related to your settings are shown so that you can fix them before they cause any problems.
WP Content Crawler has so many features that even we do not know how many there are. You can automatically crawl, update, and delete posts. You can translate and spin posts, and you can even define which fields need to be translated or spun if you do not want them all changed. You can find and replace almost anything. You can assign some information from the target post to a short code and place that information anywhere in the post. You can save WooCommerce products. You can save details for third-party plugins that we do not even know exist. The features of the plugin are designed so that you feel in control when you use them. We make them as flexible as possible so that they fit your needs. When designing new features, we always keep in mind that you might need a more advanced version of that feature, and we design the features accordingly. We ensure that the features and the entire code of the plugin are maintainable and extendable so that we can always improve the plugin.
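The short code idea mentioned above can be pictured with a small Python sketch: values extracted from the target post are stored under names, and a template refers to them in square brackets. The short code names ([title], [price], [source-url]), the function name render_template, and the sample values are hypothetical and are not the plugin's actual short codes.

```python
import re


def render_template(template, values):
    """Replace each [short-code] in the template with its stored value.

    Unknown short codes are left untouched so that they stay visible
    and can be noticed during testing.
    """
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))

    return re.sub(r"\[([a-z0-9_-]+)\]", substitute, template)


# Values that would have been extracted from the target page (made up here).
values = {
    "title": "Blue Mug",
    "price": "12.50",
    "source-url": "https://example.com/mug",
}
template = '<h2>[title]</h2><p>Price: [price] USD</p><a href="[source-url]">Source</a>'
print(render_template(template, values))
```

Because the template is separate from the extracted values, the same data can be placed in the title, the content, or a custom field without touching the selectors.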
CHANGELOG
The changelog is kept on the documentation site. Click here to see the changelog.