1001 Freelance Projects
Project title: Find Sitemap & Search URLs API (for external domains/sitemaps)
Posted by: External project from PeoplePerHour
Started: 08-Mar-2023 12:28 GMT
Description: I need an API that I can use to search for a URL, or part of a URL, within an external site’s sitemap.

Usage: Laravel & MySQL

A sitemap is not always in the same location, and its location is not always listed in robots.txt. So we need to save the sitemap locations we find in a MySQL table, so that we can try those locations on other domains and locate more sitemaps.

Finding a sitemap
Create a MySQL table “sitemaps” (example name) to store sitemap names (e.g. sitemap.xml, sitemap_index.php, etc.). The table has a ‘sitemap’ field and a ‘count’ field; the count field is simply a counter that is incremented each time we find a sitemap with the same name.
Check whether the given domain has a robots.txt (https://example.com/robots.txt). If it does, look for the Sitemap directive:
“Sitemap: https://www.example.com/example.xml” (there can be multiple)
Save the sitemap location to the sitemaps table; if it already exists, increment the count field by 1.
If the sitemap location is not in robots.txt, try to find it using all the sitemap locations stored in our sitemaps table (the more we collect, the higher the chance of finding it), checking the sitemaps with the highest counts first.
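
The discovery steps above can be sketched roughly as follows. This is a minimal Python illustration of the logic only, not the Laravel deliverable: an in-memory SQLite database stands in for MySQL, and the function names (parse_sitemap_directives, record_sitemap, candidate_urls) are hypothetical. It assumes the table stores the path part of each sitemap URL (e.g. sitemap.xml) so it can be retried on other domains.

```python
import sqlite3
from urllib.parse import urlparse


def create_sitemaps_table(conn: sqlite3.Connection) -> None:
    # Mirrors the brief: a 'sitemap' name column and a 'count' column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sitemaps ("
        "sitemap TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )


def parse_sitemap_directives(robots_txt: str) -> list[str]:
    """Return every URL declared on a 'Sitemap:' line (there can be several)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop trailing comments
        key, sep, value = line.partition(":")     # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def record_sitemap(conn: sqlite3.Connection, sitemap_url: str) -> None:
    """Insert the sitemap name, or add 1 to its count if it is already known."""
    name = urlparse(sitemap_url).path.lstrip("/")
    if not name:
        return
    cur = conn.execute(
        "UPDATE sitemaps SET count = count + 1 WHERE sitemap = ?", (name,)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO sitemaps (sitemap, count) VALUES (?, 1)", (name,)
        )


def candidate_urls(conn: sqlite3.Connection, domain: str) -> list[str]:
    """Known sitemap names applied to a domain, highest count first."""
    rows = conn.execute(
        "SELECT sitemap FROM sitemaps ORDER BY count DESC, sitemap ASC"
    )
    return [f"https://{domain}/{name}" for (name,) in rows]
```

In MySQL the update-or-insert step could probably be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE statement; the two-step form is used here only to keep the sketch portable.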

Finding a URL
Once you find the sitemap(s), create an index of all URLs in the sitemap and its nested sitemaps.
Then search that index for the given search term using a MySQL query or a regex.

Example request
/Sitemap?domain=example.com&search=url

Example API Response:
What I want is for the API to return the matching URLs in JSON format:
{
  "search": "example",
  "domain": "domain.com",
  "statistics": {
    "sitemaps_found": 3,
    "sitemaps": {
      "1": "www.domain.com/sitemap1.xml",
      "2": "www.domain.com/sitemap453.xml",
      "3": "www.domain.com/sitemap345.xml"
    },
    "urls": 28892,
    "matches": 25
  },
  "matches": {
    "1": "www.domain.com/example/13324223",
    "2": "www.domain.com/example/94827497"
  }
}
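
A response of this shape could be assembled along the following lines. This is a hedged Python sketch, not the requested Laravel code; build_response is a hypothetical helper, and it assumes the counts in the statistics block are derived from the lists passed in, with entries numbered from 1 as in the example above.

```python
import json


def build_response(
    search: str,
    domain: str,
    sitemaps: list[str],
    urls: list[str],
    matches: list[str],
) -> dict:
    """Assemble the response structure from the brief's example."""
    return {
        "search": search,
        "domain": domain,
        "statistics": {
            "sitemaps_found": len(sitemaps),
            # Numbered from 1, matching the keys used in the example response.
            "sitemaps": {str(i): s for i, s in enumerate(sitemaps, start=1)},
            "urls": len(urls),
            "matches": len(matches),
        },
        "matches": {str(i): m for i, m in enumerate(matches, start=1)},
    }


def to_json(response: dict) -> str:
    """Serialize the response as indented JSON text."""
    return json.dumps(response, indent=2)
```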

Discussion:
We can save the sitemap files we find to our server and search within those files, or we can insert all sitemap URLs into a MySQL table and search from there. Not sure which is faster; let’s discuss.

Save all URLs in MySQL
Pro: Fast searching
Pro: Easy to create a cron job that deletes entries older than x hours
Pro: Easy maintenance
Con: Need to extract all URLs from the sitemap files (potentially hundreds of thousands of URLs)
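
To make the database option more concrete, here is a small Python sketch of storing URLs with a timestamp, searching them, and purging old rows as a cron job might. SQLite in memory stands in for MySQL, and the table name sitemap_urls, its columns, and the function names are all assumptions for illustration; nothing here measures which option is actually faster.

```python
import sqlite3
import time
from typing import Optional


def create_url_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sitemap_urls ("
        "domain TEXT NOT NULL, url TEXT NOT NULL, fetched_at INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sitemap_urls_domain "
        "ON sitemap_urls (domain)"
    )


def store_urls(conn, domain: str, urls: list[str], now: Optional[int] = None) -> None:
    """Bulk-insert extracted URLs, stamped with the time they were fetched."""
    now = int(time.time()) if now is None else now
    conn.executemany(
        "INSERT INTO sitemap_urls (domain, url, fetched_at) VALUES (?, ?, ?)",
        [(domain, u, now) for u in urls],
    )


def search_urls(conn, domain: str, term: str) -> list[str]:
    """Substring search; LIKE wildcards in the term are escaped with '!'."""
    escaped = term.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    rows = conn.execute(
        "SELECT url FROM sitemap_urls "
        "WHERE domain = ? AND url LIKE ? ESCAPE '!' ORDER BY url",
        (domain, f"%{escaped}%"),
    )
    return [url for (url,) in rows]


def purge_older_than(conn, hours: int, now: Optional[int] = None) -> int:
    """Delete entries older than the given number of hours; returns rows removed."""
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "DELETE FROM sitemap_urls WHERE fetched_at < ?", (now - hours * 3600,)
    )
    return cur.rowcount
```

A leading-wildcard LIKE generally cannot use an ordinary index on the url column, so with hundreds of thousands of rows the per-domain filter is what keeps the scan bounded in this sketch.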

Save sitemaps as files
Pro: No need to extract URLs and put them in MySQL
Con: Downloading files that might contain vulnerabilities
Con: Saving files takes more space than saving only the URLs in MySQL


>>>Outside the scope of the initial task, but would be a follow-up task; do not price this in
Project ID: 3314557
Copyright © 2005-2022 1001 Freelance Projects