Hacking Web Intelligence - Open Source Intelligence and Web Reconnaissance Concepts and Techniques

By: Sudhanshu Chauhan, Nutan Kumar Panda

Elsevier Reference Monographs, 2015

ISBN: 9780128019122 , 301 Pages

Format: PDF, ePUB, Read online

Copy protection: DRM

Suitable for: Windows PC, Mac OS X; all DRM-capable eReaders; Apple iPad, Android tablet PCs; Apple iPod touch, iPhone, and Android smartphones. Read online for: Windows PC, Mac OS X, Linux

Price: 42,95 EUR




 

Chapter 2

Open Source Intelligence and Advanced Social Media Search


Abstract


Having covered the basics of the internet in the first chapter, in this chapter we will learn about open source intelligence. We will look at it from an average user's perspective and talk about the basic ways in which we access it. Then we will move on to the evolution of the web as we know it today, i.e., Web 2.0, and how it influences us. Afterward we will discuss social media intelligence. Then we will deal with social networks and talk about advanced social media search. In the end we will briefly talk about the web of the future, also termed Web 3.0.

Keywords


Open source intelligence; OSINT; Social media intelligence; SOCMINT; Web 2.0; Web 3.0
Information in this chapter
• Open source intelligence (OSINT)
• Web 2.0
• Social media intelligence (SOCMINT)
• Advanced social media search
• Web 3.0

Introduction


As we have already covered the basic yet essential terms briefly in the previous chapter, it’s time to move on to the core topic of this book, open source intelligence, also known by its acronym OSINT. Before that, however, we need to recognize how we see the information available in public and to what extent we see it.
For most of us the internet is limited to the results of the search engine of our choice. A normal user who wants some information from the internet goes directly to a search engine, let’s assume one of the most popular, Google, and enters a simple search query. Such a user, unaware of the advanced search mechanisms provided by Google or its counterparts, enters simple queries he/she feels comfortable with and gets results from them. Sometimes it becomes difficult to get the information from a search engine because the input queries are poorly formed. For example, a user who wants to troubleshoot a Windows blue screen error generally types into the search bar “my laptop screen is gone blue how to fix this.” This query might or might not return the desired result on the first page of the search engine, which can be a bit annoying at times. It’s quite easy to get the desired information from the internet, but we need to know where and how to collect that information efficiently. A common misconception among users is that their preferred search engine has the whole internet inside it, but in reality search engines like Google have only a minor portion of the internet indexed. Another common practice is that people don’t go to the results on page two of a search engine. We have all heard the joke that “if you want to hide a dead body then Google results page two is the safest place.” So we want all our readers to clear their minds if they also think the same way, before proceeding to the topic.

Open source intelligence


Simply stated, open source intelligence (OSINT) is intelligence collected from sources which are openly available to the public. As opposed to most other intelligence collection methods, this form does not utilize information which is covert and hence does not require the same level of stealth in the process (though some stealth is required sometimes).
OSINT comprises various public sources, such as:
• Academic publications: research papers, conference publications, etc.
• Media sources: newspapers, radio channels, television, etc.
• Web content: websites, social media, etc.
• Public data: open government documents, public company announcements, etc.
Some people don’t pay much heed to it, yet it has proven its importance time and again. Most of the time it is very helpful in providing context to the intelligence obtained through other modes, but that’s not all: in many scenarios it has provided intelligence which can be used directly to make a strategic decision. Many, if not most, consider it one of the simplest and easiest modes, yet it does have its difficulties; the biggest and most distinctive of these is the abundance of data. Where other forms of intelligence starve for data, OSINT has so much data that filtering it and converting it into an actionable form is the most challenging part.
OSINT has long been used by governments, the military, and the corporate world to keep an eye on the competition and to gain a competitive advantage over it.
As we discussed, there are various public sources from which we can collect OSINT, but in this book we will focus on the part which uses only the internet as its medium. This specific type of OSINT is called WEBINT by many, though the term seems a bit ambiguous as there is a difference between the internet and the web (discussed in Chapter 1). It might look as if by focusing on a specific type we are missing a huge part of OSINT, which would have been correct a few decades ago, but today, when most data is digitized, this line of difference is slowly thinning. So for the sake of understanding we will use the terms WEBINT and OSINT interchangeably in this book.

How we commonly access OSINT


Search engines


Search engines are one of the most common and easiest methods of utilizing OSINT. Every day we make hundreds of search queries in one or more search engines, depending upon our preference, and use the search results for some purpose. Though the results we get seem simple, a lot of backend indexing based on complex algorithms goes on behind them. The way we create our queries makes a huge difference in the accuracy of the results we get from a search engine. In a later chapter we will discuss how to craft our queries so that we get precisely the result we desire. Google, Yahoo, and Bing are well-known examples of search engines.
Though it seems like search engines have lots of information, they only index the data which they are able to crawl through programs known as spiders or robots. The part of the web these spiders are able to crawl is called the surface web; the rest is referred to here as the dark web or darknet, though it is more commonly known as the deep web. This part is not indexed as it is not directly accessible via a link. An example is a page generated dynamically using the search option on a web page. We will discuss the darknet and associated terms in a later chapter.
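To make the idea of link-based crawling concrete, here is a minimal sketch in Python. It is not how any real search engine works; it only illustrates that a spider which discovers pages by following links never reaches a page that nothing links to. The SITE dictionary, its paths, and the function names are hypothetical and exist only for this illustration; no network access is involved.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": path -> HTML (an assumption for illustration).
SITE = {
    "/": '<a href="/about">About</a> <a href="/news">News</a>'
         '<form action="/search"><input name="q"></form>',
    "/about": '<a href="/">Home</a>',
    "/news": '<a href="/news/1">Story</a>',
    "/news/1": '<a href="/news">Back</a>',
    # Generated only when someone submits the search form; no page links to it.
    "/search?q=osint": "<p>Results for osint</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, seed="/"):
    """Breadth-first crawl: visit only pages reachable by following links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        path = queue.popleft()
        parser = LinkExtractor()
        parser.feed(site[path])
        for link in parser.links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl(SITE)
not_indexed = set(SITE) - indexed
print("Reached by the spider:", sorted(indexed))
print("Never reached:", sorted(not_indexed))
```

In this toy site the dynamically generated search results page exists, but the spider never finds it because it only appears after a form submission.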

News sites


Earlier the popular mediums of news were newspapers, radio, and television, but the advancement in internet technology has drastically changed the scenario, and today every major news vendor has a website where we can get all the news in a digital format. Today there even exist news agencies which run only online. This advancement has certainly brought news to our fingertips anytime, anywhere there is an internet connection available. For example, bbc.com is the news website of the well-known British Broadcasting Corporation.
Apart from news vendors, there are also sites run by individuals or groups, and some of them focus on topics which belong to specific categories. These sites mainly take the form of blogs, online groups, forums, IRC (Internet Relay Chat) channels, etc. and are very helpful when we need the opinion of the masses on a specific topic.

Corporate websites


Every major corporation today runs a website. It’s not just a way to establish a presence but also to interact directly with customers, understand their behavior, and much more. For example, www.gm.com is the corporate website for General Motors. We can find out a plethora of information about a company from its website. Usually a corporate website contains information like key players in the organization, their e-mail addresses, company address, company telephone, etc. which can be used to extract further information.
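As a rough illustration of collecting such contact details, the following Python sketch pulls e-mail addresses out of a block of page text with a regular expression. The page text, names, and addresses are made up, the function name is hypothetical, and the pattern is deliberately simple, so it will not match every valid address format.

```python
import re

# Hypothetical page text; the names, addresses, and numbers are made up.
PAGE_TEXT = """
Contact our CEO Jane Doe at jane.doe@example.com or call +1 555 0100.
Media enquiries: press@example.com. Careers: jobs@example.com, press@example.com
"""

# Deliberately simple pattern (an assumption); it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique e-mail addresses found in text, in order of appearance."""
    found = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in found:
            found.append(address)
    return found


for address in extract_emails(PAGE_TEXT):
    print(address)
```

Duplicates are dropped so that an address repeated across a page is reported only once.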
Today some corporate websites also provide information in the form of white papers, research papers, corporate blogs, newsletter subscriptions, current clients, etc. This information is very helpful in understanding not only the current state of the company but also its future plans and growth.

Content sharing websites


Though there are various types of user-generated content out there containing an amalgam of text and multimedia files, some sites allow us to share a specific type of content such as videos, photos, art, etc. These types of sites are very helpful when we need a specific type of media related to a topic, as we know exactly where to find it. YouTube and Flickr are good examples of such sites.

Academic sites


Academic sites usually contain information on specific topics, research papers, future developments, news related to a specific domain, etc. In most cases this information can be crucial in understanding the landscape of current as well as future development. Academic sites are also helpful in learning traits associated with our field of interest and in understanding the correlations between them.
The information provided on academic sites is very helpful in understanding the developments that are taking place in a specific domain and in getting a glimpse of our future. They are not only...