React Native Developer

Monday, April 29, 2019

data science project in rstudio

       


Table of Contents

Introduction
  • Background of Project
  • Problem Statement
  • Aim of the Project
Data Science with R Studio
Methodology
Data Visualization
About Data Source
  • Understand Data: Data Wrangling
  • Understand Data: Exploratory Analysis
Results
Conclusion
References

ABSTRACT

Initially, the Internet was a tool for sharing information related to research and development. In the present-day world, communication over the Internet, interaction through social media, online purchasing, online banking, online bill payment and so on are becoming necessities for many of us. The physical crime world has now shifted towards the cyber crime world. As cyber crimes increase day by day with the use of the Internet, there is a strong need for appropriate cyber laws to deal with them. This paper discusses, under the IT Act 2000 of the Indian Government, the types of cyber crimes, cyber crime preventive measures, the mechanism to report cyber crimes, and the right path for sending blocking and removal requests for objectionable content available in cyberspace.

Keywords: Cyber Crime, Cyber Law, Online Fraud, Phishing, Hacking, IT Act, Copy Right Infringement


I. INTRODUCTION

Owing to the exponential growth of the Internet, and with it online and mobile banking and other related technologies, the present world benefits greatly from these technologies. In every sector, such as banking, airports, space, railways, telecommunication and social media, today's world is fully dependent on technology, and all these technologies are interlinked with one another through the Internet. Each innovation or new technology brings many advantages, but at the same time it may produce side effects. The present-day world depends on the Internet for social media, banking transactions, mobile transactions and more.

Cyberspace is a domain characterized by the use of electronics and the electromagnetic spectrum to store, modify, and exchange data via networked systems and associated physical infrastructures. Despite the technological measures adopted by corporate organizations and individuals, the frequency of cyber crimes has increased over the last decade as individuals' knowledge of cyberspace has advanced.

The term cyber crime can be defined as an act committed or omitted in violation of a law forbidding or commanding it and for which punishment is imposed upon conviction [4]. Another definition describes cyber crime as "criminal activity directly related to the use of computers, specifically illegal trespass into the computer system or database of another, manipulation or theft of stored or on-line data, or sabotage of equipment and data" [25].

Cyber crime is a fast-growing area of crime. More and more criminals are exploiting the speed, convenience and anonymity of the Internet to commit a diverse range of criminal activities that know no borders, either physical or virtual, cause serious harm and pose very real threats to victims worldwide.

Cyber crime can include monetary offences such as financial fraud as well as non-monetary offences, such as cyber bullying, creating and distributing malicious programs (viruses) to other computers, or posting confidential business information on the Internet [1, 2, 3, 5, 6, 7].





·       Background of Project

We selected this topic in order to analyse the different types of cyber crimes.


·       Problem Statement

As beginners in data science, we faced problems in cleaning the data and in making predictions from it.

Predicting and analysing locations based on cyber crimes.


·       Aim of the Project


The main purpose of selecting this topic is to analyse the various categories of cyber crime in India and to predict the number of cyber crimes in India.



2. About Data Science with R Studio

R is a powerful language used widely for data analysis and statistical computing. It was developed in the early 90s. Since then, endless efforts have been made to improve R’s user interface. The journey of the R language from a rudimentary text editor to the interactive R Studio and, more recently, Jupyter Notebooks has engaged many data science communities across the world.

This was possible only because of generous contributions by R users globally. The inclusion of powerful packages has made R more and more powerful with time. Packages such as dplyr, tidyr, readr, data.table, SparkR and ggplot2 have made data manipulation, visualization and computation much faster.



Why learn R ?

I don’t know if I have a solid reason to convince you, but let me share what got me started. I have no prior coding experience. Actually, I never had computer science in my subjects. I came to know that to learn data science, one must learn either R or Python as a starter. I chose the former. Here are some benefits I found after using R:

    1. The style of coding is quite easy.
    2. It’s open source. No need to pay any subscription charges.
    3. Availability of instant access to over 7800 packages customized for various computation tasks.
    4. The community support is overwhelming. There are numerous forums to help you out.
    5. You get a high-performance computing experience (requires packages).
    6. It is one of the most highly sought skills among analytics and data science companies.

How to install R / R Studio ?

RStudio is an integrated development environment, or IDE, for R programming. Download and install it from http://www.rstudio.com/download. RStudio is updated a couple of times a year. When a new version is available, RStudio will let you know. It’s a good idea to upgrade regularly so you can take advantage of the latest and greatest features.
Follow the steps below for installing R Studio:
    1.  Go to https://www.rstudio.com/products/rstudio/download/
    2.  In ‘Installers for Supported Platforms’ section, choose and click the R Studio installer based on your operating system. The download should begin as soon as you click.
    3. Click Next..Next..Finish.
    4. The installation is complete.
    5. To start R Studio, click on its desktop icon or use ‘search windows’ to access the program.
Let’s quickly understand the interface of R Studio:
    1. R Console: This area shows the output of the code you run. You can also write code directly in the console, but code entered directly in the console cannot be traced later. This is where R scripts come into use.
    2. R Script: As the name suggests, here you get space to write code. To run it, simply select the line(s) of code and press Ctrl + Enter. Alternatively, you can click on the little ‘Run’ button located at the top right corner of the R Script pane.
    3. R Environment: This space displays the set of external elements added. This includes data sets, variables, vectors, functions etc. To check whether data has been loaded properly in R, always look at this area.
    4. Graphical Output: This space displays the graphs created during exploratory data analysis. Besides graphs, you can select packages and seek help from R’s embedded official documentation.


How to install R Packages ?

The sheer power of R lies in its incredible packages. In R, most data handling tasks can be performed in 2 ways: using R packages or using R base functions. In this tutorial, I’ll also introduce you to the most handy and powerful R packages. To install a package, simply type:
install.packages("package name")
As a first-time user, a pop-up might appear asking you to select your CRAN mirror (country server); choose accordingly and press OK.
Note: You can type this either directly in the console and press ‘Enter’, or in an R script and click ‘Run’.
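
Re-running install.packages() on every run of a script downloads the package again each time. A minimal sketch of one way around that is shown below; ensure_package() is a hypothetical helper name (not part of R), and it only calls install.packages() when the package cannot already be found.

```r
# Hypothetical helper: install a package only when it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection and a CRAN mirror
  }
  # TRUE when the package can be loaded after the check above
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call does not trigger an installation.
ensure_package("stats")
```

After the check succeeds, library(ggplot2) or a similar call loads the package into the session as usual.
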
3. Methodology

5 Steps of a Data Science Project Lifecycle


1). Obtain

A). Installing and loading packages
Packages
install.packages("packagename")
1)     ggplot2 :-
Definition :- ggplot2 is a data visualization package for the statistical programming language R.
install.packages("ggplot2")

2)     dplyr :-
Definition :- dplyr is a powerful R package to transform and summarize tabular data with rows and columns.
install.packages("dplyr")

3)    sqldf :-
The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. 'RSQLite', 'RH2', 'RMySQL' and 'RPostgreSQL' backends are supported.

install.packages("sqldf")


4)    RSQLite:-
Embeds the SQLite database engine in R, providing a DBI-compliant interface. SQLite is a public-domain, single-user, very light-weight database engine that implements a decent subset of the SQL 92 standard, including the core table creation, updating, insertion, and selection operations, plus transaction management.


5)    Leaflet:-
Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.
This R package makes it easy to integrate and control Leaflet maps in R.
install.packages("leaflet")

6)     vcdExtra:-
This package provides additional data sets, documentation, and a few functions designed to extend the vcd package for Visualizing Categorical Data
install.packages("vcdExtra")
7)     moments:-

B). Download Dataset Link
      2218 obs. of 30 variables
Fields (name, data type, description)

ID (Number): Unique identifier for the record.
Case Number (Plain Text): The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.
Date (Date & Time): Date when the incident occurred. This is sometimes a best estimate.
Block (Plain Text): The partially redacted address where the incident occurred, placing it on the same block as the actual address.
IUCR (Plain Text): The Illinois Uniform Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.
Primary Type (Plain Text): The primary description of the IUCR code.
Description (Plain Text): The secondary description of the IUCR code, a subcategory of the primary description.
Location Description (Plain Text): Description of the location where the incident occurred.
Arrest (Checkbox): Indicates whether an arrest was made.
Domestic (Checkbox): Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.
Beat (Plain Text): Indicates the beat where the incident occurred. A beat is the smallest police geographic area; each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74.
District (Plain Text): Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.
Ward (Number): The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.
Community Area (Plain Text): Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6.
FBI Code (Plain Text): Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS). See the Chicago Police Department listing of these classifications at http://gis.chicagopolice.org/clearmap_crime_sums/crime_types.html.
X Coordinate (Number): The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.
Y Coordinate (Number): The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.
Year (Number): Year the incident occurred.
Updated On (Date & Time): Date and time the record was last updated.
Latitude (Number): The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
Longitude (Number): The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
Location (Location): The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.




C). Loading data in RStudio
·        First open RStudio, click on the File menu, go to Import Dataset, and click on the From Text (base) option.
·        Next, select the CSV file and click OK.
·        The dataset is then loaded into RStudio.
·        The dataset has been imported successfully.
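
The same import can also be done from a script with read.csv(). The sketch below is only an illustration: it writes a tiny stand-in CSV to a temporary file (the rows and the name crimes_demo are made up) so that it runs without the real dataset.

```r
# Write a tiny stand-in CSV to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,PrimaryType,Ward,Arrest",
  "1,THEFT,12,false",
  "2,BATTERY,,true",
  "3,THEFT,27,false"
), csv_path)

# stringsAsFactors = FALSE keeps text columns as character vectors.
crimes_demo <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(crimes_demo)  # number of rows and columns
str(crimes_demo)  # type of each column; the empty Ward field is read as NA
```

With the real file, csv_path would simply be the path to the downloaded CSV.
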





2). Scrub

1). Identify the NA values in the dataset.
> colSums(is.na(crimes_2017))
 
Output :-
 
> colSums(is.na(crimes_2017))
                      ID              Case.Number                     Date                    Block
                       0                        0                     6072                        0
                    IUCR              PrimaryType              Description      LocationDescription
                       0                        0                        0                        0
                  Arrest                 Domestic                     Beat                 District
                       0                        0                        0                        0
                    Ward            CommunityArea                  FBICode              XCoordinate
                       0                        0                        0                      221
             YCoordinate                     Year                UpdatedOn                 Latitude
                     221                        0                        0                      221
               Longitude                 Location HistoricalWards2003.2015                 ZipCodes
                     221                        0                      245                      221
          CommunityAreas             CensusTracts                    Wards    Boundaries...ZIPCodes
                     243                      231                      243                      243
         PoliceDistricts              PoliceBeats                    Month                  Weekday
                     244                      244                     6072                     6072
>
 
> sum(is.na(crimes_2017))
Output :-
 
     [1] 21014
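
Once the missing values have been counted, the incomplete rows can be dropped. The following is a small sketch on a made-up data frame (demo is hypothetical, not the project data) that counts NA values and keeps only the complete rows with complete.cases().

```r
# Hypothetical data frame with a few missing values.
demo <- data.frame(
  Ward      = c(12, NA, 27, 3),
  Latitude  = c(41.9, 41.8, NA, 41.7),
  Longitude = c(-87.7, -87.6, NA, -87.6)
)

colSums(is.na(demo))  # missing values per column
sum(is.na(demo))      # total number of missing values

# Keep only the rows in which every column is present.
demo_complete <- demo[complete.cases(demo), ]
nrow(demo_complete)
```

Dropping rows is the simplest option; whether it is appropriate depends on how many rows are lost and on which columns the analysis actually uses.
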
 
 
 
 

3). Explore

A). Exploring Data
1). Display the number of rows and columns.
> dim(crimes_2017)
 
Output:-
[1] 6072   32
 
 
2). Field names of the dataset.
> names(crimes_2017)
 
Output :-
 
            
 
 
3). Structure of the dataset (the type of each field, e.g. factor).
> str(crimes_2017)
 
Output :-
 
> str(crimes_2017)
'data.frame':    6072 obs. of  32 variables:
 $ ID                      : int  11191630 11191600 11191645 11191594 11191605 11192326 11191660 11195895 11194861 11191734 ...
 $ Case.Number             : Factor w/ 6072 levels "JA551641","JA553349",..: 5316 5310 5306 5313 5308 5412 5320 5642 5323 5330 ...
 $ Date                    : Date, format: NA NA NA NA ...
 $ Block                   : Factor w/ 4400 levels "0000X E 100TH ST",..: 1611 1389 1112 2693 2158 4161 996 3086 878 605 ...
 $ IUCR                    : Factor w/ 178 levels "031A","031B",..: 170 1 5 166 5 172 165 165 126 166 ...
 $ PrimaryType             : Factor w/ 26 levels "ARSON","ASSAULT",..: 25 22 2 25 2 25 25 25 3 25 ...
 $ Description             : Factor w/ 169 levels "$500 AND UNDER",..: 108 28 20 1 20 74 105 105 137 1 ...
 $ LocationDescription     : Factor w/ 89 levels "","ABANDONED BUILDING",..: 22 81 14 65 80 69 81 81 41 65 ...
 $ Arrest                  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ Domestic                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ Beat                    : int  1414 1221 1213 2032 912 511 1424 1622 123 123 ...
 $ District                : int  14 12 12 20 9 5 14 16 1 1 ...
 $ Ward                    : int  35 1 27 46 11 8 1 45 2 2 ...
 $ CommunityArea           : int  22 24 24 3 59 50 24 11 32 32 ...
 $ FBICode                 : Factor w/ 23 levels "01A","04A","04B",..: 21 19 2 21 2 21 21 21 5 21 ...
 $ XCoordinate             : int  1156766 1164119 1166700 1165257 1165575 1185896 1164335 1137864 1176517 1175104 ...
 $ YCoordinate             : int  1915591 1904383 1906118 1932643 1882019 1841315 1909299 1937046 1895340 1897037 ...
 $ Year                    : int  2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
 $ UpdatedOn               : Factor w/ 165 levels "1/1/2018 15:50",..: 133 133 133 133 133 133 133 133 133 133 ...
 $ Latitude                : num  41.9 41.9 41.9 42 41.8 ...
 $ Longitude               : num  -87.7 -87.7 -87.7 -87.7 -87.7 ...
 $ Location                : Factor w/ 5156 levels "","(41.646187093, -87.61722683)",..: 4129 3437 3612 4782 2224 395 3809 4910 2688 2773 ...
 $ HistoricalWards2003.2015: int  15 24 41 37 26 9 24 20 48 48 ...
 $ ZipCodes                : int  22535 21560 22620 22616 14920 21861 21560 22532 14913 14913 ...
 $ CommunityAreas          : int  23 25 25 31 56 47 25 11 38 38 ...
 $ CensusTracts            : int  322 519 484 610 192 643 482 701 12 12 ...
 $ Wards                   : int  41 41 41 18 1 35 41 50 10 10 ...
 $ Boundaries...ZIPCodes   : int  1 4 49 15 43 19 4 18 35 35 ...
 $ PoliceDistricts         : int  7 15 15 2 23 10 7 12 22 22 ...
 $ PoliceBeats             : int  182 80 60 59 165 250 191 43 144 144 ...
 $ Month                   : chr  NA NA NA NA ...
 $ Weekday                 : chr  NA NA NA NA ...
>
 
4). Class of the variable.
> class(crimes_2017$Date)
 
Output :-
                               [1] "Date"
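
The str() output above shows Date, Month and Weekday as NA for the rows displayed, and the NA count for Date equals the number of rows, which suggests that the date conversion did not match the text format in the file. A minimal sketch of an explicit conversion follows; the format string "%m/%d/%Y %H:%M" is an assumption based on the UpdatedOn values shown above (such as "1/1/2018 15:50"), and raw_dates is made-up input.

```r
# Made-up date strings in the assumed month/day/year hour:minute format.
raw_dates <- c("1/1/2018 15:50", "12/31/2017 23:05", "not a date")

# Strings that do not match the format become NA instead of raising an error.
parsed <- as.POSIXct(raw_dates, format = "%m/%d/%Y %H:%M", tz = "UTC")
sum(is.na(parsed))  # how many values failed to parse

demo_dates <- data.frame(
  Date    = as.Date(parsed),
  Month   = format(parsed, "%m"),  # month as a two-digit string
  Weekday = weekdays(parsed)       # weekday name in the current locale
)
demo_dates
```

If the count of failed values is close to the number of rows, the format string is the first thing to check.
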
 
 
B). Descriptive Statistics
> summary(crimes_2017$Ward)
 
Output :-
 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0    10.0    24.0    23.6    36.0    50.0 
 
> mean(crimes_2017$Ward)
 
Output :-
[1] 23.5998
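
For columns that contain missing values (the coordinate columns in the NA counts above, for example), mean() and similar functions return NA unless na.rm = TRUE is passed. A small sketch on a made-up vector (ward_demo is hypothetical):

```r
# Made-up values with one missing entry.
ward_demo <- c(1, 10, 24, 36, 50, NA)

mean(ward_demo)                 # NA, because one value is missing
mean(ward_demo, na.rm = TRUE)   # mean of the non-missing values
median(ward_demo, na.rm = TRUE)
sd(ward_demo, na.rm = TRUE)     # standard deviation
quantile(ward_demo, c(0.25, 0.75), na.rm = TRUE)
```
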
 
 
 
5.   Data Visualization 
 
We print the first 5 rows of the Cyber Crimes dataset.

> head(crimes_2017,5)

[Figure: Information about Cyber Crimes]

[Figure: Information about crimes_2018_19]




5) Ward wise

library(sqldf)
# Count the crimes in three ward ranges (note that the last two ranges overlap)
b1=sqldf("SELECT count(*) FROM crimes_2017 WHERE Ward BETWEEN 1 AND 30")
b1
b2=sqldf("SELECT count(*) FROM crimes_2017 WHERE Ward BETWEEN 40 AND 60")
b2
b3=sqldf("SELECT count(*) FROM crimes_2017 WHERE Ward BETWEEN 45 AND 65")
b3
b50=c(b1$`count(*)`,b2$`count(*)`,b3$`count(*)`)
b50
# names.arg needs one label per bar
barplot(b50,names.arg = c("1-30","40-60","45-65"), main = "Crimes by ward range",ylim = c(0,5000),col=rainbow(3),las=2)
# Number of records per IUCR code
ans=sqldf('select IUCR, count(Year) as Total_Year from crimes_2017 group by IUCR')
ans
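
The ward ranges in the queries above overlap, so some crimes are counted in more than one bar. As an alternative sketch in base R, cut() assigns each ward to exactly one bin; the ward values and the two bins below are made up for illustration and are not the ranges used in the project.

```r
# Made-up ward numbers.
ward_demo <- c(1, 5, 12, 30, 31, 44, 50, 50)

# Each value falls in exactly one interval: (0, 25] or (25, 50].
ward_group <- cut(ward_demo, breaks = c(0, 25, 50), labels = c("1-25", "26-50"))
group_counts <- table(ward_group)
group_counts

barplot(group_counts, main = "Crimes by ward range",
        xlab = "Ward range", ylab = "Number of crimes")
```

Because the bins do not overlap, the bar heights add up to the number of rows.
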

·        DATA WRANGLING

This data came fairly clean; however, I did apply some simple data wrangling. I eliminated the missing values from this dataset and converted the categorical variables into forms that suit the analysis. This makes the later analysis easier.

crimes_2018_19 <- read.csv("crimes_2018-19.csv",stringsAsFactors = FALSE)

# Treat the categorical columns as factors
crimes_2018_19$PrimaryType <- as.factor(crimes_2018_19$PrimaryType)
crimes_2018_19$Description <- as.factor(crimes_2018_19$Description)
crimes_2018_19$LocationDescription <- as.factor(crimes_2018_19$LocationDescription)
crimes_2018_19$IUCR <- as.factor(crimes_2018_19$IUCR)

# Recode the Arrest and Domestic flags as 1 (true) / 0 (false).
# as.logical() handles both text values such as "True"/"False" and
# columns that read.csv() has already parsed as logical.
crimes_2018_19$Arrest <- as.integer(as.logical(crimes_2018_19$Arrest))
crimes_2018_19$Domestic <- as.integer(as.logical(crimes_2018_19$Domestic))
head(crimes_2018_19)


 

 

 

 

 

ANALYSIS ON DATASET

#install.packages('ggplot2')
library(ggplot2)
primary_type <- ggplot(crimes_2018_19, aes(PrimaryType))
# geom_bar() counts the rows per category; it avoids the warning that
# geom_histogram(stat = "count") raises about ignored parameters
primary_type + geom_bar() + coord_flip()




·        Displaying a Location by Latitude and Longitude

library(leaflet)
m = leaflet()
m = addTiles(m)
m = addMarkers(m, lat=41.65470, lng=-87.61051, popup="pc")
m
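
The map above places a single hard-coded marker. A hedged sketch of plotting one marker per record is shown below; crime_points is a made-up data frame (apart from the one coordinate pair reused from the code above), rows without coordinates are dropped first, and the leaflet calls only run when the package is installed.

```r
# Made-up records; the last one has no coordinates.
crime_points <- data.frame(
  PrimaryType = c("THEFT", "BATTERY", "ASSAULT"),
  Latitude    = c(41.65470, 41.88000, NA),
  Longitude   = c(-87.61051, -87.63000, NA),
  stringsAsFactors = FALSE
)

# Markers cannot be drawn without coordinates, so drop incomplete rows.
has_coords <- complete.cases(crime_points[, c("Latitude", "Longitude")])
crime_points <- crime_points[has_coords, ]

if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(data = crime_points)
  m <- leaflet::addTiles(m)
  # The ~ formulas refer to columns of the data frame passed to leaflet().
  m <- leaflet::addMarkers(m, lng = ~Longitude, lat = ~Latitude,
                           popup = ~PrimaryType)
  # Type m at the console to display the map in the RStudio Viewer.
}
```
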



From this density plot we can observe that crimes have decreased since 2017; however, the decrease from 2017 to 2019 is not drastic.
#install.packages('plotrix')
library(plotrix)  # provides pie3D()
arrests <- table(crimes_2018_19$Arrest)
lbls <- paste(names(arrests), "\n", arrests, sep="")
pie3D(arrests, labels = lbls, 
      main="Arrest results (1 = True, 0 = False) from crimes committed")





Out of all the crimes committed, only 26% resulted in an arrest.
domestic <- table(crimes_2018_19$Domestic)
lbls <- paste(names(domestic), "\n", domestic, sep="")
pie(domestic, labels = lbls, 
    main="Domestic results (1 = True, 0 = False) for crimes committed")















Out of all the crimes, only about 15% were domestic.
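
Percentages like the ones quoted for arrests and domestic incidents can be computed from the counts with prop.table(). A small sketch on a made-up flag vector (arrest_demo is hypothetical, so the numbers differ from the project's):

```r
# Made-up arrest flags: one arrest out of four crimes.
arrest_demo <- c(TRUE, FALSE, FALSE, FALSE)

arrest_counts <- table(arrest_demo)
arrest_pct <- round(100 * prop.table(arrest_counts), 1)  # share of each value in percent
arrest_pct

# Put the percentages on the pie chart labels.
lbls <- paste0(names(arrest_pct), "\n", arrest_pct, "%")
pie(arrest_counts, labels = lbls, main = "Share of crimes with an arrest")
```
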
#levels(crimes_2018_19$IUCR) #353 Levels
top10_iucr <- tail(names(sort(table(crimes_2018_19$IUCR))), 10)
iucr_raw <- table(crimes_2018_19$IUCR)
barplot(iucr_raw[order(iucr_raw, decreasing = TRUE)], xlim = c(0,11))

The most common IUCR codes: theft of $500 and under, domestic battery, battery robbery, damage to property.
#levels(crimes_2018_19$Description) #340 Levels
top10_description <- tail(names(sort(table(crimes_2018_19$Description))), 10)
head(top10_description)
 
[1] "FORCIBLE ENTRY" "AUTOMOBILE"     "FROM BUILDING"  "RETAIL THEFT"   "OVER $500"      "TO VEHICLE"    

It is clear that the top descriptions of crimes include from building, automobile, and forcible entry. This is key information for the Chicago Police Department.
#levels(crimes_2018_19$LocationDescription) #141 Levels
top10_location_description <- tail(names(sort(table(crimes_2018_19$LocationDescription))), 10)
head(top10_location_description)

 
 
 
[1] "VEHICLE NON-COMMERCIAL"         "DEPARTMENT STORE"               "PARKING LOT/GARAGE(NON.RESID.)"
[4] "RESTAURANT"                     "SMALL RETAIL STORE"             "OTHER"
 



















The top locations where crimes occur include schools, retail stores, and residential yards. This is important because the police can pay more attention to these areas.
crimes_2018_19$Beat <- as.factor(crimes_2018_19$Beat) # Put this at the beginning of the report
#levels(crimes_2018_19$Beat) #289 Levels
top10_beat <- tail(names(sort(table(crimes_2018_19$Beat))), 10)
beat_raw <- table(crimes_2018_19$Beat)
barplot(beat_raw[order(beat_raw, decreasing = TRUE)], xlim = c(0,11))
 


Most common beats (a beat is a small police geographic area). Additional reference for beats: https://data.cityofchicago.org/Public-Safety/Boundaries-Police-Beats-current-/aerh-rz74
crimes_2018_19$District <- as.factor(crimes_2018_19$District) # Put this at the beginning of the report
#levels(crimes_2018_19$District) #23 Levels
top10_district <- tail(names(sort(table(crimes_2018_19$District))), 10)
district_raw <- table(crimes_2018_19$District)
barplot(district_raw[order(district_raw, decreasing = TRUE)], xlim = c(0,11))






These are the districts where crimes occur most often in Chicago: District 11, District 8, District 6.
crimes_2018_19$Ward <- as.factor(crimes_2018_19$Ward) # Put this at the beginning of the report
#levels(crimes_2018_19$Ward) #45 Levels
top10_ward <- tail(names(sort(table(crimes_2018_19$Ward))), 10)
ward_raw <- table(crimes_2018_19$Ward)
barplot(ward_raw[order(ward_raw, decreasing = TRUE)], xlim = c(0,11))






The ward is simply the City Council district where the crime occurs. Additional visual reference: https://data.cityofchicago.org/d/sp34-6z76.
crimes_2018_19$CommunityArea <- as.factor(crimes_2018_19$CommunityArea) # Put this at the beginning of the report
#levels(crimes_2018_19$CommunityArea) #67 Levels
top10_community_area <- tail(names(sort(table(crimes_2018_19$CommunityArea))), 10)
community_area_raw <- table(crimes_2018_19$CommunityArea)
barplot(community_area_raw[order(community_area_raw, decreasing = TRUE)], xlim = c(0,11))

These areas also represent where the most common crimes occur. As you might observe, Area 25 is considerably higher than the other areas. This is a useful piece of information that the police should know and pay attention to.
crimes_2018_19$FBICode <- as.factor(crimes_2018_19$FBICode) # Put this at the beginning of the report
#levels(crimes_2018_19$FBICode) #19 Levels
top10_fbi_code <- tail(names(sort(table(crimes_2018_19$FBICode))), 10)
fbi_raw <- table(crimes_2018_19$FBICode)
barplot(fbi_raw[order(fbi_raw, decreasing = TRUE)], xlim = c(0,11))


The most common FBI codes: larceny, simple battery, vandalism.
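
The same top-10 pattern is repeated above for IUCR, Description, Beat, District, Ward, Community Area and FBI Code. A possible way to factor it out is sketched below; top_counts() is a hypothetical helper and district_demo is made-up data. Note that the original pattern sorts in ascending order and takes tail(), so calling head() on that result prints ranks 10 to 5 rather than the very top entries, whereas this helper returns the counts in descending order.

```r
# Hypothetical helper: the n most frequent values of x, most frequent first.
top_counts <- function(x, n = 10) {
  head(sort(table(x), decreasing = TRUE), n)
}

# Made-up district numbers: 11 occurs 4 times, 8 three times, 6 twice, 4 once.
district_demo <- c(11, 11, 11, 11, 8, 8, 8, 6, 6, 4)

top3 <- top_counts(district_demo, n = 3)
top3

# Only the selected categories are plotted, so no xlim adjustment is needed.
barplot(top3, main = "Most frequent districts",
        xlab = "District", ylab = "Number of crimes")
```
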
6.     Results

·        Ward Histogram
hist(crimes_2017$Wards, breaks=100)


·        Count of Crimes by Ward Range
# Count the crimes in three ward ranges (note that the last two ranges overlap)
b1=sqldf("SELECT count(*) FROM crimes_2017 WHERE Ward BETWEEN 1 AND 30")
b1
b2=sqldf("SELECT count(*) FROM crimes_2017 WHERE Ward BETWEEN 40 AND 60")
b2
b3=sqldf("SELECT count(*) FROM crimes_2017 WHERE Ward BETWEEN 45 AND 65")
b3
b50=c(b1$`count(*)`,b2$`count(*)`,b3$`count(*)`)
b50
# names.arg needs one label per bar
barplot(b50,names.arg = c("1-30","40-60","45-65"), main = "Crimes by ward range",ylim = c(0,5000),col=rainbow(3),las=2)
# Number of records per IUCR code
ans=sqldf('select IUCR, count(Year) as Total_Year from crimes_2017 group by IUCR')
ans

·        Beat
cl = c("red","blue","yellow")
reg=c("8","12")

reg1=c(q20$`count(Beat)`,ans5$`count(Beat)`)
reg1

barplot(reg1,names.arg =reg,col=cl,xlab="Ward",ylab = "Number of Year",ylim = c(-10,200))



















·        Year
no_cyl <- data.frame(table(combi$Year))
barplot(table(combi$Year),col=cl)













·        rnorm Function
e <- data.frame(f=rnorm(1000))
g <- ggplot(data=e, aes(x=f))
g <- g + geom_histogram(bins = 30)
g







·        Ward ggplot
no_cyl <- data.frame(table(combi$Ward))
barplot(table(combi$Year),col=cl)

g <- ggplot(no_cyl, aes(x=Var1, y=Freq, fill = Var1))
g <- g + geom_bar(stat = "identity")
g <- g  + labs(x="Ward", y="Frequency", title="Bar Plot")
g




·        Beat Boxplot
boxplot(combi$Beat)










·        FBI CODE
counts <- table(combi$FBICode)
barplot(counts, main="FBI CODE",  xlab="Number of FBI CODE")










·        Barplot of Description and Beat
c1=c5$Description
c2=c5$`count(Beat)`
barplot(c2,names.arg = c1,col=rainbow(length(c1)),las=2,xlab="Description",ylab = "Number of beats")






·        Null Values For Description
# Multiply by 100.0 so that SQLite does not truncate the percentages with integer division
res1 <- sqldf("select 100.0*sum(Beat=='NULL')/count(Beat) as 'NULL1',100.0*sum(Beat!='NULL')/count(Beat) as 'NOTNULL' from crimes_2017;")
res2=res1$NULL1
res3=res1$NOTNULL
res4=rbind(res2,res3)
co=c("black","Red")
pie(res4,col=co,main="Null Values from Description")
pielabels=c("98.68% - Null","1.32% - Not Null")
legend("bottomright",legend=pielabels,bty="n",fill=co,cex=0.5)


·        Primary Type Wise Location Description
# Count the crimes for each combination of primary type and location
Description_m=sqldf("select count(*) from crimes_2017 where LocationDescription='STREET' AND PrimaryType='BATTERY'")
Description_m
Description_f=sqldf("select count(*) from crimes_2017 where LocationDescription='STREET' AND PrimaryType='CRIMINAL DAMAGE'")
Description_f
Description_m1=sqldf("select count(*) from crimes_2017 where LocationDescription='RESIDENCE' AND PrimaryType='BATTERY'")
Description_m1
Description_f1=sqldf("select count(*) from crimes_2017 where LocationDescription='RESIDENCE' AND PrimaryType='CRIMINAL DAMAGE'")
Description_f1
reg=c("BATTERY / STREET","CRIMINAL DAMAGE / STREET","BATTERY / RESIDENCE","CRIMINAL DAMAGE / RESIDENCE")
reg1_m_f=c(Description_m$`count(*)`,Description_f$`count(*)`,Description_m1$`count(*)`,Description_f1$`count(*)`)
# The bar heights are counts, so the y axis is labelled accordingly
barplot(reg1_m_f,names.arg = reg,main = "PrimaryType wise LocationDescription",
        xlab="PrimaryType / LocationDescription",ylab="Number of crimes",
        ylim = c(0,300),col=rainbow(length(reg1_m_f)),las=2,cex.names = 0.8)
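
The four counts above can also be obtained in one step with a cross-tabulation. This is a sketch in base R on a made-up data frame (cross_demo is hypothetical); table() with two columns returns the count for every combination of primary type and location.

```r
# Made-up records with the two columns used in the queries above.
cross_demo <- data.frame(
  PrimaryType = c("BATTERY", "BATTERY", "CRIMINAL DAMAGE",
                  "BATTERY", "CRIMINAL DAMAGE"),
  LocationDescription = c("STREET", "RESIDENCE", "STREET",
                          "STREET", "RESIDENCE"),
  stringsAsFactors = FALSE
)

# Rows are primary types, columns are locations, cells are counts.
cross <- table(cross_demo$PrimaryType, cross_demo$LocationDescription)
cross

# Grouped bars: one group per location, one bar per primary type.
barplot(cross, beside = TRUE, legend.text = rownames(cross),
        xlab = "LocationDescription", ylab = "Number of crimes")
```

On the full dataset the table would contain every type and location, so it would usually be subset to the categories of interest before plotting.
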







·        Ward Barplot
reg_District=(q10$`min(Wards)`)
reg_District
barplot(reg_District,names.arg =District,main = "Minimum ward number by district",ylim = c(0,30),ylab = "Ward",xlab = "District",col = rainbow(length(District)),las=1)


·        Wards between 10 to 55
x=crimes_2017$Wards
x=hist(x,main = "Wards between 10 to 55",xlab = "Wards",ylab = "Wards related frequency",col = rainbow(length(x)),xlim =c(1,70),ylim = c(1,1000))


·        CommunityArea between 10 to 80
x=crimes_2017$CommunityArea
x=hist(x,main = "CommunityArea between 10 to 80",xlab = "CommunityArea",ylab = "CommunityArea related frequency",col = rainbow(length(x)),xlim =c(1,80),ylim = c(1,1000))

·        Matrix Using Barplot

data=c(120,23,56,98,55,66,44,120,200,300,21,36,52,32)
m=matrix(data,nrow=2,ncol=7,byrow = TRUE)
barplot(m,names.arg=x1,las=2,col=rainbow(2),ylim = c(0,500))
l=c('FBICode','Beat')
legend("topleft",l,fill=rainbow(2),cex=0.3)


·        count (Beat), District from Crimes_2017 group by District
ans=sqldf("select count(Beat),District from crimes_2017 group by District")
y=ans$`count(Beat)`
x=ans$District
barplot(y,names.arg=x,col=rainbow(length(x)),las=2,ylim = c(0,500))







·        District Pie Chart

a=g$District
District=tail(a,7)
pie(District,c1,col=rainbow(length(c1)))












Conclusion:-
·        No area is absolutely safe.
·        Taking care of yourself is very important. We cannot always rely on the police.
·        It can be seen that the threat of computer crime is not as big as the authorities claim.
·        This means that the methods they are introducing to combat it represent an unwarranted attack on human rights and are not proportionate to the threat posed by cyber-criminals.
·        Part of the problem is that there are no reliable statistics on it; this makes it hard to justify the increased powers that the Regulation of Investigatory Powers Act has given to the authorities.
·        These powers will also be ineffective in dealing with the problem of computer crime.
·        The international treaties being drawn up to deal with it are so vague that they are bound to be ineffective in dealing with the problem.
·        It will also mean that civil liberties will be unjustly affected by the terms of the treaties, since they could, conceivably, imply that everybody who owns a computer fitted with a modem could be suspected of being a hacker.
·        The attempts to outlaw the possession of hacking software could harm people who are trying to make the internet more secure, as they will not be able to test their systems; therefore the legislation could do more harm than good.



References:-

https://rpubs.com/
https://r4ds.had.co.nz/
https://rstudio.github.io/leaflet/























