ABSTRACT
Steganography is the art and science of writing hidden messages
in such a way that no one apart from the intended recipient knows
of the existence of the message; this is in contrast to cryptography,
where the existence of the message itself is not disguised, but the
meaning is obscured.
For example, the following innocuous sentence
Since everyone can read, encoding text
in neutral sentences is doubtfully effective
hides a message in the first letter of each word:
Since Everyone Can Read, Encoding Text
In Neutral Sentences Is Doubtfully Effective
= Secret inside
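The acrostic above can also be read mechanically. The following minimal C sketch collects the first letter of every word. The function name `acrostic_decode` is hypothetical. It treats only whitespace as a word separator and lower-cases the result, so the hidden message comes out without spaces.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy the first letter of each word in `cover`
   into `out` (lower-cased), producing the hidden acrostic message.
   Returns the number of letters written; `out` is NUL-terminated. */
size_t acrostic_decode(const char *cover, char *out, size_t out_size)
{
    size_t n = 0;
    int in_word = 0;

    if (out_size == 0)
        return 0;
    for (const char *p = cover; *p != '\0'; p++) {
        if (isalpha((unsigned char)*p)) {
            if (!in_word && n + 1 < out_size)
                out[n++] = (char)tolower((unsigned char)*p);
            in_word = 1;
        } else if (isspace((unsigned char)*p)) {
            in_word = 0;   /* whitespace ends a word */
        }
        /* Punctuation such as ',' neither starts nor ends a word. */
    }
    out[n] = '\0';
    return n;
}
```

Applied to the first (lower-case) version of the sentence, the function returns 12 and leaves "secretinside" in the output buffer.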
                   INDEX
   CONTENTS
   CERTIFICATE
   ACKNOWLEDGMENT
   ABSTRACT
   INTRODUCTION
   TERMINOLOGY
   TYPES OF STEGANOGRAPHY
   REPLACEMENT OF LSB
   STRUCTURE OF 8-BIT BMP
   STRUCTURE OF 24-BIT BMP
   STRUCTURE OF HEADER
   PROCEDURE FOLLOWED
   WORKING OF THE PROJECT
                      INTRODUCTION
        Steganography is the art and science of writing hidden
messages in such a way that no one apart from the intended
recipient knows of the existence of the message; this is in contrast
to cryptography, where the existence of the message itself is not
disguised, but the meaning is obscured.
Generally, a steganographic message will appear to be something
else, such as a picture, an article, a shopping list, or some other
message; this visible message is called the covertext. Classically,
the secret may be hidden by using invisible ink between the visible
lines of innocuous documents, or even written onto clothing. During
World War II a message was once written in Morse code along
two-colored knitting yarn. Other methods are underlining letters with
invisible ink, or simply pricking individual letters of a newspaper
article with a pin so that the marked letters form a message. The
message may even be a few words written under a postage stamp, the
stamp then being the covertext.
The advantage of steganography over cryptography alone is that
messages do not attract attention to themselves, to messengers, or
to recipients. An unhidden coded message, no matter how
unbreakable it is, will arouse suspicion and may in itself be
incriminating. In some countries encryption is illegal.
Uses of steganography in electronic communication include
steganographic coding inside a carrier such as an MP3 file, or inside
a transport-layer protocol such as UDP.
So, this is what steganography is.
Techniques for data hiding
Digital representation of media facilitates access and potentially
improves the portability, efficiency, and accuracy of the
information presented. Undesirable effects of facile data access
include an increased opportunity for violation of copyright and
tampering with or modification of content. The motivation for this
work includes the provision of protection of intellectual property
rights, an indication of content manipulation, and a means of
annotation. Data hiding represents a class of processes used to
embed data, such as copyright information, into various forms of
media such as image, audio, or text with a minimum amount of
perceivable degradation to the "host" signal; i.e., the embedded
data should be invisible and inaudible to a human observer. Note
that data hiding, while similar to compression, is distinct from
encryption. Its goal is not to restrict or regulate access to the host
signal, but rather to ensure that embedded data remain inviolate
and recoverable.
Two important uses of data hiding in digital media are to provide
proof of the copyright, and assurance of content integrity.
Therefore, the data should stay hidden in a host signal, even if that
signal is subjected to manipulation as degrading as filtering,
resampling, cropping, or lossy data compression. Other
applications of data hiding, such as the inclusion of augmentation
data, need not be invariant to detection or removal, since these data
are there for the benefit of both the author and the content
consumer. Thus, the techniques used for data hiding vary
depending on the quantity of data being hidden and the required
invariance of those data to manipulation. Since no one method is
capable of achieving all these goals, a class of processes is needed
to span the range of possible applications.
The technical challenges of data hiding are formidable. Any
"holes" to fill with data in a host signal, either statistical or
perceptual, are likely targets for removal by lossy signal
compression. The key to successful data hiding is the finding of
holes that are not suitable for exploitation by compression
algorithms. A further challenge is to fill these holes with data in a
way that remains invariant to a large class of host signal
transformations.
                  Features and applications
Data-hiding techniques should be capable of embedding data in a
host signal with the following restrictions and features:
  1. The host signal should be nonobjectionally degraded and the
     embedded data should be minimally perceptible. (The goal is
     for the data to remain hidden. As any magician will tell you,
     it is possible for something to be hidden while it remains in
     plain sight; you merely keep the person from looking at it.
     We will use the words hidden, inaudible, imperceivable, and
     invisible to mean that an observer does not notice the
     presence of the data, even if they are perceptible.)
  2. The embedded data should be directly encoded into the
     media, rather than into a header or wrapper, so that the data
     remain intact across varying data file formats.
  3. The embedded data should be immune to modifications
     ranging from intentional and intelligent attempts at removal
     to anticipated manipulations, e.g., channel noise, filtering,
     resampling, cropping, encoding, lossy compressing, printing
     and scanning, digital-to-analog (D/A) conversion, and
     analog-to-digital (A/D) conversion, etc.
  4. Asymmetrical coding of the embedded data is desirable,
     since the purpose of data hiding is to keep the data in the host
     signal, but not necessarily to make the data difficult to
     access.
  5. Error correction coding [1] should be used to ensure data
     integrity. It is inevitable that there will be some degradation
     to the embedded data when the host signal is modified.
  6. The embedded data should be self-clocking or arbitrarily re-
     entrant. This ensures that the embedded data can be
     recovered when only fragments of the host signal are
     available, e.g., if a sound bite is extracted from an interview,
     data embedded in the audio segment can be recovered. This
     feature also facilitates automatic decoding of the hidden data,
     since there is no need to refer to the original host signal.
Applications. Trade-offs exist between the quantity of embedded
data and the degree of immunity to host signal modification. By
constraining the degree of host signal degradation, a data-hiding
method can operate with either high embedded data rate, or high
resistance to modification, but not both. As one increases, the other
must decrease. While this can be shown mathematically for some
data-hiding systems such as spread spectrum, it seems to hold
true for all data-hiding systems. In any system, you can trade
bandwidth for robustness by exploiting redundancy. The quantity
of embedded data and the degree of host signal modification vary
from application to application. Consequently, different techniques
are employed for different applications. Several prospective
applications of data hiding are discussed in this section.
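To make the bandwidth-for-robustness trade concrete, here is a minimal C sketch of the simplest redundancy scheme, a repetition code. It is our own choice for illustration, not one of the methods cited in this section. Each payload bit is written REPEAT times and recovered by majority vote, so the embedded data rate drops by a factor of REPEAT while up to (REPEAT - 1) / 2 flipped bits per group are tolerated. The names `rep_encode`, `rep_decode` and `REPEAT` are hypothetical.

```c
#include <stddef.h>

#define REPEAT 3u  /* assumed repetition factor; must be odd */

/* Spread each payload bit over REPEAT channel bits.
   `bits` and `channel` hold one bit (0 or 1) per element;
   `channel` must have room for n * REPEAT elements. */
void rep_encode(const unsigned char *bits, size_t n, unsigned char *channel)
{
    for (size_t i = 0; i < n; i++)
        for (size_t r = 0; r < REPEAT; r++)
            channel[i * REPEAT + r] = bits[i];
}

/* Recover n payload bits by majority vote over each group of REPEAT. */
void rep_decode(const unsigned char *channel, size_t n, unsigned char *bits)
{
    for (size_t i = 0; i < n; i++) {
        size_t ones = 0;
        for (size_t r = 0; r < REPEAT; r++)
            ones += channel[i * REPEAT + r];
        bits[i] = (unsigned char)(ones > REPEAT / 2u);
    }
}
```

Tripling the channel bits buys tolerance of one error per group; a larger REPEAT buys more robustness at a proportionally lower data rate.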
An application that requires a minimal amount of embedded data is
the placement of a digital water mark. The embedded data are used
to place an indication of ownership in the host signal, serving the
same purpose as an author's signature or a company logo. Since the
information is of a critical nature and the signal may face
intelligent and intentional attempts to destroy or remove it, the
coding techniques used must be immune to a wide variety of
possible modifications.
A second application for data hiding is tamper-proofing. It is used
to indicate that the host signal has been modified from its authored
state. Modification to the embedded data indicates that the host
signal has been changed in some way.
A third application, feature location, requires more data to be
embedded. In this application, the embedded data are hidden in
specific locations within an image. It enables one to identify
individual content features, e.g., the name of the person on the left
versus the right side of an image. Typically, feature location data
are not subject to intentional removal. However, it is expected that
the host signal might be subjected to a certain degree of
modification, e.g., images are routinely modified by scaling,
cropping, and tone-scale enhancement. As a result, feature location
data-hiding techniques must be immune to geometrical and
nongeometrical modifications of a host signal.
Image and audio captions (or annotations) may require a large
amount of data. Annotations often travel separately from the host
signal, thus requiring additional channels and storage. Annotations
stored in file headers or resource sections are often lost if the file
format is changed, e.g., the annotations created in a Tagged Image
File Format (TIFF) may not be present when the image is
transformed to a Graphics Interchange Format (GIF). These
problems are resolved by embedding annotations directly into the
data structure of a host signal.
Prior work. Adelson [2] describes a method of data hiding that
exploits the human visual system's varying sensitivity to contrast
versus spatial frequency. Adelson substitutes high-spatial
frequency image data for hidden data in a pyramid-encoded still
image. While he is able to encode a large amount of data
efficiently, there is no provision to make the data immune to
detection or removal by typical manipulations such as filtering and
rescaling. Stego [3], one of several widely available software
packages, simply encodes data in the least-significant bit of the
host signal. This technique suffers from all of the same problems
as Adelson's method but creates an additional problem of
degrading image or audio quality. Bender [4] modifies Adelson's
technique by using chaos as a means to encrypt the embedded data,
deterring detection, but providing no improvement to immunity to
host signal manipulation. Lippman [5] hides data in the
chrominance channel of the National Television System
Committee (NTSC) television signal by exploiting the temporal
over-sampling of color in such signals. Typical of Enhanced
Definition Television Systems, this method encodes a large
amount of data, but the data are lost to most recording,
compression, and transcoding processes. Other techniques, such as
Hecht's Data-Glyph [6], which adds a bar code to images, are
engineered in light of a predetermined set of geometric
modifications [7]. Spread spectrum [8-11], a promising technology
for data hiding, is difficult to intercept and remove but often
introduces perceivable distortion into the host signal.
Problem space. Each application of data hiding requires a
different level of resistance to modification and a different
embedded data rate. These form the theoretical data-hiding
problem space (see Figure 1). There is an inherent trade-off
between bandwidth and "robustness," or the degree to which the
data are immune to attack or transformations that occur to the host
signal through normal usage, e.g., compression, resampling, etc.
The more data to be hidden, e.g., a caption for a photograph, the
less secure the encoding. The less data to be hidden, e.g., a
watermark, the more secure the encoding.
               Figure 1
                 Data hiding in still images
Data hiding in still images presents a variety of challenges that
arise due to the way the human visual system (HVS) works and the
typical modifications that images undergo. Additionally, still
images provide a relatively small host signal in which to hide data.
A fairly typical 8-bit picture of 200 x 200 pixels provides
approximately 40 kilobytes (kB) of data space in which to work.
This is equivalent to only around 5 seconds of telephone-quality
audio or less than a single frame of NTSC television. Also, it is
reasonable to expect that still images will be subject to operations
ranging from simple affine transforms to nonlinear transforms such
as cropping, blurring, filtering, and lossy compression. Practical
data-hiding techniques need to be resistant to as many of these
transformations as possible.
Despite these challenges, still images are likely candidates for data
hiding. There are many attributes of the HVS that are potential
candidates for exploitation in a data-hiding system, including our
varying sensitivity to contrast as a function of spatial frequency
and the masking effect of edges (both in luminance and
chrominance). The HVS has low sensitivity to small changes in
luminance, being able to perceive changes of no less than one part
in 30 for random patterns. However, in uniform regions of an
image, the HVS is more sensitive to the change of the luminance,
approximately one part in 240. A typical CRT (cathode ray tube)
display or printer has a limited dynamic range. In an image
representation of one part in 256, e.g., 8-bit gray levels, there is
potentially room to hide data as pseudorandom changes to picture
brightness. Another HVS "hole" is our relative insensitivity to very
low spatial frequencies such as continuous changes in brightness
across an image, i.e., vignetting. An additional advantage of
working with still images is that they are noncausal. Data-hiding
techniques can have access to any pixel or block of pixels at
random.
                       Data hiding in text
Soft-copy text is in many ways the most difficult place to hide
data. (Hard-copy text can be treated as a highly structured image
and is readily amenable to a variety of techniques such as slight
variations in letter forms, kerning, baseline, etc.) This is due
largely to the relative lack of redundant information in a text file as
compared with a picture or a sound bite. While it is often possible
to make imperceptible modifications to a picture, even an extra
letter or period in text may be noticed by a casual reader. Data
hiding in text is an exercise in the discovery of modifications that
are not noticed by readers. We considered three major methods of
encoding data: open space methods that encode through
manipulation of white space (unused space on the printed page),
syntactic methods that utilize punctuation, and semantic methods
that encode using manipulation of the words themselves.
Open space methods. There are two reasons why the
manipulation of white space in particular yields useful results.
First, changing the number of trailing spaces has little chance of
changing the meaning of a phrase or sentence. Second, a casual
reader is unlikely to take notice of slight modifications to white
space. We describe three methods of using white space to encode
data. The methods exploit inter-sentence spacing, end-of-line
spaces, and inter-word spacing in justified text.
The first method encodes a binary message into a text by placing
either one or two spaces after each terminating character, e.g., a
period for English prose, a semicolon for C-code, etc. A single
space encodes a "0," while two spaces encode a "1." This method
has a number of inherent problems. It is inefficient, requiring a
great deal of text to encode a very few bits. (One bit per sentence
equates to a data rate of approximately one bit per 160 bytes
assuming sentences are on average two 80-character lines of text.)
Its ability to encode depends on the structure of the text. (Some
text, such as free-verse poetry, lacks consistent or well-defined
termination characters.) Many word processors automatically set
the number of spaces after periods to one or two characters.
Finally, inconsistent use of white space is not transparent.
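A minimal C sketch of the inter-sentence spacing method. It assumes the cover text is plain ASCII with exactly one space after every period that ends a sentence; a period at the very end of the text carries no bit. The function names are hypothetical, and this simple version would also treat abbreviations such as "e.g." followed by a space as sentence ends.

```c
#include <stddef.h>

/* Hypothetical encoder: copies `cover` to `out`; after every '.' that is
   followed by a space, writes one extra space for a 1 bit and nothing
   extra for a 0 bit. `out` needs strlen(cover) + nbits + 1 bytes.
   Returns the number of bits actually embedded. */
size_t spacing_encode(const char *cover, const unsigned char *bits,
                      size_t nbits, char *out)
{
    size_t used = 0;
    for (const char *p = cover; *p != '\0'; p++) {
        *out++ = *p;
        if (*p == '.' && p[1] == ' ' && used < nbits && bits[used++])
            *out++ = ' ';   /* two spaces in total encode a 1 */
    }
    *out = '\0';
    return used;
}

/* Reads one bit per ". " occurrence: two spaces -> 1, one space -> 0. */
size_t spacing_decode(const char *text, unsigned char *bits, size_t max_bits)
{
    size_t n = 0;
    for (const char *p = text; *p != '\0' && n < max_bits; p++)
        if (*p == '.' && p[1] == ' ')
            bits[n++] = (unsigned char)(p[2] == ' ');
    return n;
}
```

With the cover "One. Two. Three. Four." there are three usable sentence ends, so three bits can be hidden, which illustrates how low the data rate of this method is.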
A second method of exploiting white space to encode data is to
insert spaces at the end of lines. The data are encoded allowing for
a predetermined number of spaces at the end of each line (see
Figure 29). Two spaces encode one bit per line, four encode two,
eight encode three, etc., dramatically increasing the amount of
information we can encode over the previous method. In Figure 29,
the text has been selectively justified, and has then had spaces
added to the end of lines to encode more data. Rules have been
added to reveal the white space at the end of lines. Additional
advantages of this method are that it can be done with any text, and
it will go unnoticed by readers, since this additional white space is
peripheral to the text. As with the previous method, some
programs, e.g., "sendmail," may inadvertently remove the extra
space characters. A problem unique to this method is that the
hidden data cannot be retrieved from hard copy.
                                  Figure 29
A third method of using white space to encode data involves right-
justification of text. Data are encoded by controlling where the
extra spaces are placed. One space between words is interpreted as
a "0." Two spaces are interpreted as a "1." This method results in
several bits encoded on each line (see Figure 30). Because of
constraints upon justification, not every inter-word space can be
used as data. In order to determine which of the inter-word spaces
represent hidden data bits and which are part of the original text,
we have employed a Manchester-like encoding method.
Manchester encoding groups bits in sets of two, interpreting "01"
as a "1" and "10" as a "0." The bit strings "00" and "11" are null.
For example, the encoded message "1000101101" is reduced to
"001," while "110011" is a null string.
                               Figure 30
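The Manchester-like decoding rule for the justification method can be sketched in C as follows. This is a minimal illustration: the function name is hypothetical, and the input is assumed to be a string of '0' and '1' characters, as in the examples in the text.

```c
#include <stddef.h>

/* Manchester-like decoding: read the bit string in pairs,
   "01" -> '1', "10" -> '0', "00" and "11" carry no data.
   A trailing odd bit is ignored. Writes a NUL-terminated result
   into `out` (at most strlen(in) / 2 + 1 bytes) and returns its length. */
size_t manchester_decode(const char *in, char *out)
{
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0' && in[i + 1] != '\0'; i += 2) {
        if (in[i] == '0' && in[i + 1] == '1')
            out[n++] = '1';
        else if (in[i] == '1' && in[i + 1] == '0')
            out[n++] = '0';
        /* "00" and "11" are null pairs: skip them. */
    }
    out[n] = '\0';
    return n;
}
```

Decoding "1000101101" yields "001" and "110011" yields the empty string, matching the examples in the text.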
Open space methods are useful as long as the text remains in an
ASCII (American Standard Code for Information Interchange) format. As
mentioned above, some data may be lost when the text is printed.
Printed documents present opportunities for data hiding far beyond
the capability of an ASCII text file. Data hiding in hard copy is
accomplished by making slight variations in word and letter
spacing, changes to the baseline position of letters or punctuation,
changes to the letter forms themselves, etc. Also, image data-
hiding techniques such as those used by Patchwork can be
modified to work with printed text.
Syntactic methods. That white space is considered arbitrary is
both its strength and its weakness where data hiding is concerned.
While the reader may not notice its manipulation, a word processor
may inadvertently change the number of spaces, destroying the
hidden data. Robustness, in light of document reformatting, is one
reason to look for other methods of data hiding in text. In addition,
the use of syntactic and semantic methods generally does not
interfere with the open space methods. These methods can be
applied in parallel.
There are many circumstances where punctuation is ambiguous or
where mispunctuation has low impact on the meaning of the text.
For example, the phrases "bread, butter, and milk" and "bread,
butter and milk" are both considered correct usage of commas in a
list. We can exploit the fact that the choice of form is arbitrary.
Alternation between forms can represent binary data, e.g., anytime
the first phrase structure (characterized by a comma appearing
before the "and") occurs, a "1" is inferred, and anytime the second
phrase structure is found, a "0" is inferred. Other examples include
the controlled use of contractions and abbreviations. While written
English affords numerous cases for the application of syntactic
data hiding, these situations occur infrequently in typical prose.
The expected data rate of these methods is on the order of only
several bits per kilobyte of text.
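The serial-comma example can be sketched as a one-bit channel in C. This is a minimal illustration under the assumption that the sender controls the wording of a three-item list; the function names are hypothetical, and a real decoder would first have to locate such lists in running text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: hide one bit in a three-item list.
   1 -> "a, b, and c" (comma before "and"), 0 -> "a, b and c". */
void list_encode(const char *a, const char *b, const char *c, int bit,
                 char *out, size_t out_size)
{
    if (bit)
        snprintf(out, out_size, "%s, %s, and %s", a, b, c);
    else
        snprintf(out, out_size, "%s, %s and %s", a, b, c);
}

/* Returns 1 if the list uses a comma before "and", otherwise 0. */
int list_decode(const char *list)
{
    return strstr(list, ", and ") != NULL;
}
```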
Although many of the rules of punctuation are ambiguous or
redundant, inconsistent use of punctuation is noticeable to even
casual readers. Finally, there are cases where changing the
punctuation will impact the clarity, or even meaning, of the text
considerably. This method should be used with caution.
Syntactic methods include changing the diction and structure of
text without significantly altering meaning or tone. For example,
the sentence "Before the night is over, I will have finished" could
be stated "I will have finished before the night is over." These
methods are more transparent than the punctuation methods, but
the opportunity to exploit them is limited.
Semantic methods. A final category of data hiding in text
involves changing the words themselves. Semantic methods are
similar to the syntactic methods. Rather than encoding binary data
by exploiting ambiguity of form, these methods assign each of two
synonyms a primary or secondary value. For example, the word
"big" could be considered primary and "large" secondary. Whether
a word has primary or secondary value bears no relevance to how
often it will be used, but, when decoding, primary words will be
read as ones, secondary words as zeros (see Table 3).
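Table 3 is not reproduced here, so the table in this sketch is hypothetical: only the "big"/"large" pair comes from the text, and the other two pairs are illustrative. The sketch shows both directions of the semantic method in C, choosing a word for a bit and reading a bit back from a word.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical synonym table: {primary, secondary}. */
static const char *const SYNONYMS[][2] = {
    {"big", "large"},     /* pair from the text */
    {"quick", "fast"},    /* illustrative */
    {"begin", "start"},   /* illustrative */
};
#define NPAIRS (sizeof SYNONYMS / sizeof SYNONYMS[0])

/* Encoding: use the primary word for a 1 and the secondary for a 0.
   Returns NULL if `word` belongs to no pair. */
const char *synonym_encode(const char *word, int bit)
{
    for (size_t i = 0; i < NPAIRS; i++)
        if (strcmp(word, SYNONYMS[i][0]) == 0 ||
            strcmp(word, SYNONYMS[i][1]) == 0)
            return SYNONYMS[i][bit ? 0 : 1];
    return NULL;
}

/* Decoding: primary -> 1, secondary -> 0, not a carrier word -> -1. */
int synonym_decode(const char *word)
{
    for (size_t i = 0; i < NPAIRS; i++) {
        if (strcmp(word, SYNONYMS[i][0]) == 0) return 1;
        if (strcmp(word, SYNONYMS[i][1]) == 0) return 0;
    }
    return -1;
}
```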
Word webs such as WordNet can be used to automatically generate
synonym tables. Where there are many synonyms, more than one
bit can be encoded per substitution. (The choice between
"propensity," "predilection," "penchant," and "proclivity"
represents two bits of data.) Problems occur when the nuances of
meaning interfere with the desire to encode data. For example,
there is a problem with choice of the synonym pair "cool" and
"chilly." Calling someone "cool" has very different connotations
than calling them "chilly." The sentence "The students in line for
registration are spaced-out" is also ambiguous.
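With a group of four synonyms, the position of the chosen word in the group carries two bits. Below is a minimal C sketch using the four words from the text. The ordering of the group, and hence which word stands for which 2-bit value, is an arbitrary assumption that sender and receiver would have to share. The function names are hypothetical.

```c
#include <string.h>

/* Assumed shared ordering: index 0..3 is the 2-bit value. */
static const char *const GROUP[4] = {
    "propensity", "predilection", "penchant", "proclivity"
};

/* Encoding: pick the word whose index equals the 2-bit value. */
const char *group_encode(unsigned value)
{
    return GROUP[value & 3u];
}

/* Decoding: return the word's index (0..3), or -1 if not in the group. */
int group_decode(const char *word)
{
    for (int i = 0; i < 4; i++)
        if (strcmp(word, GROUP[i]) == 0)
            return i;
    return -1;
}
```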
                      TERMINOLOGY
In general, terminology analogous to (and consistent with) more
conventional radio and communications technology is used;
however, a brief description of some terms which show up in
software specifically, and are easily confused, is appropriate.
These are most relevant to digital steganographic systems.
The payload is the data it is desirable to transport (and, therefore,
to hide). The carrier is the signal, stream, or data file into which the
payload is hidden; contrast "channel" (typically used to refer to the
type of input, such as "a JPEG image"). The resulting signal,
stream, or data file which has the payload encoded into it is
sometimes referred to as the package. The percentage of bytes,
samples, or other signal elements which are modified to encode the
payload is referred to as the encoding density and is typically
expressed as a floating-point number between 0 and 1.
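One way to measure encoding density is to count the bytes that differ between the carrier and the package. This assumes the original carrier is available for comparison. A minimal C sketch follows. The function name is hypothetical.

```c
#include <stddef.h>

/* Fraction of bytes that differ between the original carrier and the
   package, as a value between 0 and 1. */
double encoding_density(const unsigned char *carrier,
                        const unsigned char *package, size_t len)
{
    size_t changed = 0;

    if (len == 0)
        return 0.0;
    for (size_t i = 0; i < len; i++)
        if (carrier[i] != package[i])
            changed++;
    return (double)changed / (double)len;
}
```

Note that with LSB replacement and a random-looking payload, roughly half of the bytes used already hold the right low bit. The measured density is therefore typically about half the fraction of bytes that carry payload.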
In a set of files, those files considered likely to contain a payload
are called suspects. If the suspect was identified through some type
of statistical analysis, it may be referred to as a candidate.
            TYPES OF STEGANOGRAPHY
Pure
       The most vulnerable. If an attacker learns how the
       information is hidden, he can read it.
Secret-Key
       Encrypt the payload using a shared key and then convert to
       stego-text.
Public-Key
       Encrypt the payload using a public key and then convert to
       stego-text.
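The secret-key pipeline (encrypt first, then embed) can be illustrated with a deliberately simple cipher. The repeating-key XOR below is a toy chosen only to show where encryption sits in the process. It is NOT secure, and a real system would use a vetted cipher. The function name is hypothetical.

```c
#include <stddef.h>

/* Toy illustration only, NOT secure: XOR the payload with a repeating
   shared key before embedding. Calling it again with the same key
   restores the original payload after extraction. */
void xor_with_key(unsigned char *data, size_t len,
                  const unsigned char *key, size_t key_len)
{
    for (size_t i = 0; i < len && key_len > 0; i++)
        data[i] ^= key[i % key_len];
}
```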
                REPLACEMENT OF LSB
We have used the least significant bit (LSB) replacement algorithm in
our project. Here the least significant bit of each byte of the carrier
is replaced by one bit of the payload; the project also uses 2-bit and
4-bit variants, in which the two or four lowest bits of each byte are
replaced (see WORKING OF THE PROJECT).
The bits of the payload are read one at a time and stored sequentially
in the low-order bits of the header bytes, as explained further below.
We have used both 8-bit and 24-bit BMP images as carriers, and we have
successfully embedded strings of characters in them.
With this process we can embed at most 256 bytes of data in an 8-bit
BMP (512 with 4-bit replacement). For a 24-bit BMP there is no fixed
limit, since the capacity grows with the image size, but in our project
we limited it to around 2000 bytes.
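A minimal C sketch of 1-bit LSB replacement on a buffer of carrier bytes, for example the bytes that follow the 54-byte header. The function names, and the choice to take each payload byte most significant bit first, are assumptions rather than the exact behaviour of our program.

```c
#include <stddef.h>

/* Replace the least significant bit of carrier[i] with payload bit i.
   Needs 8 carrier bytes per payload byte; returns 0 if it does not fit. */
int lsb_embed(unsigned char *carrier, size_t carrier_len,
              const unsigned char *payload, size_t payload_len)
{
    if (payload_len > carrier_len / 8)
        return 0;
    for (size_t i = 0; i < payload_len * 8; i++) {
        unsigned bit = (payload[i / 8] >> (7 - i % 8)) & 1u;  /* MSB first */
        carrier[i] = (unsigned char)((carrier[i] & 0xFEu) | bit);
    }
    return 1;
}

/* Inverse of lsb_embed: rebuild payload_len bytes from the LSBs. */
void lsb_extract(const unsigned char *carrier, unsigned char *payload,
                 size_t payload_len)
{
    for (size_t i = 0; i < payload_len; i++) {
        unsigned byte = 0;
        for (size_t j = 0; j < 8; j++)
            byte = (byte << 1) | (carrier[i * 8 + j] & 1u);
        payload[i] = (unsigned char)byte;
    }
}
```

Each carrier byte changes by at most 1, which is why the change is hard to see.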
     STRUCTURE OF A 8-BIT BMP IMAGE
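Every 8-bit BMP image consists of a header of 1078 bytes (the 54-byte
file and info headers plus a 1024-byte color table) followed by one byte
per pixel. Hence the size of an 8-bit BMP image is 1078 bytes plus the
total number of pixels, provided the width is a multiple of 4; otherwise
each row is padded with up to 3 extra bytes so that its length is a
multiple of 4.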
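The size rule above can be written as a small C function. BMP rows are padded to a multiple of 4 bytes, so the function rounds each row up accordingly. It assumes the common layout with a 40-byte info header, a full 256-entry color table (hence 1078 header bytes) and no compression. The function name is hypothetical.

```c
#include <stdint.h>

/* Expected file size of an uncompressed 8-bit BMP: 1078 header bytes,
   then `height` rows of 1 byte per pixel, each padded to a multiple of 4. */
uint32_t bmp8_file_size(uint32_t width, uint32_t height)
{
    uint32_t row = (width + 3u) & ~3u;   /* round row length up to 4 */
    return 1078u + row * height;
}
```

For a 200 x 200 image the rows need no padding, so the file is 1078 + 40000 = 41078 bytes. For a width of 201 each row is padded to 204 bytes.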
            STRUCTURE OF 24-BIT BMP
The structure of a 24-bit BMP is the same as that of an 8-bit BMP up to
the first 54 bytes, which define the format, shape, size and structure
of the 24-bit BMP image. A 24-bit BMP normally has no color table, so
the pixel data begin right after these 54 bytes.
In a 24-bit BMP image each pixel is stored as three bytes, one for each
of the colors red, green and blue (RGB); in the file they appear in the
order blue, green, red, and this group of three bytes repeats once for
every pixel of the image.
For example, if we have a 24-bit BMP image of size 800*600, the number
of pixels is 800*600 = 480000, and the number of bytes of pixel data can
be calculated with the following formula:
   Number of bytes = number of pixels * 3 // three bytes: B, G and R
So for the above example we have 480000*3 = 1440000.
That means 1440000 bytes of pixel data define the above 24-bit BMP. The
same holds for a 24-bit BMP of any size whose row length (width * 3) is
a multiple of 4; otherwise each row is padded up to the next multiple of
4 bytes.
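The byte count above can be computed in C as well. The sketch also computes how many characters fit when `k` low bits of every pixel-data byte are replaced: k = 1 for plain LSB replacement, or k = 2 for the 2-bit method our project uses for 24-bit images. It assumes an uncompressed image. The function names are hypothetical.

```c
#include <stdint.h>

/* Bytes of pixel data in an uncompressed 24-bit BMP: 3 bytes per pixel
   (stored B, G, R), each row padded up to a multiple of 4 bytes. */
uint32_t bmp24_pixel_bytes(uint32_t width, uint32_t height)
{
    uint32_t row = (width * 3u + 3u) & ~3u;
    return row * height;
}

/* Characters (bytes) that fit when k low bits of each pixel byte are
   replaced; each character needs 8 / k carrier bytes. */
uint64_t bmp24_capacity(uint32_t width, uint32_t height, unsigned k)
{
    return (uint64_t)bmp24_pixel_bytes(width, height) * k / 8u;
}
```

For the 800*600 example this gives 1440000 bytes of pixel data. That is room for 180000 characters with k = 1, or 360000 with k = 2, far more than the 2000-character limit of our program.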
          STRUCTURE OF THE HEADER
The first 54 bytes of the header define the following fields (multi-byte
values are stored little-endian; LONG fields are 4 bytes and SHORT
fields are 2 bytes):
char array of size two type:        Type of the bitmap. Set to BM
UNSIGNED LONG INT size:             Size of the file in bytes
UNSIGNED LONG INT reserved:         Reserved, set to 0
UNSIGNED LONG INT offset:           Offset of the pixel data from
                                    the start of the file
UNSIGNED LONG INT headersize:       Size of the rest of the header,
                                    counted from this field (40)
LONG INT width:                     Width of the bitmap in pixels
LONG INT length:                    Height of the bitmap in pixels
UNSIGNED SHORT INT planes:          Number of planes (always 1)
UNSIGNED SHORT INT bitsperpixel:    Number of bits per pixel (8 or 24)
UNSIGNED LONG INT compression:      Type of compression. Usually set
                                    to 0 (none)
UNSIGNED LONG INT sizeimage:        Size of the pixel data in bytes
UNSIGNED LONG INT xpixelspermeter:  Horizontal resolution in pixels
                                    per meter
UNSIGNED LONG INT ypixelspermeter:  Vertical resolution in pixels
                                    per meter
UNSIGNED LONG INT colorsused:       Number of colors used
UNSIGNED LONG INT colorsimportant:  Number of colors considered
                                    important
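In C, these fields are safest read byte by byte. The values are little-endian, and `unsigned long` is 8 bytes on many 64-bit compilers, so a struct of `unsigned long` fields cannot simply be filled from the file with one `fread`. The sketch below uses fixed-width `<stdint.h>` types and the 0-based byte offset of each field. It parses only a subset of the fields. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Little-endian readers for a header held in memory. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* A subset of the header fields, with their 0-based byte offsets. */
struct bmp_header {
    uint32_t size;          /* offset 2  */
    uint32_t offset;        /* offset 10 */
    uint32_t headersize;    /* offset 14 */
    int32_t  width;         /* offset 18 */
    int32_t  height;        /* offset 22 ("length" above) */
    uint16_t planes;        /* offset 26 */
    uint16_t bitsperpixel;  /* offset 28 */
    uint32_t compression;   /* offset 30 */
    uint32_t sizeimage;     /* offset 34 */
    uint32_t colorsused;    /* offset 46 */
};

/* Parse the first 54 bytes; returns 0 if the "BM" signature is missing. */
int bmp_parse_header(const unsigned char h[54], struct bmp_header *out)
{
    if (h[0] != 'B' || h[1] != 'M')
        return 0;
    out->size = rd32(h + 2);
    out->offset = rd32(h + 10);
    out->headersize = rd32(h + 14);
    out->width = (int32_t)rd32(h + 18);
    out->height = (int32_t)rd32(h + 22);
    out->planes = rd16(h + 26);
    out->bitsperpixel = rd16(h + 28);
    out->compression = rd32(h + 30);
    out->sizeimage = rd32(h + 34);
    out->colorsused = rd32(h + 46);
    return 1;
}
```

For an 8-bit image with a full color table, `offset` should come out as 1078 and `headersize` as 40.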
The next 1024 bytes define the 256 colors of the image's color table.
These 256 colors are stored in the following manner, from byte number
55 to byte number 1078 (counting the first byte of the file as byte 1):
Byte number 55 defines a shade of BLUE.
Byte number 56 defines a shade of GREEN.
Byte number 57 defines a shade of RED.
Byte number 58 is reserved (unused, normally 0).
This block of 4 bytes repeats up to byte number 1078.
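Below is a sketch of reading one color-table entry. It assumes the table starts right after the 54-byte header, as it does with the 40-byte info header. In the file format the order within each entry is blue, green, red, reserved. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One 4-byte color-table entry in on-disk order. */
struct palette_entry { uint8_t blue, green, red, reserved; };

/* Read entry `index` (0..255) of an 8-bit BMP held in memory.
   The table starts at 0-based offset 54, i.e. byte number 55. */
struct palette_entry bmp_palette_entry(const unsigned char *file,
                                       unsigned index)
{
    const unsigned char *p = file + 54 + 4u * (index & 0xFFu);
    struct palette_entry e = { p[0], p[1], p[2], p[3] };
    return e;
}
```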
              PROCEDURE FOLLOWED
We took a static character array of 2000 bytes and asked the user to
input data of no more than 512 characters for an 8-bit BMP and 2000
characters for a 24-bit BMP (each character occupies one byte).
Embedding of data:
We now embed the bits of the input character array, in sequence, into
the least significant bit(s) of each byte, starting from byte number
55. For an 8-bit BMP the embedding stops at byte number 1078; for a
24-bit BMP it can continue up to the end of the file (EOF).
Recreation of the image containing data:
The first 54 bytes of the header are copied as is. The next 1024 bytes
of the header are replaced by the manipulated bytes. The remaining bytes
of the image are again copied as is.
Retrieval of the data:
The data are retrieved by reading the low-order bits of bytes 55 to
1078 and storing the rebuilt characters in a static array.
For a 24-bit BMP, the reading starts from byte number 55 and goes up to
the byte at which the embedding finished.
            WORKING OF THE PROJECT
       Our project lets the user embed data (any text, numbers or other
characters) in 8-bit and 24-bit BMP images, and embed an 8-bit BMP
image in a 24-bit BMP image.
       It is a user-friendly program that lets the user hide secret
information in images (BMP only). Both the embedding and the retrieval
of the data are easy.
      The user only has to enter the text or information that he wants
to hide, or that he wants to send to a friend, so that nobody else can
even tell that any information was sent.
     The user can also hide an 8-bit BMP image within a 24-bit BMP
image with the help of this software.
      When the user runs the program, a window with a menu opens. This
menu asks the user to select one of two BMP formats.
The menu looks like this:
Please select from one of the BMP image formats in which you
like to embed your data or image:
1. 8-bit BMP
2. 24-bit BMP
If the user selects option 1, he can embed up to 512 characters in an
8-bit BMP image. If the user enters fewer than 256 characters, the
embedding uses the 2-bit replacement method; if he enters more than 256
characters, the embedding uses the 4-bit replacement method, which
doubles the capacity to 512 characters.
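The 2-bit and 4-bit replacement methods generalize LSB replacement. Each carrier byte keeps its high bits and carries k payload bits in its low bits. With the 1024 color-table bytes of an 8-bit BMP this gives 1024 * 2 / 8 = 256 characters for k = 2 and 1024 * 4 / 8 = 512 for k = 4, matching the limits above. A minimal C sketch follows. The function names and the high-bits-first ordering within each payload byte are assumptions.

```c
#include <stddef.h>

/* Hypothetical k-bit replacement (k = 2 or 4): one payload byte occupies
   8 / k carrier bytes (4 for k = 2, 2 for k = 4).
   Returns 0 if k is unsupported or the carrier is too small. */
int kbit_embed(unsigned char *carrier, size_t carrier_len,
               const unsigned char *payload, size_t payload_len, unsigned k)
{
    unsigned per_byte, mask;

    if (k != 2u && k != 4u)
        return 0;
    per_byte = 8u / k;
    mask = (1u << k) - 1u;
    if (payload_len > carrier_len / per_byte)
        return 0;
    for (size_t i = 0; i < payload_len; i++) {
        for (unsigned j = 0; j < per_byte; j++) {
            unsigned shift = 8u - k * (j + 1u);   /* high bits first */
            size_t c = i * per_byte + j;
            carrier[c] = (unsigned char)((carrier[c] & ~mask) |
                                         ((payload[i] >> shift) & mask));
        }
    }
    return 1;
}

/* Inverse of kbit_embed (k must be 2 or 4): rebuild payload_len bytes. */
void kbit_extract(const unsigned char *carrier, unsigned char *payload,
                  size_t payload_len, unsigned k)
{
    unsigned per_byte = 8u / k, mask = (1u << k) - 1u;

    for (size_t i = 0; i < payload_len; i++) {
        unsigned byte = 0;
        for (unsigned j = 0; j < per_byte; j++)
            byte = (byte << k) | (carrier[i * per_byte + j] & mask);
        payload[i] = (unsigned char)byte;
    }
}
```

Doubling k doubles the capacity but also changes each carrier byte by up to 15 instead of 3, so the 4-bit method is more likely to alter the image visibly.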
If the user selects option 2 instead, another menu appears, which looks
like this:
Please select from the following what you want to embed in a 24-bit
BMP:
   1. text data
   2. 8-bit image
If the user selects option 1, he can in principle embed as many
characters as he wants; however, since our project limits the static
char array to about 2000 characters, the user can embed at most 2000
characters.
If the user selects option 2, an 8-bit image is embedded in the given
24-bit BMP.
Embedding in a 24-bit BMP is done with the 2-bit replacement method.
That covers the embedding part. Once the user has embedded the data, he
will also want to retrieve it exactly as it was embedded; the retrieval
program of the project serves this purpose.
When the user runs the retrieval program, a menu like this is
displayed on the screen:
Please select from the following images from which you want to
retrieve the data:
1. result1.bmp
2. result2.bmp
3. result3.bmp
4. result4.bmp
If the user selects the first option, he retrieves the data from the
file in which the data were embedded using the 4-bit replacement
method, because the number of characters entered at that time was more
than 256.
If the user selects the second option, he retrieves the data from the
file in which the data were embedded using the 2-bit replacement
method, because the number of characters entered at that time was less
than 256.
If the user selects the third option, he retrieves the data from the
file in which the embedding was done in a 24-bit image.
If the user selects the fourth option, he gets back the image that was
embedded in a 24-bit image using the 2-bit replacement method.
So, this is what our project does.