Tuesday, November 23, 2010

Optimization -05


Tips for Optimizing C++ Code

12. Careful when declaring C++ object variables.
  • Use initialization instead of assignment (Color c(black); is faster than Color c; c = black;).

13. Make default class constructors as lightweight as possible.
  • Particularly for simple classes (e.g., color, vector, point) that are created and manipulated frequently.
  • These default constructors are often called behind your back, when you are not expecting it.
  • Use constructor initializer lists. (Use Color::Color() : r(0), g(0), b(0) {} rather than Color::Color() { r = g = b = 0; } .)

16. For most classes, use the operators +=, -=, *=, and /=, instead of the operators +, -, *, and /.
  • The simple operators (+, -, *, and /) need to create an unnamed, temporary intermediate object for each partial result.
  • For instance: Vector v = Vector(1,0,0) + Vector(0,1,0) + Vector(0,0,1); creates five unnamed, temporary Vectors: Vector(1,0,0), Vector(0,1,0), Vector(0,0,1), Vector(1,0,0) + Vector(0,1,0), and Vector(1,0,0) + Vector(0,1,0) + Vector(0,0,1).
  • The slightly more verbose code: Vector v(1,0,0); v += Vector(0,1,0); v += Vector(0,0,1); only creates two temporary Vectors: Vector(0,1,0) and Vector(0,0,1). This saves 6 function calls (3 constructors and 3 destructors). See the sketch below.
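
For illustration, here is a minimal sketch of how the two kinds of operators typically differ. The Vector layout and names below are assumptions, not the class from this post:

// Illustrative sketch only: operator+= modifies its left operand in place,
// while operator+ must build and return a temporary Vector for every partial sum.
struct Vector {
    double x, y, z;
    Vector(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    Vector& operator+=(const Vector& o) { x += o.x; y += o.y; z += o.z; return *this; }
};
// Written in terms of operator+=, the by-value parameter 'a' is exactly the kind
// of unnamed temporary the tip is counting.
inline Vector operator+(Vector a, const Vector& b) { a += b; return a; }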

18. Delay declaring local variables.
  • Declaring an object variable always involves a function call (to the constructor).
  • If a variable is only needed sometimes (e.g., inside an if statement), declare it only when necessary, so the constructor is called only if the variable will actually be used.

19. For objects, use the prefix operator (++obj) instead of the postfix operator (obj++).
  • This probably will not be an issue in your ray tracer.
  • A copy of the object must be made with the postfix operator (which thus involves an extra call to the constructor and destructor), whereas the prefix operator does not need a temporary copy. See the sketch below.
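
A minimal sketch of why the postfix form costs more (the Counter class is purely illustrative):

// Prefix increment returns the object itself, while the postfix increment must
// copy the object just to return its old value.
struct Counter {
    int i;
    Counter& operator++()    { ++i; return *this; }                    // prefix: no temporary
    Counter  operator++(int) { Counter old = *this; ++i; return old; } // postfix: extra copy (constructor + destructor)
};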

20. Careful using templates.
  • Optimizations for various instantiations may need to be different!
  • The standard template library is reasonably well optimized, but I would avoid using it if you plan to implement an interactive ray tracer.
  • Why? By implementing it yourself, you'll know the algorithms it uses, so you will know the most efficient way to use the code.
  • More importantly, my experience is that debug compiles of STL libraries are slow. Normally this isn't a problem, except you will be using debug versions for profiling. You'll find STL constructors, iterators, etc. use 15+% of your run time, which can make reading the profile output more confusing.




Optimization -04


Tips for Optimizing C Code

1. Remember Amdahl's Law:
  • The overall speedup is roughly: overall_speedup = 1 / ((1 - funccost) + funccost / funcspeedup), where funccost is the fraction of the program runtime used by the function func, and funcspeedup is the factor by which you speed up that function.
  • Thus, if you optimize the function func(), which is 40% of the runtime, so that it runs twice as fast, your program will run 25% faster (1 / (0.6 + 0.4/2) = 1.25).
  • This means infrequently used code (e.g., the scene loader) probably should be optimized little (if at all).

2. First code for correctness, then optimize!
  • This does not mean write a fully functional ray tracer for 8 weeks, then optimize for 8 weeks!
  • Perform optimizations on your ray tracer in multiple steps.
  • Write for correctness, and then if you know the function will be called frequently, perform obvious optimizations.
  • Then profile to find bottlenecks, and remove the bottlenecks (by optimization or by improving the algorithm). Often improving the algorithm drastically changes the bottleneck – perhaps to a function you might not expect. This is a good reason to perform obvious optimizations on all functions you know will be frequently used.

3. People I know who write very efficient code say they spend at least twice as long optimizing code as they spend writing code.

4. Jumps/branches are expensive. Minimize their use whenever possible.
  • Function calls require two jumps, in addition to stack memory manipulation.
  • Prefer iteration over recursion.
  • Use inline functions for short functions to eliminate function overhead.
  • Move loops inside function calls,
    e.g., change for(i=0;i<100;i++) DoSomething();
    into DoSomething() { for(i=0;i<100;i++) { ... } } . (A sketch follows this list.)
  • Long if...else if...else if...else if... chains require lots of jumps for cases near the end of the chain (in addition to testing each condition). If possible, convert to a switch statement, which the compiler sometimes optimizes into a table lookup with a single jump. If a switch statement is not possible, put the most common clauses at the beginning of the if chain.
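
A small sketch of the loop-hoisting point above (the Item type and function names are illustrative assumptions):

struct Item { float value; };

// Called from a loop: pays call/return overhead once per element.
void ScaleOne(Item& it, float s) { it.value *= s; }

// Loop moved inside the call: the overhead is paid once for the whole array.
void ScaleAll(Item* items, int n, float s)
{
    for (int i = 0; i < n; ++i)
        items[i].value *= s;
}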

5. Think about the order of array indices.
  • Two and higher dimensional arrays are still stored in one dimensional memory. This means (for C/C++ arrays) array[i][j] and array[i][j+1] are adjacent to each other, whereas array[i][j] and array[i+1][j] may be arbitrarily far apart.
  • Accessing data in a more-or-less sequential fashion, as stored in physical memory, can dramatically speed up your code (sometimes by an order of magnitude, or more)!
  • When modern CPUs load data from main memory into processor cache, they fetch more than a single value. Instead they fetch a block of memory containing the requested data and adjacent data (a cache line). This means after array[i][j] is in the CPU cache, array[i][j+1] has a good chance of already being in cache, whereas array[i+1][j] is likely to still be in main memory. (See the sketch below.)
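
A sketch of the access-order point (the array size and names are illustrative):

// Cache-friendly: the inner loop walks adjacent elements, so most accesses hit
// data already brought in by the previous cache-line fetch.
float sumRowMajor(const float a[256][256])
{
    float s = 0.0f;
    for (int i = 0; i < 256; ++i)
        for (int j = 0; j < 256; ++j)
            s += a[i][j];          // a[i][j] and a[i][j+1] are adjacent in memory
    return s;
}
// Swapping the two loops (so the inner loop varies i in a[i][j]) strides 256
// floats per step and can run many times slower on large arrays.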

6. Think about instruction-level-parallelism.
  • Even though many applications still rely on single threaded execution, modern CPUs already have a significant amount of parallelism inside a single core. This means a single CPU might be simultaneously executing 4 floating point multiplies, waiting for 4 memory requests, and performing a comparison for an upcoming branch.
  • To make the most of this parallelism, blocks of code (i.e., between jumps) need to have enough independent instructions to allow the CPU to be fully utilized.
  • Think about unrolling loops to improve this (see the sketch below).
  • This is also a good reason to use inline functions.
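
A sketch of unrolling with independent accumulators so the CPU has several operations in flight at once (names and the unroll factor of 4 are illustrative; n is assumed to be a multiple of 4):

float dotProduct(const float* a, const float* b, int n)   // n assumed divisible by 4
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (int i = 0; i < n; i += 4) {
        // Four independent multiply-adds per iteration; none waits on the others.
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}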

7. Avoid/reduce the number of local variables.
  • Local variables are normally stored on the stack. However if there are few enough, they can instead be stored in registers. In this case, the function not only gets the benefit of the faster memory access of data stored in registers, but the function avoids the overhead of setting up a stack frame.
  • (Do not, however, switch wholesale to global variables!)
8. Reduce the number of function parameters.
  • For the same reason as reducing local variables – they are also stored on the stack.

9. Pass structures by reference, not by value.
  • I know of no case in a ray tracer where structures should be passed by value (even simple ones like Vectors, Points, and Colors).

10. If you do not need a return value from a function, do not define one.

11. Try to avoid casting where possible.
  • Integer and floating point instructions often operate on different registers, so a cast requires a copy.
  • Shorter integer types (char and short) still require the use of a full-sized register, and they need to be padded to 32/64-bits and then converted back to the smaller size before storing back in memory. (However, this cost must be weighed against the additional memory cost of a larger data type.)

12. Careful when declaring C variables.
  • Use initialization instead of assignment (Color c = black; is faster than Color c; c = black;).

14. Use shift operations >> and << instead of integer multiplication and division, where possible.

15. Careful using table-lookup functions.
  • Many people encourage using tables of precomputed values for complex functions (e.g., trigonometric functions). For ray tracing, this is often unnecessary. Memory lookups are exceedingly (and increasingly) expensive, and it is often as fast to recompute a trigonometric function as it is to retrieve the value from memory (especially when you consider the trig lookup pollutes the CPU cache).
  • In other instances, lookup tables may be quite useful. For GPU programming, table lookups are often preferred for complex functions.

17. For basic data types, use the operators +, -, *, and / instead of the operators +=, -=, *=, and /=.

21. Avoid dynamic memory allocation during computation.
  • Dynamic memory is great for storing the scene and other data that does not change during computation.
  • However, on many (most) systems dynamic memory allocation requires the use of locks to control access to the allocator. For multi-threaded applications that use dynamic memory, you may actually get a slowdown by adding additional processors, due to the wait to allocate and free memory!
  • Even for single threaded applications, allocating memory on the heap is more expensive than allocating it on the stack. The operating system needs to perform some computation to find a memory block of the requisite size.

22. Find and utilize information about your system's memory cache.
  • If a data structure fits in a single cache line, only a single fetch from main memory is required to process the entire structure.
  • Make sure all data structures are aligned to cache line boundaries. (If both your data structure and a cache line are 128 bytes, you will still have poor performance if 1 byte of your structure is in one cache line and the other 127 bytes are in a second cache line.) See the sketch below.
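
A sketch of sizing and aligning a hot structure to a cache line. This assumes a 64-byte line and a C++11 compiler for alignas; older compilers would need __declspec(align(64)) or __attribute__((aligned(64))) instead, and the Photon layout is purely illustrative:

// 16 floats = 64 bytes, and alignas(64) keeps the whole structure inside a
// single cache line instead of straddling two.
struct alignas(64) Photon {
    float position[3];
    float direction[3];
    float power[3];
    float padding[7];   // pad up to exactly 64 bytes
};
static_assert(sizeof(Photon) == 64, "Photon should fill one cache line");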

23. Avoid unnecessary data initialization.
  • If you must initialize a large chunk of memory, consider using memset().

24. Try for early loop termination and early function returns.
  • Consider intersecting a ray and a triangle. The "common case" is that the ray will miss the triangle. Thus, this should be optimized for.
  • If you decide to intersect the ray with the triangle plane, you can immediately return if the t value of the ray-plane intersection is negative. This allows you to skip the barycentric coordinate computation in roughly half of the ray-triangle intersections. A big win! As soon as you know no intersection occurs, the intersection function should quit. (See the sketch below.)
  • Similarly, some loops can be terminated early. For instance, when shooting shadow rays, the location of the nearest intersection is unnecessary. As soon as any occluding intersection is found, the intersection routine can return.
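
A sketch of the early-out idea (the Vec3 type, the epsilon, and the plane representation are illustrative assumptions, not code from this post):

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns false as soon as we know the triangle's plane cannot be hit in front
// of the ray; the more expensive barycentric test would only run after this.
bool hitTrianglePlane(const Vec3& orig, const Vec3& dir,
                      const Vec3& normal, float planeD, float& t)
{
    float denom = dot(normal, dir);
    if (denom > -1e-6f && denom < 1e-6f)
        return false;                      // early out: ray parallel to the plane
    t = -(dot(normal, orig) + planeD) / denom;
    if (t < 0.0f)
        return false;                      // early out: intersection is behind the ray
    return true;                           // barycentric (inside-triangle) test goes here
}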

25. Simplify your equations on paper!
  • In many equations, terms cancel out... either always or in some special cases.
  • The compiler cannot find these simplifications, but you can. Eliminating a few expensive operations inside an inner loop can speed up your program more than days of work on other parts.

26. The difference between math on integers, fixed points, 32-bit floats, and 64-bit doubles is not as big as you might think.
  • On modern CPUs, floating-point operations have essentially the same throughput as integer operations. In compute-intensive programs like ray tracing, this leads to a negligible difference between integer and floating-point costs. This means you should not go out of your way to use integer operations.
  • Double precision floating-point operations may not be slower than single precision floats, particularly on 64-bit machines. I have seen ray tracers run faster using all doubles than all floats on the same machine. I have also seen the reverse.

27. Consider ways of rephrasing your math to eliminate expensive operations.
  • sqrt() can often be avoided, especially in comparisons where comparing the value squared gives the same result. (See the sketch below.)
  • If you repeatedly divide by x, consider computing 1/x and multiplying by the result. This used to be a big win for vector normalizations (3 divides), but I've recently found it's now a toss-up. However, it should still be beneficial if you do more than 3 divides.
  • If you perform a loop, make sure computations that do not change between iterations are pulled out of the loop.
  • Consider if you can compute values in a loop incrementally (instead of computing from scratch each iteration).
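
Two small sketches of such rewrites (Vec3 and the function names are illustrative):

#include <cmath>

struct Vec3 { float x, y, z; };

// 1) Compare squared distances; no sqrt() is needed for a comparison.
bool closerThan(const Vec3& a, const Vec3& b, float maxDist)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx*dx + dy*dy + dz*dz < maxDist * maxDist;
}

// 2) Divide once, multiply three times (the classic normalization rewrite).
void normalize(Vec3& v)
{
    float invLen = 1.0f / std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    v.x *= invLen; v.y *= invLen; v.z *= invLen;
}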



Monday, November 15, 2010

Socket Programming -01


Socket Programming - Server (Rx)


#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <winsock2.h>
#include <ws2tcpip.h>

#include <stdlib.h>
#include <stdio.h>
#include <conio.h>

// Need to link with Ws2_32.lib, Mswsock.lib, and Advapi32.lib
#pragma comment (lib, "Ws2_32.lib")
#pragma comment (lib, "Mswsock.lib")
#pragma comment (lib, "AdvApi32.lib")

#define DEFAULT_BUFLEN 1452
#define DEFAULT_PORT "27015"

int main(void)
{
    WSADATA wsaData;
    FILE *fp;
    SOCKET ListenSocket = INVALID_SOCKET, ClientSocket = INVALID_SOCKET;
    struct addrinfo *result = NULL, hints;
    char recvbuf[DEFAULT_BUFLEN];

    int iResult, iSendResult;
    int recvbuflen = DEFAULT_BUFLEN;

    // Open the file that will hold the received data
    fp = fopen("receive.dat", "wb");
    if (fp == NULL) {
        printf("Could not open receive.dat for writing\n");
        return 1;
    }

    // Initialize Winsock
    iResult = WSAStartup(MAKEWORD(2,2), &wsaData);
    if (iResult != 0) {
        printf("WSAStartup failed with error: %d\n", iResult);
        return 1;
    }

    ZeroMemory(&hints, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;
    hints.ai_flags = AI_PASSIVE;

    // Resolve the server address and port
    iResult = getaddrinfo(NULL, DEFAULT_PORT, &hints, &result);
    if (iResult != 0) {
        printf("getaddrinfo failed with error: %d\n", iResult);
        WSACleanup();
        return 1;
    }

    // Create a SOCKET for the server to listen on
    ListenSocket = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
    if (ListenSocket == INVALID_SOCKET) {
        printf("socket failed with error: %ld\n", WSAGetLastError());
        freeaddrinfo(result);
        WSACleanup();
        return 1;
    }

    // Setup the TCP listening socket
    iResult = bind(ListenSocket, result->ai_addr, (int)result->ai_addrlen);
    if (iResult == SOCKET_ERROR) {
        printf("bind failed with error: %d\n", WSAGetLastError());
        freeaddrinfo(result);
        closesocket(ListenSocket);
        WSACleanup();
        return 1;
    }

    freeaddrinfo(result);

    iResult = listen(ListenSocket, SOMAXCONN);
    if (iResult == SOCKET_ERROR) {
        printf("listen failed with error: %d\n", WSAGetLastError());
        closesocket(ListenSocket);
        WSACleanup();
        return 1;
    }

    // Accept a client socket
    ClientSocket = accept(ListenSocket, NULL, NULL);
    if (ClientSocket == INVALID_SOCKET) {
        printf("accept failed with error: %d\n", WSAGetLastError());
        closesocket(ListenSocket);
        WSACleanup();
        return 1;
    }

    // No longer need the listening socket
    closesocket(ListenSocket);

    // Receive until the peer shuts down the connection
    do {
        iResult = recv(ClientSocket, recvbuf, recvbuflen, 0);
        if (iResult > 0) {
            printf("Bytes received: %d\n", iResult);

            // Write the received bytes to receive.dat
            fwrite(recvbuf, 1, iResult, fp);

            // Echo the buffer back to the sender
            iSendResult = send(ClientSocket, recvbuf, iResult, 0);
            if (iSendResult == SOCKET_ERROR) {
                printf("send failed with error: %d\n", WSAGetLastError());
                closesocket(ClientSocket);
                WSACleanup();
                return 1;
            }
            printf("Bytes sent: %d\n", iSendResult);
        }
        else if (iResult == 0)
            printf("Connection closing...\n");
        else {
            printf("recv failed with error: %d\n", WSAGetLastError());
            closesocket(ClientSocket);
            WSACleanup();
            return 1;
        }
    } while (iResult > 0);

    // Shutdown the sending side of the connection since we're done
    iResult = shutdown(ClientSocket, SD_SEND);
    if (iResult == SOCKET_ERROR) {
        printf("shutdown failed with error: %d\n", WSAGetLastError());
        closesocket(ClientSocket);
        WSACleanup();
        return 1;
    }

    // Cleanup
    closesocket(ClientSocket);
    WSACleanup();
    fclose(fp);
    printf("Successfully Received Data...\n");
    return 0;
}

 

Tuesday, November 9, 2010

Matlab - 01


Working with Videos in MATLAB


MATLAB supports only "raw" (uncompressed) avi files on Linux and only some Indeo and Cinepak compressed versions on Windows. But mostly we get videos in MPEG. So we need to convert to raw or any other supported format. For this, FFmpeg can be used.
AVI means Audio Video Interleave and is only a container format. AVI does not mean raw video. It is possible to have mpeg or many other compressed avi files. MATLAB cannot read such compressed avi files. (See http://en.wikipedia.org/wiki/Audio_Video_Interleave).

1. Section: If we have raw video (or supported format)
If we have raw video, then handling it in MATLAB is quite easy. We need to use the functions mmreader, read, movie, mmfileinfo, frame2im, im2frame, aviread, avifile, aviinfo, addframe (avifile), close (avifile), and movie2avi. The following code reads and plays back a given video. For further details and other functions see MATLAB help.


%Reads and plays back the movie file xylophone.mpg.
xyloObj = mmreader('xylophone.mpg');
nFrames = xyloObj.NumberOfFrames;
vidHeight = xyloObj.Height;
vidWidth = xyloObj.Width;

% Preallocate movie structure.
mov(1:nFrames) = struct('cdata', zeros(vidHeight, vidWidth, 3, 'uint8'), ...
    'colormap', []);

% Read one frame at a time.
for k = 1 : nFrames
mov(k).cdata = read(xyloObj, k);
end

% Size a figure based on the video's width and height.
hf = figure;
set(hf, 'position', [150 150 vidWidth vidHeight])

% Play back the movie once at the video's frame rate.
movie(hf, mov, 1, xyloObj.FrameRate);

2. Section: If the video format is not supported
We can use any converter tool to convert compressed video to uncompressed avi format. One such tool is FFmpeg (http://ffmpeg.mplayerhq.hu/index.html). MPlayer, VLC and many other players are based on its codec libraries.
If you are a Windows user, just download latest binary from http://ffdshow.faireal.net/mirror/ffmpeg/ (Any virus? I don't know. Download at your risk!). (You can use WinRAR to uncompress .7z - http://www.freedownloadscenter.com/Utilities/Compression_and_Zip_File_Utilities/WinRAR_Download.html )
Use this command to convert any compressed file (compressed.any) to an uncompressed avi file (uncompressed.avi) (Note: the pthreadGC2.dll file should be in the same directory as ffmpeg.exe).
ffmpeg.exe -i compressed.any -vcodec rawvideo uncompressed.avi
This should work in most cases. However, sometimes it may not work (because, avi is a container format by Microsoft! :)). The converted video may have only blank frames or MATLAB may not be able to read it properly. In such a case, you can try the steps explained in Section 3.


3. Section: If Section 2 does not work
Create a folder by the name "images". Use this command to convert each frame of the compressed video to bmp (or ppm, jpg) images using ffmpeg or VirtualDub and put them into the folder "images".
ffmpeg.exe -i compressed.any images/image_%d.bmp
Now you can use these images for processing. If you want an uncompressed avi file, then use the following MATLAB code to combine all the raw images into a single uncompressed avi file.

%Script file to combine images to an uncompressed avi file
%Directory that contains images
in_dir = 'D:\temp\ffmpeg.rev11870\images\';
fout = 'D:\out.avi'; %Output file name
num_images = 341; %Number of images

%Set a suitable frame rate fps
aviobj = avifile(fout, 'compression', 'none', 'fps', 25);
for i = 1:num_images
    temp = sprintf('%d', i);
    name = [in_dir, 'image_', temp, '.bmp']; %For ppm, change the extension
    img = imread(name);
    frm = im2frame(img);
    aviobj = addframe(aviobj, frm);
    i %Display the current image index as a progress indicator
end;
aviobj = close(aviobj);

4. Useful Functions
mmreader, read, movie, mmfileinfo, frame2im, im2frame, aviread, avifile, aviinfo, addframe (avifile), close (avifile), movie2avi

5. Supported File Formats
Platform: Supported File Formats
Windows: AVI (.avi), MPEG-1 (.mpg), Motion JPEG 2000 (.mj2), Windows Media Video (.wmv, .asf, .asx), and any format supported by Microsoft DirectShow.
Macintosh: AVI (.avi), MPEG-1 (.mpg), MPEG-4 (.mp4, .m4v), Motion JPEG 2000 (.mj2), Apple QuickTime Movie (.mov), and any format supported by QuickTime as listed on http://www.apple.com/quicktime/player/specs.html.
Linux: Motion JPEG 2000 (.mj2), and any format supported by your installed plug-ins for GStreamer 0.10 or above, as listed on http://gstreamer.freedesktop.org/documentation/plugins.html, including AVI (.avi) and Ogg Theora (.ogg).

Monday, November 8, 2010

OpenCV – 02


Working with Videos in OpenCV


#include <stdio.h>
#include <cv.h>
#include <highgui.h>

int main(void)

{
    /* Create an object that decodes the input video stream. */
    CvCapture *input_video = cvCaptureFromFile("VideoPlay_Demo.avi");
    if (input_video == NULL)
    {
        /* Either the video didn't exist OR the codec isn't supported */
        fprintf(stderr, "Error: Can't open video.\n");
        return -1;
    }


    /* Read the video's frame size out of the AVI. */
    CvSize frame_size;
    frame_size.height = (int) cvGetCaptureProperty(input_video, CV_CAP_PROP_FRAME_HEIGHT);
    frame_size.width = (int) cvGetCaptureProperty(input_video, CV_CAP_PROP_FRAME_WIDTH);


    /* Determine the number of frames in the AVI. */
    long number_of_frames;
    number_of_frames = (int) cvGetCaptureProperty(input_video, CV_CAP_PROP_FRAME_COUNT);


    /* Create a window called "VideoPlay_Demo" for output.
     * The window automatically changes its size to match the output. */
    cvNamedWindow("VideoPlay_Demo", CV_WINDOW_AUTOSIZE);


    long current_frame = 0;
    while(true)
    {
        static IplImage *frame = NULL;
        
        /* Go to the frame we want */
        cvSetCaptureProperty( input_video, CV_CAP_PROP_POS_FRAMES, current_frame );


        /* Get the next frame of the video */
        frame = cvQueryFrame( input_video );
        if (frame == NULL)
        {
            fprintf(stderr, "Error: Hmm. The end came sooner than we thought.\n");
            return -1;
        }
        
        /* Now display the image */
        cvShowImage("VideoPlay_Demo", frame);
        
        /* Wait between frames. If the argument to cvWaitKey is 0 it waits forever;
         * otherwise it waits that number of milliseconds. Press Esc to quit. */
        int key = cvWaitKey(60);
        if (key == 27)
            break;

        current_frame++;
        if (current_frame < 0)
            current_frame = 0;
        if (current_frame >= number_of_frames - 1)
            current_frame = number_of_frames - 2;
    }

    /* Release the capture and close the window before exiting */
    cvReleaseCapture(&input_video);
    cvDestroyWindow("VideoPlay_Demo");
    return 0;
}



Monday, October 25, 2010

Audio Codec – MP3 – ID3v2.3

MP3 Codec - ID3 tag version 2.3.0

 

1. ID3v2 Overview

The two biggest design goals were to be able to implement ID3v2 without disturbing old software too much and that ID3v2 should be as flexible and expandable as possible.

The first criterion is met by the simple fact that the MPEG decoding software uses a syncsignal, embedded in the audiostream, to 'lock on to' the audio. Since the ID3v2 tag doesn't contain a valid syncsignal, no software will attempt to play the tag. If, for any reason, coincidence makes a syncsignal appear within the tag, it will be taken care of by the 'unsynchronisation scheme'.

The second criterion has made a more noticeable impact on the design of the ID3v2 tag. It is constructed as a container for several information blocks, called frames, whose format need not be known to the software that encounters them. At the start of every frame there is an identifier that explains the frame's format and content, and a size descriptor that allows software to skip unknown frames.

If a total revision of the ID3v2 tag should be needed, there is a version number and a size descriptor in the ID3v2 header.

The ID3 tag described in this document is mainly targeted at files encoded with MPEG-1/2 layer I, MPEG-1/2 layer II, MPEG-1/2 layer III and MPEG-2.5, but may work with other types of encoded audio.

The bitorder in ID3v2 is most significant bit first (MSB). The byteorder in multibyte numbers is most significant byte first (e.g. $12345678 would be encoded $12 34 56 78).

It is permitted to include padding after the final frame (at the end of the ID3 tag), making the size of all the frames together smaller than the size given in the header of the tag. A possible purpose of this padding is to allow for adding a few additional frames or enlarging existing frames within the tag without having to rewrite the entire file. The value of the padding bytes must be $00.

 

2. ID3v2 header

The ID3v2 tag header, which should be the first information in the file, is 10 bytes as follows:

ID3v2/file identifier      "ID3"
ID3v2 version              $03 00
ID3v2 flags                %abc00000
ID3v2 size                 4 * %0xxxxxxx

The first three bytes of the tag are always "ID3" to indicate that this is an ID3v2 tag, directly followed by the two version bytes. The first byte of the ID3v2 version is its major version, while the second byte is its revision number. In this case this is ID3v2.3.0. All revisions are backwards compatible while major versions are not. If software with ID3v2.2.0 and below support should encounter version three or higher, it should simply ignore the whole tag. Version and revision will never be $FF.

The version is followed by the ID3v2 flags byte, of which currently only three flags are used.

a - Unsynchronisation

Bit 7 in the 'ID3v2 flags' indicates whether or not unsynchronisation is used; a set bit indicates usage.

b - Extended header

The second bit (bit 6) indicates whether or not the header is followed by an extended header.

c - Experimental indicator

The third bit (bit 5) should be used as an 'experimental indicator'. This flag should always be set when the tag is in an experimental stage.

All the other flags should be cleared. If one of these undefined flags is set, the tag might not be readable for a parser that does not know the flag's function.

The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored, so a 257 bytes long tag is represented as $00 00 02 01.

The ID3v2 tag size is the size of the complete tag after unsychronisation, including padding, excluding the header but not excluding the extended header (total tag size - 10). Only 28 bits (representing up to 256MB) are used in the size description to avoid the introduction of 'false syncsignals'.
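
As a small illustration (not part of the original specification text), the four size bytes can be decoded like this; each byte contributes 7 bits, most significant byte first:

// Illustrative sketch: decode the 4-byte 'synchsafe' ID3v2 tag size.
unsigned long id3v2_tag_size(const unsigned char b[4])
{
    return ((unsigned long)(b[0] & 0x7F) << 21) |
           ((unsigned long)(b[1] & 0x7F) << 14) |
           ((unsigned long)(b[2] & 0x7F) <<  7) |
            (unsigned long)(b[3] & 0x7F);
}
// Example from the text: $00 00 02 01 -> (2 << 7) | 1 = 257 bytes.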

An ID3v2 tag can be detected with the following pattern:

 

$49 44 33 yy yy xx zz zz zz zz

Where yy is less than $FF, xx is the 'flags' byte and zz is less than $80.

 

3. ID3v2 extended header

The extended header contains information that is not vital to the correct parsing of the tag information; hence the extended header is optional.

Extended header size   $xx xx xx xx
Extended Flags         $xx xx
Size of padding        $xx xx xx xx

Where the 'Extended header size', currently 6 or 10 bytes, excludes itself. The 'Size of padding' is simply the total tag size excluding the frames and the headers, in other words the padding. The extended header is considered separate from the header proper, and as such is subject to unsynchronisation.

The extended flags are a secondary flag set which describes further attributes of the tag. These attributes are currently defined as follows:

 

%x0000000 00000000

 

x - CRC data present

If this flag is set, four bytes of CRC-32 data are appended to the extended header. The CRC should be calculated before unsynchronisation on the data between the extended header and the padding, i.e. the frames and only the frames.

Total frame CRC   $xx xx xx xx

 

4. ID3v2 frame overview

As the tag consists of a tag header and a tag body with one or more frames, all the frames consist of a frame header followed by one or more fields containing the actual information. The layout of the frame header is:

 

Frame ID   $xx xx xx xx (four characters)
Size       $xx xx xx xx
Flags      $xx xx

The frame ID is made out of the characters capital A-Z and 0-9. Identifiers beginning with "X", "Y" and "Z" are for experimental use and free for everyone to use, without the need to set the experimental bit in the tag header. Keep in mind that someone else might have used the same identifier as you. All other identifiers are either used or reserved for future use.

The frame ID is followed by a size descriptor, making a total header size of ten bytes in every frame. The size is calculated as frame size excluding frame header (frame size - 10).

In the frame header the size descriptor is followed by two flags bytes.

There is no fixed order of the frames' appearance in the tag, although it is desired that the frames are arranged in order of significance concerning the recognition of the file. An example of such order: UFID, TIT2, MCDI, TRCK ...

A tag must contain at least one frame. A frame must be at least 1 byte big, excluding the header.

If nothing else is said a string is represented as ISO-8859-1 characters in the range $20 - $FF. Such strings are represented as <text string>, or <full text string> if newlines are allowed, in the frame descriptions. All Unicode strings use 16-bit unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). Unicode strings must begin with the Unicode BOM ($FF FE or $FE FF) to identify the byte order.

All numeric strings and URLs are always encoded as ISO-8859-1. Terminated strings are terminated with $00 if encoded with ISO-8859-1 and $00 00 if encoded as unicode. If nothing else is said, newline characters are forbidden. In ISO-8859-1 a new line is represented, when allowed, with $0A only. Frames that allow different types of text encoding have a text encoding description byte directly after the frame size. If ISO-8859-1 is used this byte should be $00, if Unicode is used it should be $01. Strings dependent on encoding are represented as <text string according to encoding>, or <full text string according to encoding> if newlines are allowed. Any empty Unicode strings which are NULL-terminated may have the Unicode BOM followed by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).

The three byte language field is used to describe the language of the frame's content, according to ISO-639-2.

All URLs may be relative, e.g. "picture.png", "../doc.txt".

If a frame is longer than it should be, e.g. having more fields than specified in this document, that indicates that additions to the frame have been made in a later version of the ID3v2 standard. This is reflected by the revision number in the header of the tag.

  

4.1 Frame header flags

In the frame header the size descriptor is followed by two flags bytes. All unused flags must be cleared. The first byte is for 'status messages' and the second byte is for encoding purposes. If an unknown flag is set in the first byte the frame may not be changed without the bit cleared. If an unknown flag is set in the second byte it is likely to not be readable. The flags field is defined as follows.

 

%abc00000 %ijk00000

 

a - Tag alter preservation

This flag tells the software what to do with this frame if it is unknown and the tag is altered in any way. This applies to all kinds of alterations, including adding more padding and reordering the frames.

    0 - Frame should be preserved.
    1 - Frame should be discarded.

b - File alter preservation

This flag tells the software what to do with this frame if it is unknown and the file, excluding the tag, is altered. This does not apply when the audio is completely replaced with other audio data.

    0 - Frame should be preserved.
    1 - Frame should be discarded.

c - Read only

This flag, if set, tells the software that the contents of this frame is intended to be read only. Changing the contents might break something, e.g. a signature. If the contents are changed, without knowledge of why the frame was flagged read only and without taking the proper means to compensate, e.g. recalculating the signature, the bit should be cleared.

i - Compression

This flag indicates whether or not the frame is compressed.

    0 - Frame is not compressed.
    1 - Frame is compressed using zlib with 4 bytes for 'decompressed size' appended to the frame header.

j - Encryption

This flag indicates whether or not the frame is encrypted. If set, one byte indicating with which method it was encrypted will be appended to the frame header.

    0 - Frame is not encrypted.
    1 - Frame is encrypted.

k - Grouping identity

This flag indicates whether or not this frame belongs in a group with other frames. If set, a group identifier byte is added to the frame header. Every frame with the same group identifier belongs to the same group.

    0 - Frame does not contain group information.
    1 - Frame contains group information.

Some flags indicate that the frame header is extended with additional information. This information will be added to the frame header in the same order as the flags indicating the additions, i.e. the four bytes of decompressed size will precede the encryption method byte. These additions to the frame header are not included in the frame header size, but are included in the 'frame size' field; they are not subject to encryption or compression.

 

5. Default flags

The default settings for the frames described in this document can be divided into the following classes. The flags may be set differently if found more suitable by the software.

 

Discarded if tag is altered, discarded if file is altered:
    None.

Discarded if tag is altered, preserved if file is altered:
    None.

Preserved if tag is altered, discarded if file is altered:
    AENC, ETCO, EQUA, MLLT, POSS, SYLT, SYTC, RVAD, TENC, TLEN, TSIZ

Preserved if tag is altered, preserved if file is altered:
    The rest of the frames.

 

6. Declared ID3v2 frames

The following frames are declared in this draft.

 

AENC    Audio encryption
APIC    Attached picture
COMM    Comments
COMR    Commercial frame
ENCR    Encryption method registration
EQUA    Equalization
ETCO    Event timing codes
GEOB    General encapsulated object
GRID    Group identification registration
IPLS    Involved people list
LINK    Linked information
MCDI    Music CD identifier
MLLT    MPEG location lookup table
OWNE    Ownership frame
PRIV    Private frame
PCNT    Play counter
POPM    Popularimeter
POSS    Position synchronisation frame
RBUF    Recommended buffer size
RVAD    Relative volume adjustment
RVRB    Reverb
SYLT    Synchronized lyric/text
SYTC    Synchronized tempo codes
TALB    Album/Movie/Show title
TBPM    BPM (beats per minute)
TCOM    Composer
TCON    Content type
TCOP    Copyright message
TDAT    Date
TDLY    Playlist delay
TENC    Encoded by
TEXT    Lyricist/Text writer
TFLT    File type
TIME    Time
TIT1    Content group description
TIT2    Title/songname/content description
TIT3    Subtitle/Description refinement
TKEY    Initial key
TLAN    Language(s)
TLEN    Length
TMED    Media type
TOAL    Original album/movie/show title
TOFN    Original filename
TOLY    Original lyricist(s)/text writer(s)
TOPE    Original artist(s)/performer(s)
TORY    Original release year
TOWN    File owner/licensee
TPE1    Lead performer(s)/Soloist(s)
TPE2    Band/orchestra/accompaniment
TPE3    Conductor/performer refinement
TPE4    Interpreted, remixed, or otherwise modified by
TPOS    Part of a set
TPUB    Publisher
TRCK    Track number/Position in set
TRDA    Recording dates
TRSN    Internet radio station name
TRSO    Internet radio station owner
TSIZ    Size
TSRC    ISRC (international standard recording code)
TSSE    Software/Hardware and settings used for encoding
TYER    Year
TXXX    User defined text information frame
UFID    Unique file identifier
USER    Terms of use
USLT    Unsychronized lyric/text transcription
WCOM    Commercial information
WCOP    Copyright/Legal information
WOAF    Official audio file webpage
WOAR    Official artist/performer webpage
WOAS    Official audio source webpage
WORS    Official internet radio station homepage
WPAY    Payment
WPUB    Publishers official webpage
WXXX    User defined URL link frame

 

7. The unsynchronisation scheme

The only purpose of the 'unsynchronisation scheme' is to make the ID3v2 tag as compatible as possible with existing software. There is no use in 'unsynchronising' tags if the file is only to be processed by new software. Unsynchronisation may only be made with MPEG 2 layer I, II and III and MPEG 2.5 files.

Whenever a false synchronisation is found within the tag, one zeroed byte is inserted after the first false synchronisation byte. The format of a correct sync that should be altered by ID3 encoders is as follows:

 

%11111111 111xxxxx

And should be replaced with:

 

%11111111 00000000 111xxxxx

This has the side effect that all $FF 00 combinations have to be altered, so they won't be affected by the decoding process. Therefore all the $FF 00 combinations have to be replaced with the $FF 00 00 combination during the unsynchronisation.

To indicate usage of the unsynchronisation, the first bit in 'ID3 flags' should be set. This bit should only be set if the tag contains a, now corrected, false synchronisation. The bit should only be clear if the tag does not contain any false synchronisations.

Do bear in mind, that if a compression scheme is used by the encoder, the unsynchronisation scheme should be applied *afterwards*. When decoding a compressed, 'unsynchronised' file, the 'unsynchronization scheme' should be parsed first, decompression afterwards.

If the last byte in the tag is $FF, and there is a need to eliminate false synchronizations in the tag, at least one byte of padding should be added.
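
A sketch of the insertion step described above (illustrative only; it does not handle the trailing-$FF padding case mentioned in the previous paragraph):

#include <cstddef>
#include <vector>

// After every $FF that is followed by %111xxxxx (a false sync) or by $00,
// insert one zeroed byte.
std::vector<unsigned char> unsynchronise(const std::vector<unsigned char>& in)
{
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(in[i]);
        if (in[i] == 0xFF && i + 1 < in.size() &&
            ((in[i + 1] & 0xE0) == 0xE0 || in[i + 1] == 0x00))
            out.push_back(0x00);
    }
    return out;
}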


 

In the next blog let us see the encoding method.
