GdPicture.NET.14
GetPageTextWithCoords(String) Method
Returns all text on the current page of the loaded PDF document, whether visible or hidden, together with its text properties: the bounding box coordinates, the font information, the text mode and the text size. The extracted text is split into words, and each word is written on its own line along with its text and font properties. A space character between words is also treated as a word; two or more consecutive spaces count as a single word. The line for one word is formatted as follows:

the horizontal (X) coordinate of the top left point of the rendering area + [FieldSeparator] +

the vertical (Y) coordinate of the top left point of the rendering area + [FieldSeparator] +

the horizontal (X) coordinate of the top right point of the rendering area + [FieldSeparator] +

the vertical (Y) coordinate of the top right point of the rendering area + [FieldSeparator] +

the horizontal (X) coordinate of the bottom right point of the rendering area + [FieldSeparator] +

the vertical (Y) coordinate of the bottom right point of the rendering area + [FieldSeparator] +

the horizontal (X) coordinate of the bottom left point of the rendering area + [FieldSeparator] +

the vertical (Y) coordinate of the bottom left point of the rendering area + [FieldSeparator] +

extracted word + [FieldSeparator] +

font name + [FieldSeparator] +

font box height + [FieldSeparator] +

text mode + [FieldSeparator] +

text size + EOL

The rendering area is the rectangle on the page where the extracted word is actually rendered. You can use the provided coordinates to calculate the dimensions of this area and the text rotation angle; the second example below shows how. If the text on the current page is rotated at various angles, the GuessPageTextRotation method can also help.

The result for the current page should contain exactly one line per word on that page, counting the space-words.
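
The line format described above lends itself to a small parser. The following is a minimal sketch in C#, not part of GdPicture.NET: the `WordRecord` and `WordLineParser` names are hypothetical, and the number format of the coordinates is an assumption, so the caller passes the format provider explicitly. Because the extracted word could itself contain the separator, the sketch reads the eight coordinates from the left and the last four fields from the right, and treats everything in between as the word.

```csharp
using System;

// Hypothetical container for one parsed line; not a GdPicture.NET type.
public sealed class WordRecord
{
    // Corner coordinates in points, in the documented order:
    // top-left X/Y, top-right X/Y, bottom-right X/Y, bottom-left X/Y.
    public double[] Corners = new double[8];
    public string Word;
    public string FontName;
    // Kept as raw strings because their exact textual form is not assumed here.
    public string FontBoxHeight;
    public string TextMode;
    public string TextSize;
}

public static class WordLineParser
{
    private const int CoordinateCount = 8;
    // Font name, font box height, text mode, text size.
    private const int TrailingFieldCount = 4;

    // numberFormat is an assumption left to the caller: pass the culture
    // that matches how the coordinates are written in the returned text.
    public static WordRecord Parse(string line, string separator, IFormatProvider numberFormat)
    {
        string[] parts = line.Split(new[] { separator }, StringSplitOptions.None);
        if (parts.Length < CoordinateCount + 1 + TrailingFieldCount)
            throw new FormatException("Expected at least 13 fields but found " + parts.Length + ".");

        var record = new WordRecord();
        for (int i = 0; i < CoordinateCount; i++)
            record.Corners[i] = double.Parse(parts[i], numberFormat);

        // Everything between the coordinates and the trailing fields is the word,
        // rejoined in case the word itself contained the separator.
        int firstTrailing = parts.Length - TrailingFieldCount;
        record.Word = string.Join(separator, parts, CoordinateCount, firstTrailing - CoordinateCount);
        record.FontName = parts[firstTrailing];
        record.FontBoxHeight = parts[firstTrailing + 1];
        record.TextMode = parts[firstTrailing + 2];
        record.TextSize = parts[firstTrailing + 3];
        return record;
    }
}
```

To process a whole page, split the returned string on line breaks and pass each non-empty line to `WordLineParser.Parse` with the same separator that was given to GetPageTextWithCoords. A separator that is unlikely to occur in the text or in font names keeps the split unambiguous.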

Syntax
Visual Basic
Public Overloads Function GetPageTextWithCoords( _
   ByVal FieldSeparator As String _
) As String

C#
public string GetPageTextWithCoords(
   string FieldSeparator
)

Delphi
public function GetPageTextWithCoords(
    FieldSeparator: String
): String;

JScript
public function GetPageTextWithCoords(
   FieldSeparator : String
) : String;

Managed Extensions for C++
public: string* GetPageTextWithCoords(
   string* FieldSeparator
)

C++/CLI
public:
String^ GetPageTextWithCoords(
   String^ FieldSeparator
)

Parameters

FieldSeparator
The string that is used to delimit the above enumerated fields in the resulting text.

Return Value

The text of the whole page, one word per line, with each word's coordinates and properties in the format described above. The GetStat method can be used afterwards to determine whether this method succeeded.

Remarks
This method is only allowed for use with non-encrypted documents.

It is recommended to use the GetStat method to identify the specific reason for the method's failure, if any.

All returned coordinates are given in points, 1 point = 1/72 inch.
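
Since the coordinates are in points, converting them to other units is a matter of simple arithmetic. The helper below is an illustrative sketch only; the `PointUnits` class is a hypothetical name, not a GdPicture.NET type, and the resolution passed to `ToPixels` is whatever DPI the caller renders at.

```csharp
// Hypothetical helper for converting PDF points to other units.
public static class PointUnits
{
    public const double PointsPerInch = 72.0;
    public const double MillimetersPerInch = 25.4;

    // 1 point = 1/72 inch.
    public static double ToInches(double points)
    {
        return points / PointsPerInch;
    }

    public static double ToMillimeters(double points)
    {
        return points / PointsPerInch * MillimetersPerInch;
    }

    // dpi is the resolution of the target raster, chosen by the caller.
    public static double ToPixels(double points, double dpi)
    {
        return points / PointsPerInch * dpi;
    }
}
```

For example, a word whose rendering area is 36 points wide covers half an inch, which is 150 pixels on a 300 DPI rendering.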

The font box height specifies the height of the font bounding box, expressed in the glyph coordinate system. The font bounding box is the smallest rectangle enclosing the shape that would result if all the glyphs of the font were placed with their origins coincident and then filled. Be aware that this value is extracted directly from the font program. If the font program provides incorrect data or if it does not contain any such data, this value can be misleading.
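
A value in the glyph coordinate system can be scaled to points for a rough comparison with the rendering area. The sketch below rests on several assumptions that this page does not confirm: that the font is not a Type 3 font (for other font types, PDF defines 1000 glyph-space units as 1 text-space unit), that the returned text size is the font size in points, and that no additional scaling applies to the text. The `GlyphSpace` class is a hypothetical name.

```csharp
// Hypothetical helper; not part of GdPicture.NET.
public static class GlyphSpace
{
    // PDF convention for font types other than Type 3:
    // 1000 glyph-space units correspond to 1 text-space unit.
    public const double UnitsPerTextUnit = 1000.0;

    // Assumes textSize is the font size in points and that no further
    // scaling is applied to the text.
    public static double ToPoints(double glyphUnits, double textSize)
    {
        return glyphUnits / UnitsPerTextUnit * textSize;
    }
}
```

Under these assumptions, a font box height of 1000 at a text size of 12 corresponds to 12 points. Treat the result as an estimate, since the source value depends on the data in the font program.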

The returned text mode is a member of the PdfTextMode enumeration.

Please consider also using the GuessPageTextRotation method to determine the possible text rotation on the current page.

Example
The first example shows you how to extract the whole text of the PDF document with its coordinates and other properties to a text file. The second example demonstrates how you can use the provided coordinates to calculate dimensions of rendering areas and possible text rotation angles.
How to extract the whole text of a PDF document, with its coordinates and other properties, to a text file. The results for the individual pages are separated by a line that includes the page number.
Visual Basic
Dim caption As String = "Example: GetPageTextWithCoords"
Dim gdpicturePDF As New GdPicturePDF()
Dim status As GdPictureStatus = gdpicturePDF.LoadFromFile("test.pdf", False)
If status = GdPictureStatus.OK Then
    Dim text_file As New System.IO.StreamWriter("text_with_coord.txt")
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    status = gdpicturePDF.GetStat()
    If status = GdPictureStatus.OK Then
        Dim text As String = ""
        Dim message As String = Nothing
        For i As Integer = 1 To pageCount
            status = gdpicturePDF.SelectPage(i)
            If status = GdPictureStatus.OK Then
                message = "Page: " + i.ToString() + " Status: " + status.ToString()
                text_file.WriteLine(message)
                'You can use your own separator here.
                text = gdpicturePDF.GetPageTextWithCoords("---")
                status = gdpicturePDF.GetStat()
                If status = GdPictureStatus.OK Then
                    text_file.WriteLine(text)
                Else
                    MessageBox.Show("The GetPageTextWithCoords() method has failed with the status: " + status.ToString(), caption)
                End If
            Else
                MessageBox.Show("The SelectPage() method has failed with the status: " + status.ToString(), caption)
            End If
        Next
    Else
        MessageBox.Show("The GetPageCount() method has failed with the status: " + status.ToString(), caption)
    End If
    text_file.Close()
Else
    MessageBox.Show("The file can't be loaded.", caption)
End If
MessageBox.Show("Text extraction finished.", caption)
gdpicturePDF.Dispose()
C#
string caption = "Example: GetPageTextWithCoords";
GdPicturePDF gdpicturePDF = new GdPicturePDF();
GdPictureStatus status = gdpicturePDF.LoadFromFile("test.pdf", false);
if (status == GdPictureStatus.OK)
{
    System.IO.StreamWriter text_file = new System.IO.StreamWriter("text_with_coord.txt");
    int pageCount = gdpicturePDF.GetPageCount();
    status = gdpicturePDF.GetStat();
    if (status == GdPictureStatus.OK)
    {
        string text = "";
        string message = null;
        for (int i = 1; i <= pageCount; i++)
        {
            status = gdpicturePDF.SelectPage(i);
            if (status == GdPictureStatus.OK)
            {
                message = "Page: " + i.ToString() + " Status: " + status.ToString();
                text_file.WriteLine(message);
                //You can use your own separator here.
                text = gdpicturePDF.GetPageTextWithCoords("---");
                status = gdpicturePDF.GetStat();
                if (status == GdPictureStatus.OK)
                {
                    text_file.WriteLine(text);
                }
                else
                {
                    MessageBox.Show("The GetPageTextWithCoords() method has failed with the status: " + status.ToString(), caption);
                }
            }
            else
            {
                MessageBox.Show("The SelectPage() method has failed with the status: " + status.ToString(), caption);
            }
        }
    }
    else
    {
        MessageBox.Show("The GetPageCount() method has failed with the status: " + status.ToString(), caption);
    }
    text_file.Close();
}
else
{
    MessageBox.Show("The file can't be loaded.", caption);
}
MessageBox.Show("Text extraction finished.", caption);
gdpicturePDF.Dispose();
How to calculate the dimensions of the rendering area for the first word, and how to find the rotation angle if the first word is rotated.
Visual Basic
Dim caption As String = "Example: GetPageTextWithCoords"
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    If gdpicturePDF.LoadFromFile("test.pdf", False) = GdPictureStatus.OK Then
        gdpicturePDF.SelectPage(1)
        Dim text As String = gdpicturePDF.GetPageTextWithCoords("~")
        If gdpicturePDF.GetStat() = GdPictureStatus.OK Then
            Dim coord As String() = text.Split("~"c)
            
            'Considering only the first word as an example. Let's assume the text is rotated.
            
            'Calculating the vector to determine the height of the rendering area for the first word.
            Dim vectorXH As Double = Double.Parse(coord(0)) - Double.Parse(coord(6))
            Dim vectorYH As Double = Double.Parse(coord(7)) - Double.Parse(coord(1))
            'Calculating the height of the area.
            Dim areaHeight As Double = Math.Sqrt(vectorXH * vectorXH + vectorYH * vectorYH)
            
            'Calculating the vector to determine the width of the rendering area for the first word.
            Dim vectorXW As Double = Double.Parse(coord(6)) - Double.Parse(coord(4))
            Dim vectorYW As Double = Double.Parse(coord(7)) - Double.Parse(coord(5))
            'Calculating the width of the area.
            Dim areaWidth As Double = Math.Sqrt(vectorXW * vectorXW + vectorYW * vectorYW)
            
            'Calculating the text rotation angle.
            Dim angle As Double = Math.Atan2(vectorXH, vectorYH) * (180 / Math.PI)
            'Be aware that the resulting angle is relative to the chosen base axis.
            
            'Continue...
        Else
            MessageBox.Show("The GetPageTextWithCoords() method has failed with the status: " + gdpicturePDF.GetStat().ToString(), caption)
        End If
    Else
        MessageBox.Show("The file can't be loaded.", caption)
    End If
End Using
C#
string caption = "Example: GetPageTextWithCoords";
using (GdPicturePDF gdpicturePDF = new GdPicturePDF())
{
    if (gdpicturePDF.LoadFromFile("test.pdf", false) == GdPictureStatus.OK)
    {
        gdpicturePDF.SelectPage(1);
        string text = gdpicturePDF.GetPageTextWithCoords("~");
        if (gdpicturePDF.GetStat() == GdPictureStatus.OK)
        {
            string[] coord = text.Split('~');
            
            //Considering only the first word as an example. Let's assume the text is rotated.
            
            //Calculating the vector to determine the height of the rendering area for the first word.
            double vectorXH = double.Parse(coord[0]) - double.Parse(coord[6]);
            double vectorYH = double.Parse(coord[7]) - double.Parse(coord[1]);
            //Calculating the height of the area.
            double areaHeight = Math.Sqrt(vectorXH * vectorXH + vectorYH * vectorYH);
            
            //Calculating the vector to determine the width of the rendering area for the first word.
            double vectorXW = double.Parse(coord[6]) - double.Parse(coord[4]);
            double vectorYW = double.Parse(coord[7]) - double.Parse(coord[5]);
            //Calculating the width of the area.
            double areaWidth = Math.Sqrt(vectorXW * vectorXW + vectorYW * vectorYW);
            
            //Calculating the text rotation angle.
            double angle = Math.Atan2(vectorXH, vectorYH) * (180 / Math.PI);
            //Be aware that the resulting angle is relative to the chosen base axis.
            
            //Continue...
        }
        else
        {
            MessageBox.Show("The GetPageTextWithCoords() method has failed with the status: " + gdpicturePDF.GetStat().ToString(), caption);
        }
    }
    else
    {
        MessageBox.Show("The file can't be loaded.", caption);
    }
}