Activity for Tess4J

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    Yes, it can be. https://github.com/java-native-access/jna/blob/master/www/DirectMapping.md

  • George George posted a comment on discussion Open Discussion

    Thanks, I will try it out. Is direct mapping more efficient than using the interface?

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    The Leptonica library has many dependencies for opening various image file types, such as TIFF, PNG, JPEG, etc., which in turn have other dependencies, as you've seen. On Windows, we were able to embed all the image library dependencies inside libleptonica.dll. We don't know how to generate a similar static library liblept.so on Linux. Installing Tesseract would ensure that all the required dependency libraries are installed.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    Please use the JNA Direct Mapping API, Leptonica1. https://tess4j.sourceforge.net/docs/index.html

  • George George modified a comment on discussion Open Discussion

    I encountered jdk.internal.org.objectweb.asm.MethodTooLargeException when I tried to load lept4j 1.16.1 using OpenJDK Runtime Environment Corretto-21.0.6.7.1 (build 21.0.6+7-LTS). I assume this is caused by the ASM library used by the JDK during invocation of Native.loadLibrary() when the bytecode size exceeds the JDK's method size limit (64KB). A similar issue is reported here: https://bugs.openjdk.org/browse/JDK-8314528 Is there any workaround to load lept4j 1.16.1 on JDK 21 without requiring to create...

  • George George modified a comment on discussion Open Discussion

    I encountered jdk.internal.org.objectweb.asm.MethodTooLargeException when I tried to load lept4j 1.16.1 using OpenJDK Runtime Environment Corretto-21.0.6.7.1 (build 21.0.6+7-LTS). I assume this is caused by the ASM library used by the JDK during invocation of Native.loadLibrary(), and it appears to be similar to the issue reported here: https://bugs.openjdk.org/browse/JDK-8314528 Is there any workaround to load lept4j 1.16.1 on JDK 21? In testing, tess4j 5.0.0 seems to load without issue on JDK 21.

  • George George posted a comment on discussion Open Discussion

    I encountered jdk.internal.org.objectweb.asm.MethodTooLargeException when I tried to load tess4j 5.0.0 and lept4j 1.16.1 using OpenJDK Runtime Environment Corretto-21.0.6.7.1 (build 21.0.6+7-LTS). I assume this is caused by the ASM library used by the JDK during invocation of Native.loadLibrary(), and it appears to be similar to the issue reported here: https://bugs.openjdk.org/browse/JDK-8314528 Is there any workaround to load tess4j 5.0.0 and lept4j 1.16.1 on JDK 21?

  • Srinivas Arava Srinivas Arava posted a comment on discussion Open Discussion

    I was able to run tess4j on a Windows machine without actually installing the software. It picks up the required DLLs from the jars or the path. I am not able to do the same on Linux. I tried copying the .so files one by one until I hit a blocker: java.lang.UnsatisfiedLinkError: /lib64/libm.so.6: version 'GLIBC_2.29' not found (required by libpng15.so.15) My goal is to be able to run tess4j without explicitly installing Tesseract, simply by packaging the .so files. Can someone please guide...

  • Jian Wang Jian Wang modified a comment on discussion Open Discussion

    I have solved my question.

  • Jian Wang Jian Wang posted a comment on discussion Open Discussion

    Hello, Is Tess4J an open-source project? Where is the source code please? Thank you.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    For Tesseract non-Windows binary, you'll have to install or compile it yourself. https://tesseract-ocr.github.io/tessdoc/#compiling-and-installation

  • Angelo Schneider Angelo Schneider posted a comment on discussion Open Discussion

    Hello, for Macs the binary of the lib is missing: darwin/libtesseract.dylib Best regards, Angelo

  • Anonymous posted a comment on ticket #19

    Thanks for your support. I ended up not using the path returned by the method. I let tess4j do its thing and that works fine. If I ever end up needing the path, I'll ensure that my registry value works or that I do it another way

  • Quan Nguyen Quan Nguyen modified a comment on ticket #19

    On my Win11 machine, java.io.tmpdir is resolved to C:\Users\<username>\AppData\Local\Temp\tess4j. As you may have correctly assessed, this seems to be due to a legacy DOS setting in Windows on your machine. You might try setting the Windows Registry value as suggested in the second article you mentioned.
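
To see what the JVM actually resolves on a given machine, a minimal sketch along these lines may help. The class and method names are hypothetical, and the "tess4j" folder name is taken from the path quoted above. On Windows the canonical form is generally expected, though not guaranteed, to expand 8.3 short names, so printing both forms makes any difference visible.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tess4J.
final class TempDirCheck {

    // Builds <java.io.tmpdir>/tess4j; the folder name comes from the comment above.
    static Path tessTempDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"), "tess4j");
    }

    // Asks the JVM for the canonical form of a file path.
    static File canonical(File f) throws IOException {
        return f.getCanonicalFile();
    }

    public static void main(String[] args) throws IOException {
        Path dir = tessTempDir();
        System.out.println("java.io.tmpdir  = " + System.getProperty("java.io.tmpdir"));
        System.out.println("expected folder = " + dir);
        System.out.println("canonical form  = " + canonical(dir.toFile()));
    }
}
```

If the first and last lines differ only in short versus long path segments, the short names come from the environment rather than from the library.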

  • Quan Nguyen Quan Nguyen posted a comment on ticket #19

    On my Win11 machine, java.io.tmpdir is resolved to C:\Users\<username>\AppData\Local\Temp\tess4j. As you may have correctly assessed, this seems to be due to a legacy DOS setting in Windows on your machine.

  • Anonymous posted a comment on ticket #19

    I'm using Win11 Pro (64 Bit).

  • Quan Nguyen Quan Nguyen modified ticket #18

    Tesseract upgrade missing text when extracting

  • Quan Nguyen Quan Nguyen posted a comment on ticket #19

    I remember the 8.3 filename limitation in old DOS or Windows 95 era, but all modern OSes should be able to handle the long filenames. Which Windows version are you seeing the issue in?

  • Anonymous created ticket #19

    LoadLibs.extractTessResources() returns wrong dos style filenames

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    @Praveen Anand Please use the Lept4J version compatible with your Leptonica installation.

  • Praveen Anand Praveen Anand posted a comment on discussion Open Discussion

    @ShawnChen Did this issue get resolved? I'm facing the exact same error.

  • Synergi Synergi posted a comment on discussion Open Discussion

    Fixed it. It was an issue with the JNA dependency. I had JNA loaded in another linked project. As a result, it was using the older version instead of the one below. <dependency> <groupId>net.java.dev.jna</groupId> <artifactId>jna</artifactId> <version>5.12.1</version> </dependency>
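
A version clash like this can be hard to spot from the build file alone. One way to check it at runtime, sketched here with a hypothetical helper class, is to ask the JVM which jar a class was loaded from. com.sun.jna.Native is the JNA class that appears in the stack traces elsewhere in this feed; the lookup is reflective, so the sketch compiles and runs even when JNA is absent.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic helper, not part of Tess4J or JNA.
final class ClassOrigin {

    // Returns the location a class was loaded from, or empty if the class
    // cannot be found or linked on the current classpath.
    static Optional<String> locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return Optional.of(src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation()));
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("JNA loaded from: "
                + locate("com.sun.jna.Native").orElse("(not on classpath)"));
    }
}
```

If the printed jar is not the version declared in the pom, another module is most likely pulling in its own copy.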

  • Synergi Synergi modified a comment on discussion Open Discussion

    So, as it appears to me, LoadLibs wants to copy the contents of a folder named linux-x86-64 in the jar file into /tmp/tess4j/linux-x86-64. The issue I see is that the folder linux-x86-64 doesn't appear to exist in the jar file (tess4j-5.5.0.jar). Now, as it's a Linux system, I am guessing it doesn't need this tmp folder... but regardless of this, the code seems to crash. FYI, it seems to execute a similar process with Lept4J and copies over a DLL from a Windows directory in the jar file. I don't think...
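
Whether a platform folder is really present in a jar can be confirmed with the standard java.util.jar API. The following is a minimal sketch; the class name, method names, and the sample entry names used in any test data are hypothetical, and only the folder name linux-x86-64 comes from the comment above.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Tess4J.
final class JarFolderCheck {

    // Reads all entry names from a jar file on disk.
    static List<String> entryNames(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().map(JarEntry::getName).collect(Collectors.toList());
        }
    }

    // True if at least one entry lives under the given folder.
    static boolean hasEntriesUnder(Collection<String> names, String folder) {
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        return names.stream().anyMatch(n -> n.startsWith(prefix));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java JarFolderCheck <path-to-jar>");
            return;
        }
        System.out.println("linux-x86-64 present: "
                + hasEntriesUnder(entryNames(args[0]), "linux-x86-64"));
    }
}
```

Running it against the tess4j jar from the local Maven repository shows directly whether the folder the loader is looking for exists.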

  • Synergi Synergi modified a comment on discussion Open Discussion

    So, as it appears to me, LoadLibs wants to copy the contents of a folder named linux-x86-64 in the jar file into /tmp/tess4j/linux-x86-64. The issue I see is that the folder linux-x86-64 doesn't appear to exist in the jar file (tess4j-5.5.0.jar). Now, as it's a Linux system, I am guessing it doesn't need this tmp folder... but the code seems to crash. FYI, it seems to execute a similar process with Lept4J and copies over a DLL from a Windows directory in the jar file. I don't think it's used, but it allows...

  • Synergi Synergi modified a comment on discussion Open Discussion

    I am using tess4j v 5.5.0 (which is supposed to work with Tesseract 5.0.3) via Maven in Java on Linux Ubuntu 20.04.3 LTS (Focal Fossa). The application I am using worked previously using Tess4J with Tesseract 4.1.1. I keep getting errors now when I run the following code :- TessAPI.TessBaseAPI handle = TessAPI.INSTANCE.TessBaseAPICreate(); This always worked in the past but now I get the following error :- Exception in thread "pool-23-thread-1" java.lang.NoClassDefFoundError: Could not initialize...

  • giuseppe coniglio giuseppe coniglio posted a comment on discussion Help

    Hi all, I have implemented a Spring Boot microservice that uses tess4j 4.3.1 and pdfbox 2.0.22 on my Oracle Linux Server; example code: https://colwil.com/how-to-extract-text-from-a-scanned-pdf-using-ocr-in-java/ When I run the code from my IDE on a Windows PC and invoke the local service, execution is fast: "Tesseract.doOcr" takes 8 seconds. When I call the API that invokes the microservice's code on the server, "Tesseract.doOcr" is slow: 40-50 seconds. The PDF file passed as the parameter is the same. Any idea? Thanks :-)

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    If it was properly installed after being built, a libtesseract.dylib symbolic link would have been created. If not, you can create it manually. This link is what JNA looks for when loading the native library.

  • Tevž Selčan Tevž Selčan modified a comment on discussion Help

    Does this apply to Mac M1? I compiled tesseract as described here (https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos) and downloaded Tess4J, but I can't find the libtesseract.dylib file in either of them.

  • Tevž Selčan Tevž Selčan posted a comment on discussion Help

    Does this apply to Mac M1? I once compiled tesseract as described here (https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos) and downloaded Tess4J, but I can't find the libtesseract.dylib file in either of them.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    You mean separate physical copies of the training data files? I've seen instances of Tesseract running in multithreaded applications using the same set of training data files.

  • George George posted a comment on discussion Open Discussion

    Is it necessary to have separate copies of the Tesseract training data when running multiple instances of Tess4J in separate JVMs?

  • Quan Nguyen Quan Nguyen modified a comment on discussion Help

    No need to modify the .jar file. You just need to set the jna.library.path property to the location of the libtesseract.dylib file during launch. https://tess4j.sourceforge.net/tutorial/
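
As a rough illustration of that advice, the property can be passed on the command line with -Djna.library.path=<dir>, or set programmatically before any JNA-backed class is first used. The sketch below takes the second route; the class name is hypothetical and the default directory is only an assumption (a common Homebrew location on Apple Silicon), so substitute wherever libtesseract.dylib actually lives.

```java
// Hypothetical launcher helper, not part of Tess4J.
final class JnaPathSetup {

    // Points JNA at the directory containing the native library.
    // This must run before the first Tess4J/JNA class is touched,
    // because JNA reads the property when it loads the library.
    static void useNativeDir(String dir) {
        System.setProperty("jna.library.path", dir);
    }

    public static void main(String[] args) {
        // Assumed default; pass the real directory as the first argument.
        String dir = args.length > 0 ? args[0] : "/opt/homebrew/lib";
        useNativeDir(dir);
        System.out.println("jna.library.path = " + System.getProperty("jna.library.path"));
        // Application code that uses Tess4J would start here.
    }
}
```

Setting the property at launch keeps the packaged jar untouched, which matches the point made in the comment.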

  • Tevž Selčan Tevž Selčan posted a comment on discussion Help

    Issue solved, https://stackoverflow.com/questions/21394537/tess4j-unsatisfied-link-error-on-mac-os-x

  • Tevž Selčan Tevž Selčan posted a comment on discussion Help

    Hello, I have a problem while trying to use Tess4J with Maven. I get this error : Exception in thread "main" java.lang.UnsatisfiedLinkError: Can't load library: /Users/tevzselcan/Library/Caches/JNA/temp/jna1926430164363992306.tmp at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2393) at java.base/java.lang.Runtime.load0(Runtime.java:755) at java.base/java.lang.System.load(System.java:1953) at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:1018) at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:988)...

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    The bug was fixed in tess4j-5.4.0 release.

  • Quan Nguyen Quan Nguyen modified a comment on discussion Open Discussion

    The latest source is being hosted at https://github.com/nguyenq/tess4j .

  • Ralph Cook Ralph Cook posted a comment on discussion Open Discussion

    I want to put the source for Tess4J into eclipse so I can debug a problem I'm having. The current version of the library appears to be 5.4.0; if I put a dependency for net.sourceforge.tess4j:tess4j:5.4.0 in a Maven pom.xml file and update the project, I get a tess4j-5.4.0.jar. I cannot find source labelled for that version -- the latest I can find after rooting around on tess4j.sourceforge.net is labelled 3.4.8; the sources themselves do not have version numbers in them, so I cannot tell whether...

  • L Evans L Evans posted a comment on discussion Open Discussion

    The output OCR documents look good. So, the 1 word count is really misleading. We have conditional logic that follows the createDocumentsWithResults() call that relies on the size of the Words list in the OCRResult.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    What about the output documents (files) themselves? Can you put in a new issue at https://github.com/nguyenq/tess4j/issues ? Thanks.

  • L Evans L Evans posted a comment on discussion Open Discussion

    We've encountered a bug when calling createDocumentsWithResults() from Tesseract/tess4j 4.5.5. The TIFF scanned by the method call has 32 pages and ~3100 words. Yet the result produced by the Java call only contains the result of the last page scanned. The OCRResult, in Java, is an empty string in this bounding box: [ [Confidence: 95.000000 Bounding box: 313 434 938 822]], which is the same result as when scanning the last page of the TIFF file. Can the Tess4J team investigate this bug?

  • Xunnozza Vlinx Xenx Xunnozza Vlinx Xenx posted a comment on discussion Open Discussion

    Hello, can you please remove the unnecessary dependency jai-imageio-core:1.4.0? Its last update was over 4 years ago. Also, I see (sorry if I missed something) that this library is only used for TIFF metadata, which is also possible with the Java 11 API. Therefore I recommend removing this dependency and using the new Java API.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    https://github.com/nguyenq/tess4j/issues/230

  • Peter Kronenberg Peter Kronenberg posted a comment on discussion Open Discussion

    I'm seeing an error in the ImageDeskew routine. The sample code below shows a rotation of -6.8 (the unredacted version shows -10) on the attached file even though it should be 0. Any idea why it's not calculating correctly? It seems to happen on somewhat sparse images like this, which understandably makes it harder to figure out the orientation. I'm wondering if anything can be done to make it more accurate. public class GetAngle { private static double getAngle(Path sourceFile) throws IOException {...

  • Moritz Weibold Moritz Weibold posted a comment on discussion Help

    Hey there, I am using Tess4J to extract the sum of a bill. My Maven Quarkus server works perfectly fine on localhost in IntelliJ. After running the following command, I always pushed the target/quarkus-app/ folder onto my Oracle VM. mvn clean build And as soon as the folder is uploaded, I run: java -jar server/quarkus-run.jar & The issue is that on my Oracle VM the server suddenly stops at the tesseract.doOCR(tempFile) function. There is no error or any hint as to why it is not working. The server also...

  • Quan Nguyen Quan Nguyen posted a comment on ticket #18

    You may want to put in a ticket at https://github.com/tesseract-ocr/tesseract/issues site. Thanks.

  • Anonymous posted a comment on ticket #17

    Thanks a lot

  • Anonymous created ticket #18

    Tesseract upgrade missing text when extracting

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    JNA is looking for a libtesseract.dylib to load. Do you have it in the system path? Several developers were able to use the library on macOS. Please search through the forum posts.

  • Ben Ben posted a comment on discussion Help

    Hi, I have tried to get it to work so many times but it still is not working. I added the dependency to my maven and then wrote the code following instructions. I'm not sure why it is not working. Could someone help? Thanks!

  • Quan Nguyen Quan Nguyen posted a comment on ticket #17

    Yes, tess4j-4.6.1.

  • Anonymous posted a comment on ticket #17

    Do we have this fixed for Tess4J that will work with Tesseract 4.1.1?

  • Anantha Anantha posted a comment on ticket #17

    Thank you for the fix!

  • Quan Nguyen Quan Nguyen modified ticket #17

    Security - log4j2 vulnerability - Tess4J using old version(1.2.17) of log4j which needs upgrade to 2.17.1

  • Quan Nguyen Quan Nguyen modified a comment on ticket #17

    5.1.1 has been released with ghost4j dependency removed. Thank you for bringing this issue to our attention.

  • Quan Nguyen Quan Nguyen posted a comment on ticket #17

    5.1.1 has been released with ghost4j dependency removed.

  • Quan Nguyen Quan Nguyen posted a comment on ticket #17

    If vulnerabilities exist in ghost4j library, that's beyond our control. We can elect to remove ghost4j dependency from tess4j.

  • Anantha Anantha posted a comment on ticket #17

    Upon upgrading tess4j to the latest version (5.1.0), we could still see the log4j 1.2.17 dependency coming from ghost4j. Could you please check? Attached is the screenshot for reference.

  • Quan Nguyen Quan Nguyen posted a comment on ticket #17

    According to Apache Log4j Security Vulnerabilities, Log4j 1.x is not impacted by this vulnerability. Latest versions of tess4j do not have log4j dependency.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    I suggest that you clone the GitHub repository, switch to the tess4j-3 branch, study and execute the unit tests in your IDE, and go from there. You may want to start out with the simple example first to ensure that the library and its dependencies are set up correctly before going further with more complicated code.

  • Anantha Anantha created ticket #17

    Security - log4j2 vulnerability - Tess4J using old version(1.2.17) of log4j which needs upgrade to 2.17.1

  • Kehinde Adeoya Kehinde Adeoya posted a comment on discussion Help

    Thanks. I have switched to tesseract 3.0.5 but I'm still getting the same error. Could you help figure this out by scheduling a Zoom call? Please let me know when it's convenient for you. I am using tesseract 3.0.5.2, Tess4J 3.5.0, lept4j-1.13.0, and jna-5.10.0. This is the error I got this morning: # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x000000012f453b6f, pid=33004, tid=9987 # # JRE version: OpenJDK Runtime Environment Homebrew (11.0.12) (build...

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    Your dependency versions look correct. https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j/5.0.0 If the simple example works right, that means jna/tess4j/lept4j are working properly with your tesseract/leptonica installation. That suggests something is not working correctly in your application code. Look at the test cases in tess4j project for examples: https://github.com/nguyenq/tess4j As mentioned in Issue 1074, the font info was only available in tesseract 3.x.

  • Kehinde Adeoya Kehinde Adeoya modified a comment on discussion Help

    Thanks for your support. The simple app works fine. The font info is what I want to obtain at the moment; that is the reason for all this hassle. I am presently using this combination of libraries. I followed your reasoning after reading the GitHub link on this issue, but I think it's been 5 years since that was published. Is there any current update from Tesseract on it? tesseract 5.0.0-29-g727796 leptonica-1.82.0 libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.1 : libopenjp2 2.4.0 Found...

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    What's the output of executing tesseract -v in the terminal? Make sure you use the Java library versions that match your native ones. I suggest you try a simple example first. http://tess4j.sourceforge.net/codesample.html If you want to obtain font info, I don't think the feature is available in tesseract 4 and 5. https://github.com/tesseract-ocr/tesseract/issues/1074
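
The version check suggested above can be made mechanical. The sketch below only parses text: it assumes the first line of the tesseract -v output has the form "tesseract <version>", as in the output quoted elsewhere in this thread, optionally with a leading "v" before the number. The class and method names are hypothetical.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tess4J.
final class TesseractVersion {

    // Matches the start of the first line, e.g. "tesseract 5.0.0-29-g727796".
    private static final Pattern FIRST_LINE = Pattern.compile("^tesseract\\s+v?(\\d+)\\.");

    // Extracts the major version number from the output of `tesseract -v`,
    // or returns empty if the text does not look like that output.
    static OptionalInt major(String versionOutput) {
        Matcher m = FIRST_LINE.matcher(versionOutput.trim());
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String sample = args.length > 0 ? args[0] : "tesseract 5.0.0-29-g727796";
        OptionalInt v = major(sample);
        System.out.println(v.isPresent() ? "major version: " + v.getAsInt() : "unrecognized output");
    }
}
```

The resulting major number can then be compared by hand against the Tess4J release line in use, since the Java wrapper and the native library are expected to match.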

  • Quan Nguyen Quan Nguyen posted a comment on discussion Open Discussion

    No. The program will convert the input PDF to a multi-page TIFF image. What you can do is process the PDF before the OCR step, probably use PDFBox to extract a specified page, then convert that page to an image, and send it to tesseract engine.

  • Kehinde Adeoya Kehinde Adeoya posted a comment on discussion Help

    This is the error I'm getting # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x000000010c005e9d, pid=66743, tid=9475 # # JRE version: Java(TM) SE Runtime Environment (17.0.1+12) (build 17.0.1+12-LTS-39) # Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0.1+12-LTS-39, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64) # Problematic frame: # C [libtesseract.dylib+0x5e9d] tesseract::TessBaseAPI::Init(char const*, int, char...

  • Kehinde Adeoya Kehinde Adeoya posted a comment on discussion Help

    @nguyenq Can I have an answer to this, please? I'm worn out from trying to resolve a single problem for over a week.

  • Kehinde Adeoya Kehinde Adeoya modified a comment on discussion Help

    I posted earlier in the wrong forum. I have tried to repost in the HELP forum, but it seems there's no way to edit and switch forums once a post has been submitted. This is a cry for help. I am worn out from trying to resolve this problem for over a week. It's a simple installation and setup of Tesseract and Tess4J on macOS Monterey. I have followed all the docs available but none could resolve the issue. I hope I can find the right help here. I am trying to get the text/font/style properties of an image. I...

  • Alfonso Vizcaino Alfonso Vizcaino modified a comment on discussion Open Discussion

    Hello, when using PDF files with multiple pages, is there a way to specify which page I want to OCR? Thanks

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    In Tess4J, PDF documents are converted to grayscale images by Ghostscript or PDFBox before being fed to the Tesseract OCR engine. You can do your own conversion of PDF files before the OCR processing.

  • John Mc.Queide Clemente John Mc.Queide Clemente modified a comment on discussion Help

    I am running OCR on a PDF file, but the resulting PDF file loses its color. Am I doing something wrong? That doesn't happen when my input file is a PNG file. This is my code snippet: public class OcrServiceImpl implements OcrService { @Override public void doOcr(String inputPath, String outputPath) { try { List<ITesseract.RenderedFormat> renderList = new ArrayList<>(); renderList.add(ITesseract.RenderedFormat.PDF); Tesseract tesseract = new Tesseract(); tesseract.setOcrEngineMode(0); tesseract.setDatapath("C:\\Program...

  • Quan Nguyen Quan Nguyen modified a comment on discussion Help

    MS Document formats are not supported. The library can only produce the output formats that Tesseract supports.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    MS Word format is not supported. The library can only produce the output formats that Tesseract supports.

  • Quan Nguyen Quan Nguyen posted a comment on ticket #4

    Please continue the discussion either in the Discussion section or over on GitHub site rather than on this old, closed ticket. Thanks.

  • Peter Kronenberg Peter Kronenberg modified a comment on ticket #4

    I see TessBaseAPIAllWordConfidences, which says that it returns the same number of values as that returned by GetUTF8. But TessBaseAPIGetUTF8Text returns a single string, not an array. Can you provide an example? I've read the Javadoc, but it's not always clear without an example. Is there an efficient way to process multiple images, but one at a time, without sending them all in as an array? TessBaseAPIAllWordConfidences() doesn't seem to work with doOCR(), because doOCR() closes everything down...

  • Peter Kronenberg Peter Kronenberg posted a comment on ticket #4

    I see TessBaseAPIAllWordConfidences, which says that it returns the same number of values as that returned by GetUTF8. But TessBaseAPIGetUTF8Text returns a single string, not an array. Can you provide an example? I've read the Javadoc, but it's not always clear without an example. Is there an efficient way to process multiple images, but one at a time, without sending them all in as an array?

  • Quan Nguyen Quan Nguyen posted a comment on ticket #4

    Documentation: http://tess4j.sourceforge.net/docs/docs-4.4/ You can pass a List<IIOImage> to the doOCR method. There are other methods in the Tesseract class that return confidence values. JNA Direct Mapping: https://github.com/java-native-access/jna/blob/master/www/DirectMapping.md

  • Anonymous posted a comment on ticket #4

    I know this issue is years old, but I'm wondering what the current 'best' way to get the confidences is. Like others, I am also confused by the difference between Tesseract vs Tesseract1 and TessAPI vs TessAPI1. I see what you said about doOcr() being intended for a single image because it shuts down after processing. What is the best way to process multiple images? Is there any documentation on the best way to do this (as well as getting the confidences)? Thank you.

  • Peter Kronenberg Peter Kronenberg posted a comment on ticket #4

    I just entered that last post, but I wasn't logged in.

  • sriKrishnaKumar sriKrishnaKumar posted a comment on discussion Help

    Hello Team, I am looking to develop an application internally to convert an image format to searchable PDF and then searchable PDF to Microsoft document format, or directly from image format to Microsoft document format. Does Tess4J, along with other libraries, support this requirement? I know we can use Tess4J to convert an image to searchable PDF. Any suggestions are welcome.

  • Quan Nguyen Quan Nguyen posted a comment on discussion Help

    The Leptonica API method seems to have changed over several versions. http://tess4j.sourceforge.net/docs/lept4j-docs-1.10.0/net/sourceforge/lept4j/Leptonica1.html http://tess4j.sourceforge.net/docs/lept4j-docs-1.14.0/net/sourceforge/lept4j/Leptonica1.html#pixaaDisplayByPixa(net.sourceforge.lept4j.Pixaa,int,float,int,int,int)

  • Jeremy Young Jeremy Young posted a comment on discussion Help

    IntelliJ is telling me that the parameters for pixaaDisplayByPixa are different from the documentation. Have I done something wrong? If not, is there a workaround? Thx
