Scenario
An Azure Synapse Analytics customer received the following error while querying an external table:

HdfsBridge::recordReaderFillBuffer – Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too many columns in the line
Background
The customer has employee data in CSV format with the following columns:
- Column1
- NAME(1)
- STREET(2)
- CITY, ST ZIP(3)
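To illustrate, a hypothetical header row of datafile.csv (the actual file contents are not shown in the original case) might look like this:

```
Column1,NAME(1),STREET(2),CITY, ST ZIP(3)
```

Split on commas, this single line yields five fields, even though the table defines only four columns.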
The following code runs without any errors.
CREATE DATABASE SCOPED CREDENTIAL sccred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = {access key};
CREATE EXTERNAL DATA SOURCE dsblob
WITH
(
TYPE = Hadoop,
CREDENTIAL = sccred,
LOCATION = 'wasbs://poccon@pocsa.blob.core.windows.net'
);
CREATE EXTERNAL FILE FORMAT FFTextFile
WITH (
FORMAT_TYPE = DelimitedText,
FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);
CREATE EXTERNAL TABLE stg.EmpData
(
[Column1] NVARCHAR(MAX) ,
[NAME(1)] NVARCHAR(MAX) ,
[STREET(2)] NVARCHAR(MAX) ,
[CITY, ST ZIP(3)] NVARCHAR(MAX)
) WITH (LOCATION='/datafile.csv',
DATA_SOURCE = dsblob,
FILE_FORMAT = FFTextFile, REJECT_TYPE = VALUE,
REJECT_VALUE = 0
);
But SELECT * FROM stg.EmpData fails with the error shown above.
Here are some points to verify:
- Verify that you are not pointing to the wrong container containing files with the same name.
- Verify that the number of columns in the external table matches the number of columns in the file.
If you take a closer look at the file's columns, you will notice that one column has a comma in its name: CITY, ST ZIP(3). This is what causes the issue. Because we specified FIELD_TERMINATOR = ',', that one column is parsed as two separate columns, so the file appears to have more columns than the external table defines. Hence the error: "Too many columns in the line."
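One way to resolve this (a sketch, not part of the original article): if the offending comma appears only in the header row, you can tell PolyBase to skip that row with FIRST_ROW = 2; if data values may also contain commas, quoting those values in the file and setting STRING_DELIMITER lets the parser ignore terminators inside quotes. Both options are supported by CREATE EXTERNAL FILE FORMAT in Synapse dedicated SQL pools.

```sql
-- Hypothetical replacement for FFTextFile; assumes the CSV quotes
-- fields that contain commas (e.g. "Charleston, SC 29401") and that
-- the first line of the file is a header row.
CREATE EXTERNAL FILE FORMAT FFTextFileQuoted
WITH (
    FORMAT_TYPE = DelimitedText,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',  -- commas inside quoted fields are not treated as terminators
        FIRST_ROW = 2            -- skip the header row containing the embedded comma
    )
);
```

The external table would then be recreated with FILE_FORMAT = FFTextFileQuoted.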
The column definitions (data types and number of columns) must match the data in the files. If there is a mismatch, the data rows are rejected when you query the actual data.
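Relatedly, the REJECT_VALUE = 0 used above means a single bad row fails the whole query. As a diagnostic step (an assumption, not from the original article), you could recreate the table with a higher tolerance so the query returns the well-formed rows while skipping the malformed ones:

```sql
-- Hypothetical variant of stg.EmpData that tolerates up to 10 rejected rows.
CREATE EXTERNAL TABLE stg.EmpData_Tolerant
(
    [Column1] NVARCHAR(MAX),
    [NAME(1)] NVARCHAR(MAX),
    [STREET(2)] NVARCHAR(MAX),
    [CITY, ST ZIP(3)] NVARCHAR(MAX)
)
WITH (
    LOCATION = '/datafile.csv',
    DATA_SOURCE = dsblob,
    FILE_FORMAT = FFTextFile,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 10   -- allow up to 10 malformed rows before the query fails
);
```

This does not fix the underlying parsing problem, but it can help isolate which rows are being rejected.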
Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.